Week 3 – Search Engines

This week we covered search engines, the obvious example being Google.

Interesting side note – I used to sell search engines to companies around New Zealand, specifically the French EXALEAD CloudView from Dassault.

Google Search these days is built around the idea of the Knowledge Graph. A good video showing how people use the Knowledge Graph is “Explore lists and collections with Google search” (YouTube, 2015).

The Knowledge Graph itself is explained in the video “Introducing the Knowledge Graph” (YouTube, 2015).

In effect, the Knowledge Graph means that Google understands more of the meaning behind your search terms and their relationships to other things. This in turn allows Google to synthesise answers to your queries more quickly, sometimes without you even leaving Google’s site.
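To make "relationships to other things" a little more concrete, here is a toy sketch of a graph of facts in Python. It is not Google's data model: the relation names (`capital_of`, `located_in`, `based_in`) and the `follow` helper are made up for illustration, and the three facts are just examples I picked.

```python
from collections import defaultdict

# A handful of (subject, relation, object) facts. A real knowledge graph
# holds vastly more of these; the relation names here are invented.
triples = [
    ("Wellington", "capital_of", "New Zealand"),
    ("Auckland", "located_in", "New Zealand"),
    ("Air New Zealand", "based_in", "Auckland"),
]

# Index the facts by subject so we can look up everything known about an entity.
index = defaultdict(list)
for subject, relation, obj in triples:
    index[subject].append((relation, obj))


def follow(entity, *relations):
    """Follow a chain of relations from an entity; return None if a step is missing."""
    current = entity
    for relation in relations:
        matches = [o for r, o in index.get(current, []) if r == relation]
        if not matches:
            return None
        current = matches[0]
    return current


# "Which country is Air New Zealand based in?" answered by hopping two relations.
print(follow("Air New Zealand", "based_in", "located_in"))  # New Zealand
```

The point is only that once facts are stored as relationships, a question can be answered by walking the graph rather than by matching keywords on a page.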

The first unconventional way to use Google is to filter on attributes of the thing you’re searching for. https://www.google.com/search?as_st=y&tbm=isch&as_q=cats&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=.edu&safe=images&tbs=sur:f&gws_rd=ssl is an image search for “cats” that is restricted to the .edu domain and to images labeled for noncommercial reuse. These filters are more accurate than a plain keyword search such as “cats edu creative commons”.
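The filters in that link are just query-string parameters, so the same kind of URL can be assembled in code. This is a minimal sketch that copies the non-empty parameters from the link above; the function name `build_image_search_url` is my own, and Google may well have changed or retired these parameters since the link was captured.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_image_search_url(term, site, base="https://www.google.com/search"):
    """Build a search URL using the same filters as the link in the post."""
    params = {
        "as_st": "y",
        "tbm": "isch",          # image search
        "as_q": term,           # the words to search for
        "as_sitesearch": site,  # restrict results to a site or domain
        "safe": "images",
        "tbs": "sur:f",         # usage-rights filter, copied from the link above
    }
    return f"{base}?{urlencode(params)}"


url = build_image_search_url("cats", ".edu")
print(url)

# Round trip: parse the URL back into its filters to check nothing was lost.
filters = parse_qs(urlsplit(url).query)
print(filters["as_sitesearch"], filters["tbs"])
```

Swapping the `term` or `site` arguments gives the same filtered search for a different topic or domain.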

The second unconventional search is a query about an attribute of a topic. https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=nz475 searches for the flight NZ475:

[Image: nz475]

Google has inferred that NZ475 is a flight, and has therefore shown me flight-related information in a card. Below the card are the ordinary search results, which link to pages outside Google.

The third unconventional search exploits Google’s ability to crawl sensitive information. https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=password+filetype:xls+site:.edu is a search restricted to Excel files (“xls”) that contain the word “password” and sit on an “.edu” domain. Not all of these files are sensitive. However, this goes to show that “security by obscurity” is not an acceptable practice now that Google makes it easy to surface sensitive information.
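The query in that link mixes a plain keyword with two `operator:value` filters. As a rough illustration of how such a query breaks down, here is a small Python sketch. The `split_query` helper is hypothetical and is certainly not how Google parses queries; it only separates the parts so they are easier to see.

```python
def split_query(query):
    """Split a search query into plain keywords and operator:value filters."""
    keywords, filters = [], {}
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            # Looks like operator:value, e.g. filetype:xls
            filters[name] = value
        else:
            keywords.append(token)
    return keywords, filters


keywords, filters = split_query("password filetype:xls site:.edu")
print(keywords)  # ['password']
print(filters)   # {'filetype': 'xls', 'site': '.edu'}
```

Being a toy, it treats anything containing a colon as a filter, so a keyword such as a full URL would be misread.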

The foundation of the Google search engine is PageRank, a page-ranking algorithm. It is described in the paper The PageRank Citation Ranking: Bringing Order to the Web (Page, Brin, Motwani, & Winograd, 1999). To summarise, a link to a page is treated as an endorsement of that page. Each link is weighted by how trusted the linking page is, and that trust in turn comes from the links pointing to the linking page. That way a link from the BBC carries more weight than a link from this blog.
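The idea that a link is worth more when it comes from a well-linked page can be sketched with a simplified PageRank computed by repeated iteration. This is my own minimal version, not Google's production algorithm: the damping factor of 0.85, the fixed 50 iterations, the even redistribution of rank from pages with no outgoing links, and the little seven-page web are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical web: "hub" is linked to by three pages, "blog" by none.
# "x" and "y" each receive exactly one link, from "hub" and "blog" respectively.
links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["x"],
    "blog": ["y"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

Both "x" and "y" have a single inbound link, but "x" ends up ranked higher because its link comes from the well-linked "hub", which is the BBC-versus-this-blog effect in miniature.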

Finally, we briefly touched on spam. Google (2015) defines spam as:

irrelevant or unsolicited messages sent over the Internet, typically to large numbers of users, for the purposes of advertising, phishing, spreading malware, etc.

My definition of spam is:

Something tricking you into giving it undesired attention.

Whether that be email spam trying to get you to buy shares, Facebook spam trying to get you to click on pointless videos, link spam trying to escalate the importance of fake websites, or ads on websites trying to get you to visit their clickbait articles, spam is trying to steal your attention in a dishonest way.

References:

YouTube. (2015). Explore lists and collections with Google search. Retrieved 29 July 2015, from https://www.youtube.com/watch?v=mg91_trV4hY

YouTube. (2015). Introducing the Knowledge Graph. Retrieved 29 July 2015, from https://www.youtube.com/watch?v=mmQl6VGvX-c

Google.com. (2015). cats site:.edu – Google Search. Retrieved 29 July 2015, from https://www.google.com/search?as_st=y&tbm=isch&as_q=cats&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=.edu&safe=images&tbs=sur:f&gws_rd=ssl

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=nz475

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=password+filetype:xls+site:.edu

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Retrieved 29 July 2015, from http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=spam%20definition

2 thoughts on “Week 3 – Search Engines”

  1. I like the Knowledge Graph. It shows the true non-monetary (and monetary) value metadata can have for a company and its customers.
    I think I read somewhere that a while back, Gmail used to put a flag/indicator in the corner of your mail with flight info or something when you had booked a flight using a Gmail address, so it looks like they have been doing cool stuff for a while.
    It’s a pity Wolfram Alpha (www.wolframalpha.com/) can’t be added into Google search; all proprietary algorithms, I think.

  2. The Dassault engine is not available to everyone, as far as I can see, but only available inside an enterprise that has purchased it.

    An interesting way to organize a “knowledge graph” or an “interest graph” is provided by French startup Pearltrees, see http://www.pearltrees.com/.
