Why does an image of Donald Trump show up when you Google ‘idiot’? How do Google and other search engines work?
When members of the US Congress grilled Sundar Pichai, the CEO of Google, a very interesting question came up! It went roughly like this: if you Google the word ‘idiot’, under images, a picture of Donald Trump comes up. I just did that. How would that happen? How would search work so that this would occur? Mr. Pichai gave an explanation, but here’s a more holistic view of why this happened!

Search engines like Google don’t go out onto the World Wide Web and run their algorithms in real time for every search. There are billions of web pages around the globe, and computing the result for your search in real time would take forever. So search engines scan web pages in advance, in a process called crawling! The information collected this way is used to rank web pages for more than a billion keywords, and the result is stored in a database called the ‘search index’.

But here’s the thing. Humans use a lot of ‘homonyms’, which are words that are spelled the same and pronounced the same but have different meanings. For example, when a web page contains the term ‘bank’, the search engine should know whether the page is referring to a river bank or to a financial institution like the Bank of America. So, while crawling a web page, the crawlers (also called spiders) use an n-gram technique: for every word, the index also records the words that appear next to it. This way, if a web page uses the word ‘crocodile’ along with the word ‘bank’, the search engine knows that the page is referring to a river bank and not the Bank of America.

Thus the search index is a huge network of keywords, the relations each keyword has with other keywords, and the relevant web pages. Google’s ranking algorithms, PageRank among them, reportedly also take into account more than 200 other signals, such as the location in which the web page was written and the popularity of the web page, to rank the search results and recommend them to you when you search.

But this kind of search indexing has a side effect.
It can help mimic human intelligence in recognizing the context and relevance of words, but at the same time it can create relationships between words on its own. You see where I’m going with this? When a lot of web pages use the keyword ‘idiot’ alongside the keyword ‘Donald Trump’, Google’s algorithm automatically assumes that these words are related. Since billions of people use the service, the index is more like collective human knowledge, and Google doesn’t, or cannot, manually manipulate the search index because it’s so huge. So, when you search for ‘idiot’, the search index suggests to Google that the word ‘idiot’ is related to ‘Donald Trump’, and a picture related to ‘Donald Trump’ pops up as a result. In short, why does this happen? Because the majority of you, the average Joes, think that Donald Trump is actually an idiot!
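
If you like to see ideas in code, here is a toy sketch of the whole story in Python. It is not how Google actually does it. The pages, the function names (`tokenize`, `build_index`, `strongest_association`) and the five-word window are all made up for illustration, and simply counting nearby words stands in for the far more sophisticated techniques a real search engine would use. The index is built in advance, it remembers which pages contain each keyword, and it remembers which words tend to show up close to each other.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages, window=5):
    """Build a toy search index ahead of query time.

    Returns (inverted, related):
      inverted[word] -> set of page ids that contain the word
      related[word]  -> Counter of words seen within `window` words of it
    """
    inverted = defaultdict(set)
    related = defaultdict(Counter)
    for page_id, text in pages.items():
        tokens = tokenize(text)
        for i, word in enumerate(tokens):
            inverted[word].add(page_id)
            # Count each nearby pair once, in both directions.
            for neighbor in tokens[i + 1 : i + 1 + window]:
                if neighbor != word:
                    related[word][neighbor] += 1
                    related[neighbor][word] += 1
    return inverted, related


def strongest_association(related, word, candidates):
    """Pick the candidate that co-occurs most often with `word`."""
    return max(candidates, key=lambda c: related[word][c])


# Hypothetical "web" of five tiny pages (invented for this example).
pages = {
    "p1": "A crocodile was resting on the bank of the river",
    "p2": "The bank approved my loan and opened a savings account",
    "p3": "Some blog calls Trump an idiot",
    "p4": "Another post says Trump is an idiot again",
    "p5": "My neighbor is an idiot",
}

inverted, related = build_index(pages)

# Lookup is now a dictionary read, not a crawl of the whole web.
print(sorted(inverted["idiot"]))  # ['p3', 'p4', 'p5']

# 'bank' is linked to 'crocodile' on one page and to 'loan' on another,
# which is the context an engine could use to tell the two meanings apart.
print(related["bank"]["crocodile"], related["bank"]["loan"])  # 1 1

# The side effect: 'idiot' is written near 'trump' more often than near 'neighbor'.
print(strongest_association(related, "idiot", ["neighbor", "trump"]))  # trump
```

Nobody told the index that the two words belong together. The association falls out of what the pages say, which is exactly the side effect described above.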