1. Organising the world
After students Larry Page and Sergey Brin met in 1995, they had an audacious ambition – to "organise the world's information and make it universally accessible and useful". They’ve been phenomenally successful. The brand they created in a California garage – Google – has become shorthand for searching the web.
It is now a major global organisation with over 70 offices in more than 40 countries. The company uses 0.01% of the world's electricity supply running one of the world's most extensive computer networks, combining well over one million servers. Every second of every day, an average of 40,000 people are typing queries into Google's search engine. It handles over 100 billion searches a month.
So what’s going on behind the scenes when you type in your query? It starts with a spider…
2. Crawling with spiders
When you Google something, you're not actually searching the web. You're searching Google's index of the web. Google builds that index with software known as spiders, which crawl the web by following links from page to page and recording what they find.
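To give a feel for what searching an index rather than the web itself means, here is a minimal sketch of an inverted index in Python. The page addresses and text are invented for illustration, and the function names are hypothetical. Google's real index is vastly larger and more sophisticated.

```python
from collections import defaultdict

# Hypothetical crawled pages: address -> page text (invented for illustration).
PAGES = {
    "example.com/a": "spiders crawl the web",
    "example.com/b": "search engines index the web",
    "example.com/c": "garden spiders spin webs",
}


def build_index(pages):
    """Map each word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the addresses that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "the web")))
```

The index is built once, ahead of time. Answering a query then only involves looking up each word, not re-reading every page.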
3. The clever bit: Ranking your results
Larry Page, Google's co-founder, described the perfect search engine as something which “understands exactly what you mean and gives you back exactly what you want.” To try to achieve that, Google uses a set of rules to work out what to show you every time you search for something.
Google weighs up at least 200 different variables every time we search. The exact details of Google's ranking algorithm are secret, but it includes several core components.
Google ranks web pages by analysing which other web pages link to them. It assigns every page a score, known as PageRank, based on the links it has acquired, treating each link like a vote. But not all votes are equally weighted. A link from a relevant page that itself has a high PageRank score is worth more than a link from a page with a low PageRank score.
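The idea of links as weighted votes can be sketched in a few lines of Python. This is a simplified version of the published PageRank calculation, not Google's production system. The four-page web is invented, and the damping value of 0.85 and the iteration count are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # A page with no outgoing links shares its score with everyone.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                # Otherwise its score is split equally between the pages it links to.
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank


# Hypothetical four-page web: page -> pages it links to.
LINKS = {
    "home": ["news", "shop"],
    "news": ["home"],
    "shop": ["home"],
    "blog": ["home", "news"],
}
RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy web, "home" collects the most votes. Nothing links to "blog", so the votes it casts carry little weight.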
Content relevance and quality
Google mathematically models the words on a web page. It judges the relevance of a web page to your search query by counting the number of occurrences of the query words (and synonyms or variations) on the page. Greater weight is given to keywords in important parts of the text, such as the page title. Google also considers the rarity of those keywords – if the search term is not widely used on the web then pages containing those words are even more likely to be a good match.
Some words – like "the" or "and", known as stop words – occur more often than others, without being particularly useful in helping to discern a page's relevance to a given query. As a result, Google gives such words less importance. Google also looks for quality signals like the length of the content and whether it has been duplicated from elsewhere.
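The sketch below shows one simple way to combine these ideas: counting query words, boosting words in the title, favouring rare words and ignoring stop words. It is a basic term-frequency scheme under stated assumptions, not Google's actual formula. The pages, the stop word list and the title boost of 3 are all invented for illustration.

```python
import math

STOP_WORDS = {"the", "and", "a", "of", "in", "to"}  # tiny illustrative list
TITLE_WEIGHT = 3.0  # assumed boost for words that appear in the title

# Hypothetical pages: name -> (title, body text).
DOCS = {
    "page1": ("Baking sourdough bread", "how to bake the perfect sourdough loaf"),
    "page2": ("Bread and butter", "the history of bread and the bakers of london"),
    "page3": ("London travel guide", "the best places to eat in london"),
}


def tokens(text):
    """Split text into lower-case words, dropping stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def score(query, doc_id, docs=DOCS):
    """Sum a weighted count for each query word, scaled up for rarer words."""
    title, body = docs[doc_id]
    title_words, body_words = tokens(title), tokens(body)
    total = 0.0
    for word in tokens(query):
        count = TITLE_WEIGHT * title_words.count(word) + body_words.count(word)
        # Number of pages that mention the word anywhere.
        containing = sum(1 for t, b in docs.values() if word in tokens(t + " " + b))
        if count and containing:
            total += count * math.log(1 + len(docs) / containing)
    return total


for doc_id in DOCS:
    print(doc_id, round(score("sourdough bread", doc_id), 2))
```

Here "sourdough" appears on only one page, so it counts for more than "bread", which appears on two. A query made up entirely of stop words scores nothing.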
Google takes into account the context of your search. If you search for "restaurants" in the UK then you're most likely to get results from UK pages because these are more likely to be relevant to you than restaurants in Brazil or the United States. Google also takes into account the device you're using. Smartphone users might be more likely to want a restaurant in their immediate vicinity than desktop users.
Google can also assess your own search history and internet behaviour, delivering personalised search results.
Fighting web spam
Web traffic is big business. Ever since Google started, people have tried to fool its system to improve their rankings. Webmasters have tried everything from stuffing their pages with popular search queries to buying links to their sites to increase their PageRank. As a result, search engines try to filter out this spam by honing their algorithms. It is a continuing battle between the two competing interests.
4. Understanding the web
Google and other search engines are always evolving. They are now moving towards a semantic understanding of the web. Instead of just matching keywords, they're trying to understand what those words mean.
At its simplest, this means webmasters 'tag' content on their sites with bits of information known as 'structured data' or 'semantic markup'. Webmasters tag particular bits of text, such as recipes, to make it easier for search engines to understand what their site actually contains. It can also be used to mark up content which search engines haven't historically been very good at understanding, like images and video.
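As an illustration, the Python sketch below produces one common form of structured data: a JSON-LD block using the schema.org Recipe vocabulary. The recipe details and the helper name to_json_ld_script are made up for this example. A real page would describe its own content and might use a different markup format.

```python
import json

# Hypothetical recipe described with schema.org's Recipe vocabulary.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Victoria sponge",
    "recipeIngredient": [
        "200g caster sugar",
        "200g butter",
        "4 eggs",
        "200g self-raising flour",
    ],
    "cookTime": "PT20M",  # ISO 8601 duration: 20 minutes
}


def to_json_ld_script(data):
    """Wrap structured data in the script tag a web page would embed."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = to_json_ld_script(recipe)
print(snippet)
```

A search engine that reads this block does not have to guess that the page is a recipe or work out which lines of text are ingredients.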
Search engines don't simply rely on what site owners tell them about their pages. Google's Knowledge Graph is an enormous knowledge base that aggregates semantic data from many sources. Algorithms then analyse its information and connections to learn about the terms you search for and anticipate the answers you actually want. So if you search for "the Queen", Google recognises that you're probably looking for information about a particular person, Elizabeth II, and is able to suggest sites containing relevant information.
Using machine learning, search engines are also getting closer than ever to understanding what you actually mean when you search for something. If you type "How old was Queen Victoria when she died?", Google returns "81" at the top of the page. Without Knowledge Graph, it would just return pages with variations on this question and not the actual answer.
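A toy version of this kind of lookup is sketched below in Python. It assumes the question has already been understood as "age at death of Queen Victoria", which is the hard part in practice. The structure of the fact store and the function name age_at_death are hypothetical, and the real Knowledge Graph is far richer.

```python
from datetime import date

# Toy knowledge base: (entity, property) -> value.
FACTS = {
    ("Queen Victoria", "born"): date(1819, 5, 24),
    ("Queen Victoria", "died"): date(1901, 1, 22),
}


def age_at_death(entity, facts=FACTS):
    """Work out an age from stored birth and death dates."""
    born = facts[(entity, "born")]
    died = facts[(entity, "died")]
    # Subtract a year if the final birthday had not yet been reached.
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)


print(age_at_death("Queen Victoria"))  # 81
```

The answer comes from combining two stored facts, not from finding a page that happens to contain the same words as the question.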
Google Brain is a deep learning research project that uses machine learning to try to understand everything from natural language to video, audio and image content. The recently launched Google PlaNet can work out where a picture was taken just by looking at it. Ultimately, Google's long-term objective is to develop a capacity for artificial intelligence.
5. Other search engines are available
As of January 2016, 86% of all searches in the UK were performed using Google.