Are search engine result figures accurate?
- 20 February 2012
- From the section Magazine
Can you measure your popularity - or that of anyone or anything - by the number of results that an internet search generates?
Enter the name Tim Harford into Google and you get 835,000 results.
Or 325,000, or 285,000… I got these widely differing results on computers within metres of each other in the same office at the same time.
For a few heart-stopping moments, we thought we had broken Google - with the name of the presenter of BBC radio's More or Less programme.
But no. The first lesson of search engine accuracy is that the number of results you'll get depends on the computer you're using and which copy of Google you're using. There are several copies of Google in the world, and your search query will be dispatched to whichever version is the least busy.
The results will also be personalised, according to what you've searched for previously and where you're based.
It's also worth mentioning that I searched for myself and got 68 million web page results.
I've written no books, starred in no films, and you've probably never heard of me. And yet I'm massive on the web.
Or, more likely, my secret to search engine success is to have a name made up of two popular first names.
So, search results can be misleading and the number of results won't mean much if you fail to use inverted commas around a name or a phrase, or if you use ambiguous terms. Ask a silly question, get a silly answer.
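To see why inverted commas matter, here is a minimal sketch in Python. The four page texts are invented for illustration, and the two matching functions are simplified assumptions, not how any real search engine works. An unquoted search is treated as "every word appears somewhere on the page", a quoted one as "the words appear together, in order".

```python
# Toy illustration only: the page texts below are invented, and real search
# engines match and rank documents in far more elaborate ways.
pages = [
    "tim harford presents more or less",
    "tim smith met john harford at the fair",
    "a recipe from harford county by tim",
    "an interview with tim harford",
]

def matches_all_words(page, query):
    """Unquoted search: every query word appears somewhere on the page."""
    words = page.split()
    return all(term in words for term in query.split())

def matches_phrase(page, query):
    """Quoted search: the query words appear together, in order."""
    words = page.split()
    terms = query.split()
    return any(
        words[i:i + len(terms)] == terms
        for i in range(len(words) - len(terms) + 1)
    )

query = "tim harford"
unquoted = [p for p in pages if matches_all_words(p, query)]
quoted = [p for p in pages if matches_phrase(p, query)]
print(len(unquoted), len(quoted))
```

In this toy set the unquoted search matches every page, including ones about a different Tim and a different Harford, while the quoted search matches only the two pages where the name appears as a phrase.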
But even when you've got your enquiry honed to a fine degree, and consider yourself a champion in google-fu, don't believe the numbers.
You might think that if a search engine tells you it's returned, say, 68 million results, there are 68 million pages you could, in principle, view.
Not necessarily so.
A study comparing results from three search engines, for queries that generated fewer than 1,000 results, found that even the best-performing search engine was pretty rough and ready in its calculations.
None of the search engines was providing exact document counts, only estimates.
The researchers found that the figures were pretty accurate when they searched for one word. But each time they added a word to the search, the numbers got less and less accurate.
"Eighty per cent of the time, the estimates were reliable - they had only 10% errors in their estimation," says Ahmet Uyar, the head of the computer engineering department at Mersin University, in Turkey.
"But when we tried two-word queries, then the accuracy was reduced almost by half."
And when the researchers submitted five-word queries, the percentage of accurate result estimates halved again - the best-performing search engine estimated the correct number of matching documents to within 10% of the actual number less than 20% of the time.
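The "within 10%" yardstick can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' actual method: the function names are made up, and the sample pairs are invented, apart from (29, 21), which borrows the claimed and viewable counts from the limerick search described in this article.

```python
def within_tolerance(estimate, actual, tolerance=0.10):
    """True if the estimate is no more than `tolerance` away from the actual count."""
    return abs(estimate - actual) <= tolerance * actual

def share_accurate(pairs, tolerance=0.10):
    """Fraction of (estimate, actual) pairs whose estimate falls within tolerance."""
    hits = sum(within_tolerance(e, a, tolerance) for e, a in pairs)
    return hits / len(pairs)

# Hypothetical (estimated, actual) result counts, for illustration only.
sample = [(105, 100), (480, 500), (900, 600), (29, 21), (70, 100)]

print(share_accurate(sample))
```

Here only the first two estimates land within 10% of the real figure, so the share of accurate estimates is two in five.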
I tried it myself, searching for the first part of a limerick: "There was a young man from Darjeeling who got on a bus."
First of all, the search engine showed 15 results. It invited me to click to see "omitted results". I did, and it said it had found 29 pages in total but, in actual fact, it could only show 21.
Experts say this lack of precision is tolerated in the name of speed. The supercomputers behind the scenes have to work very quickly, mixing and matching lots of documents, throwing out spam and pages where the words surrounding your search term are the same - all this in less than a second or two.
And think how much of the world wide web there is to search. Search engines probably cover only a fraction of it.
In 1999, researchers tried to assess exactly how much of the web was to be found in the indexes of major search engines - about 16%, they concluded, in a paper published in Nature.
And although search engines have developed a great deal since then, the internet has also been growing very rapidly, and it's likely that search engines still only cover a relatively small section of the information that is out there. And only ever will.
Search engines could crawl the internet forever and still not find all the web pages that exist, according to Professor Mike Thelwall, a cybermetrics researcher at Wolverhampton University in the UK.
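Crawlers find new sites mainly by following links from sites they already know, so a site that nobody links to stays invisible. A minimal sketch of that idea in Python follows. The site names and the link graph are hypothetical, and a real crawler is vastly more complicated.

```python
from collections import deque

# Hypothetical link graph: each site maps to the sites it links to.
links = {
    "bigsite.example": ["news.example", "blog.example"],
    "news.example": ["blog.example", "shop.example"],
    "blog.example": ["bigsite.example"],
    "shop.example": [],
    # A new site that links out, but that no other site links to.
    "brand-new.example": ["bigsite.example"],
}

def crawl(seeds, links):
    """Breadth-first discovery: start from seed sites and follow outgoing links."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

found = crawl(["bigsite.example"], links)
print(sorted(found))
print("brand-new.example" in found)
```

Starting from one big site, the crawl reaches everything that is linked to, directly or indirectly, but never reaches the new site, because no link points to it.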
"The reason is there's no one big list of web pages in the world, or even a single list of all the websites in the world," he says.
"Essentially all search engines start with a few big websites and then they try to find new websites, mainly by following the links on existing websites. So if you've got a fairly new website that no-one links to yet, the chances are Google won't know about you."
And many web pages don't exist until you request them, he adds.
"Google search results pages are an example, since the number of different queries that could be submitted to Google to create a results page is practically infinite.
"Also, many websites are created by content management systems now, with different variants for different users created when the user visits.
"These 'dynamically-created pages' also include many web 2.0 pages that are editable by users, such as social networking sites."