
What do Google, Ask and Bing search results mean?

Image caption Exactly how many sheep in this picture?

It's easy to think search engine queries could provide a gold mine of data, but it's not easy to know how to exploit it, says Michael Blastland in his regular column.

How many sheep? Come on, it's not hard. There are, after all, only two white fluffy objects in the picture. So how many sheep?

Did I mention that the ewe is pregnant? Now how many sheep? So is it one? Two? Three? Or maybe one and a half? Or maybe one and two halves?

Maybe some readers are saying: "Well, obviously, there is one sheep and one lamb. Duh!"

But that's not what I asked. I asked how many sheep? In which case your answer would be one, right? Anyone care to disagree with one?

That we could go on arguing about definitions in order to count up to three - or fewer - tells us that counting just ain't like it used to be on Sesame Street. As soon as you give a number, you insist on a definition, a definition that others might not share.

At which point they accuse you of being slippery with statistics. "Look, buddy, I'm just trying to count sheep," you reply. "Yeah, and what's your agenda?" says your critic. "You some kind of sheep-denier?"
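The way the count depends on the definition can be made concrete with a small sketch. The animals and their attributes below are hypothetical stand-ins for the picture (one pregnant ewe, one lamb), and the three definitions are illustrative assumptions, not anyone's official way of counting sheep.

```python
from dataclasses import dataclass

@dataclass
class Animal:
    kind: str              # "ewe" or "lamb"
    unborn_lambs: int = 0  # assumed: the pregnant ewe carries one lamb

# Hypothetical contents of the picture described in the column.
flock = [Animal("ewe", unborn_lambs=1), Animal("lamb")]

def count_adults_only(animals):
    """Definition A: only adult sheep count."""
    return sum(1 for a in animals if a.kind == "ewe")

def count_born(animals):
    """Definition B: every animal already born counts, lambs included."""
    return len(animals)

def count_with_unborn(animals):
    """Definition C: born animals plus unborn lambs."""
    return len(animals) + sum(a.unborn_lambs for a in animals)

counts = {
    "adults only": count_adults_only(flock),
    "all born": count_born(flock),
    "including unborn": count_with_unborn(flock),
}
for definition, n in counts.items():
    print(f"{definition}: {n}")
```

The data never change between the three calls. Only the definition does, and each one gives a different answer.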

Image caption Why do we search for who we search for?

The sheep problem has a parallel in the Christmas glut of internet search highlights.

So in the last week or so, we've had the most frequently asked questions of the year from Ask Jeeves (or Ask if you're outside the UK), which apparently included "Is the X Factor fixed?" and "Why are England so bad at football?" at numbers one and two.

AOL had the general election as the most searched-for news item, and reported that music fans searched for Lady Gaga more than any other artist. Bing US listed the most popular overall 2010 searches as Kim Kardashian first, Sandra Bullock second and Tiger Woods third.

On Google, it's possible to check up at any time on the frequency of any search request you care to think of.

But to come back to the point about the sheep, what exactly are we counting when we count internet searches?

Image caption Google thinks certain search terms can be used to map flu spread

With searches for celebrities, are we measuring popularity, or just attention? Some people might appear high up a list because they are hated. In other words, how many of those interested in Lady Gaga think she's a sheep and how many a goat? Music sales tell us her music is popular, but do internet searches?

So public opinion is not the same as public attention. One is what people think, the other is loosely what people think about. Though we might wonder how much attention is spontaneous. How much searching is prompted by what's talked about on media web sites, TV and so on?

Then again, does it matter? Maybe. There are more heavyweight examples. Google thinks it can use the volume of internet searches to tell us how much flu there is before the medical authorities know themselves.

The study of patterns of disease is called epidemiology. Lately, the term "info-demiology" has appeared, along with "info-veillance" - using internet search volume to track or even forecast public health problems or economic trends like claims for unemployment benefit.

Politicians might be tempted to think it can tell them what they once assessed from news coverage: what's known as "The Most Important Problem" to voters.

Some people think you can use internet searches to measure investor interest and so pick the shares that will rise.

But the counting problem is still there. For example, are we counting popularity - or attention, or interest or whatever we call it - only among internet users, who might be younger or richer than the population as a whole? In other words, is there bias? And if so, how much? Enough to matter?
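One way to see whether such a bias could be enough to matter is to reweight. The sketch below uses made-up numbers throughout: the age groups, their shares of the population and of searchers, and the interest rates are all assumptions for illustration, not measurements.

```python
# All figures are hypothetical, for illustration only.
population_share = {"under 35": 0.40, "35 and over": 0.60}
searcher_share = {"under 35": 0.70, "35 and over": 0.30}
# Assumed fraction of each group interested in some topic.
interest_rate = {"under 35": 0.50, "35 and over": 0.10}

def weighted_rate(shares, rates):
    """Average the group rates, weighting each group by its share."""
    return sum(shares[g] * rates[g] for g in shares)

# What the searchers alone would suggest.
raw = weighted_rate(searcher_share, interest_rate)
# The same rates weighted by the make-up of the whole population.
adjusted = weighted_rate(population_share, interest_rate)

print(f"interest implied by searchers: {raw:.0%}")
print(f"interest in whole population:  {adjusted:.0%}")
```

With these invented figures the searchers overstate interest by twelve percentage points. With other figures the gap could shrink, vanish or reverse, and in practice the group rates are exactly what we don't know.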

Image caption Politicians want to get inside voters' minds through search result analysis

Are people watching a share because they think it will rise or fall? You can make money both ways.

It's also quite hard to know that you've got all the potential search terms covered. If you want to measure public interest in the recession, for example, how many different terms might people use?
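The coverage problem can be sketched in the same spirit. The search terms and counts below are invented. The point is only that a total built from a short list of terms can miss much of what people actually typed.

```python
# Hypothetical search counts; neither the terms nor the numbers are real data.
search_counts = {
    "recession": 500,
    "credit crunch": 300,
    "job losses": 250,
    "repossession": 150,
    "downturn": 100,
}

def coverage(tracked_terms, counts):
    """Share of all relevant searches captured by the tracked terms."""
    total = sum(counts.values())
    tracked = sum(counts.get(term, 0) for term in tracked_terms)
    return tracked / total

narrow = coverage(["recession"], search_counts)
wider = coverage(["recession", "credit crunch", "downturn"], search_counts)

print(f"tracking one term:    {narrow:.0%} of relevant searches")
print(f"tracking three terms: {wider:.0%} of relevant searches")
```

The sketch cheats by assuming the full list of relevant terms is known. In real life it isn't, so nobody can be sure what the denominator should be.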

There's something epic in the potential of the internet to measure the interest of billions of people almost, by conventional standards, instantly. I'm not knocking any of it, and some of the research links here are fascinating.

The rise of statistics corresponds almost exactly with the rise of big urban populations and large-scale government. Understanding what's going on in a big, bustling society is still a huge problem. So what if you could put the whole lot online? Shades of Big Brother? Or data heaven?

But the bottom-line problem with the volume of internet searches is that - on its own - it contains no qualitative data. It counts something, but we're often unsure what - or why.