Mining the web for 'big data'

Scientists are looking at new ways to perfect search engines on the web

Mining so-called "big data" - huge masses of information on the web - is a challenge akin to looking for a needle in a gigantic haystack.

But some of computing science's brightest brains are on the case.

The Conference on Information and Knowledge Management (CIKM) brought many of them together in Glasgow this week.

They were looking at how to tackle some of the internet's big challenges.

They include perfecting search engines in order to help firms find out what customers really think of what they are selling.

Until now, market research has often involved asking people directly, in expensive focus groups, or by using annoying tactics such as cold-calling and street surveys.

But the accuracy of these methods is open to question, with some critics doubting whether people are really prepared to say what they think.

Mining the web

Developments in computer technology point to another intriguing way to glean this feedback - mining the web.

Every day, millions of comments are fired across the internet - on Twitter and on other forms of social media. People say exactly what they think. So can these comments be sifted through somehow, to glean a useful insight?

Conference participant Gene Golovchinsky, from FX Palo Alto Research Lab, said: "It's a very powerful means to communicate with your customers, to get your message out and also to respond and to understand what their perceptions are.

"And you see this evolving in an organic way, where people are tweeting about their positive and their negative experiences in a way which is much more natural than sitting in a focus group where the answers are sort of expected."

Beyond market research, mining big data could bring other spin-offs for business, such as help with the fraught challenge of preparing bids and contract tenders.

'Huge value'

David Hawking is the chief scientist for Funnelback, an industry leader in search technology which is looking at this very question.

He explained: "Organisations have to locate expertise: 'Who works on this? Who knows about this stuff?'

"The failure to find a single document could cost millions of pounds, so there's the opportunity to add huge value if the information tools work better."

The exponential rise in computer processing power certainly helps this process.

Ilya Segalovitch, a co-founder of Russia's biggest search engine, Yandex, believes the potential benefits of improving search engine technology are within our grasp.

He said: "We see how this evolution solves hard problems which were unsolvable some time ago.

"Fifteen years ago, the web search was not solved, right? Then machine translation was not solved, and I think now it's pretty much close to working. And voice recognition is getting better - you can actually use it daily."

Revolutions are usually built on ideas. The IT revolution is unlikely to be very different.

So the ideas mulled over at this conference may well have the power to re-shape the business environment, and a lot sooner than most of us realise.

You can hear more about web mining by listening to BBC Radio Scotland's Business Scotland programme at 10:05 on Sunday, and later by free download.
