Big data: Should it come with a big health warning?

[Image: Sneezing man] Estimating who has the flu has shown up some problems with big data projects

Pick a number between 1 and 100.

Got one? Good. Congratulations. Chances are that by plucking that number out of the ether you have done a better job than Google of predicting the percentage increase in the number of flu-like illnesses that will strike Americans over the next few weeks.

That's right. You, armed only with your puny brain, can outdo a multi-billion dollar corporation that employs some of the smartest people in the world.

This example might seem trivial, but many think it matters because of the status of Google Flu Trends (GFT), once seen as the shining example of the power of so-called big data.

The data it uses to make predictions about how many will be sneezing and wheezing a week or so ahead is drawn from search terms, blog entries and messages shared via social media - so-called unstructured data.

This is very different to the slow, structured stream of information gathered from forms filled in at surgeries and hospitals, which was how predictions were made before the rise of big data.

And the problem is, GFT turned out not to be terribly accurate.

In a run of 108 weeks, GFT wrongly predicted the number of flu cases 100 times, revealed a recent study.

Sometimes its estimate was double the number of actual flu cases recorded by US doctors. Hence the reason anyone can do better by plucking a number out of thin air.

Yet this unstructured data humans put online is exactly the type of stuff that companies want to analyse when they kick off their own big data projects.

Many corporations are keen to use those garbled knots of human sentiment to monitor how their brands are faring online, and to tweak their operations accordingly when they spot commercial opportunities or potential PR disasters.

Before now, those giant data sets had been hard to unpick. GFT seemed to suggest that, with the right tools, they could be unlocked to yield all kinds of useful predictions.

Not only that, but those predictions could be uncovered quickly and cheaply.

Dirty data

Why did GFT go so wrong and what implications does this have for other big data projects?

"There's no such thing as clean and stable data," said statistician Kaiser Fung, who has written extensively about the pitfalls that can dog big data projects.

[Image: Close-up of hard drive] There's no such thing as perfectly clean data, argues statistician Kaiser Fung

What he means by "clean and stable" is that it is a mistake to think that the data Google gathers for GFT today is the same as the data it gathered last week, last month or last year.

Google regularly tweaks the algorithms it uses to index online life and, as a result, may be sampling very different things month to month, adding a degree of instability - spots of dirt as it were - to that dataset.

The same is true of any big data set gathered by anyone, he said.

All will be tainted in some way as they will miss out something simply because of the quirks of the underlying code used to parse and index web pages, social media messages and blog posts.

That will be particularly true if companies buy in their data from different sources and then treat it as all one corpus.

"I have never come across a complete data set," he said. "Often times the only reason why people believe their data is clean is because they have never looked at it."

Companies in possession of a huge corpus of data can assume that all the information they need is in it. Sadly, he said, this "N=all" assumption is wrong.

"It is much better to assume that the data has holes and flaws than it is to assume it is complete."

Any company starting a big data project would do better to look at the data they have gathered and clean it up before any analysis starts.
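As a rough illustration of what "looking at the data" might involve, the Python sketch below counts missing values and exact duplicate rows in a small set of records. The function name `audit_records`, the field names and the sample rows are all invented for this example and are not drawn from any project mentioned in the article; a real audit would go much further.

```python
from collections import Counter


def audit_records(records, required_fields):
    """Count missing values per field and exact duplicate records.

    Hypothetical helper for illustration only.
    """
    missing = Counter()
    seen = set()
    duplicates = 0
    for record in records:
        # A field counts as missing if it is absent, None or blank.
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
        # Build a hashable fingerprint of the row to spot exact repeats.
        key = tuple(sorted((k, str(v)) for k, v in record.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
    }


# Invented sample data: one blank region, one missing count, one repeated row.
sample = [
    {"id": 1, "region": "North", "visits": 12},
    {"id": 2, "region": "", "visits": 7},
    {"id": 3, "region": "South", "visits": None},
    {"id": 1, "region": "North", "visits": 12},
]

report = audit_records(sample, ["id", "region", "visits"])
print(report)
```

Even a check this simple starts from the assumption Mr Fung recommends: that the data has holes and flaws until shown otherwise.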

There are other good reasons for scrutinising that mass of information about customers, says Patrick James, a partner in consultancy Ernst and Young's consumer practice.

"There's a customer backlash about to happen," he says. "It's against the big part of big data."

More and more people are getting less and less happy about simply surrendering information and getting nothing in return, he maintains.

Increasingly, consumers and customers will attempt to hold back their data, limit what they share online or simply give the wrong answers when they sign up for a service or are quizzed about their life and habits, he believes.

[Image: Line of people] People are getting more reluctant to share data about who they are and what they are doing

The fact that tens of thousands of people filled in a form to make Google expunge their data from its index is evidence of that growing desire to disappear, says Mr James.

If this trend grows, it could mean data sets get skewed and become less useful for those big projects.

These early days of big data might prove to be its golden age.

"Data has never been cheaper than it has been today and it's only going to get more expensive," says Mr James.

Fast response

So, if data is not the key to a good project, what is?

"Too many big data projects are started by the IT departments in companies that want to play with new technologies like Hadoop," says Dr Laurie Miles, head of analytics at big data specialist SAS.

"That's led to scepticism, because in the history of IT projects a lot of them have been failures."

Instead of the technology coming first, anyone embarking on a big data project needs to know why they are doing it before they sign off on any expenditure by the IT folks, he argues.

[Image: British rowing team] British Rowing has turned to big data to help fine tune coaching of its rowers

"A big data project is not going to deliver any benefit unless you focus on a specific problem."

That focus can stop a project running away with itself and ensure it produces results that impinge on a real business issue, he says.

Spotting fraudulent credit card use requires a very different approach to analysing the performance of elite rowers - SAS is helping with both.

"We analyse credit card data at the point of sale, and you need that quickly," says Dr Miles. "With British Rowing we have a couple of weeks to give them answers."

Knowing how quickly a response is needed can help define the technology required to underpin that big data project.

"Often you do not need to spin up a massive IT infrastructure to make this work," he says. "That's just as well, as real time results are really expensive."

BBC © 2014