#BBCDataDay: Sharing the scoops, skills and surprises of data journalism

is an editor of the BBC Academy blog

Trinity Mirror data journalism chief David Ottewell has a yardstick for embarking on a substantial piece of data crunching: “Can we get a front-page splash out of this?”

He should know. He and his small specialist regional team have produced plenty. Their popular interactive guide to the performance of every GP surgery in the UK - drawn from the GP Patient Survey - is one of the most recent.

Ottewell was one of the ‘inspirers’ at the latest BBC Academy ‘Data Day’ - designed to help reporters break stories and hold the powerful to account. Others were Mark Baynes, indefatigable creator of LoveWapping.org, and Bella Hurrell of BBC News’s visual journalism team.

The Mirror man’s argument that genuine, realistic investigation - not “I’ll do this because I can” wonkishness - should drive any data journalism project was one theme of the day, which was attended by regional journalists from a range of organisations at the BBC’s offices in Birmingham’s Mailbox.

He had other tips, such as being “anal” about the format in which you want to receive Freedom of Information (FOI) responses: “spreadsheets, not impenetrable PDFs”. He also recommended setting up email alerts, filtered for particular locations, from websites like Stats.gov.uk and TheyWorkForYou. These, he said, had been the source of literally hundreds of regional and national stories, from numbers of missed NHS appointments to low rape conviction rates in Wales.

When it came to visualising data, charts, maps and graphics had to tell the story at a glance and be visually arresting. One example was his team’s mansion tax-related map of £2m houses sold in the past five years, contrasting London’s Hyde Park with Newcastle.

Bella Hurrell took up the personal and local relevance of data, citing the #nhswinter weekly tracker on the BBC News website, back for its second outing in 2014-15. Users enter their postcode to get the latest information on their local hospital’s performance on A&E waiting times, ambulance stacking, cancelled operations, rates of bed blocking and so on, and to see how it compares with the national average.

“People are interested in stories about themselves. National averages tell one story but not how it affects me,” she said. So two datasets were brought together to do the trick. Next up will be a BBC News online calculator on how much you pay for social care in your area.
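For readers curious about the mechanics, here is a minimal Python sketch of that kind of join. It assumes a hypothetical lookup from postcode district to hospital trust and a hypothetical table of A&E scores per trust. The post does not describe the real tracker's data sources or method. The "national average" here is a simple unweighted mean of the trusts, not an official figure.

```python
from statistics import mean


def local_vs_national(postcode, postcode_to_trust, trust_scores):
    """Look up a reader's local trust and compare its score with the average of all trusts.

    postcode_to_trust: hypothetical dataset one, mapping postcode district -> trust name
    trust_scores: hypothetical dataset two, mapping trust name -> performance score
    """
    # Assumption: matching on the postcode district (the part before the space) is enough.
    district = postcode.strip().upper().split()[0]  # "b1 1aa" -> "B1"
    trust = postcode_to_trust.get(district)
    if trust is None:
        raise KeyError(f"No trust mapped for postcode district {district!r}")
    local = trust_scores[trust]
    # Simplification: an unweighted mean, not weighted by the number of A&E attendances.
    national = mean(trust_scores.values())
    return trust, local, national, local - national


# Usage with invented placeholder values (not real NHS data):
# local_vs_national("b1 1aa", {"B1": "Trust A", "M1": "Trust B"},
#                   {"Trust A": 90.0, "Trust B": 80.0})
# -> ('Trust A', 90.0, 85.0, 5.0)
```

A real tracker would probably match full postcodes and use the published national figure. Keeping the two datasets as separate inputs shows that the "story about me" comes from the join, not from either dataset alone.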

The BBC News website's tracking of a month of deadly jihadist attacks

Tip two from Hurrell: collaborate. One example was the tracking of a month of deadly jihadist attacks across the world, which drew on input from BBC Monitoring, King’s College and other external expert groups to monitor and classify more than 5,000 deaths. The number of deaths attributable to Boko Haram in November was the story it exposed, she said.

On a day that started with an opener from David Holdsworth, controller of BBC English Regions, who bewailed the fact that “this is still an industry where innumeracy is almost celebrated”, the consensus from #BBCDataDay participants was overwhelmingly that “all journalists need to know this stuff”.

Stuff like ‘scraping’ (automating the repetitive extraction of data) was wrestled manfully for the regional delegates by online journalist, blogger and academic Paul Bradshaw and data innovator Mark Barrett.
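As a rough illustration of what scraping involves, the Python sketch below uses only the standard library's `html.parser` to pull the text out of every table row on a set of pages. Fetching the pages (for example with `urllib.request`) is left out, so the pages are passed in as HTML strings. The class and function names are made up for this example, and real sites often need more specific parsing.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, if inside a <tr>
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_tables(pages):
    """Run the same extraction over many pages: the repetitive part scraping automates."""
    rows = []
    for html in pages:
        parser = TableScraper()
        parser.feed(html)
        parser.close()
        rows.extend(parser.rows)
    return rows
```

The point is the loop in `scrape_tables`. Once the extraction works for one page, the same code can run over hundreds of pages without anyone copying and pasting.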

My favourite example from Bradshaw was Patrick Scott’s scraping of song sheet companies’ data to discover which singer has the best range in the UK (it’s not who you think). Another useful Bradshaw takeaway: always ask for a “data dictionary” (a description of the kind of data an organisation holds) before you submit an FOI request: “Then you’ll know what you can ask for, and it might throw up new stories to pursue.”

There was a fairly stretching demo by the Open University’s Tony Hirst of a tool called OpenRefine. He used it to explore, filter, clean up and generally wrangle a lorryload of payment data from Birmingham City Council into usable form. Because “real data is often dirty and messy”.
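OpenRefine itself is a point-and-click tool. One of its cleaning tricks, grouping near-identical spellings of the same supplier name, can be sketched in a few lines of Python. The `fingerprint` function below is loosely modelled on OpenRefine's "fingerprint" key-collision clustering: it lowercases, strips accents and punctuation, then sorts the unique words. It is an approximation written for this post, not OpenRefine's own code, and the supplier names in the usage line are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a messy text value to a key, in the spirit of OpenRefine's fingerprint method."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")  # drop accents, e.g. "café" -> "cafe"
    value = re.sub(r"[^\w\s]", "", value)                     # remove punctuation
    tokens = sorted(set(value.split()))                       # word order and repeats no longer matter
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Usage with invented names:
# cluster(["Acme Ltd", "ACME LTD.", "Ltd Acme", "Other Co"])
# -> {'acme ltd': ['Acme Ltd', 'ACME LTD.', 'Ltd Acme']}
```

As in OpenRefine, the clusters are suggestions. A human should still check that each group really refers to the same supplier before merging totals.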

More advanced delegates kept pace with Hirst’s heroics, but as he freely admitted: “When I come across a new tool, I play with it to death… Learn the technique, start to live the technique!” So that’s what it takes…

Google’s Stephen Rosenthal had a more populist pitch, with a handy run-through of Google search features including Google Trends. He used it to map spikes of online interest in the ice-bucket challenge craze against interest in ALS (amyotrophic lateral sclerosis), the degenerative disease the challenge set out to raise awareness of. Counter-intuitively, the spike in search interest was higher for the disease than for the stunt.

The same trend comparison between ‘Movember’ and prostate cancer, which the moustache-growing effort is meant to support, showed the reverse: the good cause was losing out. Not a bad couple of stories, if not quite up to Ottewell’s front-page kitemark.

The creative challenge of making visualised data as eye-catching on mobiles as on any other screen was, by general agreement, a work in progress and a recurrent theme. More than one speaker questioned whether data-driven stories that didn’t work for mobile users were worth doing at all.

More data journalism tips and insights in these #BBCDataDay presentations:

Paul Bradshaw, Online Journalism Blog

David Ottewell, Trinity Mirror

Tony Hirst, Open University

Mark Baynes, LoveWapping.org

Data journalism for beginners

Data journalism: What’s new, what’s not and work in progress

Data Journalism Day: Holding power to account

Our data journalism section