BBC Genome update: Search, discovery & access

Wednesday 12 October 2011, 18:32

Helen Papadopoulos

Navigating the BBC's Broadcast History

My dad is a physicist, working in quantum field theory, and he introduced me to the work of Richard Feynman at a very early age. Feynman is probably the most famous physicist after Einstein (though younger readers may prefer Brian Cox) and he managed to make some of the deepest mysteries of science accessible to us all.

In 1981 Feynman was featured in a BBC Horizon programme called 'The Pleasure of Finding Things Out', a title that was also used for a printed collection of his essays and interviews. I'm sure my dad will be pleased that my current work with the Genome Project will bring this and many other treasures from the BBC's past to the attention of Feynman fans and others, and give all of us a way of finding out what the BBC has been broadcasting on television and radio, all the way back to 1923.

The Genome Project

Genome will create a complete database of the BBC's broadcast history, giving details of every programme the BBC broadcast - or at least intended to broadcast - on radio or television since the schedule was first published in 1923. Since the information is not available electronically for most of that period, we're taking a brute-force approach and digitising the one complete record we do have: printed copies of the Radio Times.

We started in 2010 with a small-scale pilot and its success convinced us that the project was technically feasible and would be value for money. As I write we've scanned over 4,500 magazines and have imaged more than 360,000 pages, so things are moving well.

Why Are We Doing This?

Genome will provide a database of BBC programming that will be used to support a wide variety of applications and services provided by the BBC and others, but it has a deeper value too.

Most people find pleasure and comfort in recalling programmes they watched on television or listened to on the radio. The BBC's broadcast output and the Radio Times reflect events and society at home and abroad, and Genome is a gateway to that past. This is not just about finding out what was on television on the day you were born; it is an historical record which can enrich our knowledge of the world, helping us to discover our past and influence our future.

How We Did It

The first step was assembling the copies of the magazines themselves. As you'd expect, the BBC holds several full sets of Radio Times, including preservation copies at the BBC Written Archives Centre in Caversham, but these are contained in bound volumes for reference. Rather than disbind them, we tried to acquire as many loose issues of the Radio Times as possible so that they could be scanned easily.

Fortunately, we were able to borrow loose magazines from private collectors including an extensive collection from the 1920s. BBC Worldwide lent us the loose collection they acquired from television historian Wallace Grevatt after his death in 2003.

Extracting the data

Once the scanned images of the magazines had been produced, the mammoth task of capturing the text in the Radio Times could begin.

This is largely being done automatically, but in order for this to be feasible we had to analyse the magazine formats, layouts and channel history over the 88-year period to create rules that could be applied to capture the programme listings in a meaningful way.
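To give a flavour of what a capture rule might look like, here is a minimal Python sketch. It is purely illustrative and is not the project's actual rule set: the assumption that a listing begins with a time such as "7.30", the function name `segment_listings` and the sample lines are all invented for the example.

```python
import re

# Hypothetical rule: a listing starts with a time such as "7.30" or "10.15".
# Real layouts changed from era to era, so a real rule set would be far larger.
TIME_AT_START = re.compile(r"^(\d{1,2})\.(\d{2})\s+(.*)$")


def segment_listings(lines):
    """Group OCR'd text lines into listings, starting a new one at each time."""
    listings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the page layout
        match = TIME_AT_START.match(line)
        if match:
            hour, minute, title = match.groups()
            listings.append(
                {"time": f"{int(hour):02d}:{minute}", "title": title, "text": []}
            )
        elif listings:
            # A line without a leading time is treated as continuation text.
            listings[-1]["text"].append(line)
    return listings


# Invented sample input, not taken from a real issue.
sample = [
    "7.30 Example News",
    "followed by an example forecast",
    "",
    "8.15 Example Concert",
]

for listing in segment_listings(sample):
    print(listing["time"], listing["title"], listing["text"])
```

A single rule like this would break as soon as the page design changed, which is why the formats had to be analysed across the whole period first.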

We devised a schema which would house the various parts of the programme listings such as time, title, synopsis, cast, crew and so on, and doing this revealed just how complex the BBC's channel history is.
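As a rough illustration only, a schema of this kind could be modelled as a simple record type. The sketch below is an assumption, not the Genome schema itself: the class name `ProgrammeListing`, the `service` and `broadcast_date` fields and the example values are hypothetical, while time, title, synopsis, cast and crew come from the list above.

```python
from dataclasses import dataclass, field
from datetime import date, time
from typing import List, Optional


@dataclass
class ProgrammeListing:
    """One programme entry; field names are illustrative, not the real schema."""

    service: str            # hypothetical: the channel or station name
    broadcast_date: date    # hypothetical: the day the listing appeared under
    start_time: time
    title: str
    synopsis: Optional[str] = None
    cast: List[str] = field(default_factory=list)
    crew: List[str] = field(default_factory=list)


# Placeholder values, not a real listing.
example = ProgrammeListing(
    service="Example Service",
    broadcast_date=date(1950, 1, 1),
    start_time=time(19, 30),
    title="Example Programme",
    crew=["Example Producer"],
)

print(example.title, example.start_time.strftime("%H:%M"), example.cast)
```

Making synopsis, cast and crew optional reflects the fact that many listings carry little more than a time and a title.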

Here is a snapshot of the pre-war 'network and nations' radio services and how they merged or were replaced. It shows both the geographical transmitter history over this period and the complexity of the data sources we were dealing with.

Pre-war 'network and nations' radio services logs

We also discovered that the Radio Times itself is complex due to the changing layouts and formats, but it is reasonable that editors in 1923 would not have worried about making life difficult for a team of experts trying to scan the magazines more than 80 years later.

What We've Got

Optical character recognition (OCR) software recognises the text, and semantic rules segment the information uncovered in the magazines. The resulting data is available to us as a collection of XML (eXtensible Markup Language) files. These are not reader-friendly, so we have developed a tool that can read them and present the information in a more accessible format for checking and validation, as well as allowing us to show off the amazing details of the BBC's schedule over the decades.

A snapshot of the optical character recognition (OCR) software.
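For readers curious about what reading such XML files involves, here is a small Python sketch using the standard library. The element names (`listings`, `programme`, `time`, `title`, `synopsis`), the function name `summarise` and the sample content are assumptions made up for the example; the real files and the real tool are not shown in this post.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element names are hypothetical, not the real format.
SAMPLE_XML = """
<listings>
  <programme>
    <time>19:30</time>
    <title>Example Programme</title>
    <synopsis>A placeholder synopsis.</synopsis>
  </programme>
  <programme>
    <time>20:00</time>
    <title>Another Example</title>
  </programme>
</listings>
"""


def summarise(xml_text):
    """Turn listing XML into one readable line per programme."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for programme in root.iter("programme"):
        start = programme.findtext("time", default="??:??")
        title = programme.findtext("title", default="(untitled)")
        synopsis = programme.findtext("synopsis")
        line = f"{start}  {title}"
        if synopsis:
            line += f" - {synopsis}"
        lines.append(line)
    return lines


for line in summarise(SAMPLE_XML):
    print(line)
```

Missing fields fall back to a visible placeholder instead of failing, which suits a checking and validation tool where gaps need to be spotted.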

At the moment we have received XML and searchable PDF files for six decades of the BBC's programming, a total of 2.3 million programme listings, and we expect between 3 and 3.5 million programme listings by final delivery in December. We will then make the data available during 2012.

Helen Papadopoulos is the Project Manager of BBC Genome


    Comment number 1.

    It looks like "damn fine work" to me, looking forward to being able to explore the data.

    The top diagram looks like the sort of thing I really like looking at (along with disused railway line on Google Earth and 1950s ITV company logos) so would it be possible to post a link to a full resolution version?

    Comment number 2.

    Interesting to read the comment on Horizon & Feynman - once we have the Genome next year we will be able to see how far science has dumbed down on the BBC over the years....

    Comment number 3.

    Very pleased to learn this project is still active and is well on the way to completion. Shame some form of content isn't available yet, but beggars can't be choosers.

    Whilst doing some Radio Times research of my own for a pet project a few years back, I discovered pretty chunky gaps in the schedules for local radio (some months in 1991/1992 was one instance I found, but there could be more), where the only information was frequency details. How are you getting round times where there would be a total lack of schedule information in the Radio Times, or are you accepting the fact that a 100% success rate is impossible?

