BBC finishes Radio Times archive digitisation effort
The BBC has completed its effort to digitise programme listings from old copies of the Radio Times magazine.
The BBC Genome project is designed to help the organisation identify shows missing from its archive.
Most early output was not recorded and many later tapes were destroyed.
The listings will be used to create an online database giving the public access, where possible, to old broadcasts - or to available photos, scripts and other materials for missing shows.
The scheme was given its name because the corporation likens each of its programmes to "tiny pieces of BBC DNA" that will form a "data spine" once reassembled.
The project has involved scanning in the pages of about 4,500 copies of the Radio Times. They date from its first issue in 1923 to 2009. For later dates, records generated by the iPlayer catch-up service are used.
The BBC archive development team has identified about five million programme records involving 8.5 million contributors.
That compares with roughly 1.5 million shows listed in the current archive database, although the numbers are not directly comparable because the listings include repeats.
The data must also be treated with care as the magazines only reveal what the BBC planned to broadcast and not late changes to the schedules.
Information is also missing for the first nine months of broadcasting, before the magazine was launched. Other records will be used at a later point to fill this gap.
The researchers hope the project will lead to shows being recovered if the public realises they have audio or video recordings of missing programmes.
"Clearly not all the material will exist out there anyway just because lots of the programmes in the early days weren't even recorded - they were just broadcast live," said project manager Helen Papadopoulos.
"Lots of things were also recycled or disposed of.
"Part of it is to recover some of the lost programmes but it's really about having a comprehensive history of the BBC and its schedules."
Part of the digitisation effort was outsourced to a French team that scanned in the magazines' pages and then used optical character recognition (OCR) software to extract the information.
It used specially designed software to make sense of the Radio Times's changing layouts so that the information could be presented in a uniform fashion in a database. There it could be checked and validated by Ms Papadopoulos and a small team of workers dedicated to the project.
The work was originally due for completion by August 2011, but proved more complicated than envisaged because the team had not accounted for issues raised by listings showing stations broadcasting different material at the same time. Examples included BBC Radio 4 splitting its schedule to put the news on its FM frequency and the cricket on long wave.
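To illustrate the kind of issue described above, here is a minimal Python sketch, not the project's actual software, of how a listings record could carry a waveband so that two programmes sharing a station and start time are kept apart rather than one overwriting the other. The `Listing` class, its `variant` field, the `find_split_slots` helper, and the sample date and programme titles are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Listing:
    """One programme entry (hypothetical schema, not the Genome database's)."""
    station: str
    start: datetime
    title: str
    variant: str = ""  # assumed field: waveband such as "FM" or "LW"


def find_split_slots(listings):
    """Return slots where one station carries different wavebands at once."""
    by_slot = defaultdict(list)
    for item in listings:
        by_slot[(item.station, item.start)].append(item)
    # A slot counts as split when its entries name more than one variant.
    return {
        slot: items
        for slot, items in by_slot.items()
        if len({entry.variant for entry in items}) > 1
    }


# Made-up sample data: the date and titles are placeholders.
listings = [
    Listing("BBC Radio 4", datetime(1990, 6, 1, 11, 0), "News", "FM"),
    Listing("BBC Radio 4", datetime(1990, 6, 1, 11, 0), "Cricket", "LW"),
    Listing("BBC Radio 4", datetime(1990, 6, 1, 13, 0), "Afternoon programme"),
]

splits = find_split_slots(listings)
for (station, start), items in splits.items():
    print(station, start.isoformat(), [(i.variant, i.title) for i in items])
```

Keying each record on station and start time alone would make the two 11:00 entries collide. Adding the waveband to the record is one simple way to keep both, at the cost of an extra field that is empty for most listings.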
Other one-off issues also had to be checked.
"One of the last few files that we checked showed that there was a whole day of listings missing - the date was Tuesday 28 January 1936," said Ms Papadopoulos.
"It was the King's funeral and so we kept thinking our suppliers had somehow missed all the listings or that there was a page missing from the Radio Times for that issue.
"But when we went back and looked at the magazine it said something to the effect that 'The King's funeral arrangements would be announced over the microphone'."
The BBC Genome database will initially be restricted to the corporation's staff, but the project team said if all goes well it could be accessible to the public online by the end of 2013.
It will then feed into another scheme called Project Barcelona, which plans to offer BBC archive content via an online shop.
The BBC Trust has still to decide whether to allow it to go ahead.
Other broadcasters may be concerned about the disruptive effect that providing so much content online would have on the market.