
Introducing ADA: Automated Data Architecture. A new way of navigating the BBC’s programme archive

Jo Kent

Business Analyst - BBC Radio & Music Multiplatform

The BBC has a wealth of permanently available programmes across a wide range of subjects, but they can sometimes prove hard to find unless you already know the programme title. Jo Kent, Business Analyst in BBC Radio & Music Multiplatform, talks about a new project designed to find a way of pointing people to similar programmes, so they can discover some of this hidden content.

Traditional attempts to bring associated programmes to the attention of viewers would involve vast amounts of editorial time and effort in hand-picking content to accompany individual episodes or collections. While production teams may well know every programme from a long-running show like In Our Time, knowing every related programme the wider BBC has ever made is another matter entirely. And these episodes or collections often have a short life, typically around the time the episode or series is on air. There’s a clear audience benefit, but we’re talking about a huge editorial undertaking that in many cases simply can’t take all programmes into account, and that may go stale in a relatively short amount of time.

Crowdsourcing has been tried with mixed results, one problem being that people will only look at the things they are interested in, giving uneven coverage. They also all have different opinions on which category things should be placed under, and even which categories there should be in the first place. We don’t have the time or the people to be able to manually overcome this shortfall any more than we do to curate the content ourselves.

A totally automated approach also has its downsides; while it’s easy to make connections between things of the same type, we wanted to supply fresh and interesting choices. We’ve all seen some bizarre and unhelpful recommendations which rely solely on an algorithm, or been offered the thing we’ve just bought by a poorly configured system. Where an algorithm is based on what other people have watched or listened to it tends to become a self-fulfilling prophecy, just leading people to the same content all the time, which doesn’t help us to open up the archive.

BBC Sport has long used linked data to specify the content of articles and audiovisual content, which is then allocated to aggregations according to the sport ontology. This ontology requires a hierarchical and easily mapped domain, which is not something that applies to our programmes: they cover an enormous range of subjects (including the Stanford marshmallow experiment, Rommel, lemongrass, the Boston Tea Party, celibacy, Gurkhas and filibustering) which don’t fit easily together.

Creating a hierarchy to include everything would mean having to create a worldview to describe the universe and everything in it, and to keep it up to date with changes. In doing so, and attempting to then shoehorn all of the content into the existing categories, the concepts applied would cease to have semantic meaning and would become merely organisational tools, meaning a shift in worldview leaves us needing to re-categorise everything. This would be too complicated and time-consuming to be practical and we didn’t feel the effort would be justified by the end result. What was needed was a pre-existing classification. All of the existing ontologies we looked at were either only partial or incompletely mapped, which would give an unsatisfactory user journey.

The Wikipedia categories, however, were promising and we quickly built a prototype around one brand. We chose In Our Time for this because it covers a wide range of subjects from the physical to the temporal and intangible. This prototype examined the ‘things’ tag applied to the programme and used the DBpedia link to populate a SPARQL query to extract all of the categories attached to it, then filtered out some of the maintenance categories (such as Pages with reference errors) and connected up the programmes where they had a matching category.
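To make those steps a little more concrete, here is a minimal sketch in Python of how they might fit together. It is not the code we run: the function names, the list of maintenance prefixes and the sample episodes are all invented for illustration, and it assumes that DBpedia exposes a page's Wikipedia categories through the dct:subject property. It builds the query text but does not send it to any endpoint.

```python
from collections import defaultdict

# Assumption: DBpedia links a resource to its Wikipedia categories
# through the dct:subject property.
CATEGORY_QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?category WHERE {{
  <{resource}> dct:subject ?category .
}}
"""

# Hypothetical prefixes for maintenance categories. Only "Pages with"
# (as in "Pages with reference errors") comes from the text above.
MAINTENANCE_PREFIXES = ("Pages with", "Articles with", "All articles")


def build_category_query(resource_uri):
    """Fill the SPARQL template with the DBpedia URI from a programme's tag."""
    return CATEGORY_QUERY.format(resource=resource_uri).strip()


def is_maintenance(category):
    """Return True for housekeeping categories that say nothing about the subject."""
    return category.startswith(MAINTENANCE_PREFIXES)


def group_by_category(programme_categories):
    """Connect programmes that share a category.

    Takes a mapping of programme -> list of category labels and returns
    category -> sorted list of programmes, keeping only categories that
    link at least two programmes.
    """
    groups = defaultdict(set)
    for programme, categories in programme_categories.items():
        for category in categories:
            if not is_maintenance(category):
                groups[category].add(programme)
    return {c: sorted(p) for c, p in groups.items() if len(p) > 1}


# Invented sample data standing in for the results of the queries.
sample = {
    "Episode A": ["Social philosophy", "Pages with reference errors"],
    "Episode B": ["Social philosophy", "Victorian era"],
    "Episode C": ["Victorian era", "Pages with reference errors"],
}

query = build_category_query("http://dbpedia.org/resource/Altruism")
groups = group_by_category(sample)
```

In this sketch the maintenance category is shared by two episodes but never becomes a heading, while Social philosophy and Victorian era each connect a pair of programmes.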

It seemed to work well, with programmes grouped under interesting headings such as Social philosophy, Victorian era and Philosophers of language, so we decided to put it in front of an audience to see how they reacted, because if they didn’t like it the project was a non-starter.

It went live last spring and was promoted on BBC Taster, where it got a 4.15-star rating, which was pretty impressive given that the Taster audience is not necessarily the same as the In Our Time audience. We also asked for feedback through a form in the beta. This yielded some much more interesting feedback, also broadly positive, with many people making suggestions for how we could improve it.

We took as many of these into account as we could in the final build, which will be launched over the coming weeks, with a handful of archive programmes at first, including In Our Time of course, and will continue to expand over the coming months.

For programmes with the new archive features, like this one on Altruism, you should see a list of topics below the description of the programme, and up to three recommended programmes with a link to the topic which connects them. By following these links you can either go to a related programme or to the aggregation page, which will show you all of the programmes under that brand on the same topic. There are also aggregation pages, such as this one, which are not restricted to a single brand and will allow you to see all of the available content. We hope you enjoy exploring the archive.