Posted by Yves Raimond
Over this past summer we built a prototype that puts the BBC World Service radio archive on the web. The prototype lets you explore and listen to around 70,000 radio programmes covering 60 years of the World Service. Because it is such a large and diverse archive with sparse descriptive data, we have had to categorise and tag all these programmes with machines running speech-to-text, topic extraction and speaker identification algorithms. And now we want people to help us validate and correct this automatically generated data and improve the archive for everyone. Please sign up to the prototype and let us know what you think.
We previously wrote about our work on automated tagging of large archives, done within our ABC-IP project. Since then, we have been deriving more and more data automatically: topic tags, segmentations and speaker identifications. However, automated tools will never be perfect, especially for something as subjective as tags. The World Service Archive prototype aims to test a new approach to publishing large archives online. First, automated processes are used to annotate the archive with tags, bootstrapping search and navigation for users. Then user feedback on these tags makes them better, improving search and navigation and also feeding back to improve our automated tools.
This approach is significantly different from the way BBC archives are currently published online, which focuses on segments of the archive around particular brands (e.g. Desert Island Discs or, more recently, Letters from America) or particular topics (e.g. World War II), manually annotating that segment and building segment-specific navigation from those annotations. However, there are a number of questions we need to answer when testing our novel approach of combining automated metadata with crowdsourcing techniques. Is it acceptable to publish an archive where the metadata hasn't been comprehensively checked? What are the minimal features required to make such an archive proposition work? Is variable-quality metadata acceptable to users? Does user feedback actually lead to increased accuracy? And what are the best mechanisms for engaging our users in helping us improve that data?
Features of the prototype
After signing in, users are redirected to the homepage of the prototype. This page contains a set of manually curated programmes from the archive, a list of programmes recently listened to, and links to aggregations of topical content in the archive, generated from "on this day" and "in the news" information from Wikipedia.
On individual programme pages, users are presented with data coming directly from the World Service archive database where it is present (e.g. synopsis, title, duration, broadcast date), an image, and a set of tags. Each tag can come from one of three sources: it can be derived automatically from text associated with the programme in the World Service archive database, derived automatically from the audio content itself using the framework described previously on this blog, or contributed directly by users.

When logged in, users can upvote or downvote each individual tag. They can also add new tags through an auto-completed list, using Wikipedia and DBpedia as target vocabularies. When generating the page, the aggregate of all these edits, along with the initial weights assigned to each tag by our automated tools, is used to rank the tags. Only tags that have been upvoted more than they have been downvoted are pushed to the search engine.

An image is also automatically assigned to each programme, using images associated with its top tags from Ookaboo. Users can manually override this image with images associated with other tags, which gives us implicit information about which tag describes the programme best. From those programmes, users can navigate to aggregations of programmes around particular topics, and on to other programmes.
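One way to picture how votes and automated weights might combine is the small sketch below. The `Tag` structure, the additive score formula and the example labels are our own illustrative assumptions, not the prototype's actual implementation; the one behaviour taken from the text is that only tags with more upvotes than downvotes reach the search engine.

```python
from dataclasses import dataclass

@dataclass
class Tag:
    label: str              # e.g. a Wikipedia/DBpedia concept label
    initial_weight: float   # confidence assigned by the automated tagger
    upvotes: int = 0
    downvotes: int = 0

    @property
    def net_votes(self) -> int:
        return self.upvotes - self.downvotes

    def score(self) -> float:
        # Hypothetical ranking: combine the automated weight with user votes.
        return self.initial_weight + self.net_votes

def tags_for_display(tags):
    """Rank all tags for the programme page, best first."""
    return sorted(tags, key=lambda t: t.score(), reverse=True)

def tags_for_search(tags):
    """Only tags upvoted more than they were downvoted reach the search index."""
    return [t for t in tags if t.net_votes > 0]
```

A heavily upvoted tag can thus overtake one the automated tools were initially more confident about, which is exactly the corrective behaviour the prototype is after.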
Users can also search through the archive using the search box at the top. This search indexes all textual content associated with programmes as well as tags that have emerged as 'good' tags from aggregated user interactions. Facets are displayed on the left-hand side to refine the results by e.g. year of broadcast.
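The facet behaviour can be sketched in a few lines: count the values of a field (such as year of broadcast) across the current result set, then narrow the results when the user picks one. The result records and field names here are illustrative assumptions, not the prototype's search schema.

```python
from collections import Counter

# Hypothetical search hits: each carries the fields we can facet on.
results = [
    {"title": "From Our Own Correspondent", "year": 1985},
    {"title": "Outlook", "year": 1985},
    {"title": "The World Today", "year": 1992},
]

def facet_counts(hits, field):
    """Count each value of a field across the result set,
    e.g. how many hits fall in each year of broadcast."""
    return Counter(hit[field] for hit in hits)

def refine(hits, field, value):
    """Narrow the result set to hits matching the chosen facet value."""
    return [hit for hit in hits if hit[field] == value]
```

The counts drive the facet list in the left-hand sidebar, and `refine` is what clicking a facet value amounts to.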
For most From Our Own Correspondent episodes, we have also generated automated speaker segmentations, so you can see the different voices in an episode and jump between segments in the programme (we'll write more about this in another blog post). Our tools can identify the distinct voices in a programme, but cannot identify who the speakers are. We therefore let users name those speakers, and the name picked by the most users is displayed to everyone. Our tools also enable us to recognise speakers across programmes, so when clicking on a speaker's name, users are directed to an aggregation of all programmes featuring that speaker. These user-contributed speaker names are automatically propagated to all the other programmes featuring the same voice, where we ask users to manually approve or correct them. We can then use the resulting data to evaluate our speaker identification algorithm. We are in the process of improving this interface and will at some point roll this feature out to other programmes.
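The two mechanisms above can be sketched as follows: a majority vote picks the display name for a voice, and any name attached to a voice cluster is suggested in every other programme where that cluster appears. The data structures, programme identifiers and names here are made up for illustration; only the "most-picked name wins, then propagate for approval" behaviour comes from the text.

```python
from collections import Counter

def display_name(suggestions):
    """Show everyone the name that the most users picked for a voice."""
    if not suggestions:
        return None
    return Counter(suggestions).most_common(1)[0][0]

# Hypothetical mapping from programme to the voice clusters our
# segmentation found in it; the same cluster id means the same voice.
programme_voices = {
    "FOOC episode A": ["voice-17", "voice-23"],
    "FOOC episode B": ["voice-17"],
}

def propagate(named_voices, programme_voices):
    """Suggest user-contributed names in every programme featuring
    the same voice, pending manual approval or correction."""
    return {
        programme: [named_voices.get(v) for v in voices]
        for programme, voices in programme_voices.items()
    }
```

Comparing the propagated suggestions against users' approvals and corrections is what lets us score the cross-programme speaker identification.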
As we said at the beginning, the aim of this prototype is to test a new approach to publishing large archives with sparse or incorrect metadata. Our guiding principle has been to use algorithms and people, feeding off each other to make this metadata better. We use automated techniques to create the initial metadata and bootstrap a prototype, users to correct and improve this data, and then feed this information back to the algorithms to make them better. Hopefully this creates a useful feedback cycle that results in a better and better archive experience.
We are still in the process of gathering more and more data for this archive, and are aiming to use that data to improve the prototype, both by improving the overall quality of the archive metadata and by understanding user needs and behaviours a bit more. The questions we asked in the introduction of this post still need to be answered, but we feel this prototype and the community we are trying to grow around it give us a good mechanism to try and answer them.
We will soon publish two other posts about the World Service archive prototype: one on development and one on user experience.