Prototyping Weeknotes #96
This week's featured project is ABC-IP, a two-year collaborative research project, part-funded by the Technology Strategy Board, examining ways to automatically link together different sources of metadata around large video and audio collections.
One particularly interesting dataset we have access to is the entire World Service audio archive. This spans several decades and consists of about 26,000 hours of audio content, roughly three years of continuous listening.
The metadata available is quite patchy: a number of programmes claim to have been first broadcast in the future (e.g. 2099) or before the BBC existed (e.g. 1900). However, the audio itself is high quality, and the content is the usual excellent World Service mix of education, information and entertainment.
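As a rough illustration of the kind of sanity check that catches these dates, here is a minimal Python sketch. It is not the code we run: the record layout, the `is_plausible_broadcast_date` helper and the 1922 cut-off (the year the BBC was founded) are all assumptions made for the example, and the reference date is fixed only so that the result is reproducible.

```python
from datetime import date

# Assumption: nothing can have been broadcast before the BBC was founded in 1922.
EARLIEST_PLAUSIBLE = date(1922, 1, 1)


def is_plausible_broadcast_date(first_broadcast, today=None):
    """Return True if the date lies between the BBC's founding and today."""
    today = today or date.today()
    return EARLIEST_PLAUSIBLE <= first_broadcast <= today


# Hypothetical records, for illustration only.
programmes = [
    {"id": "p1", "first_broadcast": date(1985, 3, 14)},
    {"id": "p2", "first_broadcast": date(2099, 1, 1)},
    {"id": "p3", "first_broadcast": date(1900, 1, 1)},
]

# Fixed reference date so the example gives the same answer whenever it is run.
reference = date(2012, 1, 1)
suspect = [
    p["id"]
    for p in programmes
    if not is_plausible_broadcast_date(p["first_broadcast"], today=reference)
]
print(suspect)  # ['p2', 'p3']
```

A check like this only flags records for review; it cannot say what the correct date should have been.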
We have been working to make this archive searchable, and to link it with other datasets inside and outside the BBC, by analysing the content of the programmes and automatically classifying them with DBpedia URIs.
First, we developed an algorithm that allows us to do that with reasonable accuracy. We're working on releasing a Python implementation of it on GitHub, which will be described in further detail on this blog.
The next stage was to apply this algorithm to the whole World Service archive. We developed an API to manage and distribute the processing across a large number of Amazon EC2 instances, and successfully used it to automatically tag around 27,000 programmes in about a week, for a predictable cost.
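To give a flavour of the pattern, here is a minimal, in-memory Python sketch of a job queue that workers claim programmes from and report tags back to. It is an illustration under stated assumptions, not our API: the `JobQueue` class, the `tag_programme` placeholder, the worker names and the programme identifiers are all hypothetical, and a real deployment would sit behind a network service rather than in a single process.

```python
from collections import deque


class JobQueue:
    """In-memory stand-in for a job-distribution service (hypothetical)."""

    def __init__(self, programme_ids):
        self.pending = deque(programme_ids)  # not yet handed to a worker
        self.in_progress = {}                # programme id -> worker id
        self.done = {}                       # programme id -> list of tags

    def claim(self, worker_id):
        """Hand the next pending programme to a worker, or None if empty."""
        if not self.pending:
            return None
        programme_id = self.pending.popleft()
        self.in_progress[programme_id] = worker_id
        return programme_id

    def complete(self, programme_id, tags):
        """Record a worker's result and mark the programme as finished."""
        del self.in_progress[programme_id]
        self.done[programme_id] = tags

    def requeue(self, programme_id):
        """Return a programme to the queue, e.g. if its worker disappeared."""
        del self.in_progress[programme_id]
        self.pending.append(programme_id)


def tag_programme(programme_id):
    """Placeholder for the tagging algorithm; returns no real tags."""
    return []


queue = JobQueue(["prog-%d" % i for i in range(1, 6)])
workers = ["worker-a", "worker-b"]

# Simulate the workers taking turns until the queue is drained.
while queue.pending:
    for worker in workers:
        programme_id = queue.claim(worker)
        if programme_id is None:
            break
        queue.complete(programme_id, tag_programme(programme_id))

print(len(queue.done))  # 5
```

Because every worker pulls its next job only when it is free, adding more instances shortens the run without changing the total amount of work, which is one way the cost can stay predictable.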
Meanwhile, we've started to investigate how to present very large, partially described archives, looking at the design challenges of tag-based navigation and retrieval. We're very interested in how we can engage the audience to improve this data.
In other project news:
The diary study has started and whilst we're collecting data, Joanne and Penny are planning a questionnaire and materials for the lab study. Andrew's been building a dashboard showing the users' programme data using Knockout and d3.
Chris N. has been working with Barbara to discuss and define technical enablers for the project, and to start thinking about the scope of the prototype we'll need to demo at the end of the first phase. This week has been filled with project deadlines, including a deliverable about large-scale testing of use cases. Preparations are underway to welcome all the project partners to London for the next pan-European project meeting, which we're hosting. Akua has been sorting out the important logistical work of getting 30 people here and working out where to put them.
EBU Radio Week
Chris L., Dan, George, Libby and Sean were in Geneva for the EBU Radio Week Summit and RadioHack event. George gave an overview of our recent work on a RadioTAG trial to the assembled Radio Summit delegates (think business attire) and Chris L. gave a talk about RadioTAG aimed at developers, while everyone saw a lot of interesting presentations. Of particular interest were the developments in the RadioEPG specification. It brings programme information to radios, allowing people to listen to their favourite stations even if they have to switch between IP and broadcast streams.
Dan and Chris also enjoyed seeing how open-source software is being used to make community radio affordable in Denmark and France. The open-source DAB transmission stack allows Kanal Plus in Copenhagen to broadcast on behalf of 43 local stations for a hardware outlay of about €500 each.
W3C Audio Working Group
Olivier spent a chunk of the week on W3C Audio Working Group business, mainly editing the Use Cases and Requirements document and then matching up the use cases with the requirements.
Finally, here's a round-up of interesting links from the team.