BBC R&D

Posted by Kat Sommers

Room 101 was quiet this week as the summer holidays took hold, but those in the office made progress on a number of projects.

Yves has been deriving statistics and graphs from the World Service archive. There's not much metadata for these audio files, so he's been writing a Python library that extracts keywords from recorded speech. It uses CMU Sphinx as a backend, with a language model based on the Gigaword corpus and the HUB4 acoustic models, and matches the results against a set of DBpedia tags the BBC uses to describe programmes. He is currently running it on part of the World Service archive to evaluate the results.
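To give a feel for the matching step, here's a minimal sketch in Python. It isn't Yves's library — the stopword list, scoring and tag set are all made up — but it shows the general shape of extracting keywords from a transcript and matching them against tag labels:

```python
# Hypothetical sketch: extract keywords from a speech transcript and
# match them against DBpedia-style tag labels. Stopwords, scoring and
# tags are illustrative only, not the actual library's behaviour.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "about"}

def extract_keywords(transcript, top_n=5):
    """Return the most frequent non-stopword terms in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

def match_tags(keywords, tag_labels):
    """Return tags whose label shares a word with the extracted keywords."""
    matches = set()
    for tag in tag_labels:
        if set(tag.lower().split()) & set(keywords):
            matches.add(tag)
    return sorted(matches)

transcript = "the president spoke about the economy and the economy ministers"
tags = ["Economy", "Sport", "President of the United States"]
print(match_tags(extract_keywords(transcript), tags))
```

A real pipeline would of course weight matches by confidence from the speech recogniser rather than raw term frequency, but the evaluation question is the same: how often do the matched tags agree with the tags editors actually assigned.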

He has also been liaising with the BBC Pronunciation Unit, which has a massive database spanning 80 years of proper names (mainly places and people) with their associated pronunciations. He met with MetaBroadcast and is planning our first deliverable within the ABC-IP project - a data availability report.

The News Linking project is now properly transitioning from research phase to prototyping. Our colleagues in News were pleased with the results of the research, and Duncan has been putting together the scaffolding of a prototype by crawling and indexing news sources. He and Chris L then used his clustering algorithm to group "big stories" together, and to see if it's possible to find comment or opinion about news stories.
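The post doesn't describe the clustering algorithm itself, but a toy stand-in makes the idea concrete: group articles whose headlines overlap heavily in vocabulary, so several reports of the same event fall into one "big story". This greedy single-pass approach and the 0.3 threshold are assumptions for illustration:

```python
# Hypothetical sketch of grouping "big stories": cluster headlines whose
# word overlap (Jaccard similarity) with a cluster's seed headline
# exceeds a threshold. Not the project's actual algorithm.
def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def cluster_headlines(headlines, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed headline
    is similar enough, otherwise start a new cluster."""
    clusters = []
    for h in headlines:
        for c in clusters:
            if jaccard(h, c[0]) >= threshold:
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters

headlines = [
    "Election results announced in key states",
    "Key states announce election results",
    "Football cup final ends in draw",
]
print(cluster_headlines(headlines))
```

Once stories are grouped, comment and opinion pieces can be attached to a cluster by the same similarity test, which is roughly the "find comment or opinion about news stories" step described above.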

Chris N has been looking at how to synchronise events delivered in real time with a live audio or video stream, as part of the LIMO work for P2P-Next. He's been investigating what support currently exists in HTML5, and in particular the WebVTT specification. Here's a video of Silvia Pfeiffer explaining WebVTT to an audience at Google.
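One way WebVTT fits this problem is that cue payloads needn't be captions: they can carry timed metadata (JSON, say) that a player fires as events against the stream's timeline. Here's a minimal sketch that emits such a file; the event names and timings are invented for illustration:

```python
# Minimal sketch: generate a WebVTT file whose cues carry JSON metadata
# payloads, so a player can raise timed events against a media stream.
# The events themselves are made up.
import json

def format_ts(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "%02d:%02d:%06.3f" % (h, m, s)

def to_webvtt(events):
    """events: list of (start_seconds, end_seconds, payload_dict)."""
    lines = ["WEBVTT", ""]
    for start, end, payload in events:
        lines.append("%s --> %s" % (format_ts(start), format_ts(end)))
        lines.append(json.dumps(payload))
        lines.append("")
    return "\n".join(lines)

events = [(1.0, 4.5, {"event": "goal", "team": "home"}),
          (65.25, 70.0, {"event": "substitution"})]
print(to_webvtt(events))
```

In the browser, such a track would be attached with a `<track>` element and the payloads read from cue text as the cues become active, though browser support for metadata tracks was still patchy at the time of writing.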

Theo and I presented a concept for the News Companion project that allows us to explore what it means to follow the news across devices and in today's fast-paced, ubiquitous news environment. After interest from colleagues working in News and second screen technology, work begins on Monday and will eventually dovetail with the synced service Chris N is working on.

The RadioTAG prototype is finished, bar a few minor tweaks and some monitoring that needs to be put in place before the audience trial begins in September. We're not interested in how many visits the site gets or how many users it has, but in how people use the service end-to-end: do they tag radio? When? How often? Do they follow up those tags online? Do they continue elsewhere? The usual site traffic stats won't cut it, so Matt and Sean have set up some specific analytics.
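The end-to-end questions above amount to a funnel over per-user event logs rather than page-view counts. Here's a small sketch of that idea in Python; the event names (`tag`, `follow_up`) and log shape are assumptions, not the actual instrumentation:

```python
# Hypothetical sketch of end-to-end analytics: from a time-ordered log
# of (user, action) events, count how many users tagged a programme and
# how many of those later followed the tag up online. Event names are
# made up for illustration.
from collections import defaultdict

def funnel_stats(events):
    """events: iterable of (user_id, action) pairs in time order."""
    actions = defaultdict(list)
    for user, action in events:
        actions[user].append(action)
    tagged = {u for u, a in actions.items() if "tag" in a}
    # A follow-up only counts if it happens at or after the first tag.
    followed = {u for u in tagged
                if "follow_up" in actions[u][actions[u].index("tag"):]}
    return {"tagged": len(tagged), "followed_up": len(followed)}

log = [("alice", "tag"), ("alice", "follow_up"),
       ("bob", "tag"),
       ("carol", "visit")]
print(funnel_stats(log))
```

Extending this to "when" and "how often" is a matter of keeping timestamps with each event and bucketing, which standard traffic analytics typically won't do per user across devices.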

Pete and Joanne began the week reviewing the use cases for FI-Content, an EU project we're involved in. They did a lot of creative work to understand the design problem we need to address, and worked out next steps: a two-day trip to Salford next week to work with Liz, Theo and Vicky on scenarios and to test some lo-fi prototypes.

We have two honourable mentions in this week's notes, as on Friday we were joined by Andrew McParland and Ian Forrester. Andrew was over with Sam Davies to talk to Yves and Chris L about speech recognition and the ABC-IP project, and Ian sat with us while he sorted out aspects of BarCampMediaCity and wrote up a tech paper on a new concept called Perceptive Media (he promises more will be revealed once the paper is finished!). He's up to some very interesting stuff with emerging technologies and trends, but more on that soon...