Research & Development

Posted by Rob Cooper on

A short one this week!

For the last two weeks the Data Team Speech-to-text group have been digging into a new regional pronunciation database. They’ve tested their system on a range of regional accent ground-truths, with thought-provoking results about the differences in accuracy across accents. More work is likely to come from this. They’ve also been continuing their work on the latest update to their speech-to-text system and on making get-audioset an independent tool. There’s also been a fair bit of debugging, patching and even messing about with 30-year-old C code.

The Recommendations team have been busy presenting their thinking on Personalization and Public Service Recommendations, consolidating the results from the Introducing prototype, and evaluating their recommender system’s performance for under-35s (a key demographic for the BBC).

The Natural Language Processing team have been continuing their work on non-news tagging, extracting data from the NewsLabs Slicer database to help their work on segmentation, and processing a trove of subtitles from factual programmes to help train new tagging models. They’ve also been using subtitles to train napkinXC, an interesting new multi-label classifier, and finding the best ways to visualise and evaluate the results.

The Data Science Research Partnership Team have been continuing their legal sign-off efforts around the Extol project, continuing their outreach plans by organising presentations to potential academic partners around interesting BBC strands like Springwatch and BBC Bitesize, and working out some new data storage and distribution workflows.

The Internet and Society team have also had a busy fortnight. David and Tristan are planning on taking their “stock photos for AI” workshop to other teams around the BBC, as we think it helps people think through how they want to communicate their work with AI and ML. We’re also starting to think more about how to communicate the work we’ve done so far on understanding ML, so we have been writing and talking to lots of people. Meanwhile, Libby has been thinking about those times when ML goes wrong, in preparation for Galen joining our team. Welcome Galen!

Libby and David continue to develop their futuring framework and have been introducing the concepts to some contacts. Alicia continues to design and develop the Introducing prototype interface that the Data Team have been working on, while David has been interviewing users and capturing their feedback. Libby has also been preparing for an internal talk on BBC Together, and Alicia’s been writing up her work on her chatbot.

This post is part of the Internet Research and Future Services section