What we're doing

The BBC absorbs and produces a large amount of news and other textual material in many different languages. To help journalists and other staff make sense of all this information, BBC R&D is working on a number of tools that use state-of-the-art Natural Language Processing techniques to extract semantic data, classify content and analyse sentiment. We are also exploring ways to extract higher-level structured information, such as quotes and statistics, from documents. When used in conjunction with subtitle streams or speech-to-text systems, the tools can also be applied to TV and radio.
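As a rough illustration of what extracting one kind of higher-level structure can look like, the toy Python sketch below pulls direct quotes and their speakers out of plain text with a simple pattern. It is not the BBC's implementation, and real newsroom text needs far more robust handling (indirect quotes, varied attribution verbs, nested quotation marks); the pattern and example sentence are purely hypothetical.

```python
import re

# Match: "quoted text" said/says CapitalisedName [CapitalisedName ...]
# A deliberately simple pattern for illustration only.
QUOTE = re.compile(r'"([^"]+)"\s*,?\s*(?:said|says)\s+([A-Z]\w+(?:\s+[A-Z]\w+)*)')

def extract_quotes(text):
    """Return (quote, speaker) pairs for quotes followed by 'said <Name>'."""
    return [(m.group(1), m.group(2)) for m in QUOTE.finditer(text)]

print(extract_quotes('"We are on track," said Jane Smith of the launch.'))
# → [('We are on track,', 'Jane Smith')]
```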

How it works

We use semantic data from Wikipedia, DBpedia and Wikidata to build indexes that allow entities such as people, places and organisations to be identified in unstructured text. These indexes target specific languages and include interlanguage links. The raw results are then ranked and disambiguated using statistical techniques such as vector-space word models.
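The pipeline above can be sketched in miniature: a surface-form index (of the kind one might build from Wikipedia or Wikidata labels) maps mention strings to candidate entities, and ambiguous mentions are resolved by comparing the surrounding sentence with each candidate's description in a simple bag-of-words vector space. This is a minimal sketch under stated assumptions, not the production system; the index entries and entity IDs here are hypothetical.

```python
from collections import Counter
import math

# Hypothetical surface-form index: mention string -> candidate entities,
# each paired with a short description used for disambiguation.
INDEX = {
    "paris": [
        ("Paris_France", "capital city of France on the Seine"),
        ("Paris_Texas", "city in Lamar County Texas United States"),
    ],
    "seine": [("Seine", "river flowing through Paris France")],
}

def bow(text):
    """Bag-of-words vector: a Counter of lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_entities(text):
    """Find known surface forms and keep the candidate whose description
    is most similar to the sentence context."""
    context = bow(text)
    results = []
    for token in text.lower().replace(",", " ").split():
        candidates = INDEX.get(token, [])
        if candidates:
            best = max(candidates, key=lambda c: cosine(context, bow(c[1])))
            results.append((token, best[0]))
    return results

print(link_entities("They sailed down the Seine through Paris"))
# → [('seine', 'Seine'), ('paris', 'Paris_France')]
```

Here "Paris" is resolved to the French city rather than the Texan one because its description shares context words ("the", "Seine") with the sentence; a real system would use dense word vectors and entity popularity priors rather than raw token overlap.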



BBC R&D - AI Production

BBC R&D - Data Science Research Partnership

BBC R&D - The Business of Bots

BBC R&D - Editorial Algorithms

BBC R&D - Using Algorithms to Understand Content

BBC R&D - Artificial Intelligence In Broadcasting


BBC R&D - Content Analysis Toolkit

BBC News Labs - Bots

BBC Academy - What does AI mean for the BBC?

BBC Academy - AI at the BBC: Hi-tech hopes and sci-fi fears

BBC R&D - Natural Language Processing


BBC iWonder - 15 Key Moments in the Story of Artificial Intelligence

Wikipedia - Artificial Intelligence

BBC News - Artificial Intelligence

BBC News - Intelligent Machines

This project is part of the Internet Research and Future Services section

This project is part of the Content Analysis Toolkit work stream

