Natural Language Processing and Machine Learning techniques can be used to spot connections, improve understanding and avoid information overload.
Project from - present
What we're doing
The BBC absorbs and produces a large amount of news and other textual material in many different languages. To help journalists and other staff make sense of all this information, BBC R&D is working on a number of tools that use state-of-the-art Natural Language Processing techniques to extract semantic data, classify content and analyse sentiment. We are also exploring ways to extract higher-level, structured information, such as quotes and statistics, from documents. When used in conjunction with subtitle streams or speech-to-text systems, the tools can also be applied to TV and radio.
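As a rough illustration of the structured-information extraction mentioned above, the sketch below pulls direct quotes and their speakers out of text with a simple pattern. The pattern and function name are illustrative assumptions, not the actual implementation used in these tools:

```python
import re

# Hypothetical sketch: find direct quotes of the form `"...," said <Name>`.
# Real quote extraction would handle many more reporting verbs, name forms
# and punctuation styles than this single pattern does.
QUOTE_PATTERN = re.compile(
    r'"([^"]+)"\s*,?\s*(?:said|says)\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)'
)

def extract_quotes(text):
    """Return (quote, speaker) pairs found in `text`."""
    return QUOTE_PATTERN.findall(text)

sentence = '"We are investing in local news," said Jane Smith after the meeting.'
print(extract_quotes(sentence))
```

Applied to subtitle streams or speech-to-text output, the same kind of pattern matching would let quotes be attributed in TV and radio content as well.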
How it works
We use semantic data from Wikipedia, DBpedia and Wikidata to build indexes that allow entities such as people, places and organisations to be identified in unstructured text. These indexes target specific languages and include interlanguage links. Ranking and disambiguation of the raw results are then achieved using statistical techniques such as vector space word models.
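The lookup-then-disambiguate step described above can be sketched as follows. This is a minimal toy, assuming a tiny hand-built index and bag-of-words context vectors; the real indexes would be built from Wikipedia/DBpedia/Wikidata dumps, and the real vectors would come from a trained word model:

```python
from math import sqrt

# Toy index mapping a surface form to candidate entities, each with an
# assumed sparse context vector (term -> weight). Entirely illustrative.
INDEX = {
    "cambridge": [
        {"id": "Cambridge,_England",
         "context": {"university": 0.9, "river": 0.6, "uk": 0.8}},
        {"id": "Cambridge,_Massachusetts",
         "context": {"mit": 0.9, "boston": 0.8, "usa": 0.7}},
    ],
}

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts of term -> weight)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(mention, doc_terms):
    """Rank candidate entities for `mention` by similarity to the document context."""
    doc_vec = {t: 1.0 for t in doc_terms}
    candidates = INDEX.get(mention.lower(), [])
    return sorted(candidates,
                  key=lambda c: cosine(c["context"], doc_vec),
                  reverse=True)

# A mention of "Cambridge" in a US-flavoured document ranks the
# Massachusetts entity first; a UK-flavoured one would rank England first.
ranked = disambiguate("Cambridge", ["boston", "mit", "visit"])
print(ranked[0]["id"])
```

The same scheme extends across languages by keying the index per language and following the interlanguage links to a shared entity identifier.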
This project is part of the Internet Research and Future Services section
This project is part of the Content Analysis Toolkit work stream