BBC collaboration to build multilingual media monitoring system
Language Technology Producer
Keeping up to date with the latest news and trends is essential for any organisation with a global reach and outlook. Language Technology Producer Susanne Weber explains the journey that led to a new partnership to investigate and build a new way of monitoring this wealth of content.
BBC News Labs and R&D are partnering with the University of Edinburgh, UCL, Deutsche Welle and others to build an automated media monitoring tool. The system will not only monitor hundreds of international TV channels, radio stations, online articles and social media; it will also observe trends and detect news stories and events in several languages.
Research groups from across the globe are joining up to help build this platform: Priberam (Portugal), LETA (Latvia), Idiap (Switzerland) and the Qatar Computing Research Institute. The project, SUMMA (Scalable Understanding of Multilingual Media), has been granted over €6 million by the EU as part of the Horizon 2020 programme.
Partners working together at the #newsHack event
It all began at a #newsHack event in December 2014, organised by BBC Connected Studio, the World Service and News Labs as part of the World Service Africa project. The focus was on language technology, including speech recognition, machine translation and voice synthesis. Teams came from across the world, including Qatar, Bulgaria, Latvia and Scotland. Not long after that, the idea for SUMMA was conceived (over a cup of tea!): to build a multilingual media monitoring platform.
Monitoring the international news media is of critical importance not just to the BBC, but also to news agencies, journalists and many industrial sectors, including advertising, finance and sports. Spotting trends, tracking people in the news and identifying differences in reporting on the same events are crucial activities for organisations with a global outlook.
The aim of SUMMA is to significantly improve this process by creating a platform that automates the analysis of media streams across many languages, aggregates and distils the content, automatically creates rich knowledge bases and provides visualisations to help cope with this deluge of data.
The scale of the task is increasing massively year-on-year because of the rapidly growing number of internet broadcast and text portals and the increasing number of broadcast media sources. In March 2015, BBC Monitoring had access to over 13,000 sources. The European Media Monitor at the European Commission’s Joint Research Centre ingests textual material from over 10,000 RSS feeds and HTML pages, 3,750 key news portals worldwide, plus 20 commercial news feeds, in up to 60 languages.
A recent analysis by the Arab Advisors Group determined that there were 658 fully launched and operational Free-to-Air satellite channels targeting Arab countries in May 2013, an increase of 600% since 2004, with over 100 extra channels in the previous year. This is a tremendous volume of data, and current approaches to monitoring, in particular for audio and video content, simply cannot cope. Current media monitoring systems are severely limited in the number of streams they can monitor simultaneously, their support for multiple languages, their ability to ingest and process multiple media types and the richness of the automatic analysis they supply.
Teams from across the world meet for the project organised by BBC Connected Studio, the World Service and News Labs
BBC Monitoring undertakes one of the most advanced, comprehensive and large-scale media monitoring operations, providing news and information from media sources around the world. Around 300 people monitor TV, radio, internet and social media sources to detect trends and changing media behaviour and to flag breaking news events. Media monitoring journalists look for emerging themes – political, societal and economic – and aim to anticipate certain stories and events. The expertise of monitoring analysts and journalists is required to understand a change in the behaviour of particular media sources.
The media landscape has become too large to maintain the traditional approach. In SUMMA we shall address this through the development of a scalable platform for intelligent media monitoring featuring:
- multilingual stream processing including speech recognition, machine translation and story identification
- the automated construction of knowledge bases based on entity and relation extraction
- natural language understanding including deep semantic parsing, summarisation and sentiment detection
- rich visualisations based on multiple views (eg topic, person or timeline)
For the latest news on the SUMMA project, please visit the BBC News Labs website.