
Posted by Tim Cowlishaw

Since it's my turn to write weeknotes, I'd like to spend a little bit of time talking about one of the Disco team's 'Public Service Internet' projects which we've mentioned before, but not really gone into detail on - the 'Citron' Quote Attribution pipeline that Chris Newell has been hard at work on for the last few months.

Chris has already written in some detail about the technical challenges involved in this work, but I'd like to take a slightly higher level view and talk about our aims for this work, and why we think it's important.

When Olivier first wrote about our new 'Public Service Internet' strand of work back in May, he identified 'fake news' and the 'problem with facts' in the media and in online discourse (borrowing Tim Harford's words) as one of the main challenges for us to address. This is a really thorny problem and one that many really smart people are aiming to tackle in different ways - from the Trust Project and Credibility Coalition's work on standards and kitemarks for trustworthy news, to Factmata and TrustServista's algorithmic approaches to assessing the veracity of online content.

One common idea that comes up when this problem is discussed is the notion that a machine learning algorithm could be trained to fact-check an article automatically - taking in some prose and outputting a classification of 'true' or 'fake'. Now, this is clearly a massively ambitious technical challenge. However, we think it also poses a difficult philosophical question; one which casts doubt on the very idea that such an algorithm could both exist, and exist in such a way as to provide any value in this particular use-case. To see why, we need to examine more closely what it means to say a statement is 'true' or 'false'.
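Before we do, it's worth pausing on how mechanically simple the proposed machine looks. The sketch below is a deliberately naive illustration in Python using scikit-learn - not anything we've built - and the articles and labels in it are entirely hypothetical, because obtaining those labels is, of course, the whole problem.

```python
# A deliberately naive sketch of the kind of classifier people imagine:
# a supervised text model that maps prose straight to a 'true'/'fake' label.
# The articles and labels below are entirely hypothetical - producing those
# labels in the first place is exactly where the trouble starts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

articles = [
    "The new bypass will cut journey times in half.",
    "The committee published its report on Tuesday.",
]
labels = ["fake", "true"]  # hypothetical ground truth - the crux of the problem

# Bag-of-words features feeding a linear classifier: mechanically trivial.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(articles, labels)

print(classifier.predict(["EU membership costs £350 million a week."]))
```

Everything interesting is hidden in that hypothetical labels list - which brings us back to truth.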

Perhaps the idea of 'truth' that we understand most intuitively is the Correspondence theory of truth of Plato and Aristotle, and more recently of Bertrand Russell, among others. On this account, a statement is true if it corresponds to an actually existing state of affairs - if the words of the claim map in an orderly manner onto the state of things in the world. This seems like the sort of thing a suitably advanced computer might be able to do for us. However, to do so for the entire domain of news and current affairs, it would need first-hand, unambiguous knowledge of the entire state of the world. In addition, this theory can't speak to claims about hypothetical events at all. In practical terms, we can imagine a machine that fact-checks the claim that "EU membership costs £350 million a week", but it can't, by definition, make a judgement on the claim that "we will spend that on the NHS instead" without the ability to see into the future, which sadly is beyond our current technology. Based on this, it seems like our all-knowing Robotic Knight, come to rescue us from misinformation, is doomed to be an impossibility.

However, even claims about the nature of truth itself are subject to controversy. There are many competing philosophical schools of thought on this subject: the Coherence theory, the Constructivist theory, the Consensus theory and the Pragmatic theory, to name the main ones. Yet all of them in some sense define the truth of a statement relative to some other entity: a formal system, a community, or a culture. This doesn't necessarily pose a problem for the existence of our truth-telling machine, but it does call into question its usefulness. If we wanted to fact-check an article on tax rates and GDP, it wouldn't necessarily be particularly useful to identify that the Laffer curve theory is 'true' among the community of supply-side economists, as this essentially just begs the question. It may be that a particular community considers a statement to be true by consensus, but how much weight do I, as a reader, place on the judgements of that community?

However, this example does go some way towards helping us understand how a machine might help us judge the veracity of a news article, even if it can't do the whole job itself. In order to understand the veracity of a claim, it is helpful to know who is making it and what their interests are, and this is something we think we can build machines to assist with. We're moving away from treating fake news as a logic problem, and starting to view it through the lens of agnotology - the study of ignorance, and how it is constructed and manipulated. When someone makes an untrue, questionable or subjective claim in the media, they are doing so in order to manipulate the knowledge of their audience, so by examining their motivations for doing so, we can hopefully start to see for ourselves which claims might require closer critical scrutiny. Viewing 'fake news' in this way also neatly accounts for the problem of how accusations of 'fakeness' can themselves be used as a tool for disinformation.

This is precisely the role we're hoping our quote attribution tools will fulfil. Claims in news articles are backed up by first-hand quotes, and those quotes are provided by actors with an interest in putting a particular viewpoint across. If we can identify the claims made on a particular subject, and the sources of those claims, we can start to understand the incentives and viewpoints which gave rise to each claim, and easily discover alternative perspectives. This then enables journalists and audiences alike to identify and critically evaluate untrustworthy claims and sources, by highlighting the provenance of claims and making it easy to spot anomalous or suspect ones. David's been doing some excellent work developing UX concepts using this technology, and we hope to have some working demonstrators finished in the coming months.
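To give a flavour of the shape of the task, here's a toy sketch - emphatically not Chris's Citron pipeline (see his post for the real thing) - which assumes spaCy and its small English model are installed, and which uses an entirely fictional article and speaker. It pulls out quoted spans and naively attributes each one to the nearest named person.

```python
# A toy illustration of quote attribution - not the Citron pipeline itself.
# Find quoted spans in an article and link each to the nearest PERSON mention,
# so a claim can be traced back to the actor making it.
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def attribute_quotes(text):
    """Return (speaker, quote) pairs using a naive nearest-person heuristic."""
    doc = nlp(text)
    people = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    pairs = []
    for match in re.finditer(r'"([^"]+)"', text):
        if people:
            # Attribute the quote to whichever PERSON mention sits closest to it.
            speaker = min(people, key=lambda ent: abs(ent.start_char - match.start()))
            pairs.append((speaker.text, match.group(1)))
    return pairs

article = ('"Membership costs us millions every week," said Alice Smith, '
           'a spokesperson for the entirely fictional Leave Now campaign.')
print(attribute_quotes(article))
# e.g. [('Alice Smith', 'Membership costs us millions every week,')]
```

A real pipeline has to cope with indirect speech, pronouns, nested quotes and much else besides - which is exactly the hard work Chris describes in his post.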

IRFS and friends hard at work inventing the future of radio

Elsewhere in the team, the Experiences gang have been immersed in user testing and research on several projects: Libby, David, Joanne, Joanna and Kate Towsey (who we've welcomed back after her sterling work on Freebird last year) ran a workshop for the Better Radio Experiences project along with The Rusty Squids, using improvisation and physical prototyping techniques for idea-generation and co-creation, while the Talking With Machines team have been running interviews with users of our Inspection Chamber interactive drama. Tristan, Zoe, Thomas and Mathieu have been wrapping up their work on phase one of NewNews, and getting ready to kick off phase two in the new year. Tim and Henry have also been busy plotting some potentially interesting future voice-device work - looking into the possibilities of the technology for live music radio, along with the production team of one of our favourite radio shows. More on that imminently, hopefully!

The Data team have been continuing their work on face recognition - Ben and Matt have been working with Newslabs to integrate our face recognition technology into the new production pipeline, while Denise has been looking at using subtitles for person identification, as a comparison to the more computationally-intensive face recognition approach.
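As a rough illustration of the subtitles idea (not Denise's actual implementation, and again assuming spaCy's small English model, with made-up subtitle cues), person names mentioned in subtitles can serve as a cheap cue for who is likely to appear around that point in a programme, without running face recognition over every frame.

```python
# A rough sketch of using subtitles as a cheap cue for person identification -
# not the actual implementation. PERSON mentions in subtitle cues hint at who
# is likely on screen around that point in the programme.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

# Hypothetical subtitle cues: (start_seconds, end_seconds, text)
cues = [
    (12.0, 15.5, "Our correspondent Jane Doe reports from Westminster."),
    (15.5, 19.0, "The minister declined to comment this morning."),
]

def people_mentioned(subtitle_cues):
    """Map each subtitle cue to the person names it mentions."""
    results = []
    for start, end, text in subtitle_cues:
        names = [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]
        if names:
            results.append((start, end, names))
    return results

print(people_mentioned(cues))
# e.g. [(12.0, 15.5, ['Jane Doe'])]
```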

Interesting links:

  • The Fake News Fallacy - interesting essay from Adrian Chen in the New Yorker on the similarities between the current concern over Fake News and the early days of Radio.
  • What we talk about when we talk about fair AI - from IRFS friend-and-alumnus and newslabber Fionntán; a great introduction to the study of fairness, accountability and transparency in machine learning.
  • The Librarians Saving the Internet - fascinating read on the challenges and importance of digital archiving.
  • Google Maps's Moat - How Google's ability to combine disparate sources of data leads to an explosion in the types of knowledge they can capture about the world.