July BBC Machine Learning Fireside Chat: The Battle Against Disinformation

Sinead O'Brien

Lead Project Manager, TS&A

“All around the world, fake news is now the poison in the bloodstream of our societies – destabilising democracy and undermining trust in institutions and the rule of law” - Speech by Tony Hall, Director-General of the BBC - Lord Speaker Lecture - Wednesday 20th March 2019.

Propaganda, deception and the suppression of free speech have been enduring issues for every society, but in recent years terms like ‘fake news’ and ‘disinformation’ have entered public discourse with alarming regularity. So, what is happening to make this a live issue for news organisations? Can anything be done to push back against the wave of disinformation? What type of interventions are needed? Can ML help tackle disinformation?

The latest fireside chat was hosted by BBC Technologist Ahmed Razek. The panel line-up for the evening featured Sam Jeffers (Who Targets Me?), Dr. David Corney (Full Fact), Magda Piatkowska (Head of Data Solutions, BBC News), and Jon Lloyd (Mozilla Foundation).

Ahmed Razek kicked off by setting out First Draft News’s seven categories of misinformation.

The world of misinformation is complicated. Do people actually care about having real news that challenges them?

Sam Jeffers feared that we think about disinformation only a little - not enough. Who Targets Me is trying to normalise people’s understanding of it. We see strange things from time to time that deserve explanation, and there is a growing community of people being confronted with misinformation. People need help finding trust signals that let them differentiate between trustworthy and untrustworthy content; if we can be more transparent, we can make trustworthy content more trusted. Magda Piatkowska stressed the need to develop data solutions without hurting people. The intent behind publication and content is an important aspect: satire is not true, and “facts” are not always facts - not everything is intended to misinform.

Jon Lloyd, referring to his advocacy work at Mozilla, thought it is all too easy to fall into the trap of talking about fake news. Disinformation is affecting every aspect of our daily lives now. This is a sociological problem, spanning human rights, politics, health and beyond, so the companies behind the tech need to be looked at closely. The public is coming round to disinformation as a term, alongside Mozilla’s advocacy. In the US, a recent survey showed that people are more concerned about disinformation than terrorism.

We are discussing how ML can tackle disinformation. Jon has advocated for one simple tech change - The Guardian’s data labelling feature.

Jon shared his relevant experience of proactive media action in the face of disinformation. The Guardian noticed a sudden surge of traffic on a 2013 article. The traffic was coming from a Facebook group that was posting a lot of Islamophobic content. The Guardian knew that people were not paying attention to the date of the article, so they tweaked the metadata to make the date immediately noticeable to the reader - a human-centric approach, doing what was in their power to change what was happening. Much of the blame is directed at the media for not doing enough. There are more sophisticated threats now, with more authentic-looking accounts spreading misinformation. We need more transparency on organic (user-generated) content. It is necessary to work with researchers to set a baseline of what excellent looks like and to assess against that baseline. Jon encouraged technologists to support transparency efforts to get to excellent.

The nature of elections is changing. What do technologists and journalists need to prepare for going forward?

Sam thinks that we regulate tightly in the UK. Who Targets Me is interested in people being able to prove who they are, particularly if they are running large amounts of political advertising. Some special cases deserve anonymity but an individual, group or organisation should generally be able to stand behind what they put out. Do people really understand why they see a particular message or content, based on the data collected on them? Democracy is about debate and collective decision - we need to explain modern campaigning approaches and raise faith in how elections are run. Facebook doesn’t expose information about targeting - what data is used to reach particular people. Social media tools allow for the circumvention of conventional electoral practice.

Can the panel share some insight into the fact-checking process?

Magda shared observations of BBC News’ work with Reality Check journalists. BBC News has a role in transparency - in explaining to the audience what happens. Most people don’t understand what targeting actually is, so it is very important that we explain it. Sam maintained that Facebook is an interesting dilemma: they have done more than other platforms, but take many times more money for this type of advertising. Google and YouTube’s transparency tools are polluted; it is not clear how often they are updated, and they are messy.

David Corney shared useful insights into Full Fact’s fact-checking carried out by journalists - checking claims by influential people that may be misleading or easily misinterpreted by the audience. The fact-checking journalists publish a fact check; a piece summarising the full story after doing the research that the audience does not have time to do. A smaller communications team checks when these claims are being re-shared.

Newspapers are asked to publish corrections but regularly decline the invitation. Full Fact’s automated fact-checking team is a group of technologists using ML to support the fact-checkers and the organisation’s communications team. Prototype software to fact-check full sentences is being developed and refined: for more straightforward claims, algorithms will find the relevant data and check whether a claim is true or false. Full Fact recently received Google funding to build a better claim detection system, in which concrete claims will be stored, labelled and tagged. This will allow coverage of a wider range of media and free up fact-checkers. The potential dangers of disinformation are making the BBC risk-averse, which is a problem in journalism, where speed of publication matters. Increasingly, the problem is not the process but the competition.
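To make the idea of claim detection concrete, here is a minimal, purely illustrative sketch of the first step such a system might take: classifying sentences as “checkable claim” or “not a claim”. The data, labels and model choice are all invented for this sketch (Full Fact’s actual system is far more sophisticated); it simply shows the shape of the problem using scikit-learn.

```python
# Toy sketch of claim detection: label sentences as a checkable
# claim (1) or not a claim (0). All examples are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Unemployment fell by 50,000 last quarter.",   # checkable claim
    "Crime has doubled since 2010.",               # checkable claim
    "The NHS treats a million people every day.",  # checkable claim
    "What a wonderful morning it is.",             # not a claim
    "I hope everyone enjoys the conference.",      # not a claim
    "Thank you all for coming tonight.",           # not a claim
]
labels = [1, 1, 1, 0, 0, 0]

# A simple TF-IDF + logistic regression pipeline stands in for a
# real claim detection model here.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(sentences, labels)

new_sentences = ["Inflation rose by 2% this year.",
                 "Good evening, everyone."]
predictions = model.predict(new_sentences)
print(predictions)
```

In practice, sentences flagged as claims would then be matched against stored, labelled claims and routed to fact-checkers - the “find the data and check” step described above.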

Recent BBC News research suggested that, “In India, people are reluctant to share messages which they think might incite violence, but feel duty-bound to share nationalistic messages”. What does the global view of the impact of disinformation look like? What is the non-western perspective?

Magda said that different patterns and preferences are seen across the globe. A long journalistic tradition is not the norm everywhere, and literacy challenges, limited access to online content, and varying ability to scan and consume it are prevalent in certain regions. We must also consider the impact of government influence and propaganda in certain areas. Jon added that we could end up creating policies that are difficult to enforce on a global scale. Magda thought we needed to pay attention to where the tech is going to grow, e.g. China, as data will shape the way disinformation spreads in those regions. Are scoring systems a viable tool for rating content? David felt that 20-30 years ago, when content came primarily from newspapers and TV news, editorial teams acted as gatekeepers. That role has been largely dismantled by social media and citizen journalists spreading their own stories. We need something to point us to the stories worth paying attention to - but if the algorithm gets it wrong, automation will be damaging.

What role do algorithms have to play?

Ahmed moved the conversation along to the subject of recommendation algorithms. Sam pondered, “When it strikes you that you see a Facebook ad and you click through and then you are recommended other pages, how quickly can that send you in more radical directions than you were expecting - to some strong content?”. With recommendation engines built a while ago, we don’t really know where accountability lies. Do we understand people’s information diets? People consume a lot of content from a particular perspective and wonder how they got there. Magda argued that if you really rely on ML, you have to account for the fact that your algorithm learns from people’s behaviour - and that behaviour is not always good for them; they sometimes have poor information diets. This is where editorial and policy strategy, as well as tech, shape what it means to be informed. Start simple, so recommenders are not too complicated and we can assess whether we are hurting the audience. If you put more interesting content in front of people, they do engage. Take the audience and journalists on a journey.

David thought that algorithms have a tendency to drift towards the most extreme content. Algorithms do give us relevant recommendations but sometimes get it wrong. Recommendation systems could look to the authority of sources rather than recency. Jon reflected that ultimately we need transparency: platforms say they are making tweaks and fixes that cannot be verified, and we are expected to take at face value companies that have profit and expansion at their core. Sam agreed - if a business model is totally dependent on the algorithm, platforms are optimising for engagement, and the scale is huge, switching them off is a massive decision. Magda reminded us that responsibility should also sit on the supply side of the content.
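David’s suggestion - weighting source authority rather than engagement alone - can be sketched in a few lines. Everything here is hypothetical (the items, the scores and the blending formula are invented for illustration); it just shows how a single weighting parameter changes which content a recommender surfaces first.

```python
# Hypothetical sketch: re-rank recommendations by blending raw
# engagement with an editorially assigned source-authority score.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    engagement: float  # e.g. normalised click-through rate (invented)
    authority: float   # e.g. source trust score (invented)

def rank(items, authority_weight=0.5):
    """Sort items by a weighted blend of engagement and authority."""
    def score(item):
        return ((1 - authority_weight) * item.engagement
                + authority_weight * item.authority)
    return sorted(items, key=score, reverse=True)

items = [
    Item("Sensational rumour", engagement=0.9, authority=0.1),
    Item("Verified report", engagement=0.6, authority=0.9),
    Item("Opinion piece", engagement=0.7, authority=0.5),
]

# With authority_weight=0 the rumour ranks first (pure engagement);
# weighting authority promotes the verified report instead.
for item in rank(items, authority_weight=0.6):
    print(item.title)
```

This is the crux of the panel’s point: “switching off” engagement optimisation is not all-or-nothing; the trade-off is a dial, and someone has to be accountable for where it is set.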

In the UK, a recent report on misinformation by the Commons Select Committee suggested that a new category of tech company should be formulated, “which tightens tech companies’ liabilities, and which is not necessarily either a ‘platform’ or a ‘publisher’”.

There has to be a system, according to Magda. There is no single object of regulation - not platforms, nor media, nor government; it is the responsibility of the system, and all parties have their part to play. The media has a role to educate. Sam restated Who Targets Me’s interest in radical transparency around political advertising. There may need to be a product suite for solving all the different problems: different tools are required to do different jobs, and a market created for tools that help people understand what they are seeing.

A team of researchers at the Allen Institute for AI recently developed Grover, a neural network capable of generating fake news articles in the style of human journalists. They argue they are fighting fire with fire: the better Grover gets at generating fakes, the better it will get at detecting them.

The ability to generate text did not worry David; the problem is getting the content onto a platform where people start believing it. The story is not a problem in itself. Magda argued that it depends on intent: who is behind it, and are they using it for good or for bad?

There has been some hype around both deep fakes and shallow fakes. A recent example of a shallow fake was the slowed-down video of Nancy Pelosi, which made her appear disoriented. This video was subsequently retweeted by the President of the United States. No ML was required here; this was basic video manipulation with a profound effect.

Jon believed that a picture is worth a thousand words - and video even more so. Preparedness is better than panic. We should be more concerned about recommendation algorithms, methods of verification and systems for flagging false content. Unfollowing a video on YouTube is a long process. Changing policies is one thing; responsible behaviour on the part of companies is not a zero-sum game. Sam thought that political video is “shallow-fakey” anyway - it tells a story through selective information. David advocated that it is worth considering radical options like heavy regulation. Magda thought trust will become a major factor - the brand association with factual content - and foresaw a decline of the less trusted brands. Jon reflected on the transparency transformation in the food and fashion industries, but recognised there is no silver bullet: it will require a coordinated effort offline as well as online, not just in tech, and while the financial incentive remains for companies, it won’t happen on its own. Sam added that we can also use this tech to make good democratic strides forward.

Huge thanks to Ahmed Razek and the panel for delivering another engaging fireside chat on a very hot topic. The conversation around fake news, misinformation and disinformation is multi-faceted. As the BBC, we need to keep reminding ourselves and others that the problem is not just about journalism. The impact of misinformation reaches far and wide and needs to be considered from societal, policy, tech, humanitarian and public trust perspectives. And so we, along with other organisations, are taking a deeper look at what is happening in these areas. There is lots of great ongoing work in BBC R&D, BBC News, and elsewhere in the organisation. The BBC provided feedback into the Disinformation and “Fake News”: Final Report (February 2019). Director of the BBC World Service Group, Jamie Angus, subsequently confirmed that the World Service would take the lead in addressing the ‘fake news’ threat, making use of its 42 language services, its knowledge on the ground and BBC Monitoring to spot harmful examples and expose emerging patterns. To echo Magda, we must progress in a way that is not harmful to our audience.