Main content

'Deepfakes' have rightfully grabbed negative media attention, but is there a creative and editorially solid opportunity to exploit the underlying technology? BBC Blue Room's Ahmed Razek has been experimenting with this controversial technology.

Deepfakes - video manipulated using AI, often with malicious intent - continue to be a technologically troubling force. A recent report by the cyber-security company Deeptrace found that, of 14,698 Deepfakes discovered online, 96% were sexual in nature, with women overwhelmingly the victims. In the battle against online disinformation, Deepfakes are thankfully still a side-line issue, though there are troubling signs ahead. Last year, doctored footage of Nancy Pelosi, Speaker of the House of Representatives, in which she appeared to sound drunk, spread virally across social media, causing significant reputational damage. Despite the many articles rebutting the content, the damage was done: a lie can travel halfway around the world before the truth can get its boots on. Strictly speaking, the fake Pelosi video isn't an example of a Deepfake; it's more like a shallow fake - a new term in the misinformation lexicon that describes doctored video produced with basic editing tools. Because shallow fakes are so simple to create, some researchers argue that their spread poses a higher risk to online discourse than Deepfakes do.

With any application of technology, it is all about the intent. I've been exploring whether the same audio-visual synthesis technology used to create Deepfakes can be harnessed to deliver content in new, innovative ways. This experiment built on our learning from a synthetic media demo of BBC presenter Matthew Amroliwala reading a news item in several different languages – you can see the results here.

In preparation for the BBC's 2019 annual Media, Tech & Society conference, the BBC Blue Room (the BBC's internal consumer technology lab) was challenged to build a prototype that both highlighted the advances of synthetic media and demonstrated a scalable audience proposition.

Currently, one of the more popular user interactions on voice-enabled devices like the Amazon Alexa is asking about the local weather. Understanding this, we asked ourselves: what would a synthetic video response to a weather query from a celebrity personality look like? And what editorial issues would it raise?

Weather is a useful area to prototype: the content is factual and generally uncontentious. Considering that voice-enabled screens like the Amazon Echo Show or Facebook Portal are increasingly making their way into people's homes, it won't be too long before we are met with a digital avatar responding to a query.

To create this experiment, we partnered with colleagues from BBC World Service who provided the editorial treatment for the piece and AI video synthesis company Synthesia, who provided the technical AI expertise.

We asked presenter Radzi Chinyanganya to read to the camera the names of 12 cities, numbers from -30 to 30 and several pithy phrases to explain the temperature. The finished script sounded like this:

"Welcome to your daily weather update, let's take a look at what's been happening. In "x", residents are expecting "x", temperatures are expected to be, on average "x" so if you're heading out, remember to "x."

We used the BBC's weather API to fill in each 'x' variable with accurate, up-to-date weather data for the twelve cities. You may ask at this point: why just twelve cities? Scaling the demo so that a presenter could deliver a personalised weather report for any city, town or street in the world would require advances in synthetic audio technology. When your sat nav gives you directions, or a smart speaker answers your query, you are hearing synthetic speech. Despite the explosion of investment and research into using neural networks to simulate human voices, it is still challenging to replicate a voice convincingly. That said, it may not be long before you can't tell whether the voice of your favourite celebrity is synthetic or authentic. For our experiment, we decided to use Radzi's real voice rather than a sub-optimal digital version that would have broken the illusion of the experience.
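The mechanics of the demo can be illustrated with a short sketch. This is not the BBC's actual code: the `build_script` function, the field names and the closed vocabularies below are hypothetical stand-ins, but they show why the demo is limited to pre-recorded values - the presenter only ever recorded clips for twelve cities, a fixed temperature range and a handful of advice phrases, so any request outside that vocabulary cannot be rendered.

```python
# Hypothetical sketch of the demo's templating step. The real BBC weather
# API and pipeline are not shown here; names and fields are illustrative.

SCRIPT_TEMPLATE = (
    "Welcome to your daily weather update, let's take a look at what's "
    "been happening. In {city}, residents are expecting {conditions}, "
    "temperatures are expected to be, on average, {temp}, so if you're "
    "heading out, remember to {advice}."
)

# The presenter pre-recorded clips only for these values, so the demo can
# only splice together phrases from this closed vocabulary.
CITIES = ["London", "Lagos", "Delhi"]   # twelve cities in the real demo
TEMPS = range(-30, 31)                  # numbers from -30 to 30
ADVICE = {"rain": "take an umbrella", "sun": "wear sunscreen"}

def build_script(city: str, conditions: str, temp: int) -> str:
    """Check inputs against the recorded vocabulary, then fill the template."""
    if city not in CITIES or temp not in TEMPS:
        raise ValueError("no recorded clip exists for this value")
    return SCRIPT_TEMPLATE.format(
        city=city,
        conditions=conditions,
        temp=f"{temp} degrees",
        advice=ADVICE.get(conditions, "check the forecast again later"),
    )

print(build_script("London", "rain", 12))
```

In the real demo, the filled-in values select pre-recorded audio of Radzi's voice while the synthesis model generates matching video of his face, which is why scaling beyond a fixed vocabulary would demand convincing synthetic audio as well.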

Take a look at the demo and see the results for yourself. Select your favourite city and get a personalised synthetic video report based on real-time weather data. Please note this demo only works in Google Chrome and other Chromium-based browsers such as Brave, Opera and the new Microsoft Edge.

Safeguarding Trust

Conducting experiments with such contentious technology for a responsible public service broadcaster is tricky. Thorny issues of trust and editorial norms quickly come to the surface.

Trust is foundational to the BBC's relationship with its audiences. Serving viewers or listeners fake content that, on the surface, appears authentic clearly risks reputational damage. However, that's not to say there are no circumstances in which synthetic media could improve the audience offer without sacrificing trust. A lot depends on being honest and clear with the audience about what they are getting, an editorial principle the BBC is used to applying in all sorts of contexts. As outlined above, the use of synthetic media in a news context has the potential to be destabilising, especially in an era of 'fake news'. In a different context, however, like our weather report demo, it is not clear that audiences would be troubled by digital avatars delivering the forecast. Given the growth of digital assistants and the industry drive for greater personalisation, there may even come to be an expectation that a video response to a query is digitally generated.

Another factor that may help with trust is audience markers. Just as many online chatbots use robot emojis to convey that the user is speaking to a machine rather than a human, similar visual markers could communicate to viewers that a piece of content is computer-generated. With such safeguards in place, the growth of synthetic visual media seems plausible even for a responsible public service broadcaster.

The second, and perhaps more intriguing, issue raised by synthetically generated media is editorial. Take the weather demo: even the most generous critic would concede that it's a bland weather report. The storytelling flair and creativity presenters bring to enrich a piece of content is completely lost in this dispassionate demo. One of the significant challenges in a world of computer-generated media will be working out how to create dynamic, imaginative content in a personalised way. Alternatively, the technology could deliver the parts of a presentation that are bland but labour-intensive, giving our talented storytellers more time and space to create valued content in tandem. That's not to say bland content is inevitable: the emerging field of AI personality design could yet lead to hugely creative synthetic experiences.

So, back to our original question: can synthetic media drive new content experiences? Yes – I believe it can. Currently, the cost of producing high-grade synthetic video is prohibitively high for ordinary consumers. As the tools become increasingly commoditised, consumers creating quality synthetic experiences at low cost could conceivably unleash a new model of storytelling. You can easily imagine a future in which a photorealistic, human-like digital character can be made to do anything from reading out the football results to delivering a physics lesson.

At a time when the world is increasingly troubled by authentic-looking false content, the challenge will be to work out, in short order, how to prepare for this storytelling paradigm shift.