Posted by Ben Clark on , last updated

As it’s my turn to write weeknotes, I’m going to try my best to explain how the data team are making content analysis algorithms which analyse video and audio in near real time. Previously I explained how we’re having to rewrite the algorithms themselves to work in a different way; today I’m going to explain how we get the audio and video to those algorithms. This is quite specialist, but could be extremely useful for anyone who encounters this problem in the future. If that's not you, skip to the end to read about the rest of the team.

After a bit of hacking around, we’ve settled (for the moment) on Gstreamer, a library for building media processing applications. A Gstreamer application is a pipeline (or graph) of plugin components, called elements, which pass audio or video “buffers” down the chain from one to the next without ever knowing in advance how many buffers there are in the stream. That is exactly how we want our algorithms to work, so all we have to do (in theory) is write our algorithms as a Gstreamer plugin. How hard could that be?
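To make the idea concrete, here is a plain-Python analogy. It is not the Gstreamer API. Three generator "stages" pass buffers along one at a time, and no stage knows in advance how long the stream is. Gstreamer mostly pushes buffers downstream rather than having the sink pull them as generators do, but the property we care about is the same. The names source, analyse and sink are purely illustrative.

```python
from typing import Iterable, Iterator, List


def source(chunks: Iterable[bytes]) -> Iterator[bytes]:
    # Stands in for a source element (a file or a live stream). Nothing
    # downstream knows how many chunks it will produce.
    for chunk in chunks:
        yield chunk


def analyse(buffers: Iterable[bytes], results: List[int]) -> Iterator[bytes]:
    # Stands in for our analysis plugin: handle one buffer, record a
    # result, then pass the buffer on unchanged.
    for buf in buffers:
        results.append(len(buf))
        yield buf


def sink(buffers: Iterable[bytes]) -> int:
    # Stands in for fakesink: consume and discard, counting as we go.
    return sum(1 for _ in buffers)


results: List[int] = []
count = sink(analyse(source([b"\x00" * 6, b"\x00" * 12]), results))
print(count, results)  # 2 [6, 12]
```

The stream only "ends" when the source runs dry, which is the situation our rewritten algorithms have to cope with.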

The answer: surprisingly difficult. Gstreamer is complex (probably necessarily so), and there are very few good examples online showing how to do this in our language of choice, Python.

Setup: You need a recent version of Gstreamer, at least 1.14, which on Ubuntu means compiling from source. When compiling gst-python, remember to set --with-libpython-dir; otherwise Gstreamer will silently fail to find your plugins.
Path: Put your plugin in a folder called python, and set GST_PLUGIN_PATH to the directory that contains that python folder.
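Both rules are easy to get wrong, so here is a small, hypothetical pre-flight check. It assumes you have captured the output of gst-launch-1.0 --version, which includes a line such as "GStreamer 1.14.5", and that your plugin file sits inside a folder called python. The function names are our own invention, not part of any Gstreamer tooling.

```python
"""Hypothetical pre-flight checks for the Setup and Path rules above."""
import os
import re
from pathlib import Path

MIN_VERSION = (1, 14)


def parse_gst_version(text: str) -> tuple:
    # Looks for a line like "GStreamer 1.14.5" in gst-launch-1.0 --version output.
    match = re.search(r"GStreamer (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no GStreamer version found")
    return tuple(int(part) for part in match.groups())


def is_recent_enough(version: tuple) -> bool:
    # Only major.minor matter for the "at least 1.14" rule.
    return version[:2] >= MIN_VERSION


def plugin_path_for(plugin_file: str) -> str:
    # The plugin must live in a folder called "python"; GST_PLUGIN_PATH
    # should point at the directory that contains that folder.
    path = Path(os.path.abspath(plugin_file))
    if path.parent.name != "python":
        raise ValueError(f"{path} is not inside a folder called 'python'")
    return str(path.parent.parent)
```

For example, a plugin saved as /home/me/plugins/python/gstplugin_py.py gives /home/me/plugins as the value for GST_PLUGIN_PATH.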

Once you’ve done that, all that remains is to write a class which implements your plugin. The steps below outline a skeleton we’ve used in our projects. The idea is to write a plugin which converts each Gstreamer buffer into something we’re familiar with, a numpy array. Once it’s in that format we can use all sorts of awesome machine learning goodness.

First, import the required libraries: Gst, GstBase and GObject from PyGObject's gi.repository, plus numpy for the arrays.

Next, we create our class, extending GstBase.BaseTransform. This class needs a few fields to be defined. The most important is __gsttemplates__, which defines the data formats (caps) that the element's pads accept; alongside it, __gstmetadata__ describes the element. In our case we want raw video frames in BGR format.

Next, the interesting bit. The method do_generate_output can be called several times for the same input buffer, but we only want to process each buffer once. If the buffer is new, we read the frame's height and width from the caps, convert the buffer's contents into a numpy array and return the buffer. If we’ve seen the buffer before, we return None to tell the pipeline to move on.
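The conversion step can be sketched on its own, separate from Gstreamer, which makes it easy to test. This sketch assumes packed BGR frames (3 bytes per pixel) and Gstreamer's default row stride for that format, which rounds each row up to a multiple of 4 bytes. A frame whose width times 3 is not a multiple of 4 therefore has padding at the end of every row, and a naive reshape to (height, width, 3) would scramble it. The function names are hypothetical.

```python
"""Sketch of the buffer-to-numpy step, kept free of Gstreamer so it can be tested."""
from typing import Optional

import numpy as np


def default_bgr_stride(width: int) -> int:
    # Bytes per row for packed BGR, rounded up to the next multiple of 4.
    return (width * 3 + 3) // 4 * 4


def frame_to_array(data: bytes, width: int, height: int,
                   stride: Optional[int] = None) -> np.ndarray:
    # Turn one mapped BGR video buffer into a (height, width, 3) uint8 array.
    if stride is None:
        stride = default_bgr_stride(width)
    if len(data) < stride * height:
        raise ValueError("buffer is smaller than height * stride")
    rows = np.frombuffer(data, dtype=np.uint8, count=stride * height)
    rows = rows.reshape(height, stride)
    # Drop any row padding, then split each row into (width, 3) BGR pixels.
    return rows[:, : width * 3].reshape(height, width, 3)
```

Arrays made with np.frombuffer from a bytes object are read-only, so call .copy() if the analysis needs to modify the frame.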

Finally, outside the class, register the plugin so that Gstreamer knows about it.

To use the plugin, make sure Gstreamer can find it by setting GST_PLUGIN_PATH to the directory that contains your python folder.
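From Python, one way to do this is to build an environment with GST_PLUGIN_PATH prepended and ask gst-inspect-1.0 whether it can see the element. These are hypothetical helpers. They assume gst-inspect-1.0 is on your PATH and that, as far as we know, it exits with a non-zero status when it cannot find an element.

```python
"""Hypothetical helpers for making a plugin directory visible to Gstreamer."""
import os
import subprocess


def env_with_plugin_path(plugin_root, base_env=None):
    # Prepend plugin_root (the directory *containing* the python folder)
    # to GST_PLUGIN_PATH, keeping any entries that are already there.
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("GST_PLUGIN_PATH")
    env["GST_PLUGIN_PATH"] = (
        plugin_root if not existing else plugin_root + os.pathsep + existing
    )
    return env


def plugin_is_visible(plugin_root, element="gstplugin_py"):
    # Run gst-inspect-1.0 with the extended path and report whether it
    # found the element (exit status 0).
    result = subprocess.run(
        ["gst-inspect-1.0", element],
        env=env_with_plugin_path(plugin_root),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, plugin_is_visible("/home/me/plugins") should return True once /home/me/plugins/python/gstplugin_py.py exists and gst-python is installed.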


Then call the plugin like this. We’re constructing a pipeline consisting of an element which decodes a file (or stream), an element which converts the video into a format our plugin accepts, our plugin, and a sink which does nothing with the data:

gst-launch-1.0 uridecodebin uri=file:///test.mp4 ! videoconvert ! gstplugin_py ! fakesink
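The same pipeline can also be run from inside a Python program, which is handy once analysis results need to go somewhere other than a log. This sketch assumes PyGObject with Gstreamer 1.x is installed and GST_PLUGIN_PATH is already set. It blocks until the stream ends or an error is posted on the pipeline's bus. The helper names are our own.

```python
"""Sketch: run the gst-launch-1.0 pipeline from Python instead."""


def pipeline_description(uri, element="gstplugin_py"):
    # Same elements as the command-line pipeline.
    return f"uridecodebin uri={uri} ! videoconvert ! {element} ! fakesink"


def run_pipeline(description):
    # Imported here so pipeline_description() works without Gstreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Wait (with no timeout) for end-of-stream or an error.
    msg = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
    )
    pipeline.set_state(Gst.State.NULL)
    if msg is not None and msg.type == Gst.MessageType.ERROR:
        err, debug = msg.parse_error()
        raise RuntimeError(f"{err.message} ({debug})")
```

Usage: run_pipeline(pipeline_description("file:///test.mp4")).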

Of course, once you’ve got the data into Python, you can analyse it in many different ways: for example, detecting faces in the video (as in the screenshot above) or transcribing the speech from the audio.

Here's what is going on in the rest of IRFS:

Over in the Discovery team, Tim, David and Jakub have continued working on the project with BBC Four. They have been finalising the designs, working on ‘visual energy’ analysis and producing several demos to present to BBC Four.

Chris has been working on the Quote Attribution project and has completed a Keras-based disambiguator for Mango.

Kristine has demoed her BBC Introducing playlists, themed around BBC radio stations, to BBC Sounds.

In the Experiences team, Joanne and Joanna have been evaluating the outputs of the Cars workshop held in the last sprint with members of IRFS, News Labs and others.

Libby has been wrapping up the Better Radio Experiences work and preparing a blog post to be published soon.

NewNews have started testing their latest prototypes with under-26s in London and developing them further, and have published a blog post and video describing what came out of the first phase of the project last autumn. They are looking forward to a pilot of one of their first prototypes, which should be published very soon.

Chris built an MPEG-DASH-based Remote Playback API demo to test whether browser implementations support the Remote Playback API in combination with Media Source Extensions. He also gave a presentation to the News Labs team on why and how to go about open sourcing their work. The British Library are using our audio waveform code as part of their Save Our Sounds project, so Chris has been helping them get started.


This post is part of the Internet Research and Future Services section