Fast and efficient video compression is vital for the BBC, and we’ve written previously about how and why BBC Research & Development is using Machine Learning (ML) to optimise this process.
As part of this research, we are also investigating Rate-Distortion Optimisation (RDO) techniques for estimating the quality of a delivered video frame given a specific number of bits, using Convolutional Neural Networks (CNNs).
In addition to compressing video into as few bits as possible, the challenge of video encoding is to compress in a way that provides the best possible viewing experience. This works best if the encoder can estimate what the quality of a delivered frame would be for a specific number of bits. Rate-Distortion Optimisation (RDO) techniques provide this estimate, so that appropriate compression parameters can be selected and the highest quality content delivered for a given bit rate. To help streamline the process, we are investigating a novel approach based on Machine Learning.
What we're doing
Our aim is to achieve an efficient RDO design by focusing on the most bit-consuming part of the video: intra-predicted frames. Here, the compression comes from reducing redundant spatial information (neighbouring pixels in a frame which are typically similar). Compression of other frames, i.e. inter-predicted (motion compensated) frames, typically results in fewer bits than intra-prediction as neighbouring frames are very similar.
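To make the idea of spatial redundancy concrete, here is a minimal sketch. It is not the prediction scheme of any real codec: each sample in a row is simply predicted from its left neighbour, and only the prediction error (the residual) is kept. The function names and the sample row are hypothetical and chosen purely for illustration.

```python
def left_neighbour_residuals(row):
    """Predict each sample from its left neighbour and return the residuals.

    The first sample has no neighbour, so it is predicted as 0 and is
    therefore kept unchanged.
    """
    residuals = []
    previous = 0
    for sample in row:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def reconstruct(residuals):
    """Invert the prediction by accumulating the residuals."""
    row = []
    previous = 0
    for residual in residuals:
        previous += residual
        row.append(previous)
    return row


# Hypothetical row of 8-bit luma samples from a smooth image region.
row = [120, 121, 121, 123, 124, 124, 126, 127]
residuals = left_neighbour_residuals(row)
```

After the first sample, the residuals in a smooth region are small numbers clustered around zero, which an entropy coder can generally represent in fewer bits than the raw samples. Real intra prediction uses many more modes and works on blocks, but the principle is the same.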
We have focused on addressing the challenge of deciding how many bits to spend on coding a given frame. This helps us to deliver the best video quality for a given bandwidth, without an excessive reduction of bits that could result in unnecessarily poor picture quality.
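The decision described above can be sketched as a small selection problem: given candidate (bits, distortion) pairs for one frame, pick the candidate with the lowest distortion that still fits the available bits. This is a simplified illustration under stated assumptions, not the released software: the `RDPoint` type, the function name and all the numbers are made up for the example.

```python
from typing import NamedTuple, Optional, Sequence


class RDPoint(NamedTuple):
    qp: int            # quantisation parameter (higher means coarser)
    bits: int          # bits needed for the frame at this QP
    distortion: float  # e.g. mean squared error; lower is better


def best_point_within_budget(
    points: Sequence[RDPoint], bit_budget: int
) -> Optional[RDPoint]:
    """Return the lowest-distortion point that fits the budget, or None."""
    affordable = [p for p in points if p.bits <= bit_budget]
    if not affordable:
        return None
    return min(affordable, key=lambda p: p.distortion)


# Hypothetical rate-distortion points for one frame (made-up numbers).
points = [
    RDPoint(qp=22, bits=90_000, distortion=4.0),
    RDPoint(qp=27, bits=55_000, distortion=9.5),
    RDPoint(qp=32, bits=30_000, distortion=21.0),
    RDPoint(qp=37, bits=16_000, distortion=44.0),
]

choice = best_point_within_budget(points, bit_budget=60_000)
```

With a budget of 60,000 bits, the point at QP 22 is too expensive, so the selection falls to the next best quality that fits. Spending far fewer bits than the budget allows would only lower quality unnecessarily.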
We are using neural networks to estimate the number of bits needed to represent a compressed frame, and the associated frame quality at that compression level. Typically, these values would only be known once a frame has been encoded. Our approach aims to be faster by estimating them for multiple compression (RDO) parameters without actually encoding the frame.
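The following sketch shows the shape of such an estimation pass: for each candidate parameter, two estimators are queried and no encoder is ever called. The function `sweep_qps` and the two toy estimators are hypothetical placeholders. In particular, the toy formulas are not trained models and only mimic the expected trend (a higher parameter value gives fewer bits and more distortion).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[List[int]]  # rows of 8-bit luma samples


def sweep_qps(
    frame: Frame,
    qps: Sequence[int],
    estimate_bits: Callable[[Frame, int], float],
    estimate_distortion: Callable[[Frame, int], float],
) -> Dict[int, Tuple[float, float]]:
    """Map each QP to (estimated bits, estimated distortion).

    No encoder is invoked: both values come from the supplied estimators.
    """
    return {
        qp: (estimate_bits(frame, qp), estimate_distortion(frame, qp))
        for qp in qps
    }


# Placeholder estimators (assumptions for illustration, NOT trained models).
def toy_bits(frame: Frame, qp: int) -> float:
    samples = sum(len(row) for row in frame)
    return samples * 8.0 / (1 + qp)


def toy_distortion(frame: Frame, qp: int) -> float:
    return 0.5 * qp


frame = [[128] * 16 for _ in range(16)]  # flat grey 16x16 test frame
table = sweep_qps(frame, [22, 27, 32, 37], toy_bits, toy_distortion)
```

Passing the estimators in as callables keeps the sweep independent of how the estimates are produced, so a learned model could be substituted for the toy functions without changing the loop.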
More specifically, our estimation of RDO parameters is achieved using two CNNs. CNNs have become increasingly popular in recent years for their performance in tasks such as video classification, segmentation and super-resolution. In our method, one CNN is used to estimate the number of bits, and the other is used to estimate the distortion (reduction in quality) that would be obtained after compressing an intra-frame.
The first CNN (CNN #1) takes an original frame as input and estimates how many bits are needed to save it at a certain compression setting (i.e. quality level, defined by Quantisation Parameter, QP). The second CNN (CNN #2) takes the same frame and produces estimated distortion maps, i.e. the pixel-wise difference between the original frame and compressed frame.
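For reference, the quantity that a distortion map represents can be computed directly when both frames are available. The sketch below is a rough illustration under stated assumptions. A crude uniform quantiser stands in for real compression (its `step` argument is a hypothetical parameter, not a QP), the map uses the absolute pixel-wise difference, and it is summarised as mean squared error and PSNR. None of these helper names come from the released software.

```python
import math
from typing import List

Frame = List[List[int]]  # rows of 8-bit samples


def quantise(frame: Frame, step: int) -> Frame:
    """Crude stand-in for lossy compression: round to multiples of step."""
    return [
        [min(255, (p + step // 2) // step * step) for p in row]
        for row in frame
    ]


def distortion_map(original: Frame, compressed: Frame) -> Frame:
    """Absolute pixel-wise difference between two frames of equal size."""
    return [
        [abs(o - c) for o, c in zip(orig_row, comp_row)]
        for orig_row, comp_row in zip(original, compressed)
    ]


def mean_squared_error(dmap: Frame) -> float:
    squared = [d * d for row in dmap for d in row]
    return sum(squared) / len(squared)


def psnr(mse: float, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)


# Hypothetical 2x4 frame, "compressed" with a quantisation step of 8.
original = [[100, 101, 130, 135], [90, 93, 96, 99]]
compressed = quantise(original, step=8)
dmap = distortion_map(original, compressed)
mse = mean_squared_error(dmap)
quality_db = psnr(mse)
```

Computing the map this way requires the compressed frame, which is exactly the step that an estimated distortion map is meant to avoid.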
With this approach, the estimates are close to the values that would be obtained with real encoding. Overall, this means that CNN-based estimation can help a video encoder choose the best compression parameters.
The open-source software is now available on the BBC GitHub:
More details about our approach can be seen in the paper Estimation of Rate-Control Parameters for Video Coding using CNN, presented at the IEEE International Conference on Visual Communications and Image Processing (VCIP 2018).
Our initial results demonstrate that new Machine Learning algorithms can be used to create advanced video coding tools. We are actively working towards further optimisations by applying ML to discover new compression solutions, especially those that enable better prediction of pixels. We are also researching ML and AI tools that are interpretable, explainable and predictable, which will allow us to create robust and simpler visual data processing solutions.
This work was co-supported by the EPSRC, through an iCASE studentship in collaboration with the School of Electronic Engineering and Computer Science of Queen Mary University of London.