At a loss with data-rate reduction?

Andy Jones describes how video and audio are compressed using predictive algorithms

Data-rate reduction, or 'compression', uses a range of processes to cut the amount of data that has to be sent for pictures and sound. Compression encodes media in a way that adapts it for storage or transmission, and is often built into codecs.

For pictures, perhaps the most powerful technique is prediction – designing a decoder that can make a 'guess' at the next part of the video, so that the encoder only has to send corrections to that guess. If it's a good guess, the corrections will be small. Prediction works because there is a great deal of self-similarity in media, particularly in video.
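As a rough illustration, here is the simplest possible predictor, sketched in Python: guess that each value equals the one before it, and send only the corrections. The sample numbers and function names are invented for the example, not taken from any real codec.

```python
import numpy as np

def encode_residuals(samples):
    """Toy predictor: guess each sample equals the previous one,
    and keep only the corrections (residuals) for transmission."""
    return np.diff(samples, prepend=0)

def decode_residuals(residuals):
    """The decoder makes the same guess and applies the corrections."""
    return np.cumsum(residuals)

row = np.array([100, 101, 101, 103, 104, 104, 105])  # slowly varying pixels
res = encode_residuals(row)
print(res)  # [100   1   0   2   1   0   1] -- mostly tiny numbers
assert np.array_equal(decode_residuals(res), row)
```

The residuals are mostly small numbers, and small numbers can be represented in far fewer bits than the raw pixel values.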

[Image: screenshot of a typical news programme]

Look at the picture above: pixels tend not to differ much from their neighbours across large areas of the image, and where they do change, the change is often gradual. Many codecs exploit this using a mathematical technique called a transform, which can produce a smaller set of numbers describing the smooth way that pixels vary across an area of the picture – rather than sending the absolute value of every single pixel in every single picture, every single time. Many codecs divide the picture into blocks of pixels for this transform process. If the amount of data-rate reduction is increased (a large 'compression ratio'), the smooth variation from one pixel to the next won't be perfectly reproduced by the codec and these blocks can become visible – one of the undesirable 'artefacts' of compression.
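A common choice in block-based codecs is the discrete cosine transform (DCT). The sketch below, using SciPy, shows the general idea on an invented 8×8 block; the brightness ramp and the quantisation step are made up for illustration, and real codecs are considerably more elaborate.

```python
import numpy as np
from scipy.fft import dctn, idctn

# An invented 8x8 block from a smooth region: a gentle left-to-right ramp.
block = np.tile(np.linspace(100, 120, 8), (8, 1))

# Forward 2-D DCT: concentrates the block's energy in a few coefficients.
coeffs = dctn(block, norm='ortho')

# Coarse quantisation: small coefficients round to zero and need not be sent.
step = 10.0
quantised = np.round(coeffs / step)
print(f"non-zero coefficients kept: {np.count_nonzero(quantised)} of 64")

# The decoder inverts the process. The result is close to, but not exactly,
# the original -- and at coarse steps the mismatch shows up as block artefacts.
reconstructed = idctn(quantised * step, norm='ortho')
print(f"largest pixel error: {np.abs(reconstructed - block).max():.2f}")
```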

Another predictive technique that works well for video stems from the fact that it is composed of a series of still pictures sent one after the other. The current picture is often very similar to the one that immediately precedes it, so the coder only needs to send data describing the areas that have altered – the newsreader's lips perhaps, or an additional area of background revealed as they move their head. This technique is enhanced dramatically if the codec employs 'motion vectors': it sends a description of the direction in which blocks of pixels are moving, which allows the decoder to make a better guess at the next picture by assembling it from the previous blocks, each moved by the appropriate amount. It still won't be a perfect guess, but it means the coder only needs to send data to correct the guess, or prediction, that the decoder has made.
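The decoder's side of motion compensation can be sketched as follows; the frame size, the vectors and the edge-clamping rule are all simplifications invented for the example.

```python
import numpy as np

def predict_frame(prev, vectors, block=8):
    """Guess the current frame from the previous one by copying each
    block from a position shifted by its motion vector. Real codecs
    search for vectors and handle frame edges far more carefully."""
    h, w = prev.shape
    pred = np.empty_like(prev)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = vectors[by // block][bx // block]
            sy = min(max(by + dy, 0), h - block)  # clamp inside the frame
            sx = min(max(bx + dx, 0), w - block)
            pred[by:by + block, bx:bx + block] = prev[sy:sy + block, sx:sx + block]
    return pred

prev = np.arange(16 * 16, dtype=np.int32).reshape(16, 16)  # a toy 'frame'
vectors = [[(0, 0), (0, 1)], [(1, 0), (0, 0)]]  # one vector per 8x8 block
guess = predict_frame(prev, vectors)
# The encoder then sends only the vectors plus the small correction:
#   residual = current_frame - guess
```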

Prediction is a less successful strategy for sound. Data-rate reduction in audio often involves splitting the sound up into different frequency bands – chunks of sound of similar pitch – and applying a technique that aims to determine what the ear will actually be able to hear. A loud sound at a particular pitch may drown out a quieter sound at a similar pitch, so the coder may decide to send the quieter element at reduced quality, or eliminate it altogether. Leaving out sounds that are deemed to be below the threshold of hearing can make for a large reduction in data-rate, with little loss of perceived quality.
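To give a flavour of the idea, here is a deliberately crude sketch in which a frequency band is simply discarded whenever a neighbouring band is much louder. The band count, threshold and test tones are all invented, and real psychoacoustic models are far subtler than this.

```python
import numpy as np

def crude_masking(frame, bands=32, threshold_db=-24.0):
    """Toy frequency masking: zero any band whose energy sits well
    below that of a neighbouring band (a gross simplification)."""
    spectrum = np.fft.rfft(frame)
    edges = np.linspace(0, len(spectrum), bands + 1, dtype=int)
    energy = np.array([np.sum(np.abs(spectrum[a:b]) ** 2) + 1e-12
                       for a, b in zip(edges[:-1], edges[1:])])
    db = 10 * np.log10(energy)
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        if db[i] < db[max(i - 1, 0):i + 2].max() + threshold_db:
            spectrum[a:b] = 0  # judged inaudible: send nothing for this band
    return np.fft.irfft(spectrum, n=len(frame))

rate, n = 48000, 1024
t = np.arange(n) / rate
loud = np.sin(2 * np.pi * (24 * rate / n) * t)          # ~1125 Hz, full scale
quiet = 0.01 * np.sin(2 * np.pi * (36 * rate / n) * t)  # ~1688 Hz, 40 dB down
out = crude_masking(loud + quiet)  # the quiet tone's band is zeroed
```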

The amount of data-rate reduction depends on parameters set by the user, usually the person who sets up the coder part of the codec and originates the media file or stream. For television, reducing the data to around a third of the original can be completely 'lossless' – absolutely no difference in the picture. Most practical implementations are 'lossy', however, allowing some reduction in quality in return for a much larger reduction in data-rate. Reducing to a fiftieth of the original data-rate is quite achievable. Implementations of codecs and containers designed for streaming, like the one built into iPlayer, can vary the bit-rate automatically as the quality of the communications channel – the speed of the domestic internet connection and the number of people sharing it – varies.
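A streaming player's rate-selection rule can be sketched very simply; the bit-rate ladder and safety margin below are invented for illustration and bear no relation to iPlayer's actual settings.

```python
def pick_rendition(measured_kbps, ladder=(400, 800, 1500, 3000, 5000)):
    """Toy adaptive-streaming rule: choose the highest bit-rate version
    that fits comfortably within the measured connection speed."""
    safe = measured_kbps * 0.8  # leave headroom for sharing and jitter
    candidates = [r for r in ladder if r <= safe]
    return candidates[-1] if candidates else ladder[0]

print(pick_rendition(2200))  # -> 1500: the 3000 kbps version won't fit
```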

Many other tools are used to reduce the amount of data to be transmitted: identifying sequences of numbers that occur frequently, for instance, and replacing them with a shorter number or sequence (a simple example is sketched below). I've described some of these concepts and processes loosely. In fact they have to be defined very accurately in order to be used to write an algorithm – the very precise process description necessary to build a codec chip or computer program. The International Organization for Standardization (ISO) and the Moving Picture Experts Group (MPEG) have defined a number of popular codecs, each of which has multiple profiles and levels and many possible implementations. The description of their MPEG-4 standard, used for HDTV in many countries, would fill a large bookshelf if it were printed.
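The simplest member of that family of tools is run-length coding, sketched here; the input numbers are invented, but they mimic the long runs of zeros that quantised transform coefficients typically contain.

```python
def run_length_encode(values):
    """Replace each run of a repeated value with a (value, count) pair."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

# Quantised transform coefficients are mostly zero, so runs compress well:
print(run_length_encode([0, 0, 0, 0, 5, 0, 0, 3, 0, 0, 0, 0, 0]))
# -> [[0, 4], [5, 1], [0, 2], [3, 1], [0, 5]]
```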

Andy Jones is a principal technologist at the BBC Academy