The Hitchhiker's Guide to Encoding: And Another Test...(Or PSNR and all that...)
This bit is all maths and I make no apologies for it! Peak Signal to Noise Ratio (PSNR) is one of the methods used to evaluate the effect processing has on signals and ultimately picture quality. It is a derivative of the Signal to Noise Ratio, comparing the maximum possible signal energy to the noise energy.
PSNR has been shown to have a high correlation to subjective picture quality (eyeballs) when a single codec is used and cross-references between sequences are not made.
Equation 1 - the Mean Square Error (MSE) of the mth frame is calculated. Yin and Yout represent the luminance of the input signal from the play-out server and of the output from the encoder respectively, and Y(i,j,m) is the luminance value of the pixel in position (i,j) in the mth frame.
Equation 2 - the PSNR of the mth frame is calculated. B is the number of bits per sample used in representing the video. The test procedure uses 8-bit linear pulse code quantisation.
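Written out from the definitions above, the two equations usually take the standard form below. This is a sketch rather than the exact notation of the test procedure: H and W (the number of rows and columns of luminance samples in a frame) are symbols introduced here for illustration.

```latex
% Equation 1: mean square error of the m-th frame
\mathrm{MSE}(m) = \frac{1}{H\,W} \sum_{i=1}^{H} \sum_{j=1}^{W}
    \bigl[ Y_{out}(i,j,m) - Y_{in}(i,j,m) \bigr]^{2}

% Equation 2: PSNR of the m-th frame, in dB
\mathrm{PSNR}(m) = 10 \log_{10} \frac{\left( 2^{B} - 1 \right)^{2}}{\mathrm{MSE}(m)}
```

With B = 8 the peak value in the numerator is 255.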
In accordance with industry recommendations, only the luminance PSNR is measured. Typical values for the luminance PSNR for emission encoding are between 30 and 40 dB.
The PSNR is capped to a maximum figure because an 8-bit system cannot accurately represent the original, analogue video image. In practice, the industry recommendation uses a cap of 50 dB, almost 10 dB lower than the theoretical maximum. Above 50 dB the quality of the coded image is more than sufficient for all but the most critical applications.
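As a rough illustration of the per-frame measurement and the 50 dB cap, here is a minimal Python sketch. It assumes each luminance frame is a plain list of rows of integer samples; the function names are made up for this example and are not part of any test tool. The last line shows one common way to arrive at a theoretical maximum of about 59 dB, by assuming the only noise is ideal uniform quantisation noise of variance 1/12. That assumption is mine, not something stated in the recommendation.

```python
import math

PSNR_CAP_DB = 50.0  # cap quoted in the text above


def frame_mse(y_in, y_out):
    """Mean square error between two luminance frames (lists of rows)."""
    total = 0
    count = 0
    for row_in, row_out in zip(y_in, y_out):
        for a, b in zip(row_in, row_out):
            total += (b - a) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames contain no samples")
    return total / count


def frame_psnr(y_in, y_out, bits=8, cap_db=PSNR_CAP_DB):
    """Luminance PSNR of one frame in dB, limited to cap_db."""
    peak = (1 << bits) - 1          # 255 for 8-bit video
    mse = frame_mse(y_in, y_out)
    if mse == 0:                    # identical frames: report the cap
        return cap_db
    return min(10 * math.log10(peak ** 2 / mse), cap_db)


# Assumed model: ideal 8-bit quantiser, noise variance 1/12 of a step squared.
theoretical_max_db = 10 * math.log10(255 ** 2 * 12)
```

Under that assumption `theoretical_max_db` comes out just under 59 dB, which is consistent with the cap being almost 10 dB below the theoretical maximum.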
The median PSNR is the value of the 50th percentile of the individual frame PSNRs of a sequence listed in ascending order. The accepted critical value for this type of measurement (as used by the Moving Picture Experts Group (MPEG) in deciding if a toolset should be included in an implementation) is 0.5dB.
0.5dB represents a visible difference in picture quality across the range of PSNR values. As the PSNR increases and coding errors become less visible, the visibility threshold increases above 0.5dB.
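The median and the 0.5 dB critical value can be sketched in a few lines of Python. This assumes the per-frame PSNR values are already available as a list of numbers; the helper names are hypothetical. Note that `statistics.median` averages the two middle values when the number of frames is even, which is one reasonable reading of "50th percentile" but not the only one.

```python
from statistics import median

CRITICAL_DB = 0.5  # critical value quoted in the text above


def median_psnr(frame_psnrs):
    """Median (50th percentile) of the per-frame PSNR values, in dB."""
    return median(frame_psnrs)


def median_gain(new_psnrs, old_psnrs):
    """Median PSNR of the new encoder minus that of the old, in dB."""
    return median_psnr(new_psnrs) - median_psnr(old_psnrs)


def is_significant(gain_db, threshold_db=CRITICAL_DB):
    """True when the difference reaches the critical value."""
    return abs(gain_db) >= threshold_db
```

A fixed threshold is a simplification: as the paragraph above says, the visibility threshold rises as the PSNR itself rises.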
- L. Hanzo, P. Cherriman, and J. Streit, Wireless Video Communications - Second to Third Generation Systems and Beyond, ser. Digital and Mobile Communication. 3 Park Avenue, New York, NY, USA: The Institute of Electrical and Electronics Engineers Press, 2001.
- Q. Huynh-Thu and M. Ghanbari, Scope of validity of PSNR in image/video quality assessment, IET Electronics Letters, vol. 44, no. 13, pp. 800-801, June 2008.
- Objective perceptual multimedia video quality measurement in the presence of a full reference, International Telecommunication Union Telecommunication Standardization Sector - Pre-published Recommendation J.247, August 2008.
PSNR of the current encoder setting compared to the old encoder.
The material used to test the encoders was a selection from the EBU test sequence and clips from the BBC HD Channel promotion.
All test material is copied to the playout server (100 Mb/s MPEG2 I-frame coding) and then onto the transmission encoder. The Final Cut Pro computer is used as a store for the decoded transmission material.
The results are displayed as curves on a cumulative graph. The x-axis is the measured PSNR and the y-axis indicates the percentage of frames with a PSNR value less than or equal to that PSNR.
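A cumulative curve of this kind can be computed from the per-frame PSNR values with a short Python sketch. It assumes the values are in a list and that you choose the x-axis points yourself; `cumulative_curve` is a name invented for this example.

```python
from bisect import bisect_right


def cumulative_curve(frame_psnrs, thresholds_db):
    """For each threshold, the percentage of frames with PSNR <= threshold."""
    ordered = sorted(frame_psnrs)
    n = len(ordered)
    if n == 0:
        raise ValueError("no frames to analyse")
    # bisect_right counts how many sorted values are <= the threshold.
    return [100.0 * bisect_right(ordered, t) / n for t in thresholds_db]


# Usage: four frames, curve sampled at four PSNR values.
points = cumulative_curve([30.0, 32.0, 34.0, 36.0], [29.0, 32.0, 35.0, 40.0])
```

Plotting the same thresholds for the old and new encoders gives two curves; the encoder whose curve sits further to the right has the higher PSNR for that proportion of frames.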
The new encoder has a median PSNR figure 0.5dB greater than the old, a just noticeable improvement in perceived picture quality for the majority of the test sequences. However, the very easiest, least critical material, where coding artefacts are usually not visible, coded with a better PSNR on the old encoder. We are looking into this at the moment, but one explanation could be that the new encoder handles image noise differently to the old encoder.
Reading PSNR curves is not straightforward. A difference of about 0.3dB is just visible to an expert viewer at normal viewing distance, while a non-expert viewer will see a difference of 0.5dB or more, as mentioned in the last paragraph of the PSNR explanation.
Where differences occur in the curve is important: the further left you go (the lower end), the more critical the measurement. At normal viewing distance a non-expert may see a difference in quality for a change of 0.5dB or slightly less. At the far right of the curve the picture quality is much higher and differences are more difficult to see, so an expert may not see a difference under 0.5dB and a non-expert may not see any difference below around 0.75dB or even 1dB.
We were aware of a problem with mixes before the new encoder went into action. During tests it only appeared in certain modes and wasn't severe. The overall improvement in quality outweighed the degradation it caused.
Unfortunately one of the first live programmes to be transmitted was also a programme that would highlight the mix/fade problem.
The Match of the Day, West Bromwich Albion vs. Newcastle United game kicked off with a very high contrast change almost dead centre of the pitch. As the game moved from bright sun to deep shadow the cameras had to be racked over several stops (opening and closing the iris).
A mix, as you know, is a transition between two different images. Coding errors caused by the mix tend to be hidden by the changing images. However, racking a camera is actually a mix between two different brightness levels of the same image, so there's nowhere for the errors to hide and they become very visible. I apologised and explained we were applying a temporary fix.
Although the temporary fix is still in place we have now seen an update that improves mixes, fades and lighting changes and are just waiting for it to be incorporated into a software upgrade.
While we had the location recordings of the match to analyse the mix error, we had a chance to compare the PSNR curves through the new and old encoders:
For the majority of the sequence, the new encoder has a higher PSNR than the old, with a median increase of about 0.2 dB (not a noticeable difference). The old encoder is better for approximately 8% of easy-to-encode scenes and 1% of difficult-to-encode scenes, but this is most likely due to the camera racking, i.e. the mix/fade issue itself!
PSNR testing shows the new encoder is doing better than the old except where the source material has a significant amount of noise. To help with this, we are testing the encoder's noise reduction options to see if adding a small amount improves the look of noisy images. I will update the blog as soon as we have some results.
Subjective expert viewer evaluation
The second part of the testing process is all about looking at pictures. We use 42" plasma and LCD displays to do this, comparing the quality of the new and old encoders against the original material on the play-out server.
Expert viewing is a tricky business and, as one of our experts discovered, a risky one too! It involves watching the same set of images again and again and again and...
To minimise the risk of complete insanity, it is usually better not to have the audio on.
However, even this didn't prevent someone coming in to work one day and asking to be taken off picture evaluation for a while. He said he was on the train just looking at the countryside passing by when he was convinced he saw compression blocking in the leaves of trees. This is not something you want to happen - so be warned!
Evaluating picture quality this way means a long time spent in darkened rooms. We watched a lot of images from the EBU test material and the BBC HD promo tape, comparing the new encoder with the old encoder and the play-out server on each sequence. It is important to have the EBU standard sequences to judge picture quality but we also have a test sequence made up from material that has known problems and shots that are difficult to code, to test the encoders to the "limit".
The new encoder produces images that correlate quite closely to the PSNR results. Programmes with low or no noise are noticeably better than they were on the old encoder. However, where the original images have noise we can see it on the new encoder's output but not on the old, suggesting that the new encoder is attempting to pass on more of the original image and confirming that a bit of noise reduction should be tested.
Dark pictures are inherently noisy, either because there is gain in the camera or because the signal has been stretched too far in colour grading. We actually have a very noisy sequence that has too much camera gain and was stretched too far in post production. We used it during the tests to push the system, and even turning the bit rate up to just over 16 Mb/s made no difference to the image. We are trying a few new and different parameters that seem to improve noise handling and reduce the effect on screen. Again, I will keep you posted.
Tomorrow is the last part of this epic: I will look at some of the techniques programme makers use that can have an impact on perceived picture quality.
Andy Quested is Principal Technologist, HD, BBC Future Media and Technology.
- Read part 1 of Andy Quested's HD guide: The Hitchhiker's Guide to Encoding: Before we start
- Read part 2: The Hitchhiker's Guide to Encoding: Life, Encoders and Everything (Or a brief history of HD encoding)
- Part 3: The Hitchhiker's Guide to Encoding: So Many Tests, and Thanks for All the Recommendations (Or the BBC and the EBU)
- Part 4: The Hitchhiker's Guide to Encoding: Mostly Testing (Or how to set up an encoder test)