High Frame Rate at the EBU UHDTV Voices and Choices Workshop
Members of BBC R&D recently attended the European Broadcasting Union's UHDTV: Voices & Choices workshop. This was an open event that gave public and private sector broadcasters, professional and consumer equipment manufacturers and movie makers the opportunity to meet and discuss the requirements for ultra-high definition (UHD) digital video, and share expectations about whether and when UHD might be adopted on different platforms and in different countries around the world. BBC R&D's Katy Noland presented recent work that asks how many frames per second are needed for UHD television.
The proposed standard for UHD-1, defined in the ITU's BT.2020, has 3840x2160 pixels. That's twice the resolution of high definition (HD) in both the horizontal and vertical dimensions, so it promises fantastic image detail that's suitable for very large displays. However, as Andrew Cotton explained in a blog post last June, UHD is not only about spatial resolution: it is also a potential opportunity to improve on other aspects of quality, such as the number of light levels (dynamic range), the range of different colours (colour gamut) and the immersiveness of the audio (with object coding or multichannel surround).
One factor that can make a particularly big difference to the perceived quality is the frame rate. BBC R&D presented pioneering work on higher frame rates in 2008, and more recent subjective tests from the EBU's Broadcast Technology Futures group support their findings that, for the majority of viewers, increasing the frame rate will provide a greater subjective improvement over HD than increasing spatial resolution alone.
But if we would like to have a higher frame rate, how high should it be? There are many practical considerations to take into account, such as flicker caused by beating between the frame rate and lighting frequencies, the need to convert between formats, and the sheer amount of data that would need to be transported. Before addressing those issues, though, it would be useful to know what an idealised upper limit on the frame rate might be. I turned to measured data about the human visual system to try to find an answer.
Before diving into the analysis it's important to understand the different types of motion artefact that can affect the video quality. Large area flicker is the result of the whole display being refreshed at the frame rate, which, according to studies presented in the ITU's BT.2246, fuses to continuous light at around 80 Hz (at the brightness and screen size tested). Then there's motion blur and strobing, which can be traded against each other. Motion blur results from the camera shutter being kept open for a fixed amount of time, and, much like when taking a photograph, if an object moves during that time it appears smeared in the image to an extent that depends on how fast it was moving. If the shutter time is shortened, the individual images become sharper, but when played back as a video sequence a strobing effect can occur. Strobing may look like juddery motion or multiple imaging. Further studies presented in BT.2246 suggest that the frame rates needed for high quality motion without blurring or strobing are much higher than those needed to eliminate large area flicker.
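The shutter trade-off described above is, at its simplest, just velocity multiplied by exposure time: the longer the shutter stays open, the further a moving object travels across the image. A minimal sketch of that arithmetic (the numbers are illustrative, not from the studies cited):

```python
# Illustrative sketch of motion blur from an open shutter: a point moving
# at v pixels/second smears over v * t pixels during a t-second exposure.
def blur_extent_pixels(velocity_px_per_s, shutter_s):
    return velocity_px_per_s * shutter_s

# A 1000 px/s object shot at 50 fps with a full ("360-degree") shutter:
print(blur_extent_pixels(1000, 1 / 50))   # -> 20.0 pixels of smear
# Halving the shutter time halves the blur, at the cost of more strobing:
print(blur_extent_pixels(1000, 1 / 100))  # -> 10.0 pixels
```

Shortening the shutter sharpens each frame, but the gap between the positions captured in successive frames grows relative to the blur, which is where the strobing comes from.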
Matters are made more complicated by the fact that the perceptibility of motion blur and strobing varies depending on whether a moving object is being followed by the viewer's eyes—we call this eye tracking. During eye tracking, motion blur is particularly visible, but strobing is not too serious. For untracked objects, such as football players running in several different directions, background scenery during a pan, or anything rotating, sensitivity to motion blur is reduced but strobing becomes much more visible.
The analysis I presented at the EBU looked at how many frames per second would be needed to reduce motion blur to a level that matches the low degree of spatial blur in UHD-1, assuming that all strobing is eliminated, firstly for untracked motion and then for tracked. The analysis was based on a model of the human contrast sensitivity function, which tells us about how perceptible blur is for patterns with different amounts of detail moving at different speeds. The more detail in an object, and the faster it moves, the harder it is to resolve the detail.
In order to decide which spatial frequencies at which velocities the system needs to be able to represent, I used the finest spatial detail representable by the system as a reference point. It is determined by the number of horizontal pixels, and works out as a little under 30 cycles per degree when a UHD-1 screen is viewed from 1.5 times the screen height (1.5H). This is equivalent to just under one pixel per minute of arc, which was measured as the limit of visual acuity in earlier work by BBC R&D. For a static pattern, the velocity is zero, so a static pattern of the finest possible spatial detail corresponds to a point on the left edge of the contrast sensitivity function plot shown above, just where the black line starts.
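The viewing geometry behind these figures can be sketched with small-angle arithmetic for square pixels on a screen viewed at 1.5 screen heights; this is only an illustration of where the numbers come from, not the published analysis:

```python
import math

# UHD-1 has 2160 rows of square pixels; viewed at 1.5 screen heights
# (1.5H), each pixel subtends roughly (1/2160)/1.5 radians at the centre.
lines = 2160
distance_in_screen_heights = 1.5

pixel_rad = (1.0 / lines) / distance_in_screen_heights
pixel_arcmin = math.degrees(pixel_rad) * 60          # ~1.06 arcmin per pixel

pixels_per_degree = 1.0 / math.degrees(pixel_rad)    # ~56.5 pixels/degree
nyquist_cpd = pixels_per_degree / 2.0                # ~28.3 cycles/degree

print(f"{pixel_arcmin:.2f} arcmin/pixel, {nyquist_cpd:.1f} cycles/degree")
```

The finest representable detail is one cycle per two pixels, which comes out a little under 30 cycles per degree, with just under one pixel per minute of arc, matching the figures above.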
I then proposed setting the frame rate so that any point can be represented that has equal or higher contrast sensitivity compared to a static pattern of the highest possible resolution. The black line in the plot represents all the points with equal contrast sensitivity, so everything behind and to its left, coloured red through to cyan, should be representable by the system. Points in front of the line, coloured dark blue, were dismissed because we can't see them as clearly as the static pattern in real life anyway. The frame rate required to represent all the desired points can then be calculated using the velocity and spatial frequency, and was taken as the required value to perceptually match the spatial resolution.
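For a drifting pattern, spatial frequency and velocity combine into a temporal frequency, which sets a minimum frame rate via the usual sampling argument. The sketch below shows only that arithmetic; the figures in this work come from the measured contrast sensitivity data, and the example numbers here are hypothetical:

```python
# A grating of spatial frequency f (cycles/degree) moving at velocity v
# (degrees/second) produces a temporal frequency of f * v Hz at any fixed
# point, so alias-free sampling needs a frame rate of at least 2 * f * v.
def required_fps(spatial_freq_cpd, velocity_deg_per_s):
    return 2.0 * spatial_freq_cpd * velocity_deg_per_s

# Hypothetical example: a 7 cycles/degree pattern drifting at 10 deg/s
print(required_fps(7, 10))   # -> 140.0 frames per second
```

Applying this to every point behind the equal-sensitivity line, and taking the maximum, gives the frame rate needed to perceptually match the spatial resolution.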
Although I have concentrated on UHD-1, this perceptual matching approach can be used for any screen resolution. For a screen whose spatial resolution exceeds that of the human visual system, matching in this way makes motion blur imperceptible. For a display that does not quite match the human visual system, some motion blur can be tolerated, but it is constrained to be no worse than the spatial blur. Full details of the analysis will be published soon, with a link from my BBC R&D profile page.
So on to the key results, which show that to match the degree of motion blur to the spatial resolution for non-tracked motion, a frame rate of around 140 frames per second (fps) is needed for UHD-1, assuming the viewer is sitting at the recommended viewing distance of 1.5H. For tracked motion, where we are much more sensitive to blur, around 700 fps is needed.
Now, 700 fps is clearly not a practicable solution for today's hardware, and it's important to remember that the analysis has assumed zero tolerance for strobing, so that the only sampling artefact is blur. However, it's likely that such a strict limitation on strobing is not required.
We know that if the shutter time is shortened, we get less motion blur, at the expense of some strobing, but the strobing is mainly problematic in non-tracked motion. That means that to get good quality tracked motion it's possible to reduce the blur using a very short shutter, and we will hardly see the strobing, so we may only need to worry about meeting the critical flicker frequency of around 80 Hz.
The problem comes with strobing in non-tracked motion that may be present in the same shot. A key question, to which we as yet have no answer, is how the visibility of strobing in non-tracked motion varies with frame rate. An answer to this question would allow us to find a suitable balance of blur and strobing, and may well lead to a revised ideal frame rate that is significantly lower than 700 fps. It would of course also be useful to better understand the visibility of strobing in tracked motion, though experience tells us that this is much less critical.
So from this work we find that it should be possible to match spatial and motion blur using a frame rate of 700 fps if strobing is eliminated, but we would like to see more experiments on the visibility of strobing artefacts to allow the best balance of blur and strobing at a lower frame rate to be found. It's also essential to support any conclusions with subjective tests—given the complexities of the human visual system, this kind of analytical approach can never be completely watertight. Whilst I have not presented a final answer for the required frame rate, or addressed the many additional practical considerations involved in defining a television system, I hope that this work will serve as useful evidence when it comes to understanding what we see, and in interpreting those all-important subjective test results.