Explore a live event using video embedded in a 3D model of the venue
A collaborative project developing tools to combine live video with pre-generated 3D models of an area, to give the viewer a better understanding of large-scale events such as Wimbledon.
What we've done
In the coverage of large-scale live events such as Wimbledon or the London Marathon, it can be difficult to convey the ‘big picture’ of the event purely from camera imagery, as traditional TV simply provides a series of windows onto the scene rather than a broad overview. Coverage of events such as the London Marathon sometimes includes computer-generated fly-throughs to show the area, but without any way to illustrate how the view from a live camera corresponds to a region of the 3D model.
VSAR (Viewers Situational and Spatial Awareness for Applied Risk and Reasoning) was a 3.5-year collaborative project, funded by the Technology Strategy Board, which ran from November 2007 to October 2011. Its aim was to create tools to take live information and images gathered from multiple cameras and embed them in a 3D modelled environment.
Our work focused on applications in large-scale outside broadcasts, culminating in a successful live trial during the Wimbledon tennis coverage in 2011. We generated virtual ‘flights’ between cameras on different courts, blending seamlessly between the video and the 3D model, which were used when the broadcast programme switched its coverage between matches. We also developed tools to allow interactive navigation using a web browser.
Other partners in the project looked at applications in security and surveillance, such as overlaying footage from an array of CCTV cameras onto a 3D model of a busy town centre. This work included extracting data relevant to security monitoring, for example identifying people by the way they walk, a technique known as gait analysis.
Interactive visualisation of live events using 3D models and video textures
Image-based camera tracking for Athletics
More project info
Our main developments in the project were as follows:
We developed methods to estimate the pan, tilt and zoom of a camera by tracking background features. This extended our previous work on line-based tracking (as used in the Piero sports graphics system) so that camera movement could be tracked in arbitrary scenes and registered with the 3D background model. This work was described in more detail in White Paper 181.
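As a rough illustration of how tracked background features can yield camera motion, the sketch below fits a uniform scale plus translation to feature positions in two consecutive frames, then converts the result to approximate changes in pan, tilt and zoom. It is a simplification under stated assumptions, not the method from the white paper: the camera is assumed to rotate about a fixed point, rotations between frames are assumed small, pixel coordinates are measured from the image centre with y pointing down, and the function name and sign conventions are hypothetical.

```python
import math

def estimate_ptz_change(prev_pts, curr_pts, focal_px):
    """Estimate (d_pan, d_tilt, zoom_ratio) from matched background features.

    prev_pts, curr_pts: lists of (x, y) pixel offsets from the image centre.
    focal_px: focal length of the previous frame in pixels (assumed known).
    Fits curr = s * prev + (tx, ty) by least squares (small-angle assumption).
    """
    n = len(prev_pts)
    if n < 2 or n != len(curr_pts):
        raise ValueError("need at least two matched feature pairs")

    # Centroids of both point sets.
    pmx = sum(p[0] for p in prev_pts) / n
    pmy = sum(p[1] for p in prev_pts) / n
    cmx = sum(c[0] for c in curr_pts) / n
    cmy = sum(c[1] for c in curr_pts) / n

    # Closed-form least-squares scale about the centroids.
    num = sum((px - pmx) * (cx - cmx) + (py - pmy) * (cy - cmy)
              for (px, py), (cx, cy) in zip(prev_pts, curr_pts))
    den = sum((px - pmx) ** 2 + (py - pmy) ** 2 for px, py in prev_pts)
    if den == 0:
        raise ValueError("features are coincident; scale is undefined")
    s = num / den
    if s <= 0:
        raise ValueError("non-positive scale; matches are probably wrong")

    # Translation that maps the previous centroid onto the current one.
    tx = cmx - s * pmx
    ty = cmy - s * pmy

    # Assumed convention: panning right moves features left (negative tx),
    # tilting up moves features down (positive ty, since y points down).
    new_focal = s * focal_px
    d_pan = math.atan2(-tx, new_focal)
    d_tilt = math.atan2(ty, new_focal)
    return d_pan, d_tilt, s
```

In practice, matches that land on moving players or spectators would need to be rejected (for example with a robust estimator) before a fit like this, and the per-frame changes would be accumulated and registered against the 3D background model.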
We developed real-time methods for segmenting people from video, so that they could be extracted from live video feeds and placed into the 3D model. This included the development of a person detector based on the Histogram of Oriented Gradients (HOG) method.
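To give a feel for the core of the HOG method, the following sketch computes the orientation histogram for a single cell of a greyscale image. It is illustrative only and makes several assumptions: gradients use simple central differences, orientations are unsigned (0 to 180 degrees) and split into 9 bins, and each pixel votes into one bin without interpolation. A complete detector would also normalise histograms over blocks of cells and pass the resulting feature vector to a trained classifier, none of which is shown here. The function name is hypothetical.

```python
import math

def cell_orientation_histogram(cell, n_bins=9):
    """Return a magnitude-weighted histogram of gradient orientations.

    cell: 2D list of greyscale values (rows of equal length).
    Border pixels are skipped because central differences need neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins

    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central-difference gradients in x and y.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no orientation to vote for

            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against rounding landing on n_bins.
            bin_index = int(angle // bin_width) % n_bins
            hist[bin_index] += magnitude

    return hist
```

For example, a cell containing a single vertical edge (brightness changing from left to right) puts all of its votes into the first bin, while a horizontal edge votes into the bin around 90 degrees.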
We developed a web-browser plug-in that allowed a viewer to navigate around the scene themselves, seeing live video embedded into a 3D model of the area. We investigated two approaches: rendering the 3D model locally in the web browser, and pre-rendering a number of ‘fly-throughs’ which were then played as required with live video embedded. This work was presented in the paper: B. A. Weir, G. A. Thomas, “Interactive visualisation of live events using 3D models and video textures”, presented at IBC 2010.
We also developed a real-time system for use at the production side, which produced broadcast-quality images using pre-rendered fly-throughs. The system, known as VenueVu, was trialled live on air in 2011 during the coverage of Wimbledon, in collaboration with BBC Sport, SIS LIVE and Crystal CG, and was described in a blog post. The pre-rendered video included an alpha mask to allow the rendered model to occlude the video where needed. Camera parameters (position, orientation, zoom) were stored for each frame of the pre-rendered sequence, and also generated live from the video feeds. The renderer used these parameters together with geometry information describing the basic scene structure (ground plane, walls) to draw the video in the correct location with the right perspective.
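The two ideas in this description, masking the video with the rendered model and placing video using camera parameters and a ground plane, can be sketched for a single pixel as follows. This is a minimal illustration under assumptions, not the VenueVu renderer: the coordinate conventions (z up, ground at z = 0, pan measured from the +y axis, negative tilt looking down), the meaning of alpha (1 shows the rendered model, 0 shows video) and both function names are hypothetical.

```python
import math

def composite_pixel(render_rgb, alpha, video_rgb):
    """Blend one pixel; alpha=1 lets the rendered model occlude the video."""
    return tuple(alpha * r + (1.0 - alpha) * v
                 for r, v in zip(render_rgb, video_rgb))

def pixel_to_ground(cam_pos, pan, tilt, focal_px, u, v):
    """Find where the ray through a video pixel meets the ground plane z=0.

    cam_pos: (x, y, z) camera position, z is height above the ground.
    pan, tilt: camera orientation in radians (assumed conventions above).
    focal_px: focal length in pixels, standing in for the zoom setting.
    u, v: pixel offset from the image centre (u right, v down).
    Returns (x, y, 0.0), or None if the ray never reaches the ground.
    """
    # Camera axes expressed in world coordinates.
    forward = (math.sin(pan) * math.cos(tilt),
               math.cos(pan) * math.cos(tilt),
               math.sin(tilt))
    right = (math.cos(pan), -math.sin(pan), 0.0)
    up = (-math.sin(pan) * math.sin(tilt),
          -math.cos(pan) * math.sin(tilt),
          math.cos(tilt))

    # Ray direction through the pixel (v points down, so subtract up).
    direction = tuple(focal_px * f + u * r - v * w
                      for f, r, w in zip(forward, right, up))

    if direction[2] >= 0:
        return None  # ray points at or above the horizon

    # Distance along the ray at which the height reaches zero.
    t = -cam_pos[2] / direction[2]
    return (cam_pos[0] + t * direction[0],
            cam_pos[1] + t * direction[1],
            0.0)
```

Mapping the four corners of a video frame in this way gives the outline of the region that the video would cover on the ground of the 3D model; walls and other surfaces would need their own plane intersections.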