HTML 5 and timed media
HTML 5 work
HTML 5 is the next version of HTML, the markup language used on the web. Not all the details of the HTML 5 standard have been agreed, but many of the proposed changes and new features have already been implemented in existing browsers.
As part of our work on the P2P-Next project, we built a simple HTML 5 demo that works in current versions of Firefox, Safari and Chrome: a sample of RAD's R&D TV with subtitles and chapter navigation. It will not work in current versions of Internet Explorer, or in earlier versions of Firefox, Safari or Chrome.
What we built
Below we give more detail of the coding work done so far.
On a web page it is sometimes desirable to make something happen relative to the current time of a video or audio track.
For example, subtitles change as a video plays.
YouTube Annotations adds something more dynamic: speech bubbles, notes and linked areas that can be set to appear in specific locations on a video, at specific times. This enables quite complex interactivity and navigation - in YouTube's example, a video about World War I leads to other videos about individual subjects such as tanks and aircraft, with extra notes that pop up as appropriate.
There are several open technologies available, or in development, for adding timed text (subtitles), but not more complex page changes and interaction. Annodex and parts of the Axmedis project go further in developing ways to integrate timed markup for annotation and indexing with streaming media. Annodex uses the CMML markup language, which can be either incorporated with, or referenced from, a stream.
The SMIL presentation markup language provides rich features for timed interactive media, and has been in use for more than 10 years, but it still has some limitations:
- SMIL can be played in several media players, but browser support is patchy and the future of SMIL browser support is unclear
- SMIL uses a presentation paradigm, and each presentation needs to be complete in itself, whereas our 'engine' needs to be event-driven and able to cope with individual live page changes, with viewers either joining a presentation part way through a 'broadcast' or viewing the entire presentation after it has been published
- existing SMIL implementations are oriented to elapsed (clock) time and user input events, rather than synchronising events with a 'time parent' such as the current time of a video
Our 'engine' therefore needs to do two things:
- build a list of page-change event objects, parsed from incoming JSON
- activate or deactivate each event, given its start and end time, relative to the current time of the event's time parent (which in our demonstrations is a video)
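The two steps above can be sketched in TypeScript as follows. This is a minimal illustration rather than the demo's actual code: the JSON shape (`id`, `start`, `end`, `payload`) and the names `PageEvent`, `parseEvents` and `updateEvents` are assumptions, with times taken to be seconds on the time parent's timeline.

```typescript
// Hypothetical shape of one page-change event; the field names are assumptions.
interface PageEvent {
  id: string;
  start: number; // seconds on the time parent's timeline
  end: number; // the event is active while start <= t < end
  payload: string;
  active: boolean;
}

// Step 1: build the event list from incoming JSON (assumed to be an array).
// Entries without a valid, non-empty time range are dropped.
function parseEvents(json: string): PageEvent[] {
  const raw = JSON.parse(json) as Array<{
    id: string;
    start: number;
    end: number;
    payload: string;
  }>;
  return raw
    .filter((e) => typeof e.start === "number" && typeof e.end === "number" && e.end > e.start)
    .map((e) => ({ id: e.id, start: e.start, end: e.end, payload: e.payload, active: false }));
}

// Step 2: activate or deactivate each event for the given time of the
// time parent, and report which events changed state.
function updateEvents(
  events: PageEvent[],
  currentTime: number
): { activated: PageEvent[]; deactivated: PageEvent[] } {
  const activated: PageEvent[] = [];
  const deactivated: PageEvent[] = [];
  for (const ev of events) {
    const shouldBeActive = currentTime >= ev.start && currentTime < ev.end;
    if (shouldBeActive && !ev.active) activated.push(ev);
    if (!shouldBeActive && ev.active) deactivated.push(ev);
    ev.active = shouldBeActive;
  }
  return { activated, deactivated };
}
```

Because each call compares the whole list against the current time, rather than assuming time only moves forward, the same function copes with a viewer seeking backwards or joining part way through.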
The HTML 5 audio and video elements remove the need for player plugins, work like any other HTML element in terms of styling and positioning, and standardise the programming interface for playback control. Less well known is that these elements emit a timeupdate event (at a frequency adjusted to fit available processing and memory) which removes the need to poll a player for the current time position. This makes media scripting far more efficient, since there is no need to run a loop or use setTimeout. In tests run on several machines we found that timeupdate events are emitted regularly and frequently (particularly in Firefox), whereas polling a media player for current video time is unreliable.
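As a rough illustration of this event-driven approach, the sketch below subscribes to `timeupdate` instead of polling. `timeupdate`, `currentTime` and `addEventListener` are standard parts of the HTML 5 media interface; the `Subtitle` type and the functions `subtitleAt` and `followTimeParent` are hypothetical names, and `TimeParent` is a minimal stand-in interface so that the logic can also be exercised outside a browser.

```typescript
// Minimal structural view of a media element (an assumption for this sketch);
// it is intended to match the part of an HTML 5 video or audio element used here.
interface TimeParent {
  currentTime: number;
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical subtitle record: active while start <= t < end (seconds).
interface Subtitle {
  start: number;
  end: number;
  text: string;
}

// Return the subtitle text for time t, or "" when no subtitle applies.
function subtitleAt(subtitles: Subtitle[], t: number): string {
  const hit = subtitles.find((s) => t >= s.start && t < s.end);
  return hit ? hit.text : "";
}

// Call `show` only when the subtitle actually changes. The media element
// drives the updates, so no loop or setTimeout is involved.
function followTimeParent(
  media: TimeParent,
  subtitles: Subtitle[],
  show: (text: string) => void
): void {
  let last = "";
  media.addEventListener("timeupdate", () => {
    const text = subtitleAt(subtitles, media.currentTime);
    if (text !== last) {
      last = text;
      show(text);
    }
  });
}
```

In a page, `media` would be the video element and `show` would write the text into whichever element displays the subtitles.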
The HTML 5 addCueRange method will provide native support for callbacks (or events) at the start and end of a 'cue range'. However the specification is still under discussion and does not appear close to implementation.
HTML 5 video in the field
HTML 5 media elements are now supported by current versions of Firefox, Safari and Chrome.
However, implementation of the video element has been dominated by the need to standardise codec support, so that browsers share a common set of video formats in the same way that they currently share the JPEG, PNG and GIF image formats.
Google demonstrated their commitment to both the MP4 (H.264/AAC) and Ogg (Theora/Vorbis) formats with the HTML 5 media element at the 2009 Google I/O conference. Firefox 3.5 and Safari both support the video element, though with different codecs. Dailymotion has created a demo site with several hundred thousand videos encoded using Ogg Theora.
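Until codec support converges, a page generally has to offer the same video in more than one format. One way to choose between them is the standard `canPlayType` method, which returns `""`, `"maybe"` or `"probably"` for a given MIME type. The sketch below is illustrative only: `VideoOption`, `CanPlay` and `pickSource` are hypothetical names, and the URLs in the usage note are placeholders.

```typescript
// The one method of a media element that this sketch relies on.
interface CanPlay {
  canPlayType(type: string): string;
}

// Hypothetical description of one encoding of the same video.
interface VideoOption {
  url: string;
  type: string; // MIME type, optionally with a codecs parameter
}

// Prefer the first option the browser reports as "probably" playable,
// fall back to the first "maybe", and return null if nothing is playable.
function pickSource(media: CanPlay, options: VideoOption[]): VideoOption | null {
  let maybe: VideoOption | null = null;
  for (const opt of options) {
    const answer = media.canPlayType(opt.type);
    if (answer === "probably") return opt;
    if (answer === "maybe" && maybe === null) maybe = opt;
  }
  return maybe;
}
```

For example, the options might be `{ url: "clip.mp4", type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' }` and `{ url: "clip.ogv", type: 'video/ogg; codecs="theora, vorbis"' }`, with the chosen `url` assigned to the video element's `src`.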
The biggest and least predictable change may come from technologies such as Comet or HTML 5 Web Sockets. These enable data to be 'pushed' to browsers from servers rather than vice versa. Push makes sense, in that it enables updates without polling, but it challenges the HTTP request/response model used on the web and raises a number of security and editorial questions. In terms of our application, events could be pushed to the client browser as they became available. For example, a web page showing a broadcast from a live event could include live updated information about the festival or the band and song currently playing.
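If events were pushed rather than fetched, the client would need to merge each incoming message into its event list as it arrived. The sketch below assumes a hypothetical message format (one JSON object per message, with `id`, `start`, `end` and `payload` fields); `PushedEvent`, `MessageSource`, `mergePushed` and `listenForEvents` are illustrative names. Only the `message` event and its `data` field come from the Web Sockets API itself, and `MessageSource` is a simplified stand-in for a socket, not its real type.

```typescript
// Hypothetical format of one pushed page-change event.
interface PushedEvent {
  id: string;
  start: number;
  end: number;
  payload: string;
}

// Simplified stand-in for the part of a Web Socket used here (an assumption).
interface MessageSource {
  addEventListener(type: string, listener: (msg: { data: unknown }) => void): void;
}

// Insert the pushed event, replacing any earlier event with the same id,
// and keep the list ordered by start time.
function mergePushed(events: PushedEvent[], data: string): PushedEvent[] {
  const incoming = JSON.parse(data) as PushedEvent;
  const others = events.filter((e) => e.id !== incoming.id);
  return [...others, incoming].sort((a, b) => a.start - b.start);
}

// Keep an event list up to date from pushed messages and notify the caller
// after each change. Non-text messages are ignored.
function listenForEvents(
  socket: MessageSource,
  onChange: (events: PushedEvent[]) => void
): void {
  let events: PushedEvent[] = [];
  socket.addEventListener("message", (msg) => {
    if (typeof msg.data !== "string") return;
    events = mergePushed(events, msg.data);
    onChange(events);
  });
}
```

Replacing by `id` means a corrected event (for example, a song title fixed during a live broadcast) can simply be pushed again without the client accumulating duplicates.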
We've done this work in conjunction with the P2P-Next project. We're using HTML 5 in the project to sync video and display extra information about the content. It's early days for us on this, and there are a number of serious challenges before this becomes anything near mainstream, if ever. We hope it's a useful demo and look forward to feedback.