BBC R&D, the BBC Archive and digital public space: an overview of our work on the archive from preservation to multimedia classifications

John Zubrzycki | 11:51 UK time, Monday, 10 October 2011

The BBC has about a million hours of video and audio content, plus a wealth of documents, including the original scripts. Most of this content is still on magnetic tape, film, records or paper and so needs to be digitised and made searchable before it can be contributed to the Digital Public Space, which was the subject of a recent technology podcast from the Guardian. BBC R&D has a long track record of developing innovative technology for the BBC’s archives, including the Ingex digitisation process for D3 videotape [BBC R&D White Paper WHP 155], and Reverse Standards Conversion, which reverse engineers the processes applied by pioneering standards converters of the 1960s to programmes of that era that were provided to broadcasters abroad and lost from our own archives. Research is continuing to extend the digitisation process to other types of videotape and to develop automated methods to detect picture and sound faults, both to ensure good quality digitisation and to assist restoration.

Finding interesting content in such a large archive is a key research challenge. If you already know the title, then it is simple, but there is likely to be interesting content that you don’t know about, perhaps programmes you’ve forgotten about or broadcast before you were born. The BBC has a formal catalogue of its archive that has been developed for professional media archivists to find content for professional programme makers. This catalogue uses the LONCLASS classification system, which is very powerful, allowing complex searches for specific items of content. However, this does depend on the content being manually catalogued in detail, which is not always the case for some of the early content in the archive.

In our Multimedia Classification project BBC R&D is researching automated techniques that can allow content to be searched by the general public. Most people will be familiar with the concept of genre: comedy, drama, news, sport, etc. This would be a good starting point for searching the archive, but these classifications are very broad and so could still produce too many suggestions. Also, it wouldn’t help with more unusual searches, such as finding light-hearted drama or the more exciting football matches. The approach we are taking is to analyse the content directly for video and audio features that could indicate its emotional mood. For example, detecting scenes with head and shoulders shots of two people in a brightly lit room suggests that this could be a current affairs or news programme. Adding speech recognition could identify the subject of discussion, and detecting any non-voiced sounds may give a clue as to the type of location. Performing this analysis on a scene-by-scene basis enables searches for specific items, such as extracting the exciting parts of a football match.
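
As a rough illustration of the scene-by-scene idea, the sketch below scores each scene from two per-scene features and keeps the scenes above a threshold. It is only a minimal sketch under stated assumptions: the feature names (crowd_noise, motion), the weights and the threshold are all hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start_s: float      # scene start time in seconds
    end_s: float        # scene end time in seconds
    crowd_noise: float  # hypothetical audio feature, scaled 0..1
    motion: float       # hypothetical video feature, scaled 0..1


def excitement(scene: Scene, w_audio: float = 0.6, w_video: float = 0.4) -> float:
    """Combine the audio and video features into one score (weights are assumptions)."""
    return w_audio * scene.crowd_noise + w_video * scene.motion


def exciting_scenes(scenes: list[Scene], threshold: float = 0.7) -> list[Scene]:
    """Return the scenes whose combined score reaches the threshold."""
    return [s for s in scenes if excitement(s) >= threshold]


# Made-up feature values for three scenes of a football match.
match = [
    Scene(0.0, 60.0, crowd_noise=0.2, motion=0.3),
    Scene(60.0, 90.0, crowd_noise=0.9, motion=0.8),
    Scene(90.0, 150.0, crowd_noise=0.5, motion=0.6),
]
highlights = exciting_scenes(match)
```

Because each scene is scored independently, a search can return just the matching time ranges rather than the whole programme.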

Analysing music from content is proving a powerful way to find something you like that fits with your mood. Musical Moods is a project with Salford University and the British Science Association (BSA), launched in March 2011 during the BSA's National Science and Engineering Week. One of the key aims of this project was to identify what types of mood or emotion people associate with particular theme tunes. Participants were asked to rate the theme tunes on a range of different scales to see if all programmes in the same genre have the same overall mood; scales used included Happy/Sad, Dramatic/Calm, Masculine/Feminine, etc. The data obtained is now being used to train computers to identify the mood of a programme from its theme tune. Sam Davies from my team will go into more detail about our Multimedia Classification project in a post that will be published here tomorrow.
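
To make the rating step concrete, here is a minimal sketch of how participant ratings might be averaged into a mood profile per theme tune before any training takes place. The tune names, the -2 to +2 score range and the data layout are assumptions for illustration; they do not describe the actual Musical Moods data.

```python
from collections import defaultdict
from statistics import mean


def mean_ratings(ratings):
    """Average the scores for each (tune, scale) pair across participants."""
    grouped = defaultdict(list)
    for tune, scale, score in ratings:
        grouped[(tune, scale)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Hypothetical ratings: (theme tune, scale, score), where -2 is the
# second pole of the scale (e.g. Sad) and +2 is the first (e.g. Happy).
ratings = [
    ("Tune A", "Happy/Sad", 2),
    ("Tune A", "Happy/Sad", 1),
    ("Tune A", "Dramatic/Calm", -1),
    ("Tune B", "Happy/Sad", -2),
    ("Tune B", "Happy/Sad", -1),
]
profile = mean_ratings(ratings)
```

Profiles like these could then be compared across tunes from the same genre to check whether the genre really does share one overall mood.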

Storing content as digital files in large Petabyte storage systems to produce a permanent archive for long-term preservation of the content brings many new problems to a media archive. Film, records and tape can last many years on shelves in an environmentally controlled facility. However, digital storage systems need to be managed to avoid loss of files due to a variety of storage system failures, accidental deletion or overwriting of files, etc. and the short-term obsolescence of digital storage media. We are researching into the best way to store content together with all the other information associated with it, i.e. its metadata, to create packages that can be managed in the digital archive and exchanged with other archives.
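
One common way to guard against silent loss or corruption of files is to record a checksum alongside the metadata when a package is created and to re-check it later. The sketch below illustrates that general idea only; the field names and package layout are hypothetical and are not the format used by the BBC or any particular archive.

```python
import hashlib
import json


def make_package(content: bytes, metadata: dict) -> dict:
    """Describe a piece of content with its metadata, size and SHA-256 checksum."""
    return {
        "metadata": metadata,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify(content: bytes, package: dict) -> bool:
    """Return True if the content still matches the recorded size and checksum."""
    return (
        len(content) == package["size_bytes"]
        and hashlib.sha256(content).hexdigest() == package["sha256"]
    )


# Stand-in bytes and made-up metadata for illustration.
content = b"example programme essence"
package = make_package(content, {"title": "Example programme", "year": 1965})

# A text manifest that could be stored or exchanged alongside the file.
manifest = json.dumps(package, sort_keys=True)
```

Running verify periodically against every stored copy is one way to notice a damaged file while a good copy still exists elsewhere.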

BBC R&D is working together with several major European archives in the PrestoPrime European collaborative project that is developing techniques to manage media files in the digital world. The experiences and recommendations from the project are made publicly available through the PrestoCentre.

The BBC’s ambition to digitise the whole archive is an important enabling goal for the Digital Public Space. It is important to produce digital files at the right quality to ensure that future generations can see the content in pristine condition, and so R&D is working on how best to apply video and audio coding to archive content. We are also working on the automatic analysis of content to detect faults, such as tape dropouts, and on techniques for restoration. This is a challenging research topic that is expected to take several years; however, it has many similarities with the technologies used for automatically searching content.


  • Comment number 1.

    We are researching into the best way to store content.......

    You already answered this (for older content) with Film, records and tape can last many years on shelves in an environmentally controlled facility. The best thing to do as a fallback reserve if the space is available is to keep the originals.

    The same problem will occur as with obsolescent digital storage. Where will the playback machines be found and will there be the spare ICs etc to keep them working?

    Back in the 1980s most of the 35mm archive of British Movietone was telecined to 1" C format tape. The originals were destroyed. On one level this was a logical solution to preserve film that was deteriorating due to the acetate film base decaying (and some nitrate - even worse as dangerous too). Current technology could have handled shrunken and distorted acetate film by the use of contactless scanners so we could now have the full HD potential of the original film. But who would pay for it?

    I also remember some time ago hearing that as modern polyester based filmstocks are so stable it may be that digital media could be kept on film (think black and white dots!) with a resulting shelf life of centuries rather than decades. Shame about the lack of instant access!

    There is a lot of work to do to ensure that the dawning of the digital age does not become a future dark ages due to the loss of all our media. We have books going back millennia that have survived just by being on a shelf in the right place. I can pick one up and "view" it. What chance for our digital media?

  • Comment number 2.

    I consider this project of the utmost importance. We should be willing as a society to preserve our creative pursuits, they are the soul of humanity. It is somewhat describable as an obsession of mine.

    In the next hundred years we will have mastered the decryption of the data in a cell (probably in the next twenty). We will have digitalised a human mind (seriously). We will have transplanted one human brain into another body (successfully). We will have the ability to know what someone is thinking (visual and audio - we're already getting impressive detail with that through fMRI), forget Podcasts we'll be hooked on Dreamcasts (erm, again). We will be able to interact with our computers without all this typing nonsense comfortably (I hope).

    We are already digital beings. The first thing you do is to ask Google any question you have... we communicate and express emotion through digital means. We have entwined our identities within the digital world. It only stands to reason that we will attempt to fully integrate ourselves, even if just for military application having a human mind in direct control of a device is a massive benefit.

    Imagine a biological, self repairing, fully solar powered (through photosynthesis), intelligent storage system that can actually think. It becomes feasible to reconstruct audible and visual data from donor cells... if we even need to be that invasive (and we most likely do not)... if anyone ever watched it or heard it then they can contribute a piece to the puzzle.

    We already mapped the entire planet, why not a mind?

    If only the world will allow the transfer of a human mind to a digital format (plausible, difficult, but in development) we could really start to achieve some incredible things. If run at maximum potential we would have one year of time in thirty two seconds (relative, of course). Sleep, eating, et cetera - all biological imperatives, a program run in tandem with your digital mind could satisfy those impulses. You would have over sixty years of thought (with instant and perfect recollection) in less than an hour. You can copy digital information.

    I could explore the possibilities of that potential future but I won't as I have veered off course. The notion is breath-takingly beautiful though if you allow yourself to be inspired rather than worried. But what good is the future if we lose sight of who we are, who we were and who we thought we would become?

    I only hope to see the day...

    In this probable future our digital media/knowledge is as safe as houses (wait, I retract that). Just have a little patience.

    Grayson King

