Developing better ways of visually representing audio content to aid navigation
Project from - present
What we're doing
We are combining digital audio analysis and human-computer interaction techniques to improve the way producers navigate audio recordings. Traditional audio visualization methods are poorly suited to real-world tasks, so we are creating new visual representations designed for a production environment.
Why it matters
Editing audio is a very common task at the BBC, and is often performed by a wide variety of people with different levels of skill and training. Many editing tasks can be unnecessarily tedious and time-consuming, such as finding a certain point in a long recording or removing exclamations (like 'umm' or 'err') from speech.
Audio waveforms have long been the de facto method of visually representing audio content. Despite their ubiquity, they display very little relevant information and are not well suited to most audio editing tasks. For instance, distinguishing between people's voices (a task known as speaker diarization) is almost impossible with a waveform, even when the voices sound very different.
With recent advances in audio analysis and data visualization techniques, it is now possible to develop audio visualizations which better reflect what people are hearing, so that producers can navigate and edit with ease, and be more productive.
We aim to develop new visualization methods for a number of common tasks, such as speech/music discrimination and speaker diarization. In order to achieve this, we have created a software framework for generating audio visualizations and developed an online system for evaluating how well they perform for a given task. After developing and evaluating a number of prototypes, we hope to integrate the new visualizations into a live production environment so their impact can be realized and measured.
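To make the idea of generating a visual representation from audio concrete, here is a minimal sketch of the kind of analysis step such a framework starts from: framing a signal and computing a log-magnitude spectrogram, the raw material most audio visualizations are built on. This is an illustrative example using only NumPy; the function name and parameters are our own choices, not part of the project's actual framework.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Compute a log-magnitude spectrogram from a mono signal.

    Slices the signal into overlapping Hann-windowed frames and
    takes the FFT of each frame, giving a (frames x frequency bins)
    array that can be rendered as an image.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mags + 1e-10)  # dB scale; small floor avoids log(0)

# Usage: a 1 kHz tone sampled at 16 kHz shows up as a bright
# horizontal line at one frequency bin.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (frames, frequency bins)
```

Richer visualizations (for speech/music discrimination or speaker diarization) would extract further features from frames like these and map them to colour or shape rather than rendering raw magnitudes.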
How it works
There are already a number of algorithms that can be used for some common production tasks. Traditionally the computer performs these tasks on its own, but sometimes it gets things wrong. This project follows a different approach of having a 'human in the loop'. The computer analyses the data and presents the user with all of the information they need, in a format they can easily interpret. The user can then perform the task without having to do the tedious work.
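A minimal sketch of this 'human in the loop' pattern, again using only NumPy with illustrative names and thresholds of our own choosing: rather than cutting silences automatically, the analysis merely flags low-energy regions as candidate edit points, leaving the decision to the producer.

```python
import numpy as np

def flag_quiet_regions(signal, sr, frame_ms=20, threshold_db=-40):
    """Flag low-energy frames as candidate edit points.

    Returns a list of (start, end) times in seconds whose RMS level
    falls below the threshold. The tool only *marks* regions; the
    user reviews them and performs the edit.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(rms + 1e-10)
    return [(i * frame_ms / 1000, (i + 1) * frame_ms / 1000)
            for i in np.flatnonzero(level_db < threshold_db)]

# Usage: one second of tone with a silent gap from 0.4 s to 0.6 s.
sr = 8000
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
sig[3200:4800] = 0.0
candidates = flag_quiet_regions(sig, sr)
print(candidates[0], candidates[-1])
```

In a production tool these candidate regions would be drawn on the visualization itself, so the user can see at a glance where the algorithm believes an edit might be needed and accept or ignore each suggestion.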
The challenge of this project is to identify exactly what users need to know for a given task, to design algorithms that extract that information, and to work out the best way to present it to the user.
This project is part of the Immersive and Interactive Content section
This project is part of the Audio Research work stream