Audio separation for broadcasting
Improving the perceived quality of individual sounds separated from a mixture.
What we're doing
We have compiled a selection of broadcast audio and mixed it with common interfering sounds. These mixtures are then separated by a number of different techniques in order to assess the quality of separation. A good source separation technique will remove the interfering sound while preserving the quality of the target audio. Our findings will help steer future work towards systems that produce higher-quality separations.
Why it matters
There are a number of scenarios in which audio separation can be a valuable tool for broadcasting. Within content production, it may be necessary to remove excessive background noise from a location recording. Within the home, audience members may wish to adjust the balance between speech and background music in a broadcast. In the future, it may be desirable to take apart old audio recordings and restructure them for a new sound format.
While there are other ways around some of these problems (recordings can be recaptured, and music and speech can be broadcast as separate files), being able to separate audio from a mixture provides the required flexibility without a significant change to current practices or a duplication of effort, and does so in a way that makes efficient use of computing power, transmission bandwidth and the time of BBC staff.
A number of methods for sound source separation have been suggested in the academic literature, but few produce audio of the quality necessary for broadcast. The goals of this project are:
- To establish the current state of the art in audio separation.
- To assess the perceptual quality of audio separated by current techniques.
- To find methods of improving the perceptual quality of separated audio.
- To establish whether these improvements produce audio of broadcast quality.
How it works
We are using a time-frequency approach to audio separation. At a series of frequencies at given points in time we measure how much target and interferer signal is present and use this information to calculate a time-frequency mask — a series of multipliers each mapped to a time-frequency point. The mask is applied to the mixture audio to recover the separated audio.
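As a minimal sketch of the masking idea described above, the following uses small hypothetical magnitude arrays standing in for target and interferer spectrograms (in practice these would come from an STFT of real recordings, and this is not the project's actual system). It computes a soft "ratio" mask, where each multiplier is the fraction of energy at that time-frequency point belonging to the target, and applies it to the mixture:

```python
import numpy as np

# Toy magnitude spectrograms (frequency bins x time frames).
# Hypothetical values for illustration only.
target = np.array([[3.0, 0.5],
                   [1.0, 2.0]])
interferer = np.array([[1.0, 1.5],
                       [0.0, 2.0]])

# Soft ratio mask: at each time-frequency point, the multiplier
# is the proportion of energy belonging to the target.
eps = 1e-12  # guard against division by zero
mask = target**2 / (target**2 + interferer**2 + eps)

# Simplistic mixture model: magnitudes add. Applying the mask
# attenuates points dominated by the interferer while largely
# preserving target-dominated points.
mixture = target + interferer
separated = mask * mixture
```

Points where the target dominates receive a multiplier near one, while interferer-dominated points are pushed towards zero; a real system would estimate the mask from the mixture alone rather than from the known sources.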
The way the mask is calculated has a large effect on the quality of the separated audio. There is often a trade-off between removing more of the interfering audio and preserving more of the target's quality.
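The trade-off can be seen with a binary mask, which keeps a time-frequency point only where the target is sufficiently dominant. The arrays and thresholds below are hypothetical, chosen purely to illustrate the effect:

```python
import numpy as np

# Hypothetical target and interferer magnitudes at a few
# time-frequency points (not real project data).
target = np.array([[3.0, 0.5],
                   [1.0, 2.0]])
interferer = np.array([[1.0, 1.5],
                       [0.2, 1.5]])

# A stricter dominance threshold suppresses more interference,
# but it also discards points that contain useful target energy,
# degrading the quality of the recovered target.
lenient = (target > 1.0 * interferer).astype(float)  # keeps 3 of 4 points
strict = (target > 2.0 * interferer).astype(float)   # keeps 2 of 4 points
```

Raising the threshold from 1.0 to 2.0 drops the point where the target is only moderately dominant, trading target fidelity for interference removal.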
Both the time-frequency approach and the metrics we are using to assess quality are perceptually motivated; they seek to emulate, to some extent, the way humans experience sound.
Initial experimental work was presented at the 134th Audio Engineering Society convention in Rome in May 2013, and demonstrated at the BBC Audio Research Partnership Showcase at MediaCity in September 2013.