Posted by Chris Baume

Posted on behalf of Adib Mehrabi, PhD researcher at Queen Mary University of London, who conducted this work during his industrial placement.

When a podcast is made from a broadcast radio show, licensing restrictions state that only 30 seconds of each song can be included if the podcast is made available for download. As a result, music is often left out of podcasts. When music is included, an editor or producer has to edit each song by hand, which takes up valuable production time. To help solve this problem, we investigated several approaches to automatically editing a song into a 30-second 'snippet' of the original.

Examples of three different approaches used to automatically create a 30-second version of a song. Left: intro and outro. Middle: intro, chorus and outro. Right: many one-bar sections.

The research was designed to answer two questions:

  1. What parts of a song do people want to hear in a 30-second clip?
  2. Can an algorithm automatically create a 30-second version of a song as well as a human editor can?

We conducted an online listening test in which participants compared 30-second song clips made using a range of manual and automatic editing methods. The methods differed mainly in which parts of the song they included, and in whether the edits were made at 'musical' points, such as on bar lines or beats.
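To make the idea of editing at 'musical' points concrete, here is a minimal Python sketch of the intro-and-outro approach, contrasting cuts made at fixed times with cuts snapped to bar lines. It is an illustration only, not the method used in the study: the function names are hypothetical, and it assumes the bar start times are already known (for example from a beat-tracking step), which this post does not describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A section of the original song to keep, in seconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def total_duration(segments: List[Segment]) -> float:
    return sum(s.duration for s in segments)


def naive_intro_outro(song_end: float, target: float = 30.0) -> List[Segment]:
    """Hypothetical helper: cut at fixed times, ignoring bars and beats."""
    if song_end <= target:
        return [Segment(0.0, song_end)]
    half = target / 2
    return [Segment(0.0, half), Segment(song_end - half, song_end)]


def bar_aligned_intro_outro(bar_times: List[float], song_end: float,
                            target: float = 30.0) -> List[Segment]:
    """Hypothetical helper: keep intro and outro, cutting only on bar lines.

    bar_times: sorted bar start times in seconds (assumed to come from a
    beat tracker). Both cuts are snapped inward, so the snippet never
    exceeds the target length.
    """
    if song_end <= target:
        return [Segment(0.0, song_end)]
    half = target / 2
    # The intro ends at the last bar line at or before the halfway budget.
    intro_end = max((b for b in bar_times if b <= half), default=0.0)
    # The outro starts at the first bar line at or after (song_end - half).
    outro_start = min((b for b in bar_times if b >= song_end - half),
                      default=song_end)
    segments = [Segment(0.0, intro_end), Segment(outro_start, song_end)]
    # Drop empty sections, e.g. when no bar line falls inside the budget.
    return [s for s in segments if s.duration > 0]
```

With bar lines every 4 s (`[i * 4.0 for i in range(46)]`) and a 181 s song, the bar-aligned version keeps 0 to 12 s and 168 to 181 s (25 s in total), while the naive version cuts at 15 s and 166 s wherever those times happen to fall in the bar. Snapping inward trades a few seconds of length for transitions that land on the beat structure; other choices, such as snapping to individual beats, are equally plausible.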

As part of the study, participants listened to different clips of the same song and rated the editing quality of each on a scale of 1 to 5 for (1) the transitions, (2) the selection of song parts and (3) overall quality. They were also asked to choose which clip was most suitable for use in a podcast and which was most representative of the song. In total, 120 participants completed the study over the course of a week.

The results showed a significant preference for some editing methods over others. Clips made up of many short sections of the song, and clips whose edits ignored musical structure such as bar and beat positions, were rated significantly lower than the rest. Among the higher-rated clips there were no significant differences. This group included clips containing only the intro and outro, and clips containing the chorus, whether they were created manually or automatically.

From these results we cannot tell whether people prefer to hear the chorus in a 30-second snippet of a song, but we can conclude that the automatic editing methods performed at least as well as the manual methods. This is an encouraging result, and we are now working to develop this technology further and integrate it into the podcast publication system.