Pickin' up good vibrations
One of the universal appeals of music lies in its mysterious ability to manipulate and reflect our emotions. Even the simplest of tunes can evoke strong feelings of joy, fear, anger, sadness and anything in between. Music is a huge part of what the BBC does - in fact it broadcasts over 200,000 different tracks every week. With so much music to choose from, especially in the digital age, there is more and more interest in finding ways of navigating music collections in a more human way. Some of our colleagues are looking at new ways of finding TV programmes by mood, but can we do something similar for music?
The alliteratively-named 'Making Musical Moods Metadata' is a collaborative project between BBC R&D, Queen Mary University of London (QMUL) and I Like Music. Part of the project involves researching how information about the mood of music tracks can be added to large collections. I Like Music is a company that provides the BBC with an online music library called the 'Desktop Jukebox', which includes over a million songs. Labelling each of these by hand would take many years, so we are developing software that will do it automatically.
As you can imagine, getting a computer to understand human emotions has its challenges - three, in fact. The first is how to define mood numerically. This is a complicated task, as not only do people disagree on the mood of a track, but music often expresses a combination of emotions. Over the years, researchers have come up with various models, notably Hevner's clusters, which define eight mood categories, and Russell's circumplex model, which represents mood as a point on a two-dimensional plane of valence (how positive or negative) and arousal (how energetic). Both approaches have their drawbacks, so our partners at the QMUL Centre for Digital Music are developing a model which combines the strengths of both. The model will be based on earlier research into the emotional similarity of common keywords.
Russell's circumplex model.
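To make the difference between the two kinds of model concrete, here is a minimal sketch of a dimensional (Russell-style) mood representation, assuming both axes are scaled to the range -1 to +1. The `Mood` class and the quadrant labels are hypothetical illustrations, not the model QMUL is developing; a categorical (Hevner-style) model would instead assign one label from a fixed set of eight.

```python
import math
from dataclasses import dataclass


@dataclass
class Mood:
    """A mood as a point on the valence-arousal plane (hypothetical -1..+1 scaling)."""
    valence: float  # -1 = very negative, +1 = very positive
    arousal: float  # -1 = very lethargic, +1 = very energetic

    def angle_degrees(self) -> float:
        """Position around the circumplex, anticlockwise from the positive-valence axis."""
        return math.degrees(math.atan2(self.arousal, self.valence)) % 360

    def intensity(self) -> float:
        """Distance from the neutral centre of the plane."""
        return math.hypot(self.valence, self.arousal)


def quadrant_label(mood: Mood) -> str:
    """Coarse label for each quadrant of the plane (illustrative names only)."""
    if mood.arousal >= 0:
        return "excited/happy" if mood.valence >= 0 else "angry/tense"
    return "calm/relaxed" if mood.valence >= 0 else "sad/bored"


print(quadrant_label(Mood(valence=-0.5, arousal=-0.5)))  # sad/bored
```

A continuous point like this can express blends of emotion and degrees of intensity, which a single category cannot, at the cost of being harder to name.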
The next challenge is processing the raw digital music into a format that the computer can handle. This should be a small set of numbers that represents what a track sounds like. These numbers are created by running the music through a set of algorithms, each of which produces an array of numbers called 'features'. These features represent different properties of the music, such as its tempo and key. They also include statistics about the frequencies, loudness and rhythm of the music. The trick lies in finding the right set of features to capture all the properties of music that are important for expressing emotion.
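As a rough illustration of how frame-level measurements become a fixed-length feature vector, the sketch below computes two simple hypothetical features, RMS amplitude (a loudness proxy) and zero-crossing rate (a crude brightness proxy), over short overlapping frames and summarises each with its mean and standard deviation. It is not the project's feature set; tempo and key estimation need considerably more elaborate algorithms. The frame and hop sizes are assumptions.

```python
import math
import statistics


def frame_signal(samples, frame_size, hop):
    """Split a mono signal into overlapping frames of frame_size samples."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]


def rms(frame):
    """Root-mean-square amplitude of one frame: a simple loudness proxy."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples, frame_size=1024, hop=512):
    """Summarise a whole track as [mean RMS, std RMS, mean ZCR, std ZCR]."""
    frames = frame_signal(samples, frame_size, hop)
    loudness = [rms(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return [statistics.mean(loudness), statistics.pstdev(loudness),
            statistics.mean(zcr), statistics.pstdev(zcr)]


# One second of a 440 Hz sine wave at amplitude 0.5, sampled at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
features = extract_features(tone)
print(features)
```

Whatever the length of the track, the result is always four numbers, which is what lets tracks of different durations be compared by a learning algorithm.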
Now for the final challenge. We need to find out exactly how the properties of the music work together to produce different emotions. Even the smartest musicologists struggle with this question, so - rather lazily - we're leaving it to the computer to work it out.
Machine learning is a method of getting a computer to 'learn' how two things are related by analysing lots of real-life examples. In this case, it learns the relationship between musical features and mood. There are a number of algorithms we could use, but initially we are using the popular 'support vector machine' (SVM), which has been shown to work for this task and can handle both linear and non-linear relationships.
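To show what "training" means here, the following is a minimal sketch of a linear soft-margin SVM trained with Pegasos-style stochastic subgradient descent on the regularised hinge loss, in plain Python. The two features, the 'energetic' versus 'calm' labels and the toy data are all hypothetical. It covers only the linear case; non-linear relationships would need a kernel SVM (for example with an RBF kernel), typically via a library such as scikit-learn's `SVC`.

```python
import random


def train_linear_svm(X, y, lam=0.01, steps=2000, seed=0):
    """Pegasos: stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    # Append a constant 1 to every feature vector so the last weight acts as a bias.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, steps + 1):
        x, label = rng.choice(data)
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        # Regularisation step: shrink the weights towards zero.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Hinge-loss step: move towards examples that are misclassified or inside the margin.
        if margin < 1.0:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the learned boundary x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical normalised features: (tempo-like feature, unrelated feature).
X = [(2.0, 0.3), (2.5, -0.4), (3.0, 0.1), (2.2, -0.2),
     (-2.0, 0.4), (-2.8, -0.3), (-2.4, 0.2), (-3.0, -0.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = energetic, -1 = calm
w = train_linear_svm(X, y)
print(predict(w, (2.6, 0.0)))   # 1
print(predict(w, (-2.1, 0.45)))  # -1
```

The learned weights show which features matter: here the first feature dominates, because it alone separates the two moods.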
For the learning stage to be successful, the computer will need to be 'trained' using thousands of songs that have accompanying information about the mood of each track. This kind of collection is very hard to come by, and researchers often struggle to find appropriate data sets. Not only that, but the music should cover a wide range of musical styles, moods and instrumentation.
Although the Desktop Jukebox is mostly composed of commercial music tracks, it also houses a huge collection of what is known as 'production music'. This is music that has been recorded using session artists, and so is wholly owned by the music publishers, who get paid each time the tracks are used. This business model means that the publishers are keen to make their music easy to find and search, so every track is hand-labelled with lots of useful information.
Through our project partners at I Like Music, we obtained over 128,000 production music tracks to use in our research. The tracks, which are sourced from over 80 different labels, include music from every genre.
The average production music track is described by 40 keywords, of which 16 describe the genre, 12 describe the mood and 5 describe the instrumentation. Over 36,000 different keywords are used to describe the music, the top 100 of which are shown in the tag cloud below. Interestingly, about a third of the keywords appear only once, including such gems as 'kangaroove', 'kazoogaloo', 'pogo-inducing' and 'hyper-bongo'.
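Statistics like these are simple counting exercises. The sketch below computes them for a tiny hypothetical catalogue; the track IDs and keywords are invented, and the real analysis ran over the full set of 128,000 tracks.

```python
from collections import Counter

# Hypothetical toy catalogue: track ID -> descriptive keywords.
catalogue = {
    "track-001": ["rock", "energetic", "guitar", "uplifting"],
    "track-002": ["ambient", "calm", "piano", "reflective"],
    "track-003": ["rock", "energetic", "drums", "pogo-inducing"],
    "track-004": ["orchestral", "calm", "strings", "reflective"],
}


def keyword_stats(catalogue, top_n=3):
    """Average keywords per track, vocabulary size, share of one-off keywords, top keywords."""
    counts = Counter(kw for keywords in catalogue.values() for kw in keywords)
    avg_per_track = sum(len(kws) for kws in catalogue.values()) / len(catalogue)
    used_once = sum(1 for c in counts.values() if c == 1)
    return {
        "avg_keywords_per_track": avg_per_track,
        "distinct_keywords": len(counts),
        "fraction_used_once": used_once / len(counts),
        # The most frequent keywords; a tag cloud scales font size by these counts.
        "top": counts.most_common(top_n),
    }


stats = keyword_stats(catalogue)
print(stats)
```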
A tag cloud of the top 100 keywords used to describe production music. The more common the keyword, the larger the font size.
Drawing a mood map
To investigate how useful the keywords are for describing emotion and mood, we analysed the relationships between them. We did this by calculating the co-occurrence of keyword pairs - that is, how often a pair of words appears together in the description of a music track. The conjecture was that words which often appear together have similar meanings.
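Co-occurrence counting can be sketched as follows, assuming each track is simply a list of keywords. Raw counts favour very common words, so the sketch also normalises them with the Jaccard index (tracks containing both words divided by tracks containing either); that normalisation is our illustrative choice, not necessarily the one used in the project. The toy data is hypothetical.

```python
from collections import Counter
from itertools import combinations


def document_frequency(tracks):
    """Number of tracks in which each keyword appears."""
    return Counter(kw for keywords in tracks for kw in set(keywords))


def cooccurrence(tracks):
    """Number of tracks in which each unordered keyword pair appears together."""
    pairs = Counter()
    for keywords in tracks:
        # Sorting makes each pair canonical, so (a, b) and (b, a) share one count.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def jaccard(pairs, freq, a, b):
    """Normalised co-occurrence: tracks with both keywords / tracks with either."""
    both = pairs.get(tuple(sorted((a, b))), 0)
    either = freq[a] + freq[b] - both
    return both / either if either else 0.0


tracks = [
    ["happy", "uplifting", "pop"],
    ["happy", "uplifting", "brass"],
    ["sad", "melancholy", "piano"],
    ["sad", "melancholy", "strings"],
    ["happy", "brass"],
]
pairs, freq = cooccurrence(tracks), document_frequency(tracks)
print(jaccard(pairs, freq, "happy", "uplifting"))  # 0.666...
print(jaccard(pairs, freq, "happy", "sad"))        # 0.0
```

Turning similarities like these into distances (for example, 1 minus the similarity) gives the kind of input needed to lay the keywords out on a two-dimensional map.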
We found that the keywords arranged themselves into a logical pattern, with negative emotions on the left, positive emotions on the right, energetic emotions at the top and lethargic emotions at the bottom. This roughly fits Russell's arousal-valence plane, suggesting that this model may be a suitable way to describe moods in the production music library; however, more research is required before a model is chosen.
We have been working with the University of Manchester to extract features from over 128,000 production music files using the N8 cluster. Once that work is complete, we will be able to start training and testing musical mood classifiers that can automatically label music tracks. Watch this space for updates, and hopefully a working online demo.