Speakerthon: uploading voice samples from the Radio 4 archive to Wikipedia
Posted by Zillah Watson and Michael Smethurst
An invite
On Saturday, 18th January 2014, between 10am and 5pm, we're teaming up with the Open Knowledge Foundation, Creative Commons UK and the Wikimedia community to host an open event in the Media Cafe, New Broadcasting House, Portland Place, London (map here). Attendees will be given access to the Radio 4 permanent audio archive, the tools to take samples of voice recordings and the opportunity to upload them to Wikimedia Commons for inclusion in Wikipedia.
If you'd like to come along you can sign up for the day on EventBrite. The first 25 non-BBC sign ups will be given a free tour of Broadcasting House.
Some background
Back in 2012 Andy Mabbett published a blog post requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons. The request went on to become the Voice Intro Project and some examples can be found here. In September Andy talked about the project to the Wikimedia UK blog:
"The idea is to let Wikipedia readers find out what the people we write about sound like [..] It's great that we can hear the voices of people like Gandhi and Alexander Graham Bell, but what about all the other historic figures, whose voices are lost forever? We shouldn't let that happen when we have the technology and resources so easily available. Sure, some of our subjects are known for media appearances, but those aren't necessarily available globally nor under an open licence."
Andy's original post was spotted by Tristan and passed around R&D. Our first thought was, "we've got lots of voices". Our second thought was that, with some adjustment, this could be a useful hook for the BBC and for other institutions with large, digitised audio archives but sparse metadata and no way of knowing who is speaking in them.
Generating Linked Open Data from Open Content
As part of the ABC-IP project Yves built a speaker recognition algorithm that scales to a large number of speakers, based on the LIUM speaker diarization toolkit. The software is able to recognise voice patterns and identify where the same voice box speaks across a large audio (or video) archive. Unfortunately, it can't tell us who that person actually is: it gives us no name or identity for the speaker.
The results can be seen on this episode of From Our Own Correspondent and this aggregation of episodes featuring the voice of Orla Guerin from the World Service archive (you'll need to be signed in to see them). The names have been provided by users of the archive and are just strings, not identifiers for "things".
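To make the shape of that output concrete, here is a minimal sketch of how anonymous voice labels might be aggregated across programmes. It is not the ABC-IP software: the `Segment` structure, the `voice_17` style labels, the episode names and the timings are all invented for illustration. The only detail taken from the text is that a user-supplied name is a plain string attached to a voice, not an identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    programme: str  # hypothetical programme identifier
    start: float    # seconds from the start of the programme
    end: float      # seconds from the start of the programme
    voice: str      # anonymous cluster label (hypothetical naming scheme)


def programmes_by_voice(segments):
    """Map each anonymous voice label to the sorted programmes it appears in."""
    index = defaultdict(set)
    for seg in segments:
        index[seg.voice].add(seg.programme)
    return {voice: sorted(progs) for voice, progs in index.items()}


def speaking_time(segments):
    """Total seconds of speech per anonymous voice label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.voice] += seg.end - seg.start
    return dict(totals)


# Invented example data: the same voice turns up in two episodes.
segments = [
    Segment("episode_a", 0.0, 42.5, "voice_17"),
    Segment("episode_a", 42.5, 60.0, "voice_03"),
    Segment("episode_b", 10.0, 55.0, "voice_17"),
]

# A name typed in by a user is just a string hung on a label.
user_labels = {"voice_17": "Orla Guerin"}

for voice, programmes in programmes_by_voice(segments).items():
    print(user_labels.get(voice, "unknown"), programmes)
```

The label tells us that the same voice recurs, and the string tells a human who it probably is, but nothing here links the voice to a concept another system could resolve.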
The BBC makes extensive use of identifiers from the Wikimedia family (Wikipedia and Wikidata) and related projects (e.g. DBpedia) so it would be better if we could associate voice boxes with Wikipedia, DBpedia or Wikidata concept identifiers. This would allow us to surface programmes about and featuring person X.
As a piece of research we're looking to investigate whether voice samples on Wikimedia Commons and Wikipedia could be used to generate a voice box "fingerprint", which could then be used to identify speakers across a large archive. That would close the circle: archive audio, to speaker recognition, to Wikimedia voice fingerprint, to Wikipedia, DBpedia or Wikidata identifier, to Linked Open Data for the speakers in an archive.
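As a rough illustration of the matching step, the sketch below compares the "fingerprint" of each anonymous voice against reference fingerprints keyed by an identifier, and links the two when they are similar enough. Everything in it is an assumption: the three-number vectors stand in for whatever a real speaker model would produce, the identifiers are placeholders rather than real Wikidata IDs, and cosine similarity with a fixed threshold is simply one common way to compare vectors, not necessarily the approach the research will take.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(cluster_vector, references, threshold=0.8):
    """Return the identifier of the best-matching reference, or None.

    The threshold is an arbitrary illustrative value.
    """
    best_id, best_score = None, threshold
    for identifier, ref_vector in references.items():
        score = cosine(cluster_vector, ref_vector)
        if score >= best_score:
            best_id, best_score = identifier, score
    return best_id


# Reference fingerprints, as if derived from uploaded voice samples.
# Keys are placeholders, not real identifiers.
references = {
    "wikidata:Q_PLACEHOLDER_1": [0.9, 0.1, 0.2],
    "wikidata:Q_PLACEHOLDER_2": [0.1, 0.8, 0.5],
}

# Fingerprints of anonymous voices found in the archive (invented values).
clusters = {
    "voice_17": [0.88, 0.15, 0.18],
    "voice_03": [0.3, 0.3, -0.9],
}

links = {label: identify(vec, references) for label, vec in clusters.items()}
print(links)
```

A voice with no sufficiently close reference stays unlinked, which is the honest outcome when no sample of that speaker has been uploaded yet.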
To do that we'd need longer (duration) and higher quality samples than suggested by the Voice Intro Project. So we're looking to upload 30-40 second voice samples losslessly encoded as FLAC. We've created a few examples:
Any software we create to do this will be open sourced and (obviously) the voice samples will be openly licensed, so other researchers and cultural institutions will be able to use the same methods to annotate audio / video with identified speakers. Hopefully they will also contribute to the project by uploading voice samples from their own archives. By releasing small nuggets of their archives they'd be both improving Wikipedia and putting just enough in place to make the further contextualisation of their (and other) archives possible.
Details of the day
On the day we'll be giving out access to Snippets, an R&D tool built on top of Redux. Snippets gives access to everything broadcast by the BBC since ~2007. For rights reasons we'll only be uploading voice samples from the selection of Radio 4 news and factual programmes with permanent availability. Permanently available Radio 4 programmes are listed below.
Before we meet up it would be good if you could have a listen to some of these programmes and identify interesting people and suitable 30-40 second samples.
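For anyone curious what cutting a sample involves, here is a minimal sketch using only Python's standard `wave` module. It is not how Snippets works; it assumes the source audio is an uncompressed WAV file, and the function name and paths are hypothetical. The 30-40 second limits come from the sample length described above.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 30, 40


def extract_sample(src_path, dst_path, start_s, duration_s):
    """Copy duration_s seconds of audio, starting at start_s, to a new WAV file.

    Returns the length of the written sample in seconds.
    """
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise ValueError("sample should be 30-40 seconds long")
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        first = int(start_s * rate)
        count = int(duration_s * rate)
        if first + count > src.getnframes():
            raise ValueError("sample runs past the end of the recording")
        src.setpos(first)               # jump to the first frame of the sample
        frames = src.readframes(count)  # raw PCM frames, untouched
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)           # same channels, sample width and rate
        dst.writeframes(frames)         # frame count in the header is corrected on write
    return count / rate
```

Because the frames are copied rather than re-encoded, nothing is lost in the cut; the resulting WAV could then be passed to any FLAC encoder to produce the losslessly encoded file.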
If you'd like to come along you can sign up to attend here. Please bring along a laptop and some headphones. Food and drink will be provided.
Permanently available Radio 4 programmes
Series
- A Good Read
- A Point of View
- All in the Mind
- Analysis
- Any Questions?
- Best of Four Thought
- Beyond Belief
- Beyond Westminster
- Bookclub
- Bringing Up Britain
- Broadcasting House
- China: as History is My Witness
- Costing the Earth
- Crossing Continents
- Cultural Exchange
- Decision Time
- Desert Island Discs
- Document
- Does Science Need the People?
- Face the Facts
- File on 4
- Fixing Broken Banking
- Food and Farming Awards
- Four Thought
- From Our Own Correspondent
- Front Row
- Frontiers
- Generation E
- Generations Apart
- Great Lives
- In Alistair Cooke's Footsteps
- In Business
- In Living Memory
- In Our Time
- In Pursuit of the Ridiculous
- In Touch
- Inside Health
- Inside the Ethics Committee
- iPM
- Journey of a Lifetime
- Key Matters
- Last Word
- Law in Action
- Leader Conference
- Lives in a Landscape
- Living World
- Making Tracks
- Mastertapes
- Material World
- Meeting Myself Coming Back
- Midweek
- Mind Changers
- Money Box
- Money Box Live
- More or Less
- Nature
- News Review of the Year
- No Triumph, No Tragedy
- On Your Farm
- One to One
- Open Book
- Open Country
- PM
- Profile
- Ramblings
- Reith Lectures
- Saturday Live (including Inheritance Tracks)
- Saturday Review
- Saving Species
- Six o'Clock News
- Start the Week
- Stephanomics
- Sunday
- The Alien Birds Have Landed
- The Bottom Line
- The Call
- The Digital Human
- The EU Debate
- The Film Programme
- The Food Programme
- The Forum
- The House I Grew Up In
- The Infinite Monkey Cage
- The Invention of Spain
- The Life Scientific
- The Long View
- The Media Show
- The Moral Maze
- The New Elizabethans
- The Philosopher's Arms
- The Report
- The Reunion
- The Spanish Ambassador's Suitcase: Stories from the Diplomatic Bag
- The Week in Westminster
- The World Tonight
- Things We Forgot to Remember
- Thinking Allowed
- Today (not permanently available on bbc.co.uk but acceptable for Speakerthon)
- Weekend Woman's Hour
- What's the Point of…
- World at One
- World this Weekend
- Witness
- Woman's Hour
- Word of Mouth
- You and Yours
One-off documentaries and short series
- 1913 - The Year Before
- 2012 - the End of Time
- 30 Years of the Bradshaws
- A Guide to Garden Wildlife
- A Menace to Society
- A Natural History of Me!
- A Room with a View: the Artist's Studio
- A Scottish Hotel in the Holy Land
- A Trip around Mars with Kevin Fong
- Absinthe Makes the Art Grow Fonder
- After Saddam
- Alice's Restaurant
- An Operating Manual for Spaceship Earth
- Analysing the Child Sex Offender
- And Calm of Mind
- And No Birds Sing: Rachel Carson and Silent Spring
- Archive on 4 - From Donald Winnicott to the Naughty Step
- Archive on 4 - Spoken Like a Woman
- Archive on 4: Writers and Radio
- Arthur in the Underworld
- Baroque in Britain
- Battle for the Airwaves
- Ben Goldacre's Bad Evidence
- Blackout Ballet
- Bridging the Gulf
- Care to be a Nurse?
- Climategate Revisited
- Constant Cravings: Does Food Addiction Exist?
- Creating Pitch-Perfect
- Crossing the Bay
- Daughters from Afar
- Decontaminating Halabja
- Deeds Not Words
- Disability: a New History
- Do I Have a Right to be Forgotten?
- Earworms
- Ebony: Black on White on Black
- Egypt's Challenge
- Ella in Berlin
- Europe Moves East
- Feel the Chant: the Brit Funk Story
- Football's Home Fans
- Forgetting a Revolutionary: Lawrence Durrell at 100
- Freedom Pass Special
- From Worcester with Love
- Grahame Dangerfield: Back to the Serengeti
- Grayson on His Bike
- Hallucination
- Happy Days: the Children of the Stones
- How Iraq Changed the World
- How to Have a Good Death
- How You Pay for the City
- Hunt/Lauda
- HV Morton: Travelling into the Light
- Hy-Brasil
- In Godzilla's Footsteps
- In Pursuit of Spring
- In Search of Originality
- In Search of the British Dream
- Inside Science
- Inside the Bonus Culture
- It's Fun but is it Theatre?
- Journeys Down My Street
- Just So Science
- Knowing Me, Knowing Autism
- Lady Gaga v Heavy Metal: the Confusing World of Pop in Indonesia
- Land of the Rising Sums
- Lenin in Letchworth
- Letters from Germany
- Letting Out the Light
- Living with Lady T
- Mad Houses
- Making News
- Malala's Diary
- Margaret Thatcher: Potency and Paradox
- Marseille 2013
- Mother Tongue Interference
- Northern Ireland: Who Are We Now?
- Oblique Strategies
- On the Borderline
- On the French Fringe
- On the Trail of the American Honeybee
- One Billion Digitally Identified Indians
- One Man's War
- Open Air
- Open Sesame
- Our Language in Your Hands
- Out of the Ordinary
- Pension Off the Old Lady
- Phelophepa
- Playing Ping Pong with Henry Miller
- Poor Reporting
- Pop-Up Economics
- Pop-Up Ideas
- Postcode Profiling: Winners and Losers
- Privacy Under Pressure
- Prosperity Gospel
- Putting the Black Country on the Map
- Quarter Life Crisis
- Recycled Radio
- Reflections
- Remembering James Bulger
- Return to Japan
- Rhymes of Passion
- Richard Wagner - Power, Sex and Revolution
- Roger, the Eagle Has Landed
- Sexual Nature: a Brief Natural History of Sex
- Shared Planet
- Shot in Belfast
- Sid James: Not Just a Dirty Laugh
- Solar Max
- South Africa Spits Back
- Speculating the Emerald Isle
- State of Play
- Stories from the Squeezed Middle
- Swimming through Chocolate
- Tax Avoidance: the Hidden Cost
- Technicolour
- The Actor's Gang
- The Art of Sequencing
- The Art of the Foreign Minister
- The Arthur Cravan Memorial Society
- The Beat Hotel
- The Big B at 70
- The Bishop and the Bankers
- The Blonde Women of India
- The Butterfly Effect
- The Concrete and the Divine
- The Cultural Exchange
- The Deprofessionals
- The Flower Fields
- The Forgotten Black Cowboys
- The Gaza Surf Club
- The Goddess of English
- The Hackers
- The Human Zoo
- The Listeners
- The Man Who Made Scotland
- The Meaning of Liff at 30
- The Most Troubled Families in Britain
- The Outsourced
- The Pedant's Progress: an Intimate History of the Arts Scholar
- The People's Thatcher
- The Physicist's Guide to the Orchestra
- The Science of Music
- The Search of the Perfect Office
- The Secret Power of Trees
- The State of Welfare
- The Story of the Talmud
- The Truth and Nothing but the Truth
- The Unsent Letters of Erik Satie
- The Value of Culture
- The World Cup for Writers
- Tim Key and Gogol's Overcoat
- To Russia with Jung
- Trade-Plating round Britain
- Train Hopping in the USA
- Turkey: the New Ottomans
- UK Confidential
- Vertical Farming
- Walking on Planet C
- Was Dracula Irish?
- Was Gertrude Stein Any Good?
- What Does Ed Miliband Really Think?
- What Thatcher Did Next
- What's in a Name?
- When Washington Came to Brum
- Where Did All the Comrades Go?
- Who's the Pest?