A new way of editing audio and video content using transcripts
Project from - present
What we're doing
Discourse is a new way of editing audio and video content. It automatically generates transcripts from recordings of speech, then allows producers to navigate and edit the transcripts using a word-processor-style interface. Any edits made to the text are automatically made to the audio and video. This tool makes editing faster, easier, cheaper and more creative.
Why it matters
Producers of audio and video content often find it easier to make editorial decisions using a transcript of their recording. Currently, producers either create these transcripts themselves or contract a third-party service to do so. Discourse uses the latest speech-to-text technology to automate this process, making it significantly cheaper and faster.
Once a producer has used a transcript to decide how to edit their recording, they must then make each of those edits manually. Audio and video editing software uses waveforms, film-strips and timecode to help the producer navigate the recording, but this task is slow and tedious. Discourse allows producers to make their edits directly in the transcript, removing the need for this step.
How it works
Discourse uses a speech-to-text engine to generate transcripts with precise timestamps for each word. Producers then use a web-based interface to select which of these words to include in or exclude from their programme, and they can preview their edits directly in the browser. When they are finished, they can export these edits as an edit decision list (EDL) which they can open in their normal editing software. This allows them to make any final tweaks before finishing the programme.
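To illustrate the idea, here is a minimal sketch (not the actual Discourse implementation, and the data structures are assumptions for the example): a word-level transcript with timestamps, and a function that turns a set of selected words into a simple edit decision list of (start, end) segments to keep.

```python
# Hypothetical word-level transcript as produced by a speech-to-text engine,
# with start/end times in seconds for each word.
transcript = [
    {"word": "Welcome",   "start": 0.00, "end": 0.42},
    {"word": "to",        "start": 0.42, "end": 0.55},
    {"word": "um",        "start": 0.55, "end": 0.90},
    {"word": "the",       "start": 0.90, "end": 1.05},
    {"word": "programme", "start": 1.05, "end": 1.70},
]

def build_edl(words, selected, gap=0.05):
    """Merge consecutive selected words into (start, end) keep-segments.

    `selected` holds the indices of words the producer kept; `gap` is the
    maximum pause (seconds) allowed between words merged into one segment.
    """
    segments = []
    for i in sorted(selected):
        w = words[i]
        if segments and w["start"] - segments[-1][1] <= gap:
            # Word follows on directly from the previous segment: extend it.
            segments[-1] = (segments[-1][0], w["end"])
        else:
            # A cut happens here: start a new segment.
            segments.append((w["start"], w["end"]))
    return segments

# Exclude the filler word "um" (index 2); keep everything else.
edl = build_edl(transcript, selected={0, 1, 3, 4})
print(edl)  # two segments: before and after the excluded word
```

A real EDL would carry source reel names and timecodes rather than bare seconds, but the core operation is the same: mapping text-level selections back onto time ranges in the recording.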
A formal study of Discourse found that it is up to twice as fast as the current editing workflow, even when the time spent on transcription is excluded. The tool is currently being developed into a product which will shortly be made available internally within the BBC.
This project is part of the Immersive and Interactive Content section
This project is part of the Audio Research work stream