Automatic transcription of video files sig media

Automatic transcription of video files Carlos Turró Universitat Politecnica de Valencia


Page 1: Automatic transcription of video files   sig media

Automatic transcription of video files

Carlos Turró

Universitat Politècnica de València

Page 2: Automatic transcription of video files   sig media

Agenda
• Why automatic transcription
• State of the art: The transLectures project
• Automatic transcription of Lecture Recordings: The Opencast Project
• Notes & the near future

Page 3: Automatic transcription of video files   sig media

Why automatic transcription of video files?
• Accessibility

Page 4: Automatic transcription of video files   sig media

Why automatic transcription of video files?
• Accessibility

• Searching into a video file
• Searching into a video repository
• Topic identification
• …and much more

Page 5: Automatic transcription of video files   sig media

The transLectures project
• Development of an engine for Automated Speech Recognition (ASR) for lectures & educational content
• Development of translation tools for that content

• Implementation
• Case studies: Videolectures.NET & Polimedia (UPV video repository)
• Real-life evaluation
• Integration into Opencast


Page 6: Automatic transcription of video files   sig media

transLectures partners

12 Nov 2013

     Name                                         Country
1    Universitat Politècnica de València (MLLP)   Spain
2    Xerox SAS                                    France
3    Institut Jožef Stefan                        Slovenia
3+   Knowledge for All Foundation                 UK
4    RWTH Aachen University                       Germany
5    EML – European Media Laboratory              Germany
6    DDS – Deluxe Digital Studios                 UK

36 Months

November 2014

Page 7: Automatic transcription of video files   sig media

Statistical transcription (and translation)

Acoustic Model



Sound ASR Engine

Page 8: Automatic transcription of video files   sig media

Statistical transcription (and translation)

Acoustic Model


Manually transcribed voice    Modeling Engine
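
The slides do not spell out the decoding rule, but statistical ASR is usually written in the following textbook form, where X is the sound (acoustic observations) and W is a candidate word sequence. This is a sketch under the assumption that the engine follows the standard formulation; the pairing with the language model named on the architecture slide is likewise an assumption.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \underbrace{P(X \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}
```

Under this reading, the modeling engine estimates the acoustic model P(X | W) from manually transcribed voice, and the ASR engine searches for the word sequence that maximizes the product.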

Page 9: Automatic transcription of video files   sig media

Architecture of transLectures

Lecture

Language Model




Intelligent interaction

Transcription Translation

Page 10: Automatic transcription of video files   sig media



• Transcription (ASR)
  • EN
  • SL
  • ES

• Translation (MT)
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE

Page 11: Automatic transcription of video files   sig media

Transcription and Translation Platform

Page 12: Automatic transcription of video files   sig media

Transcription and Translation Platform API

Page 13: Automatic transcription of video files   sig media

Transcription and Translation Platform
• Post-editing web interface (in HTML5)

Page 14: Automatic transcription of video files   sig media

Example video

Page 15: Automatic transcription of video files   sig media

Scientific evaluations
• WER = Word Error Rate

• The lower the better

• A human transcriber typically has a WER of around 12%
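
To make the metric concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name and the whitespace tokenization are assumptions for illustration; real evaluations normally also normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.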

Page 16: Automatic transcription of video files   sig media

Beyond transLectures

Page 17: Automatic transcription of video files   sig media

Beyond transLectures

WER

Language     M10    M17
Dutch        25.7   24.5
Italian      21.2   17.7
Portuguese   45.9   43.0
Spanish      15.9   14.4
Estonian     N/A    27.1
French       N/A    22.7

Page 18: Automatic transcription of video files   sig media

Beyond transLectures

Page 19: Automatic transcription of video files   sig media

The Opencast Community is…

Universities, companies and people:
• concerned with academic video
• attracted to the Opencast values of openly exchanging ideas, experience, knowledge and code
• committed to building and maintaining a robust, flexible, high-quality open source lecture capture and academic video management solution.

Now also part of

Page 20: Automatic transcription of video files   sig media

Full-featured Lecture Recording ecosystem

Page 21: Automatic transcription of video files   sig media

Who uses Opencast?

Around the world, with strong adoption in Europe especially.

43 Adopters with public information (May 2014)

30+ commercial partner clients

Page 22: Automatic transcription of video files   sig media

Yesterday’s tweet

Page 23: Automatic transcription of video files   sig media

Indexing in Opencast
• Opencast has built-in OCR indexing capabilities

Video (slides) -> OCR (hunspell) -> Word list filter -> Apache Lucene search server

• New operations can be added

Video (slides) -> transcription (tL) -> Apache Lucene search server
or
Video (slides) -> OCR (hunspell) -> transcription (tL) -> Word list filter -> Apache Lucene search server
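
The chains above can be pictured as a list of operations applied in order, each one enriching the same media record. The sketch below is purely illustrative and does not use Opencast's actual workflow API: every name in it (`ocr`, `transcription`, `word_list_filter`, `run_workflow`, the dictionary keys) is hypothetical, and the OCR and transcription steps are stand-ins that read pre-extracted text.

```python
from typing import Callable, Dict, List

MediaPackage = Dict[str, object]
Operation = Callable[[MediaPackage], MediaPackage]

# Hypothetical word list; a real filter would use a proper dictionary.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def ocr(mp: MediaPackage) -> MediaPackage:
    # Stand-in for OCR: assumes slide text has already been extracted.
    words = list(mp.get("words", [])) + str(mp["slide_text"]).lower().split()
    return {**mp, "words": words}


def transcription(mp: MediaPackage) -> MediaPackage:
    # Stand-in for the ASR step: assumes a transcript is already present.
    words = list(mp.get("words", [])) + str(mp["speech_text"]).lower().split()
    return {**mp, "words": words}


def word_list_filter(mp: MediaPackage) -> MediaPackage:
    # Drop words that carry no search value.
    words = [w for w in mp.get("words", []) if w not in STOP_WORDS]
    return {**mp, "words": words}


def run_workflow(mp: MediaPackage, operations: List[Operation]) -> MediaPackage:
    # Apply each operation in order; adding a step means adding a list entry.
    for operation in operations:
        mp = operation(mp)
    return mp


if __name__ == "__main__":
    package = {
        "slide_text": "The Fourier Transform",
        "speech_text": "this is the transform of a signal",
    }
    result = run_workflow(package, [ocr, transcription, word_list_filter])
    print(result["words"])  # words that would be sent to the search server
```

The point of the design is that plugging in transcription only changes the list passed to `run_workflow`; the other steps stay untouched.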

Page 24: Automatic transcription of video files   sig media

Why do I need an indexing server?
• Powerful, Accurate and Efficient Search Algorithms

• ranked searching -- best results returned first
• many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
• fielded searching (e.g. title, author, contents)
• sorting by any field
• multiple-index searching with merged results
• allows simultaneous update and searching
• flexible faceting, highlighting, joins and result grouping
• fast, memory-efficient and typo-tolerant suggesters
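
Two of the features listed above, ranked searching and fielded searching, can be illustrated with a toy inverted index. This is a minimal sketch and not how Lucene is implemented: the class `TinyIndex`, its methods, and the raw term-count scoring are assumptions chosen for brevity, whereas a real indexing server uses far more refined scoring and storage.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class TinyIndex:
    """Toy inverted index: (field, term) -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def add(self, doc_id: str, **fields: str) -> None:
        # Index every whitespace-separated term under its field name.
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)][doc_id] += 1

    def search(self, query: str, field: str = "contents") -> List[str]:
        # Score each document by how often the query terms occur in it.
        scores: Counter = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.postings.get((field, term), {}).items():
                scores[doc_id] += tf
        # Best results first; ties are broken by document id.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc_id for doc_id, _ in ranked]


if __name__ == "__main__":
    index = TinyIndex()
    index.add("lec1", title="Signal processing",
              contents="fourier transform of a signal and the fourier series")
    index.add("lec2", title="Fourier analysis",
              contents="introduction to the transform")
    print(index.search("fourier transform"))       # ranked over contents
    print(index.search("fourier", field="title"))  # fielded search
```

Feeding transcripts into the `contents` field is what lets a spoken word, not just a title or slide text, lead a search to the right lecture.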

Page 25: Automatic transcription of video files   sig media

Demo on searching

Page 26: Automatic transcription of video files   sig media

Notes & the near future
• ASR technology is good enough for automated transcription of videos

… given good enough sound quality

• There are lecture recording systems that allow transcriptions to be plugged in for searching

… like Opencast

• There are still things to solve
  • Transcription speed (progressing well)
  • Topic identification
  • Adding more languages

Page 28: Automatic transcription of video files   sig media

Learning more…

transLectures

Video in a multilingual context (EMMA)

Opencast State of the Project