Enhancing Learning Accessibility through Fully Automatic Captioning
Maria Federico, Marco Furini
Servizio Accoglienza Studenti Disabili, Università di Modena e Reggio Emilia
Dipartimento di Comunicazione ed Economia, Università di Modena e Reggio Emilia
W4A 2012, Lyon, April 17, 2012
The traditional learning scenario
Video + audio → remote classroom

Disabled students:
• Hearing-impaired
• Dyslexic
• Motion-impaired

Able-bodied students

Traditional solutions:
- Sign interpreters
- Stenographers
- Student note takers
- Respeaking
An accessible learning scenario
Video + audio → OUR SYSTEM (automatic speech transcription) → video + audio + textual transcript → remote classroom

Disabled students:
• Hearing-impaired
• Dyslexic
• Motion-impaired

Able-bodied students
System Architecture
An architecture for the automatic production of video-lesson captions based on automatic speech recognition (ASR) technologies.

A novel caption alignment mechanism that:
- Introduces unique audio markups into the audio stream before transcription by an ASR
- Transforms the plain transcript produced by the ASR into a timecoded transcript
Markup Insertion
1. Identification of silence periods (i.e., when the speaker does not speak)
2. Periodic insertion of a unique markup into the silence periods

It is important to find reasonable values for the silence length and for the minimum distance between two consecutive markups, so that the transcript contains no truncated words while still carrying enough timing information.
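The two steps above can be sketched as a single scan over the audio samples. This is a minimal illustration, not the paper's implementation: the amplitude-threshold silence detector and the names `silence_thresh`, `min_silence`, and `min_markup_dist` are all assumptions made for the sketch.

```python
def find_markup_positions(samples, rate, silence_thresh=500,
                          min_silence=0.5, min_markup_dist=4.0):
    """Return the times (in seconds) at which to insert an audio markup.

    A markup is placed inside each sufficiently long silence period,
    provided at least `min_markup_dist` seconds have passed since the
    previous markup -- the two parameters the talk says must be tuned.
    `samples` is a list of integer amplitude values at `rate` samples/s.
    """
    positions = []
    run_start = None          # start time of the current silence run
    last_markup = -min_markup_dist
    for i, s in enumerate(samples):
        t = i / rate
        if abs(s) < silence_thresh:          # sample counts as silence
            if run_start is None:
                run_start = t
            # silence long enough, and far enough from the last markup?
            if (t - run_start >= min_silence
                    and t - last_markup >= min_markup_dist):
                positions.append(t)
                last_markup = t
                run_start = None             # don't reuse this run
        else:
            run_start = None                 # speech resets the run
    return positions
```

Inserting markups only inside detected silences is what guarantees no spoken word is truncated by the markup tone.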
Speech2text

Transcription of the audio stream, coupled with the unique markup, into plain text (including the textual form of the markup).

Any existing automatic speech recognition technology can be used. In the system prototype we used Dragon NaturallySpeaking:
- Support for the Italian language
- Speech-to-text transcription from a digital audio file
- Easy access to the product
- High accuracy (99% for dictation)
Caption Alignment

Inputs:
- The plain transcript produced by Speech2text
- Timing information about where markups have been inserted by the Markup Insertion module

Output: a transcript with timestamps
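The alignment idea can be sketched as follows, assuming the ASR renders each audio markup as a known word: splitting the plain transcript on that word pairs every text segment with the time at which the corresponding markup was inserted. The function and argument names are illustrative, not the paper's.

```python
def align_transcript(plain_text, markup_word, markup_times):
    """Turn the ASR's plain transcript into timecoded captions.

    `markup_word` is the textual form the ASR produces for the audio
    markup; `markup_times` are the insertion times recorded by the
    Markup Insertion module. Returns (start, end, text) triples; the
    last caption has end=None (it runs to the end of the audio).
    """
    segments = [s.strip() for s in plain_text.split(markup_word)]
    captions = []
    start = 0.0
    # the i-th segment ends at the i-th markup time
    for text, end in zip(segments, markup_times + [None]):
        if text:
            captions.append((start, end, text))
        if end is not None:
            start = end
    return captions
```

Because the timing comes from the markups alone, this works with any ASR that transcribes the markup consistently, which is what makes the approach technology-transparent.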
Caption Alignment

Existing solutions (both requiring a high-computation environment):
1. Alignment of a manual transcript with the video
2. Running the ASR twice

Our solution:
- Automatic: based on audio analysis
- Efficient: the ASR runs just once
- Technology-transparent: any ASR can be used
Experimental study
Recordings of different Computer Science and Linguistics professors of the Communication Sciences degree programme of the University of Modena and Reggio Emilia, teaching in front of a live audience.

Goals:
- Tune the parameters used to locate the positions where audio markups are inserted
- Find the most appropriate hardware (microphone) and software (ASR) products for the recording scenario
- Investigate the transcription accuracy
Transcription accuracy
The higher the values of silence length and minimum markup distance, the better the accuracy; these parameters, however, also affect the length of the produced captions.

[Plot: transcription accuracy vs. minimum markup distance (sec)]
Caption length

Desktop threshold = 375 characters (Arial font family, 16 pt).

The higher the values of silence length and minimum markup distance, the longer the captions.
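When a caption grows past a display threshold like the 375-character desktop value above, it can be wrapped at word boundaries. A greedy word-wrap sketch (the function itself is illustrative; only the 375-character threshold comes from the slide):

```python
def split_caption(text, max_chars=375):
    """Split a caption at word boundaries so each piece fits the
    display threshold (e.g. 375 characters for desktop, Arial 16 pt).
    A single word longer than `max_chars` becomes its own piece.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) <= max_chars:
            current = candidate          # word still fits on this piece
        else:
            if current:
                pieces.append(current)   # flush the full piece
            current = word               # start a new piece
    if current:
        pieces.append(current)
    return pieces
```

This is the trade-off the experiments expose: larger silence-length and markup-distance values improve accuracy but produce captions that need more such splits.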
System Prototype
Conclusions
Our system automatically produces a textual transcript of the video lesson's audio, making the remote classroom accessible to hearing-impaired, dyslexic, and motion-impaired students as well as able-bodied students.

Our solution is:
- Automatic
- Efficient
- Technology transparent
Contacts
Supported by
Servizio Accoglienza Studenti Disabili
University of Modena and Reggio Emilia
Further information:
Maria Federico, Ph.D.