W4A 2012-Federico-Furini_AutomaticCaptioning

Enhancing Learning Accessibility through Fully Automatic Captioning
Maria Federico, Servizio Accoglienza Studenti Disabili, Università di Modena e Reggio Emilia
Marco Furini, Dipartimento di Comunicazione ed Economia, Università di Modena e Reggio Emilia
W4A 2012, Lyon, April 17, 2012



Page 2

The traditional learning scenario

[Diagram: the lesson's video and audio reach able-bodied students and disabled students (hearing-impaired, dyslexic, motion-impaired), both in the classroom and remotely.]

Traditional solutions:
- Sign interpreters
- Stenographers
- Student note takers
- Respeaking

Page 3

An accessible learning scenario

[Diagram: OUR SYSTEM applies automatic speech transcription to the lesson's video and audio, in the classroom or remote, and delivers a textual transcript to disabled students (hearing-impaired, dyslexic, motion-impaired) and able-bodied students alike.]

Page 4

System Architecture

An architecture for automatically producing video lesson captions, based on automatic speech recognition (ASR) technologies.

A novel caption alignment mechanism that:
- introduces unique audio markups into the audio stream before it is transcribed by an ASR;
- transforms the plain transcript produced by the ASR into a timecoded transcript.

Page 5

Markup Insertion

1. Identification of silence periods (i.e., when the speaker is not speaking)

2. Periodic insertion of a unique markup within silence periods

The silence length and the minimum distance between two consecutive markups must be set to reasonable values, so that no words are truncated in the transcript and enough timing information is available.
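The two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: audio is mono, given as float samples in [-1, 1]; a frame counts as silent when its RMS energy falls below a threshold; each markup goes at the midpoint of a long-enough silence; and the markup itself is a caller-supplied audio clip. The function names and the parameters `threshold`, `frame_ms`, `min_silence` (silence length) and `min_distance` (minimum markup distance) are hypothetical.

```python
# Illustrative sketch; all names and parameters here are hypothetical.
import math
from typing import List, Sequence, Tuple


def find_silences(samples: Sequence[float], rate: int, threshold: float = 0.01,
                  min_silence: float = 0.5, frame_ms: int = 20) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of silences lasting at least min_silence.

    Assumption: a frame is silent when its RMS energy is below threshold.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    silences: List[Tuple[float, float]] = []
    start = None  # start time of the silence currently being tracked
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = i * frame_len / rate
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_silence:
                silences.append((start, t))
            start = None
    # A silence may run until the end of the audio.
    end = n_frames * frame_len / rate
    if start is not None and end - start >= min_silence:
        silences.append((start, end))
    return silences


def choose_markup_times(silences: Sequence[Tuple[float, float]],
                        min_distance: float) -> List[float]:
    """Greedily pick at most one markup per silence, keeping consecutive
    markups at least min_distance seconds apart. Markups sit mid-silence
    so that no word is cut."""
    times: List[float] = []
    for start, end in silences:
        mid = (start + end) / 2
        if not times or mid - times[-1] >= min_distance:
            times.append(mid)
    return times


def insert_markups(samples: Sequence[float], rate: int, times: Sequence[float],
                   markup: Sequence[float]) -> List[float]:
    """Splice the markup clip into the audio at each (ascending) chosen time.

    The returned times are not changed: they stay on the original timeline,
    which is what the alignment step needs to timestamp the captions.
    """
    out: List[float] = []
    prev = 0
    for t in times:
        idx = int(round(t * rate))
        out.extend(samples[prev:idx])
        out.extend(markup)
        prev = idx
    out.extend(samples[prev:])
    return out
```

Raising `min_silence` keeps markups away from short pauses, while raising `min_distance` yields fewer markups and therefore less timing information; this is the balance the two parameters must strike.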

Page 6

Speech2text

Transcription of the audio stream, together with the inserted unique markups, into plain text (including the textual form of each markup)

Any existing automatic speech recognition technology can be used

In the system prototype we used Dragon NaturallySpeaking:
- Support for the Italian language
- Speech-to-text transcription of digital audio files
- Easy access to the product
- High accuracy (99% for dictation)

Page 7

Caption Alignment

[Diagram: the plain transcript produced by Speech2text, combined with the timing information about where the Markup Insertion Module inserted the markups, yields a transcript with timestamps.]
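The alignment step can be sketched as follows, assuming the ASR renders every markup as the same known word (here the hypothetical token `MARKUP`) and that the Markup Insertion Module supplies the markup times on the original timeline. Each text segment between two consecutive markups is stamped with those two times. The function names are hypothetical, and SubRip (SRT) output is an illustrative choice, not necessarily the prototype's format.

```python
# Illustrative sketch; function names and the markup token are hypothetical.
import re
from typing import List, Sequence, Tuple

Caption = Tuple[float, float, str]  # (start seconds, end seconds, text)


def align_captions(transcript: str, markup_token: str,
                   markup_times: Sequence[float], duration: float) -> List[Caption]:
    """Split the plain ASR transcript at each markup word and timestamp the pieces."""
    pattern = r"\b" + re.escape(markup_token) + r"\b"
    segments = re.split(pattern, transcript, flags=re.IGNORECASE)
    if len(segments) != len(markup_times) + 1:
        # The ASR dropped or invented a markup, so segments cannot be matched to times.
        raise ValueError(f"expected {len(markup_times)} markups, "
                         f"found {len(segments) - 1} in transcript")
    bounds = [0.0, *markup_times, duration]
    captions: List[Caption] = []
    for i, seg in enumerate(segments):
        text = " ".join(seg.split())  # normalise whitespace
        if text:  # skip empty segments, e.g. two markups with no speech between them
            captions.append((bounds[i], bounds[i + 1], text))
    return captions


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timecode HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(captions: Sequence[Caption]) -> str:
    """Render timestamped captions as an SRT document."""
    blocks = [f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
              for n, (start, end, text) in enumerate(captions, start=1)]
    return "\n".join(blocks)
```

If the ASR misrecognizes a markup, the number of segments no longer matches the number of markup times, and the sketch raises an error instead of silently shifting every later caption.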

Page 8

Caption Alignment

Existing solutions:

1. Alignment of a manual transcript with the video

2. Running the ASR twice, which requires a high-computation environment

Our solution:
- Automatic: based on audio analysis
- Efficient: the ASR runs only once
- Technology transparent: any ASR can be used
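To make "the ASR runs only once" and "any ASR can be used" concrete, here is a minimal sketch of how the three modules could be wired together. The `SpeechRecognizer` protocol, the `caption_pipeline` function, and the `FakeRecognizer` stand-in are hypothetical; a real engine such as Dragon NaturallySpeaking would sit behind its own adapter, whose API is not shown here.

```python
# Illustrative sketch; the interface and all names are hypothetical.
from typing import Callable, List, Protocol, Sequence, Tuple

Caption = Tuple[float, float, str]


class SpeechRecognizer(Protocol):
    """Any ASR engine that turns audio samples into plain text."""

    def transcribe(self, samples: Sequence[float], rate: int) -> str: ...


def caption_pipeline(
    samples: Sequence[float],
    rate: int,
    recognizer: SpeechRecognizer,
    mark: Callable[[Sequence[float], int], Tuple[List[float], List[float]]],
    align: Callable[[str, List[float]], List[Caption]],
) -> List[Caption]:
    """Markup insertion, then a single ASR pass, then caption alignment."""
    marked_audio, markup_times = mark(samples, rate)
    transcript = recognizer.transcribe(marked_audio, rate)  # the only ASR call
    return align(transcript, markup_times)


class FakeRecognizer:
    """Stand-in engine for illustration: returns canned text and counts calls."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.calls = 0

    def transcribe(self, samples: Sequence[float], rate: int) -> str:
        self.calls += 1
        return self.text
```

Swapping ASR products only means passing a different object with a `transcribe` method; the markup insertion and alignment modules stay untouched.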

Page 9

Experimental study

Several Computer Science and Linguistics professors of the Communication Sciences degree program at the University of Modena and Reggio Emilia, teaching in front of a live audience. Goals:

- To tune the parameters used to locate where audio markups are inserted

- To find the most appropriate hardware (microphone) and software (ASR) products for the recording setup

- To investigate the transcription accuracy

Page 10

Transcription accuracy

The higher the silence length and the minimum markup distance, the better the transcription accuracy; however, these parameters also affect the length of the produced captions.

[Chart: transcription accuracy against minimum markup distance (sec).]
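The slides do not state which accuracy metric was used. A common choice, shown here only as an assumption, is word accuracy, 1 − WER, where the word error rate (WER) is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function names are hypothetical.

```python
# Illustrative sketch of one possible accuracy metric; names are hypothetical.
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, with WER = edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, this word accuracy can drop below zero for very noisy output.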

Page 11

Caption length

[Chart: caption length; desktop threshold = 375 characters, Arial font family, 16 pt.]

The higher the silence length and the minimum markup distance, the longer the captions.
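Pages 10 and 11 describe a trade-off: raising the silence length and the minimum markup distance improves accuracy but lengthens captions, which should fit within the 375-character desktop threshold. A minimal sketch of choosing parameters under that constraint follows, assuming each tested (silence length, minimum markup distance) pair has already been measured for accuracy and longest caption length. `pick_parameters` and the shape of `results` are hypothetical, and no measured values from the study are implied.

```python
# Illustrative sketch; the function name and data layout are hypothetical.
from typing import Dict, Optional, Tuple

Params = Tuple[float, float]     # (silence length in s, minimum markup distance in s)
Measurement = Tuple[float, int]  # (transcription accuracy, longest caption in chars)

DESKTOP_THRESHOLD = 375  # characters, Arial 16 pt (from the slide)


def pick_parameters(results: Dict[Params, Measurement],
                    max_chars: int = DESKTOP_THRESHOLD) -> Optional[Params]:
    """Return the most accurate setting whose longest caption still fits on screen.

    Ties on accuracy are broken in favour of shorter captions. Returns None
    when no setting satisfies the length constraint.
    """
    feasible = [(acc, -longest, params)
                for params, (acc, longest) in results.items()
                if longest <= max_chars]
    if not feasible:
        return None
    return max(feasible)[2]
```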

Page 12

System Prototype

[Figure: system prototype (1024x80)]

Page 13

Conclusions

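[Diagram: the accessible learning scenario, in which OUR SYSTEM turns the lesson's video and audio, in the classroom or remote, into a textual transcript for disabled students (hearing-impaired, dyslexic, motion-impaired) and able-bodied students.]

The system is:
- Automatic
- Efficient
- Technology transparent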

Page 14

Contacts

Supported by

Servizio Accoglienza Studenti Disabili

University of Modena and Reggio Emilia

Further information:

Maria Federico, Ph.D.

[email protected]