Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic...
Transcript of Automatic Speech Recognition for Dialectal ArabicAutomatic Speech Recognition for Dialectal Arabic...
Automatic Speech Recognition for
Dialectal Arabic
PhD Examination
Mohamed Ahmed ElmahdyAugust 26, 2011
Dialogue Systems Group
Institute of Information Technology
University of Ulm, Germany
Page 2
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Largest still living Semitic language
250+ million native speakers
Arabic
Formal Dialectal
Modern Standard Arabic (MSA)
Standardized
A lot of ASR research
Not used in everyday life
Used in everyday life
Not standardized (mainly spoken)
Many different dialects
Very few ASR research
Significant differences between MSA and Dialectal Arabic
Considered as completely different languages
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Arabic Language
Page 3
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
MSA Versus Dialectal Arabic
Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect
Phonological
/t/, /s/ in ECA instead of /T/ in MSAe.g. /tala:tah/ (three) in ECA versus /Tala:Tah/ in MSA
Lexical
/t‘ArAbE:zA/ (table) in ECA versus /t‘awila/ in MSA
Syntactic
SVO in ECA versus VSO in MSA
Page 4
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Automatic Speech Recognition
High level diagram for a state-of-the-art ASR system
FeatureExtraction
Speech
DecoderFeatures
Words
SpeechCorpus
Acoustic Model
LanguageModel
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
For dialectal Arabic, sparse and low quality speech corpora are available
)()|(maxarg^
WPWOPWLW
)|( WOP )(WP
^
W
Training
O
Page 5
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Thesis Objectives
Acoustic modeling for dialectal Arabic where little speech data exists
To benefit from existing MSA speech data to improve dialectal Arabic speech
recognition
Acoustic modeling for dialectal Arabic where phonetic transcription is not
possible
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 6
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Outline
Introduction
Previous work
Approaches
Experiments and results
Conclusions and future directions
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 7
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Previous Work
Data Pooling Acoustic Modeling
A speech data pool of MSA and Egyptian Colloquial Arabic (ECA) is used to train the acoustic model
By adding more MSA data, less contribution of ECA data
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Data Pool
ECAcorpus
MSAcorpus
Training
Training
Kirchhoff, K., and Vergyri, D. (2005) Cross-Dialectal Data Sharing For Acoustic Modeling in Arabic Speech Recognition. Speech Communication, vol. 46(1), pp. 37-51
~3% Relative reduction in WER
Baselineacousticmodel
Finalacousticmodel
Page 8
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Outline
Introduction
Previous work
Approaches
Experiments and results
Conclusions and future directions
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 9
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Proposed Approaches for Dialectal Arabic ASR
Phonemic acoustic modeling
→ Dialectal speech data where phonetic transcription is available
Graphemic acoustic modeling
Unsupervised acoustic modeling
Arabic Chat Alphabet-based acoustic modeling
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 10
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonemic Cross-Lingual Acoustic Modeling
Benefit from existing large MSA speech corpora
Assumptions:
MSA is always a 2nd language for any Arabic speaker
Large amount of MSA speech data (large number of speakers) implicitly cover all the acoustic features of the different Arabic dialects
Approach:
Train an acoustic model using a large amount of MSA speech data
Adaptation of the MSA acoustic models with a little amount of dialectal speech data
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 11
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonemic Cross-Lingual Acoustic Modeling (cont.)
State-of-the-art AM adaptation techniques include:
Maximum Likelihood Linear Regression (MLLR)
Maximum A-Posteriori (MAP)
Requirement: adaptation data and the AM have to share the same
language and phoneme set
Egyptian Colloquial Arabic (ECA) is chosen as a typical dialect
INITIALLY: MSA and ECA do not share the same phoneme inventory
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
MSA ECA
)()|(maxarg
POPMAP
bAMLLR
Acoustic model adaptation is not possible
Page 12
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonemic Cross-Lingual Acoustic Modeling (cont.)
SOLUTION: Phoneme sets normalization
AM adaptation is possible
Phoneme sets normalization
Several phone mapping rules are applied
Map ECA phonemes to their origins in MSA
(even if they are acoustically different)
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
MSA
MSA
ECA
ECA
Normalizationphone
mapping rulesECA
MSA
/b/ /g/ /j/ /e/ /i/ /o/ /u/ /t/ …….
/b/ /dZ/ /i/ /u/ /t/ ………
جزر
(carrot) /g/ /A/ /z/ /A/ /r/ /dZ/ /a/ /z/ /a/ /r/
Page 13
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonemic Cross-Lingual Acoustic Modeling (cont.)
Block diagram for the proposed approach
The adapted ECA AM is evaluated against the ECA baseline AM
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
ECA corpus
MSAcorpus
PhonemesNormalization
PhonemesNormalization
NormalizedMSA
corpus
NormalizedECA
corpus
Training MSAacousticmodel
ECA baselinemodel
Training
MLLR adaptation
MAP adaptation
ECA final
model
Page 14
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Proposed Approaches for Dialectal Arabic ASR
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Phonemic acoustic modeling
→ Dialectal speech data where phonetic transcription is available
Graphemic acoustic modeling
→ Phonetic transcription is not possible/difficult
→ Short vowels are missing
→ Phonetic transcription is approximated to be word letters
Unsupervised acoustic modeling
→ Transcriptions are not available at all
→ Dialectal speech was automatically transcribed using a MSA model
Arabic Chat Alphabet-based acoustic modeling
→ Latin letters are used instead of Arabic ones
→ Include short vowels that are missing in traditional Arabic orthography
Page 15
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Outline
Introduction
Previous work
Approaches
Experiments and results
Conclusions and future directions
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 16
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Speech Corpora
Egyptian Colloquial Arabic (ECA) corpus
→ High quality read speech
→ Accurate phonetic transcription
→ Different speech domains to ensure
good acoustic features coverage:
time/date, spelling, restaurant,
numbers, reservation, etc
Levantine Colloquial Arabic (LCA) corpus
→ Spontaneous speech
→ No phonetic transcription
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Modern Standard Arabic (MSA) corpus
→ News broadcast speech
→ Accurate phonetic transcription
Page 17
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonemic Cross-Lingual Adaptation Results
ECA corpus:
→ 65% for training/adaptation
→ 35% for testing
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Word Error Rate (WER)
N
DelInsSubWER
0.00
5.00
10.00
15.00
20.00
25.00
30.00
ECA Phonemic AM
ECA baseline
MSA only
MSA+ECA data pooling
MSA+ECA adaptationWE
R (
%)
41.8% Relative reduction in WER
Page 18
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Effect of MSA Speech Data Amount
Varying the amount of MSA speech data
Effect on phonemic cross-lingual adaptation
0
2
4
6
8
10
12
14
16
18
0.5 1 2 4 8 16 32
MSA+ECA adaptation
MSA speech amount (hours)
WE
R (
%)
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Consistent decrease in WER
Page 19
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Graphemic Cross-Lingual Adaptation Results
Tested on two Arabic dialects:
→ Egyptian Colloquial Arabic (ECA)
→ Levantine Colloquial Arabic (LCA)
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
ECA graphemic AM LCA graphemic AM
Dialect baseline
MSA only
MSA+dialect adaptation
WE
R (
%)
Relative reduction in WER
22.5% for ECA
15.9% for LCA
Page 20
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Outline
Introduction
Previous work
Approaches
Experiments and results
Conclusions and future directions
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 21
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Conclusions and Future Directions
Conclusions
→ Supervised/unsupervised cross-lingual acoustic modeling for dialectal
Arabic
→ Improvements are observed in both phonemic and graphemic modeling
→ Consistent reduction in WER by adding more MSA data
→ The approach can be extended to other dialects e.g. Levantine
→ Novel technique using the Arabic Chat Alphabet for phonetic
transcription
Future directions
→ Extension to all the Arabic dialects
→ Cross-lingual language modeling for dialectal Arabic using existing
MSA text resources
→ Automatic dialect recognition
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 22
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Selected Publications
1. M. Elmahdy, R. Gruhn, S. Abdennadher, and W. Minker. Rapid Phonetic Transcription
using Everyday Life Natural Chat Alphabet Orthography for Dialectal Arabic Speech
Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP), Prague, Czech republic, 2011
2. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Cross-Lingual Acoustic Modeling
for Dialectal Arabic Speech Recognition. International Conference on Speech and
Language Processing (Interspeech), Makuhari, Japan, 2010
3. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Effect of Gaussian Densities and
Amount of Training Data on Grapheme-Based Acoustic Modeling for Arabic. IEEE
International Conference on Natural Language Processing and Knowledge Engineering
(IEEE NLP-KE), Dalian, China, 2009
4. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Modern Standard Arabic Based
Multilingual Approach for Dialectal Arabic Speech Recognition. International Symposium on
Natural Language Processing (SNLP), Bankok, Thailand, 2009
5. M. Elmahdy, R. Gruhn, W. Minker, and S. Abdennadher. Survey on common Arabic
language forms from a speech recognition point of view. International Conference on
Acoustics (NAG-DAGA), Netherlands, 2009
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 23
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Conclusions and Future Directions
Conclusions
→ Supervised/unsupervised cross-lingual acoustic modeling for dialectal
Arabic
→ Improvements are observed in both phonemic and graphemic modeling
→ Consistent reduction in WER by adding more MSA data
→ The approach can be extended to other dialects e.g. Levantine
→ Novel technique using the Arabic Chat Alphabet for phonetic
transcription
Future directions
→ Extension to all the Arabic dialects
→ Cross-lingual language modeling for dialectal Arabic using existing
MSA text resources
→ Automatic dialect recognition
Thank you for your attention
Introduction | Previous work | Approaches | Experiments and results | Conclusions and future directions
Page 24
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Arabic Dialects Classification
Backup slides
Page 25
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonetic Transcription in English
Backup slides
Page 26
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Phonetic Transcription in Arabic
Diacritics (short vowels) are not written (ambiguity)
High OOV rate due to morphological complexity
Backup slides
Page 27
Automatic Speech Recognition for Dialectal Arabic | Mohamed Elmahdy | Germany | Ulm | August 26, 2011
Unsupervised Acoustic Modeling
Backup slides