Nepali Speech Recognition

Supervisor: Dr. Basanta Joshi. Aavaas Gajurel (068/BCT/501), Anup Pokhrel (068/BCT/505), Manish K. Sharma (068/BCT/523)

Transcript of Nepali Speech Recognition

Page 1: Nepali Speech Recognition

Supervisor: Dr. Basanta Joshi

Aavaas Gajurel (068/BCT/501)

Anup Pokhrel (068/BCT/505)

Manish K. Sharma (068/BCT/523)

Page 2: Nepali Speech Recognition

System Overview

Page 3: Nepali Speech Recognition

System Block Diagram - Training

Noise Reduction

Split Module (VAD Based)

Training Set

MFCC Features

Train HMM

Page 4: Nepali Speech Recognition

System Block Diagram - Recognition

Audio Input

Noise Reduction

Split Module (VAD Based)

MFCC Computation

HMM

Audio Classifier

Language Model

Output

Page 5: Nepali Speech Recognition

SYSTEM DESIGN METHODOLOGY

Page 6: Nepali Speech Recognition

NOISE REDUCTION

Page 7: Nepali Speech Recognition

Creating Noise Profile

FOURIER TRANSFORM: FFT of 32 ms Audio Samples

AVERAGE OVER TIME: $\frac{1}{N}\,[\text{Sum of FFT Components over} > 10 \text{ frames}]$

BUILD NOISE PROFILE: Update the computed Noise Profile
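The noise-profile steps on this slide (FFT of 32 ms frames, then an average over more than 10 frames) could be sketched roughly as follows. This is a minimal illustration, not the project's code: the 16 kHz sample rate, the non-overlapping framing, the use of FFT magnitudes, and the function names are all assumptions.

```python
import numpy as np

def frame_signal(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping the remainder."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def build_noise_profile(noise, sample_rate=16000, frame_ms=32, min_frames=10):
    """Average the FFT magnitude of noise-only audio over its frames."""
    frame_len = int(sample_rate * frame_ms / 1000)      # 512 samples at 16 kHz (assumed rate)
    frames = frame_signal(np.asarray(noise, dtype=float), frame_len)
    if len(frames) <= min_frames:
        raise ValueError("need more than %d noise frames" % min_frames)
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))    # one magnitude spectrum per frame
    return magnitudes.mean(axis=0)                      # (1/N) * sum over the N frames

# Synthetic stand-in for a noise-only recording (1 second).
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(16000)
profile = build_noise_profile(noise)
print(profile.shape)
```

The profile has one value per FFT bin, so it can be compared bin by bin against the spectrum of any later frame of the same length.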

Page 8: Nepali Speech Recognition

Spectral Subtraction

FOURIER TRANSFORM OF SIGNAL: FFT of 32 ms Audio Samples

SUBTRACT NOISE PROFILE (STATIC AND MUSICAL): Over Subtraction, Short Segment Removal

INVERSE FOURIER TRANSFORM: Rebuild the Signal
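A rough sketch of the spectral subtraction steps (FFT of each frame, over-subtraction of the noise profile, inverse FFT) is given below. It is an illustration under stated assumptions: the over-subtraction factor `over_sub`, the spectral `floor`, the 16 kHz rate, and the reuse of the noisy phase are all choices made here, and the short segment removal used against musical noise is not shown.

```python
import numpy as np

def to_frames(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping the remainder."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def spectral_subtract(signal, noise_profile, frame_len=512, over_sub=2.0, floor=0.01):
    """Subtract a scaled noise magnitude profile from each frame's spectrum."""
    frames = to_frames(signal, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    magnitude = np.abs(spectra)
    phase = np.angle(spectra)
    reduced = magnitude - over_sub * noise_profile       # over-subtraction
    reduced = np.maximum(reduced, floor * magnitude)      # keep a small floor, never negative
    rebuilt = np.fft.irfft(reduced * np.exp(1j * phase), n=frame_len, axis=1)
    return rebuilt.reshape(-1)                            # frames back to one signal

# Synthetic example: noise profile from the first half, tone plus noise in the second.
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(16000)
noise_profile = np.abs(np.fft.rfft(to_frames(noise[:8000], 512), axis=1)).mean(axis=0)
t = np.arange(8000) / 16000
noisy = np.sin(2 * np.pi * 440 * t) + noise[8000:]
cleaned = spectral_subtract(noisy, noise_profile)
```

Because every bin magnitude is only ever reduced, the output carries less energy than the input; how much speech survives depends on the chosen factor and floor.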

Page 9: Nepali Speech Recognition

[Figure: Spectral Subtraction output. Panels: Before Noise Removal; After Spectral Subtraction; After Musical Noise Removal]

Page 10: Nepali Speech Recognition

VOICE ACTIVITY DETECTION

Page 11: Nepali Speech Recognition

Voice Activity Detection

Page 12: Nepali Speech Recognition

Voice Activity Detection Process I

SAMPLE: 10 Frame Sampling

COMPUTE mean AND variance

CALCULATE THE TRIGGER: $t_w = \mu + \alpha\,\delta_w$
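The trigger computation can be illustrated with a few lines of Python. This sketch assumes that the mean and variance are taken over a per-frame measure from the first 10 frames, that the second term of the trigger uses the variance, and that `alpha` is a tunable constant; the slide does not give its value.

```python
import numpy as np

def vad_trigger(measures, alpha=0.5, n_init=10):
    """Trigger = mean + alpha * variance of the first n_init frame measures."""
    init = np.asarray(measures[:n_init], dtype=float)
    mu = init.mean()
    delta = init.var()          # assumption: the spread term is the variance
    return mu + alpha * delta

# Hypothetical measures for the 10 sampled frames.
initial_measures = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
trigger = vad_trigger(initial_measures)
print(trigger)
```

For these values the mean is 2.0 and the variance is 0.6, so with `alpha = 0.5` the trigger comes out at 2.3.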

Page 13: Nepali Speech Recognition

Voice Activity Detection Process II

READ THE SAMPLE: Read the frame

COMPUTE CLASSIFICATION MEASURE: $W_{s1}(m) = P_{s1}(m)\,(1 - Z_{s1}(m))\,S_c$

CLASSIFY: If greater than trigger then voice
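The per-frame classification could look roughly like this. The sketch assumes that P is the average power of the frame, that Z is the zero-crossing rate expressed as a fraction between 0 and 1, and that the last factor is a constant scale (1000 here); none of these definitions are spelled out on the slide, and the function names are hypothetical.

```python
import numpy as np

def frame_measure(frame, scale=1000.0):
    """Measure W = P * (1 - Z) * scale for one frame."""
    frame = np.asarray(frame, dtype=float)
    power = np.mean(frame ** 2)                      # P: average frame power (assumed)
    signs = np.signbit(frame)
    zcr = np.mean(signs[1:] != signs[:-1])           # Z: zero crossings per sample, in [0, 1]
    return power * (1.0 - zcr) * scale

def classify_frames(frames, trigger):
    """A frame counts as voice when its measure exceeds the trigger."""
    return [bool(frame_measure(f) > trigger) for f in frames]

# Synthetic frames: low-level noise versus a 200 Hz tone (16 kHz rate assumed).
rng = np.random.default_rng(0)
silence = 0.001 * rng.standard_normal(512)
tone = 0.5 * np.sin(2 * np.pi * 200 * np.arange(512) / 16000)
decisions = classify_frames([silence, tone], trigger=1.0)
print(decisions)
```

Loud frames with few zero crossings score high, while quiet, noise-like frames score close to zero, which is why a single threshold can separate them.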

Page 14: Nepali Speech Recognition

Feature Extraction

Page 15: Nepali Speech Recognition

Audio Feature Extraction

Page 16: Nepali Speech Recognition

Feature Extraction Process I

FRAMING: Divide Audio into Sections of 20 ms-40 ms

CALCULATE PERIODOGRAM ESTIMATE: $P_i(k) = \frac{1}{N}\,|S_i(k)|^2$

APPLY MEL FILTERBANK: Multiply Filterbank (20-40 Filters) by Periodogram Estimate
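The framing, periodogram, and mel filterbank steps can be sketched as below. The concrete numbers are assumptions chosen for illustration: a 16 kHz sample rate, 25 ms frames with a 10 ms step, a 512-point FFT (used as N in the periodogram), and 26 triangular filters.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_audio(signal, frame_len, step):
    """Cut the signal into overlapping frames of frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // step
    index = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    return signal[index]

def periodogram(frames, n_fft):
    """P_i(k) = |S_i(k)|^2 / N for every frame i and bin k."""
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        left, centre, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, centre):
            fbank[j, k] = (k - left) / (centre - left)     # rising edge
        for k in range(centre, right):
            fbank[j, k] = (right - k) / (right - centre)   # falling edge
    return fbank

sample_rate = 16000
rng = np.random.default_rng(0)
audio = rng.standard_normal(sample_rate)                   # 1 s synthetic signal
frames = frame_audio(audio, frame_len=400, step=160)       # 25 ms frames, 10 ms step
power = periodogram(frames, n_fft=512)
energies = power @ mel_filterbank(26, 512, sample_rate).T  # one energy per filter per frame
print(energies.shape)
```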

Page 17: Nepali Speech Recognition

Feature Extraction Process II

SCALING: Take Logarithm of Filterbank Energies

DISCRETE COSINE TRANSFORM OF ENERGIES: Take DCT of Coefficients of Above Step

KEEP REQUIRED COEFFICIENTS: Keep Required Number of Coefficients
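The remaining MFCC steps (logarithm, DCT, truncation) might be sketched as follows. The orthonormal DCT-II written out here, the small `eps` guard against log of zero, and the choice of 13 kept coefficients are assumptions; the slide only says to keep the required number.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))   # basis[k, m]
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc_from_energies(energies, n_coeffs=13, eps=1e-10):
    """Log filterbank energies -> DCT -> keep the first n_coeffs coefficients."""
    log_energies = np.log(np.asarray(energies, dtype=float) + eps)
    return dct_ii(log_energies)[:, :n_coeffs]

# Hypothetical filterbank energies: 98 frames, 26 filters.
rng = np.random.default_rng(0)
energies = rng.random((98, 26)) + 0.1
mfcc = mfcc_from_energies(energies)
print(mfcc.shape)
```

Only the lower coefficients are kept because they describe the broad spectral envelope, while the higher ones mostly track fine detail.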

Page 18: Nepali Speech Recognition

Language Model

Page 19: Nepali Speech Recognition

Using Language Model

Page 20: Nepali Speech Recognition

Language Model Training

Page 21: Nepali Speech Recognition

Language Model Based Classification

READ PREVIOUS WORD

GET POSSIBLE CANDIDATES: From Acoustic Model

SELECT BEST: $\hat{P}(W_n \mid W_{n-1}) = \lambda_1\,P(W_n \mid W_{n-1}) + \lambda_2\,P(W_n)$
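The interpolated bigram score on this slide can be illustrated with a small count-based model. This is only a sketch: the weights 0.7 and 0.3, the toy corpus of transliterated words, and the function names are assumptions, and the acoustic scores of the candidates are ignored here.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def interpolated_prob(word, prev, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """lam1 * P(word | prev) + lam2 * P(word), from relative frequencies."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam1 * p_bi + lam2 * p_uni

def select_best(prev, candidates, unigrams, bigrams):
    """Pick the candidate with the highest interpolated probability."""
    return max(candidates, key=lambda w: interpolated_prob(w, prev, unigrams, bigrams))

# Toy corpus (illustrative words only).
corpus = [["kalo", "biralo"], ["kalo", "kukur"], ["kalo", "biralo"], ["seto", "kukur"]]
unigrams, bigrams = train_bigram(corpus)
best = select_best("kalo", ["biralo", "kukur"], unigrams, bigrams)
print(best)
```

The unigram term keeps a candidate from scoring zero when its bigram with the previous word was never seen in training.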

Page 22: Nepali Speech Recognition

ACOUSTIC MODEL

Page 23: Nepali Speech Recognition

HMM Based Classification

Page 24: Nepali Speech Recognition

Training the Acoustic Model

READ MFCC COEFFICIENTS AND WORD

SELECT HMM MODEL

TRAIN USING BAUM WELCH ALGORITHM
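One Baum-Welch re-estimation step for a single word model could be sketched as below. The sketch assumes a discrete-observation HMM, that is, MFCC frames already mapped to symbol indices by some quantiser that is not shown, and it trains on a single sequence; the model sizes, initial parameters, and function names are hypothetical.

```python
import numpy as np

def forward_backward(obs, start, trans, emit):
    """Scaled forward and backward passes; returns alpha, beta and the scales."""
    n_steps, n_states = len(obs), len(start)
    alpha = np.zeros((n_steps, n_states))
    beta = np.ones((n_steps, n_states))
    scale = np.zeros(n_steps)
    alpha[0] = start * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n_steps):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(n_steps - 2, -1, -1):
        beta[t] = (trans @ (emit[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(obs, start, trans, emit):
    """One EM update; also returns the log-likelihood under the old parameters."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(obs, start, trans, emit)
    gamma = alpha * beta                                  # state posteriors per step
    xi = (alpha[:-1, :, None] * trans[None, :, :]
          * (emit[:, obs[1:]].T * beta[1:])[:, None, :]
          / scale[1:, None, None])                        # transition posteriors
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.zeros_like(emit)
    for symbol in range(emit.shape[1]):
        new_emit[:, symbol] = gamma[obs == symbol].sum(axis=0)
    new_emit /= gamma.sum(axis=0)[:, None]
    return gamma[0], new_trans, new_emit, np.log(scale).sum()

# Hypothetical quantised observation sequence for one word (3 symbols, 2 states).
obs = np.array([0, 0, 1, 0, 2, 2, 1, 2, 0, 0, 1, 2, 2, 2, 0, 1])
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
log_likelihoods = []
for _ in range(10):
    start, trans, emit, loglik = baum_welch_step(obs, start, trans, emit)
    log_likelihoods.append(loglik)
```

Each step should leave the log-likelihood of the training sequence no lower than before, which is a convenient sanity check while training.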

Page 25: Nepali Speech Recognition

Using the Acoustic Model

READ MFCC COEFFICIENTS OF WORD

FIND LOG PROBABILITY OF WORD FOR EACH MODEL

SELECT MODEL WITH MAXIMUM PROBABILITY

OUTPUT WORD CORRESPONDING TO MODEL
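Scoring a word against every trained model and picking the maximum can be sketched with the forward algorithm in the log domain. As an assumption, the models here are discrete-observation HMMs fed with quantised MFCC symbols; the two toy models, their parameters, and the word labels are made up for illustration.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain for a discrete-observation HMM."""
    log_trans = np.log(trans)
    log_emit = np.log(emit)
    log_alpha = np.log(start) + log_emit[:, obs[0]]
    for symbol in obs[1:]:
        # Sum over previous states in the log domain, then add the emission term.
        scores = log_alpha[:, None] + log_trans
        log_alpha = np.logaddexp.reduce(scores, axis=0) + log_emit[:, symbol]
    return np.logaddexp.reduce(log_alpha)

def recognise(obs, models):
    """Return the word whose model gives the highest log probability."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))

# Two toy word models sharing start and transition parameters.
start = np.array([0.9, 0.1])
trans = np.array([[0.7, 0.3], [0.1, 0.9]])
models = {
    "biralo": (start, trans, np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]])),
    "kukur": (start, trans, np.array([[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]])),
}
word = recognise([0, 0, 1, 0], models)
print(word)
```

Working with log probabilities avoids the underflow that plain products of many small probabilities would cause on longer words.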

Page 26: Nepali Speech Recognition

RESULTS

Page 27: Nepali Speech Recognition

Trained vs. Untrained Input

• 3 Speakers

• 5X10 Words Each

• 5 Testing Sets Each

[Bar chart: Accuracy of System (%), Using Trained and Untrained Input. Trained Set: 86.67; Untrained Set: 66.67]

Page 28: Nepali Speech Recognition

Noise Reduced vs. Not Noise Reduced

• 3 Speakers

• 5X10 Words Each

• Untrained Input Files for Testing

• 5 Testing Sets Each

[Bar chart: Accuracy of System (%), Effect of Noise Reduction. Noise Not Reduced: 46.67; Noise Reduced: 66.67]

Page 29: Nepali Speech Recognition

Gender Based Results

• 7 Speakers

• 3 Females, 4 Males

• Animal Names as Test

• Untrained Input Files for Testing

[Bar chart: Gender Based Result. Groups: Female Voice Training; Male Voice Training; Female and Male Voice Training. Legend: Male; Female; Male + Female. Bar values in extracted order: 36, 64, 59, 66, 44, 56, 51, 54, 58]

Page 30: Nepali Speech Recognition

LIMITATIONS AND RECOMMENDATIONS

Page 31: Nepali Speech Recognition

Limitations

Limited Vocabulary

User Specific Noise Profiles

Static MFCC Coefficients Only

Training Data Storage Absent

Non-Continuous Recognition

Page 32: Nepali Speech Recognition

Recommendations

Using Dynamic Coefficients

Continuous HMM Model

Extensive Training

Better Phonemic Modeling

Dynamic Noise Modeling

Page 33: Nepali Speech Recognition

USAGE SCENARIO

Page 34: Nepali Speech Recognition

Usage Scenario I

Easy Nepali Input

Automated Telecom Assistance

Speech Controlled Interface

Automated Transcribing

Page 35: Nepali Speech Recognition

Usage Scenario II

Military Sector for Automated Wire Tapping

Public Guidance System

Automated User Support (banks, corporate houses, etc.)

Page 36: Nepali Speech Recognition

Thank You!