Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor
description
Transcript of Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor
![Page 1: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/1.jpg)
MIT Lincoln Laboratory
Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell
Processor
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
{nmalyska, smohindra, karen.lauro, reynolds, kepner}@ll.mit.edu
This work is sponsored by the United States Air Force under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
![Page 2: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/2.jpg)
MIT Lincoln Laboratory
Outline
• Introduction
• Recognition for speech applications using GMMs
• Parallel implementation of the GMM
• Performance model
• Conclusions and future work
![Page 3: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/3.jpg)
MIT Lincoln Laboratory
IntroductionAutomatic Recognition Systems
• In this presentation, we will discuss technology that can be applied to different kinds of recognition systems
– Language recognition– Dialect recognition– Speaker recognition
Who is the speaker?
What dialect arethey using?
What language arethey speaking?
![Page 4: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/4.jpg)
MIT Lincoln Laboratory
IntroductionThe Scale Challenge
• Speech processing problems are often described as one person interacting with a single computer system and receiving a response
![Page 5: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/5.jpg)
MIT Lincoln Laboratory
IntroductionThe Scale Challenge
• Real speech applications, however, often involve data from multiple talkers and use multiple networked multicore machines
– Interactive voice response systems– Voice portals– Large corpus evaluations with hundreds of hours of data
InformationAbout Speaker,
Dialect, or Language
![Page 6: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/6.jpg)
MIT Lincoln Laboratory
IntroductionThe Computational Challenge
• Speech-processing algorithms are computationally expensive
• Large amounts of data need to be available for these applications– Must cache required data efficiently so that it is quickly available
• Algorithms must be parallelized to maximize throughput– Conventional approaches focus on parallel solutions over multiple
networked computers– Existing packages not optimized for high-performance-per-watt
machines with multiple cores, required in embedded systems with power, thermal, and size constraints
– Want highly-responsive “real-time” systems in many applications, including in embedded systems
![Page 7: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/7.jpg)
MIT Lincoln Laboratory
Outline
• Introduction
• Recognition for speech applications using GMMs
• Parallel implementation of the GMM
• Performance model
• Conclusions and future work
![Page 8: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/8.jpg)
MIT Lincoln Laboratory
Recognition SystemsSummary
• A modern language, dialect, or speaker recognition system is composed of two main stages
– Front-end processing– Pattern recognition
• We will show how a speech signal is processed by modern recognition systems
– Focus on a recognition technology called Gaussian mixture models
Speech Front EndPattern
Recognition
Decision on theidentity, dialect,
or language of speaker
![Page 9: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/9.jpg)
MIT Lincoln Laboratory
Recognition SystemsFrame-Based Processing
• The first step in modern speech systems is to convert incoming speech samples into frames
• A typical frame rate for a speech stream is 100 frames per second
Speech FramesSpeech Samples
…
Frame NumberTime
![Page 10: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/10.jpg)
MIT Lincoln Laboratory
Recognition SystemsFrame-Based Processing
• The first step in modern speech systems is to convert incoming speech samples into frames
• A typical frame rate for a speech stream is 100 frames per second
Speech FramesSpeech Samples
…
Frame NumberTime
![Page 11: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/11.jpg)
MIT Lincoln Laboratory
Recognition SystemsFrame-Based Processing
• The first step in modern speech systems is to convert incoming speech samples into frames
• A typical frame rate for a speech stream is 100 frames per second
Speech FramesSpeech Samples
…
Frame NumberTime
![Page 12: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/12.jpg)
MIT Lincoln Laboratory
Recognition SystemsFrame-Based Processing
• The first step in modern speech systems is to convert incoming speech samples into frames
• A typical frame rate for a speech stream is 100 frames per second
Speech FramesSpeech Samples
…
Frame NumberTime
![Page 13: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/13.jpg)
MIT Lincoln Laboratory
Recognition SystemsFrame-Based Processing
• The first step in modern speech systems is to convert incoming speech samples into frames
• A typical frame rate for a speech stream is 100 frames per second
Speech Frames
Frame Number
Speech Samples
…
Time
![Page 14: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/14.jpg)
MIT Lincoln Laboratory
Recognition SystemsFront-End Processing
• Front-end processing converts observed speech frames into an alternative representation, features
– Lower dimensionality– Carries information relevant to the problem
Speech Frames Feature Vectors
Front End
Frame Number Feature Number
1 2{ , , , }KX x x x
1x 2x 3x 4xDim 2
Dim 1
![Page 15: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/15.jpg)
MIT Lincoln Laboratory
Recognition SystemsPattern Recognition Training
• A recognition system makes decisions about observed data based on a knowledge of past data
• During training, the system learns about the data it uses to make decisions
– A set of features are collected from a certain language, dialect, or speaker
Training Features
1x 2x 3x 4xDim 2
Dim 1
![Page 16: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/16.jpg)
MIT Lincoln Laboratory
Recognition SystemsPattern Recognition Training
• A recognition system makes decisions about observed data based on a knowledge of past data
• During training, the system learns about the data it uses to make decisions
– A set of features are collected from a certain language, dialect, or speaker
– A model is generated to represent the data
Training Features
Dim 1Dim 2
Dim 1Dim 2
Model
1x2x
( )p x
![Page 17: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/17.jpg)
MIT Lincoln Laboratory
Recognition SystemsGaussian Mixture Models
• A Gaussian mixture model (GMM) represents features as the weighted sum of multiple Gaussian distributions
• Each Gaussian state i has a– Mean – Covariance– Weight
Dim 1Dim 2
iμ
i
iw
Model
( | )p x
![Page 18: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/18.jpg)
MIT Lincoln Laboratory
Recognition SystemsGaussian Mixture Models
Parameters iμ
i
iw
Dim 1Dim 2
( )p x
![Page 19: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/19.jpg)
MIT Lincoln Laboratory
Recognition SystemsGaussian Mixture Models
Model States
Parameters
Dim 1Dim 2
( )p x
![Page 20: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/20.jpg)
MIT Lincoln Laboratory
Recognition SystemsLanguage, Speaker, and Dialect Models
Model States
Languages,Dialects,
or Speakers
Parameters
1Model
2Model
3Model
( | )Cp x
Dim 1Dim 2
In LID, DID, and SID,we train a set of target models
for each dialect, language, or speakerC
![Page 21: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/21.jpg)
MIT Lincoln Laboratory
Recognition SystemsUniversal Background Model
Model States
Parameters
( | )Cp x
We also train a universal backgroundmodel representing all speechC
Model C
Dim 1Dim 2
![Page 22: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/22.jpg)
MIT Lincoln Laboratory
Recognition SystemsHypothesis Test
• Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
0
1
: is from the hypothesized class
: is not from the hypothesized classtest
test
H X
H X
1 2{ , , , }test KX x x x
Dim 1Dim 2
![Page 23: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/23.jpg)
MIT Lincoln Laboratory
Recognition SystemsHypothesis Test
• Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
0
1
: is from the hypothesized class
: is not from the hypothesized classtest
test
H X
H X
1 2{ , , , }test KX x x x
0 ?H
1 ?H
1( | )p x
( | )Cp x
Dim 1Dim 2
Dim 1Dim 2
Dim 1Dim 2
![Page 24: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/24.jpg)
MIT Lincoln Laboratory
Recognition SystemsHypothesis Test
• Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
1 2{ , , , }test KX x x x
1( | )p x
( | )Cp x
Dim 1Dim 2
Dim 1Dim 2
Dim 1Dim 2
English?
Not English?
![Page 25: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/25.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Ratio Score
• We determine which hypothesis is true using the ratio:
• We use the log-likelihood ratio score to decide whether an observed speaker, language, or dialect is the target
( ) log[ ( | )] log[ ( | )]C CX p X p X
threshold, generated by ( )
threshold, generated by C
C
XX
X
00
01
threshold, accept ( | )threshold, reject ( | )
Hp X HHp X H
![Page 26: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/26.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Computation
• The observation log-likelihood given a model is:
( | )p x
Dim 1Dim 2 Dim 1Dim 2
log[ ( | )]?p X
![Page 27: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/27.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Computation
• The observation log-likelihood given a model is:
11 12
1 1
log[ ( | )] log exp ( ) ( )K M
Ti i i iK
i
p X C
x μ x μ
Dot product
( | )p x
Dim 1Dim 2 Dim 1Dim 2
log[ ( | )]?p X
![Page 28: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/28.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Computation
• The observation log-likelihood given a model is:
11 12
1 1
log[ ( | )] log exp ( ) ( )K M
Ti i i iK
i
p X C
x μ x μ
( | )p x
Dim 1Dim 2 Dim 1Dim 2
log[ ( | )]?p X
Constant derived fromweight and covariance
![Page 29: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/29.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Computation
• The observation log-likelihood given a model is:
11 12
1 1
log[ ( | )] log exp ( ) ( )K M
Ti i i iK
i
p X C
x μ x μ
( | )p x
Dim 1Dim 2 Dim 1Dim 2
log[ ( | )]?p X
Table lookup used tocompute this function
![Page 30: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/30.jpg)
MIT Lincoln Laboratory
Recognition SystemsLog-Likelihood Computation
• The observation log-likelihood given a model is:
11 12
1 1
log[ ( | )] log exp ( ) ( )K M
Ti i i iK
i
p X C
x μ x μ
( | )p x
Dim 1Dim 2 Dim 1Dim 2
log[ ( | )]?p X
Sum over all K features
![Page 31: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/31.jpg)
MIT Lincoln Laboratory
Outline
• Introduction
• Recognition for speech applications using GMMs
• Parallel implementation of the GMM
• Performance model
• Conclusions and future work
![Page 32: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/32.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMSummary
• We have developed an algorithm to perform GMM scoring on the Cell processor
• This scoring stage of pattern recognition is where much of the time is spent in current systems
• This section:– Describes the Cell Broadband Engine architecture– Summarizes the strengths and limitations of the Cell– Discusses step-by-step the algorithm we developed for GMM
scoring on the Cell
![Page 33: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/33.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMCell Architecture
• The Cell Broadband Engine has leading performance-per-watt specifications in its class
• Synergistic processing elements (SPEs)
– 256KB of local store memory– 25.6 GFLOPs per SPE– SIMD instructions
• PowerPC processor element (PPE)
• PPE and multiple SPEs operate in parallel and communicate via a high-speed bus
– 12.8e9 bytes/second (one way)
• Each SPE can transfer data from main memory using DMA
– PPE can effectively “send” data to the SPEs using this method
PPESPE 0 SPE N
Main Memory
High-Speed Bus
![Page 34: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/34.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMCell Design Principles
• Limitations of the Cell processor– Size of local store is small—only 256KB– All SPE data must explicitly be transferred in and out of local
store– The PPE is much slower than the SPEs
• Solutions to maximize throughput– Do computations on SPEs when possible– Minimize time when SPEs are idle– Keep commonly-used data on SPEs to avoid cost of
transferring to local store
![Page 35: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/35.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm: Background Scoring
• Begin with a background model and a single feature vector
1x
( | )Cp x
Dim 1Dim 2
On PPE
On PPE
![Page 36: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/36.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 1
• Broadcast the background model to the SPEs
– 616K model is split across SPEs since it will not fit on single SPE
– Kept on SPEs throughout scoring procedure
To SPE 0( | )Cp x
Dim 1Dim 2
To SPE 7
![Page 37: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/37.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 2
• Broadcast copy of the feature vector to the SPEs
x
To SPE 0
…
To SPE 7
![Page 38: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/38.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 3
• Score the feature vector against the background model states on each SPE
0log[ ( | )]
Cp x
SPE 0
SPE 7
7log[ ( | )]
Cp x
x
![Page 39: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/39.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 4
• Move the background scores to the PPE and aggregate
0log[ ( | )]
Cp x
7log[ ( | )]
Cp x
On SPEs
On PPE
7
0
log[ ( | )] log exp log[ ( | )]C Cr
r
p p
x x
![Page 40: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/40.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm: Target Scoring
• Begin with a target model and keep the single feature vector on the SPEs
x
( | )Cp x
Dim 1Dim 2
On SPEs
On PPE
![Page 41: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/41.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 5
• Distribute target model states to the SPEs
– Only a subset of states need to be scored (called Gaussian short-lists)
( | )Cp x
Dim 1Dim 2
To SPE 0
To SPE 7
![Page 42: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/42.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 6
• Score feature vectors against target models
0log[ ( | )]Cp xSPE 0
SPE 7
7log[ ( | )]Cp x
x
![Page 43: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/43.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMMAlgorithm Step 7
• Collect target scores from SPEs and aggregate
0log[ ( | )]Cp x 7log[ ( | )]Cp xOn SPEs
On PPE
7
0
log[ ( | )] log exp log[ ( | )]C Crr
p p
x x
![Page 44: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/44.jpg)
MIT Lincoln Laboratory
Parallel Implementation of the GMM Implementation Challenges
• We have begun implementing our algorithm on the Cell processor
• Implementing vectorization is a challenge– Concentrate on optimizing dot product and aggregation
algorithms
• Designing data transfers is another challenging problem– Subdividing and distributing the models to minimize transfer time– Timing transfers so that they overlap with computation (double
buffering)
11 12
1 1
log[ ( | )] log exp ( ) ( )K M
Ti i i iK
i
p X C
x μ x μ
![Page 45: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/45.jpg)
MIT Lincoln Laboratory
Outline
• Introduction
• Recognition for speech applications using GMMs
• Parallel implementation of the GMM
• Performance model
• Conclusions and future work
![Page 46: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/46.jpg)
MIT Lincoln Laboratory
Performance ModelCell Resources
PPESPE 0 SPE N
Main Memory
High-Speed Bus
8 SPEs used(25.6 GFLOPs each)
12.8e9 bytes per sec
![Page 47: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/47.jpg)
MIT Lincoln Laboratory
Performance ModelData Structures
Model States (2048)
Targets (10)
Parameters (77)Feature Vectors
1 2{ , , , }KX x x x
1x 2x 3x 4xUniversal Background Model
Target Models
Parameters (77)
Model States (2048)
38 Dimensional
100 Features per SecondFor Each Stream
![Page 48: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/48.jpg)
MIT Lincoln Laboratory
100 Features per SecondFor Each Stream
Performance ModelData Structures
Model States (2048)
Targets (10)
Parameters (77)Feature Vectors
1 2{ , , , }KX x x x
1x 2x 3x 4xUniversal Background Model
Target Models
Parameters (77)
Model States (2048)
38 Dimensional
152 Bytes Per Frame
![Page 49: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/49.jpg)
MIT Lincoln Laboratory
Performance ModelData Structures
Model States (2048)
Targets (10)
Parameters (77)Feature Vectors
1 2{ , , , }KX x x x
1x 2x 3x 4xUniversal Background Model
Target Models
Parameters (77)
Model States (2048)
38 Dimensional
6 MB for all TargetsOnly 5 States (15 KB)
Used Per Frame
100 Features per SecondFor Each Stream
![Page 50: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/50.jpg)
MIT Lincoln Laboratory
Performance ModelData Structures
Model States (2048)
Targets (10)
Parameters (77)Feature Vectors
1 2{ , , , }KX x x x
1x 2x 3x 4xUniversal Background Model
Target Models
Parameters (77)
Model States (2048)
38 Dimensional
616KB for Entire UBM77 KB Resides on Each SPE
100 Features per SecondFor Each Stream
![Page 51: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/51.jpg)
MIT Lincoln Laboratory
Performance ModelSimulation and Measurements
Computational Efficiency (Percent)
Con
curr
ent
Re
al-T
ime
Spe
ech
Str
eam
s
![Page 52: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/52.jpg)
MIT Lincoln Laboratory
Performance ModelSimulation and Measurements
Computational Efficiency (Percent)
Increasingoptimization
Con
curr
ent
Re
al-T
ime
Spe
ech
Str
eam
s
![Page 53: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/53.jpg)
MIT Lincoln Laboratory
Performance ModelSimulation and Measurements
Con
curr
ent
Re
al-T
ime
Spe
ech
Str
eam
s
Computational Efficiency (Percent)
Communication and synchronization more
important here
![Page 54: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/54.jpg)
MIT Lincoln Laboratory
Performance ModelSimulation and Measurements
• The effect of increasing the number of speakers, dialects, or languages (targets) was simulated
– Changing the number of targets varies the amount of data sent to SPEs and the amount of calculation per SPE
20% computational efficiency50% data-transfer efficiency
Con
curr
ent
Re
al-T
ime
Spe
ech
Str
eam
s
Number of Speakers, Dialects, or Languages
![Page 55: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/55.jpg)
MIT Lincoln Laboratory
Outline
• Introduction
• Recognition for speech applications using GMMs
• Parallel implementation of the GMM
• Performance model
• Conclusions and future work
![Page 56: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/56.jpg)
MIT Lincoln Laboratory
ConclusionsSummary
• Language, dialect, and speaker recognition systems are large in scale and will benefit from parallelization due to their need for high throughput
• GMM scoring is expensive both in terms of computing resources and memory
• We have designed and modeled an algorithm to perform GMM scoring in an efficient way
– Preserving often-used data on the SPEs– Performing most calculations on the SPEs
![Page 57: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/57.jpg)
MIT Lincoln Laboratory
ConclusionsFuture Work
• Optimization and measurement of the full algorithm to validate the model
• Compare our system against other state-of-the-art serial and parallel approaches
– Intel single processor– Intel multicore– Intel networked– Cell PPE
• Our results will become part of the PVTOL library
![Page 58: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/58.jpg)
MIT Lincoln Laboratory
Acknowledgement
• Cliff Weinstein
• Joe Campbell
• Alan McCree
• Tom Quatieri
• Sharon Sacco
![Page 59: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/59.jpg)
MIT Lincoln Laboratory
Backup
![Page 60: Language, Dialect, and Speaker Recognition Using Gaussian Mixture Models on the Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062519/56814c4f550346895db95d2a/html5/thumbnails/60.jpg)
MIT Lincoln Laboratory
Gaussian Mixture ModelEquation
• A Gaussian mixture model (GMM) represents features as the weighted sum of multiple Gaussian distributions
• Each Gaussian state i has a– Mean – Covariance– Weight
Dim 1Dim 2
iμ
i
iw
1/ 2/ 2
112(2 )
1
( | ) exp ( ) ( )iD
i
Mw T
i i ii
p
x x μ x μ
Model
( | )p x