Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition...
![Page 1: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/1.jpg)
Age and Gender Recognition
from Speech Patterns Based on
Supervised Non-Negative Matrix Factorization
July 2011
Mohamad Hasan Bahari
Hugo Van hamme
![Page 2: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/2.jpg)
Outline
• Introduction and Motivations
• Age and Gender Recognition
• Corpora
• Supervised Non-negative Matrix Factorization
• Proposed Method
• Results
• Conclusions and Future Research
![Page 3: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/3.jpg)
Introduction
• Confirming the identity of individuals
• Biometric characteristics
  • Fingerprint
  • Face
  • Iris
  • Hand geometry
  • Ear shape
  • Voice pattern
  • …
• Choosing a characteristic
  • Availability
  • Reliability
![Page 4: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/4.jpg)
Motivation
• In many real-world cases, only speech patterns are available (kidnapping, threatening calls, …)
• Speech patterns can carry a lot of useful information:
  • Gender
  • Age
  • Dialect (original or previous regions)
  • Membership of a particular social group
  • …
To help identify a criminal
To narrow down the number of suspects
![Page 5: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/5.jpg)
Goal
Goal:
To extract different physical and psychological characteristics of the speaker from his/her voice patterns (Speaker Profiling).
Physical:
1. Gender
2. Age
3. Accent
4. …
Psychological:
1. Anxiousness
2. Stress
3. Confidence
4. …
![Page 6: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/6.jpg)
Age and Gender Recognition
Three approaches:
I. Directly from the speech signal.
II. Modeling the speech generation system.
III. Modeling the hearing system.
![Page 7: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/7.jpg)
Age and Gender Recognition
I. Directly from the speech signal.
• Different acoustic features vary with age:
  1) Fundamental frequency
  2) Speech rate
  3) Sound pressure level
  4) …
• The approach: find all acoustic features that vary with age and their exact relation to the speaker's age.
• Conceptually simple and computationally inexpensive
x These features are affected by many other parameters, such as weight, height, voice quality, emotional condition, …
![Page 8: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/8.jpg)
Age and Gender Recognition
Effect of age and gender on speech (fundamental frequency) [1]
• Age is only one of the inputs affecting speech and, consequently, the acoustic features.
• It is impossible to estimate age without considering the other inputs.
• Perceptions of gender and age have a significant mutual impact on each other.
[1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.
![Page 9: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/9.jpg)
Age and Gender Recognition
II. Modeling the speech generation system.
• It is an input estimation problem.
x Modeling the speech generation system of the speaker is very difficult.
![Page 10: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/10.jpg)
Age and Gender Recognition
III. Modeling the hearing system
• To solve the speech recognition problem, the hearing system is modeled using Hidden Markov Models (HMMs).
• This approach reuses the tools applied in speech recognition problems (HMMs).
• Well established.
• Accurate in recognizing content.
x There is a difference between the perceived age of a speaker and their actual age.
x Computationally complex
![Page 11: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/11.jpg)
Corpora
• 555 speakers from the N-best evaluation corpus [1]
• The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium
• Different age groups and genders

| Category Name | Young Male | Young Female | Middle Male | Middle Female | Senior Male | Senior Female |
|---|---|---|---|---|---|---|
| Age | 18-35 | 18-35 | 36-45 | 36-45 | 46-81 | 46-81 |
| Number of Speakers | 85 | 53 | 160 | 41 | 191 | 25 |

[1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In proc. Interspeech, pp. 2571-2574, 2009.
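The class imbalance in the speaker counts matters when judging a classifier. As a minimal sketch, the snippet below derives each category's prior (its share of the 555 speakers) and the accuracy of a trivial classifier that always predicts the largest category. The variable names are illustrative and not taken from the slides.

```python
# Speaker counts per category, copied from the corpus table.
counts = {
    "Young Male": 85,
    "Young Female": 53,
    "Middle Male": 160,
    "Middle Female": 41,
    "Senior Male": 191,
    "Senior Female": 25,
}

total = sum(counts.values())  # total number of speakers

# Prior of a category = its share of all speakers.
priors = {name: n / total for name, n in counts.items()}

# Baseline: always predict the most frequent category.
majority_class = max(priors, key=priors.get)
majority_baseline = priors[majority_class]

for name, p in priors.items():
    print(f"{name:14s} {100 * p:5.1f}%")
print(f"majority baseline: {majority_class} ({100 * majority_baseline:.1f}%)")
```

Any reported age-group accuracy is best compared against this majority baseline rather than against a uniform one-in-six guess.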
![Page 12: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/12.jpg)
SNMF
• Non-negative Matrix Factorization (NMF) is a popular machine learning algorithm [1]
• It can be used in supervised or unsupervised mode.
• Supervised NMF (SNMF) is a pattern recognition method [1]
• It is very effective when the input space is high-dimensional.
• It is a generative classifier.
• It can directly classify patterns into multiple classes (no need to recast the problem as multiple binary classifications).
[1] H. Van hamme, In proc. Interspeech, Australia, pp. 2554-2557, 2008.
![Page 13: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/13.jpg)
SNMF
Problem Statement:
Given a training data set: $S_{tr} = \{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N)\}$
$x_n$ is a vector of observed characteristics for the $n$-th data item
$y_n$ is a label vector that represents the class $x_n$ belongs to
Goal:
Approximate a classifier function $g$ such that $\hat{y} = g(x^{tst})$ is as close as possible to the true label.
$x^{tst}$ is an unseen observation
![Page 14: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/14.jpg)
SNMF
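SNMF in Training Phase:
First step:
$$V_S^{tr} = [\,y_1 \cdots y_N\,], \qquad V_B^{tr} = [\,x_1 \cdots x_N\,], \qquad V^{tr} = \begin{bmatrix} V_S^{tr} \\ V_B^{tr} \end{bmatrix}$$
Second step:
$$V^{tr} = \begin{bmatrix} V_S^{tr} \\ V_B^{tr} \end{bmatrix} \approx \begin{bmatrix} W_S^{tr} \\ W_B^{tr} \end{bmatrix} H^{tr} = W^{tr} H^{tr}$$
Extended Kullback-Leibler divergence:
$$D_{KL}\left(V^{tr} \,\|\, W^{tr}H^{tr}\right) = \sum_{m,n} \left[ V^{tr}_{mn} \log \frac{V^{tr}_{mn}}{(W^{tr}H^{tr})_{mn}} - V^{tr}_{mn} + (W^{tr}H^{tr})_{mn} \right] + \rho \sum_{z,n} H^{tr}_{zn}$$
Multiplicative updating formula (products $\circ$ and fractions are element-wise):
$$W^{tr} \leftarrow W^{tr} \circ \frac{\left[\dfrac{V^{tr}}{W^{tr}H^{tr}}\right] (H^{tr})^{T}}{\mathbf{1}_{M \times N}\, (H^{tr})^{T}}, \qquad H^{tr} \leftarrow H^{tr} \circ \frac{(W^{tr})^{T} \left[\dfrac{V^{tr}}{W^{tr}H^{tr}}\right]}{(W^{tr})^{T}\, \mathbf{1}_{M \times N} + \rho}$$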
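The training phase can be sketched in NumPy as follows. This is a minimal illustration of KL-divergence multiplicative updates on stacked label and feature matrices, not the authors' implementation. The function names, the rank, the iteration count, the random initialization, the small `eps` guard against division by zero, and the toy data are all assumptions made for this example.

```python
import numpy as np

def train_snmf(V_S, V_B, rank, rho=0.0, n_iter=200, eps=1e-12, seed=0):
    """Factorize V = [V_S; V_B] ~ [W_S; W_B] H with KL multiplicative updates.

    V_S: (C, N) non-negative label matrix, one column per training item.
    V_B: (D, N) non-negative feature matrix, one column per training item.
    rank, rho, n_iter, eps and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    V = np.vstack([V_S, V_B])           # labels stacked on top of features
    M, N = V.shape
    W = rng.random((M, rank)) + eps     # random positive initialization
    H = rng.random((rank, N)) + eps
    ones = np.ones((M, N))

    for _ in range(n_iter):
        # H <- H * (W^T [V / WH]) / (W^T 1 + rho)
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.T @ ones + rho + eps)
        # W <- W * ([V / WH] H^T) / (1 H^T)
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (ones @ H.T + eps)

    C = V_S.shape[0]
    return W[:C], W[C:], H              # W_S, W_B, H

def extended_kl(V, WH, H, rho=0.0, eps=1e-12):
    """Extended KL divergence D(V || WH) plus the rho-weighted sum of H."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH) + rho * H.sum())

# Toy data (hypothetical): 30 items, 20 features, 3 classes.
rng = np.random.default_rng(1)
V_B = rng.random((20, 30))
labels = rng.integers(0, 3, size=30)
V_S = np.eye(3)[:, labels]              # one-hot label vectors as columns

W_S, W_B, H = train_snmf(V_S, V_B, rank=5)
V = np.vstack([V_S, V_B])
print("divergence after training:", extended_kl(V, np.vstack([W_S, W_B]) @ H, H))
```

Splitting the learned dictionary back into its label part `W_S` and feature part `W_B` is what makes the factorization usable as a classifier: only `W_B` is needed to explain an unseen observation, and `W_S` then maps the resulting activations to label scores.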
![Page 15: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/15.jpg)
SNMF
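SNMF in Testing Phase:
First step:
$$x^{tst} \approx W_B^{tr} H^{tst}$$
Second step:
$$\hat{y} = g(x^{tst}) = W_S^{tr} H^{tst}$$
Combined:
$$\hat{y} = g(x^{tst}) = W_S^{tr} \, \arg\min_{H^{tst}} D_{KL}\left(x^{tst} \,\|\, W_B^{tr} H^{tst}\right)$$
Extended Kullback-Leibler divergence:
$$D_{KL}\left(x^{tst} \,\|\, W_B^{tr}H^{tst}\right) = \sum_{m} \left[ x^{tst}_{m} \log \frac{x^{tst}_{m}}{(W_B^{tr}H^{tst})_{m}} - x^{tst}_{m} + (W_B^{tr}H^{tst})_{m} \right] + \rho \sum_{z} H^{tst}_{z}$$
Multiplicative updating formula (products $\circ$ and fractions are element-wise):
$$H^{tst} \leftarrow H^{tst} \circ \frac{(W_B^{tr})^{T} \left[\dfrac{x^{tst}}{W_B^{tr}H^{tst}}\right]}{(W_B^{tr})^{T}\, \mathbf{1}_{M \times 1} + \rho}$$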
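The testing phase can be sketched as follows: the feature dictionary is held fixed, the activation vector for the unseen observation is estimated with the multiplicative update, and the label dictionary turns the activations into class scores. The function name `classify_snmf`, the toy dictionaries, the iteration count and the `eps` guard are assumptions for illustration only.

```python
import numpy as np

def classify_snmf(x, W_S, W_B, rho=0.0, n_iter=200, eps=1e-12):
    """Estimate h for an unseen x with W_B fixed, then score labels with W_S."""
    M, Z = W_B.shape
    h = np.full(Z, 1.0 / Z)                  # flat positive starting point
    denom = W_B.T @ np.ones(M) + rho + eps   # constant across iterations
    for _ in range(n_iter):
        # h <- h * (W_B^T [x / (W_B h)]) / (W_B^T 1 + rho)
        h *= (W_B.T @ (x / (W_B @ h + eps))) / denom
    y_hat = W_S @ h                          # label activations
    return y_hat, h

# Toy dictionaries (hypothetical numbers): two basis vectors, one per class.
W_B = np.array([[0.7, 0.1],
                [0.2, 0.1],
                [0.1, 0.8]])
W_S = np.array([[1.0, 0.0],
                [0.0, 1.0]])

# Built as 0.9 * first basis vector + 0.1 * second basis vector.
x_tst = np.array([0.64, 0.19, 0.17])

y_hat, h_tst = classify_snmf(x_tst, W_S, W_B)
predicted_class = int(np.argmax(y_hat))
print("activations:", h_tst, "predicted class:", predicted_class)
```

Taking the arg max of the label activations is one simple decision rule; the slides do not state which rule was used to turn the estimated label vector into a class.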
![Page 16: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/16.jpg)
Proposed Method
1. Feature selection
2. Acoustic modeling
3. Supervector making procedure
4. Training phase
5. Testing phase
![Page 17: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/17.jpg)
Proposed Method
1. Feature selection
• Mel spectra
• Mean normalization
• Vocal tract length normalization
• Augmented with their first- and second-order time derivatives.
Speech Signal → Feature selection → Feature Vectors …
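Two of the steps listed for this stage, mean normalization and augmentation with time derivatives, can be sketched in NumPy. This is a simplified illustration: the derivatives here are plain finite differences via `numpy.gradient`, which is an assumption (the slides do not say how the derivatives were computed), and vocal tract length normalization is not covered. The input is assumed to be a matrix of frame-level features with shape `(T, D)`.

```python
import numpy as np

def mean_normalize(features):
    """Subtract the per-coefficient mean over time. features: (T, D)."""
    return features - features.mean(axis=0, keepdims=True)

def delta(features):
    """Time derivative by finite differences along the frame axis."""
    return np.gradient(features, axis=0)

def add_derivatives(features):
    """Append first- and second-order time derivatives to each frame."""
    d1 = delta(features)
    d2 = delta(d1)
    return np.hstack([features, d1, d2])

# Toy input (hypothetical): 4 frames, 3 coefficients, each rising linearly in time.
frames = np.arange(12, dtype=float).reshape(4, 3)
augmented = add_derivatives(mean_normalize(frames))
print(augmented.shape)
```

Each output frame is three times as wide as the input frame: the normalized coefficients, then their first derivatives, then their second derivatives.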
![Page 18: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/18.jpg)
Proposed Method
2. Acoustic modeling
Speaker-Independent Model → Speaker Adaptation Method → Model of the Speaker
Speaker-independent model:
• An HMM with a shared pool of 49740 Gaussians to model the observations in 3873 cross-word context-dependent tied triphone states.
Adaptation method:
• The speaker-dependent mixture weights for each speaker result from re-estimating the speaker-independent weights, based on a forced alignment of that speaker's training data using the speaker-independent acoustic model.
The result of this step is 555 speaker-adapted models.
![Page 19: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/19.jpg)
Proposed Method
3. Supervector making procedure
The Gaussian Mixture Model (GMM) of each speaker-adapted HMM is:
$$f_s(o_t) = \sum_{j=1}^{J} w_j^{s} \, \mathcal{N}\left(o_t;\, \mu_j^{s}, \Sigma_j^{s}\right)$$
Three types of supervectors:
1. Means
2. Variances
3. Weights
Weights supervectors:
$$\lambda^{s} = [\,w_1^{s} \cdots w_q^{s} \cdots w_Q^{s}\,]^{T}, \qquad \chi_n = [\,(\lambda^{1})^{T} \cdots (\lambda^{s})^{T} \cdots (\lambda^{S})^{T}\,]^{T}$$
The result of this step is 555 supervectors, one for each of the 555 speakers.
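Building a weight supervector amounts to concatenating a speaker's mixture-weight vectors into one long vector, and the supervectors of all speakers then form the columns of a feature matrix. The sketch below assumes the weights are grouped into one vector per model state. That grouping, the function name and the toy numbers are illustrative assumptions, not details confirmed by the slides.

```python
import numpy as np

def weight_supervector(state_weights):
    """Concatenate per-state mixture-weight vectors into one supervector."""
    return np.concatenate([np.asarray(w, dtype=float) for w in state_weights])

# Hypothetical toy speaker: 3 states with 4 mixture weights each.
speaker_weights = [
    [0.1, 0.2, 0.3, 0.4],
    [0.25, 0.25, 0.25, 0.25],
    [0.7, 0.1, 0.1, 0.1],
]
x = weight_supervector(speaker_weights)

# One supervector per speaker, stacked as columns of a feature matrix.
speakers = [speaker_weights, speaker_weights[::-1]]
V_B = np.column_stack([weight_supervector(s) for s in speakers])
print(x.shape, V_B.shape)
```

Mixture weights are non-negative by construction, so a matrix built this way satisfies the non-negativity requirement of the factorization without further processing.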
![Page 20: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/20.jpg)
Proposed Method
4. Training phase
5. Testing phase
![Page 21: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/21.jpg)
Results
Evaluation Methodology
• 5-fold cross-validation (five independent runs)
• In each of the five runs:
  • The training set is the speech data of 444 speakers
  • The testing set is the speech data of 111 speakers
Database split, Run 1: TST TR TR TR TR
Database split, Run 2: TR TST TR TR TR
…
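The evaluation protocol can be sketched as a speaker-level 5-fold split, so that each speaker appears in the test set of exactly one run. The random shuffling and the seed below are assumptions; the slides do not say how speakers were assigned to folds (for example, whether the folds were stratified by category).

```python
import numpy as np

def five_fold_splits(n_speakers=555, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs of speaker indices, one pair per run."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_speakers)       # shuffle speakers once
    folds = np.array_split(order, n_folds)    # 5 folds of 111 speakers each
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

splits = list(five_fold_splits())
for run, (train, test) in enumerate(splits, start=1):
    print(f"Run {run}: {len(train)} training speakers, {len(test)} test speakers")
```

Splitting by speaker rather than by utterance keeps all speech from one person on the same side of the split, which is what the 444/111 speaker counts imply.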
![Page 22: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/22.jpg)
Results
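Gender recognition accuracy is 96%.
Age group recognition, relative confusion matrix:

| AC \ CL | YM | YF | MM | MF | SM | SF |
|---|---|---|---|---|---|---|
| YM | 13 | 03 | 58 | 0 | 26 | 0 |
| YF | 02 | 77 | 04 | 11 | 057 | 0 |
| MM | 06 | 01 | 44 | 01 | 47 | 0 |
| MF | 0 | 54 | 02 | 24 | 17 | 02 |
| SM | 03 | 01 | 19 | 0 | 76 | 0 |
| SF | 0 | 2 | 08 | 28 | 28 | 16 |

| Category Name | Young Male | Young Female | Middle Male | Middle Female | Senior Male | Senior Female |
|---|---|---|---|---|---|---|
| Prior (%) | 15 | 10 | 29 | 7 | 34 | 4 |
| Accuracy (%) | 13 | 77 | 44 | 24 | 76 | 16 |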
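To compare the per-category accuracies against chance in a single number, they can be weighted by the number of speakers in each category. The sketch below combines the accuracy row with the speaker counts from the Corpora slide. It assumes each reported accuracy applies to all speakers of its category, and the inputs are already rounded to whole percentages, so the output is only an approximation and not a figure reported by the authors.

```python
# Order: Young Male, Young Female, Middle Male, Middle Female, Senior Male, Senior Female
counts = [85, 53, 160, 41, 191, 25]        # speakers per category (Corpora slide)
accuracy_pct = [13, 77, 44, 24, 76, 16]    # per-category accuracy row, in percent

total = sum(counts)

# Speaker-weighted average of the per-category accuracies.
overall_pct = sum(n * a for n, a in zip(counts, accuracy_pct)) / total

# Chance level taken as always predicting the largest category.
majority_pct = 100 * max(counts) / total

print(f"approximate overall accuracy: {overall_pct:.1f}%")
print(f"majority-class baseline:      {majority_pct:.1f}%")
```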
![Page 23: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/23.jpg)
Conclusions and Future Research
Conclusions:
1. A new age-gender recognition method based on SNMF was proposed
2. Supervectors of GMM weights were used as features
3. The method was evaluated on the N-Best corpus
4. Gender recognition accuracy is 96%
5. Age group recognition accuracy is significantly higher than chance level
Future Research:
1. Age estimation instead of age group recognition.
2. Using supervectors of GMM means and variances, and combining these features
![Page 24: Age and Gender Recognition from Speech Patterns … · · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011](https://reader030.fdocuments.net/reader030/viewer/2022021513/5b02c0147f8b9ad85d900c33/html5/thumbnails/24.jpg)
Thank You for Your Attention