Voice source characterisation
Gerrit Bloothooft, UiL-OTS, Utrecht University
Emasters School Leuven 2002 Voice Source Characterization 2
Voice research
To describe and model the properties of the vocal sound source from the viewpoints of:
– Physiology
– Acoustics
– Perception
Importance of the voice
• Speech synthesis
– Towards natural-sounding synthesis
• Speech recognition
– Using source properties in recognition
• Speaker recognition/identification
– Voice source characteristics are essential
• Diagnosis
– Pathologies, voice classifications
Voice possibilities
Limited use of voice in speech
• Range of the fundamental frequency
• Vocal intensity range
• Spectral variation
Focus in this presentation
How do acoustic voice source characteristics vary as a function of F0 and vocal intensity?
Voice profile measurement
Thirties: intensity range as a function of pitch
– manual measurement
Eighties: automatic computation of F0 and intensity
– computer measurement
– visual feedback
– additional parameters
Measurement unit
• One decibel
• One semitone
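A measurement unit of one semitone by one decibel implies that every (F0, intensity) reading falls into one cell of a grid. The sketch below shows one way such a mapping could look. The reference pitch `REF_HZ` and the function names are assumptions made for illustration; they are not taken from the slides.

```python
import math

# Assumed reference pitch for the semitone scale (55 Hz, the note A1).
# Any fixed reference would do; it only shifts the cell numbering.
REF_HZ = 55.0


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert F0 in Hz to semitones above ref_hz (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def profile_cell(f0_hz, level_db, ref_hz=REF_HZ):
    """Return the (semitone, dB) grid cell that one measurement falls into."""
    semitone = math.floor(hz_to_semitones(f0_hz, ref_hz))
    decibel = math.floor(level_db)
    return (semitone, decibel)


print(profile_cell(110.0, 72.4))  # (12, 72): one octave above the reference
```

Flooring rather than rounding is a design choice here: it makes each cell cover the half-open range from its index up to the next one.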
Measurement procedure
• Subject in front of computer screen
• Microphone on headset (30 cm)
• Just phonate, sing, and see the result immediately
• Best results with recording protocol
• Feedback stimulates extreme phonations
Voice profile / density
[Figure: voice profile shaded by sample density; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Voice profile / speech area
[Figure: voice profile with the speech area, shaded by sample density; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Acoustic voice quality parameters
• Jitter
– Stability of periodicity
– Asymmetry in vocal folds
• Crest factor
– Max amplitude divided by average energy
– Relates to spectral slope
• Many more …
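The two parameters named above can be illustrated with a short sketch. It assumes the common textbook definitions, which the slides do not spell out: crest factor as peak amplitude over RMS amplitude, expressed in dB, and jitter as the mean absolute difference between consecutive period lengths relative to the mean period, in percent. The function names are hypothetical, and a silent frame (RMS of zero) is not handled.

```python
import numpy as np


def crest_factor_db(frame):
    """Crest factor of a signal frame in dB (assumed: peak over RMS)."""
    frame = np.asarray(frame, dtype=float)
    peak = np.max(np.abs(frame))
    rms = np.sqrt(np.mean(frame ** 2))
    return 20.0 * np.log10(peak / rms)


def local_jitter_percent(periods):
    """Jitter in percent (assumed: mean absolute period-to-period
    difference divided by the mean period)."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)


# A pure sine has a crest factor of about 3 dB.
t = np.arange(1000) / 1000.0
print(round(crest_factor_db(np.sin(2 * np.pi * 10 * t)), 2))  # 3.01

# Perfectly regular periods give zero jitter.
print(local_jitter_percent([10.0, 10.0, 10.0]))  # 0.0
```

Under these definitions a sine wave sits just above 3 dB, and a signal with sharper peaks relative to its RMS level scores higher.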
Crest factor
[Figure: voice profile shaded by crest factor; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Jitter
[Figure: voice profile shaded by jitter, on a scale from regular to irregular; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Real time presentation
Screen presentation
• One data point per F0-I cell
Advanced data storage [new]
• Full audio signal
• Full distribution of data per F0-I cell
• Data for screen presentation
Advantages
• Reusability of recordings
• Statistical analysis per F0-I cell
• Study of time-varying behavior
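Keeping the full distribution per F0-I cell, rather than a single displayed point, is what makes per-cell statistics possible. The following is a minimal sketch of such a store. The class name `CellStore`, its methods, and the choice of the median as the on-screen value are assumptions for illustration, not a description of the actual system.

```python
from collections import defaultdict
from statistics import median


class CellStore:
    """Hypothetical store that keeps every parameter value per F0-I cell."""

    def __init__(self):
        # Maps a (semitone, dB) cell to the list of values observed there.
        self._values = defaultdict(list)

    def add(self, cell, value):
        """Record one parameter value (e.g. a crest factor) for a cell."""
        self._values[cell].append(value)

    def summary(self, cell):
        """Return simple statistics for a cell, or None if it is empty."""
        values = self._values.get(cell, [])
        if not values:
            return None
        return {
            "n": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }

    def screen_value(self, cell):
        """One data point per cell for display (assumed: the median)."""
        stats = self.summary(cell)
        return None if stats is None else stats["median"]


store = CellStore()
for crest in (5.0, 7.0, 6.0):
    store.add((12, 72), crest)
print(store.summary((12, 72)))  # {'n': 3, 'median': 6.0, 'min': 5.0, 'max': 7.0}
```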
Crest factor
[Figure: voice profile shaded by crest factor; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Median smoothing of crest factor
[Figure: crest factor, median smoothed; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
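Median smoothing over the voice profile grid can be sketched as below. The slides do not state the window size or how empty cells are treated, so the 3 by 3 neighbourhood (`radius=1`) and the choice to leave empty cells (NaN) empty are assumptions.

```python
import numpy as np


def median_smooth(grid, radius=1):
    """Replace each filled cell by the median of its filled neighbours.

    grid: 2-D array indexed as [intensity, F0]; empty cells are NaN.
    radius: half-width of the square window (assumed 1, i.e. 3 x 3).
    """
    grid = np.asarray(grid, dtype=float)
    out = np.full_like(grid, np.nan)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if np.isnan(grid[r, c]):
                continue  # keep cells without data empty
            window = grid[max(0, r - radius):r + radius + 1,
                          max(0, c - radius):c + radius + 1]
            out[r, c] = np.nanmedian(window)
    return out


# A single outlier in an otherwise uniform profile is removed.
crest = np.full((3, 3), 5.0)
crest[1, 1] = 50.0
print(median_smooth(crest)[1, 1])  # 5.0
```

A median is used rather than a mean because it suppresses isolated outlier cells without blurring the boundaries between regions as much.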
Vocal Registers
Different movement patterns of the vocal folds
• Pulse register (creaky voice)
• Modal register
• Falsetto register
Pulse register
• Less than 50 Hz
• Irregular
• Long closed period
[Figure: pulse register in the voice profile; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Modal register
• “Normal” use of voice
• Active role of M. Vocalis
• Vocal folds thick and completely vibrating
• Wide range in F0 and intensity
• Flat spectrum
[Figure: modal register in the voice profile; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Falsetto register
• Higher pitches
• M. Vocalis passive, vocal ligaments tensed by M. Cricothyroideus
• Edge vibration of vocal folds
• Sound poor in higher harmonics (in untrained subjects)
[Figure: falsetto register in the voice profile; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Register overlap
[Figure: overlap of the registers in the voice profile; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Chest and head voice
Refer to secondary vibratory sensations in the body
• Chest voice: loud modal register
• Head voice:
– men: higher, softer modal register in overlap area with falsetto register
– women: falsetto register
Chest voice and head voice
[Figure: chest and head voice areas in the voice profile, labelled "chest" and "head"; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Registers and voice profiles
With a description using
• Iso-crest factor lines
• Iso-jitter lines
Iso-crest factor lines
[Figure: voice profile shaded by crest factor, with iso-crest factor lines at 4 dB and 6 dB; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
Iso-jitter lines
[Figure: voice profile shaded by jitter (%), with an iso-jitter line at 3 %; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]
New representation
• Areas defined by iso-parameter lines
– crest factor < 4 dB
– crest factor > 4 dB, < 6 dB
– crest factor > 6 dB
– jitter < 3 %
– [relative rise time < 6 %]
Areas in the phonetogram
[Figure: phonetogram divided into areas; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL). Area labels:]
• Jitter > 3 %: unstable
• RRT < 6 %: pressed-like
• Crest factor < 4 dB: sine-like
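The thresholds and labels given on these slides can be read as a simple rule set per phonetogram cell. The sketch below is one possible reading. How values exactly on a threshold are assigned, and the fact that a cell may carry several labels at once, are assumptions; the function name `classify_cell` is hypothetical.

```python
def classify_cell(crest_db, jitter_pct, rrt_pct=None):
    """Assign a crest factor band and descriptive labels to one cell.

    crest_db:   crest factor in dB
    jitter_pct: jitter in percent
    rrt_pct:    relative rise time in percent (optional on the slides)
    """
    # Crest factor bands from the slides; boundary handling is assumed.
    if crest_db < 4.0:
        band = "< 4 dB"
    elif crest_db < 6.0:
        band = "4-6 dB"
    else:
        band = "> 6 dB"

    labels = []
    if jitter_pct > 3.0:
        labels.append("unstable")
    if rrt_pct is not None and rrt_pct < 6.0:
        labels.append("pressed-like")
    if crest_db < 4.0:
        labels.append("sine-like")
    return {"crest_band": band, "labels": labels}


print(classify_cell(3.2, 1.0))
# {'crest_band': '< 4 dB', 'labels': ['sine-like']}
```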
Vocal registers in the phonetogram
[Figure: phonetogram with register boundaries; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL). Boundary labels:]
• Falsetto upper boundary
• Modal lower boundary
• Chest voice boundary
Comparison of voice profiles
Characterisation of
• Voice pathologies
• Voice classifications
Reuse stored voice profiles of subjects with known voice history
Important features
• Contour has limited value
– but most research goes in that direction (norm profiles)
• Distribution of acoustical parameters across the voice profile tells much more
We need
• Unit for comparison
– Voice profile unit defined by a small range of F0 and vocal intensity
• Distributions of acoustic voice parameters per unit
– Probability density function per parameter
• Model
– Hidden Markov Model
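A probability density function per parameter and per unit could, for example, be estimated as a normalised histogram. The sketch below takes that approach and adds a log-likelihood score for new observations. The histogram estimator, the add-one smoothing, the clipping of out-of-range values into the end bins, and the function names are all assumptions; the slides do not say how the densities were modelled.

```python
import numpy as np


def unit_pdf(values, bin_edges):
    """Histogram estimate of the density of one parameter in one unit.

    Add-one smoothing (an assumption) keeps every bin non-zero so that
    log-likelihoods stay finite.
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    counts, _ = np.histogram(values, bins=bin_edges)
    counts = counts + 1.0
    widths = np.diff(bin_edges)
    return counts / (counts.sum() * widths)


def log_likelihood(values, pdf, bin_edges):
    """Sum of log densities of new values under a unit's histogram PDF.

    Values outside the bin range are assigned to the nearest end bin.
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    idx = np.clip(np.digitize(values, bin_edges) - 1, 0, len(pdf) - 1)
    return float(np.sum(np.log(pdf[idx])))


# Crest factor values (dB) observed in one F0-I unit, 1 dB bins.
edges = np.arange(4, 17)
pdf = unit_pdf([5.2, 5.7, 6.1], edges)
print(log_likelihood([5.5], pdf, edges))
```

Scoring a test recording against densities trained on another recording, unit by unit, is one way the stored distributions could be compared across subjects.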
Unit model
[Diagram: unit model with IN and OUT connections.]
Two unconnected states per phonetogram unit
• vocal registers
• start and end of phonation
Correspondences
Speech ↔ Voice profile
• phoneme model ↔ F0/I unit model
• not labeled ↔ labeled by F0 and I
• spectral envelope ↔ acoustic voice parameters
• language model ↔ unrestricted transitions
“forced alignment recognition”
Crest factor distributions
[Figure: four histogram panels (training subject 1, test subject 1, training subject 2, test subject 2); horizontal axis from 4 to 15, vertical axis from 0 to 500.]
Most distinctive states
[Figure: voice profile shaded by distinctiveness; horizontal axis: fundamental frequency (Hz), vertical axis: vocal intensity (dB SPL).]