Long-Term Formant Distribution as a forensic-phonetic feature
ASA 2nd Pan-American/Iberian Meeting on Acoustics
Cancún, México, Nov 15-19, 2010
Michael Jessen and Timo Becker
BKA, Department of Speaker Identification and Audio Analysis (KT54)
Structure
1. Long-Term Formant Distribution: measurement methods and background
2. LTF and body height
3. LTF measurement consistency
4. Language dependence of LTF
5. Recognition performance based on LTF and automatic speaker recognition
6. Conclusions
Nov 17, 2010, Long-Term Formant (LTF) Distribution as a forensic-phonetic feature
Long-Term Formant (LTF) Distribution: terminology
Long-Term Formant Distribution (Nolan & Grigoras, 2005) is a global (as opposed to segment-based) representation of vowel formant frequencies over an entire recording of a speaker (or over a long stretch of speech from that speaker).
Formant frequencies are extracted with a formant tracker (LPC-based) and manually corrected. No segmentation into sounds is performed.
The resulting distribution of formant values (mainly F2 and F3) can be characterized in different ways. The simplest way is to calculate the average. More advanced ways include modeling of the LTF distribution with Gaussian Mixture Models (GMM) (Becker et al., 2008).
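As a rough illustration of what GMM modeling of an LTF distribution involves, the following sketch fits a two-component, one-dimensional Gaussian mixture to frame-level F2 values with plain EM. It is not the configuration of Becker et al. (2008), which these slides do not specify: the function name fit_gmm_1d, the number of components, the variance floor and the synthetic F2 frames are all assumptions made for the example.

```python
import math
import random


def fit_gmm_1d(values, k=2, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    data = sorted(values)
    n = len(data)
    # Initialise means at evenly spaced quantiles, with shared variance and equal weights.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            if total == 0.0:  # numerical underflow: fall back to equal responsibilities
                resp.append([1.0 / k] * k)
            else:
                resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1.0)  # assumed variance floor in Hz^2
    return weights, means, variances


# Synthetic F2 frames (Hz), NOT real measurements: two vowel-like clusters.
random.seed(0)
f2_frames = [random.gauss(1200, 60) for _ in range(300)]
f2_frames += [random.gauss(1700, 60) for _ in range(300)]

weights, means, variances = fit_gmm_1d(f2_frames, k=2)
for w, m, v in zip(weights, means, variances):
    print(f"weight={w:.2f}  mean={m:.0f} Hz  sd={math.sqrt(v):.0f} Hz")
```

A forensic system would more plausibly model several formants jointly and compare the speaker model against a background model; the one-dimensional case is used here only because it keeps the EM steps easy to follow.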
[Figure: speech file, uncut and cut, with Excel excerpt]
Illustration of the method:
Step 1: Editing the signal in a way that only vowels with clear formant structure remain
Step 2: LPC-analysis and manual correction of the formant tracks
Step 3: Exporting the formant tracks F1,2,3 for further processing
F1 of limited reliability in telephone speech; F4 unreliable or invisible
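The simplest LTF summary, the average per formant, can be sketched from the exported tracks as follows. The data layout (one tuple of F1, F2, F3 in Hz per 10 ms frame, with None where a value was discarded during manual correction), the function name ltf_means and the example values are assumptions for illustration, not the export format of any particular tool.

```python
from statistics import mean

FORMANTS = ("F1", "F2", "F3")


def ltf_means(frames):
    """Return frame count and mean frequency (Hz) per formant, skipping missing values."""
    summary = {}
    for index, name in enumerate(FORMANTS):
        values = [frame[index] for frame in frames if frame[index] is not None]
        summary[name] = {"n_frames": len(values), "mean_hz": mean(values)}
    return summary


# Hypothetical frames (Hz), one per 10 ms; None marks a value removed by hand.
example_frames = [
    (510, 1480, 2510),
    (495, 1520, 2490),
    (None, 1500, 2530),
    (530, 1460, 2470),
]

example_summary = ltf_means(example_frames)
for name in FORMANTS:
    print(name, example_summary[name])
```

Because no segmentation into sounds is performed, all retained frames of a formant are simply pooled before averaging.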
[Figure: exported tracks of F1, F2 and F3 plotted frame by frame; formant values every 10 ms]
Example of the raw LTF distribution of a speaker
from freeware Catalina Forensic Expert opinion v1.0 from Catalin Grigoras (U Colorado Denver)
http://www.forensicav.ro/download/CatalinaManual3h.pdf
Correlation between LTF and body height
[Figure: scatter plots of LTF F2 [Hz] and LTF F3 [Hz] against body height [cm]]
Pearson's product-moment correlation, one-sided (less):
F2: rho = -0.315726857072528, p = 0.00204454743894922
F3: rho = -0.339139631480740, p = 0.00097693931875183
Significant negative correlations between long-term formant frequencies (F2, F3) and body height
LTF means from 81 speakers in Pool 2010 (telephone-transmitted) (thanks to Hanna Feiser for assistance)
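The reported p-values can be related to the correlation coefficients through the usual t-test for a Pearson correlation. The sketch below assumes that all 81 speakers entered both correlations (so 79 degrees of freedom) and that the one-sided "less" alternative refers to the lower tail of Student's t; neither detail is stated beyond what the slide shows. The t distribution is integrated numerically with Simpson's rule so that only the standard library is needed; the helper names are made up for this example.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def lower_tail_p(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - central if t < 0 else 0.5 + central


N_SPEAKERS = 81  # from the slide; assumed to apply to both F2 and F3
p_f2 = lower_tail_p(t_from_r(-0.315726857072528, N_SPEAKERS), N_SPEAKERS - 2)
p_f3 = lower_tail_p(t_from_r(-0.339139631480740, N_SPEAKERS), N_SPEAKERS - 2)
print(f"F2: one-sided p = {p_f2:.5f}")
print(f"F3: one-sided p = {p_f3:.5f}")
```

If the assumptions hold, the printed values should land close to the p-values given on the slide.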
Measurement consistency across phoneticians: LT-F2
[Figure: LT-F2 [Hz] for recordings of 20 different speakers, one line per phonetician (JF, AK, Bay, B1, B2)]
Pearson correlations (two-sided) between 0.84 and 0.95
LTF-means from 20 speakers in “Digs” dialect corpus under forensically realistic conditions
Measurement consistency across phoneticians: LT-F3
[Figure: LT-F3 [Hz] for recordings of 20 different speakers, one line per phonetician (JF, AK, Bay, B1, B2)]
Pearson correlations (two-sided) between 0.98 and 0.99
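A range of correlations such as the ones reported for LT-F2 and LT-F3 arises when every pair of phoneticians is compared on the same recordings. A minimal sketch of that pairwise comparison is given below; the rater labels and LT-F2 values are invented for the example and are not the measurements behind the slides.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(ratings):
    """Correlate every pair of raters over the same set of recordings."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    }


# Hypothetical LT-F2 means (Hz) for five recordings, measured by three raters.
ratings = {
    "rater_1": [1350, 1420, 1510, 1290, 1465],
    "rater_2": [1362, 1405, 1530, 1310, 1450],
    "rater_3": [1370, 1440, 1530, 1310, 1485],  # rater_1 shifted by 20 Hz
}

correlations = pairwise_correlations(ratings)
for pair, r in correlations.items():
    print(pair, round(r, 3))
print("range:", round(min(correlations.values()), 3), "to", round(max(correlations.values()), 3))
```

Note that a constant offset between two raters, as between rater_1 and rater_3 here, leaves the correlation at 1, so a high correlation shows agreement in ranking and spacing rather than agreement in absolute values.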
Language influence on LTF
[Figure: LT-F3 [Hz] plotted against LT-F2 [Hz]; groups: Russian, German probe1, German probe2, German probe3, Albanian]
For these data, different languages do not differ in the LTF-space that they occupy (one-way ANOVA [F(4,55) = 0.44; p = 0.77]).
LTF-means from three German speakers in Digs dialect corpus and from Russian and Albanian speakers in case data under analogous conditions (spontaneous telephone speech)
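The degrees of freedom in F(4,55) imply five groups and 60 observations in total, since a one-way ANOVA has k - 1 and N - k degrees of freedom. The sketch below shows how the F statistic is formed from between-group and within-group variation. Which LTF measure entered the reported ANOVA is not stated on the slide, and the three small groups of values are toy numbers chosen so that the arithmetic is easy to check.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy LT-F2 means (Hz) for three hypothetical groups, not data from the slides.
toy_groups = [
    [1400, 1500, 1600],
    [1500, 1600, 1700],
    [1600, 1700, 1800],
]

f_value, df_between, df_within = one_way_anova(toy_groups)
print(f"F({df_between},{df_within}) = {f_value:.2f}")
```

An F value well below 1, as reported on the slide, means the group means differ less than would be expected from the spread within the groups.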
Speaker recognition tests
37 target trials and 803 non-target trials, involving 21 speakers from casework, comparing:
- Baseline = a standard GMM-UBM automatic system
- FGMM = GMM-modeled LTF
New development at BKA: DiSC-Plot (Discrimination, Scatter, Correlation)
[Figure: DiSC-Plot of logLR (lnLR) from the Long-Term Formant Distribution against logLR (lnLR) from the automatic speaker recognition system; target trials (same speaker) and non-target trials (different speakers)]
Conclusions: LTF analysis in forensic phonetics and acoustics (1)
☺ LTF (F2 and F3) correlates negatively with body height (relevant for voice profiling).
☺ LTF measurements have high consistency across phonetic experts.
☺ Pending further tests and with some degree of caution, LTF statistics established for one language can be used across languages.
☺ LTF (F2 and F3) do not differ much between different vocal effort levels. Vocal effort differences are a common problem in forensic material.
Conclusions: LTF analysis in forensic phonetics and acoustics (2)
Performance of LTF analysis with classical evaluation measures (DET-plots, APE-plots, Cllr) is worse than performance of automatic speaker recognition, and fusion does not increase overall performance. But:
The tests so far are based predominantly on matching conditions; under mismatched conditions, the relative performance of LTF analysis might increase.
☺ Detailed results in the DiSC plot show that LTF and automatic speaker recognition can make different errors: using both methods is a good safeguard against false conclusions.
Quite limited LR values in same-speaker comparisons (max about LR=16 in case material for the tests so far): LTF cannot give very strong support for same-speaker hypothesis.
☺ Different-speaker comparisons can yield very low LR values: LTF can give very strong support for different-speaker hypothesis.
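Of the evaluation measures named above, Cllr (the log-likelihood-ratio cost) is the easiest to state compactly: it averages a logarithmic penalty over target and non-target trials, with 0 for a perfect system and 1 for a system that always outputs LR = 1. The sketch below uses the standard definition of Cllr; the function name and the example LR values are hypothetical and are not results from the tests reported here.

```python
import math


def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log LRs)."""
    # Same-speaker trials are penalised when the LR is small.
    target_cost = sum(math.log2(1 + 1 / lr) for lr in target_lrs) / len(target_lrs)
    # Different-speaker trials are penalised when the LR is large.
    nontarget_cost = sum(math.log2(1 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (target_cost + nontarget_cost)


# An uninformative system that always reports LR = 1 has Cllr = 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0, 1.0])

# Hypothetical LRs: same-speaker values limited to modest support,
# different-speaker values reaching very low LRs.
example = cllr([16.0, 8.0, 4.0], [0.001, 0.01, 0.5])
print(uninformative, round(example, 3))
```

Because the penalty for a target trial is log2(1 + 1/LR), same-speaker LRs that never exceed a modest ceiling leave a residual cost even when every decision is correct.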
References
Becker, Timo, Michael Jessen and Catalin Grigoras (2008): Forensic speaker verification using formant features and Gaussian mixture models. Proceedings of Interspeech 2008, 1505-1508.
Kirchhübel, Christin (2009): The effects of Lombard speech on vowel formant measurements. MSc thesis, University of York, UK.
Moos, Anja (2008): Forensische Sprechererkennung mit der Messmethode LTF (long-term formant distribution). MA thesis, Universität des Saarlandes. www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286.
Moos, Anja (2010): Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech. To appear in The Phonetician.
Nolan, Francis and Catalin Grigoras (2005): A Case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12: 143-173.
Wagner, Katrin (2010): Der Einfluss der Sprechlautstärke auf die ersten drei Vokalformanten in mobilfunkübertragener Sprache: Forensischer Stimmenvergleich anhand der LTF-Methode. BA thesis, Universität Frankfurt.
Inter-speaker variation: Mean LTF for 71 adult male speakers of German
Means of LT-F2 and LT-F3
Moos (2008, 2010), based on GSM-transmitted speech in BKA corpus “Pool 2010”
Influence of vocal effort (Lombard condition) on LT-F1
[Figure: LT-F1 [Hz] for each of 31 speakers, normal vs. Lombard]
LTF-means from 31 speakers in Pool 2010 (telephone-transmitted), based on Wagner (2010); cf. also Kirchhübel (2009) and this conference
LT-F1 consistently higher in Lombard speech. Significant difference with paired t-test, indicating substantial intra-speaker variation. But: LT-F1 is of limited forensic use anyway (due to the effect of telephone transmission on F1)
Influence of vocal effort (Lombard condition) on LT-F2
[Figure: LT-F2 [Hz] for each of 31 speakers, normal vs. Lombard]
Lombard effect on LT-F2 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.
Influence of vocal effort (Lombard condition) on LT-F3
[Figure: LT-F3 [Hz] for each of 31 speakers, normal vs. Lombard]
As with LT-F2: Lombard effect on LT-F3 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.
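The paired t-test used for the Lombard comparisons works on the per-speaker differences between the two conditions, which is why it is sensitive to a consistent shift (as for LT-F1) but not to shifts that go in different directions for different speakers (as for LT-F2 and LT-F3). A minimal sketch of the test statistic follows; the function name and the four speakers' values are hypothetical. With the 31 speakers of the study, the test would have 30 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(normal, lombard):
    """Return (t, df) for a paired t-test on per-speaker differences."""
    diffs = [l - n for n, l in zip(normal, lombard)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical LT-F1 means (Hz) for four speakers, not data from Wagner (2010).
normal_f1 = [400, 420, 410, 430]
lombard_f1 = [450, 460, 470, 460]

t_value, df = paired_t(normal_f1, lombard_f1)
print(f"t({df}) = {t_value:.2f}")
```

The p-value would then be read from Student's t distribution with the given degrees of freedom.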
DET-Plot
[Figure: DET plot; curves: automatic speaker recognition system, GMM-modeled Long-Term Formant Distribution]
APE-Plot
[Figure: APE plot with Cllr]