Long-Term Formant Distribution as a forensic-phonetic feature

ASA 2nd Pan-American/Iberian Meeting on Acoustics

Cancún, México, Nov 15-19, 2010

Michael Jessen and Timo Becker

BKA, Department of Speaker Identification and Audio Analysis (KT54)

3aSC4 Special Session on Forensic Voice Comparison and Forensic Acoustics @ 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010 http://cancun2010.forensic-voice-comparison.net

Structure

1. Long-Term Formant Distribution: measurement methods and background

2. LTF and body height

3. LTF measurement consistency

4. Language dependence of LTF

5. Recognition performance based on LTF and automatic speaker recognition

6. Conclusions

Nov 17, 2010, Long-Term Formant (LTF) Distribution as a forensic-phonetic feature

Long-Term Formant (LTF) Distribution: terminology

Long-Term Formant Distribution (Nolan & Grigoras, 2005) is a global (as opposed to segment-based) representation of vowel formant frequencies over an entire recording of a speaker (or over a long stretch of speech from that speaker).

Formant frequencies are extracted with a formant tracker (LPC-based) and manually corrected. No segmentation into sounds is performed.

The resulting distribution of formant values (mainly F2 and F3) can be characterized in different ways. The simplest way is to calculate the average. More advanced ways include modeling of the LTF distribution with Gaussian Mixture Models (GMM) (Becker et al., 2008).
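The two ways of characterizing the LTF distribution named above (the average and a GMM) can be sketched as follows. This is only an illustrative sketch, not the procedure of Becker et al. (2008): the function name `fit_diag_gmm`, the choice of two diagonal-covariance components, the plain EM loop and the synthetic (F2, F3) frames are all assumptions made for the example.

```python
import numpy as np

def fit_diag_gmm(x, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to the rows of x with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    # Initialise means from randomly chosen frames, variances from the data.
    means = x[rng.choice(n, size=n_components, replace=False)].astype(float)
    variances = np.tile(x.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log density of every frame under every component.
        diff = x[:, None, :] - means
        log_p = -0.5 * (diff ** 2 / variances
                        + np.log(2 * np.pi * variances)).sum(axis=2)
        log_p += np.log(weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)  # responsibilities, rows sum to 1
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None]
        variances = np.maximum(variances, 1.0)  # floor of 1 Hz^2 (assumption)
    return weights, means, variances

# Synthetic (F2, F3) frames in Hz, invented purely for illustration.
rng = np.random.default_rng(1)
frames = np.vstack([
    rng.normal([1300.0, 2400.0], [80.0, 90.0], size=(300, 2)),
    rng.normal([1700.0, 2600.0], [90.0, 80.0], size=(200, 2)),
])

ltf_mean = frames.mean(axis=0)                      # simplest characterization
weights, means, variances = fit_diag_gmm(frames)    # GMM characterization

print("LTF mean (F2, F3):", ltf_mean)
print("GMM weights:", weights)
print("GMM means:", means)
```

The average collapses the distribution to one point per formant, whereas the mixture keeps information about its shape; how many components are appropriate for real material is not stated on the slide.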

[Figure: speech file before editing, after editing, and Excel excerpt]

Illustration of the method:

Step 1: Editing the signal in a way that only vowels with clear formant structure remain

Step 2: LPC-analysis and manual correction of the formant tracks

Step 3: Exporting the formant tracks F1,2,3 for further processing

F1 of limited reliability in telephone speech; F4 unreliable or invisible
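As a rough sketch of what can be done with the exported tracks from Step 3, the snippet below computes long-term means per formant from frame-wise values. The array layout (one row per 10 ms frame, columns F1 to F3), the use of NaN for frames without a usable value, the function name `long_term_means` and the four example frames are assumptions for illustration; the actual export format is not specified on the slide.

```python
import numpy as np

FRAME_STEP_S = 0.010  # one formant value every 10 ms

def long_term_means(tracks):
    """Return the long-term mean of F1, F2 and F3 in Hz.

    tracks: array-like of shape (n_frames, 3); NaN marks a missing value.
    """
    tracks = np.asarray(tracks, dtype=float)
    return {
        name: float(np.nanmean(tracks[:, i]))
        for i, name in enumerate(("F1", "F2", "F3"))
    }

# Invented example frames (Hz); the third frame has no usable F1.
exported = [
    [510.0, 1480.0, 2510.0],
    [530.0, 1520.0, 2490.0],
    [float("nan"), 1500.0, 2530.0],
    [520.0, 1460.0, 2470.0],
]

ltf = long_term_means(exported)
duration_s = len(exported) * FRAME_STEP_S  # amount of vowel material analysed

print(ltf)
print(f"{duration_s:.2f} s of vowel material")
```

Ignoring missing values per formant rather than dropping whole frames is a design choice of this sketch; it keeps F2 and F3 usable when only F1 is unreliable, as in telephone speech.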

Example of the raw LTF distribution of a speaker

[Figure: F1, F2 and F3 tracks; formant values every 10 ms]

From the freeware Catalina Forensic Expert opinion v1.0 from Catalin Grigoras (U Colorado Denver): http://www.forensicav.ro/download/CatalinaManual3h.pdf

Correlation between LTF and body height

Pearson's product-moment correlation, one-sided (less):

F2: rho = -0.3157, p = 0.0020

F3: rho = -0.3391, p = 0.00098

[Figure: two scatter plots, LTF F2 [Hz] and LTF F3 [Hz] against body height [cm]]

Significant negative correlations between long-term formant frequencies (F2, F3) and body height

LTF means from 81 speakers in Pool 2010 (telephone-transmitted) (thanks to Hanna Feiser for assistance)
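A one-sided test of a negative correlation can be sketched as below. Note the swapped-in technique: the p-value here comes from a permutation test, not from the t-distribution that a standard Pearson test would use, so it only approximates the kind of result reported on the slide. The helper names and the synthetic heights and LT-F2 values are assumptions; they are not the Pool 2010 data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def one_sided_less_p(x, y, n_perm=2000, seed=0):
    """Permutation p-value for the alternative 'correlation is less than 0'."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = pearson_r(x, y)
    # Count shuffles that give a correlation at least as negative as observed.
    count = sum(pearson_r(x, rng.permutation(y)) <= observed
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)

# Synthetic data for 81 hypothetical speakers (invented for illustration).
rng = np.random.default_rng(42)
heights_cm = rng.normal(178.0, 8.0, size=81)
lt_f2_hz = 1900.0 - 3.0 * heights_cm + rng.normal(0.0, 30.0, size=81)

r = pearson_r(heights_cm, lt_f2_hz)
p = one_sided_less_p(heights_cm, lt_f2_hz)
print(f"r = {r:.3f}, one-sided permutation p = {p:.4f}")
```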

Measurement consistency across phoneticians: LT-F2

[Figure: LT-F2 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2]

Pearson correlations (two-sided) between 0.84 and 0.95

LTF means from 20 speakers in “Digs” dialect corpus under forensically realistic conditions

Measurement consistency across phoneticians: LT-F3

[Figure: LT-F3 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2]

Pearson correlations (two-sided) between 0.98 and 0.99
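Consistency figures of this kind can be obtained by correlating the LTF means of every pair of experts over the same recordings. The following is a minimal sketch under that assumption; the rater labels, the function name `pairwise_pearson` and the values are hypothetical and do not reproduce the measurements shown on the slides.

```python
import numpy as np

def pairwise_pearson(measurements):
    """Pearson correlation for every pair of raters.

    measurements: dict mapping a rater label to a list of LTF means in Hz,
    one value per recording, with recordings in the same order for all raters.
    """
    names = list(measurements)
    matrix = np.corrcoef(np.array([measurements[n] for n in names], dtype=float))
    return {
        (a, b): float(matrix[i, j])
        for i, a in enumerate(names)
        for j, b in enumerate(names)
        if i < j
    }

# Invented LT-F3 means (Hz) for five recordings and three hypothetical raters.
example = {
    "rater_a": [2350.0, 2420.0, 2510.0, 2280.0, 2600.0],
    "rater_b": [2365.0, 2435.0, 2525.0, 2295.0, 2615.0],  # constant offset
    "rater_c": [2340.0, 2450.0, 2490.0, 2300.0, 2570.0],
}

for pair, r in pairwise_pearson(example).items():
    print(pair, round(r, 3))
```

A constant offset between two raters, as between `rater_a` and `rater_b` here, leaves the correlation at 1, so a high correlation shows consistent ranking of speakers rather than identical absolute values.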

Language influence on LTF

[Figure: scatter plot of LT-F3 [Hz] against LT-F2 [Hz]; legend: Russian, German sample 1, German sample 2, German sample 3, Albanian]

For these data, different languages do not differ in the LTF-space that they occupy (one-way ANOVA [F(4,55) = 0.44; p = 0.77]).

LTF means from three German speakers in Digs dialect corpus and from Russian and Albanian speakers in case data under analogous conditions (spontaneous telephone speech)
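For readers who want to see how a one-way ANOVA statistic such as F(4,55) arises, here is a small sketch. The degrees of freedom on the slide imply five groups and 60 observations in total; the even split into 12 values per group, the synthetic LT-F3 values and the function name are assumptions, and the slide does not state which LTF variable entered the test. The p-value is not computed because that would need an F-distribution routine.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Variation of the group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within

# Five synthetic groups of 12 LT-F3 values (Hz), invented for illustration.
rng = np.random.default_rng(7)
synthetic_groups = [rng.normal(2450.0, 120.0, size=12) for _ in range(5)]

f_stat, df_b, df_w = one_way_anova_f(synthetic_groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```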

Speaker recognition tests

37 target trials and 803 non-target trials, involving 21 speakers from casework, comparing:

- Baseline = a standard GMM-UBM automatic system
- FGMM = GMM-modeled LTF

New development at BKA: DiSC-Plot (Discrimination, Scatter, Correlation)

[Figure: DiSC plot; axes: automatic speaker recognition system and Long-Term Formant Distribution, logLR (lnLR); points: target trials (same speaker) and non-target trials (different speakers)]

Conclusions: LTF analysis in forensic phonetics and acoustics (1)

☺ LTF (F2 and F3) correlates negatively with body height (relevant for voice profiling).

☺ LTF measurements have high consistency across phonetic experts.

☺ Pending further tests and with some degree of caution, LTF statistics established for one language can be used across languages.

☺ LTF (F2 and F3) does not differ much between different vocal effort levels. Vocal effort differences are a common problem in forensic material.


Conclusions: LTF analysis in forensic phonetics and acoustics (2)

Performance of LTF analysis with classical evaluation measures (DET-plots, APE-plots, Cllr) is worse than performance of automatic speaker recognition, and fusion does not increase overall performance. But:

The tests so far are based predominantly on matching conditions; under mismatched conditions, the relative performance of LTF analysis might increase.

☺ Detailed results in the DiSC plot show that LTF and automatic speaker recognition can make different errors: using both methods is a good safeguard against false conclusions.

Quite limited LR values in same-speaker comparisons (max about LR=16 in case material for the tests so far): LTF cannot give very strong support for same-speaker hypothesis.

☺ Different-speaker comparisons can yield very low LR values: LTF can give very strong support for different-speaker hypothesis.
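To make the Cllr measure and the LR values mentioned above more concrete, the sketch below computes the log-likelihood-ratio cost from lists of likelihood ratios using its usual definition. The function name `cllr` and the example LR values are assumptions for illustration; they are not the results of the 37 target and 803 non-target trials.

```python
import numpy as np

def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log LRs).

    Target trials are penalised for small LRs, non-target trials for large LRs.
    """
    tar = np.asarray(target_lrs, dtype=float)
    non = np.asarray(nontarget_lrs, dtype=float)
    return float(0.5 * (np.mean(np.log2(1.0 + 1.0 / tar))
                        + np.mean(np.log2(1.0 + non))))

# A system that always reports LR = 1 carries no information.
print("uninformative:", cllr([1.0, 1.0], [1.0, 1.0]))

# Invented example: modest same-speaker LRs, very low different-speaker LRs.
example_targets = [16.0, 8.0, 4.0]
example_nontargets = [0.001, 0.01, 0.0001, 0.05]
print("example:", round(cllr(example_targets, example_nontargets), 4))

# An LR of 16 on the natural-log scale used for lnLR axes.
print("ln(16) =", round(float(np.log(16.0)), 3))
```

Lower Cllr is better, and a value of 1 corresponds to a system whose LRs are uninformative.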


References

Becker, Timo, Michael Jessen and Catalin Grigoras (2008): Forensic speaker verification using formant features and Gaussian mixture models. Proceedings of Interspeech 2008, 1505-1508.

Kirchhübel, Christin (2009): The effects of Lombard speech on vowel formant measurements. MSc thesis, University of York, UK.

Moos, Anja (2008): Forensische Sprechererkennung mit der Messmethode LTF (long-term formant distribution). MA thesis, Universität des Saarlandes. www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286.

Moos, Anja (2010): Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech. To appear in The Phonetician.

Nolan, Francis and Catalin Grigoras (2005): A Case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12: 143-173.

Wagner, Katrin (2010): Der Einfluss der Sprechlautstärke auf die ersten drei Vokalformanten in mobilfunkübertragener Sprache: Forensischer Stimmenvergleich anhand der LTF-Methode. BA thesis, Universität Frankfurt.

Inter-speaker variation: Mean LTF for 71 adult male speakers of German

Means of LT-F2 and LT-F3

Moos (2008, 2010), based on GSM-transmitted speech in BKA corpus “Pool 2010”


Influence of vocal effort (Lombard condition) on LT-F1

[Figure: LT-F1 [Hz] for 31 speakers, normal vs. Lombard]

LTF means from 31 speakers in Pool 2010 (telephone-transmitted), based on Wagner (2010); cf. also Kirchhübel (2009) and this conference

LT-F1 consistently higher in Lombard speech. Significant difference with paired t-test, indicating substantial intra-speaker variation. But: LT-F1 is of limited forensic use anyway (due to the effect of telephone transmission on F1)


[Figure: LT-F2 [Hz] for 31 speakers, normal vs. Lombard]

Lombard effect on LT-F2 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.


[Figure: LT-F3 [Hz] for 31 speakers, normal vs. Lombard]

As with LT-F2: Lombard effect on LT-F3 inconsistent across speakers. Non-significant difference with paired t-test, indicating acceptable intra-speaker variation.
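The paired t-test used for the normal vs. Lombard comparisons can be sketched as follows. The function name `paired_t` and the synthetic LT-F2 values are assumptions for illustration and are not the data of Wagner (2010); the sketch returns only the test statistic and degrees of freedom, which would then be compared with a t-distribution.

```python
import numpy as np

def paired_t(normal, lombard):
    """Paired t statistic and degrees of freedom for per-speaker differences."""
    diff = np.asarray(lombard, dtype=float) - np.asarray(normal, dtype=float)
    n = diff.size
    # Mean difference divided by its standard error.
    t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return float(t_stat), n - 1

# Synthetic LT-F2 means (Hz) for 31 hypothetical speakers in two conditions.
rng = np.random.default_rng(3)
normal_f2 = rng.normal(1450.0, 90.0, size=31)
lombard_f2 = normal_f2 + rng.normal(0.0, 40.0, size=31)  # no systematic shift

t_stat, df = paired_t(normal_f2, lombard_f2)
# For df = 30, |t| above about 2.042 is significant at the two-sided 5 % level.
print(f"t({df}) = {t_stat:.2f}")
```

Because each speaker serves as their own control, the test looks only at within-speaker differences, which is why an effect that goes in different directions for different speakers tends to come out non-significant.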

DET-Plot

[Figure: DET plot comparing the automatic speaker recognition system and the GMM-modeled Long-Term Formant Distribution]

APE-Plot

[Figure: APE plot with Cllr]

