Evaluating the Efficacy of Prosody-lab Aligner for a Study of Vowel Variation in Cantonese

Andrew Peters (彭浩軒) & Holman Tse (謝浩明) [email protected]@pitt.edu

Workshop on Innovations in Cantonese Linguistics (WICL-3)

HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC

Columbus, OH, March 12, 2016

Presentation Goals

• To demonstrate the use of Prosodylab-Aligner as a tool for a large-scale project examining vowel variation and change in Toronto Heritage Cantonese
• To address the effectiveness of Prosodylab-Aligner for this purpose
• To assess the best source for training new models
  – Data from all speakers together (ALL)?
  – Data from each generational group separately (GEN)?
  – Data from each speaker individually (SOLO)?

What is the HLVC Project?

• Large-scale project investigating variation and change in Toronto’s heritage languages
• Includes sociolinguistic interview data from 7+ heritage languages spoken by immigrants and 2 or 3 generations of their descendants
• The corpus makes it possible to investigate contact effects on a wide variety of variables across all languages using the same methodology

A Sample of Linguistic Variables

Languages: Cantonese, Faetar, Italian, Korean, Polish, Russian, Ukrainian

VOT: ✓ ✓ ✓ ✓ ✓
Ø-subject: ✓ ✓ ✓ ✓ ✓
Borrowing: ✓ ✓
Classifiers: WICL-1/3, WICL-3
Vowels: WICL-3

• GEN 1: Born and raised in HK, immigrated to Canada as adults. L1 Cantonese, some L2 English.
• GEN 2: Grew up in Toronto. Simultaneous (early) bilingual in Cantonese and Toronto English.

Methodological Issues

• Hour-long interviews (spontaneous speech) from each of ~40 speakers
  – 40 speakers × 8 vowels × 6 tones × 10+ tokens each = at least 19,200 tokens!

• Forced Alignment Tools
  • FAVE (Rosenfelder et al. 2011)
    – Now widely used for sociolinguistic studies of English dialects
    – But only works on English
  • Prosodylab-Aligner (Gorman et al. 2011)
    – Can train new models from raw data, making it customizable for any language
    – However, its efficacy for Cantonese is unknown

More About Prosodylab

• Prosodylab-Aligner (Gorman et al. 2011) is based on the Hidden Markov Model Toolkit (HTK), a speech recognition toolkit based on Hidden Markov Models, developed at Cambridge University
• Requires
  – Python 2.6 or above
  – SoX (Sound eXchange)
  – HTK (Hidden Markov Model Toolkit)
• Can be downloaded from
  – https://github.com/kylebgorman/Prosodylab-Aligner
  – More info
    • http://prosodylab.org/tools/aligner/

What is Forced Alignment?

• Forced alignment automates the process of time-aligning a transcription with the audio signal
• Permits automated measurement of variables, e.g. formant values
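As a rough illustration of why time-aligned output enables automated measurement, the sketch below represents aligned segments as labelled intervals and picks the midpoint of each vowel as a candidate measurement time. The `Interval` class, the `VOWELS` label set, and the sample times are hypothetical and are not taken from the study; measuring at the midpoint is one common choice, assumed here.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One aligned segment: a label plus start and end times in seconds."""
    label: str
    start: float
    end: float

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


# Hypothetical vowel labels; a real study would use its own phone inventory.
VOWELS = {"i", "u", "aa"}


def vowel_measurement_points(intervals):
    """Return (label, midpoint) for every interval labelled as a vowel."""
    return [(iv.label, iv.midpoint) for iv in intervals if iv.label in VOWELS]


# Hypothetical alignment of one syllable followed by silence.
aligned = [
    Interval("g", 0.125, 0.25),
    Interval("u", 0.25, 0.5),
    Interval("sil", 0.5, 0.75),
]
points = vowel_measurement_points(aligned)
```

Each returned time could then be passed to whatever formant-measurement tool the project uses.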

About Acoustic Models

• Uses machine learning to perform transcript-to-audio time alignment
• Speech models map phone lists to the audio signal
• Will vary in how well they fit the data, how well they demarcate boundaries, etc. Hence our study!

Questions

• Is Prosodylab-Aligner effective at producing sufficiently accurate transcript alignment to permit automated measurement of vowel data?
• What is the best data source for training models?
  – All speakers together (ALL)?
    • More robust model, but does it work as well with the variation present in a heritage language (HL) variety?
  – Each generational group separately (GEN)?
    • Tse (2015) suggests inter-generational phonological differences
  – Each speaker individually (SOLO)?
    • Requires a large percentage of data, but would it be as accurate?

Pre-processing

1. Interviews transcribed by native speakers of Cantonese using Jyutping Romanization in ELAN
   – Manual sentence-level alignment
2. To create input readable by Prosodylab-Aligner, a PRAAT script was used to create smaller .wav files with matching .txt files for each annotation.
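The splitting step was done with a PRAAT script; the sketch below is a Python stand-in (not the authors' script) showing the same idea with the standard-library `wave` module. The function name, the `(start_s, end_s, text)` annotation tuples, and the sequential numbering of output files are assumptions made for illustration.

```python
import wave
from pathlib import Path


def split_by_annotation(wav_path, annotations, out_dir, stem):
    """Write one .wav/.txt pair per (start_s, end_s, text) annotation.

    Returns the base names of the pairs written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start_s, end_s, text) in enumerate(annotations, start=1):
            first = int(start_s * rate)          # first frame of the chunk
            count = int(end_s * rate) - first    # number of frames to copy
            src.setpos(first)
            frames = src.readframes(count)
            name = f"{stem}_{i:04d}"             # assumed naming scheme
            with wave.open(str(out_dir / f"{name}.wav"), "wb") as dst:
                dst.setparams(params)            # header is corrected on close
                dst.writeframes(frames)
            (out_dir / f"{name}.txt").write_text(text + "\n", encoding="utf-8")
            written.append(name)
    return written
```

Calling it with an interview recording and the sentence-level annotations exported from ELAN would yield one short audio file and one transcript file per annotation.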

PRAAT Script (Labber)

C1F54A_IV_2074.wav
C1F54A_IV_2105.wav

Translation: “And then the Communist Party came, and then...”
Translation: “Because at that time, China was at war.”

Forced alignment needs a custom dictionary

Orthography   Phonemes
GU1           G U
GU2           G U
GU3           G U
GU4           G U
GU5           G U
GU6           G U
TUB           T AH1 B
TUBA          T UW1 B AH0
TUBAL         T UW1 B AH0 L
TUBB          T AH1 B
TUBBS         T AH1 B Z
TUBBY         T AH1 B IY0
TUBE          T UW1 B
TUBE          T Y UW1 B

To train an acoustic model:
• Pronouncing bilingual dictionary (currently ~3.6 MB)
• Important because the program can’t run when there are unrecognized words in the transcript
• The program needs to convert orthography to phonemic segments as established by the custom dictionary
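Because unrecognized words stop the aligner, it can help to check transcripts against the dictionary beforehand. The following is a minimal sketch of such a check, assuming a CMU-style dictionary with one entry per line (word followed by space-separated phones) and upper-case headwords; the function names are hypothetical and this is not part of Prosodylab-Aligner itself.

```python
def load_dictionary(lines):
    """Map each orthographic word to its list of pronunciations (phone lists)."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        word, phones = parts[0], parts[1:]
        entries.setdefault(word, []).append(phones)
    return entries


def unknown_words(transcript, entries):
    """Return transcript words missing from the dictionary, in first-seen order."""
    seen, missing = set(), []
    for word in transcript.upper().split():  # assumes upper-case headwords
        if word not in entries and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing


# A few entries in the format shown on the dictionary slide.
DICT_LINES = ["GU2 G U", "TUB T AH1 B", "TUBE T UW1 B", "TUBE T Y UW1 B"]
entries = load_dictionary(DICT_LINES)
missing = unknown_words("gu2 tube hei2", entries)  # HEI2 has no entry here
```

Any word reported as missing would need a dictionary entry (or a transcript correction) before training or alignment.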

Training and Evaluation

• .wav files and matching .lab files are put in a Training directory

Custom dictionary in the format of the CMU Pronouncing Dictionary

Our 3 Models of Training

With 50% of data from each speaker:
1. Solo-trained model: trained only on data for the speaker evaluated
2. Generation-trained model: data from all speakers of each generation combined in the Training directory
3. “All”-trained model: data from all speakers combined in the Training directory

• Prosodylab-Aligner uses the Training directory and dictionary to build an acoustic model

More training data (hours of speech) → better model
Therefore: more speakers’ data used in training = less data lost from each speaker to training
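The three training sets can be pictured as different groupings of the same per-speaker halves. The sketch below is illustrative only: it assumes the speaker code is the part of the file name before the first underscore (as in C1F54A_IV_2074.wav), that the second character of that code gives the generation, and that the 50% is simply the first half of each speaker's sorted files. None of these details is stated in the slides, and the second speaker code in the usage example is made up.

```python
from collections import defaultdict


def speaker_of(filename):
    """Speaker code = text before the first underscore (assumed convention)."""
    return filename.split("_")[0]


def generation_of(speaker):
    """Assumes the second character of the speaker code is the generation."""
    return "GEN" + speaker[1]


def build_training_sets(filenames, fraction=0.5):
    """Return (solo, gen, all) training file lists built from each speaker's share."""
    by_speaker = defaultdict(list)
    for name in sorted(filenames):
        by_speaker[speaker_of(name)].append(name)

    solo, gen, everyone = {}, defaultdict(list), []
    for speaker, files in by_speaker.items():
        share = files[: int(len(files) * fraction)]  # assumed: first half
        solo[speaker] = share                         # SOLO: one set per speaker
        gen[generation_of(speaker)].extend(share)     # GEN: pooled by generation
        everyone.extend(share)                        # ALL: pooled across speakers
    return solo, dict(gen), everyone


files = [
    "C1F54A_IV_0001.wav", "C1F54A_IV_0002.wav",
    "C2M21B_IV_0001.wav", "C2M21B_IV_0002.wav",  # hypothetical GEN 2 speaker
]
solo_sets, gen_sets, all_set = build_training_sets(files)
```

Under this grouping the pooled sets grow with every speaker added, while each SOLO set stays limited to one speaker's share.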

Output of Prosodylab-Aligner: Time-aligned TextGrid

Assessing Accuracy

• Assessment based on 10 speakers (four GEN 1 and six GEN 2)
• Examined first 10 usable TextGrids for each speaker

Gold Standard: Manually identify vowel boundaries for all CAN monophthongs

Assessing Accuracy Procedures

• Record “Gold Standard” vowel boundaries
• Record auto-aligned vowel boundaries

Figure panels: Segment Boundaries (Solo-aligned, Gen-aligned, All-aligned)

Assessing Accuracy

• Manual (“Gold Standard”) measurements taken of left & right boundaries of monophthongs
• Compared to auto boundaries: differential on left & right, absolute value (ABS) of diff., diff. of total length
• Root-Mean-Square Deviation taken of each boundary (Chen et al. 2004)
• Average length of vowels for each model
• % of vowels’ centres (by “Gold Standard”) which fall within the auto-aligned boundaries
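The measures listed above can be sketched as follows. This is a hedged illustration rather than the authors' analysis script: vowels are represented as `(left, right)` pairs in seconds, gold and auto lists are assumed to be matched item by item, and the function names and sample values are hypothetical.

```python
import math


def rmsd(gold, auto):
    """Root-mean-square deviation between paired boundary times."""
    return math.sqrt(sum((a - g) ** 2 for g, a in zip(gold, auto)) / len(gold))


def centre_in_target(gold_iv, auto_iv):
    """True if the gold-standard vowel centre falls inside the auto interval."""
    centre = (gold_iv[0] + gold_iv[1]) / 2
    return auto_iv[0] <= centre <= auto_iv[1]


def evaluate(gold, auto):
    """Summarise one model's alignment against the gold standard."""
    hits = sum(centre_in_target(g, a) for g, a in zip(gold, auto))
    return {
        "rmsd_left": rmsd([g[0] for g in gold], [a[0] for a in auto]),
        "rmsd_right": rmsd([g[1] for g in gold], [a[1] for a in auto]),
        "n_in_target": hits,
        "pct_in_target": 100 * hits / len(gold),
        "avg_auto_length": sum(a[1] - a[0] for a in auto) / len(auto),
    }


# Two hypothetical vowels: the first aligned exactly, the second placed too late.
gold = [(1.0, 1.5), (2.0, 2.25)]
auto = [(1.0, 1.5), (2.5, 3.0)]
summary = evaluate(gold, auto)
```

Running `evaluate` once per model (SOLO, GEN, ALL) against the same gold-standard list would give directly comparable rows.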

Transcription Issues

• Entirety of “O5 Lam2 Jiu3” within “Gong2” boundaries
• Same file: the aligner “catches up” and aligns later sections with excellent accuracy

Modeling Silence

Aligner places “Hei2 Maa5” audio signal within silence

• The effect is more common in Solo-aligned TextGrids
• Hypothesis: silence modelling is better with more data for model training

Syllable Fusion Issues

Fusion of Mei-Je --> Me
Fusion of Za-Hai --> Zei
Fusion of Seng-Jat --> Set
(Wong 2006)

• Some rare examples cause problems: Seng Jat
• However, when we use a closer transcription, the aligner does well

Results Table

Measure                                       SOLO      GEN       ALL
Root Mean Square Deviation – Left Boundary    0.185     0.193     0.214
Root Mean Square Deviation – Right Boundary   0.187     0.197     0.207
# of Vowels in Target                         383       368       382
% Vowels in Target                            81.84%    78.63%    81.62%
Avg. Auto V. Length                           0.127 s   0.124 s   0.132 s
Avg. V. Length Deviation                      0.014 s   0.011 s   0.019 s
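As a quick consistency check on the table, the counts and percentages of vowels in target all appear to correspond to the same total of 468 measured vowels. That total is not stated in the slides; it is inferred here from the reported figures, so treat it as an assumption.

```python
# Counts of vowels whose gold-standard centre fell inside the auto boundaries.
counts = {"SOLO": 383, "GEN": 368, "ALL": 382}

# Inferred, not reported: the total that reproduces all three percentages.
TOTAL_VOWELS = 468

percentages = {
    model: round(100 * n / TOTAL_VOWELS, 2) for model, n in counts.items()
}
```

With that total, the computed values match the reported 81.84%, 78.63%, and 81.62%.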

In spite of problems, quite accurate:

• Solo-trained model has the lowest deviation from gold-standard boundaries
• All-trained model predicts longer vowels: hence a high % of vowel centres within boundaries (81.62%), despite the highest deviation
• Overly long segment prediction would be bad for studies of length, VOT, etc.

Summary

• Is Prosodylab-Aligner effective at producing sufficiently accurate transcript alignment to permit automated measurement of vowel data? YES
• What is the best baseline to start with?
  – All speakers together (ALL)?
  – Each generational group separately (GEN)?
  – Each speaker individually (SOLO)?

Discussion

• Is Prosodylab-Aligner effective at producing sufficiently accurate transcript alignment to permit automated measurement of vowel data?
  – Yes. Overall, roughly 80% accuracy for all three models (78.63% to 81.84% of vowel centres in target)
  – Can still be a useful tool in facilitating the vowel measurement process with a preliminary estimate of where the vowel boundaries are
  – Boundaries can be manually adjusted later.

Discussion

• What is the best baseline to start with?
  – ALL
    • More data used, but model overgenerates → resulted in high RMSD
  – SOLO
    • Slightly more accurate and smaller RMSD than ALL and GEN models, but not much data / too much data lost to training
  – GEN
    • A reasonable compromise between amount of data used in training vs. general accuracy

Conclusion

• The GEN model works better than ALL (contrary to expectations), possibly because of significant inter-generational differences (cf. Tse 2015)
• Yet, even with as much variation as is present, it is still generally accurate, and can be a useful tool for Cantonese corpus-based studies.
• Useful for any study that requires segmental boundary information
  – Ex: VOT, vowel length, vowel formant measures, tone, consonants, etc.

Thank you: 多謝, 감사합니다, дякую, Спасибо, Grazie molto, gratsiə namuor:ə

HLVC RAs: Cameron Abma, Vanessa Bertone, Ulyana Bila, Rosanna Calla, Minji Cha, Abigail Chan, Karen Chan, Joanna Chociej, Sheila Chung, Tiffany Chung, Courtney Clinton, Rachel Coulter, Radu Craioveanu, Marco Covi, Zahid Daudjee, Derek Denis, Tonia Djogovic, Joyce Fok, Paolo Frasca, Matt Gardner, Rick Grimm, Dongkeun Han, Natalia Harhaj, Taisa Hewka, Melania Hrycyna, Michael Iannozzi, Diana Kim, Janyce Kim, Iryna Kulyk, Mariana Kuzela, Ann Kwon, Alex La Gamba, Carmela La Rosa, Natalia Lapinskaya, Kris Lee, Nikki Lee, Olga Levitski, Arash Lotfi, Samuel Lo, Paulina Lyskawa, Rosa Mastri, Timea Molnár, Jamie Oh, Maria Parascandolo, Rita Pang, Tiina Rebane, Hoyeon Rim, Will Sawkiw, Maksym Shkvorets, Vera Richetti Smith, Anna Shalaginova, Konstantin Shapoval, Yi Qing Sim, Mario So Gao, Awet Tekeste, Josephine Tong, Sarah Truong, Dylan Uscher, Elaine Wang, Ka-man Wong, Junrui Wu, Olivia Yu, Minyi Zhu

Collaborators: Yoonjung Kang, Alexei Kochetov, Naomi Nagy, James Walker

Funding: SSHRC, University of Toronto, Shevchenko Foundation

References

Chen, L., Liu, Y., Harper, M. P., Maia, E., & McRoy, S. (2004). Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus. In LREC. Retrieved from https://www-new.comp.nus.edu.sg/~rpnlpir/proceedings/lrec-2004/pdf/307.pdf

Gorman, K., Howell, J., & Wagner, M. (2011). Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics, 39(3), 192–193.

Nagy, N. (2011). A Multilingual Corpus to Explore Variation in Language Contact Situations. Rassegna Italiana di Linguistica Applicata, 43(1/2), 65–84.

Rosenfelder, I., Fruehwald, J., Evanini, K., & Yuan, J. (2011). FAVE (Forced Alignment and Vowel Extraction) Program Suite. Retrieved from http://fave.ling.upenn.edu

Wong, W. Y. P. (2006). Syllable Fusion in Hong Kong Cantonese Connected Speech. Ph.D. dissertation, The Ohio State University.

HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC

• Slides will be available at http://www.pitt.edu/~hbt3/presentations.html
• Thank you!
• 多謝晒! (Thank you very much!)
