Evaluating the Efficacy of Prosody-lab Aligner for a Study of Vowel Variation in Cantonese
Andrew Peters (彭浩軒) & Holman Tse (謝浩明)
Workshop on Innovations in Cantonese Linguistics (WICL-3)
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC
Columbus, OH, March 12, 2016
Presentation Goals
• To demonstrate the use of Prosodylab-Aligner as a tool for a large-scale project examining vowel variation and change in Toronto Heritage Cantonese
• To assess how effective Prosodylab-Aligner is for this purpose
• To determine the best source of data for training new models:
  – Data from all speakers together (ALL)?
  – Data from each generational group separately (GEN)?
  – Data from each speaker individually (SOLO)?
What is the HLVC Project?
• A large-scale project investigating variation and change in Toronto's heritage languages
• Includes sociolinguistic interview data from 7+ heritage languages, spoken by immigrants and two or three generations of their descendants
• The corpus makes it possible to investigate contact effects on a wide variety of variables across all languages using the same methodology
A Sample of Linguistic Variables
Variable  Cantonese  Faetar  Italian  Korean  Polish  Russian  Ukrainian
VOT ✓ ✓ ✓ ✓ ✓
Ø-subject ✓ ✓ ✓ ✓ ✓
Borrowing ✓ ✓
Classifiers WICL-1/3 WICL-3
Vowels WICL-3
GEN 1: Born and raised in Hong Kong; immigrated to Canada as adults
  – L1 Cantonese, some L2 English
GEN 2: Grew up in Toronto
  – Simultaneous (early) bilinguals in Cantonese and Toronto English
Methodological Issues
• Hour-long interviews (spontaneous speech) from each of ~40 speakers
  – 40 speakers × 8 vowels × 6 tones × 10+ tokens each = 19,200+ tokens to measure!
• Forced alignment tools
  • FAVE (Rosenfelder et al. 2011)
    – Now widely used for sociolinguistic studies of English dialects
    – But only works on English
  • Prosodylab-Aligner (Gorman et al. 2011)
    – Can train new models from raw data, making it customizable for any language
    – However, its efficacy for Cantonese is unknown
More About Prosodylab-Aligner
• Prosodylab-Aligner (Gorman et al. 2011) is based on the Hidden Markov Model Toolkit (HTK), a speech recognition toolkit built on hidden Markov models and developed at Cambridge University
• Requires:
  – Python 2.6 or above
  – SoX (Sound eXchange)
  – HTK (Hidden Markov Model Toolkit)
• Can be downloaded from:
  – https://github.com/kylebgorman/Prosodylab-Aligner
  – More info: http://prosodylab.org/tools/aligner/
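Before running the aligner, one quick way to confirm that the requirements above are installed is to look for the executables on the PATH. This is a minimal Python 3 sketch that is not part of Prosodylab-Aligner. It assumes that a working install exposes the `sox` command and the HTK programs `HCopy`, `HCompV`, `HERest` and `HVite`. Whether the aligner calls exactly these programs is also an assumption. `check_prerequisites` is a hypothetical helper name.

```python
import shutil
import sys

# Assumption: a working install puts these programs on PATH.
# "sox" is the SoX command-line tool; the H* programs are HTK tools.
REQUIRED_TOOLS = ("sox", "HCopy", "HCompV", "HERest", "HVite")


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=(2, 6)):
    """Map each requirement to True (found) or False (missing)."""
    status = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8s} {'found' if ok else 'MISSING'}")
```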
What is Forced Alignment?
• Forced alignment automates the process of time-aligning a transcription with the audio signal
• This permits automated measurement of variables, e.g., formant values
About Acoustic Models
• Use machine learning to perform transcript-to-audio time alignment
• Speech models map phone lists to the audio signal
• Models vary in how well they fit the data, how well they demarcate boundaries, etc. Hence our study!
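Forced alignment can be pictured as a search for the best path through a known phone sequence, where the phones stay in order and each one covers at least one frame. The sketch below is a toy Viterbi alignment that uses one state per phone. It is not HTK's implementation: HTK uses multi-state HMMs with trained Gaussian-mixture emissions and transition probabilities. The per-frame log-likelihood matrix is assumed to be given here, not computed from audio.

```python
import math
from typing import List, Sequence, Tuple


def force_align(loglik: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Toy Viterbi forced alignment over a left-to-right chain of phones.

    loglik[t][j] is the log-likelihood of frame t under the j-th phone of the
    known transcript. Phones must appear in transcript order and each must
    cover at least one frame. Returns one (start, end) frame range per phone,
    with `end` exclusive.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    # best[t][j]: best total score of frames 0..t with frame t assigned to phone j
    best = [[-math.inf] * n_phones for _ in range(n_frames)]
    # back[t][j]: phone assigned to frame t-1 on that best path
    back = [[0] * n_phones for _ in range(n_frames)]
    best[0][0] = loglik[0][0]  # the first frame must belong to the first phone
    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = best[t - 1][j]
            advance = best[t - 1][j - 1] if j > 0 else -math.inf
            back[t][j] = j if stay >= advance else j - 1
            best[t][j] = max(stay, advance) + loglik[t][j]
    # Trace back from the last frame, which must belong to the last phone
    labels = [0] * n_frames
    j = n_phones - 1
    for t in range(n_frames - 1, -1, -1):
        labels[t] = j
        j = back[t][j]
    # Collapse per-frame phone labels into (start, end) segments
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or labels[t] != labels[t - 1]:
            segments.append((start, t))
            start = t
    return segments


if __name__ == "__main__":
    # Hypothetical scores for 6 frames and a 3-phone transcript
    scores = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
              [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
    print(force_align(scores))  # [(0, 2), (2, 5), (5, 6)]
```

The transcript fixes *which* phones occur and in what order. Only *where* the boundaries fall is searched. This is why the quality of the acoustic model, which supplies the scores, determines how well the boundaries are placed.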
Questions
• Is Prosodylab-Aligner effective at producing transcript alignments accurate enough to permit automated measurement of vowel data?
• What is the best data source for training models?
  – All speakers together (ALL)?
    • A more robust model, but does it work as well given the variation present in a heritage-language (HL) variety?
  – Each generational group separately (GEN)?
    • Tse (2015) suggests there are inter-generational phonological differences
  – Each speaker individually (SOLO)?
    • Requires a large percentage of each speaker's data for training, but would it be as accurate?
Pre-processing
1. Interviews were transcribed by native speakers of Cantonese using Jyutping romanization in ELAN
   – Manual sentence-level alignment
2. To create input readable by Prosodylab-Aligner, a Praat script was used to create smaller .wav files with matching .txt files for each annotation.
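The authors used a Praat script for step 2. The sketch below is a rough Python analogue that uses only the standard-library `wave` module. It assumes the annotations are already available as `(start_seconds, end_seconds, text)` tuples, for example exported from ELAN (the export step is not shown). The output naming `<stem>_<n>.wav` only loosely mirrors names like C1F54A_IV_2074.wav and is hypothetical. So is the helper name `split_by_annotations`. The transcript extension is a parameter.

```python
import os
import wave
from typing import Iterable, List, Tuple


def split_by_annotations(wav_path: str,
                         annotations: Iterable[Tuple[float, float, str]],
                         out_dir: str,
                         ext: str = ".lab") -> List[str]:
    """Cut one clip per (start_s, end_s, text) annotation and write a matching transcript.

    Returns the paths of the .wav clips written.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_total = src.getnframes()
        for i, (start, end, text) in enumerate(annotations, start=1):
            first = max(0, int(round(start * rate)))
            last = min(n_total, int(round(end * rate)))
            if last <= first:
                continue  # skip empty or inverted intervals
            src.setpos(first)
            frames = src.readframes(last - first)
            clip_base = os.path.join(out_dir, f"{stem}_{i}")
            with wave.open(clip_base + ".wav", "wb") as dst:
                dst.setparams(params)   # same channels, sample width and rate
                dst.writeframes(frames)  # header frame count is patched on write
            with open(clip_base + ext, "w", encoding="utf-8") as lab:
                lab.write(text.strip() + "\n")
            written.append(clip_base + ".wav")
    return written
```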
Praat Script (Labber)
C1F54A_IV_2074.wav
C1F54A_IV_2105.wav
Translation: "And then the Communist Party came, and then..."
Translation: "Because at that time, China was at war."
Forced alignment needs a custom dictionary
Orthography  Phonemes
GU1  G U
GU2  G U
GU3  G U
GU4  G U
GU5  G U
GU6  G U
TUB  T AH1 B
TUBA  T UW1 B AH0
TUBAL  T UW1 B AH0 L
TUBB  T AH1 B
TUBBS  T AH1 B Z
TUBBY  T AH1 B IY0
TUBE  T UW1 B
TUBE  T Y UW1 B
To train an acoustic model, the aligner needs:
• A bilingual pronouncing dictionary (currently ~3.6 MB)
• This is important because the program cannot run when there are unrecognized words in the transcript
• The program needs to convert orthography to phonemic segments as established by the custom dictionary
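The aligner cannot run on a transcript that contains a word missing from the dictionary, so it can help to check transcripts against the dictionary before training. This is a minimal sketch under a few assumptions. Entries are whitespace-separated CMU-style lines with upper-case headwords, as in the excerpt above. Transcripts are space-separated Jyutping tokens. `parse_cmu_dict` and `find_oov` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List, Set

_VARIANT = re.compile(r"\(\d+\)$")  # cmudict-style "WORD(1)" alternate-pronunciation marker


def parse_cmu_dict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse lines of the form 'WORD  PH1 PH2 ...' into word -> list of pronunciations."""
    lexicon: Dict[str, List[List[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):  # skip blanks and cmudict comments
            continue
        word, *phones = line.split()
        word = _VARIANT.sub("", word).upper()
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def find_oov(transcripts: Iterable[str], lexicon: Dict[str, List[List[str]]]) -> Set[str]:
    """Return every (upper-cased) transcript token that has no dictionary entry."""
    return {tok.upper() for text in transcripts for tok in text.split()
            if tok.upper() not in lexicon}
```

For example, given a dictionary containing GU1 and TUBE, the transcripts "gu1 tube" and "gu3 hou2" would flag {"GU3", "HOU2"} as words to add before training.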
Training and Evaluation
• .wav files and matching .lab files are placed in a Training directory
• A custom dictionary in the format of the CMU Pronouncing Dictionary
Our Three Training Models
Each model was trained on 50% of the data from each speaker:
1. Solo-trained model: trained only on data from the speaker being evaluated
2. Generation-trained model: data from all speakers of each generation combined in the Training directory
3. "All"-trained model: data from all speakers combined in the Training directory
• Prosodylab-Aligner uses the Training directory and the dictionary to build an acoustic model
• More training data (hours of speech) → better model. Therefore: the more speakers whose data is used in training, the less data is lost from each speaker to training
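The three training conditions can be assembled from a single per-speaker split. This sketch holds out half of each speaker's clips, then pools the training halves in three ways. It assumes clip IDs look like C1F54A_IV_2074, with the speaker code before the first underscore. It also assumes the digit after the language letter gives the generation ("C1..." → GEN 1). That reading of the naming scheme is an assumption, not something the slides state. The seeded shuffle and all helper names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def speaker_of(clip_id: str) -> str:
    # Hypothetical: the speaker code is everything before the first "_",
    # e.g. "C1F54A_IV_2074" -> "C1F54A"
    return clip_id.split("_", 1)[0]


def generation_of(speaker: str) -> str:
    # Assumption: the digit after the language letter is the generation,
    # e.g. "C1F54A" -> "GEN1"
    return "GEN" + speaker[1]


def build_training_sets(clip_ids: List[str], train_fraction: float = 0.5, seed: int = 0) -> dict:
    """Split each speaker's clips, then pool the training portions for SOLO, GEN and ALL."""
    by_speaker: Dict[str, List[str]] = defaultdict(list)
    for cid in sorted(clip_ids):
        by_speaker[speaker_of(cid)].append(cid)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: Dict[str, List[str]] = {}
    rest: Dict[str, List[str]] = {}
    for spk, clips in by_speaker.items():
        shuffled = clips[:]
        rng.shuffle(shuffled)
        k = int(len(shuffled) * train_fraction)
        train[spk], rest[spk] = shuffled[:k], shuffled[k:]

    gen: Dict[str, List[str]] = defaultdict(list)
    for spk, clips in train.items():
        gen[generation_of(spk)].extend(clips)
    return {
        "SOLO": train,                                          # one training set per speaker
        "GEN": dict(gen),                                       # one per generation
        "ALL": [c for clips in train.values() for c in clips],  # one pooled set
        "REST": rest,                                           # clips not used for training
    }
```

All three conditions draw on the same 50% of each speaker's data. What differs is how many speakers contribute to each model.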
Output of Prosodylab-Aligner: Time-aligned TextGrid
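To measure vowels automatically, the aligned TextGrid has to be read back in. This minimal sketch reads interval tiers from Praat's long ("ooTextFile") text format and returns the midpoint of each vowel interval, which is a common place to sample formants. Several details are assumptions. The phone tier may be named "phones", and the vowel labels depend on the custom dictionary's phone set. The reader ignores point tiers and Praat's doubled-quote escaping. It is a sketch, not a full TextGrid parser.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[float, float, str]

_NAME = re.compile(r'^\s*name = "(.*)"\s*$')
_INTERVAL = re.compile(r'^\s*intervals \[\d+\]:\s*$')
_BOUND = re.compile(r'^\s*(xmin|xmax) = (\S+)\s*$')
_TEXT = re.compile(r'^\s*text = "(.*)"\s*$')


def read_textgrid(text: str) -> Dict[str, List[Interval]]:
    """Read interval tiers from a Praat TextGrid in the long text format."""
    tiers: Dict[str, List[Interval]] = {}
    tier: Optional[str] = None
    bounds: Dict[str, float] = {}
    in_interval = False
    for line in text.splitlines():
        m = _NAME.match(line)
        if m:
            tier = m.group(1)
            tiers[tier] = []
            in_interval = False
            continue
        if _INTERVAL.match(line):
            in_interval, bounds = True, {}
            continue
        if not in_interval:
            continue  # file- and tier-level xmin/xmax lines are skipped
        m = _BOUND.match(line)
        if m:
            bounds[m.group(1)] = float(m.group(2))
            continue
        m = _TEXT.match(line)
        if m and tier is not None:
            tiers[tier].append((bounds["xmin"], bounds["xmax"], m.group(1)))
            in_interval = False
    return tiers


def vowel_midpoints(intervals: Iterable[Interval], vowels: set) -> List[Tuple[str, float]]:
    """(label, midpoint time) for every interval whose label is a vowel."""
    return [(label, (start + end) / 2) for start, end, label in intervals if label in vowels]
```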
Assessing Accuracy
• Assessment based on 10 speakers (four GEN 1 and six GEN 2)
• Examined the first 10 usable TextGrids for each speaker
Gold standard: vowel boundaries identified manually for all Cantonese monophthongs
Assessing Accuracy: Procedures
• Record "gold standard" vowel boundaries
• Record auto-aligned vowel boundaries
Figure: segment boundaries from the Solo-aligned, Gen-aligned, and All-aligned TextGrids
Assessing Accuracy
• Manual ("gold standard") measurements taken of the left and right boundaries of monophthongs
• Compared to the auto-aligned boundaries: difference at the left and right boundaries, absolute value of each difference, and difference in total length
• Root-mean-square deviation (RMSD) calculated for each boundary (Chen et al. 2004)
• Average vowel length for each model
• % of vowel centres (as determined by the gold standard) that fall within the auto-aligned boundaries
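The measures listed above can be computed from matched pairs of gold-standard and auto-aligned vowel spans. This sketch uses the standard RMSD definition, the square root of the mean squared boundary difference. It counts a vowel as "in target" when its gold-standard centre lies inside the auto-aligned span, with the endpoints included. One definition is an assumption: the "length deviation" is computed here as the mean absolute difference between auto and gold vowel lengths. `accuracy_report` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Span = Tuple[float, float]  # (left boundary, right boundary) in seconds


def _rmsd(diffs: List[float]) -> float:
    """Root-mean-square of boundary differences: sqrt(mean(d^2))."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def accuracy_report(gold: Sequence[Span], auto: Sequence[Span]) -> Dict[str, float]:
    """Compare auto-aligned vowel spans with matched gold-standard spans."""
    if not gold or len(gold) != len(auto):
        raise ValueError("need non-empty, equal-length lists of matched vowels")
    n = len(gold)
    left = [a[0] - g[0] for g, a in zip(gold, auto)]   # signed left-boundary error
    right = [a[1] - g[1] for g, a in zip(gold, auto)]  # signed right-boundary error
    gold_len = [g[1] - g[0] for g in gold]
    auto_len = [a[1] - a[0] for a in auto]
    # A vowel is "in target" if its gold-standard centre lies inside the auto span
    in_target = sum(1 for g, a in zip(gold, auto) if a[0] <= (g[0] + g[1]) / 2 <= a[1])
    return {
        "rmsd_left": _rmsd(left),
        "rmsd_right": _rmsd(right),
        "n_in_target": in_target,
        "pct_in_target": 100.0 * in_target / n,
        "mean_auto_length": sum(auto_len) / n,
        # Assumption: "length deviation" = mean absolute difference in vowel length
        "mean_length_deviation": sum(abs(x - y) for x, y in zip(auto_len, gold_len)) / n,
    }
```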
Transcription Issues
The entirety of "O5 Lam2 Jiu3" is placed within the "Gong2" boundaries
Same file: the aligner "catches up" and aligns later sections with excellent accuracy
Modeling Silence
The aligner places the "Hei2 Maa5" audio signal within a silence interval
• The effect is more common in Solo-aligned TextGrids
• Hypothesis: silence modelling is better when more data is used for model training
Syllable Fusion Issues
Fusion of Mei-Je → Me
Fusion of Za-Hai → Zei
Fusion of Seng-Jat → Set
(Wong 2006)
• Some rare examples cause problems: Seng Jat
• However, when we use a closer transcription, the aligner does well
Results Table
Measure | SOLO | GEN | ALL
Root-mean-square deviation, left boundary | 0.185 | 0.193 | 0.214
Root-mean-square deviation, right boundary | 0.187 | 0.197 | 0.207
# of vowels in target | 383 | 368 | 382
% of vowels in target | 81.84% | 78.63% | 81.62%
Avg. auto-aligned vowel length | 0.127 s | 0.124 s | 0.132 s
Avg. vowel length deviation | 0.014 s | 0.011 s | 0.019 s
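As an arithmetic check on the table, the slide does not state the denominator for the "% of vowels in target" row, but it can be backed out from each count and percentage. All three models point to the same total of about 468 gold-standard vowel tokens, which suggests the percentages are count / 468. This is an inference from the table, not a figure reported on the slides.

```python
# Counts and percentages of vowels in target, copied from the results table
RESULTS = {"SOLO": (383, 81.84), "GEN": (368, 78.63), "ALL": (382, 81.62)}


def implied_total(count: int, pct: float) -> int:
    """Back out the denominator N such that count / N * 100 is approximately pct."""
    return round(count * 100 / pct)


totals = {model: implied_total(c, p) for model, (c, p) in RESULTS.items()}
# Recompute each percentage from the implied denominator, to 2 decimal places
recomputed = {model: round(100 * c / totals[model], 2) for model, (c, _) in RESULTS.items()}
print(totals)      # {'SOLO': 468, 'GEN': 468, 'ALL': 468}
print(recomputed)  # {'SOLO': 81.84, 'GEN': 78.63, 'ALL': 81.62}
```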
Despite these problems, the alignment is quite accurate:
• The Solo-trained model has the lowest RMSD from the gold-standard boundaries (0.185 left, 0.187 right)
• The All-trained model predicts longer vowels, hence a relatively high % of vowel centres within boundaries (81.62%) despite having the highest RMSD
• Overly long segment predictions would be bad for studies of length, VOT, etc.
Summary
• Is Prosodylab-Aligner effective at producing transcript alignments accurate enough to permit automated measurement of vowel data? YES
• What is the best baseline to start with?
  – All speakers together (ALL)?
  – Each generational group separately (GEN)?
  – Each speaker individually (SOLO)?
Discussion
• Is Prosodylab-Aligner effective at producing transcript alignments accurate enough to permit automated measurement of vowel data?
  – Yes. Overall, roughly 80% of vowel centres fall within the auto-aligned boundaries for all three models (78.63% to 81.84%)
  – It can still be a useful tool for facilitating the vowel measurement process, giving a preliminary estimate of where the vowel boundaries are
  – Boundaries can be manually adjusted later
Discussion
• What is the best baseline to start with?
  – ALL
    • More data used, but the model overgenerates (predicts overly long vowels) → resulting in high RMSD
  – SOLO
    • Slightly more accurate, with smaller RMSD than the ALL and GEN models, but not much data / too much data lost to training
  – GEN
    • A reasonable compromise between the amount of data used in training and general accuracy
Conclusion
• The GEN model works better than ALL (lower RMSD at both boundaries), contrary to expectations, possibly because of significant inter-generational differences (cf. Tse 2015)
• Yet even with this much variation present, the aligner is still generally accurate and can be a useful tool for Cantonese corpus-based studies
• Useful for any study that requires segmental boundary information
  – e.g., VOT, vowel length, vowel formant measures, tone, consonants, etc.
Thank you: 多謝 (Cantonese), 감사합니다 (Korean), дякую (Ukrainian), Спасибо (Russian), Grazie molto (Italian), gratsiə namuor:ə (Faetar)
HLVC RAs: Cameron Abma, Vanessa Bertone, Ulyana Bila, Rosanna Calla, Minji Cha, Abigail Chan, Karen Chan, Joanna Chociej, Sheila Chung, Tiffany Chung, Courtney Clinton, Rachel Coulter, Radu Craioveanu, Marco Covi, Zahid Daudjee, Derek Denis, Tonia Djogovic, Joyce Fok, Paolo Frasca, Matt Gardner, Rick Grimm, Dongkeun Han, Natalia Harhaj, Taisa Hewka, Melania Hrycyna, Michael Iannozzi, Diana Kim, Janyce Kim, Iryna Kulyk, Mariana Kuzela, Ann Kwon, Alex La Gamba, Carmela La Rosa, Natalia Lapinskaya, Kris Lee, Nikki Lee, Olga Levitski, Arash Lotfi, Samuel Lo, Paulina Lyskawa, Rosa Mastri, Timea Molnár, Jamie Oh, Maria Parascandolo, Rita Pang, Tiina Rebane, Hoyeon Rim, Will Sawkiw, Maksym Shkvorets, Vera Richetti Smith, Anna Shalaginova, Konstantin Shapoval, Yi Qing Sim, Mario SoGao, Awet Tekeste, Josephine Tong, Sarah Truong, Dylan Uscher, Elaine Wang, Ka-man Wong, Junrui Wu, Olivia Yu, Minyi Zhu
Collaborators: Yoonjung Kang, Alexei Kochetov, Naomi Nagy, James Walker
Funding: SSHRC, University of Toronto, Shevchenko Foundation
References
Chen, L., Liu, Y., Harper, M. P., Maia, E., & McRoy, S. (2004). Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus. In LREC. Retrieved from https://www-new.comp.nus.edu.sg/~rpnlpir/proceedings/lrec-2004/pdf/307.pdf
Gorman, K., Howell, J., & Wagner, M. (2011). Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics, 39(3), 192–193.
Nagy, N. (2011). A Multilingual Corpus to Explore Variation in Language Contact Situations. Rassegna Italiana di Linguistica Applicata, 43(1/2), 65–84.
Rosenfelder, I., Fruehwald, J., Evanini, K., & Yuan, J. (2011). FAVE (Forced Alignment and Vowel Extraction) Program Suite. Retrieved from http://fave.ling.upenn.edu
Wong, W. Y. P. (2006). Syllable Fusion in Hong Kong Cantonese Connected Speech (Ph.D. dissertation). The Ohio State University.
HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
• Slides will be available at http://www.pitt.edu/~hbt3/presentations.html
• Thank you!
• 多謝晒! ("Thank you very much!")