Unsupervised Cross-Modal Alignment of Speech...
4. Experimental Settings
Datasets
Details of Training
• Speech2Vec with Skip-grams, window = 3
• Encoder: single-layer bidirectional LSTM
• Decoder: single-layer unidirectional LSTM
• SGD with fixed learning rate of 0.001
• Word2Vec fastText implementation
• Both dimensions = 50
• Discriminator in adversarial training
• 2 layers, 512 neurons, ReLU
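The discriminator settings above can be pictured with a small forward pass. This is only a sketch under assumptions: the poster gives "2 layers, 512 neurons, ReLU" and an embedding size of 50, but reading this as two hidden layers followed by a one-unit sigmoid output is an interpretation, and the initialisation scale and the names `init_discriminator` and `discriminator` are hypothetical.

```python
import numpy as np

EMB_DIM = 50   # embedding size stated on the poster
HIDDEN = 512   # neurons per layer stated on the poster


def init_discriminator(rng):
    # Assumption: two hidden layers of 512 units plus a 1-unit output layer.
    sizes = [(EMB_DIM, HIDDEN), (HIDDEN, HIDDEN), (HIDDEN, 1)]
    return [(rng.normal(0.0, 0.05, size=s), np.zeros(s[1])) for s in sizes]


def discriminator(params, x):
    """Return, for each row of x, the probability that it is a mapped speech embedding."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)        # ReLU hidden layers
    w, b = params[-1]
    logits = (h @ w + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid output


rng = np.random.default_rng(0)
params = init_discriminator(rng)
probs = discriminator(params, rng.normal(size=(4, EMB_DIM)))
```

In adversarial training these probabilities would feed a binary cross-entropy loss; the parameter updates are omitted here.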
5. Results
Task I | Spoken Word Recognition
• Accuracy decreases as the level of supervision decreases
• Unsupervised alignment approach is almost as effective as its supervised counterpart (A vs. A*)
• Word segmentation is a critical step
• Apply to different corpora settings
Task II | Spoken Word Synonyms Retrieval
• The output actually contains both synonyms and different lexical forms of the audio segment
• Also consider synonyms as valid results
Task III | Spoken Word Translation
• More supervision yields better performance
• Translation using the same corpus outperforms translation using different corpora
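One way to read the three tasks is as nearest-neighbour retrieval in the aligned space: a speech embedding is mapped by W and the closest text embeddings are returned. The sketch below assumes cosine similarity and a precision-at-k score, neither of which is stated on the poster. The function names and the toy data are hypothetical. Each gold entry is a set, so several answers (for example synonyms, as in Task II) can count as valid.

```python
import numpy as np


def normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def retrieve(speech_emb, text_emb, W, k=1):
    """Indices of the k nearest text embeddings for each mapped speech embedding."""
    mapped = normalize(speech_emb @ W.T)      # apply W to every speech vector
    sims = mapped @ normalize(text_emb).T     # cosine similarity matrix
    return np.argsort(-sims, axis=1)[:, :k]


def precision_at_k(topk, gold):
    """Fraction of queries whose top-k list contains at least one valid answer."""
    hits = [bool({int(i) for i in row} & valid) for row, valid in zip(topk, gold)]
    return sum(hits) / len(hits)


# Toy check: the speech space is an exactly rotated copy of the text space.
rng = np.random.default_rng(0)
text = rng.normal(size=(6, 50))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthogonal matrix
speech = text @ Q
topk = retrieve(speech, text, W=Q, k=1)
score = precision_at_k(topk, [{i} for i in range(6)])
```

With real embeddings the two spaces are only approximately isomorphic, so the score would fall below the perfect value of this toy case.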
2. Learning Embeddings
Text Embedding Space
• Train Word2Vec [Mikolov et al., 2013] on the text corpus
• Unsupervised learning of distributed word representations that model word semantics
Speech Embedding Space
• Train Speech2Vec [Chung and Glass, 2018] on the speech corpus
• The corpus is pre-processed by an off-the-shelf speech segmentation algorithm such that utterances are segmented into audio segments corresponding to spoken words
• Speech version of Word2Vec: unsupervised semantic audio segment representations
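Both models are trained on skip-gram style data, so a segmented utterance has to be turned into (centre, context) pairs. The following sketch does this with the window of 3 given in the training details. The helper name `skipgram_pairs` and the string placeholders are hypothetical: in Speech2Vec each item would be a variable-length sequence of acoustic features for one audio segment, not a string.

```python
def skipgram_pairs(segments, window=3):
    """List (centre, context) pairs for every segment and its neighbours within the window."""
    pairs = []
    for i, center in enumerate(segments):
        lo = max(0, i - window)
        hi = min(len(segments), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, segments[j]))
    return pairs


# Placeholders standing in for the audio segments of one utterance.
utterance = ["seg0", "seg1", "seg2", "seg3", "seg4"]
pairs = skipgram_pairs(utterance, window=3)
```

Under this reading, the encoder would consume the centre segment and the decoder would be trained to generate each context segment.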
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces
Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
1. Overview
• Goal: To learn a linear mapping between speech & text embedding spaces
3. Embedding Spaces Alignment
References
• Chung and Glass. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech. INTERSPEECH 2018.
• Lample et al. Word translation without parallel data. ICLR 2018.
• Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.
• Both embedding spaces are learned from corpora based on distributional hypothesis (e.g., skip-grams) → approximately isomorphic
• Construct the synthetic mapping dictionary to learn a linear mapping matrix between the two embedding spaces
Adversarial Training
• Make the aligned embeddings indistinguishable
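The poster does not spell out the adversarial loss. One plausible form, borrowed from Lample et al. (listed in the references) and adapted to this setting, is given below. The notation is an assumption: s_i are the n speech embeddings, t_j are the m text embeddings, W is the linear mapping, and θ_D are the discriminator parameters. The discriminator tries to tell mapped speech embeddings from text embeddings, while W is trained to fool it.

```latex
\begin{align}
\mathcal{L}_D(\theta_D \mid W) &=
  -\frac{1}{n}\sum_{i=1}^{n} \log P_{\theta_D}(\text{speech}=1 \mid W s_i)
  -\frac{1}{m}\sum_{j=1}^{m} \log P_{\theta_D}(\text{speech}=0 \mid t_j) \\
\mathcal{L}_W(W \mid \theta_D) &=
  -\frac{1}{n}\sum_{i=1}^{n} \log P_{\theta_D}(\text{speech}=0 \mid W s_i)
  -\frac{1}{m}\sum_{j=1}^{m} \log P_{\theta_D}(\text{speech}=1 \mid t_j)
\end{align}
```

The two losses would be minimised in alternation, so that the mapped speech embeddings and the text embeddings become hard to distinguish.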
Refinement (Orthogonal Procrustes Problem)
• Use the W learned from the adversarial training step as an initial proxy and build a synthetic parallel dictionary
• Consider the most frequent words
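Given a synthetic parallel dictionary, the orthogonal Procrustes problem has a closed-form solution through a singular value decomposition. The sketch below shows that step only. It assumes the dictionary has already been built and stored as two row-aligned matrices, `S` (speech) and `T` (text). These names, the function `procrustes`, and the toy data are hypothetical.

```python
import numpy as np


def procrustes(S, T):
    """Orthogonal W minimising ||S W^T - T||_F, where row i of S is paired with row i of T."""
    # Closed form: W = U V^T, with U, V taken from the SVD of T^T S.
    U, _, Vt = np.linalg.svd(T.T @ S)
    return U @ Vt


# Toy check: recover a known rotation from 200 paired 50-dimensional vectors.
rng = np.random.default_rng(0)
T = rng.normal(size=(200, 50))
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # ground-truth orthogonal map
S = T @ Q                                        # speech vectors: rotated text vectors
W = procrustes(S, T)
```

In the refinement step this solve would presumably be repeated: the new W yields a new dictionary, which in turn yields a new W.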
Method Comparison