Post on 04-Mar-2021
Text Similarity
Announcements
• Note problems with Midterm multiple choice questions 2 and 3. If you got them wrong, you will get credit. Bring your exam back to TA hours.
• There will be a recitation tomorrow on HW3. Be sure to provide your interest and availability on Piazza.
• You have 2 weeks for HW3. The due date on the assignment is correct. I have updated the website.
Time to Reflect
• Why does a neural network work?
• What is it good at?
• Empirical vs theory: what do we know?
Supervised Machine Learning
[Diagram: input feature vector x, weighted by parameters w (the things we're learning), passes through σ (a sigmoid or other nonlinearity) to produce the predicted value ŷ]
Supervised Machine Learning
[Diagram: x and w pass through σ to produce ŷ, which is compared to the actual value y — how wrong were we? — and the parameters are updated]
Highlights of Neural Nets
• Learn a representation, not just to predict
• Critical component is the embedding layer
• Mapping from discrete symbols to continuous vectors in low-dimensional space
• Semantic representation: distributed
• Feed-forward neural networks (multi-layer perceptrons) can be used anywhere a linear classifier is used
• Superior performance often due to non-linearity
• Which parameter values, which neural net (RNN, CNN, LSTM) are best for a task is determined experimentally
[Chart: accuracy from 2000 to 2017 — ASR climbed from 76 to ~95, Image Recognition from 72. Huge leap forward in Speech Recognition and Image Recognition]
Slide credit: Omid Bakhshandeh
Trend in NLP Tasks
[Chart: accuracy from 2000 to 2017 for Paraphrase Identification (75.6 → 80.4), NER, POS tagging, Dependency Parsing, NP Chunking, and Parsing]
Slide credit: Omid Bakhshandeh
Time to Reflect
• Your reactions to neural nets so far
• Are they still confusing?
• Do you need to see more?
• Are you convinced (yet)?
• Are they intriguing?
• Do you want to see more?
• Success is empirically determined: is empirical vs theoretical problematic?
Synonymy and Paraphrase
• A critical piece of text interpretation
• Can be domain-specific
• Word synonymy:
• General domain: "hot" ≈ "sexy" — but in biology?

         General                  Biology
hot      warm, sexy, exciting     heated, warm, thermal
treat    address, handle          cure, fight, kill
head     leader, boss, mind       skull, brain, cranium

Examples from Pavlick
Sentential Paraphrase
• Paraphrases extracted from different translations of the same novel
Examples from Barzilay
Emma burst into tears and he tried to comfort her, saying things to make her smile.
Emma cried, and he tried to console her, adorning his words with puns.
And finally, dazzlingly white, it shone high above them in the empty sky.
It appeared white and dazzling, in the empty heavens.
People said, "The Evening Noise is sounding, the sun is setting."
"The evening bell is ringing," people used to say.
Phrasal paraphrases
King's son — son of the king
In bottles — bottled
Start to talk — start talking
Suddenly came — came suddenly
Make appearance — appear
Examples from Barzilay
Types of Text Similarity
• Many types of text similarity exist:
• Morphological similarity (e.g., respect – respectful)
• Spelling similarity (e.g., theater – theatre)
• Synonymy (e.g., talkative – chatty)
• Homophony (e.g., raise – raze – rays)
• Semantic similarity (e.g., cat – tabby)
• Sentence similarity (e.g., paraphrases)
• Document similarity (e.g., two news stories on the same event)
Slide from Radev
Tasks requiring text similarity
• Information retrieval
• Machine translation
• Summarization
• Inference
Using word embeddings to compute similarity
• Cosine similarity
• When vectors have unit length, cosine similarity is the dot product
• Common to normalize the embeddings matrix so that each row has unit length
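The normalization trick above can be sketched in a few lines of numpy (the embedding values below are toy numbers invented for illustration):

```python
import numpy as np

# Toy embedding matrix: 4 "words", 3 dimensions (values made up).
E = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0],
              [0.0, 3.0, 4.0],
              [1.0, 0.0, 0.0]])

# Normalize each row to unit length; cosine similarity then
# reduces to a plain dot product between rows.
E_norm = E / np.linalg.norm(E, axis=1, keepdims=True)

# Rows 0 and 1 point in the same direction, so cosine ≈ 1.0.
sim = E_norm[0] @ E_norm[1]
print(sim)
```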
Similarity Measures (Cont.)
• Cosine similarity: similarity of two vectors, normalized

cos(X, Y) = (x_1 y_1 + x_2 y_2 + … + x_n y_n) / ( √(x_1² + … + x_n²) · √(y_1² + … + y_n²) ) = Σ_{i=1}^{n} x_i y_i / ( √(Σ_{i=1}^{n} x_i²) · √(Σ_{i=1}^{n} y_i²) )

Slide from Radev
Document Similarity
• Used in information retrieval to determine which document (d1 or d2) is more similar to a given query q.
• Documents and queries are represented in the same space.
• Angle (or cosine) is a proxy for similarity between two vectors
Slide from Radev
Quiz
• Given three documents
D1 = <1, 3>   D2 = <10, 30>   D3 = <3, 1>
• Compute the cosine scores σ(D1, D2) and σ(D1, D3)
• What do the numbers tell you?
Slide from Radev
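The quiz values can be checked directly; a minimal sketch (the helper `cos` is just for illustration):

```python
import numpy as np

def cos(a, b):
    # Cosine similarity of two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

D1 = np.array([1.0, 3.0])
D2 = np.array([10.0, 30.0])
D3 = np.array([3.0, 1.0])

print(cos(D1, D2))  # 1.0 — D2 is a scaled copy of D1: identical direction
print(cos(D1, D3))  # 0.6 — different direction, lower similarity
```

Note that scaling a vector (D2 = 10·D1) leaves the cosine unchanged: cosine measures direction, not magnitude.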
Quiz
• What is the range of values that the cosine scores can take?
Slide from Radev
Finding Similar Words
• Finding the k most similar words, where E is an embedding matrix for all words:
• w = E[w]
• S = Ew
• A vector of similarities
• S[i] = similarity of w to the i-th word
• k-most similar words?
• How can we find the k-most similar words that are also orthographically similar?
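The matrix–vector recipe above can be sketched as follows (the vocabulary and embedding values are toy data invented for illustration; `k_most_similar` is a hypothetical helper name):

```python
import numpy as np

# Toy embedding matrix E (one row per word) and vocabulary.
vocab = ["cat", "tabby", "dog", "car"]
E = np.array([[0.9, 0.1, 0.1],
              [0.85, 0.15, 0.1],
              [0.7, 0.3, 0.2],
              [0.1, 0.9, 0.3]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-length rows

def k_most_similar(word, k=2):
    w = E[vocab.index(word)]   # w = E[w]
    s = E @ w                  # S = Ew: similarity of w to every word
    best = np.argsort(-s)      # indices sorted by descending similarity
    return [vocab[i] for i in best if vocab[i] != word][:k]

print(k_most_similar("cat"))   # → ['tabby', 'dog']
```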
Similarity to a group of words
• Given: w_1 … w_k that are semantically similar
• Find w_j such that it is the most semantically similar to the group
• Define similarity as average similarity to the group: (1/k) Σ_{i=1}^{k} sim_cos(w, w_i), i.e., s = E(w) · (E(w_1) + E(w_2) + … + E(w_k)) / k
• How would we compute the odd word out?
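One answer to the odd-word-out question: score every word by its average similarity to the group and pick the lowest. A minimal sketch with toy embedding values (invented for illustration):

```python
import numpy as np

vocab = ["cat", "dog", "mouse", "car"]
E = np.array([[0.9, 0.1, 0.1],
              [0.8, 0.2, 0.1],
              [0.7, 0.2, 0.2],
              [0.05, 0.9, 0.4]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-length rows

# Average similarity of each word to the group in one product:
# s = E · mean(group vectors).
g = E.mean(axis=0)
s = E @ g

odd = vocab[int(np.argmin(s))]  # least similar on average = odd one out
print(odd)                      # → car
```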
Short Document Similarity
• We can train a model or we can just use word embeddings
• Suitable for very short texts such as queries, newspaper headlines, or tweets
• Similarity = the sum of the pairwise similarities of all words in the documents
Computing Document Similarity
• Where D1 = w¹_1 … w¹_m and D2 = w²_1 … w²_n: sim(D1, D2) = Σ_{i=1}^{m} Σ_{j=1}^{n} cos(w¹_i, w²_j)
• Equivalent to: the dot product of the two documents' summed word vectors (with unit-length embeddings)
• Allows: a document collection D is a matrix where each row i is a (summed) document vector. Similarity with a new document is then a single matrix–vector product.
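The equivalence between the pairwise sum and the dot product of summed vectors can be verified numerically; a sketch with randomly generated toy embeddings (`doc_vector` is a hypothetical helper):

```python
import numpy as np

# Toy word embeddings, normalized so rows have unit length.
vocab = {"cats": 0, "purr": 1, "dogs": 2, "bark": 3, "stocks": 4, "fell": 5}
E = np.random.default_rng(0).normal(size=(6, 4))
E = E / np.linalg.norm(E, axis=1, keepdims=True)

def doc_vector(words):
    # Sum of the (unit-length) word vectors of a short document.
    return E[[vocab[w] for w in words]].sum(axis=0)

d1 = ["cats", "purr"]
d2 = ["dogs", "bark"]

# Sum of all pairwise word similarities ...
pairwise = sum(E[vocab[a]] @ E[vocab[b]] for a in d1 for b in d2)
# ... equals the dot product of the summed document vectors.
summed = doc_vector(d1) @ doc_vector(d2)
print(np.isclose(pairwise, summed))  # True
```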
Analogy Solving Task
• Solve a : b :: a* : ? by finding the word b* whose vector is most similar to b − a + a*
• Equivalent to (COS-ADD): argmax_{b*} cos(b*, b − a + a*) (Levy and Goldberg 2014)
• "…it is not clear what success on a benchmark of analogy tasks says about the quality of word embeddings beyond their suitability for solving this particular task." (Goldberg 2017)
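The COS-ADD objective can be sketched directly (a 4-word toy vocabulary with made-up vectors; real analogies need trained embeddings, so treat this only as an illustration of the arithmetic):

```python
import numpy as np

vocab = ["king", "man", "woman", "queen"]
E = np.array([[0.9, 0.8, 0.1],
              [0.9, 0.1, 0.1],
              [0.1, 0.1, 0.9],
              [0.1, 0.8, 0.9]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-length rows

def analogy(a, b, a2):
    # COS-ADD: argmax over b2 of cos(b2, b - a + a2), excluding the inputs.
    target = E[vocab.index(b)] - E[vocab.index(a)] + E[vocab.index(a2)]
    scores = E @ (target / np.linalg.norm(target))
    for i in np.argsort(-scores):
        if vocab[i] not in (a, b, a2):
            return vocab[i]

print(analogy("man", "king", "woman"))  # → queen
```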
Using WordNet and other paraphrase corpora
• (PPDB) Penn Paraphrase Database (Pavlick and Callison-Burch)
• Can we use word pairs that reflect similarity better for the task?
• Pre-trained embeddings E
• Graph G representing similar word pairs
• Search for a new word embedding matrix E′ whose rows are close to E but also close to G
• Methods for combining pre-trained word embeddings with smaller, specialized embeddings
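One way to realize the "close to E but also close to G" idea is retrofitting in the style of Faruqui et al. (2015): iteratively pull each word's vector toward its graph neighbors while anchoring it to its pre-trained value. A minimal sketch with toy values (the update rule shown is one common formulation, not necessarily the exact method the slide refers to):

```python
import numpy as np

# Pre-trained embeddings E (toy values) and a similarity graph G
# given as neighbor lists over word indices.
E = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
G = {0: [2], 1: [], 2: [0]}  # words 0 and 2 are marked as similar

# Iterative update: each retrofitted vector is an average of its
# original pre-trained value and its neighbors' current values.
E2 = E.copy()
for _ in range(10):
    for i, nbrs in G.items():
        if nbrs:
            E2[i] = (E[i] + E2[nbrs].sum(axis=0)) / (1 + len(nbrs))

# Words 0 and 2 end up closer together than in the original matrix.
print(np.linalg.norm(E2[0] - E2[2]) < np.linalg.norm(E[0] - E[2]))  # True
```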
Caveats
• Don't just use off-the-shelf word embeddings blindly
• Experiment with corpus and hyper-parameter settings
• When using off-the-shelf embeddings, use the same tokenization and normalization
Resources
• Word embeddings
• https://code.google.com/p/word2vec/
• http://nlp.stanford.edu/projects/glove/images/comparative_superlative.jpg
• Neural net platforms
• Keras https://keras.io/
• PyTorch http://pytorch.org/
• TensorFlow https://www.tensorflow.org/
• Theano http://deeplearning.net/software/theano/
Language is made up of sequences
• So far we have seen embeddings for words
• (and methods for combining them through vector concatenation and arithmetic)
• But how can we account for sequences?
• Words as sequences of letters
• Sentences as sequences of words
• Documents as sequences of sentences
Recurrent Neural Networks
• Represent arbitrarily sized sequences in a fixed-size vector
• Good at capturing statistical regularities in sequences (order matters)
• Include simple RNNs, Long Short-Term Memory (LSTMs), Gated Recurrent Units (GRUs)
Learning word meaning from their morphs via RNNs [Thang et al. 2013]
Logical entailment using compositional semantics [Bowman et al. 2014]
Machine Translation (Sequences)
• Sequence-to-sequence
• Sutskever et al. 2014
RNN Abstraction
• An RNN is a function that takes an arbitrary-length sequence as input and returns a single d_out-dimensional vector as output
• Input: x_{1:n} = x_1 x_2 … x_n (x_i ∈ R^{d_in})
• Output: y_n ∈ R^{d_out}
• Output vector y used for further prediction
RNN Characteristics
• Can condition on the entire sequence without resorting to the Markov assumption
• Can get very good language models as well as good performance on many other tasks
RNNs are defined recursively
• By means of a function R taking as input a state vector h_{i−1} and an input vector x_i
• Returns a new state vector h_i
• The state vector can be mapped to an output vector y_i using a simple deterministic function
• And fed through softmax for classification.
Recurrent Neural Networks
[Diagram: x_1 (via W_x) and h_0 (via W_h) feed through σ to produce h_1]
h_t = σ(W_h h_{t−1} + W_x x_t)
Slide from Radev
RNN
[Diagram: x_1 (via W_x) and h_0 (via W_h) feed through σ to produce h_1; softmax over W_y h_1 produces y_1]
h_t = σ(W_h h_{t−1} + W_x x_t)
y_t = softmax(W_y h_t)
Slide from Radev
RNN
[Diagram: the network unrolled over the input "The cat sat" — x_1, x_2, x_3 feed through σ with shared weights W_x, W_h, producing h_1, h_2, h_3 from h_0; softmax over W_y h_3 produces y_3]
Slide from Radev
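The unrolled computation above can be sketched as a plain numpy forward pass (all sizes and weights below are toy values invented for illustration; a real model would learn W_x, W_h, W_y):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 3, 2          # toy sizes: input, hidden, output dims

Wx = rng.normal(size=(d_h, d_in))   # input-to-hidden weights
Wh = rng.normal(size=(d_h, d_h))    # hidden-to-hidden weights
Wy = rng.normal(size=(d_out, d_h))  # hidden-to-output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Unrolled forward pass: h_t = σ(Wh h_{t-1} + Wx x_t) at each step,
# then y = softmax(Wy h_n) from the final state ("The cat sat" = 3 steps).
xs = [rng.normal(size=d_in) for _ in range(3)]
h = np.zeros(d_h)                   # h_0
for x in xs:
    h = sigmoid(Wh @ h + Wx @ x)
y = softmax(Wy @ h)
print(y)                            # a probability distribution (sums to 1)
```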
Updating Parameters of an RNN
[Diagram: the same network unrolled over "The cat sat", with a cost computed from y_3; gradients flow back through the unrolled steps to update W_x, W_h, W_y]
Backpropagation through time
Slide from Radev
Next Time
• More on RNNs and their use in sentiment analysis