CS395T: Structured Models for NLP Lecture 9: Trees 3 (Greg Durrett)

Transcript of CS395T: Structured Models for NLP Lecture 9: Trees 3

Page 1: CS395T: Structured Models for NLP Lecture 9: Trees 3 (gdurrett/courses/fa2017/lectures/lec9-1pp.pdf)

CS395T: Structured Models for NLP
Lecture 9: Trees 3

Greg Durrett

Page 2:

Administrivia

‣ Project 1 due at *5pm* today

‣ Project 2 will be out by tonight. Due October 17

‣ Shift-reduce parser: greedy model, beam search model, extension

Page 3:

Recall: Dependencies

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Dependency syntax: syntactic structure is defined by dependencies
‣ Head (parent, governor) connected to dependent (child, modifier)
‣ Each word has exactly one parent except for the ROOT symbol
‣ Dependencies must form a directed acyclic graph

Page 4:

Recall: Projectivity

‣ Projective <-> no “crossing” arcs

the dog ran to the house

‣ Crossing arcs:

dogs in houses and cats      credit: Language Log

‣ Today: algorithms for projective parsing

Page 5:

This Lecture

‣ Graph-based dependency parsing

‣ Dynamic programs for exact inference; look a lot like sequential CRFs

‣ Transition-based (shift-reduce) dependency parsing

‣ Approximate, greedy inference; fast, but a little bit weird!

Page 6:

Graph-based Dependency Parsing

‣ How did we parse lexicalized trees?

‣ Normal CKY is too slow: grammar is too large if it includes words

Page 7:

Graph-based Dependency Parsing

‣ Naive algorithm: O(n^5)

X[h]
Y[h]   Z[h']
i   h   k   h'   j

‣ Combine spans like CKY and look at their heads
‣ Five indices to loop over
‣ Features can look at spans and heads

‣ Can be applied to dependency parses as well! Builds projective trees

‣ What do our scores look like? For now, assume features on each edge (head, child) pair with some weights
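The arc-factored assumption above can be stated in a few lines of code; `tree_score`, `heads`, and `arc_score` are illustrative names, not the lecture's:

```python
def tree_score(heads, arc_score):
    """Arc-factored ("first-order") scoring: the score of a dependency tree
    is the sum of independent (head, child) edge scores.
    heads maps child -> parent (0 = ROOT); arc_score[h][c] is the weight
    of the arc h -> c."""
    return sum(arc_score[h][c] for c, h in heads.items())
```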

Page 8:

Why is this inefficient?

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Lots of spurious ambiguity: many ways to derive the right parses

‣ Can split at either point and we can build up subtrees

X[h]
Y[h]   Z[h']
i   h   k   h'   j

Page 9:

Eisner's Algorithm: O(n^3)

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Cubic-time algorithm like CKY

‣ Maintain two charts with dimension [n, n, 2]:

‣ Complete items: all children are attached, head is at the “tall end”
‣ Incomplete items: arc from the “tall” to the “short” end; the word on the short end has a parent but maybe not all of its children

Page 10:

Eisner's Algorithm: O(n^3)

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Complete item: all children are attached, head is at the “tall end”
‣ Incomplete item: arc from the “tall end” to the “short end”, may still expect children

‣ Take two adjacent complete items, add an arc, and build an incomplete item:
      [complete] + [complete] = [incomplete]      (one of two arc directions)

‣ Take an incomplete item and complete it (the other case is symmetric):
      [incomplete] + [complete] = [complete]

Page 11:

Eisner's Algorithm: O(n^3)

DT NN VBD TO DT NN
the dog ran to the house      ROOT

1) Build incomplete span
2) Promote to complete
3) Build incomplete span

[diagram: complete + complete = incomplete (either arc direction); incomplete + complete = complete]

Page 12:

Eisner's Algorithm: O(n^3)

DT NN VBD TO DT NN
the dog ran to the house      ROOT

[diagram: incomplete + complete = complete (either direction)]

4) Promote to complete

Page 13:

Eisner's Algorithm: O(n^3)

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ We've built left children and right children of "ran" as complete items

‣ Attaching to ROOT makes an incomplete item with the left children; attaching the right children subsequently finishes the parse

Page 14:

Eisner's Algorithm

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Eisner's algorithm doesn't have split point ambiguities like this

‣ Left and right children are built independently; heads are at the edges of spans

‣ Charts are n x n x 2 because we need to track arc direction / left vs. right

‣ Naive: O(n^5); Eisner: O(n^3)
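The chart recursions above can be sketched compactly. This is a minimal max-score version (no backpointers, so it returns only the best tree's score, not the tree itself), assuming `scores[h][m]` holds the weight of the arc h -> m with index 0 as ROOT:

```python
def eisner(scores):
    """Minimal sketch of Eisner's O(n^3) algorithm (max score only).
    scores[h][m] = weight of arc h -> m; index 0 is ROOT.
    Chart index [s][t][d]: best item over span s..t, where d = 0 means the
    head is at t (left arc) and d = 1 means the head is at s (right arc)."""
    n = len(scores)
    NEG = float("-inf")
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for k in range(1, n):                 # span length
        for s in range(n - k):
            t = s + k
            # two adjacent complete items + a new arc -> incomplete item
            best = max(complete[s][r][1] + complete[r + 1][t][0]
                       for r in range(s, t))
            incomplete[s][t][0] = best + scores[t][s]   # arc t -> s
            incomplete[s][t][1] = best + scores[s][t]   # arc s -> t
            # incomplete item + complete item -> complete item
            complete[s][t][0] = max(complete[s][r][0] + incomplete[r][t][0]
                                    for r in range(s, t))
            complete[s][t][1] = max(incomplete[s][r][1] + complete[r][t][1]
                                    for r in range(s + 1, t + 1))
    return complete[0][n - 1][1]          # ROOT takes right children only
```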

Page 15:

MST Parser

‣ View dependency parsing as finding a maximum directed spanning tree: the space is all spanning trees, so we find nonprojective trees too!

‣ Chu-Liu-Edmonds algorithm to find the best MST in O(n^2)      McDonald et al. (2005)

‣ Ironically, the software artifact called MSTParser has an implementation of Eisner's algorithm, which is what most people use

‣ This only computes maxes, but there is an algorithm for summing over all trees as well (matrix-tree theorem)

Page 16:

Building Systems

‣ Can implement Viterbi decoding and marginal computation using Eisner's algorithm or MST to max/sum over projective/nonprojective trees

‣ Same concept as sequential CRFs for NER; can also use margin-based methods. You know how to implement these!

‣ Features are over dependency edges

Page 17:

Features in Graph-Based Parsing

‣ Dynamic program exposes the parent and child indices

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ McDonald et al. (2005): conjunctions of parent and child words + POS, POS of words in between, POS of surrounding words. ~91 UAS

‣ HEAD=TO & MOD=house      ‣ HEAD=TO & MOD=NN
‣ HEAD=TO & MOD=DT         ‣ HEAD=TO & MOD-1=the

‣ Lei et al. (2014): ways of learning conjunctions of these
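A sketch of first-order arc features in the spirit of McDonald et al. (2005); the feature-name strings and the `arc_features` helper are made up for illustration:

```python
def arc_features(words, tags, head, mod):
    """Indicator features for a single (head, modifier) arc: head/modifier
    word and POS conjunctions plus the POS of words in between.
    Feature names are illustrative, not the paper's exact template set."""
    f = ["HW=%s&MW=%s" % (words[head], words[mod]),
         "HT=%s&MT=%s" % (tags[head], tags[mod]),
         "HT=%s&MW=%s" % (tags[head], words[mod]),
         "HW=%s&MT=%s" % (words[head], tags[mod])]
    lo, hi = min(head, mod), max(head, mod)
    for i in range(lo + 1, hi):           # POS of words in between
        f.append("HT=%s&BT=%s&MT=%s" % (tags[head], tags[i], tags[mod]))
    return f
```

For the arc to -> house in the running example, this produces features like HT=TO&MT=NN and HT=TO&BT=DT&MT=NN.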

Page 18:

Features in Graph-Based Parsing

DT NN VBD TO DT NN
the dog ran to the house      ROOT

‣ Ideally would use features on more arcs

‣ Grandparents: ran -> to -> house

‣ Siblings: dog <- ran -> to

Page 19:

Higher-Order Parsing

‣ Terry Koo (2010)

‣ Track additional state during parsing so we can look at grandparents and siblings: O(n^4)

‣ Additional indicator features based on this information, ~93 UAS (up from 91 UAS)

‣ Turns out you can just use beam search and forget this crazy dynamic program…

Page 20:

Shift-Reduce Parsing

Page 21:

Shift-Reduce Parsing

‣ Also called transition-based parsing

‣ Similar to deterministic parsers for compilers

‣ A tree is built from a sequence of incremental decisions moving left to right through the sentence

‣ Stack containing partially-built tree, buffer containing rest of sentence

‣ Shifts consume the buffer; reduces build a tree on the stack

Page 22:

Shift-Reduce Parsing

I ate some spaghetti bolognese      ROOT

‣ Initial state:  Stack: [ROOT]        Buffer: [I ate some spaghetti bolognese]

‣ Shift: top of buffer -> top of stack

‣ Shift 1:  Stack: [ROOT I]      Buffer: [ate some spaghetti bolognese]

‣ Shift 2:  Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]

Page 23:

Shift-Reduce Parsing

I ate some spaghetti bolognese      ROOT

‣ State:  Stack: [ROOT I ate]    Buffer: [some spaghetti bolognese]

‣ Left-arc (reduce operation): let σ denote the stack
‣ “Pop two elements, add an arc, put them back on the stack”

      σ|w-2, w-1  ->  σ|w-1      (w-2 is now a child of w-1)

‣ State:  Stack: [ROOT ate]      Buffer: [some spaghetti bolognese]      (I is now a child of ate)

Page 24:

Arc-Standard Parsing

I ate some spaghetti bolognese      ROOT

‣ Arc-standard system: three operations

‣ Start: stack contains [ROOT], buffer contains [I ate some spaghetti bolognese]

‣ Shift: top of buffer -> top of stack
‣ Left-arc:   σ|w-2, w-1  ->  σ|w-1      (w-2 is now a child of w-1)
‣ Right-arc:  σ|w-2, w-1  ->  σ|w-2      (w-1 is now a child of w-2)

‣ End: stack contains [ROOT], buffer is empty []

‣ Must take 2n steps for n words (n shifts, n LA/RA)
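The three operations can be sketched as a tiny state machine; the action names "S"/"LA"/"RA" follow the slides, while the `arc_standard` helper itself is illustrative:

```python
def arc_standard(words, actions):
    """Run a sequence of arc-standard actions over a sentence.
    Returns heads: child index -> parent index (1-indexed words, 0 = ROOT)."""
    stack = [0]                                   # start with ROOT on the stack
    buffer = list(range(1, len(words) + 1))       # word indices left to right
    heads = {}
    for a in actions:
        if a == "S":                              # shift: top of buffer -> stack
            stack.append(buffer.pop(0))
        elif a == "LA":                           # left-arc: pop two, arc right-to-left
            child = stack.pop(-2)
            heads[child] = stack[-1]
        elif a == "RA":                           # right-arc: pop two, arc left-to-right
            child = stack.pop()
            heads[child] = stack[-1]
    return heads
```

Running the 2n = 10 actions for the example sentence recovers the tree from the following slides: I <- ate, some <- spaghetti, bolognese <- spaghetti, spaghetti <- ate, ate <- ROOT.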

Page 25:

Arc-Standard Parsing

I ate some spaghetti bolognese      ROOT

S: top of buffer -> top of stack    LA: pop two, left arc between them    RA: pop two, right arc between them

       [ROOT]              [I ate some spaghetti bolognese]
S  ->  [ROOT I]            [ate some spaghetti bolognese]
S  ->  [ROOT I ate]        [some spaghetti bolognese]
LA ->  [ROOT ate]          [some spaghetti bolognese]      (I is now a child of ate)

‣ Could do the left arc later! But no reason to wait
‣ Can't attach ROOT <- ate yet even though this is a correct dependency!

Page 26:

Arc-Standard Parsing

I ate some spaghetti bolognese      ROOT

S: top of buffer -> top of stack    LA: pop two, left arc between them    RA: pop two, right arc between them

       [ROOT ate]                  [some spaghetti bolognese]      (I is a child of ate)
S  ->  [ROOT ate some]             [spaghetti bolognese]
S  ->  [ROOT ate some spaghetti]   [bolognese]
LA ->  [ROOT ate spaghetti]        [bolognese]      (some is a child of spaghetti)

Page 27:

Arc-Standard Parsing

I ate some spaghetti bolognese      ROOT

S: top of buffer -> top of stack    LA: pop two, left arc between them    RA: pop two, right arc between them

       [ROOT ate spaghetti]             [bolognese]      (I <- ate; some <- spaghetti)
S  ->  [ROOT ate spaghetti bolognese]   []
RA ->  [ROOT ate spaghetti]             []      (bolognese is a child of spaghetti)
RA ->  [ROOT ate]                       []      (spaghetti is a child of ate)
RA ->  [ROOT]                           []      Final state

‣ Stack consists of all words that are still waiting for right children; end with a bunch of right-arc ops

Page 28:

Other Systems

‣ Arc-eager (Nivre, 2004): lets you add right arcs sooner and keeps items on the stack; separate reduce action that clears out the stack

‣ Arc-swift (Qi and Manning, 2017): explicitly choose a parent from what's on the stack

‣ Many ways to decompose these; which one works best depends on the language and features

Page 29:

Building Shift-Reduce Parsers

‣ Multi-way classification problem: shift, left-arc, or right-arc?

[ROOT]      [I ate some spaghetti bolognese]
‣ How do we make the right decision in this case? Only one legal move (shift)

[ROOT ate some spaghetti]      [bolognese]      (I is a child of ate)
‣ How do we make the right decision in this case? (all three actions legal)
‣ Correct action is left-arc

Page 30:

Features for Shift-Reduce Parsing

[ROOT ate some spaghetti]      [bolognese]      (I is a child of ate)

‣ Features to know this should left-arc?

‣ One of the harder feature design tasks!

‣ In this case: the stack tag sequence VBD-DT-NN is pretty informative; it looks like a verb taking a direct object which has a determiner in it

‣ Things to look at: top words/POS of buffer, top words/POS of stack, leftmost and rightmost children of top items on the stack
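A sketch of indicator features for one shift-reduce state; the feature-name strings and the `sr_features` helper are illustrative, not the lecture's exact set:

```python
def sr_features(stack, buffer, words, tags):
    """Indicator features for a shift-reduce state: top words/tags of the
    stack and buffer, plus a stack-tag conjunction (the kind of feature
    that captures sequences like VBD-DT-NN). Names are made up."""
    feats = []
    if stack:
        feats += ["S1w=" + words[stack[-1]], "S1t=" + tags[stack[-1]]]
    if len(stack) >= 2:
        feats.append("S2t=" + tags[stack[-2]])
        feats.append("S2t,S1t=" + tags[stack[-2]] + "," + tags[stack[-1]])
    if buffer:
        feats += ["B1w=" + words[buffer[0]], "B1t=" + tags[buffer[0]]]
    return feats
```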

Page 31:

Training a Greedy Model

[ROOT ate some spaghetti]      [bolognese]      (I is a child of ate)

‣ The algorithm we've developed so far is an oracle: it tells us the correct state transition sequence for each tree

‣ Use our oracle to extract parser states + correct decisions

‣ Train a classifier to predict the right decision using these as training data

‣ Problems:
‣ No lookahead
‣ Training data is extracted assuming everything is correct
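A minimal sketch of such a static oracle for the arc-standard system, assuming a projective gold tree; `static_oracle` and its eager reduce rule are an assumed formulation, not necessarily the lecture's exact one:

```python
def static_oracle(heads):
    """Derive the gold arc-standard action sequence from a gold tree.
    heads maps child -> parent (1-indexed words, 0 = ROOT).
    Sketch only: assumes the gold tree is projective; reduces eagerly once
    the candidate child has collected all of its own dependents."""
    pending = {i: 0 for i in range(len(heads) + 1)}
    for c in heads:                      # dependents each head still needs
        pending[heads[c]] += 1
    stack, buffer, actions = [0], list(range(1, len(heads) + 1)), []
    while buffer or len(stack) > 1:
        if (len(stack) >= 2 and heads.get(stack[-2]) == stack[-1]
                and pending[stack[-2]] == 0):
            pending[stack[-1]] -= 1      # left-arc: stack[-2] <- stack[-1]
            stack.pop(-2)
            actions.append("LA")
        elif (len(stack) >= 2 and heads.get(stack[-1]) == stack[-2]
                and pending[stack[-1]] == 0):
            pending[stack[-2]] -= 1      # right-arc: stack[-2] -> stack[-1]
            stack.pop()
            actions.append("RA")
        else:
            stack.append(buffer.pop(0))  # otherwise shift
            actions.append("S")
    return actions
```

Pairing each intermediate (state, action) from this trace gives the training examples for the greedy classifier.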

Page 32:

Dynamic Oracle

[ROOT ate some spaghetti]      [bolognese]

‣ Need a dynamic oracle to determine what's the optimal thing to do even if mistakes have already been made (so we know how to supervise it)

‣ Extract training data based on the oracle but also on an execution trace of a trained parser      Goldberg and Nivre (2012)

‣ We'll see similar ideas in neural net contexts as well

Page 33:

Speed Tradeoffs

[figure: parsing speed comparison of unoptimized shift-reduce, optimized shift-reduce, graph-based, and neural shift-reduce parsers; Chen and Manning (2014)]

‣ Optimized constituency parsers are ~5 sentences/sec

‣ Using S-R used to mean taking a performance hit compared to graph-based; that's no longer true

Page 34:

Global Decoding

[ROOT ate some spaghetti]      [bolognese]

‣ Try to find the highest-scoring sequence of decisions

‣ Global search problem; requires approximate search
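Beam search is one such approximate search. Below is a hedged sketch over arc-standard action sequences; `score_fn` stands in for a trained per-decision scorer and all function names are illustrative:

```python
def legal_actions(stack, buf):
    """Legal arc-standard actions; in this sketch, ROOT receives its single
    child only once the buffer is empty."""
    acts = []
    if buf:
        acts.append("S")
    if len(stack) >= 3:                       # LA may never make ROOT a child
        acts.append("LA")
    if len(stack) >= 2 and (len(stack) > 2 or not buf):
        acts.append("RA")
    return acts

def apply_action(state, a):
    """States are immutable (stack, buffer, arcs) tuples so beam items
    can share structure safely."""
    stack, buf, arcs = state
    if a == "S":
        return stack + (buf[0],), buf[1:], arcs
    if a == "LA":                             # stack[-2] child of stack[-1]
        return stack[:-2] + (stack[-1],), buf, arcs + ((stack[-2], stack[-1]),)
    return stack[:-1], buf, arcs + ((stack[-1], stack[-2]),)

def beam_decode(n, score_fn, beam_size=4):
    """Beam search over action sequences: expand every beam state with every
    legal action, keep the top-k by total score. score_fn(state, action)
    is an assumed scoring function; this is a sketch, not the lecture's
    exact formulation."""
    beam = [(0.0, ((0,), tuple(range(1, n + 1)), ()))]
    for _ in range(2 * n):                    # 2n actions for n words
        cands = [(total + score_fn(st, a), apply_action(st, a))
                 for total, st in beam
                 for a in legal_actions(st[0], st[1])]
        cands.sort(key=lambda c: -c[0])
        beam = cands[:beam_size]
    return beam[0]                            # best (score, final state)
```

With beam_size=1 this reduces to the greedy parser; larger beams keep alternatives alive, which is what lets the search recover from locally attractive but globally bad decisions.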

Page 35:

Global Decoding

I gave him dinner      ROOT

‣ Correct: Right-arc, Shift, Right-arc, Right-arc

       [ROOT gave him]      [dinner]      (I is a child of gave)
RA ->  [ROOT gave]          [dinner]      (I, him are children of gave)
S  ->  [ROOT gave dinner]   []            (I, him are children of gave)
RA ->  [ROOT gave]          []            (I, him, dinner are children of gave)

Page 36:

Global Decoding: A Cartoon

I gave him dinner      ROOT

From [ROOT gave him] [dinner]      (I is a child of gave):

S  ->  [ROOT gave him dinner]   []
LA ->  [ROOT him]               [dinner]      (gave is a child of him)
‣ Both wrong! Also both probably low scoring!

RA ->  [ROOT gave]              [dinner]      (I, him are children of gave)
‣ RA then S: correct, high-scoring option

Page 37:

Global Decoding: A Cartoon

I gave him dinner      ROOT
[ROOT gave him]      [dinner]

‣ Lookahead can help us avoid getting stuck in bad spots

‣ Global model: maximize the sum of scores over all decisions

‣ Similar to how Viterbi works: we maintain uncertainty over the current state so that if another one looks more optimal going forward, we can use that one

Page 38:

Recap

‣ Eisner's algorithm for graph-based parsing

‣ Arc-standard system for transition-based parsing

‣ Run a classifier and do it greedily for now; we'll see global systems next time