CS395T: Structured Models for NLP
Lecture 9: Trees 3




Greg Durrett

Administrivia

‣ Project 1 due at *5pm* today

‣ Project 2 will be out by tonight. Due October 17

‣ Shift-reduce parser: greedy model, beam search model, extension

Recall: Dependencies

[Figure: dependency parse of "the dog ran to the house" (DT NN VBD TO DT NN), with ROOT]

‣ Dependency syntax: syntactic structure is defined by dependencies

‣ Head (parent, governor) connected to dependent (child, modifier)

‣ Each word has exactly one parent except for the ROOT symbol

‣ Dependencies must form a directed acyclic graph


Recall: Projectivity

‣ Projective <-> no "crossing" arcs

‣ Crossing arcs: [Figure: "the dog ran to the house" (projective) vs. "dogs in houses and cats" (non-projective); credit: Language Log]

‣ Today: algorithms for projective parsing

This Lecture

‣ Graph-based dependency parsing

‣ Transition-based (shift-reduce) dependency parsing

‣ Dynamic programs for exact inference: these look a lot like sequential CRFs

‣ Approximate, greedy inference: fast, but a little bit weird!

Graph-based Dependency Parsing

‣ How did we parse lexicalized trees?

‣ Normal CKY is too slow: the grammar is too large if it includes words

Graph-based Dependency Parsing

‣ Naive algorithm: O(n^5)

[Diagram: combine Y[h] over (i, k) with Z[h'] over (k, j) to form X[h] over (i, j); five indices i, h, k, h', j]

‣ Combine spans like CKY and look at their heads

‣ Five indices to loop over

‣ Features can look at spans and heads

‣ Can be applied to dependency parses as well! Builds projective trees

‣ What do our scores look like? For now, assume features on each (head, child) edge pair with some weights
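A minimal sketch of this edge-factored scoring assumption; the feature names, weights, and helper names below are hypothetical illustrations, not part of the lecture:

```python
# Edge-factored scoring: the score of a dependency tree is the sum of
# per-edge scores, where each (head, modifier) edge fires indicator features.

def edge_features(words, tags, head, mod):
    """Indicator features on a single (head, modifier) edge; -1 means ROOT."""
    h_word = words[head] if head >= 0 else "ROOT"
    h_tag = tags[head] if head >= 0 else "ROOT"
    return [
        "HEAD_TAG=%s&MOD_TAG=%s" % (h_tag, tags[mod]),
        "HEAD_WORD=%s&MOD_WORD=%s" % (h_word, words[mod]),
    ]

def tree_score(words, tags, heads, weights):
    """Sum edge scores over the tree; heads[m] is word m's parent (-1 = ROOT)."""
    total = 0.0
    for mod, head in enumerate(heads):
        for f in edge_features(words, tags, head, mod):
            total += weights.get(f, 0.0)
    return total

words = ["the", "dog", "ran", "to", "the", "house"]
tags = ["DT", "NN", "VBD", "TO", "DT", "NN"]
heads = [1, 2, -1, 2, 5, 3]   # "the" <- "dog", "dog" <- "ran", "ran" <- ROOT, ...
weights = {"HEAD_TAG=VBD&MOD_TAG=NN": 2.0, "HEAD_TAG=NN&MOD_TAG=DT": 1.5}
print(tree_score(words, tags, heads, weights))  # 5.0
```

Because the score decomposes over edges, dynamic programs (or spanning-tree algorithms) can maximize it without enumerating trees.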

Why is this inefficient?


‣ Lots of spurious ambiguity: many ways to derive the right parses

‣ Can split at either point and we can build up the same subtrees (X[h] from Y[h] and Z[h'])

Eisner's Algorithm: O(n^3)


‣ Complete items: all children are attached, head is at the "tall end"

‣ Incomplete items: arc from the "tall" to the "short" end; the word on the short end has a parent but maybe not all of its children

‣ Cubic-time algorithm like CKY

‣ Maintain two charts with dimension [n, n, 2]

Eisner's Algorithm: O(n^3)


‣ Complete item: all children are attached, head is at the "tall end"

‣ Incomplete item: arc from the "tall end" to the "short end", may still expect children

‣ Take two adjacent complete items, add an arc, and build an incomplete item

‣ Take an incomplete item plus a complete item and complete it (the other case is symmetric)

Eisner's Algorithm: O(n^3)


1) Build incomplete span

2) Promote to complete

3) Build incomplete span

Eisner's Algorithm: O(n^3)


4) Promote to complete

Eisner's Algorithm: O(n^3)


‣ We've built the left children and right children of "ran" as complete items

‣ Attaching to ROOT makes an incomplete item with the left children, which then attaches with the right children subsequently to finish the parse

Eisner's Algorithm


‣ Eisner's algorithm doesn't have split point ambiguities like this

‣ Left and right children are built independently; heads are at the edges of spans

‣ Charts are n x n x 2 because we need to track arc direction (left vs. right)
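The complete/incomplete chart recurrences above can be sketched as follows, a max-only toy version with made-up scores; recovering the tree itself would additionally need backpointers, omitted here:

```python
def eisner(score):
    """Eisner's O(n^3) algorithm, max version. score[h][m] is the score of the
    arc h -> m, with position 0 playing the role of ROOT. Returns the score of
    the best projective tree."""
    n = len(score)
    NEG = float("-inf")
    # Charts indexed [i][j][d]: d = 1 means the head is at the left end of the
    # span, d = 0 means the head is at the right end ("tall end" in the slides).
    comp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]  # complete items
    inc = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]   # incomplete items
    for i in range(n):
        comp[i][i][0] = comp[i][i][1] = 0.0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Two adjacent complete items + a new arc -> incomplete item
            best = max(comp[i][k][1] + comp[k + 1][j][0] for k in range(i, j))
            inc[i][j][0] = best + score[j][i]   # left arc  j -> i
            inc[i][j][1] = best + score[i][j]   # right arc i -> j
            # Incomplete item + complete item -> complete item
            comp[i][j][0] = max(comp[i][k][0] + inc[k][j][0] for k in range(i, j))
            comp[i][j][1] = max(inc[i][k][1] + comp[k][j][1]
                                for k in range(i + 1, j + 1))
    return comp[0][n - 1][1]  # ROOT at 0 with all of its right children attached

# Toy scores for ROOT + 2 words: best tree is 0 -> 1 (score 1) plus 1 -> 2 (score 5)
scores = [[0, 1, 2],
          [0, 0, 5],
          [0, 3, 0]]
print(eisner(scores))  # 6.0
```

The two inner `max` calls are the two slide operations: building an incomplete item from two complete ones plus an arc, and promoting to a complete item.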

‣ Eisner: O(n^3), versus O(n^5) for the naive algorithm

MSTParser

‣ View dependency parsing as finding a maximum directed spanning tree: this searches the space of all spanning trees, so we find nonprojective trees too!

‣ Chu-Liu-Edmonds algorithm finds the best MST in O(n^2)

McDonald et al. (2005)

‣ Ironically, the software artifact called MSTParser has an implementation of Eisner's algorithm, which is what most people use

‣ This only computes maxes, but there is an algorithm for summing over all trees as well (the matrix-tree theorem)
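A minimal sketch of that sum via the directed matrix-tree theorem, with a hand-rolled determinant. The multi-root Laplacian construction here is one standard variant and is an assumption of this sketch, not necessarily the exact formulation any particular parser uses:

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting
    (fine for the small matrices used here)."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def sum_over_trees(score):
    """Z = sum over all spanning dependency trees of exp(total arc score),
    via the directed matrix-tree theorem. score[h][m] is the score of the
    arc h -> m, with index 0 = ROOT (multi-root variant)."""
    n = len(score) - 1
    w = [[math.exp(score[h][m]) for m in range(n + 1)] for h in range(n + 1)]
    # Laplacian over words 1..n: each diagonal entry holds a word's total
    # incoming weight (including from ROOT); off-diagonals are negated weights.
    L = [[0.0] * n for _ in range(n)]
    for m in range(1, n + 1):
        L[m - 1][m - 1] = sum(w[h][m] for h in range(n + 1) if h != m)
        for h in range(1, n + 1):
            if h != m:
                L[h - 1][m - 1] = -w[h][m]
    return det(L)

# With all arc scores 0, every tree has weight 1, so Z counts the trees:
# for ROOT + 2 words there are exactly 3 dependency trees.
print(sum_over_trees([[0, 0, 0], [0, 0, 0], [0, 0, 0]]))  # 3.0
```

Unlike Eisner's algorithm, this sums over nonprojective trees too, which matches the spanning-tree view on this slide.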

Building Systems

‣ Can implement Viterbi decoding and marginal computation using Eisner's algorithm or MST to max/sum over projective/nonprojective trees

‣ Same concept as sequential CRFs for NER; can also use margin-based methods. You know how to implement these!

‣ Features are over dependency edges

Features in Graph-Based Parsing

‣ The dynamic program exposes the parent and child indices

‣ McDonald et al. (2005): conjunctions of parent and child words + POS, POS of words in between, POS of surrounding words. ~91 UAS


‣ Lei et al. (2014): ways of learning conjunctions of these

‣ HEAD=TO & MOD=NN
‣ HEAD=TO & MOD-1=the
‣ HEAD=TO & MOD=house
‣ HEAD=TO & MOD=DT
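Feature templates of this kind can be sketched as follows; the template names are illustrative and are not the full McDonald et al. (2005) set:

```python
def mcdonald_features(words, tags, head, mod):
    """McDonald et al. (2005)-style arc features: conjunctions of head and
    modifier words/POS, plus POS tags of the words in between."""
    feats = [
        "HEAD=%s&MOD=%s" % (words[head], words[mod]),   # word-word
        "HEAD=%s&MOD=%s" % (words[head], tags[mod]),    # word-POS
        "HEAD=%s&MOD=%s" % (tags[head], tags[mod]),     # POS-POS
    ]
    if mod > 0:   # context word just before the modifier (MOD-1)
        feats.append("HEAD=%s&MOD-1=%s" % (words[head], words[mod - 1]))
    lo, hi = min(head, mod), max(head, mod)
    for k in range(lo + 1, hi):                         # in-between POS tags
        feats.append("HEAD=%s&BETWEEN=%s&MOD=%s" % (tags[head], tags[k], tags[mod]))
    return feats

words = ["the", "dog", "ran", "to", "the", "house"]
tags = ["DT", "NN", "VBD", "TO", "DT", "NN"]
print(mcdonald_features(words, tags, 3, 5))   # features for the arc "to" -> "house"
```

Each feature gets a learned weight, and an arc's score is the sum of the weights of its active features, exactly the edge-factored setup the dynamic program needs.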

Features in Graph-Based Parsing


‣ Ideally we would use features on more arcs

‣ Grandparents: ran -> to -> house

‣ Siblings: dog <- ran -> to

Higher-Order Parsing

‣ Terry Koo (2010)

‣ Track additional state during parsing so we can look at grandparents and siblings: O(n^4)

‣ Additional indicator features based on this information: ~93 UAS (up from 91 UAS)

‣ Turns out you can just use beam search and forget this crazy dynamic program…

Shift-Reduce Parsing

Shift-Reduce Parsing

‣ Similar to deterministic parsers for compilers

‣ A tree is built from a sequence of incremental decisions moving left to right through the sentence

‣ Shifts consume the buffer; reduces build a tree on the stack

‣ Stack contains the partially-built tree; buffer contains the rest of the sentence

‣ Also called transition-based parsing

Shift-Reduce Parsing

[Figure: dependency parse of "I ate some spaghetti bolognese" with ROOT]

‣ Initial state: Stack: [ROOT]  Buffer: [I ate some spaghetti bolognese]

‣ Shift: top of buffer -> top of stack

‣ Shift 1: Stack: [ROOT I]  Buffer: [ate some spaghetti bolognese]

‣ Shift 2: Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]

Shift-Reduce Parsing


‣ State: Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]

‣ Left-arc (reduce operation): let σ denote the stack

‣ "Pop two elements, add an arc, put them back on the stack"

‣ Resulting state: Stack: [ROOT ate]  Buffer: [some spaghetti bolognese], with I now a child of ate

σ|w-2, w-1 → σ|w-1    (w-2 is now a child of w-1)

Arc-Standard Parsing

‣ Start: stack contains [ROOT], buffer contains [I ate some spaghetti bolognese]

‣ Shift: top of buffer -> top of stack

‣ Left-Arc: σ|w-2, w-1 → σ|w-1    (w-2 is now a child of w-1)

‣ Right-Arc: σ|w-2, w-1 → σ|w-2    (w-1 is now a child of w-2)

‣ Arc-standard system: three operations

‣ End: stack contains [ROOT], buffer is empty []

‣ Must take 2n steps for n words (n shifts, n LA/RA)
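The three operations can be sketched as a tiny state machine; the action names "S"/"LA"/"RA" and the function name are this sketch's own conventions:

```python
def parse(words, actions):
    """Apply a sequence of arc-standard actions and return the resulting tree.
    heads[i] = index of word i's parent (0 = ROOT; words are 1-based)."""
    stack = [0]                              # position 0 is the ROOT symbol
    buffer = list(range(1, len(words) + 1))
    heads = {}
    for a in actions:
        if a == "S":                         # shift: top of buffer -> top of stack
            stack.append(buffer.pop(0))
        elif a == "LA":                      # left-arc: w-2 becomes child of w-1
            child = stack.pop(-2)
            heads[child] = stack[-1]
        elif a == "RA":                      # right-arc: w-1 becomes child of w-2
            child = stack.pop()
            heads[child] = stack[-1]
    assert stack == [0] and not buffer, "2n actions should leave [ROOT] and []"
    return heads

words = ["I", "ate", "some", "spaghetti", "bolognese"]
gold = ["S", "S", "LA", "S", "S", "LA", "S", "RA", "RA", "RA"]
print(parse(words, gold))  # ate heads I; spaghetti heads some, bolognese; ROOT heads ate
```

Running the gold action sequence reproduces the tree from the walkthrough on the next slides: 2n = 10 actions for the 5-word sentence.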

Arc-Standard Parsing

Stack: [ROOT]  Buffer: [I ate some spaghetti bolognese]

S → Stack: [ROOT I]  Buffer: [ate some spaghetti bolognese]

S → Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]

LA → Stack: [ROOT ate] (I attached as child of ate)  Buffer: [some spaghetti bolognese]

(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)

‣ Could do the left arc later! But no reason to wait

‣ Can't attach ROOT <- ate yet even though this is a correct dependency!

Arc-Standard Parsing

Stack: [ROOT ate] (I attached)  Buffer: [some spaghetti bolognese]

S → Stack: [ROOT ate some]  Buffer: [spaghetti bolognese]

S → Stack: [ROOT ate some spaghetti]  Buffer: [bolognese]

LA → Stack: [ROOT ate spaghetti] (I, some attached)  Buffer: [bolognese]

(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)

Arc-Standard Parsing

S → Stack: [ROOT ate spaghetti bolognese] (I, some attached)  Buffer: []

RA → Stack: [ROOT ate spaghetti] (I, some, bolognese attached)  Buffer: []

RA → Stack: [ROOT ate] (full subtree attached)  Buffer: []

RA → Final state: Stack: [ROOT]  Buffer: []

‣ Stack consists of all words that are still waiting for right children; end with a bunch of right-arc ops

(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)

Other Systems

‣ Arc-eager (Nivre, 2004): lets you add right arcs sooner and keeps items on the stack; a separate reduce action clears out the stack

‣ Arc-swift (Qi and Manning, 2017): explicitly choose a parent from what's on the stack

‣ Many ways to decompose these; which one works best depends on the language and features

Building Shift-Reduce Parsers

‣ Multi-way classification problem: shift, left-arc, or right-arc?

Stack: [ROOT]  Buffer: [I ate some spaghetti bolognese]

‣ How do we make the right decision in this case? Only one legal move (shift)

Stack: [ROOT ate some spaghetti] (I attached)  Buffer: [bolognese]

‣ How do we make the right decision in this case? (all three actions are legal)

‣ Correct action is left-arc

Features for Shift-Reduce Parsing

Stack: [ROOT ate some spaghetti] (I attached)  Buffer: [bolognese]

‣ What features tell us this should be a left-arc?

‣ One of the harder feature design tasks!

‣ In this case: the stack tag sequence VBD-DT-NN is pretty informative; it looks like a verb taking a direct object which has a determiner in it

‣ Things to look at: top words/POS of the buffer, top words/POS of the stack, leftmost and rightmost children of the top items on the stack
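Those cues can be sketched as a feature function on the parser state; the template names (S1, B1, etc.) are hypothetical, not from the lecture:

```python
def state_features(stack, buffer, words, tags):
    """Indicator features on a shift-reduce parser state. stack/buffer hold
    word indices; index 0 is the ROOT symbol."""
    def s_tag(i):   # POS of the i-th item from the top of the stack, or NONE
        return tags[stack[-i]] if len(stack) >= i else "NONE"
    b1 = tags[buffer[0]] if buffer else "NONE"
    return [
        "S1=%s&B1=%s" % (s_tag(1), b1),
        "S2=%s&S1=%s" % (s_tag(2), s_tag(1)),
        "S3=%s&S2=%s&S1=%s" % (s_tag(3), s_tag(2), s_tag(1)),
        "S1W=%s&B1=%s" % (words[stack[-1]] if stack else "NONE", b1),
    ]

# State from the slide: Stack [ROOT ate some spaghetti], Buffer [bolognese]
words = ["ROOT", "I", "ate", "some", "spaghetti", "bolognese"]
tags = ["ROOT", "PRP", "VBD", "DT", "NN", "NN"]
print(state_features([0, 2, 3, 4], [5], words, tags))
```

On this state, the three-tag stack template fires as S3=VBD & S2=DT & S1=NN, the informative verb/determiner/noun pattern the slide describes.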

Training a Greedy Model

‣ The algorithm we've developed so far is an oracle: it tells us the correct state transition sequence for each tree

‣ Use our oracle to extract parser states + correct decisions

Stack: [ROOT ate some spaghetti] (I attached)  Buffer: [bolognese]

‣ Train a classifier to predict the right decision using these as training data

‣ Problems: no lookahead, and training data is extracted assuming everything is correct
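A static oracle of this kind can be sketched as follows, assuming a projective gold tree; the rule (an assumption of this sketch, standard for arc-standard) is that a word may only be reduced once all of its own children are attached:

```python
def oracle(heads):
    """Static oracle: derive the arc-standard action sequence producing a
    given projective gold tree. heads maps each word (1-based) to its
    parent (0 = ROOT)."""
    n = len(heads)
    n_kids = [0] * (n + 1)            # children of each word not yet attached
    for m in heads:
        n_kids[heads[m]] += 1
    stack, buf, actions = [0], list(range(1, n + 1)), []
    while buf or len(stack) > 1:
        if (len(stack) >= 2 and stack[-2] != 0
                and heads[stack[-2]] == stack[-1] and n_kids[stack[-2]] == 0):
            actions.append("LA")      # stack[-2] becomes a child of stack[-1]
            n_kids[stack[-1]] -= 1
            stack.pop(-2)
        elif (len(stack) >= 2 and heads[stack[-1]] == stack[-2]
                and n_kids[stack[-1]] == 0):
            actions.append("RA")      # stack[-1] becomes a child of stack[-2]
            n_kids[stack[-2]] -= 1
            stack.pop()
        else:
            actions.append("S")       # otherwise shift the next word
            stack.append(buf.pop(0))
    return actions

# Gold tree for "I ate some spaghetti bolognese"
gold_heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 4}
print(oracle(gold_heads))  # ['S', 'S', 'LA', 'S', 'S', 'LA', 'S', 'RA', 'RA', 'RA']
```

Pairing each intermediate (stack, buffer) state with the action the oracle chose there yields exactly the classifier training examples this slide describes.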

Dynamic Oracle

‣ Need a dynamic oracle to determine what's the optimal thing to do even if mistakes have already been made (so we know how to supervise it)

Stack: [ROOT ate some spaghetti] (I attached)  Buffer: [bolognese]

‣ Extract training data based on the oracle, but also on an execution trace of a trained parser

Goldberg and Nivre (2012)

‣ We'll see similar ideas in neural net contexts as well

Speed Tradeoffs

[Chart: parsing speed (sentences/sec) for unoptimized S-R, optimized S-R, graph-based, and neural S-R parsers]

Chen and Manning (2014)

‣ Optimized constituency parsers are ~5 sentences/sec

‣ Using S-R used to mean taking a performance hit compared to graph-based; that's no longer true

Global Decoding

Stack: [ROOT ate some spaghetti] (I attached)  Buffer: [bolognese]

‣ Try to find the highest-scoring sequence of decisions

‣ Global search problem; requires approximate search

Global Decoding

Stack: [ROOT gave him] (I attached)  Buffer: [dinner]    (sentence: "I gave him dinner")

‣ Correct: Right-arc, Shift, Right-arc, Right-arc

RA → Stack: [ROOT gave] (I, him attached)  Buffer: [dinner]

S → Stack: [ROOT gave dinner] (I, him attached)  Buffer: []

RA → Stack: [ROOT gave] (I, him, dinner attached)  Buffer: []

Global Decoding: A Cartoon

From Stack: [ROOT gave him] (I attached)  Buffer: [dinner]:

S → Stack: [ROOT gave him dinner] (I attached)  Buffer: []

LA → Stack: [ROOT him] (gave attached)  Buffer: [dinner]

‣ Both wrong! Also both probably low scoring!

RA → Stack: [ROOT gave] (I, him attached)  Buffer: [dinner], then S

‣ Correct, high-scoring option

Global Decoding: A Cartoon

Stack: [ROOT gave him] (I attached)  Buffer: [dinner]    ("I gave him dinner")

‣ Lookahead can help us avoid getting stuck in bad spots

‣ Global model: maximize the sum of scores over all decisions

‣ Similar to how Viterbi works: we maintain uncertainty over the current state so that if another one looks more optimal going forward, we can use that one
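Beam search over action sequences is one way to sketch this maintained uncertainty; the scoring function below is a toy stand-in for a trained model, and legality checks are simplified to stack-depth bookkeeping (both assumptions of this sketch):

```python
def beam_search(n, score_fn, beam_size=4):
    """Global decoding sketch: keep the top-k partial action sequences by
    total score instead of committing greedily. score_fn(history, action)
    returns the model score of taking that action next."""
    beam = [([], 0.0)]
    for _ in range(2 * n):                            # arc-standard takes 2n steps
        candidates = []
        for actions, total in beam:
            shifts = actions.count("S")
            depth = 1 + 2 * shifts - len(actions)     # stack size: ROOT + shifted - reduced
            for a in ("S", "LA", "RA"):
                if a == "S" and shifts == n:
                    continue                          # buffer exhausted
                if a == "LA" and depth < 3:
                    continue                          # would make ROOT a child
                if a == "RA" and depth < 2:
                    continue                          # nothing to reduce
                candidates.append((actions + [a], total + score_fn(actions, a)))
        beam = sorted(candidates, key=lambda c: -c[1])[:beam_size]
    return beam[0]

gold = ["S", "S", "LA", "S", "S", "LA", "S", "RA", "RA", "RA"]

def gold_match_score(hist, a):
    """Toy stand-in for a trained model: +1 for matching the gold action."""
    return 1.0 if len(hist) < len(gold) and gold[len(hist)] == a else 0.0

best_actions, best_score = beam_search(5, gold_match_score)
print(best_actions, best_score)
```

Because several competing histories survive each step, a locally attractive but globally bad action (like the cartoon's premature shift) can be overtaken later, which is the Viterbi-like behavior the slide describes.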

Recap

‣ Eisner's algorithm for graph-based parsing

‣ Arc-standard system for transition-based parsing

‣ Run a classifier and do it greedily for now; we'll see global systems next time