Algorithms for NLP (FA17 11-711), Lecture 15 · 2017-10-19
Parsing VI
Taylor Berg-Kirkpatrick – CMU
Slides: Dan Klein – UC Berkeley
P1 Shout-outs

§ Saksham Singhal -- implemented pseudo-tries. Used implicit caching (stored the most frequent n-grams on top of hash tables) and explicit caching.
§ Soumya Wadhwa, Tejas Nama -- approximated by ignoring all trigrams with count 1. That dropped the BLEU score by less than 0.1 but freed half the memory!
§ Craig Stewart -- rehash annealing idea. Made the resizing factor and load factor change with every rehash to converge to a 0.9 load factor, minimizing wasted space.
§ Griffin Thomas Adams -- built a "waterfall" tiered cache system.
§ Dean Alderucci -- built a class to pack data types of arbitrary size into an array of longs, and a custom implementation of log that ran faster.
§ Robin Jonathan Algayres -- context trie!
§ Raghuram Mandyam Annasamy -- used a database-inspired sharding technique on keys.
§ Xianyang Chen -- compressed the hash table and did smarter binary search by indexing chunks with the same last word.
§ Aldrian Obaja -- implemented a NestedMap, achieving 792 MB of memory.
§ Other things many people did -- LRU caching, packing multiple values (counts and context fertilities) into a single long, binary search instead of a hash table.
Grammar Projections

[Figure: a coarse-grammar tree (DT, JJ, NN under NP, with intermediate @NP nodes) aligned with its fine counterpart (DT^NP, JJ^NP, NN^NP under NP^VP, with @NP^VP[...] nodes).]

Coarse grammar rule: NP → DT @NP
Fine grammar rule: NP^VP → DT^NP @NP^VP[DT]
Note: X-bar grammars are projections with rules like XP → Y @X, or XP → @X Y, or @X → X
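Such X-bar-style rules can be produced mechanically from an n-ary treebank rule. The sketch below is illustrative (the function name and rule representation are not from the lecture): it emits XP → C1 @XP, then @XP → Ci @XP for the middle children, and a final unary @X → X.

```python
def xbar_binarize(parent, children):
    """Binarize parent -> children into X-bar-style rules using an
    intermediate @parent symbol."""
    if len(children) == 1:
        return [(parent, (children[0],))]
    at = "@" + parent
    rules = [(parent, (children[0], at))]       # XP -> C1 @XP
    for child in children[1:-1]:
        rules.append((at, (child, at)))          # @XP -> Ci @XP
    rules.append((at, (children[-1],)))          # final unary @X -> X
    return rules

# NP -> DT JJ NN becomes: NP -> DT @NP, @NP -> JJ @NP, @NP -> NN
binary_rules = xbar_binarize("NP", ["DT", "JJ", "NN"])
```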
Efficient Parsing for Structural Annotation
Coarse-to-Fine Pruning

[Figure: coarse chart (… QP NP VP …) above the corresponding fine chart.]

E.g. consider the span 5 to 12: prune any coarse item X whose posterior is small,

P(X | i, j, S) < threshold
Coarse-to-Fine Pruning

For each coarse chart item X[i, j], compute the posterior probability

P(X | i, j, S) = α(X, i, j) · β(X, i, j) / α(root, 0, n)

and prune the item whenever this posterior falls below the threshold:

α(X, i, j) · β(X, i, j) / α(root, 0, n) < threshold

E.g. consider the span 5 to 12: only labels that survive in the coarse chart are built in the fine chart.
Computing Marginals

Inside scores, computed bottom-up:

α(X, i, j) = Σ_{X→YZ} Σ_{k∈(i,j)} P(X → Y Z) · α(Y, i, k) · α(Z, k, j)
Computing Marginals

Outside scores, computed top-down:

β(X, i, j) = Σ_{Y→ZX} Σ_{k∈[0,i)} P(Y → Z X) · β(Y, k, j) · α(Z, k, i)
           + Σ_{Y→XZ} Σ_{k∈(j,n]} P(Y → X Z) · β(Y, i, k) · α(Z, j, k)
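These two recursions can be run as a small chart computation. The following sketch (the toy CNF grammar, lexicon, and sentence are illustrative, not from the lecture) computes inside scores α bottom-up, outside scores β top-down, and the span posterior α·β / α(root, 0, n) used for coarse-to-fine pruning. Spans are half-open [i, j).

```python
from collections import defaultdict

binary = {  # X -> Y Z : probability (toy grammar)
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 1.0,
    ("VP", "VB", "NP"): 1.0,
}
lexicon = {  # (tag, word) : probability (toy lexicon)
    ("DT", "the"): 1.0, ("NN", "dog"): 0.5, ("NN", "cat"): 0.5,
    ("VB", "saw"): 1.0,
}

def inside_outside(words):
    n = len(words)
    alpha = defaultdict(float)  # alpha[X, i, j]: inside score
    beta = defaultdict(float)   # beta[X, i, j]: outside score
    for i, w in enumerate(words):
        for (tag, word), p in lexicon.items():
            if word == w:
                alpha[tag, i, i + 1] += p
    for width in range(2, n + 1):               # inside pass, small to large
        for i in range(n - width + 1):
            j = i + width
            for (X, Y, Z), p in binary.items():
                for k in range(i + 1, j):
                    alpha[X, i, j] += p * alpha[Y, i, k] * alpha[Z, k, j]
    beta["S", 0, n] = 1.0
    for width in range(n, 0, -1):               # outside pass, large to small
        for i in range(n - width + 1):
            j = i + width
            for (X, Y, Z), p in binary.items():
                for k in range(j + 1, n + 1):   # current span is the left child Y
                    beta[Y, i, j] += p * beta[X, i, k] * alpha[Z, j, k]
                for k in range(0, i):           # current span is the right child Z
                    beta[Z, i, j] += p * beta[X, k, j] * alpha[Y, k, i]
    return alpha, beta

words = "the dog saw the cat".split()
alpha, beta = inside_outside(words)
Z = alpha["S", 0, len(words)]                   # alpha(root, 0, n)
post = alpha["NP", 3, 5] * beta["NP", 3, 5] / Z # posterior of NP over [3, 5)
```

An item like NP over "the cat" would be pruned only if `post` fell below the chosen threshold; here it is forced by the toy grammar.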
Efficient Parsing for Lexical Grammars
Lexicalized Trees

§ Add "head words" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
  § NP:
    § Take leftmost NP
    § Take rightmost N*
    § Take rightmost JJ
    § Take right child
  § VP:
    § Take leftmost VB*
    § Take leftmost VP
    § Take left child
Lexicalized PCFGs?

§ Problem: we now have to estimate probabilities like the fully lexicalized rule probabilities shown on the slide
§ Never going to get these atomically off of a treebank
§ Solution: break up the derivation into smaller steps
Lexical Derivation Steps

§ A derivation of a local tree [Collins 99]:
  1. Choose a head tag and word
  2. Choose a complement bag
  3. Generate children (incl. adjuncts)
  4. Recursively derive children
Lexicalized CKY

[Figure: item X[h] built from Y[h] and Z[h'] over positions i … h … k … h' … j.]

bestScore(X, i, j, h)
  if (j == i + 1)
    return tagScore(X, s[i])
  else
    return max( max_{k, h', X->YZ} score(X[h] -> Y[h] Z[h']) * bestScore(Y, i, k, h) * bestScore(Z, k, j, h'),
                max_{k, h', X->YZ} score(X[h] -> Y[h'] Z[h]) * bestScore(Y, i, k, h') * bestScore(Z, k, j, h) )

Example chart items: (VP -> VBD •)[saw], NP[her], (VP -> VBD...NP •)[saw]
Quartic Parsing

§ Turns out, you can do (a little) better [Eisner 99]
§ Gives an O(n⁴) algorithm
§ Still prohibitive in practice if not pruned

[Figure: the O(n⁵) item X[h] → Y[h] Z[h'] over (i … h … k … h' … j) is replaced by an intermediate item X[h] → Y[h] Z over (i … h … k … j) that drops the dependent head position h'.]
Pruning with Beams

§ The Collins parser prunes with per-cell beams [Collins 99]
  § Essentially, run the O(n⁵) CKY
  § Remember only a few hypotheses for each span <i, j>
  § If we keep K hypotheses at each span, then we do at most O(nK²) work per span (why?)
§ Keeps things more or less cubic (and in practice is more like linear!)
§ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)

[Figure: item X[h] built from Y[h] and Z[h'] over positions i … h … k … h' … j.]
Pruning with a PCFG

§ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
  § First, parse with the base grammar
  § For each X:[i,j], calculate P(X | i, j, s)
    § This isn't trivial, and there are clever speedups
  § Second, do the full O(n⁵) CKY
    § Skip any X:[i,j] which had low (say, < 0.0001) posterior
  § Avoids almost all work in the second phase!
§ Charniak et al 06: can use more passes
§ Petrov et al 07: can use many more passes
Results

§ Some results:
  § Collins 99 – 88.6 F1 (generative lexical)
  § Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative lexical / reranked)
  § Petrov et al 06 – 90.7 F1 (generative unlexicalized)
  § McClosky et al 06 – 92.1 F1 (gen + rerank + self-train)
Latent Variable PCFGs

The Game of Designing a Grammar

§ Annotation refines base treebank symbols to improve statistical fit of the grammar
  § Parent annotation [Johnson '98]
  § Head lexicalization [Collins '99, Charniak '00]
  § Automatic clustering?
Latent Variable Grammars

[Figure: a parse tree and sentence together with grammar parameters over refined symbols; the model sums over the derivations (subcategory assignments) consistent with the observed tree.]
Learning Latent Annotations

EM algorithm:

§ Brackets are known
§ Base categories are known
§ Only induce subcategories

[Figure: a parse tree with latent node labels X1 … X7 over "He was right."]

Just like Forward-Backward for HMMs.
Refinement of the DT tag

[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4.]

Hierarchical refinement
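One split step can be sketched as follows (a hypothetical illustration; the function name, representation, and noise level are not from the lecture): the tag's lexical distribution is copied into two subcategories with a small random perturbation to break symmetry before EM re-estimates the refined parameters.

```python
import random

def split_tag(lexicon, tag, noise=0.01, seed=0):
    """lexicon: {(tag, word): prob}. Split `tag` into tag-1 / tag-2 with
    slightly perturbed, renormalized emission probabilities."""
    rng = random.Random(seed)
    out = {k: p for k, p in lexicon.items() if k[0] != tag}
    for sub in (tag + "-1", tag + "-2"):
        row = {(sub, w): p * (1 + rng.uniform(-noise, noise))
               for (t, w), p in lexicon.items() if t == tag}
        total = sum(row.values())
        out.update({k: p / total for k, p in row.items()})  # renormalize
    return out

lex = {("DT", "the"): 0.7, ("DT", "a"): 0.3, ("NN", "dog"): 1.0}
split = split_tag(lex, "DT")
# `split` now has DT-1 and DT-2 rows, each summing to 1; NN is untouched.
```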
Hierarchical Estimation Results

[Figure: parsing accuracy (F1) vs. total number of grammar symbols (100–1700), F1 roughly in the 74–90 range.]

Model | F1
Flat Training | 87.3
Hierarchical Training | 88.4
Refinement of the , tag

§ Splitting all categories equally is wasteful

Adaptive Splitting

§ Want to split complex categories more
§ Idea: split everything, roll back the splits which were least useful
Adaptive Splitting Results

Model | F1
Previous | 88.4
With 50% Merging | 89.5
Number of Phrasal Subcategories

[Figure: bar chart (0–40) of the number of learned subcategories per phrasal category: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST.]
Number of Lexical Subcategories

[Figure: bar chart (0–70) of the number of learned subcategories per POS tag: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, -LRB-, ., EX, WP$, WDT, -RRB-, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #.]
Learned Splits

§ Proper nouns (NNP):

NNP-14 | Oct. Nov. Sept.
NNP-12 | John Robert James
NNP-2 | J. E. L.
NNP-1 | Bush Noriega Peters
NNP-15 | New San Wall
NNP-3 | York Francisco Street

§ Personal pronouns (PRP):

PRP-0 | It He I
PRP-1 | it he they
PRP-2 | it them him

§ Relative adverbs (RBR):

RBR-0 | further lower higher
RBR-1 | more less More
RBR-2 | earlier Earlier later

§ Cardinal numbers (CD):

CD-7 | one two Three
CD-4 | 1989 1990 1988
CD-11 | million billion trillion
CD-0 | 1 50 100
CD-3 | 1 30 31
CD-9 | 78 58 34
Final Results (Accuracy)

Language | Model | ≤ 40 words F1 | all F1
ENG | Charniak & Johnson '05 (generative) | 90.1 | 89.6
ENG | Split / Merge | 90.6 | 90.1
GER | Dubey '05 | 76.3 | –
GER | Split / Merge | 80.8 | 80.1
CHN | Chiang et al. '02 | 80.0 | 76.6
CHN | Split / Merge | 86.3 | 83.4

Still higher numbers from reranking / self-training methods.
Efficient Parsing for Hierarchical Grammars

Coarse-to-Fine Inference

§ Example: PP attachment

Hierarchical Pruning

coarse: … QP NP VP …
split in two: … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four: … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … (and so on)
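Each refined symbol projects back to a coarser one at every level, so a pruning decision made on a coarse chart rules out all of its refinements. A minimal sketch of such a projection chain, assuming repeated binary splits and 0-based subcategory indices (the naming scheme is illustrative):

```python
def coarse_chain(base, idx, depth):
    """Ancestors of subcategory `base-idx` defined at split depth `depth`:
    at each coarser level the subcategory index halves."""
    chain, k = [], idx
    for _ in range(depth):
        chain.append(f"{base}-{k}")
        k //= 2
    chain.append(base)  # the unsplit coarse symbol
    return chain

# At depth 3 (eight subcategories, 0-7), VP-5 projects through VP-2 and VP-1
# back to VP, so pruning VP-1 at the "split in two" level removes VP-5 too.
chain = coarse_chain("VP", 5, 3)
```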
Bracket Posteriors

[Figure: bracket posterior charts at successive refinement passes; parsing time drops from 1621 min to 111 min to 35 min to 15 min (with no search error).]
Other Syntactic Models

Dependency Parsing

§ Lexicalized parsers can be seen as producing dependency trees
§ Each local binary tree corresponds to an attachment in the dependency graph

[Figure: dependency tree for "the lawyer questioned the witness": questioned → lawyer, witness; lawyer → the; witness → the.]
Dependency Parsing

§ Pure dependency parsing is only cubic [Eisner 99]
§ Some work on non-projective dependencies
  § Common in, e.g., Czech parsing
  § Can do with MST algorithms [McDonald and Pereira 05]

[Figure: the lexicalized constituency item X[h] → Y[h] Z[h'] over (i … h … k … h' … j) corresponds to the dependency attachment h → h' over (h … k … h').]
Shift-Reduce Parsers

§ Another way to derive a tree [derivation figure omitted]
§ Parsing:
  § No useful dynamic programming search
  § Can still use beam search [Ratnaparkhi 97]
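A minimal shift-reduce sketch, on a toy tag sequence with a hand-written rule table (everything here is illustrative): the parser shifts tags onto a stack and greedily reduces the top two items when a rule matches. Real systems instead score shift/reduce actions with a classifier and keep a beam of partial derivations, as in [Ratnaparkhi 97].

```python
rules = {("DT", "NN"): "NP", ("VB", "NP"): "VP", ("NP", "VP"): "S"}

def shift_reduce(tags):
    stack, buffer = [], list(tags)
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and (stack[-2], stack[-1]) in rules:
            stack[-2:] = [rules[(stack[-2], stack[-1])]]  # REDUCE
        elif buffer:
            stack.append(buffer.pop(0))                   # SHIFT
        else:
            break  # stuck: no action applies
    return stack

# "the dog saw the cat" tagged DT NN VB DT NN reduces to a single S.
result = shift_reduce(["DT", "NN", "VB", "DT", "NN"])
```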
Tree Insertion Grammars

§ Rewrite large (possibly lexicalized) subtrees in a single step
§ Formally, a tree-insertion grammar
§ Derivational ambiguity whether subtrees were generated atomically or compositionally
§ Most probable parse is NP-complete

TIG: Insertion
Tree-Adjoining Grammars

§ Start with local trees
§ Can insert structure with adjunction operators
§ Mildly context-sensitive
§ Models long-distance dependencies naturally
§ … as well as other weird stuff that CFGs don't capture well (e.g. cross-serial dependencies)

TAG: Long Distance
CCG Parsing

§ Combinatory Categorial Grammar
  § Fully (mono-)lexicalized grammar
  § Categories encode argument sequences
  § Very closely related to the lambda calculus (more later)
  § Can have spurious ambiguities (why?)
Empty Elements

§ In the PTB, three kinds of empty elements:
  § Null items (usually complementizers)
  § Dislocation (WH-traces, topicalization, relative clause and heavy NP extraposition)
  § Control (raising, passives, control, shared argumentation)
§ Need to reconstruct these (and resolve any indexation)
Example: English

Example: German

Types of Empties
A Pattern-Matching Approach [Johnson 02]

Pattern-Matching Details

§ Something like transformation-based learning
§ Extract patterns
  § Details: transitive verb marking, auxiliaries
  § Details: legal subtrees
§ Rank patterns
  § Pruning ranking: by correct / match rate
  § Application priority: by depth
§ Pre-order traversal
§ Greedy match

Top Patterns Extracted

Results
Semantic Roles

Semantic Role Labeling (SRL)

§ Characterize clauses as relations with roles:
§ Says more than which NP is the subject (but not much more):
  § Relations like subject are syntactic; relations like agent or message are semantic
§ Typical pipeline:
  § Parse, then label roles
  § Almost all errors locked in by the parser
  § Really, SRL is quite a lot easier than parsing

SRL Example

PropBank / FrameNet

§ FrameNet: roles shared between verbs
§ PropBank: each verb has its own roles
§ PropBank more used, because it's layered over the treebank (and so has greater coverage, plus parses)
§ Note: some linguistic theories postulate fewer roles than FrameNet (e.g. 5-20 total: agent, patient, instrument, etc.)
PropBank Example

Shared Arguments
Path Features
Results

§ Features:
  § Path from target to filler
  § Filler's syntactic type, head word, case
  § Target's identity
  § Sentence voice, etc.
  § Lots of other second-order features
§ Gold vs. parsed source trees
  § SRL is fairly easy on gold trees
  § Harder on automatic parses
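The path feature above can be sketched as the chain of categories from the target predicate up to the lowest common ancestor and down to the filler. This is a hypothetical illustration (the tuple tree encoding, the `^`/`!` separators, and the unique-label assumption are mine, not from the lecture):

```python
def path_to(tree, target):
    """Root-to-node label sequence in a (label, children) tuple tree,
    or None if `target` is absent. Labels are assumed unique."""
    label, children = tree
    if label == target:
        return [label]
    for child in children:
        sub = path_to(child, target)
        if sub:
            return [label] + sub
    return None

def path_feature(tree, pred, arg):
    up, down = path_to(tree, pred), path_to(tree, arg)
    i = 0
    while i < min(len(up), len(down)) and up[i] == down[i]:
        i += 1  # shared prefix ends at the lowest common ancestor
    return "^".join(reversed(up[i - 1:])) + "!" + "!".join(down[i:])

tree = ("S", (("NP-subj", ()), ("VP", (("VB", ()), ("NP-arg", ())))))
feat = path_feature(tree, "VB", "NP-arg")  # "VB^VP!NP-arg"
```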
Parse Reranking

§ Assume the number of parses is very small
§ We can represent each parse T as a feature vector φ(T)
  § Typically, all local rules are features
  § Also non-local features, like how right-branching the overall tree is
  § [Charniak and Johnson 05] gives a rich set of features
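A linear reranker over a small candidate list can be sketched in a few lines (feature names and weights below are illustrative): each candidate T carries a sparse feature vector φ(T), and the reranker returns the argmax of w · φ(T).

```python
def rerank(candidates, w):
    """candidates: list of (parse, phi) with phi a {feature: value} dict;
    w: {feature: weight}. Return the highest-scoring parse."""
    def score(phi):
        return sum(w.get(f, 0.0) * v for f, v in phi.items())
    return max(candidates, key=lambda c: score(c[1]))[0]

candidates = [
    ("tree-A", {"log_p": -10.0, "right_branching": 3.0}),
    ("tree-B", {"log_p": -9.5, "right_branching": 1.0}),
]
w = {"log_p": 1.0, "right_branching": 0.5}
best = rerank(candidates, w)  # tree-A: -8.5 beats tree-B: -9.0
```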
K-Best Parsing [Huang and Chiang 05; Pauls, Klein, Quirk 10]