Exponential Decay Pruning for Bottom-Up Beam-Search Parsing
Nathan Bodenstab, Brian Roark, Aaron Dunlop, and Keith Hall
April 2010
Talk Outline
• Intro to Syntactic Parsing
  – Why Parse?
• Parsing Algorithms
  – CYK
  – Best-First
  – Beam-Search
• Exponential Decay Pruning
• Results
Intro to Syntactic Parsing
• Hierarchically cluster and label syntactic word groups (constituents)
• Provides structure and meaning
Intro to Syntactic Parsing
• Why Parse?
  – Machine Translation
    • Synchronous Grammars
  – Language Understanding
    • Semantic Role Labeling
    • Word Sense Disambiguation
    • Question-Answering
    • Document Summarization
  – Language Modeling
    • Long-distance dependencies
  – Because it’s fun
Intro to Syntactic Parsing
• What you (usually) need to parse
  – Supervised data: a treebank of sentences with annotated parse structure
    • WSJ treebank: 50k sentences
  – A binarized Probabilistic Context-Free Grammar induced from the treebank
  – A parsing algorithm
• Example grammar rules (stored as sketched below):
  – S → NP VP (prob=0.2)
  – NP → NP NN (prob=0.1)
  – NP → JJ NN (prob=0.06)
  – Binarize VP → PP VB NN into:
    • VP → PP @VP (prob=0.2)
    • @VP → VB NN (prob=0.5)
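A minimal storage sketch (not from the talk): binarized rules like these can be kept as a map from right-hand-side pairs to (parent, log-probability) entries, which is exactly the lookup the bottom-up algorithms later in the talk need. All names are illustrative:

    import math

    # Illustrative fragment of a binarized PCFG, using the rules above.
    # @VP is the intermediate symbol introduced by binarizing VP -> PP VB NN.
    binary_rules = {
        ("NP", "VP"): [("S", math.log(0.2))],
        ("NP", "NN"): [("NP", math.log(0.1))],
        ("JJ", "NN"): [("NP", math.log(0.06))],
        ("PP", "@VP"): [("VP", math.log(0.2))],
        ("VB", "NN"): [("@VP", math.log(0.5))],
    }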
Parsing Accuracy
Grammar                       Non-terminals  Grammar Size  Sec/Sent  F-Score
Baseline                      2,500          64,000        0.1       74%
Parent Annotation (Johnson)   6,000          75,000        1.0       78%
Manual Refinement (Klein)     15,000                                 86%
Latent Variable (Petrov)      1,100          4,000,000     100.0     89%
Lexical (Collins, Charniak)   Lots           Implicit                89%
• Accuracy improvements from grammar refinement
  – Split original non-terminal categories (Subject-NP vs. Object-NP)
  – Accuracy at the cost of speed
    • Solution space becomes impractical to exhaustively search
Berkeley Grammar & Parser
• Petrov et al. automatically split non-terminals using latent variables
• Example grammar rules:
  – S_3 → NP_12 VP_6 (prob=0.2)
  – NP_12 → NP_9 NN_7 (prob=0.1)
  – NN_7 → house (prob=0.06)
• Berkeley Coarse-to-Fine parser uses six latent variable grammars
  – Parse input sentence once with each grammar
  – Posterior probabilities from pass n used to prune pass n+1
  – Must know the mapping between non-terminals from different grammars (see the sketch below)
    • Grammar(2) { NP_1, NP_6 } → Grammar(3) { NP_2, NP_9, NP_14 }
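As a purely illustrative sketch (the actual mapping ships with the Berkeley grammars), the coarse-to-fine constraint can be read as: a fine-grammar non-terminal is a candidate only if one of its coarse projections survived pruning in the previous pass:

    # Illustrative split hierarchy: coarse symbol -> refinements in the next pass.
    coarse_to_fine = {
        "NP_1": ["NP_2", "NP_9"],
        "NP_6": ["NP_14"],
    }

    def allowed_fine_symbols(surviving_coarse):
        """Fine-pass candidates are the refinements of coarse symbols whose
        posterior probability survived pruning in the previous pass."""
        return {fine for c in surviving_coarse
                     for fine in coarse_to_fine.get(c, [])}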
Research Goals
• Our Research Goals– Find good solutions very quickly in this LARGE grammar space (not ML)– Algorithms should be grammar agnostic– Consider practical implications (speed, memory)
• This talk: Exponential Decay Pruning– Beam-Search parsing for efficient search– Searches the final grammar space directly– Balance overhead of targeted exploration (best-first) vs. memory and
cache benefits of local exploration (CYK)
Parsing Algorithms: CYK
• Exhaustive population of all parse trees permitted by the grammar
• The dynamic programming algorithm gives the Maximum Likelihood solution
Parsing Algorithms: CYK
• Fill in cells for SPAN=1,2,3,4,…
Grammar:
  S → NP VP (p=0.7)
  NP → NP NP (p=0.2)
  NP → NP VP (p=0.1)
  NN → court (p=0.4)
  VB → court (p=0.1)
  …
• N iterations through the grammar at each chart cell to consider all possible midpoints
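A minimal CYK sketch in Python, assuming a binary_rules map as above and a lexical_rules map from words to (POS, log-prob) pairs; this illustrates the exhaustive loop structure, not the authors' implementation:

    import math
    from collections import defaultdict

    def cyk(words, lexical_rules, binary_rules):
        n = len(words)
        # chart[(start, end)] maps a non-terminal to its best inside log-prob.
        chart = defaultdict(dict)
        for i, w in enumerate(words):                   # span = 1: POS entries
            for pos, logp in lexical_rules.get(w, []):
                chart[(i, i + 1)][pos] = logp
        for span in range(2, n + 1):                    # fill spans 2, 3, ..., n
            for start in range(n - span + 1):
                end = start + span
                cell = chart[(start, end)]
                for mid in range(start + 1, end):       # all possible midpoints
                    for left, lp in chart[(start, mid)].items():
                        for right, rp in chart[(mid, end)].items():
                            for parent, rule_lp in binary_rules.get((left, right), []):
                                score = lp + rp + rule_lp
                                if score > cell.get(parent, -math.inf):
                                    cell[parent] = score  # keep the max (Viterbi) score
        return chart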
Parsing Algorithms: Best-First
Grammar:
  S → NP VP (p=0.7)
  VB → court (p=0.1)
  …
Frontier PQ:
  [try][shooting, defendant]  VP → VB NP   fom=28.1
  [try, shooting][defendant]  VP → VB NP   fom=14.7
  [Juvenile][court]           NP → ADJ NN  fom=13
• Frontier is a Priority Queue of all potentially buildable entries
• Add best entry from Frontier; expand Frontier with all possible chart + grammar extensions
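A hedged sketch of one agenda step, assuming a user-supplied fom() scoring function; Python's heapq is a min-heap, so FOM scores are negated. Illustrative only, not the authors' code:

    import heapq

    def best_first_step(frontier, chart, binary_rules, fom):
        """Pop the best frontier entry, add it to the chart, and push
        every new edge it enables back onto the frontier."""
        neg_fom, start, end, symbol, score = heapq.heappop(frontier)
        chart.setdefault((start, end), {})[symbol] = score
        for (s, e), entries in chart.items():
            if s == end:  # popped entry is the left child of a new edge
                for right, rp in entries.items():
                    for parent, rule_lp in binary_rules.get((symbol, right), []):
                        new = score + rp + rule_lp
                        heapq.heappush(frontier, (-fom(start, e, parent, new),
                                                  start, e, parent, new))
            if e == start:  # popped entry is the right child of a new edge
                for left, lp in entries.items():
                    for parent, rule_lp in binary_rules.get((left, symbol), []):
                        new = lp + score + rule_lp
                        heapq.heappush(frontier, (-fom(s, end, parent, new),
                                                  s, end, parent, new))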
Parsing Algorithms: Best-First
• How do we rank Frontier entries?
  – Figure-of-Merit (FOM)
  – FOM = Inside (grammar) × Outside (heuristic)
  – Caraballo and Charniak, 1997 (C&C)
  – Problem: scores for entries with different spans are hard to compare
Parsing Algorithms: Beam-Search
• Beam-Search: best of both worlds
• CYK exhaustive traversal (bottom-up)
• At each chart cell:
  – Compute FOM for all possible cell entries
  – Rank entries in a (temporary) local priority queue
  – Only populate the cell with the n-best entries (beam-width; see the sketch below)
• Less memory
  – Stores neither all cell entries (CYK) nor bad frontier entries (Best-First)
• Runs faster
  – Search space is pruned (unlike CYK), and no global priority queue to maintain (unlike Best-First)
• Eliminates the problem of comparing cell entries globally
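A minimal sketch of filling one cell under beam-search, assuming a fom() function and a beam-width n_best; ranking is local to the cell, so no global queue is needed. Names are illustrative:

    import heapq

    def fill_cell(chart, start, end, binary_rules, fom, n_best):
        """Score every candidate entry for this cell, then keep only the
        n_best by figure-of-merit (the beam)."""
        candidates = {}  # symbol -> (fom score, inside log-prob)
        for mid in range(start + 1, end):
            for left, lp in chart.get((start, mid), {}).items():
                for right, rp in chart.get((mid, end), {}).items():
                    for parent, rule_lp in binary_rules.get((left, right), []):
                        inside = lp + rp + rule_lp
                        f = fom(start, end, parent, inside)
                        if parent not in candidates or f > candidates[parent][0]:
                            candidates[parent] = (f, inside)
        # Local priority queue: rank candidates and keep the top n_best.
        kept = heapq.nlargest(n_best, candidates.items(), key=lambda kv: kv[1][0])
        chart[(start, end)] = {sym: inside for sym, (f, inside) in kept}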
Exponential Decay Pruning
• What is the optimal beam-width per chart cell?
  – Common solutions:
    • Relative score difference from the highest-ranking entry
    • Global maximum number of candidates
• Exponential Decay Pruning
  – Adaptive beam-width conditioned on chart cell information
  – How reliable is our Figure-of-Merit per chart cell?
  – Plotted rank of the Gold entry against span and sentence size
    • FOM is more reliable for larger spans
      – Less dependent on the outside estimate
    • FOM is less reliable for short sentences
      – Atypical grammatical structure (in WSJ?)
Exponential Decay Pruning
• Confidence in the FOM can be modeled with the Exponential Decay function
  – N_0 = global beam-width maximum
  – n = sentence length
  – s = span length (number of words covered)
  – λ = tuning parameter
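The function itself appeared as a figure on the original slide; as a hedged reconstruction from the parameters above (an assumption, the exact exponent in the paper may differ), one plausible form is a beam-width that decays exponentially with relative span length:

    \[ N(n, s) = N_0 \, e^{-\lambda s / n} \]

so small spans keep close to the full N_0 candidates, and the beam narrows as the span grows toward the whole sentence.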
[Figure: percent of baseline constituents added to the chart vs. SpanLength / SentenceLength, with decay curves for sentence lengths n = 5, 10, 20, and 40 plotted against the exhaustive baseline.]
Results
• Wall Street Journal treebank
  – Train: Sections 2-21 (40k sentences)
  – Dev: Section 24 (1.3k sentences)
  – Test: Section 23 (2.4k sentences)
• Berkeley SM6 latent variable grammar
• Figure-of-Merit from Caraballo and Charniak, 1997 (C&C)
• Also applied Cell Closing Constraints (Roark and Hollingshead, 2008)
• External comparison with the Berkeley Coarse-to-Fine parser using the same grammar
Results: Dev
Algorithm    FOM     Beam-Width  Cell Closing  Sec/Sent  Chart Entries  F-Score
CYK                                            94.1      163537         87.2
Best-First   Inside                            138.0     152472         87.2
Best-First   C&C                               1.43      349            85.2
Beam-Search  Inside  Constant                  5.68      35501          87.2
Beam-Search  Inside  Decay                     3.01      20002          87.0
Beam-Search  C&C     Constant                  0.62      7548           87.0
Beam-Search  C&C     Decay                     0.37      5145           87.1
Beam-Search  C&C     Constant    Yes           0.31      5333           87.4
Beam-Search  C&C     Decay       Yes           0.20      3839           87.5
• Figure-of-Merit makes a big difference
• Fast solution, but significant accuracy degradation
• Using the Inside probability for the FOM:
  – 95% speed reduction with Beam-Search over Best-First
  – Exponential Decay adds an additional 47% speed reduction
• Using the C&C FOM:
  – Beam-Search is faster (57%) and more accurate than Best-First
  – Exponential Decay adds an additional 40% speed reduction
Results: Test
Algorithm     FOM  Beam-Width  Cell Closing  Sec/Sent  F-Score
CYK                                          76.63     88.0
Beam-Search   C&C  Constant                  0.45      87.9
Beam-Search   C&C  Decay                     0.28      88.0
Beam-Search   C&C  Decay       Yes           0.16      88.3
Berkeley C2F                                 0.21      88.3
• 38% relative speed-up (Decay vs. Constant beam-width)
• Decay pruning and Cell Closing Constraints are complementary
• Same ballpark as Coarse-to-Fine (perhaps a bit faster)
• Requires no knowledge of the grammar
Thanks
FOM Details
• C&C FOM Details (combined in the sketch below)
  – FOM(NT) = Outside_left × Inside × Outside_right
  – Inside = constituent grammar score for NT
  – Outside_left = max { POS forward prob × POS-to-NT transition prob }
  – Outside_right = max { NT-to-POS transition prob × POS backward prob }
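A hedged sketch of how these terms might combine in log space, assuming precomputed POS forward/backward log-probabilities and POS-to-NT transition log-probabilities; illustrative only, not C&C's exact formulation:

    import math

    def cc_fom(start, end, nt, inside_logp, fwd, bkwd, pos_to_nt, nt_to_pos):
        """FOM(NT) = Outside_left + Inside + Outside_right in log space.
        fwd[i] / bkwd[i]: dicts mapping POS tags to forward/backward
        log-probs at word position i (illustrative data structures)."""
        out_left = 0.0  # log 1 at the sentence edge
        if start > 0:
            out_left = max(fwd[start - 1][p] + pos_to_nt.get((p, nt), -math.inf)
                           for p in fwd[start - 1])
        out_right = 0.0
        if end < len(bkwd):
            out_right = max(nt_to_pos.get((nt, p), -math.inf) + bkwd[end][p]
                            for p in bkwd[end])
        return out_left + inside_logp + out_right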
Research Goals
• Research Goals
  – Find good solutions very quickly in this LARGE grammar space (not ML)
  – Algorithms should be grammar agnostic
  – Consider practical implications (speed, memory)
• Current projects towards these goals
  – Better FOM function
    • Inside estimate (grammar refinement)
    • Outside estimate (participation in a complete parse tree)
  – Optimal chart traversal strategy
    • Which areas of the search space are most promising?
    • Cell Closing Constraints (Roark and Hollingshead, 2008)
  – Balance between targeted and exhaustive exploration
    • How much “work” should be done exploring the search space around these promising areas?
    • Overhead of targeted exploration (best-first) vs. memory and cache benefits of local exploration (CYK)