Basic Parsing Algorithms – Chart Parsing

Seminar: Recent Advances in Parsing Technology

WS 2011/2012

Anna Schmidt

Talk Outline

● Chart Parsing – Basics
● Chart Parsing – Algorithms
  – Earley Algorithm
  – CKY Algorithm
    → Basics
    → BitPar: Efficient Implementation of CKY

Chart Parsing – Basics

Chart Parsing – Basics

● First proposed by Martin Kay
● Dynamic programming approach
  – Partial results of the computation are stored and (re)used later if needed
  → The same problem is not solved more than once
● Operates on a CFG
● Functionality: recogniser / parser
  … this talk focuses on the recogniser functionality

Main Components

● Chart
● Edges
● Agenda

Component: Chart

● Is a well-formed substring table (WFST)
  – Stores partial and complete analyses of substrings
  – Information is stored in one triangular half of a two-dimensional array of size (n+1)*(n+1) or n*n, depending on the algorithm
● Can also be understood as a (directed) graph
  – Vertices: positions between input words
    0 Mary 1 feeds 2 the 3 otter 4
  – Edges connecting vertices
● Allows no duplicate entries

Component: Edge

● Data structure storing information about a particular step in the parsing process
● Edges inhabit cells of the chart
● Edges contain
  – Start and end position in input string
  – A dotted rule
  – Can also contain edge probability

Component: Edge

● A dotted rule consists of
  – Left hand side (LHS) = non-terminal symbol
  – Right hand side (RHS) = sequence of non-terminal or terminal symbols
  – A dot between RHS symbols indicating which constituents have already been found
● Edges can be
  – Active / incomplete: dot is not the last element of RHS
  – Inactive / complete: dot is the last element of RHS
● Example: S → NP • VP (0,1)
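The edge description above could be sketched as a small data structure. This is only an illustration under assumptions: the class name `Edge` and its fields and methods are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """A chart edge: a dotted rule plus its span in the input (hypothetical layout)."""
    lhs: str               # left-hand side non-terminal
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # number of RHS symbols already found
    start: int             # start position in the input string
    end: int               # end position in the input string

    def is_complete(self) -> bool:
        # Inactive / complete: the dot is the last element of the RHS.
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        # Symbol right after the dot, or None for a complete edge.
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} ({self.start},{self.end})"


edge = Edge("S", ("NP", "VP"), dot=1, start=0, end=1)
print(edge)                # S → NP • VP (0,1)
print(edge.is_complete())  # False, so this is an active edge
```

Making the class frozen (and therefore hashable) means edges can be kept in a set, which is one simple way to get a chart without duplicate entries.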

Component: Agenda

● Organises the order in which tasks are executed
● Here all tasks (edges) are collected before being put on the chart
● Ordering of the agenda determines what is processed first
  → Therefore also which parse is found first
  – Queue, stack, ordering with respect to probabilities, …
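To make the effect of the agenda ordering concrete, here is a minimal sketch using Python's standard `collections.deque` and `heapq`. The three tasks and their probabilities are made up for illustration only.

```python
from collections import deque
import heapq

# Hypothetical tasks: (dotted rule, made-up probability).
tasks = [("NP → • DET N", 0.6), ("NP → • N", 0.3), ("VP → • V NP", 0.9)]

# Queue (first in, first out): tasks are processed in insertion order.
queue = deque(rule for rule, _ in tasks)
fifo_order = [queue.popleft() for _ in range(len(tasks))]

# Stack (last in, first out): the newest task is processed first.
stack = [rule for rule, _ in tasks]
lifo_order = [stack.pop() for _ in range(len(tasks))]

# Ordering by probability: heapq is a min-heap, so probabilities are negated.
heap = [(-prob, rule) for rule, prob in tasks]
heapq.heapify(heap)
best_first = [heapq.heappop(heap)[1] for _ in range(len(tasks))]

print(fifo_order)  # insertion order
print(lifo_order)  # reverse insertion order
print(best_first)  # most probable task first
```

The same set of tasks is processed in all three cases; only the order, and with it the first parse found, changes.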

Parsing Strategies

● Kay differentiates parsing strategies along two dimensions:
  – Bottom-up versus top-down
  – Directed versus undirected
● Directed bottom-up
  – Only build edges for phrases that can actually be incorporated into a higher level structure
  → Left-Corner Parser
● Directed top-down
  – Only build a new (active) edge if the next word of the input can be used to extend such an edge
  → Earley
● Undirected varieties: no such restrictions
  → Undirected bottom-up: CKY

Parsing Strategies

Ways of achieving directedness:

● Reachability table:
  – Contains for each non-terminal N the set of all symbols that can be the first element of a string dominated by N
  – For example: NP can start with DET, N, ADJ, but not with V
● Rule selection table:
  – M*N table where M = non-terminals excluding pre-terminals, N = all non-terminals
  – Contains all grammar rules applicable in a situation where M is the 'upper' and N is the 'lower' symbol
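A reachability table can be computed as the transitive closure of the "first RHS symbol" relation. The sketch below assumes an ε-free grammar and uses the phrase-structure rules of the talk's later example (without ADJ); the function name `reachability` is hypothetical.

```python
# Phrase-structure rules only; pre-terminals (N, DET, V) are treated as leaves.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
]


def reachability(grammar):
    """Map each non-terminal to all symbols that can start a string it dominates."""
    # Direct left corners: the first RHS symbol of every rule.
    table = {}
    for lhs, rhs in grammar:
        table.setdefault(lhs, set()).add(rhs[0])
    # Transitive closure: add the left corners of left corners until stable.
    changed = True
    while changed:
        changed = False
        for firsts in table.values():
            for symbol in list(firsts):
                new = table.get(symbol, set()) - firsts
                if new:
                    firsts |= new
                    changed = True
    return table


table = reachability(GRAMMAR)
print(sorted(table["NP"]))   # ['DET', 'N']
print("V" in table["NP"])    # False: an NP cannot start with V
```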

Chart Parsing: Advantages

● No repeated computation of the same subproblem
● Deals well with left-recursive grammars
● Deals well with ambiguity
● No backtracking necessary

Earley Algorithm

Earley Algorithm

● Proposed by Jay Earley
● Top-down search
● Can handle all CFGs
● Efficient:
  – O(n³) in the general case
  – Faster for particular types of grammar

Terminology

● In his paper, Earley does not use the notion of a 'chart'
● He represents the parsing process as sets of states
  – Index of each state set = end position of all states in the set
  – A state largely corresponds to an edge
    - Contains dotted rule
    - Pointer to start position
    - End position can be derived from state set

Terminology

● Formalisms are very similar
● Examples are easier to follow when represented in charts
● So we will stick with 'chart' representations

Algorithm – Components

● Initialization
● Predictor
● Scanner
● Completer

Algorithm operates on one half of an array of size (n+1)*(n+1)

0 Mary 1 feeds 2 the 3 otter 4 eos 5

Chart rows = start position, chart columns = end position. Each step lists the edges added to the chart, with their (start, end) cell:

 1. Initialise: X → • S eos (0,0)
 2. Predict:    S → • NP VP, NP → • N, NP → • DET N, N → • Mary, N → • otter, DET → • the (0,0)
 3. Scan:       N → Mary • (0,1)
 4. Complete:   NP → N • (0,1), S → NP • VP (0,1)
 5. Predict:    VP → • V NP, V → • feeds (1,1)
 6. Scan:       V → feeds • (1,2)
 7. Complete:   VP → V • NP (1,2)
 8. Predict:    NP → • N, NP → • DET N, N → • Mary, N → • otter, DET → • the (2,2)
 9. Scan:       DET → the • (2,3)
10. Complete:   NP → DET • N (2,3)
11. Predict:    N → • Mary, N → • otter (3,3)
12. Scan:       N → otter • (3,4)
13. Complete:   NP → DET N • (2,4)
14. Complete:   VP → V NP • (1,4)
15. Complete:   S → NP VP • (0,4)
16. Complete:   X → S • eos (0,4)
17. Predict:    eos → • eos (4,4)
18. Scan:       eos → eos • (4,5)
19. Complete:   X → S eos • (0,5)

Lookahead Component

● In the original paper, Earley proposes the use of a lookahead string for each state, which represents the allowed successor for the LHS
● Prevents the completer from processing a state if the lookahead string and the next word of the input do not match
  → Remember Kay's directed top-down strategy?

CKY: Basics

CKY Basics

● Proposed by John Cocke, Daniel H. Younger, and Tadao Kasami (independently)
● Bottom-up search
● Incremental
● Grammar must be in Chomsky normal form (CNF)
● Complexity O(n³)
● Chart: (upper triangle of) array of size n*n

CKY Algorithm: Idea

● Initialise the upper triangle of a chart of size n*n
● From the upper left to the lower right corner of the chart: go to the next cell in the diagonal
  – Fill in the POS tag of the next word in the input string
  – Each time a POS tag has been filled in, go up cell by cell and build larger constituents that end at the current end position

0 Mary 1 feeds 2 the 3 otter 4

Grammar:

S → NP VP
NP → N
NP → DET N
VP → V NP
N → Mary | otter
V → feeds
DET → the

The 4*4 chart (rows = start word, columns = end word) starts out empty and is filled step by step:

1. [1][1]: N, NP   (Mary)
2. [2][2]: V       (feeds)
3. [3][3]: DET     (the)
4. [4][4]: N, NP   (otter)
5. [3][4]: NP      (NP → DET N)
6. [2][4]: VP      (VP → V NP)
7. [1][4]: S       (S → NP VP)

Final chart:

     1        2      3      4
1    N, NP                  S
2             V             VP
3                    DET    NP
4                           N, NP
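The CKY idea can be sketched as a small recogniser over the example grammar. Several things here are assumptions rather than part of the talk: the names `BINARY`, `UNARY`, `LEXICON`, `close_unary` and `cky` are hypothetical, the chain rule NP → N (which is not strictly CNF) is handled by a unary closure step, and cells are indexed by positions between words, so the slides' cell [3][4] corresponds to `chart[2][4]` here.

```python
from itertools import product

BINARY = {("NP", "VP"): {"S"}, ("DET", "N"): {"NP"}, ("V", "NP"): {"VP"}}
UNARY = {"N": {"NP"}}  # chain rule NP → N, read bottom-up
LEXICON = {"Mary": {"N"}, "otter": {"N"}, "feeds": {"V"}, "the": {"DET"}}


def close_unary(symbols):
    """Add every label that derives one of `symbols` via chain rules."""
    result = set(symbols)
    stack = list(symbols)
    while stack:
        for parent in UNARY.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def cky(words):
    n = len(words)
    # chart[b][e]: labels spanning position b to position e (only b < e is used).
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for e in range(1, n + 1):                  # next word, left to right
        chart[e - 1][e] = close_unary(LEXICON.get(words[e - 1], set()))
        for b in range(e - 2, -1, -1):         # go up cell by cell
            found = set()
            for m in range(b + 1, e):          # split point between b and e
                for left, right in product(chart[b][m], chart[m][e]):
                    found |= BINARY.get((left, right), set())
            chart[b][e] = close_unary(found)
    return chart


chart = cky("Mary feeds the otter".split())
print("S" in chart[0][4])  # True: the sentence is recognised
```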

CKY: BitPar

BitPar: Basics

● Proposed by Helmut Schmid
● Bit-vector-based parser
● Efficiently implements a CKY-style algorithm
● Uses bit vector operations to parallelise parsing operations
● Idea: Don't try to decrease the number of edges that are built, instead minimise the cost of building edges
● Especially useful if all analyses are needed

BitPar: Requirements

● Restrictions on the context-free grammar
  – Must be in CNF
  – Must be ε-free
  – Chain rules allowed
● Precomputed for each non-terminal N:
  – Set of non-terminals that can be rewritten as N via chain rules (e.g. NP for N, given NP → N)
  – Set is stored in the bit vector chainvec[N]
  – Set includes N itself
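One possible way to precompute chainvec is a fixed-point iteration over the chain rules. The sketch below makes several assumptions: the bit order S, NP, N, VP, V, DET is the one used in the talk's later example, `chainvec[N]` is read as "N plus every non-terminal that can be rewritten as N via chain rules" (which matches the value 011000 shown for N in that example), and the names `BIT`, `CHAIN_RULES` and `build_chainvec` are hypothetical.

```python
# Bit order (left to right): S, NP, N, VP, V, DET.
NONTERMINALS = ["S", "NP", "N", "VP", "V", "DET"]
BIT = {nt: 1 << (len(NONTERMINALS) - 1 - i) for i, nt in enumerate(NONTERMINALS)}

CHAIN_RULES = [("NP", "N")]  # (parent, child) for the chain rule NP → N


def build_chainvec():
    # Every set starts with the non-terminal itself.
    chainvec = {nt: BIT[nt] for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for parent, child in CHAIN_RULES:
            # Whatever can be rewritten as `parent` can also be rewritten as `child`.
            merged = chainvec[child] | chainvec[parent]
            if merged != chainvec[child]:
                chainvec[child] = merged
                changed = True
    return chainvec


chainvec = build_chainvec()
print(format(chainvec["N"], "06b"))   # 011000 (NP and N)
print(format(chainvec["V"], "06b"))   # 000010 (V only)
```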

Background: Bitwise AND and OR

● AND

0101 & 0011 = 0001

Both corresponding bits must equal 1

● OR

0101 | 0011 = 0111

At least one of corresponding bits must equal 1
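The two operations can be tried directly in Python, whose `&` and `|` operators work on integers; `format(..., "04b")` prints the result as a four-digit bit string.

```python
a = 0b0101
b = 0b0011

and_result = format(a & b, "04b")  # 1 only where both bits are 1
or_result = format(a | b, "04b")   # 1 where at least one bit is 1

print(and_result)  # 0001
print(or_result)   # 0111
```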

BitPar: Chart

● Chart = three-dimensional bit array
  chart [start position b] [end position e] = [011000...]
● chart [b] [e] contains a bit vector with one bit for each non-terminal
  – Bit is set to 1 if the non-terminal was inserted
  – 0 otherwise
● Chart initialised with all bits = 0

Filling the Chart: POS Tags

Inserting POS tags into a cell of the diagonal:

● For each non-terminal N that can be rewritten as the word at the current position
● Do a bitwise OR of
  – Bits inhabiting the chart cell
  – chainvec[N]
→ N and all its chain derivations are inserted in just one operation
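The POS-tag step can be sketched as follows. The chainvec values are written out by hand for the talk's example grammar (bit order S, NP, N, VP, V, DET), and the names `CHAINVEC` and `LEXICON` as well as the 1-based chart layout are assumptions made for this illustration.

```python
# Bit order (left to right): S, NP, N, VP, V, DET.
CHAINVEC = {"N": 0b011000, "V": 0b000010, "DET": 0b000001}
LEXICON = {"Mary": ["N"], "feeds": ["V"], "the": ["DET"], "otter": ["N"]}

words = ["Mary", "feeds", "the", "otter"]
n = len(words)

# chart[b][e] with 1-based word positions; row and column 0 are unused.
chart = [[0] * (n + 1) for _ in range(n + 1)]  # all bits start at 0

for pos, word in enumerate(words, start=1):
    for tag in LEXICON[word]:
        # One OR inserts the tag together with everything in chainvec[tag].
        chart[pos][pos] |= CHAINVEC[tag]

for pos in range(1, n + 1):
    print(words[pos - 1], format(chart[pos][pos], "06b"))
```

For "Mary", the single OR with chainvec[N] sets both the N bit and the NP bit.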

Chart after inserting the POS tags for "Mary" and "feeds" (bit order: S, NP, N, VP, V, DET):

     1 (Mary)   2 (feeds)   3 (the)   4 (otter)
1    011000     000000      000000    000000
2    000000     000010      000000    000000
3    000000     000000      000000    000000
4    000000     000000      000000    000000

Open question (marked '?' in row 1 of the chart): how are the cells above the diagonal filled?

Filling the Chart: Larger Constituents

Conceptually:

● Determine if several cells can be combined to form a higher level constituent labeled N
● For this:
  – Loop over grammar rules with LHS = N, extract RHS (consisting of RHS1, RHS2)
  – Loop over all possible combinations of cells that together could contain the substructure of N and determine whether they contain RHS1 and RHS2 respectively

Filling the Chart: Larger Constituents

● This has to be done
  – For each super-diagonal cell
  – For each non-terminal
  – For all corresponding grammar rules
  – For all possible cell combinations that could constitute a substructure of N
● This is a time-consuming process
● BUT: The same functionality can be achieved by a single AND operation on two bit vectors

Filling the Chart: Larger Constituents

Internally:

● Can a given non-terminal LHS be inserted into a given chart cell [b] [e]?
● Get RHS1, RHS2 from grammar
● Vector 1
  Contains bits stored in chart [ b ] [ b .. b+1 .. e-1 ] [ RHS1 ]
● Vector 2
  Contains bits stored in chart [ b+1 .. b+2 .. e ] [ e ] [ RHS2 ]

Filling the Chart: Larger Constituents

● If a bitwise AND operation on the two new vectors produces at least one bit = 1
  – A valid substructure for LHS has been found
  – LHS can be inserted into the chart cell
● Let's look at an example

Example: Let's determine if NP should go into cell [3] [4].

Bit order: S, NP, N, VP, V, DET

POS tag vectors on the diagonal: [1][1] = 011000, [2][2] = 000010, [3][3] = 000001, [4][4] = 011000

     1 (Mary)   2 (feeds)   3 (the)   4 (otter)
1    011000     000000      000000    000000
2    000000     000010      000000    000000
3    000000     000000      000001    000000 ?
4    000000     000000      000000    011000

Should NP go into [3] [4]?

● First, we consult the grammar
● We find a rule NP → DET N, so the allowed right-hand side for NP is RHS1 = DET, RHS2 = N
● Reminder (rules):
  v1 = chart [ b ] [ b .. b+1 .. e-1 ] [ RHS1 ]
  v2 = chart [ b+1 .. b+2 .. e ] [ e ] [ RHS2 ]
● Vector1 = 1
  chart [3] [3] = RHS1 = DET? → yes, so insert 1
● Vector2 = 1
  chart [4] [4] = RHS2 = N? → yes, so insert 1
● Vector1 AND Vector2 = 1, so insert NP
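The check for cell [3][4] can be replayed in a few lines. This is a sketch under assumptions: the helper `can_insert` is hypothetical, and it assembles the two vectors in a loop for readability, whereas the layout chart [b] [..] [RHS1] on the slides suggests that the actual implementation reads them directly from the chart.

```python
# Bit order (left to right): S, NP, N, VP, V, DET, as in the example.
NONTERMINALS = ["S", "NP", "N", "VP", "V", "DET"]
BIT = {nt: 1 << (len(NONTERMINALS) - 1 - i) for i, nt in enumerate(NONTERMINALS)}


def can_insert(chart, b, e, rhs1, rhs2):
    """Check whether a rule LHS → rhs1 rhs2 licenses LHS in chart[b][e]."""
    v1 = v2 = 0
    for k, m in enumerate(range(b, e)):   # m = last word of the left part
        if chart[b][m] & BIT[rhs1]:       # left part b..m contains rhs1
            v1 |= 1 << k
        if chart[m + 1][e] & BIT[rhs2]:   # right part m+1..e contains rhs2
            v2 |= 1 << k
    return (v1 & v2) != 0                 # some split point has both parts


# 1-based chart as in the example; row and column 0 are unused.
chart = [[0] * 5 for _ in range(5)]
chart[1][1] = 0b011000   # Mary:  NP, N
chart[2][2] = 0b000010   # feeds: V
chart[3][3] = 0b000001   # the:   DET
chart[4][4] = 0b011000   # otter: NP, N

if can_insert(chart, 3, 4, "DET", "N"):   # rule NP → DET N
    chart[3][4] |= BIT["NP"]

print(format(chart[3][4], "06b"))  # 010000
```

Each bit position in the two vectors stands for one split point, so a single AND tests all split points of the cell at once.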

Example: Let's determine if NP should go into cell [3] [4]. → Yes!

Bit order: S, NP, N, VP, V, DET

POS tag vectors on the diagonal: [1][1] = 011000, [2][2] = 000010, [3][3] = 000001, [4][4] = 011000

     1 (Mary)   2 (feeds)   3 (the)   4 (otter)
1    011000     000000      000000    000000
2    000000     000010      000000    000000
3    000000     000000      000001    010000
4    000000     000000      000000    011000

Thank you for your attention!

References

Earley, Jay: An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970.

Jurafsky, Daniel and Martin, James H.: Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice-Hall, 2009.

Kay, Martin: Algorithm schemata and data structures in syntactic processing. In Readings in natural language processing, pages 35–70. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1986.

Kay, Martin: Lecture Slides of the Course 'Basic Algorithms for Computational Linguistics' http://www.coli.uni-saarland.de/courses/algorithms-11/

Schmid, Helmut: Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors. In Proceedings of Coling 2004, pages 162–168, Geneva, Switzerland, 2004.

Wirén, Mats: A Comparison of Rule-Invocation Strategies in Context-Free Chart Parsing

Initialization

● Introduces a new non-terminal start symbol X and a new end symbol EOS
● Adds EOS to the end of the input string
● For each root symbol R of the grammar: add to chart[0,0] an edge of the form:
  X → • R EOS

Predictor

● For all non-terminals N directly following a dot (in the current state set), and for each grammar rule with N as LHS: add a new edge with
  – LHS = N
  – RHS according to grammar, but
  – dot first element of RHS
  – start and end = end of original state

Scanner

● For all terminal symbols immediately following a dot: compare the terminal symbol with the input string starting at the end position of the current edge
● If they match: add a new edge to the chart with
  – dot moved over the terminal symbol
  – end position incremented by 1

Completer

● If the dot is the last element of a production with LHS of type T, find edges that
  – are still waiting for a constituent of the type T
  – end where the complete edge is starting
● Add to the chart an edge with
  – dot moved over T
  – end position = end position of completed edge
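Initialization, predictor, scanner and completer can be combined into a compact recogniser. This is a minimal sketch under assumptions, not the implementation behind the talk: states are plain tuples (LHS, RHS, dot, start) grouped by end position, the names `GRAMMAR` and `earley_recognise` are hypothetical, the grammar is assumed to be ε-free, and eos is treated as an ordinary terminal that the scanner consumes directly.

```python
GRAMMAR = {
    "X": [("S", "eos")],               # new start symbol added by initialization
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N")],
    "VP": [("V", "NP")],
    "N": [("Mary",), ("otter",)],
    "V": [("feeds",)],
    "DET": [("the",)],
}


def earley_recognise(words):
    tokens = list(words) + ["eos"]     # add the end symbol to the input
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]  # chart[k]: states ending at position k

    def add(k, state):
        if state not in chart[k]:       # the chart allows no duplicates
            chart[k].append(state)

    add(0, ("X", ("S", "eos"), 0, 0))   # Initialization
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):        # chart[k] may grow while it is processed
            lhs, rhs, dot, start = chart[k][i]
            i += 1
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in GRAMMAR:   # Predictor: non-terminal after the dot
                    for expansion in GRAMMAR[symbol]:
                        add(k, (symbol, expansion, 0, k))
                elif k < n and tokens[k] == symbol:   # Scanner: terminal matches
                    add(k + 1, (lhs, rhs, dot + 1, start))
            else:                       # Completer: dot is the last element
                for lhs2, rhs2, dot2, start2 in chart[start]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(k, (lhs2, rhs2, dot2 + 1, start2))
    # Accept if X → S eos • spans the whole input.
    return ("X", ("S", "eos"), 2, 0) in chart[n]


print(earley_recognise("Mary feeds the otter".split()))  # True
print(earley_recognise("Mary the feeds otter".split()))  # False
```

Processing each state set with an index rather than a for-loop lets predictor and completer append new states to the set that is currently being worked on.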