October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems...

22
October 2005 CSA3180: Parsing Algorith ms 2 1 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm

description

October 2005CSA3180: Parsing Algorithms 23 Left Recursion A grammar is left recursive if it contains at least one non-terminal A for which A  *  A  and   *  Intuitive idea: derivation of that category includes itself along its leftmost branch. NP  NP PP NP  NP and NP NP  DetP Nominal DetP  NP ' s

Transcript of October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems...

Page 1: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 1

CSA3050: NLP Algorithms

Parsing Algorithms 2 Problems with DFTD Parser

Earley Parsing Algorithm

Page 2: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 2

Problems withDFTD Parser

• Left Recursion• Ambiguity• Inefficiency

Page 3: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 3

Left Recursion• A grammar is left recursive if it contains

at least one non-terminal A for whichA * A and *

• Intuitive idea: derivation of that category includes itself along its leftmost branch. NP NP PP NP NP and NP NP DetP Nominal DetP NP ' s

Page 4: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 4

Infinite Search

Page 5: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 5

Dealing with Left Recursion

• Reformulate the grammar – A A |

as– A A'

A' A' |

– Disadvantage: different (and probably unnatural) parse trees.

• Use a different parse algorithm

Page 6: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 6

Ambiguity• Coordination Ambiguity: different scope

of conjunction:Black cats and dogs like to play

• Attachment Ambiguity: a constituent can be added to the parse tree in different places:I shot an elephant in my pyjamas

• VP → VP PPNP → NP PP

Page 7: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 7

Catalan Numbers

The nth Catalan number counts the ways of dissecting a polygon with n+2 sides into triangles by drawing nonintersecting diagonals. No of PPs # parses

2 23 54 145 1326 4697 14308 4867

Page 8: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 8

Handling Disambiguation

• Statistical disambiguation• Semantic knowledge.

Page 9: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 9

Repeated Parsing ofSubtreesa flight 4

from Indianapolis 3

to Houston 2

on TWA 1

A flight from Indianapolis 3

A flight from Indianapolis to Houston

2

A flight from Indianapolis to Houston on TWA

1

Page 10: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 10

Earley Algorithm• Dynamic Programming: solution involves

filling in table of solutions to subproblems.• Parallel Top Down Search• Worst case complexity = O(N3) in length N

of sentence.• Table, called a chart, contains N+1 entries

● book ● that ● flight ● 0 1 2 3

Page 11: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 11

The Chart

• Each table entry contains a list of states• Each state represents all partial parses that have

been reached so far at that point in the sentence.• States are represented using dotted rules

containing information about– Rule/subtree: which rule has been used– Progress: dot indicates how much of rule's RHS has

been recognised.– Position: text segment to which this parse applies

Page 12: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 12

Examples of Dotted Rules

• Initial S RuleS → ● VP, [0,0]

• Partially recognised NPNP → Det ● Nominal, [1,2]

• Fully recognised VPVP → V VP ● , [0,3]

• These states can also be represented graphically

Page 13: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 13

The Chart

Page 14: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 14

Earley Algorithm• Main Algorithm: proceeds through each text

position, applying one of the three operators below.

• Predictor: Creates "initial states" (ie states whose RHS is completely unparsed).

• Scanner: checks current input when next category to be recognised is pre-terminal.

• Completer: when a state is "complete" (nothing after dot), advance all states to the left that are looking for the associated category.

Page 15: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 15

Early Algorithm – Main Function

Page 16: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 16

Early Algorithm – Sub Functions

Page 17: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 17

Page 18: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 18

Page 19: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 19

fl

Page 20: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 20

Retrieving Trees• To turn recogniser into a parser, representation of

each state must also include information about completed states that generated its constituents

Page 21: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 21

Page 22: October 2005CSA3180: Parsing Algorithms 21 CSA3050: NLP Algorithms Parsing Algorithms 2 Problems with DFTD Parser Earley Parsing Algorithm.

October 2005 CSA3180: Parsing Algorithms 2 22

Chart[3]

↑Extra Field