Basic Parsing Algorithms: Earley Parser and Left Corner Parsing

Alexandr Chernov

Recent Advances in Parsing Technology


Chomsky hierarchy

● Type-0 grammars (unrestricted grammars) include all formal grammars

● Type-1 grammars (context-sensitive grammars) generate the context-sensitive languages

● Type-2 grammars (context-free grammars) generate the context-free languages

● Type-3 grammars (regular grammars) generate the regular languages


Context-free Grammar

● A context-free grammar (for short, CFG) is a quadruple G = (V, Σ, P, S), where

– V is a finite set of symbols called the vocabulary (or set of grammar symbols);

– Σ ⊆ V is the set of terminal symbols (for short, terminals);

– S ∈ (V − Σ) is a designated symbol called the start symbol;

– P ⊆ (V − Σ) × V∗ is a finite set of productions (or rewrite rules, or rules).

● The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus, P ⊆ N × V∗, and every production (A, α) is also denoted as A → α
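
The quadruple can be written down directly as a data structure. The following is a minimal sketch; the class name `CFG`, its field names, and the `is_well_formed` check are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    vocabulary: frozenset   # V: all grammar symbols
    terminals: frozenset    # Σ: must be a subset of V
    productions: frozenset  # P: pairs (A, alpha), alpha a tuple of symbols from V
    start: str              # S: must be a nonterminal

    @property
    def nonterminals(self):
        # N = V − Σ
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        # Check the three conditions of the definition:
        # Σ ⊆ V, S ∈ N, and P ⊆ N × V∗.
        n = self.nonterminals
        return (
            self.terminals <= self.vocabulary
            and self.start in n
            and all(
                a in n and all(x in self.vocabulary for x in alpha)
                for a, alpha in self.productions
            )
        )


# Hypothetical toy grammar built from a few rewrite rules.
toy = CFG(
    vocabulary=frozenset({"S", "NP", "VP", "Det", "N", "the"}),
    terminals=frozenset({"the"}),
    productions=frozenset({
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("Det", ("the",)),
        ("NP", ("the", "N")),
    }),
    start="S",
)
print(toy.is_well_formed())  # True
```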


Rewrite Rules

S → NP VP

NP → Det N

Det → the

NP → the N

...


Formal Grammar

● Terminals

– Letters, numbers, words (cannot be broken down into "smaller" units)

● Nonterminals

– Syntactic variables (categories), formulas, arithmetic expressions


Parsers

➢ Parsing algorithms for context-free grammars play an important role in the implementation of:

➢ compilers and interpreters for programming languages

➢ programs which "understand" or translate natural languages


Two common types of parsers

➢ The main task of parsing is to connect the root node S with the tree leaves, the input

➢ Top-down parsers start constructing the parse tree from the root and move down towards the leaves. They are easy to implement, but work only with restricted grammars. Examples:

– Predictive parsers (e.g., LL(k))

➢ Bottom-up parsers build the nodes at the bottom of the parse tree first. They are suitable for automatic parser generation and handle a larger class of grammars. Examples:

– Shift-reduce parsers (e.g., LR(k) parsers)

➢ Both are general techniques that can be made to work for all languages (but not all grammars!).


Basic Parsing Algorithms

➢ Earley parser

➢ Chart parser

➢ CKY (Cocke-Younger-Kasami)

➢ Head Driven / Left Corner Parsing


Earley Parser

http://jayearley.com/

➢ Can parse all context-free languages

➢ Complexity O(n³), where n is the length of the parsed string, O(n²) for unambiguous grammars

➢ Top-down dynamic programming algorithm


Special Symbols

➢ ┤ - right terminator

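➢ . (dot) – position between terminals/nonterminals, marking how much of a production has been recognised so far

– E → .E+T

– E → E.+T

➢ Φ – new root symbol of the added top-level production Φ → E ┤, which is complete when the whole input has been recognised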


Earley Parser's Steps

➢ Predictor (applicable to a state when there is a nonterminal to the right of the dot)

➢ Scanner (applicable if there is a terminal to the right of the dot)

➢ Completer (applicable to a state if its dot is at the end of its production)
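
Which of the three steps applies depends only on the symbol to the right of the dot. The sketch below makes that explicit; the `Item` tuple, its field names, and `applicable_operation` are hypothetical names chosen for illustration, and the nonterminal set is assumed to be that of an expression grammar with E, T and P.

```python
from typing import NamedTuple, Tuple


class Item(NamedTuple):
    lhs: str               # left-hand side of the production
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # number of right-hand-side symbols already recognised
    origin: int            # index of the state set in which the item was predicted

    def next_symbol(self):
        # Symbol to the right of the dot, or None if the dot is at the end.
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def applicable_operation(item, nonterminals):
    sym = item.next_symbol()
    if sym is None:
        return "completer"   # dot at the end of the production
    if sym in nonterminals:
        return "predictor"   # nonterminal to the right of the dot
    return "scanner"         # terminal to the right of the dot


N = {"E", "T", "P"}
print(applicable_operation(Item("E", ("E", "+", "T"), 0, 0), N))  # predictor
print(applicable_operation(Item("E", ("E", "+", "T"), 1, 0), N))  # scanner
print(applicable_operation(Item("P", ("a",), 1, 0), N))           # completer
```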


Earley Parser Algorithm

● Grammar AE, input string = a+a*a

root: E → T | E+T
T → P | T*P
P → a

S0 (x1 = a)
Φ → .E ┤
E → .E+T
E → .T
T → .T*P
T → .P
P → .a

S1 (x2 = +)
P → a.
T → P.
E → T.
T → T.*P
Φ → E. ┤
E → E.+T

S2 (x3 = a)
E → E+.T
T → .T*P
T → .P
P → .a

S3 (x4 = *)
P → a.
T → P.
E → E+T.
T → T.*P

S4 (x5 = a)
T → T*.P
P → .a

S5 (x6 = ┤)
P → a.
T → T*P.
E → E+T.
T → T.*P
Φ → E. ┤
E → E.+T

S6
Φ → E ┤.
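
The construction of the state sets can be sketched as a small recognizer. This is a simplified illustration, not Earley's full algorithm: it assumes a grammar without empty productions and uses no lookahead, so a state set it builds can contain items that the listing above does not show. The function name `earley_recognize` and the dictionary encoding of the grammar are assumptions made for this example.

```python
def earley_recognize(grammar, start, tokens):
    """grammar maps each nonterminal to a list of right-hand sides (tuples)."""
    END, TOP = "┤", "Φ"
    words = list(tokens) + [END]            # append the right terminator
    # chart[k] is state set S_k; an item is (lhs, rhs, dot, origin).
    chart = [[] for _ in range(len(words) + 1)]

    def add(k, item):
        if item not in chart[k]:
            chart[k].append(item)

    add(0, (TOP, (start, END), 0, 0))       # Φ → .E ┤
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):            # chart[k] may grow while we scan it
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot == len(rhs):
                # Completer: advance items in S_origin that were waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(k, (l2, r2, d2 + 1, o2))
            elif rhs[dot] in grammar:
                # Predictor: add every production of the nonterminal after the dot.
                for alt in grammar[rhs[dot]]:
                    add(k, (rhs[dot], alt, 0, k))
            elif k < len(words) and words[k] == rhs[dot]:
                # Scanner: the terminal after the dot matches the next input symbol.
                add(k + 1, (lhs, rhs, dot + 1, origin))
    accepted = (TOP, (start, END), 2, 0) in chart[len(words)]
    return accepted, chart


AE = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("P",), ("T", "*", "P")],
    "P": [("a",)],
}
accepted, chart = earley_recognize(AE, "E", "a+a*a")
print(accepted)  # True
```

The input is accepted exactly when the item Φ → E ┤. with origin 0 appears in the last state set, which corresponds to S6 in the listing.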


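Left-Corner Parsing

● For some grammars top-down prediction can fail to terminate, so a bottom-up parser is needed

● Going Wrong with Top-down Parsing

– Input string: John died

S → NP VP
NP → Det N
NP → PN
VP → IV
Det → the
N → robber
PN → John
IV → died

– Top-down prediction ignores the input: the parser first tries NP → Det N and Det → the, and only then finds that "the" does not match "John"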
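
The wasted prediction can be made visible with a naive backtracking top-down recognizer that records every rule it tries. This is a sketch for illustration only; the function names and the trace format are assumptions. Note that a recognizer of this kind would recurse forever on a left-recursive rule such as E → E + T, which is one way top-down prediction can fail to terminate.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["PN"]],
    "VP": [["IV"]],
    "Det": [["the"]],
    "N": [["robber"]],
    "PN": [["John"]],
    "IV": [["died"]],
}


def expand(symbol, words, pos, trace):
    """Yield every position at which `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next input word.
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        trace.append(f"{symbol} -> {' '.join(rhs)}")   # record the prediction
        yield from expand_sequence(rhs, words, pos, trace)


def expand_sequence(symbols, words, pos, trace):
    if not symbols:
        yield pos
        return
    for mid in expand(symbols[0], words, pos, trace):
        yield from expand_sequence(symbols[1:], words, mid, trace)


def recognize_top_down(words):
    trace = []
    ok = any(end == len(words) for end in expand("S", words, 0, trace))
    return ok, trace


ok, trace = recognize_top_down("John died".split())
print(ok)          # True
print(trace[:3])   # ['S -> NP VP', 'NP -> Det N', 'Det -> the']
```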


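Left-Corner Parsing

● Going Wrong with Bottom-up Parsing

– Input string: The plant died

S → NP VP
NP → Det N
VP → IV
VP → TV NP
TV → plant
IV → died
Det → the
N → plant

– Bottom-up parsing ignores the goal: the parser may first analyse "plant" as a transitive verb (TV), which cannot lead to a parse here, before it tries the noun reading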
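
A backtracking shift-reduce recognizer shows the same effect from the other direction. The sketch below is an illustration under assumptions: the rule order is copied from the slide, reductions are tried before shifts, and the input is lower-cased so that "the" matches the rule Det → the.

```python
RULES = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("IV",)), ("VP", ("TV", "NP")),
    ("TV", ("plant",)), ("IV", ("died",)), ("Det", ("the",)), ("N", ("plant",)),
]


def shift_reduce(stack, words, trace):
    """Return True if the stack and remaining words can be reduced to a single S."""
    if stack == ("S",) and not words:
        return True
    # Try every reduction whose right-hand side matches the top of the stack.
    for lhs, rhs in RULES:
        if stack[-len(rhs):] == rhs:
            trace.append(f"reduce {' '.join(rhs)} to {lhs}")
            if shift_reduce(stack[:-len(rhs)] + (lhs,), words, trace):
                return True
    # Otherwise shift the next input word onto the stack.
    if words:
        trace.append(f"shift {words[0]}")
        return shift_reduce(stack + (words[0],), words[1:], trace)
    return False


def recognize_bottom_up(words):
    trace = []
    return shift_reduce((), tuple(words), trace), trace


ok, trace = recognize_bottom_up("the plant died".split())
print(ok)  # True
# The TV reading of "plant" is explored, and abandoned, before the N reading.
print(trace.index("reduce plant to TV") < trace.index("reduce plant to N"))  # True
```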


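Left-Corner Parsing

● The key idea of left-corner parsing is to combine top-down and bottom-up processing

– Left corner of a rule: the first symbol on its right-hand side

S → NP VP (left corner: NP)
VP → IV (left corner: IV)
PN → John (left corner: John)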
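
Left corners can be chained: if NP is the left corner of an S rule and PN is the left corner of an NP rule, then PN can also begin an S. A minimal sketch of computing this reflexive, transitive relation follows; the function name `left_corner_table` and the use of the "John died" grammar are assumptions for illustration.

```python
RULES = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("PN",)), ("VP", ("IV",)),
    ("Det", ("the",)), ("N", ("robber",)), ("PN", ("John",)), ("IV", ("died",)),
]


def left_corner_table(rules):
    """Map every symbol to the set of symbols that can appear as its left corner."""
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    table = {s: {s} for s in symbols}        # every symbol is its own left corner
    changed = True
    while changed:                           # iterate until nothing new is added
        changed = False
        for lhs, rhs in rules:
            new = table[rhs[0]] - table[lhs]
            if new:
                table[lhs] |= new
                changed = True
    return table


table = left_corner_table(RULES)
print(sorted(table["S"]))   # ['Det', 'John', 'NP', 'PN', 'S', 'the']
```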


Left-Corner Parsing

S → NP VP
NP → Det N
NP → PN
VP → IV
Det → the
N → robber
PN → John
IV → died

● How does it work?

Parse tree: [S [NP [PN John]] [VP [IV died]]]
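
One way to sketch the procedure is a recognizer that alternates a bottom-up step (read a word and find a rule that has it as its left corner) with a top-down step (use the rest of that rule to predict what must follow). The code below is an illustrative sketch for the grammar on this slide, not a definitive implementation; all function names are assumptions, and it omits the left-corner table that a practical parser would use to filter rules.

```python
RULES = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("PN",)), ("VP", ("IV",)),
    ("Det", ("the",)), ("N", ("robber",)), ("PN", ("John",)), ("IV", ("died",)),
]


def recognize(goal, words, pos):
    """Yield end positions of a `goal` phrase that starts at `pos`."""
    if pos < len(words):
        # Bottom-up step: the next word is the first left corner.
        yield from complete(words[pos], goal, words, pos + 1)


def complete(corner, goal, words, pos):
    """Grow a recognised `corner` upwards until it becomes `goal`."""
    if corner == goal:
        yield pos
    for lhs, rhs in RULES:
        if rhs[0] == corner:
            # Top-down step: the remaining daughters of the rule are predicted.
            for mid in recognize_all(rhs[1:], words, pos):
                yield from complete(lhs, goal, words, mid)


def recognize_all(cats, words, pos):
    if not cats:
        yield pos
        return
    for mid in recognize(cats[0], words, pos):
        yield from recognize_all(cats[1:], words, mid)


def accepts(sentence):
    words = sentence.split()
    return any(end == len(words) for end in recognize("S", words, 0))


print(accepts("John died"))        # True
print(accepts("the robber died"))  # True
print(accepts("John the"))         # False
```

For "John died" the recognizer reads "John", climbs PN → John and NP → PN, reaches S → NP VP, and only then predicts a VP, so the rule NP → Det N is never tried.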


Head-Corner Parsing

● Head-Corner Parser starts by locating a potential head of the phrase and then proceeds by parsing the daughters to the left and the right of the head

● Head-Corner Parser is a generalization of Left-Corner Parser

● Left-Corner Parser is 10% faster


Head-Corner Parsing

● The daughters left of the head are parsed from right to left (starting from the head), the daughters right of the head are parsed from left to right (starting from the head)
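
The processing order of the daughters can be illustrated with a few lines of code. This sketch only computes the order in which daughters would be visited on each side of the head; the function name and the example rule with daughters A, B, H, C, D are hypothetical.

```python
def head_outward_order(daughters, head_index):
    """Return the head and the visiting order of the daughters on each side of it."""
    head = daughters[head_index]
    # Left daughters: right to left, starting next to the head.
    left = list(reversed(daughters[:head_index]))
    # Right daughters: left to right, starting next to the head.
    right = list(daughters[head_index + 1:])
    return head, left, right


print(head_outward_order(["A", "B", "H", "C", "D"], 2))
# ('H', ['B', 'A'], ['C', 'D'])
```

With the head in the first position the left list is empty and the order is the same as in left-corner parsing.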


Head-Corner Parsing

● Input string:

– Time flies like an arrow


Summary

● Bottom-up parsing is used for analyzing unknown data relationships in an attempt to identify the most fundamental units first, and then to infer higher-order structures from them

● Top-down parsing is employed for analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis


Possible uses

● Chart parsers can be used for parsing computer languages. Earley parsers in particular have been used in compiler compilers, where their ability to parse with an arbitrary CFG eases the task of writing the grammar for a particular language.

● Left-corner parsers can be used for processing natural languages, provided that the parser can handle ambiguity


Thank you for your attention

Questions?


Sources

● Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970

● Gertjan van Noord. An efficient implementation of the head-corner parser. Computational Linguistics, 23(3):425–456, 1997

● http://cs.union.edu/~striegnk/courses/nlp-with-prolog/html/node53.html