LR Parsing Table Construction

Page 1: LR Parsing Table Construction

Lecture 6

Syntax Analysis

Page 2: LR Parsing Table Construction

LR parsing example

Grammar:

1. E -> E + T

2. E -> T

3. T -> T * F

4. T -> F

5. F -> ( E )

6. F -> id

Page 3: LR Parsing Table Construction

LR parsing example

CONFIGURATIONS

STACK   INPUT             ACTION
0       id * id + id $    shift 5

Page 4: LR Parsing Table Construction

Fig. 4.32. Moves of LR parser on id * id + id.

      STACK             INPUT             ACTION
(1)   0                 id * id + id $    shift
(2)   0 id 5            * id + id $       reduce by F -> id
(3)   0 F 3             * id + id $       reduce by T -> F
(4)   0 T 2             * id + id $       shift
(5)   0 T 2 * 7         id + id $         shift
(6)   0 T 2 * 7 id 5    + id $            reduce by F -> id
(7)   0 T 2 * 7 F 10    + id $            reduce by T -> T * F
(8)   0 T 2             + id $            reduce by E -> T
(9)   0 E 1             + id $            shift
(10)  0 E 1 + 6         id $              shift
(11)  0 E 1 + 6 id 5    $                 reduce by F -> id
(12)  0 E 1 + 6 F 3     $                 reduce by T -> F
(13)  0 E 1 + 6 T 9     $                 reduce by E -> E + T
(14)  0 E 1             $                 accept
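The moves in Fig. 4.32 can be replayed with a small table-driven loop. The following is a minimal sketch in Python, not code from the lecture: the names `ACTION`, `GOTO`, `PRODS` and `lr_parse` are made up for illustration, and the table entries are the usual SLR table for this grammar (the slides do not print it, so treat the entries as an assumption). The stack holds states only, since each state already implies the grammar symbol below it.

```python
# Productions numbered as on the grammar slide: number -> (head, length of RHS).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# Assumed SLR table for the expression grammar: "sN" = shift N, "rN" = reduce by N.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}

def lr_parse(tokens):
    """Return the list of actions taken; raise SyntaxError on an empty table entry."""
    stack = [0]                       # states only
    tokens = list(tokens) + ["$"]     # append the end marker
    pos, trace = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        trace.append(act)
        if act == "acc":
            return trace
        if act[0] == "s":             # shift: push the new state, consume the token
            stack.append(int(act[1:]))
            pos += 1
        else:                         # reduce: pop |RHS| states, then follow goto
            head, size = PRODS[int(act[1:])]
            del stack[len(stack) - size:]
            stack.append(GOTO[stack[-1]][head])
```

Calling `lr_parse(["id", "*", "id", "+", "id"])` returns 14 actions, one per row of Fig. 4.32. Note that the decision in each step looks only at the top state and the next token.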

Page 5: LR Parsing Table Construction

LR grammars

If it is possible to construct an LR parse table for G, we say “G is an LR grammar”.

LR parsers DO NOT need to scan the entire stack to decide what to do (other shift-reduce parsers might).

Instead, the STATE symbol on top of the stack summarizes all the information needed to decide what to do next.

The GOTO function corresponds to a DFA that knows how to find the HANDLE by reading the stack from the top downwards.

In the example, we only looked at 1 input symbol at a time. This means the grammar is LR(1).

Page 6: LR Parsing Table Construction

How to construct an LR parse table?

We will look at 3 methods:
- Simple LR (SLR): simple but not very powerful
- Canonical LR: very powerful but too many states
- LALR: almost as powerful with many fewer states

yacc uses the LALR algorithm.

Page 7: LR Parsing Table Construction

SLR (Simple LR) Parse Table Construction

Page 8: LR Parsing Table Construction

SLR parse tables

The SLR parse table is easy to construct, but the resulting parser is a little weak.

The table is based on LR(0) ITEMS, or just plain ITEMS. An LR(0) item is a production of G with a dot at some position on the RHS.

The production A -> XYZ generates the following four LR(0) items:

A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.

The production A -> ε generates only 1 LR(0) item: A -> .
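The dot positions can be enumerated mechanically. As a small sketch (the triple representation `(head, body, dot)` and the helper names `lr0_items` and `show` are assumptions made for illustration), a production with n symbols on the RHS yields n + 1 items:

```python
def lr0_items(head, body):
    """All LR(0) items of head -> body, as (head, body, dot_position) triples."""
    body = tuple(body)
    return [(head, body, dot) for dot in range(len(body) + 1)]

def show(item):
    """Render an item with the dot written as a separate '.' token."""
    head, body, dot = item
    return " ".join((head, "->") + body[:dot] + (".",) + body[dot:])
```

For example, `lr0_items("A", ["X", "Y", "Z"])` gives four items, and an empty body (A -> ε) gives the single item that `show` renders as `A -> .`.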

Page 9: LR Parsing Table Construction

LR(0) items

An item indicates how far we are in parsing the RHS.

A -> .XYZ means we think we’re at the beginning of an A production, but haven’t seen an X yet.

A -> X.YZ means we think we’re in the middle of an A production, have seen an X, and should see a Y soon.

Page 10: LR Parsing Table Construction

Augmenting the grammar G

Before we can produce an SLR parse table, we have to AUGMENT the input grammar, G.

Given G, we produce G', the AUGMENTED GRAMMAR for G:
- Add a new symbol S'
- Add a new production S' -> S (where S is the old start symbol)
- Make S' the new start symbol

Page 11: LR Parsing Table Construction

Item set closure

We need a new concept: the CLOSURE of a set of LR(0) items.

If I is a set of items for grammar G', then the CLOSURE of I is defined recursively:
- Initially, every item in I is added to closure(I).
- If A -> α . B β is in closure(I) and B -> γ is a production, then add the item B -> . γ to closure(I), if not already there.
- Repeat until no more items can be added.

Page 12: LR Parsing Table Construction

Itemset closure example

Grammar:

E' -> E
E -> E + T | T
T -> T * F | F
F -> ( E ) | id

Initial itemset I is { E' -> . E }

Closure(I) = {
    E' -> . E
    E -> . E + T
    E -> . T
    T -> . T * F
    T -> . F
    F -> . ( E )
    F -> . id
}
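The closure rule is a plain worklist computation. Here is a minimal sketch, assuming items are `(head, body, dot)` triples and the grammar is stored in a dictionary called `GRAMMAR` (both are illustrative choices, not something the slides prescribe):

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Closure of a set of LR(0) items, each a (head, body, dot) triple."""
    result = set(items)
    work = list(items)
    while work:
        _, body, dot = work.pop()
        # Only a nonterminal right after the dot contributes new items.
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                new = (body[dot], prod, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)
```

Applied to `{("E'", ("E",), 0)}`, this produces the seven items listed in the example.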

Page 13: LR Parsing Table Construction

The goto table

We also need the function goto(I,X) that takes an itemset I and a grammar symbol X, and returns the closure of the set of all items [ A -> α X . β ] such that [ A -> α . X β ] is in I.

Example: I = { [E' -> E.], [E -> E. + T] }
goto(I,+) = closure({ [E -> E + . T] }) = { [E -> E + . T], [T -> . T * F], [T -> . F], [F -> . ( E )], [F -> . id] }
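A sketch of goto(I,X) in the same style, again assuming the illustrative `(head, body, dot)` item triples and a `GRAMMAR` dictionary; the closure helper is repeated so the block runs on its own:

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                new = (body[dot], prod, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then close the result."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)

# The itemset from the example: { [E' -> E.], [E -> E. + T] }
I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
```

`goto(I, "+")` contains E -> E + . T plus the four items that closure adds for T and F. For a symbol that follows no dot in I, such as `"*"`, the result is the empty set.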

Page 14: LR Parsing Table Construction

Fig. 4.35. Canonical LR(0) collection for grammar (4.19)

I0:  E' -> . E
     E -> . E + T
     E -> . T
     T -> . T * F
     T -> . F
     F -> . ( E )
     F -> . id

I1:  E' -> E .
     E -> E . + T

I2:  E -> T .
     T -> T . * F

I3:  T -> F .

I4:  F -> ( . E )
     E -> . E + T
     E -> . T
     T -> . T * F
     T -> . F
     F -> . ( E )
     F -> . id

I5:  F -> id .

I6:  E -> E + . T
     T -> . T * F
     T -> . F
     F -> . ( E )
     F -> . id

I7:  T -> T * . F
     F -> . ( E )
     F -> . id

I8:  F -> ( E . )
     E -> E . + T

I9:  E -> E + T .
     T -> T . * F

I10: T -> T * F .

I11: F -> ( E ) .

Page 15: LR Parsing Table Construction

Fig. 4.36. Transition diagram of DFA D for viable prefixes.

I0: E to I1, T to I2, F to I3, ( to I4, id to I5
I1: + to I6
I2: * to I7
I4: E to I8, T to I2, F to I3, ( to I4, id to I5
I6: T to I9, F to I3, ( to I4, id to I5
I7: F to I10, ( to I4, id to I5
I8: ) to I11, + to I6
I9: * to I7

Page 16: LR Parsing Table Construction

Canonical LR(0) itemsets

The CANONICAL LR(0) ITEMSETS can be used to create the states in the SLR parse table.

We begin with an initial set C = { closure({ [S' -> .S] }) }.
Then, for each I in C and each grammar symbol X such that goto(I,X) is not empty and not in C already, do:
- Add goto(I,X) to C
Repeat until no more itemsets can be added to C.

Example: canonical LR(0) itemsets for the same grammar. Each set in C corresponds to a state in a DFA.
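The loop above is a fixed-point computation over itemsets. A minimal sketch, with the same assumed item triples and `GRAMMAR` dictionary as illustrative conventions (the state order it produces depends on traversal order, so the numbering will generally differ from the textbook figure):

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                new = (body[dot], prod, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(items, symbol):
    return closure({(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol})

def canonical_collection(start_item):
    """All itemsets reachable from closure({start_item}) via goto."""
    start = closure({start_item})
    states, work = [start], [start]
    while work:
        state = work.pop()
        # Only symbols that appear right after a dot can give a non-empty goto.
        symbols = {b[d] for (_, b, d) in state if d < len(b)}
        for x in sorted(symbols):
            nxt = goto(state, x)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states
```

For the expression grammar, `canonical_collection(("E'", ("E",), 0))` yields 12 itemsets, matching I0 through I11.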

Page 17: LR Parsing Table Construction

How to build the SLR parse table

1. Take the augmented grammar G'
2. Construct the canonical LR(0) itemsets C for G'
3. Associate a state with each itemset Ii in C
4. Construct the parse table as follows:
   1. If A -> α . a β is in Ii and goto(Ii,a) = Ij, then set action[i,a] to “shift j” (“a” here is a terminal)
   2. If A -> α . is in Ii (and A is not S'), then set action[i,a] to “reduce A -> α” for all a in FOLLOW(A)
   3. If S' -> S . is in Ii then set action[i,$] to “accept”
   4. If goto(Ii,A) = Ij for a nonterminal A, then set goto[i,A] to j

If any of the actions in the table conflict, then G is NOT SLR.

Page 18: LR Parsing Table Construction

Example SLR table construction

For the first LR(0) itemset in our favorite grammar:

I0: E' -> .E
    E -> .E + T
    E -> .T
    T -> .T * F
    T -> .F
    F -> .(E)      This gives us action[0,(] = shift 4
    F -> .id       This gives us action[0,id] = shift 5
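The table-filling rules can be sketched end to end as follows. This is an illustrative sketch, not the lecture's code: the FOLLOW sets are written out by hand for the expression grammar, the names (`PRODS`, `slr_table`, and so on) are invented here, and states are numbered in discovery order, so the state numbers will not match the 4 and 5 on the slide.

```python
PRODS = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMS = {head for head, _ in PRODS}
# FOLLOW sets for the expression grammar, written out by hand (assumed input).
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMS:
            for head, prod in PRODS:
                new = (head, prod, 0)
                if head == body[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(items, symbol):
    return closure({(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol})

def collection():
    states = [closure({PRODS[0] + (0,)})]
    for state in states:                      # the list grows while we iterate over it
        for x in sorted({b[d] for (_, b, d) in state if d < len(b)}):
            nxt = goto(state, x)
            if nxt not in states:
                states.append(nxt)
    return states

def slr_table(states):
    """Return (action, goto_table, conflicts) following the rules on the slide."""
    action, goto_tab, conflicts = {}, {}, []

    def put(i, a, entry):
        # Record a conflict if the cell already holds a different entry.
        if action.setdefault((i, a), entry) != entry:
            conflicts.append((i, a, action[(i, a)], entry))

    for i, state in enumerate(states):
        for head, body, dot in state:
            if dot < len(body):
                j = states.index(goto(state, body[dot]))
                if body[dot] in NONTERMS:
                    goto_tab[(i, body[dot])] = j
                else:
                    put(i, body[dot], ("shift", j))
            elif head == "E'":
                put(i, "$", ("accept",))
            else:
                for a in FOLLOW[head]:
                    put(i, a, ("reduce", PRODS.index((head, body))))
    return action, goto_tab, conflicts
```

Running `slr_table(collection())` on this grammar returns an empty conflict list, which is what it means for the grammar to be SLR.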

Page 19: LR Parsing Table Construction

Using Ambiguous Grammars

Page 20: LR Parsing Table Construction

What to do with ambiguity?

Sometimes it is convenient to leave ambiguity in G. For instance, G1 is simpler than G2:

G1:  E -> E + E | E * E | ( E ) | id

G2:  E -> E + T | T
     T -> T * F | F
     F -> ( E ) | id

But SLR(1), LR(1), and LALR(1) parsers will all have shift/reduce conflicts for G1.

Page 22: LR Parsing Table Construction

LR(0) itemsets for G1

Page 23: LR Parsing Table Construction

Ambiguity leads to conflicts

G1 is ambiguous, so we are guaranteed to get conflicts.

For example, in I7:
- We will add rules to “shift 4” on ‘+’ and “shift 5” on ‘*’.
- For the item E -> E+E. we will add the rule “reduce E->E+E” to the parse table for each terminal in FOLLOW(E).

But! FOLLOW(E) contains + and *, so both entries get a shift/reduce conflict.

LR(1) and LALR(1) tables will have the same problems.

Page 24: LR Parsing Table Construction

Resolving the conflicts

Knowing about operator precedence and associativity, we can resolve the conflicts.

Example: for input “id + id * id”, we will be in state 7 after processing “id + id”:

STACK            INPUT
0 E 1 + 4 E 7    * id $

Since * has higher precedence than +, we should really shift, not reduce.

With a + next in the input, we should reduce, to enforce left-associativity.

See Fig. 4.47 in text for a complete SLR(1) table.
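The shift-or-reduce choice described above can be written as a small decision function. A minimal sketch, assuming two hypothetical tables `PREC` and `ASSOC` that record each operator's precedence level and associativity (the names and the numeric levels are illustrative):

```python
PREC = {"+": 1, "*": 2}                 # larger number binds tighter
ASSOC = {"+": "left", "*": "left"}

def resolve(stack_op, lookahead):
    """Decide between reducing E -> E stack_op E and shifting `lookahead`."""
    if PREC[lookahead] > PREC[stack_op]:
        return "shift"                  # the incoming operator binds tighter
    if PREC[lookahead] < PREC[stack_op]:
        return "reduce"                 # the operator on the stack binds tighter
    # Same precedence: associativity decides.
    return "reduce" if ASSOC[stack_op] == "left" else "shift"
```

With these tables, `resolve("+", "*")` picks shift and `resolve("+", "+")` picks reduce, which are the two cases discussed for state 7.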

Page 25: LR Parsing Table Construction

If-else ambiguity

The ambiguity of the “dangling else” creates a shift/reduce conflict in parsers for most languages.

Since the else is normally associated with the nearest if, we resolve the conflict by shifting, instead of reducing, when we see “else” in the input.

See the LR(0) states and parse table on page 251. This method is much simpler than writing an unambiguous grammar.

Page 26: LR Parsing Table Construction

Non-SLR grammars

Consider the assignment grammar

1. S' -> S
2. S -> L = R
3. S -> R
4. L -> * R
5. L -> id
6. R -> L

generating, e.g. S =*> id = * id

Page 27: LR Parsing Table Construction

Non-SLR grammars

Construct the initial canonical LR(0) itemset I0.

Compute I2 = goto(I0,L) and I6 = goto(I2,=).

Compute FOLLOW(R).

Compute parse table entries for I2: shift/reduce conflict!

This means in state I2, with ‘=’ in the input, we do not know whether to shift and go to state I6 or reduce with R -> L, since ‘=’ is in FOLLOW(R).

To correct this, we need to know more about the context of the L we just parsed.

“Canonical LR(1)” and “LALR(1)” are powerful enough.
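The conflict hinges on ‘=’ being in the FOLLOW set that governs the reduction R -> L, which is FOLLOW(R). The sketch below computes FIRST and FOLLOW for the assignment grammar to check this. It assumes the grammar has no ε-productions (true here), which keeps both computations short; the function names are illustrative.

```python
PRODS = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
         ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NONTERMS = {head for head, _ in PRODS}

def first_sets():
    """FIRST for each nonterminal; valid only without empty productions."""
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for head, body in PRODS:
            new = first[body[0]] if body[0] in NONTERMS else {body[0]}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

def follow_sets():
    """FOLLOW for each nonterminal, iterated to a fixed point."""
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in PRODS:
            for i, sym in enumerate(body):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(body):          # something follows sym in this body
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                          # sym ends the body: inherit FOLLOW(head)
                    new = follow[head]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow
```

Here `follow_sets()["R"]` contains both `=` and `$`: the `=` arrives through L -> * R, because whatever can follow L can also follow the R at the end of that production.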

Page 28: LR Parsing Table Construction

Canonical LR Parse Table Construction

Page 29: LR Parsing Table Construction

I0:  S' -> . S
     S -> . L = R
     S -> . R
     L -> . * R
     L -> . id
     R -> . L

I1:  S' -> S .

I2:  S -> L . = R
     R -> L .

I3:  S -> R .

I4:  L -> * . R
     R -> . L
     L -> . * R
     L -> . id

I5:  L -> id .

I6:  S -> L = . R
     R -> . L
     L -> . * R
     L -> . id

I7:  L -> * R .

I8:  R -> L .

I9:  S -> L = R .

Fig. 4.37. Canonical LR(0) collection for grammar (4.20).

Page 30: LR Parsing Table Construction

More states means more memory

In SLR, we said in state i we should reduce by A -> α if the itemset contains the item [A -> α .] and a is in FOLLOW(A).

However, sometimes when state i is on top of the stack, and a is next in the input, what comes BEFORE α on the stack might invalidate the reduction A -> α.

Example from previous grammar: sentential form “R = …” is impossible, but “* R = …” is possible.

So with “=” next in the input, reducing by R -> L is only valid when the L follows a “*” on the stack. In state 2, where L is the entire stack content, the parser should shift “=” instead.

Page 31: LR Parsing Table Construction

LR(1) idea

Our parser needs to keep track of more state information. How can it?

Idea: use canonical LR(0) states, but split states as needed by adding a terminal symbol to each item.

LR(1) ITEMS take the form [A -> α.β, a], where A -> αβ is a production in G and a is a terminal symbol or $.

The “1” refers to the length of a, the LOOKAHEAD for each item. If length = k, we would have an LR(k) item.

In parsing, we will now only reduce by A -> α when the state contains an item [A -> α., a] whose lookahead a agrees with the next input symbol.

Page 32: LR Parsing Table Construction

LR(1) parse table construction

We need to redefine closure(I) for a set of LR(1) items:

for each item [A -> α.B β, a] in I,
    each production B -> γ in G', and
    each terminal b in FIRST(β a)
    such that [B -> .γ, b] is not already in I, do:
        add [B -> .γ, b] to I
repeat until no more items can be added to I

goto(I,X) is the same as for SLR(1), except that each item keeps its lookahead when the dot moves and the LR(1) closure is applied to the result.

Page 33: LR Parsing Table Construction

Example LR(1) parser construction

Begin with augmented grammar G':

S' -> S
S -> C C      [ what is L(G')?? ]
C -> c C | d

The first itemset I0 = closure({ [S' -> .S, $] }) = {
    S' -> .S, $
    S -> .CC, $       [ from S' -> .S, $ and S -> CC: B=S, α=ε, β=ε ]
    C -> .cC, c/d     [ from S -> .CC, $ and C -> cC: B=C, α=ε, β=C ]
    C -> .d, c/d      [ from S -> .CC, $ and C -> d: B=C, α=ε, β=C ]
}
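The LR(1) closure differs from the LR(0) one only in how lookaheads are generated. A minimal sketch, assuming items are `(head, body, dot, lookahead)` tuples and that the grammar has no ε-productions, so that FIRST(β a) is FIRST of the first symbol of β, or {a} when β is empty (the names `first` and `closure1` are illustrative):

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first(symbol, seen=()):
    """FIRST of a single symbol; assumes no empty productions."""
    if symbol not in GRAMMAR:
        return {symbol}                       # a terminal is its own FIRST set
    out = set()
    if symbol not in seen:                    # guard against left recursion
        for body in GRAMMAR[symbol]:
            out |= first(body[0], seen + (symbol,))
    return out

def closure1(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        _, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            rest = body[dot + 1:]             # this is β in [A -> α.B β, a]
            lookaheads = first(rest[0]) if rest else {la}
            for prod in GRAMMAR[body[dot]]:
                for b in lookaheads:
                    new = (body[dot], prod, 0, b)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)
```

`closure1({("S'", ("S",), 0, "$")})` has six tuples, because an entry written with lookahead c/d on the slide is stored here as two separate items.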

Page 34: LR Parsing Table Construction

Fig. 4.39. The goto graph for grammar (4.21).

I0:  S' -> .S, $
     S -> .CC, $
     C -> .cC, c/d
     C -> .d, c/d

I1:  S' -> S., $

I2:  S -> C.C, $
     C -> .cC, $
     C -> .d, $

I3:  C -> c.C, c/d
     C -> .cC, c/d
     C -> .d, c/d

I4:  C -> d., c/d

I5:  S -> CC., $

I6:  C -> c.C, $
     C -> .cC, $
     C -> .d, $

I7:  C -> d., $

I8:  C -> cC., c/d

I9:  C -> cC., $

Transitions:

I0: S to I1, C to I2, c to I3, d to I4
I2: C to I5, c to I6, d to I7
I3: C to I8, c to I3, d to I4
I6: C to I9, c to I6, d to I7

Page 35: LR Parsing Table Construction

LR(1) parsers: the good news

LR(1) is quite similar to SLR(1), with one main difference:
- We only add reduce rules to the parse table when the input matches the LOOKAHEAD for the item.
- SLR(1) adds reduce rules for any terminal in the FOLLOW set.

This means LR(1) will have fewer shift/reduce and reduce/reduce conflicts, because it tries to reduce in fewer situations.

Page 36: LR Parsing Table Construction

LR(1) parsers: the bad news

LR(1) parsers are powerful, able to parse almost any unambiguous CFG used for real programming languages.

But there is a price: the number of states is huge.
- For the very simple c*dc*d language with 4 productions, we already needed 10 LR(1) states.
- For a typical PL like Pascal, the LR(1) table would contain a few THOUSAND states!

Is there a technique as powerful with fewer states?

Page 37: LR Parsing Table Construction

STATE    action            goto
         c     d     $     S    C
0        s3    s4          1    2
1                    acc
2        s6    s7               5
3        s3    s4               8
4        r3    r3
5                    r1
6        s6    s7               9
7                    r3
8        r2    r2
9                    r2

(Productions: 1. S -> CC, 2. C -> cC, 3. C -> d)

Fig. 4.40. Canonical parsing table for grammar (4.21).

Page 38: LR Parsing Table Construction

LALR Parse Table Construction

Page 39: LR Parsing Table Construction

LALR parse tables

LALR makes smaller parse tables than canonical LR, but still covers most common programming language constructs.

LALR has the same number of states as the SLR parser for the same grammar, but is more picky about when to reduce, so fewer conflicts come up.

yacc actually constructs an LALR(1) table, not a canonical LR(1) table.

Page 40: LR Parsing Table Construction

LALR idea

Usually, in an LR(1) parser, there will be many states that are identical except for their lookahead symbols.

LALR takes these states and MERGES them, forming the UNION of the lookahead symbols for the merged items.

Algorithm: build the LR(1) itemsets, then merge itemsets with the same CORES (the items with the lookaheads stripped off).

Page 41: LR Parsing Table Construction

LALR example

I0: S' -> .S, $
    S -> .CC, $
    C -> .cC, c/d
    C -> .d, c/d
I1: S' -> S., $
I2: S -> C.C, $
    C -> .cC, $
    C -> .d, $
I3: C -> c.C, c/d
    C -> .cC, c/d
    C -> .d, c/d
I4: C -> d., c/d
I5: S -> CC., $
I6: C -> c.C, $
    C -> .cC, $
    C -> .d, $
I7: C -> d., $
I8: C -> cC., c/d
I9: C -> cC., $

Which LR(1) itemsets can be merged?
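One way to check the answer is to build the LR(1) collection and group the itemsets by core. The sketch below does this for the c*dc*d grammar. It is an illustrative, brute-force version of the "build LR(1), then merge" algorithm (not the efficient construction), it assumes no ε-productions, and all function names are invented here.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first(symbol, seen=()):
    """FIRST of a single symbol; assumes no empty productions."""
    if symbol not in GRAMMAR:
        return {symbol}
    out = set()
    if symbol not in seen:
        for body in GRAMMAR[symbol]:
            out |= first(body[0], seen + (symbol,))
    return out

def closure1(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            rest = body[dot + 1:]
            lookaheads = first(rest[0]) if rest else {la}
            for prod in GRAMMAR[body[dot]]:
                for b in lookaheads:
                    new = (body[dot], prod, 0, b)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)

def goto1(items, symbol):
    return closure1({(h, b, d + 1, la) for (h, b, d, la) in items
                     if d < len(b) and b[d] == symbol})

def lr1_collection():
    states = [closure1({("S'", ("S",), 0, "$")})]
    for state in states:                      # the list grows while we iterate over it
        for x in sorted({b[d] for (_, b, d, _) in state if d < len(b)}):
            nxt = goto1(state, x)
            if nxt not in states:
                states.append(nxt)
    return states

def core(state):
    """The LR(0) part of an LR(1) itemset: items with lookaheads stripped."""
    return frozenset((h, b, d) for (h, b, d, _) in state)

def merge_by_core(states):
    """Map each core to the union of all LR(1) items that share it."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return merged
```

For this grammar the 10 LR(1) itemsets collapse into 7 merged states. Three pairs share a core, and each merged pair carries the union c/d/$ of the lookaheads.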

Page 42: LR Parsing Table Construction

STATE    action             goto
         c      d      $    S    C
0        s36    s47         1    2
1                      acc
2        s36    s47              5
36       s36    s47              89
47       r3     r3     r3
5                      r1
89       r2     r2     r2

Fig. 4.41. LALR parsing table for grammar (4.21).

Page 43: LR Parsing Table Construction

Efficient Construction of LALR Parsing Tables

Example 4.46. Let us again consider the augmented grammar

S' -> S
S -> L = R | R
L -> * R | id
R -> L

The kernels of the sets of LR(0) items for this grammar are shown in Fig. 4.42.

I0:  S' -> . S

I1:  S' -> S .

I2:  S -> L . = R
     R -> L .

I3:  S -> R .

I4:  L -> * . R

I5:  L -> id .

I6:  S -> L = . R

I7:  L -> * R .

I8:  R -> L .

I9:  S -> L = R .

Fig. 4.42. Kernels of the sets of LR(0) items for grammar (4.20).

Page 44: LR Parsing Table Construction

Efficient Construction of LALR Parsing Tables

Example 4.47. Let us construct the kernels of the LALR(1) items for the grammar in the previous example. The kernels of the LR(0) items were shown in Fig. 4.42. When we apply Algorithm 4.12 to the kernel of set of items I0, we compute closure({[S' -> . S, #]}), which is

S' -> . S, #
S -> . L = R, #
S -> . R, #
L -> . * R, #/=
L -> . id, #/=
R -> . L, #

Page 45: LR Parsing Table Construction

Fig. 4.44. Propagation of lookaheads.

FROM                    TO
I0: S' -> . S           I1: S' -> S .
                        I2: S -> L . = R
                        I2: R -> L .
                        I3: S -> R .
                        I4: L -> * . R
                        I5: L -> id .
I2: S -> L . = R        I6: S -> L = . R
I4: L -> * . R          I4: L -> * . R
                        I5: L -> id .
                        I7: L -> * R .
                        I8: R -> L .
I6: S -> L = . R        I4: L -> * . R
                        I5: L -> id .
                        I8: R -> L .
                        I9: S -> L = R .

Page 46: LR Parsing Table Construction

Fig. 4.45. Computation of lookaheads.

SET  ITEM                LOOKAHEADS
                         INIT   PASS1   PASS2   PASS3
I0:  S' -> . S           $      $       $       $
I1:  S' -> S .                  $       $       $
I2:  S -> L . = R               $       $       $
I2:  R -> L .                   $       $       $
I3:  S -> R .                   $       $       $
I4:  L -> * . R          =      =/$     =/$     =/$
I5:  L -> id .           =      =/$     =/$     =/$
I6:  S -> L = . R                       $       $
I7:  L -> * R .                 =       =/$     =/$
I8:  R -> L .                   =       =/$     =/$
I9:  S -> L = R .                               $
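The passes in Fig. 4.45 amount to pushing lookaheads along the FROM/TO edges of Fig. 4.44 until nothing changes. A minimal sketch of that fixed-point step, with the edges and the initial (spontaneously generated) lookaheads typed in from the two figures; the string keys and the names `PROPAGATES`, `INIT` and `propagate` are illustrative, and the sketch computes only the final column, not the individual passes:

```python
# Propagation edges from Fig. 4.44, keyed by "set: kernel item".
PROPAGATES = {
    "I0: S' -> . S": ["I1: S' -> S .", "I2: S -> L . = R", "I2: R -> L .",
                      "I3: S -> R .", "I4: L -> * . R", "I5: L -> id ."],
    "I2: S -> L . = R": ["I6: S -> L = . R"],
    "I4: L -> * . R": ["I4: L -> * . R", "I5: L -> id .",
                       "I7: L -> * R .", "I8: R -> L ."],
    "I6: S -> L = . R": ["I4: L -> * . R", "I5: L -> id .",
                         "I8: R -> L .", "I9: S -> L = R ."],
}

# INIT column of Fig. 4.45.
INIT = {"I0: S' -> . S": {"$"}, "I4: L -> * . R": {"="}, "I5: L -> id .": {"="}}

def propagate(init, edges):
    """Push lookaheads along the edges until no set grows any further."""
    table = {item: set(las) for item, las in init.items()}
    changed = True
    while changed:
        changed = False
        for src, targets in edges.items():
            for dst in targets:
                before = len(table.setdefault(dst, set()))
                table[dst] |= table.get(src, set())
                if len(table[dst]) != before:
                    changed = True
    return table
```

The result of `propagate(INIT, PROPAGATES)` agrees with the PASS3 column: for instance, I8: R -> L . ends with lookaheads = and $, while I2: R -> L . keeps only $, which is why the LALR table has no conflict in state 2.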

Page 47: LR Parsing Table Construction

Next time

- Yacc: the TA will explain how to use it
- Semantic processing (how to implement what we learned with Yacc)