1 LR Parsing Techniques Bottom-Up Parsing - LR: a special form of BU Parser LR Parsing as Handle...


1

LR Parsing Techniques

• Bottom-Up Parsing
– LR: a special form of BU parser
• LR Parsing as Handle Pruning
• Shift-Reduce Parser (LR implementation)
• LR(k) Parsing Model
– k lookaheads to determine the next action
• Parsing Table Construction: SLR, LR, LALR

2

Bottom-Up Parsing

• A bottom-up parser attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

3

Bottom-Up Parsing: Ex1

BU Parsing: construct a parse tree from the leaves to the root: left-to-right reduction

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

[Figure: successive snapshots of the parse tree for abbcde, built from the leaves up by reducing from left to right]

4

Bottom-Up Parsing: Ex2

BU Parsing: construct a parse tree from the leaves to the root: random reduction

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

[Figure: successive snapshots of the parse tree for abbcde, built from the leaves up by reducing in an arbitrary order]

5

LR Parsing: BU + Left-to-Right

• Many ways to construct a parse tree bottom-up
– Ex1 & Ex2
– Prefer a simpler form of parser: left-to-right scanning
• Scanning strictly left to right yields a rightmost derivation in reverse (thus the name LR parser). Why rightmost? (See Ex1.)
– Never consider the terminals to the right while reducing the left part, a string in (Σ|N)*
– Reduce the left (Σ|N)* (terminals or non-terminals) as much as possible, until no further reduction applies
– Shift when no further reduction applies
– Reversing the sequence of reductions corresponds to a rightmost derivation
• LR Parser
– A special form of BU parser
– A parser with a simpler form: left-to-right scan

6

LR Parsing: BU + Left-to-Right

LR Parsing: construct a parse tree from the leaves to the root, scanning left to right (resulting in a rightmost derivation in reverse)

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S

[Figure: successive snapshots of the parse tree for abbcde, one per reduction step above]

8

Rightmost Derivation in Reverse

[Figure: two parse trees for id1 + id2 * id3, with the interior E nodes numbered 1 to 5 in the order in which they are reduced]

9

LR Parsing

The L stands for scanning the input from left to right

The R stands for constructing a rightmost derivation in reverse

10

LR Parsing

• LR Parsing =/= Leftmost Reduction
– The first reducible substring does not always result in a successful parse
– Handle(s): the reducible substrings that successfully lead to S
• Top-Down: expansion, matching
• Bottom-Up: shift/reduce
– Locating the next “handle” to reduce [How to??]
– Handle pruning: hide the details below the reduced (Σ|N)*

[Figure: two partial parse trees over the input abbcde]

11

Handles

• NOT every (leftmost) reduction by A → β leads to the start symbol S:
  αβw ⇐rm αAw ⇐rm … ⇐rm S
  Only handles do.
• A handle of a right-sentential form γ consists of
– a production A → β
– a position of γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ

Right-sent. forms: abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
Handles:           A → b,  A → A b c,  B → d,  S → a A B e

12

Handles

• If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw. (The string w contains only terminal symbols.)

• We say “a handle” rather than “the handle” since the grammar may be ambiguous. But if the grammar is unambiguous, then every right-sentential form has exactly one handle.

15

LR Parsing as Handle Pruning

αβw ⇐rm αAw ⇐*rm S

[Figure: parse tree rooted at S with an interior node A whose children form the handle]

• The string to the right of the handle contains only terminals (A is the rightmost non-terminal)
• A is the leftmost complete interior node with all its children in the tree
• Pruning: find a string that is reducible to S, hide its details by reduction, and proceed with the new sentential form
• Never consider the terminals to the right while reducing the grammar symbols on the left

16

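An Example

[Figure: handle pruning on the parse tree of abbcde; each snapshot hides the subtree below the most recently reduced handle, leaving the frontiers abbcde, aAbcde, aAde and aABe under the root S]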

17

LR Parsing as Handle Pruning (1st reduction sequence)

• A rightmost derivation in reverse can be obtained by handle pruning.

• Let G = E → E+E | E*E | (E) | id (ambiguous!)

Right-sentential form | Handle | Reducing production
id1+id2*id3           | id1    | E→id
E+id2*id3             | id2    | E→id
E+E*id3               | id3    | E→id
E+E*E                 | E*E    | E→E*E
E+E                   | E+E    | E→E+E
E                     |        |

18

LR Parsing as Handle Pruning (2nd reduction sequence)

• A rightmost derivation in reverse can be obtained by handle pruning.

• Let G = E → E+E | E*E | (E) | id (ambiguous!)

Right-sentential form | Handle | Reducing production
id1+id2*id3           | id1    | E→id
E+id2*id3             | id2    | E→id
E+E*id3               | E+E    | E→E+E
E*id3                 | id3    | E→id
E*E                   | E*E    | E→E*E
E                     |        |

20

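Shift-Reduce Parsing

[Figure: shift-reduce parser model: a parsing program, driven by a parsing table, reads the Input, maintains a Stack and produces Output. Symbols are shifted until a handle β is on top of the stack, which is then reduced by A → β: αβw ⇐rm αAw ⇐*rm S]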

21

Stack Implementation of Shift-Reduce Parsers

• A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string to be parsed.

• This is a push-down machine with a tape.

• The parser operates by shifting zero or more symbols onto the stack until a handle β is on top of the stack. The parser then reduces β to the left side of the appropriate production.

• This procedure repeats until the stack contains the start symbol and the input is empty.

22

Stack Operations

Shift: shift the next input symbol onto the top of the stack

Reduce: replace the handle at the top of the stack with the corresponding nonterminal

Accept: announce successful completion of the parsing

Error: call an error recovery routine

24

An Example

Action | Stack       | Input
S      | $           | a b b c d e $
S      | $ a         | b b c d e $
R      | $ a b       | b c d e $
S      | $ a A       | b c d e $
S      | $ a A b     | c d e $
R      | $ a A b c   | d e $
S      | $ a A       | d e $
R      | $ a A d     | e $
S      | $ a A B     | e $
R      | $ a A B e   | $
A      | $ S         | $

(S = shift, R = reduce, A = accept)

25

Configurations of shift-reduce parser on input id1+id2*id3

Step | Stack      | Input        | Action
1    | $          | id1+id2*id3$ | shift
2    | $id1       | +id2*id3$    | reduce by E → id
3    | $E         | +id2*id3$    | shift
4    | $E+        | id2*id3$     | shift
5    | $E+id2     | *id3$        | reduce by E → id
6    | $E+E       | *id3$        | shift
7    | $E+E*      | id3$         | shift
8    | $E+E*id3   | $            | reduce by E → id
9    | $E+E*E     | $            | reduce by E → E*E
10   | $E+E       | $            | reduce by E → E+E
11   | $E         | $            | accept

*Note: The grammar is ambiguous. Therefore, there is another possible reduction sequence.

26

LR Parsing States

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

[Figure: parse tree of abbcde$ with the parsing states numbered 0 to 10 marked on it]

How to represent parsing states so we can tell the right parsing actions to take?


28

LR Parsing States

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

[Figure, repeated on each of the following slides: parse tree of abbcde$ with the parsing states numbered 0 to 10 marked on it]

S0: S’ → . S
    S → . a A B e

29

LR Parsing States

S1: S → a . A B e (shift a)
    A → . A b c
    A → . b

30

LR Parsing States

S2: A → b . (shift b, then reduce A → b)

31

LR Parsing States

S3: S → a A . B e
    B → . d
    A → A . b c

32

LR Parsing States

S4: A → A b . c (shift b)

33

LR Parsing States

S5: A → A b c . (shift c, reduce A → A b c)

34

LR Parsing States

S6: S → a A . B e
    B → . d

35

LR Parsing States

S7: B → d . (shift d, reduce B → d)

36

LR Parsing States

S8: S → a A B . e

37

LR Parsing States

S9: S → a A B e . (shift e, reduce S → a A B e)

38

LR Parsing States

S10: S’ → S . (S reduced)

39

LR Parsing States

G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde

S0: S’ → . S ; S → . a A B e
S1: S → a . A B e (shift a) ; A → . A b c | . b (expanding A again adds the same items: closed, no further expansion)
S2: A → b . (shift b, reduce A → b)
S3: S → a A . B e ; B → . d ; A → A . b c
S4: A → A b . c (shift b)
S5: A → A b c . (shift c, reduce A → A b c)
S6: S → a A . B e ; B → . d
S7: B → d . (shift d, reduce B → d)
S8: S → a A B . e
S9: S → a A B e . (shift e, reduce S → a A B e)
S10: S’ → S . (S reduced)

[Figure: successive snapshots of the parse tree for abbcde as the parser moves through these states]

40

Ambiguity: Sources of Conflicts

• When trying to reduce a substring of the current sentential form:
– Not all reducible substrings are handles
– Ambiguous: more than one substring as a handle
• Sources of conflicts
– Non-LR grammar
– Shift-reduce conflicts
– Reduce-reduce conflicts

41

Shift/Reduce Conflict

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Stack: $ … if expr then stmt
Input: else stmt … $

• Shift: continue towards “if expr then stmt else stmt”
• Reduce: reduce the handle “if expr then stmt”

42

Reduce/Reduce Conflict

(1) stmt → id ( para_list )          // func(a,b)
(2) stmt → expr := expr
(3) para_list → para_list , para
(4) para_list → para
(5) para → id
(6) expr → id ( expr_list )          // array(a,b)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

(a) Stack: $ … id ( id          Input: , id ) … $   [Q: r5? r7?]
(b) Stack: $ … procid ( id      Input: , id ) … $   [r5]

[Sol: use “stmt → procid ( para_list )” => (a) r7, (b) r5]

– Needs a complex lexical analyzer to distinguish id from procid
– The reduction depends on stack[sp-2]

43

LR(k) Grammars

• Only some classes of grammars, known as the “LR(k) grammars,” can be parsed deterministically by a shift-reduce parser
– CFGs that are non-LR may need some adaptation before a shift-reduce parser can parse them deterministically
• Parsing Table Construction
– Predict handles at each position (after shifts)

44

LR(k) Parsing

The L stands for scanning the input from left to right

The R stands for constructing a rightmost derivation in reverse

The k stands for the number of lookahead input symbols used to make parsing decisions

45

LR Parsing

The LR parsing algorithm

Constructing SLR(1) parsing tables

Constructing LR(1) parsing tables

Constructing LALR(1) parsing tables

46

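Model of an LR Parser

[Figure: model of an LR parser: the LR parsing program reads the Input, consults the parsing table (Action and Goto parts) and produces Output. The stack holds S0 … Xm-1 Sm-1 Xm Sm, with the initial state S0 at the bottom and Sm on top. Annotations mark the state before an action, the state after a shift/reduce action, and the state after reduction of a handle]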

47

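Parsing Table for Expression Grammar

(0) E’ → E
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id

Follow(E)={+,),$}
Follow(T)={+,),$,*}
Follow(F)={+,),$,*}

      |            Action             |   Goto
State | id | +  | *  | (  | )   | $   | E | T | F
0     | s5 |    |    | s4 |     |     | 1 | 2 | 3
1     |    | s6 |    |    |     | acc |   |   |
2     |    | r2 | s7 |    | r2  | r2  |   |   |
3     |    | r4 | r4 |    | r4  | r4  |   |   |
4     | s5 |    |    | s4 |     |     | 8 | 2 | 3
5     |    | r6 | r6 |    | r6  | r6  |   |   |
6     | s5 |    |    | s4 |     |     |   | 9 | 3
7     | s5 |    |    | s4 |     |     |   |   | 10
8     |    | s6 |    |    | s11 |     |   |   |
9     |    | r1 | s7 |    | r1  | r1  |   |   |
10    |    | r3 | r3 |    | r3  | r3  |   |   |
11    |    | r5 | r5 |    | r5  | r5  |   |   |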

48

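GOTO Actions

I0: E’ → . E ; E → . E + T ; E → . T ; T → . T * F ; T → . F ; F → . ( E ) ; F → . id
I1: E’ → E . ; E → E . + T
I2: E → T . ; T → T . * F
I3: T → F .
I4: F → ( . E ) ; E → . E + T ; E → . T ; T → . T * F ; T → . F ; F → . ( E ) ; F → . id
I5: F → id .

[Figure: transitions out of I0 on id, (, E, T and F, with the stack contents 0 id 5, 0 F 3, 0 T 2 and 0 E 1 annotated “before reduction” and “after reduction”]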

49

LR Parsing Algorithm

• Input:
– An input string w and an LR parsing table with functions action and goto for a grammar G.

• Output:
– If w is in L(G), a bottom-up parse for w; otherwise, an error indication.

• Method:
– Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
– Shift/reduce according to the parsing table (see next page)

50

LR Parsing Program

a := get input token;
while (1) do {
    s := the state on top of the stack;
    if (action[s,a] == shift s’) {
        push a then s’ on top of the stack;
        a := get input token;
    } else if (action[s,a] == reduce A → β) {
        pop 2*|β| symbols off the stack;
        s’ := the state now on top of the stack;
        push A then goto[s’,A] on top of the stack;
        output the production A → β;
    } else if (action[s,a] == accept) return;
    else error();
}

51

LR Parsing on id1*id2+id3

Step | Stack           | Input          | shift/reduce + goto | Action
(1)  | 0               | id * id + id $ | (0,id):s5           | Shift
(2)  | 0 id 5          | * id + id $    | (5,*):r6; (0,F):3   | Reduce by F → id
(3)  | 0 F 3           | * id + id $    | (3,*):r4; (0,T):2   | Reduce by T → F
(4)  | 0 T 2           | * id + id $    | (2,*):s7            | Shift
(5)  | 0 T 2 * 7       | id + id $      | (7,id):s5           | Shift
(6)  | 0 T 2 * 7 id 5  | + id $         | (5,+):r6; (7,F):10  | Reduce by F → id
(7)  | 0 T 2 * 7 F 10  | + id $         | (10,+):r3; (0,T):2  | Reduce by T → T*F
(8)  | 0 T 2           | + id $         | (2,+):r2; (0,E):1   | Reduce by E → T
(9)  | 0 E 1           | + id $         | (1,+):s6            | Shift
(10) | 0 E 1 + 6       | id $           | (6,id):s5           | Shift
(11) | 0 E 1 + 6 id 5  | $              | (5,$):r6; (6,F):3   | Reduce by F → id
(12) | 0 E 1 + 6 F 3   | $              | (3,$):r4; (6,T):9   | Reduce by T → F
(13) | 0 E 1 + 6 T 9   | $              | (9,$):r1; (0,E):1   | Reduce by E → E+T
(14) | 0 E 1           | $              | (1,$):acc           | Accept
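
The table-driven loop can be tried out directly. The following is a minimal Python sketch, not the deck's own code: it hard-codes the action/goto entries of the expression-grammar parsing table and replays the parse of id * id + id. The names TERMS, ACTION_ROWS, PRODS and lr_parse are illustrative assumptions. Unlike the pseudocode, the stack here keeps only states (the grammar symbols are implied by the states), so a reduction pops |β| entries instead of 2*|β|.

```python
# Terminal columns of the action table, in the order used on the slides.
TERMS = ["id", "+", "*", "(", ")", "$"]

# One row per state 0..11; "-" marks an error entry.
ACTION_ROWS = [
    "s5 -  -  s4 -   -",
    "-  s6 -  -  -   acc",
    "-  r2 s7 -  r2  r2",
    "-  r4 r4 -  r4  r4",
    "s5 -  -  s4 -   -",
    "-  r6 r6 -  r6  r6",
    "s5 -  -  s4 -   -",
    "s5 -  -  s4 -   -",
    "-  s6 -  -  s11 -",
    "-  r1 s7 -  r1  r1",
    "-  r3 r3 -  r3  r3",
    "-  r5 r5 -  r5  r5",
]
ACTION = {
    (state, term): entry
    for state, row in enumerate(ACTION_ROWS)
    for term, entry in zip(TERMS, row.split())
    if entry != "-"
}
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}
# Production number -> (left-hand side, length of right-hand side).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def lr_parse(tokens):
    """Return the production numbers in the order they are reduced."""
    stack = [0]                      # states only; s0 = 0 is the initial state
    toks = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        entry = ACTION.get((stack[-1], toks[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {toks[pos]!r} at position {pos}")
        if entry == "acc":
            return output
        if entry[0] == "s":          # shift: push the new state, advance input
            stack.append(int(entry[1:]))
            pos += 1
        else:                        # reduce by production number n
            n = int(entry[1:])
            lhs, size = PRODS[n]
            del stack[len(stack) - size:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(n)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Read backwards, the printed production numbers give the rightmost derivation of the input, which is the sense in which the parser builds a rightmost derivation in reverse.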

52

LR Parsing Advantages

• Efficient: non-backtracking
– Efficient parsing
– Efficient error detection (& correction): a syntax error is detected as soon as one appears during the left-to-right scan
• Coverage: virtually all programming languages
– G(LR) ⊃ G(TD predictive parsing)
• Disadvantage: too much work to construct by hand (hence parser generators such as YACC)

53

How To: LR Parsing (repeated)

• LR Parsing =/= Leftmost Reduction
– The first reducible substring does not always result in a successful parse
– Handle(s): the reducible substrings that successfully lead to S
• Top-Down: expansion, matching
• Bottom-Up: shift/reduce
– Locating the next “handle” to reduce [How to??]
– Handle pruning: hide the details below the reduced (Σ|N)*

54

LR Parsing Table Construction Techniques

Parsing Table Construction:

• SLR(1) Parser
– LR(0) items & states
• LR(1) Parser
– shift/reduce conflict resolution
– LR(1) items & states
• LALR(1) Parser
– LR(1) state merge
– reduce-reduce conflict

56

SLR Parser

• Coverage: weakest in terms of the number of grammars it succeeds on
• Easiest to construct
• Parser: a DFA for recognizing viable prefixes
• States: sets of LR(0) items
– The items in a set can be viewed as the states of an NFA recognizing viable prefixes
– Grouping items into sets is equivalent to the subset construction


60

Viable Prefix

• The prefixes of c.s.f.’s (canonical/right-sentential forms) that can appear on the stack of a shift-reduce parser are called viable prefixes.

• Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.

• If α is a viable prefix, then ∃ w ∈ Σ* such that αw is a c.s.f.

61

Item and Valid Item

• An LR(0) item (item for short) is a marked production [A → β1•β2] (dotted rule: a production with a dot in its RHS)

• An item [A → β1•β2] is said to be valid for some viable prefix αβ1 iff ∃ w ∈ Σ* such that S ⇒*rm αAw ⇒rm αβ1β2w

• The “•” represents where we are now during parsing
– Left of dot: those scanned
– Right of dot: those to be visited later

[Figure: parse tree rooted at S with frontier α A w, where A expands to β1 β2]

62

Example of Valid Item

• Consider the grammar:
  S → 1C | D
  C → 3 | 4
  D → 1B
  B → 2

• Valid items for the viable prefix ε: [S → • 1C], [S → • D], and [D → • 1B]

[Figure: the possible partial parse trees: S with children 1 C, or S with child D whose children are 1 B]

63

Example of Valid Item (cont.)

• Assume the prefix 1 has been scanned, i.e., the parse could be S ⇒ 1C or S ⇒ D ⇒ 1B

• Valid items for the viable prefix “1”: [S → 1 • C], [C → • 3], [C → • 4], [D → 1 • B], and [B → • 2]

[Figure: the partial parse trees S ⇒ D ⇒ 1 B (with B ⇒ 2) or S ⇒ 1 C (with C ⇒ 3 or 4)]

64

Example of Valid Item (cont.)

• Assume S ⇒ 1C ⇒ 13

• Valid item for viable prefix “13”: [C → 3 • ]

• Valid item for viable prefix “1C”: [S → 1 C • ]

[Figure: parse tree S with children 1 C, where C has child 3]

65

Closure: All Valid Items Enumerable from G

• Given a grammar
  E’ → E
  E → E+T | T
  T → T*F | F
  F → (E) | id

• What are the valid items for the viable prefix “…E+”?
• [E → E+ • T], but also [T → •F], since E’ ⇒* E+T ⇒ E+F
• Likewise, [T → •T*F], [T → •F], [F → •(E)], [F → •id]
– called the Closure of [E → E+•T] (inclusive)

66

Computation of Closure

• Given a set, I, of items
• Initially Closure(I) = I
• Loop: for all items [A → α•Bβ] …
– If [A → α•Bβ] is in Closure(I) and B → γ is in P, then include [B → •γ] into Closure(I).
• Repeat the loop until no new dotted rules can be added

• Initial set of items for a grammar:
– I0 = Closure({[S’ → •S]})
– (S: start symbol, S’: augmented start symbol)

67

GOTO Computation

• Let I be a set of items which are valid for some viable prefix γ.

• Then goto(I,X), where X ∈ (N or Σ), is the set of items which are valid for the viable prefix γX.

• So [A → α•Xβ] in I implies Closure({[A → αX•β]}) ⊆ goto(I,X)

S ⇒*rm δAw ⇒rm δα • Xβ w, and after X: δαX • β w (γ = δα; I is the set of items valid for γ, including [A → α•Xβ] and others)

68

Sets of LR(0) Items Construction

• Augment the grammar with: S’ → S

• Let I0 = Closure({[S’ → •S]}), C = {I0}

while (not all elements of C are marked) {

- select an unmarked item set of C (say “I”) and mark it;

- for each X ∈ (N or Σ), if goto(I,X) is not empty and not already in C, then add goto(I,X) to C (unmarked);

}

• Also called the Characteristic Finite State Machine (CFSM) construction algorithm.
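
As a hedged illustration of the closure, goto and CFSM construction steps, the sketch below builds the canonical LR(0) collection for the expression grammar E’ → E, E → E+T | T, T → T*F | F, F → (E) | id. An item is represented as a tuple (lhs, rhs, dot position); that representation and the names GRAMMAR, SYMBOLS, closure, goto and canonical_collection are assumptions made for this example, and the state numbering it produces need not match I0 to I11 in the textbook figure.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "id", "+", "*", "(", ")"]


def closure(items):
    """Add [B -> . gamma] for every item with the dot in front of nonterminal B."""
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    """Worklist version of the marked/unmarked loop from the slide."""
    start = closure({("E'", ("E",), 0)})
    states = [start]
    unmarked = [start]
    while unmarked:
        state = unmarked.pop()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:   # skip empty goto sets
                states.append(target)
                unmarked.append(target)
    return states


states = canonical_collection()
print(len(states), "item sets;", len(states[0]), "items in the initial set")
```

For this grammar the sketch finds 12 item sets, with 7 items in the initial one, in line with the twelve states (0 to 11) of the parsing table for the expression grammar.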

69

SLR(1) Parsing Actions

• Compute the CFSM states C = {I0, I1, …, In}.
1. If [A → α•aβ] ∈ Ii and goto(Ii,a) = Ij, then set action(Ii,a) = shift, Ij (where ‘a’ is a terminal)
2. If [A → α•] ∈ Ii, then set action(Ii,a) = reduce A → α for all a in Follow(A)
– A terminal a in Follow(A) does not guarantee that reducing by A → α will result in a successful parse (α is not necessarily a “handle”).
– But a terminal NOT in Follow(A) definitely indicates an impossible parse.
– So reduction on symbols in Follow(A) is only a loose criterion for a possibly successful parse.
3. If [S’ → S•] ∈ Ii, then set action(Ii,$) = accept
4. Other action(*,*) = error

70

Conflicts

• Shift-reduce conflicts: both a shift action and a reduce action are possible in the same state (closed item set).
– E.g., state 2 in Figure 4.37 (p.229) [Aho 86]

• Reduce-reduce conflicts: two or more distinct reduce actions are possible in the same state (closed item set).

71

Example: Grammar G for Math Expressions

(0) E’ → E
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id

Follow(E)={+,),$}, Follow(T)={+,),$,*}, Follow(F)={+,),$,*}
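
The Follow sets listed for G can be reproduced with a small fixed-point computation. This is a sketch under one stated assumption: the grammar has no ε-productions, so FIRST of a symbol string is just FIRST of its first symbol. The names GRAMMAR, NONTERMS, first_sets and follow_sets are illustrative.

```python
GRAMMAR = [
    ("E'", ["E"]),
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal (valid only without epsilon-productions)."""
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each nonterminal, iterated until nothing changes."""
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow["E'"].add("$")            # end marker follows the augmented start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(rhs):             # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                            # sym ends the rule: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = follow_sets()
for name in ("E", "T", "F"):
    print(f"Follow({name}) =", sorted(follow[name]))
```

These sets are exactly the columns in which the SLR(1) construction places the reduce entries for productions with the corresponding left-hand side.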

72

Computing SLR(1) States for G

• an SLR(1) State = a set of LR(0) items

• (See the next slide, Fig. 4.35, page 225, [Aho 86])

73

Canonical LR(0) Collection for G

I0: E’ → . E ; E → . E + T ; E → . T ; T → . T * F ; T → . F ; F → . ( E ) ; F → . id
I1: E’ → E . ; E → E . + T
I2: E → T . ; T → T . * F
I3: T → F .
I4: F → ( . E ) ; E → . E + T ; E → . T ; T → . T * F ; T → . F ; F → . ( E ) ; F → . id
I5: F → id .
I6: E → E + . T ; T → . T * F ; T → . F ; F → . ( E ) ; F → . id
I7: T → T * . F ; F → . ( E ) ; F → . id
I8: F → ( E . ) ; E → E . + T
I9: E → E + T . ; T → T . * F
I10: T → T * F .
I11: F → ( E ) .

[Figure: transition arcs between the item sets, labeled with id, (, ), +, *, E, T and F]


78

Transition Diagram of DFA D for Viable Prefixes

• State transitions in terms of sets of LR(0) items (Fig. 4.36)

• SLR(1) Parsing Table: (Fig. 4.31)
– Transition from Ii to Ij on terminal “a”: action(i,a) = shift-j
– Transition from Ii to Ij on nonterminal “A”: goto(i,A) = j
– Ii contains [A → α .]: action(i,a) = reduce A → α for all a in FOLLOW(A)
• If A = S’ (augmented start symbol): action(i,$) = accept

79

Visualizing Transitions in the Transition Diagram

• Shift: moving forward one step along an arc
– Equivalent to pushing input symbols

• Reduce “LHS → RHS”: moving backward to a previous state ‘s’ along the arcs labeled with the RHS symbols
– Then GOTO(s, LHS)
– Equivalent to popping the RHS symbols from the stack, then pushing the LHS, then redefining the current state


81

LR Parsing Table Construction Techniques…

• Canonical LR Parsing Table …
• LALR Parsing Table …
• (See Textbook …)

82

Canonical LR Parser

• SLR(1) parser does NOT always work
– SLR(1) Grammar => Unambiguous
– Unambiguous CFG =/=> SLR(1) Grammar

• E.g., a shift-reduce conflict in the SLR(1) parsing table may NOT be a real shift-reduce conflict (e.g., the “reduce” is impossible)

• Need more specific & additional information to define states [to avoid false reductions]
– Use LR(1) items instead of LR(0) items
– Many more states than SLR(1)

• Need (canonical) LR(1) or LALR(1) parsers (parsing table construction methods)

83

Example: non-SLR(1) Grammar for Assignment

(0) S’ → S
(1) S → L = R
(2) S → R
(3) L → * R (content of R)
(4) L → id
(5) R → L

I2:
(1) S → L . = R
(5) R → L .

I3:
(2) S → R .

• Action(2,‘=’) = shift 6
• Action(2,‘=’) = reduce 5, because Follow(R) = {‘=’, …} (S => L = R => *R = R)
• IF we reduce on ‘=’: goto I3 (from I0 on R), then error, since ‘=’ ∉ Follow(S). So [R → L .] is NOT really reducible on ‘=’.

84

Example: non-SLR(1) Grammar for Assignment

• Problem:
– G is unambiguous
– The SLR shift/reduce conflict is false, but
– The SLR parsing table is unable to remember enough left context to decide the proper action on ‘=’ when seeing a string reducible to L

85

Why Unambiguous Yet Non-SLR(1)

• Some reduce actions chosen by checking the input against Follow(LHS) are not really reducible
– Not all symbols in FOLLOW(LHS) result in a successful reduction to S.
– The parse may fail after a few steps of reductions.

• SLR(1) states, being defined by LR(0) items, do not resolve such conflicts
– Need more specific constraints to rule out a subset of Follow(LHS) from indicating a reduction action

86

LR(1) Parsing Table Construction

• SLR: reduce A → α on input ‘a’ if Ii contains [A → α .] & ‘a’ ∈ FOLLOW(A)
– Not really reducible for all ‘a’ ∈ FOLLOW(A)
– Only for a subset (maybe a proper subset)
– In some cases, with βα on the stack, βA cannot be followed by ‘a’ in any right-sentential form
• Then the reduction by A → α does not produce a right-sentential form
– E.g., “S ⇒ L • = R …” =/=> “S ⇒ R • = R …”
– although “S ⇒* *R • = R” puts ‘=’ in Follow(R)

87

LR(1) Parsing Table Construction

• Solution:
– Define each state by including more specific information to rule out invalid reductions
– Sometimes results in splitting states of the same “core”

• LR(0) items: [A → α . β]
– Only the dotted production (the “core”)

• LR(1) items: [A → α . β, LA’s]
– Dotted production (the “core”), plus the lookaheads that allow reduction by [A → αβ]
– “1”: length of the LA symbols

88

LR(1) Parsing Table Construction

• [A → α . β, a] (with β ≠ ε): the LA (‘a’) has no effect on items of this form

• [A → α . , a] (i.e., β = ε): the LA has an effect on items of this form
– Reduction is called for only when the next input is ‘a’ (not on all terminal symbols in Follow(A))
– Only a subset of Follow(A) will be the right LA’s
• Initially, only one restriction is known: [S’ → . S, $]
• Infer the other restrictions by closure computation

89

LR(1) Item and Valid Item

• An LR(1) item is a dotted production plus a lookahead symbol: [A → α•β, a]

• An LR(1) item [A → α•β, a] is said to be valid for a viable prefix γ if ∃ r.m. derivation S ⇒* δAw ⇒ δαβw, where
1. γ = δα
2. ‘a’ ∈ First(w) (or w = ε && a = ‘$’)

• The “•” represents where we are now during parsing
– Left of dot: those scanned
– Right of dot: those to be visited later

90

LR(1) Parsing Table Construction

• Change the closure() and goto() functions of the SLR parsing table construction, with initial collection:
– C = {closure({[S’ → . S, $]})}
– [A → α•Bβ, a] valid implies [B → •γ, b] valid if b is in FIRST(βa)

• Construction method for sets of LR(1) items
– See next few pages

91

LR(1): Closure(I)

• Given a set, I, of items
• Initially Closure(I) = I
• Repeat:
– for each item [A → α•Bβ, a] in Closure(I),
– each production B → γ in G’,
– and each terminal b in FIRST(βa),
– include [B → •γ, b] in Closure(I).
• Until no more items can be added to Closure(I)
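
To make the lookahead propagation concrete, here is a minimal sketch of the LR(1) closure for the grammar S’ → S, S → CC, C → cC | d used in the example on resolving conflicts with LR(1) items. Items are tuples (lhs, rhs, dot, lookahead). The FIRST table is written out by hand, which is only safe because this grammar has no ε-productions; all names (GRAMMAR, FIRST, first_of, closure) are assumptions for the example.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
# FIRST sets of the nonterminals, written out by hand for this small grammar.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}


def first_of(symbols):
    """FIRST of a non-empty symbol string, assuming no epsilon-productions."""
    head = symbols[0]
    return FIRST[head] if head in GRAMMAR else {head}


def closure(items):
    """LR(1) closure: [A -> a . B b, la] adds [B -> . g, x] for x in FIRST(b la)."""
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                           # dot at the end or before a terminal
        nonterminal = rhs[dot]
        for look in first_of(rhs[dot + 1:] + (la,)):
            for prod in GRAMMAR[nonterminal]:
                item = (nonterminal, prod, 0, look)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


I0 = closure({("S'", ("S",), 0, "$")})
for item in sorted(I0):
    print(item)
```

The initial set comes out with six items: the C-items carry the lookaheads c and d (from FIRST(C$)), while the S-item keeps $, which agrees with I0 in the LR(1) goto graph.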

92

LR(1): GOTO(I,X)

• Let J = {[A → αX•β, a] | such that [A → α•Xβ, a] is in I}.

• goto(I,X) = closure(J)

• That is:
– J = {}
– For all [A → α•Xβ, a] in I, J += {[A → αX•β, a]}
– Return(closure(J))

I: [A → α•Xβ, a], [A’ → α’•Xβ’, a’], …
J: [A → αX•β, a], [A’ → α’X•β’, a’], …

Goto(I,X) = Closure({[A → αX•β, a], [A’ → α’X•β’, a’]})

93

Sets of LR(1) Items Construction

• Augment the grammar with: S’ → S, call it G’

• Let I0 = Closure({[S’ → •S, $]}), C = {I0}

Repeat {

- for each I ∈ C and each X ∈ (N or Σ), if goto(I,X) is not empty and not already in C, then add goto(I,X) to C

}

Until no more sets of items can be added to C

94

Example: resolving shift/reduce conflicts with LR(1) items

• G’: {S’ → S, S → CC, C → cC | d}
• L(G) = { c^m d c^n d }
• => I0 ~ I9 (Fig. 4.39, p. 235 [Aho 86])
• I3 vs. I6: same set of LR(0) items with different lookaheads
• Conditions for reduction are different
– I3: reduce on c/d (when constructing the 1st ‘C’)
– I6: reduce on $ (when constructing the 2nd ‘C’)

95

SLR(1) Goto Graph

G: S’ → S
   S → C C
   C → c C
   C → d

Follow Sets:
S: {$}
C: {c,d,$}

I0: S’ → . S ; S → . C C ; C → . c C ; C → . d
I1: S’ → S . [$]
I2: S → C . C ; C → . c C ; C → . d
I3: C → c . C ; C → . c C ; C → . d
I4: C → d . , c/d/$
I5: S → C C . [$]
I8: C → c C . , c/d/$

[Figure: transition arcs between the item sets, labeled with S, C, c and d]

96

LR(1) Goto Graph

G: S’ → S
   S → C C
   C → c C
   C → d

I0: S’ → . S, $ ; S → . C C, $ ; C → . c C, c/d ; C → . d, c/d
I1: S’ → S ., $
I2: S → C . C, $ ; C → . c C, $ ; C → . d, $
I3: C → c . C, c/d ; C → . c C, c/d ; C → . d, c/d
I4: C → d ., c/d
I5: S → C C ., $
I6: C → c . C, $ ; C → . c C, $ ; C → . d, $
I7: C → d ., $
I8: C → c C ., c/d
I9: C → c C ., $

[Figure: transition arcs between the item sets, labeled with S, C, c and d]

97

Construction of Canonical LR(1) Parsing Table

• Algorithm 4.10
– Shift: (same as SLR, ignoring the LA in the item)
– Reduce on ‘a’: [A → α•, a]
– Accept on ‘$’: [S’ → S•, $]
– Goto: (same as SLR)

• LR(1) Grammar:
– a grammar without conflicts (multiply defined actions) in its LR(1) parsing table

98

SLR(1) vs. LR(1)

• LR(1): more specific states
– May split into states with the same “core” but with different lookaheads
– SLR(1) Grammar ⊂ LR(1) Grammar
– Number of states: LR(1) >> SLR(1)

99

LALR(1)

• Merge LR(1) states with the same core, while retaining the lookahead symbols
– Considerably smaller than canonical LR tables

• Most programming language constructs can be expressed by an LALR grammar
– SLR and LALR have the same number of states
• Without/with lookahead symbols [full/subset of FOLLOW]
• Several hundred states for PASCAL
• Several thousand, if using LR(1)

• G is an LALR(1) grammar if there are no conflicts after the state merge

100

LALR(1) vs. LR(1)

• Effect of LR(1) state merge:
– The merging of states with common cores can never produce a shift-reduce conflict that was not present in one of the original states
• Because shift actions depend only on the core, not the lookahead
– However, a merge may produce a reduce-reduce conflict.
• Because the union of lookaheads may introduce unnecessary reductions

101

LALR(1) vs. LR(1)

• Example: merging that produces reduce-reduce conflicts.
– LR(1) Grammar:
• S’ → S
• S → a A d | b B d | a B e | b A e
• A → c
• B → c
– Sets of LR(1) items:
• {[A → c . , d], [B → c . , e]} (valid for viable prefix ac)
• {[A → c . , e], [B → c . , d]} (valid for viable prefix bc)
– Merging states with common cores: {[A → c . , d/e], [B → c . , d/e]}
• merging also merges lookaheads
– Reduce-reduce conflicts:
• A → c and B → c, on inputs d and e
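
The merge in this example is small enough to check mechanically. The sketch below, with illustrative names (state_ac, state_bc, core, reduce_conflicts) and the assumed item representation (lhs, rhs, dot, lookahead), confirms that the two LR(1) states share a core, that neither has a conflict on its own, and that their union calls for two different reductions on both d and e.

```python
from collections import defaultdict

# The two LR(1) item sets from the example: (lhs, rhs, dot, lookahead).
state_ac = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}   # viable prefix ac
state_bc = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}   # viable prefix bc


def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def reduce_conflicts(state):
    """Lookaheads on which more than one completed item asks for a reduction."""
    by_lookahead = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                    # dot at the end: a reduce item
            by_lookahead[la].add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}


merged = state_ac | state_bc                   # LALR merge: union of the lookaheads

print("same core:", core(state_ac) == core(state_bc))
print("conflicts before merge:", reduce_conflicts(state_ac), reduce_conflicts(state_bc))
print("conflicts after merge on:", sorted(reduce_conflicts(merged)))
```

So the grammar is LR(1) but, because of this merged state, not LALR(1).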

102

LALR(1) vs. LR(1)

• Effect of LR(1) state merge:
– Behaves like the original, or
– Declares an error later, but before shifting the next input symbol
– For correct input: LR and LALR have the same sequence of shifts/reduces
– For erroneous input: LALR may require extra reduces after LR has detected an error (but before shifting the next symbol)

103

Example: Merge States with Same Core

• Fig. 4.39: I4 vs. I7: same reduction with different lookaheads

• State merge:
– dotted rules remain, LA’s merged

• Examples:
– I3 + I6 => I36
– I4 + I7 => I47
– I8 + I9 => I89
– Same as SLR(1) table (Fig. 4.41, p239, [Aho 86])

104

LALR(1) Parsing Table Construction (I)

• Method 1: (Naïve Method)
– [1] Construct the LR(1) parsing table
• Very costly [#states is normally very large]
– [2] Merge states with the same core

105

LALR(1) Parsing Table Construction (II)

• Method 2: (Efficient Construction Method)
– [1] Construct the kernels of the sets of LR(0) items, starting from [S’ → •S]
• It is possible to compute shift/reduce/goto actions directly from kernel items
• Kernel items: items whose dot is not at the beginning, except [S’ → . S, $]; that is, those not derived from closure()
– A kernel can represent a whole set of items
– [2] Append lookaheads
• Compute the initial spontaneous lookaheads, and
• the item pairs along which lookaheads are propagated

106

LALR(1) Parsing Table Construction (II.1)

• Compute shift/reduce/goto actions directly from kernel items: (pps. 240-241)
– Reduce:
– Shift:
– Goto:
– Need to pre-compute First’(C) = {A | r.m. C ⇒* Aδ} for all pairs of nonterminals (C, A) and

107

LALR(1) Parsing Table Construction (II.2)

• Determine spontaneous and propagated lookaheads (Fig. 4.43)
– Compute closure({core,#}) by assuming a “dummy lookahead” ‘#’

108

LALR(1) Parsing Table Construction: Example

• Example: 4.46/Fig. 4.42 [p. 241, Aho 86]
– Kernels of sets of LR(0) items
– Fig. 4.37 [with non-kernel items]

• Example: 4.47
– Get spontaneous & propagated lookaheads
• Fig. 4.44: item pairs that propagate lookaheads
• Fig. 4.45: initial spontaneous lookaheads, and multiple passes of lookahead propagation

• LALR(1) parsing table:
– To do by yourself

109

LALR(1) Parsing Table Construction

• LALR(/LR) (Fig 4.45) vs. SLR (Fig. 4.37)
– SLR: I2: shift/reduce conflict on ‘=’
– LALR(/LR): I2: shift on ‘=’, reduce on ‘$’, NO conflict

LALR(/LR) I2:
(1) S → L . = R, $
(5) R → L . , $

SLR I2:
(1) S → L . = R
(5) R → L .

110

Using Ambiguous Grammar

• (see Handouts)

111

Parser Generators

• YACC (Slide Part II)