LR Parsing Build Bottom-Up Parse Tree Handles Writing a Shift-Reduce Parser

Spring 2014 Jim Hogg - UW - CSE - P501 D-1

LR Parsing

Build Bottom-Up Parse Tree

Handles

Writing a Shift-Reduce Parser

ACTION & GOTO Parse Tables

Dotted Items

SR & RR conflicts

Next

CSE P501 – Compilers

Spring 2014 Jim Hogg - UW - CSE P501 D-2

Source TargetFront End Back End

Scan

chars

tokens

AST

IR

AST = Abstract Syntax Tree

IR = Intermediate Representation

‘Middle End’

Optimize

Select Instructions

Parse

Convert

Allocate Registers

Emit

Machine Code

IR

IR

IR

IR

IR

LR Parsing


S aABeA Abc | bB d

a b b c d

e

Build Bottom-Up Parse Tree

Black dot marks how much we've read of the input token stream so far (none)

Shift dot to right


S aABeA Abc | bB d

a b b c d

e

abbcde – done wrong – step 2

Can we reduce a ? No. Shift dot to right


S aABeA Abc | bB d

a b b c d

e


Can we reduce a or ab? Yes, using A b.

Note: a b (in red) marks current frontier


S aABeA Abc | bB d

A | a b b c d

e


Note: frontier is now aA

Can we reduce aA or A ? No. Shift dot


S aABeA Abc | bB d

A |a b b c d e


Can we reduce aAb or Ab or b? Yes, using A b


A A | | a b b c d e


Can we reduce aAA or AA or A? No. Shift dot

S aABeA Abc | bB d


A A | | a b b c d e


Can we reduce aAAc or AAc or Ac or c? No. Shift dot

S aABeA Abc | bB d


A A | | a b b c d e


Can we reduce aAAcd or AAcd or Acd or dd or d? No. Shift dot

S aABeA Abc | bB d


S a A B eA A b c | bB d

A A | | a b b c d e


Can we reduce aAAcde or AAcde or Acde or cde or de or e? No.

We are stuck!


Rewind

• Rewind to step 5

• Let's not reduce using A b. Shift instead

• Try again!


A |a b b c d e

abbcde – done right – step 5

Can we reduce aAb or Ab or b? Yes, we could, using A b. But it led to a dead-end, first time. So do not reduce. Instead, shift

S aABeA Abc | bB d


A |a b b c d e


Can we reduce aAbc or Abc or bc or c? Yes, using A Abc.

S aABeA Abc | bB d


A | A |a b b c d e


Can we reduce aA or A? No. Shift

S aABeA Abc | bB d


A | A |a b b c d e


Can we reduce aAd or Ad or d? Yes, using B d

S aABeA Abc | bB d


A | A B | |a b b c d e


Can we reduce aAB or AB? No. Shift

S aABeA Abc | bB d


A | A B | |a b b c d e


Can we reduce aABe or ABe or Be or e ? Yes, using S aABe

S aABeA Abc | bB d


S A | A B | |a b b c d e


S aABeA Abc | bB d

We just executed a rightmost derivation, backwards:

S => a A B e => a A d e => a A b c d e => a b b c d e


LR(1) Parsing

Left to right scan; Rightmost derivation; 1 token lookahead

"Bottom-Up" approach. Also called Shift-Reduce parser

The syntax of almost all practical programming languages can be specified by an LR(1) grammar

LALR(1) and SLR are subsets of LR(1) LALR(1) can parse most real languages, has a smaller memory

footprint, and is used by CUP All variants (SLR, LALR, LR) use same algorithm – but different

driver tables


LR Parsing in Greek

Bottom-up parser builds a rightmost derivation, backwards

Given the rightmost derivation: S =>1=>2=>…=>n-2=>n-1=>n = w

parser will first discover n-1=>n , then n-2=>n-1 , etc

But it discovers n-1=>n before seeing all of n : S => a A B e => a A d e => a A b c d e => a b b c d e

X denotes rightmost terminal to derive

S <= a A B e <= a A d e <= a A b c d e <= a b b c d e denotes handle denotes top-of-stack = end of input so far seen

Parsing terminates when 1 reduced to S (start symbol, success), or No match can be found (syntax error)


Terminology : Sentential Forms

If S =>* , the string is called a sentential form of the of the grammar (not yet a sentence, but on its way)

In the derivation

S =>1=>2=>…=>n-2=>n-1=>n = w

each of the i are sentential forms

A sentential form in a rightmost derivation is called a right sentential form


Informally, a handle is a substring of the frontier that matches the RHS of the "correct " production

Even if A is a production, is a handle only if it matches the frontier at a point where A was used in the derivation. So, it’s a handle if we should reduce by it (yes, this definition is circular)

may appear in many other places in the frontier without being a handle for that particular production

Handles


Handle – the Dragon Definition

Formally, a handle of a right-sentential form is a production A and a position in where may be replaced by A to produce the previous right-sentential form in the rightmost derivation of


Handle Examples

In the derivation:S => a A B e => a A d e => a A b c d e => abbcde

abbcde is a right sentential form whose handle is Ab at position 2

aAbcde is a right sentential form whose handle is AAbc at position 4

Note: some books take the left of the match as the position – but it really doesn't matter


Key Data structures

A stack holding the frontier of the tree

The token-stream of remaining input

Writing a Shift-Reduce Parser


Shift-Reduce Parser Operations

Reduce – if the top of stack is a handle (RHS of some A that we should use to reduce), pop , push A

Shift – push the next input symbol onto the stack

Accept – announce success

Error – syntax error discovered


Shift-Reduce Example – Step 0

Stack Input Action$ abbcde$ shift

S aABeA Abc | bB d

Note:

• “$” marks bottom-of-stack• “$” also marks end-of-input• Neither one takes part in the parse. They don’t

move.


Shift-Reduce Example – Step 1

Stack Input Action$ abbcde$ shift$a bbcde$ shift

• At each step, look for a handle at top-of-stack• If we find a handle, then reduce• If we don’t find a handle, then shift

• We are relying on clairvoyance - foretelling the future - to decide which RHSs at top-of-stack are handles

S aABeA Abc | bB d


Shift-Reduce Example

Stack Input Action0 $ abbcde$ shift1 $a bbcde$

shift2 $ab bcde$

reduce3 $aA bcde$ shift4 $aAb cde$ shift5 $aAbc de$ reduce6 $aA de$ shift7 $aAd e$ reduce8 $aAB e$ shift9 $aABe $ reduce10 $S $

accept

S aAB eA Abc | bB d


How Do We Automate This?

Lacking a clairvoyance function, we could resort to back-tracking. But it's too slow. It's a non-starter

Viable prefix – a prefix of a right-sentential form that can appear on the stack of the shift-reduce parser (on its way to a successful parse)

Idea: Construct a DFA to recognize viable prefixes given the stack and (one or two tokens from) remaining input

Perform reductions when we recognize handles


DFA for prefixes for:

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA A b c

B

e

S a A B eaccept

$

• We have augmented the grammar with a unique start symbol, S’• This DFA replaces our clairvoyance function – equally magical at this

point!• States 4,5,7,9 of this DFA define the handles• Eg: if stack is …aAbc then reduce using A Abc (always at top of

stack); then unwind, back to state 1

S' S$S aABeA Abc | bB d


Trace – Step 1

Stack Input$ abbcde$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = { } DFA = state 1; not a “reduce” state, {4,5,7,9}, so

shift



Trace – Step 2

Stack Input$a bbcde$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = a DFA = 2; not a “reduce” state, so

shift



Trace – Step 3

Stack Input$ab bcde$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = ab DFA = 4 => reduce, using production A b So, pop b (RHS of production); and push A (LHS or

production) Retrace to DFA state 1



Trace – Step 4

Stack Input$aA bcde$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aA So transition states 1 2 3 DFA = 3 => shift



Trace – Step 5

Stack Input$aAb cde$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aAb DFA = 6 => shift



Trace – Step 6

Stack Input$aAbc de$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aAbc DFA = 7 => reduce by A Abc So, pop Abc push A Retreat to state 1



Trace – Step 7

Stack Input$aA de$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aA DFA = 3 => shift



Trace – Step 8

Stack Input$aAd e$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aAd DFA = 5 => reduce by B d So, pop d, push B Retreat to DFA = 1



Trace – Step 9

Stack Input$aAB e$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = aAB So transition states 1 2 3 8 DFA = 8 => shift



Trace – Step 10

Stack Input$aABe $

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

• Stack = aABe• DFA = 9 => reduce by S aABe• So pop aABe, push S• Retreat to DFA = 1



Trace – Step 11

Stack Input$S $

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = S DFA = 1 => shift



Trace – Step 12

Stack Input$S$

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Stack = S$ => accept


Cast out the Magic

We started with a magical clairvoyance function We replaced this with, an equally magical, DFA

The DFA approach included too much repetition: retreat to DFA = 1, then rescan the stack to find the new DFA state we only replaced the handle with its NonTerminal LHS, so first part of

stack is unchanged Want the parser to run in linear time – proportional to total number of

tokens

How do avoid repetition? How to construct the magic DFA, for any grammar?



Avoiding DFA Rescanning

Observe: after a reduce, the contents of the stack are little altered: we replaced handle at top-of-stack with its LHS non-terminal

So, re-scanning the stack will step thru same DFA transitions as before, until the last one

So, record trace of DFA state numbers on stack to avoid the rescan


LR Stack

DFA pictures are nice, but we want a program to do it

Could change the stack to hold <state, token> pairs. Perhaps easier to understand and/or debug? $ <s0,X0> <s1,X1> . . . <sn,Xn>

But, all we need are the states (think about it!) on a reduce, pop top states - reduce rule tells us how many then push corresponding LHS, non-terminal


Reminder - DFA for:

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA A b c

B

e

S a A B eaccept

$

=> shift

=> reduce

1. S aABe2. A Abc3. A b4. B d

D-49

State

ACTION GOTO

a b c d e $ A B S

1 s2

acc

g1

2 s4 g3

3 s6 s5 g8

4 r3

r3 r3 r3 r3 r3

5 r4

r4 r4 r4 r4 r4

6 s7

7 r2

r2 r2 r2 r2 r2

8 s9

9 r1

r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

KeysN = shift; transition to state number NrR = reduce using rule RgN = goto state number Nacc = acceptblank = syntax error in program

ACTION & GOTO Parse Tables



DFA Transition Tables: Summary

ACTION => what to do after a shift Row = current state Column = next token (terminal) sN = shift; move to DFA state number N rR = reduce, using Rule number R acc = accept the input program blank = syntax error in input program

report, recover, continue for P501, just report and stop!

GOTO => what to do after a reduce Row = current state (top-of-stack, after pushing non-terminal) –

think of this as the uncovered state Column = LHS of reduction (non-terminal) gN = goto DFA state number N blank = bug in the GOTO table

StateACTION GOTO

a b c d e $ A B S1 s2 acc g12 s4 g33 s6 s5 g84 r3 r3 r3 r3 r3 r35 r4 r4 r4 r4 r4 r46 s77 r2 r2 r2 r2 r2 r28 s99 r1 r1 r1 r1 r1 r1


LR Parsing Algorithm

terminal = getToken()while (true)

s = top-of-stack // current DFA state numberif (ACTION[s, terminal] = si ) // shift; and transition to state i push i // new state terminal = getToken()elseif (ACTION[s, terminal] = rj ) // reduce, using rule number j pop (length of RHS of Rj) times // || uncovered = top-of-stack push LHS of Rj // A, in A

push GOTO[uncovered, A]elseif (ACTION[s, terminal] == accept) returnelse report syntax error; recoverendif

endwhile


Parse Trace – Step 0

Stack Input

$1 abbcde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 bbcde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 4 bcde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 bcde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 bcde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 6 cde$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 6 7 de$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 de$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 de$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 5 e$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 e$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 8 e$

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 2 3 8 9 $

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$




Stack Input

$1 $

Saction goto

a b c d e $ A B S

1 s2 ac g1

2 s4 g3

3 s6 s5

g8

4 r3 r3 r3 r3 r3 r3

5 r4 r4 r4 r4 r4 r4

6 s7

7 r2 r2 r2 r2 r2 r2

8 s9

9 r1 r1 r1 r1 r1 r1

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$



Idea is that each state encodes

The set of all possible productions that we could be looking at, given the current state of the parse, and

Where we are in the right hand side of each of those productions

How to build ACTION & GOTO


An item is a production with a dot in the right hand side

Example: Items for production A XY A XY A XY A XY

Idea: The dot represents a position in the production

Dotted Items

Spring 2014 D-68

Items for:

S aABeS aABeA AbcA b

A b

accept$

a

b

S aABeA AbcB d

A

B d

d

bA Abc A Abc

c

B

S aABeeS aABe

12

4

3

5

6 7

8

9

1 2 3 6 7

4 5

8 9

start a

A b B d

b d

A b cA Abc

B

eS aABeaccept

$

Jim Hogg - UW - CSE - P501



Grammars can encounter two problems when constructing an LR parser

Shift-reduce conflict Reduce-reduce conflict

SR & RR Conflicts


Shift-Reduce Conflicts

Situation: both a shift and a reduce are possible at a given point in the parse

Equivalently: entry in ACTION table holds both ri and sj

Classic example: if-else statement S ifthen S | ifthen S else S we elide the (exp) – common to both


Parser States for

State has SR conflict

Can shift else into state 4 Can reduce S ifthen S

1. S ifthen S2. S ifthen S else S

S ifthen SS ifthen S else S

ifthen

1

S ifthen SS ifthen S else S

S

2

S ifthen S S ifthen S else S

else

3

S ifthen S else S 4

3


De-Conflicting Shift-Reduce Conflicts

Fix the grammar Done in Java reference grammar

Use a parse tool with a “longest match” rule – i.e., if there is a conflict, choose to shift instead of reduce

Does exactly what we want for if-else case Guideline: a few shift-reduce conflicts are fine, but be

sure they do what you want


Reduce-Reduce Conflicts

Situation: two different reductions are possible in a given state

Contrived exampleS AS BA xB x


Parser States for

State has a RR conflict (r3, r4)

S AS BA xB x

x

1

A x B x 2

1. S A2. S B 3. A x4. B x

2


De-conflicting Reduce-Reduce Conflicts

Normally indicates serious problem with the grammar

Fixes

Use a different kind of parser generator that takes lookahead information into account when constructing the states (LR(1) instead of SLR(1) for example)

Most practical tools use this information

Fix the grammar


Another Reduce-Reduce Conflict

Suppose the grammar separates arithmetic and boolean expressions

exp aexp | bexpaexp aexp * aident | aident bexp bexp && bident | bident aident id

bident id

This will create an RR conflict


Covering Grammars

A solution is to merge aident and bident into a single non-terminal (or use id in place of aident and bident everywhere they appear)

This expanded grammar accepts a larger language than we want

This is a covering grammar Includes some programs that are not generated by

the original grammar Use the type checker or other static semantic

analysis to weed out illegal programs later


Constructing LR tables We’ll present a simple version - SLR(0).

Then talk about extending it to LR(1)

LL parsers and recursive descent

Cooper&Torczon chapter 4

Next

LR Parsing Build Bottom-Up Parse Tree Handles Writing a Shift-Reduce Parser

Documents

Transcript of LR Parsing Build Bottom-Up Parse Tree Handles Writing a Shift-Reduce Parser