COMBINING COMPATIBLE STATES DURING LR(1) PARSER CONSTRUCTION

Page 1: COMBINING COMPATIBLE STATES DURING LR(1) PARSER CONSTRUCTION

Page 2

The LR(0) algorithm for constructing parsers is one in which contexts are not evaluated, and states are considered identical if they consist of the same set of marked productions.
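As an illustration, the two notions of "the same state" can be contrasted in a minimal Python sketch. The representation is an assumption made only for these examples: a marked production is a tuple (lhs, rhs, dot) with one-character grammar symbols, so ("A", "abc", 2) stands for A → ab.c, and a state maps each marked production to the frozenset of its contexts.

```python
# Hypothetical representation (for illustration only):
#   marked production = (lhs, rhs, dot), e.g. ("A", "abc", 2) for A → ab.c
#   state             = dict: marked production -> frozenset of contexts

def marked_productions(state):
    """The state's marked productions, with the contexts ignored."""
    return frozenset(state)

def same_lr0_state(s1, s2):
    # LR(0): identical if they consist of the same marked productions.
    return marked_productions(s1) == marked_productions(s2)

def same_lr1_state(s1, s2):
    # Canonical LR(1): the context sets must match as well.
    return s1 == s2

s1 = {("A", "abc", 2): frozenset("xy")}   # A → ab.c {x,y}
s2 = {("A", "abc", 2): frozenset("d")}    # A → ab.c {d}

print(same_lr0_state(s1, s2))  # True
print(same_lr1_state(s1, s2))  # False
```

States such as s1 and s2 are merged by LR(0) but kept apart by canonical LR(1); the rest of these slides concern when such states can safely be combined.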

Page 3

But this algorithm is insufficient for actual programming languages, producing parsers with numerous conflicts

Page 4

When applied to the grammars of real programming languages such as Java or C++, the LR(1) algorithm produces a parsing machine that is at least an order of magnitude larger than the one produced by the LR(0) algorithm for the same grammar.

Page 5

On the other hand, for the large grammars used for real programming languages, the LR(1) algorithm, which you used in your last assignment, produces parsers that are a few orders of magnitude larger than those produced by the LR(0) algorithm.

Page 6

As a compromise, various methods, including the LALR(1) method employed by Yacc, have been devised that handle a subset of the LR(1) grammars using a hybrid approach.

Page 7

This works well for most programming languages, but places a greater burden on the compiler writer to come up with a grammar that does not lead to conflicts (i.e. to cases where more than one action is defined in a parsing machine state for the same next input symbol).

These methods only work for a subset of the LR(1) grammars, and there are applications, including ones involving natural language processing, for which they are inadequate.
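The notion of a conflict described above can be made concrete with a small Python sketch that collects, for one state, the actions defined for each next input symbol. It assumes the same hypothetical representation as the slide examples use informally: one-character symbols, upper case for nonterminals and lower case for terminals, with each marked production mapped to its context set.

```python
from collections import defaultdict

# Hypothetical representation (illustration only): a marked production is
# (lhs, rhs, dot); upper-case symbols are nonterminals, lower-case symbols
# are terminals. A state maps marked productions to context sets.

def actions_by_symbol(state):
    """Collect the actions a state defines for each next input symbol."""
    table = defaultdict(set)
    for (lhs, rhs, dot), contexts in state.items():
        if dot < len(rhs):
            symbol = rhs[dot]
            if symbol.islower():          # terminal after the marker: a transition
                table[symbol].add("shift")
        else:                             # marker at the end: reduce on each context
            for a in contexts:
                table[a].add(f"reduce {lhs} → {rhs or 'ε'}")
    return table

def conflicts(state):
    """Next input symbols for which more than one action is defined."""
    return {a: acts for a, acts in actions_by_symbol(state).items() if len(acts) > 1}

# Hypothetical state: A → b. {x} and B → b.x {y} both define an action on x.
state = {("A", "b", 1): frozenset("x"), ("B", "bx", 1): frozenset("y")}
print(sorted(conflicts(state)["x"]))  # ['reduce A → b', 'shift']
```

Transitions on nonterminals are left out of the table, since they are not actions taken on the next input symbol.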

Page 8

However, one can employ a definition of compatibility between states that works for all LR(1) grammars and produces parsers of the same size as those referred to previously.

Page 9

DEFINITION. The nucleus of a state consists of the configurations in the state in which the marker is in a position greater than zero.

Example

A configuration in a state of the form

A → bc.d, {x,y}

would be a member of its nucleus, but a configuration such as

A → .bcd, {x,y}

would not be a member.
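The definition amounts to a filter over the configurations of a state. A minimal Python sketch, assuming the hypothetical encoding (lhs, rhs, dot) for a marked production and a dict from marked productions to context sets:

```python
# Hypothetical representation: marked production = (lhs, rhs, dot),
# state = dict mapping marked productions to frozensets of contexts.

def nucleus(state):
    """Configurations whose marker is at a position greater than zero."""
    return {item: contexts for item, contexts in state.items() if item[2] > 0}

in_nucleus = ("A", "bcd", 2)        # A → bc.d
not_in_nucleus = ("A", "bcd", 0)    # A → .bcd
state = {in_nucleus: frozenset("xy"), not_in_nucleus: frozenset("xy")}

print(list(nucleus(state)))  # [('A', 'bcd', 2)]
```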

Page 10

DEFINITION OF COMPATIBILITY BETWEEN LR(1) STATES

Let $S$ and $S'$ be two states in an LR(1) parsing machine whose nuclei consist of the same marked productions, which we will denote as $P_1, \ldots, P_n$.

For $1 \le t \le n$, let $U_t$ denote the set of contexts associated with marked production $P_t$ in state $S$, and let $U'_t$ denote the set of contexts associated with that marked production in state $S'$.

Then states $S$ and $S'$ are compatible if, for all $1 \le i < j \le n$, at least one of the following conditions holds:

(a) $U_i \cap U'_j = \emptyset$ and $U'_i \cap U_j = \emptyset$ ($\emptyset$ is the empty set, i.e. the intersections involved are both empty)

(b) $U_i \cap U_j \neq \emptyset$

(c) $U'_i \cap U'_j \neq \emptyset$
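The definition translates almost directly into a pairwise test. The Python sketch below assumes, hypothetically, that each state's nucleus is given as a dict from marked production to context set, so `s[p]` plays the role of $U_t$ and `s_prime[p]` the role of $U'_t$; the marked productions used in the usage lines are made up for illustration.

```python
from itertools import combinations

def compatible(s, s_prime):
    """Compatibility test for two LR(1) states, following the definition.

    s and s_prime map each marked production P_t of the nucleus to its
    context set: U_t in S and U'_t in S'.
    """
    if s.keys() != s_prime.keys():        # nuclei must have the same marked productions
        return False
    for p_i, p_j in combinations(list(s), 2):       # every pair i < j
        u_i, u_j = s[p_i], s[p_j]
        v_i, v_j = s_prime[p_i], s_prime[p_j]       # U'_i and U'_j
        cond_a = not (u_i & v_j) and not (v_i & u_j)
        cond_b = bool(u_i & u_j)
        cond_c = bool(v_i & v_j)
        if not (cond_a or cond_b or cond_c):
            return False                  # this pair fails the test
    return True

p1, p2 = ("P", "ab", 1), ("Q", "ac", 1)   # hypothetical marked productions
print(compatible({p1: frozenset("x")}, {p1: frozenset("y")}))  # True
print(compatible({p1: frozenset("x"), p2: frozenset("y")},
                 {p1: frozenset("y"), p2: frozenset("x")}))    # False
```

In the second call, x lies in $U_1 \cap U'_2$, so (a) fails, while neither state has a context shared by its own two configurations, so (b) and (c) fail too.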

Page 11

Note

If states S and S′ are as described above and their nuclei consist of only a single configuration, then according to the above definition they are compatible, since there is no pair of configurations to test.

Page 12

In the case where $S$ and $S'$ as described above are compatible, one can combine the states into a single state whose nucleus consists of the same marked productions listed above, while for $1 \le t \le n$ the set of contexts associated with marked production $P_t$ is $U_t \cup U'_t$.
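Combining two compatible states is then a per-production union of context sets. A minimal Python sketch, again assuming the hypothetical dict-of-frozensets encoding of a nucleus:

```python
def combine(s, s_prime):
    """Combine two compatible states: P_t gets the contexts U_t ∪ U'_t.

    Only the nucleus is merged here; the remaining configurations of the
    combined state would follow from it by closure.
    """
    if s.keys() != s_prime.keys():
        raise ValueError("nuclei must consist of the same marked productions")
    return {p: s[p] | s_prime[p] for p in s}

p = ("B", "bn", 1)                        # B → b.n
merged = combine({p: frozenset("st")}, {p: frozenset("s")})
print(sorted(merged[p]))  # ['s', 't']
```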

Page 13

One way of looking at the definition is to say that every pair of configurations in the nucleus must pass a test, and that two states are compatible only if all of the pairs pass.

Page 14

Fortunately, in grammars for actual programming languages such as Java, C++, etc., there are at most 6 configurations in the nucleus of any state.

The states may be large, with many immediate successors, but the nuclei are all quite small.

Page 15

EXAMPLES

We show only the nucleus of the states in these examples, since, according to the definition, states are compatible if and only if their nuclei are.

Page 16

State S:

A → ab.c {x,y}

B → b.n {s,t}

C → rb.ed {u,v}

State S′:

A → ab.c {d}

B → b.n {s}

C → rb.ed {x,v}

The above two states are not compatible because the pair consisting of the first and last configurations fails the test. For this pair, condition (a) of the defn. is not true, since the context of the first configuration of S contains an x, and so does the context of the third configuration of S′. In addition, neither condition (b) nor condition (c) is true, since the first and third configurations of S share no context, and neither do those of S′.

Page 17

State S:

A → ab.c {x,y}

B → b.n {s,t}

C → rb.ed {x,v}

State S′:

A → ab.c {x,y,d}

B → b.n {s}

C → rb.ed {x,v}

The first and third configurations in this case pass the test, because condition (b) of the defn. applies to the first and third configurations of S: both contain x in their sets of contexts. The remaining pairs (first and second, second and third) pass by condition (a), since the contexts s and t of B → b.n appear in no other context set of either state. The states in this case are compatible.

Remember that, while every pair of configurations in the nucleus must pass the test, a pair passes as long as at least one of conditions (a), (b) or (c) is true for it.

Page 18

Since the states are compatible, they can be combined to form one whose nucleus is:

A → ab.c {x,y,d}

B → b.n {s,t}

C → rb.ed {x,v}
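As a check on the worked examples, the Python sketch below runs the pairwise test on the nuclei of slides 16 and 17 and forms the union for slide 18. It repeats the compatibility test so that it runs on its own; the tuple encoding of marked productions and the helper `state` are hypothetical choices for illustration.

```python
from itertools import combinations

def compatible(s, s_prime):
    """Pairwise test from the definition of compatibility (conditions a, b, c)."""
    if s.keys() != s_prime.keys():
        return False
    for p, q in combinations(list(s), 2):
        cond_a = not (s[p] & s_prime[q]) and not (s_prime[p] & s[q])
        cond_b = bool(s[p] & s[q])
        cond_c = bool(s_prime[p] & s_prime[q])
        if not (cond_a or cond_b or cond_c):
            return False
    return True

def combine(s, s_prime):
    """Union of the context sets of each marked production."""
    return {p: s[p] | s_prime[p] for p in s}

A, B, C = ("A", "abc", 2), ("B", "bn", 1), ("C", "rbed", 2)  # A → ab.c, B → b.n, C → rb.ed

def state(a, b, c):
    """Hypothetical helper: build a nucleus from three context strings."""
    return {A: frozenset(a), B: frozenset(b), C: frozenset(c)}

s16, s16_prime = state("xy", "st", "uv"), state("d", "s", "xv")    # slide 16
s17, s17_prime = state("xy", "st", "xv"), state("xyd", "s", "xv")  # slide 17

print(compatible(s16, s16_prime))  # False
print(compatible(s17, s17_prime))  # True
merged = combine(s17, s17_prime)
print({p[0]: "".join(sorted(ctx)) for p, ctx in merged.items()})
# {'A': 'dxy', 'B': 'st', 'C': 'vx'}
```

The union computed for C → rb.ed is {x,v}, since both states of slide 17 carry exactly {x,v} for that configuration.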

Page 19

Note.

In the figure on the next slide, the context sets of some configurations are omitted (i.e. only the marked production is shown); the implication is that those contexts are irrelevant to the assertions being made about the figure.

Page 20

States 2 and 8 are not compatible, since the first configuration of state 2 has d as a context in common with the second configuration of state 8. In fact, if we were to combine states 2 and 8, the result would have a combination of states 3 and 9 as its u-successor. This state would have a conflict: when the next input symbol was d, it would have reduce actions for both Z → tu and V → ε.

Page 21

Now consider the altered machine obtained if the production X → aYd were replaced by (say) X → aYa. In this case the first configuration of state 2 would be Y → t.W {a}. It would then follow that states 2 and 8 were compatible and could safely be combined to form:

Y → t.W {a, e}

Z → t.u {c, d}

W → .uV
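The argument of slides 20 and 21 can be replayed in a small Python sketch. Since the figure is not available, the contexts used here are assumptions reconstructed from the text: state 2 has Y → t.W {d} and Z → t.u {c}, state 8 has Y → t.W {e} and Z → t.u {d}, and the altered state 2 has Y → t.W {a}. The sketch also assumes the productions Y → tW, W → uV and V → ε implied by the configurations shown, so that in the u-successor the reduction by V → ε inherits the contexts of Y → t.W, while Z → tu. keeps those of Z → t.u.

```python
from itertools import combinations

def compatible(s, s_prime):
    """Pairwise compatibility test (conditions a, b, c of the definition)."""
    if s.keys() != s_prime.keys():
        return False
    for p, q in combinations(list(s), 2):
        cond_a = not (s[p] & s_prime[q]) and not (s_prime[p] & s[q])
        if not (cond_a or bool(s[p] & s[q]) or bool(s_prime[p] & s_prime[q])):
            return False
    return True

def merge(s, s_prime):
    return {p: s[p] | s_prime[p] for p in s}

Y, Z = ("Y", "tW", 1), ("Z", "tu", 1)     # Y → t.W and Z → t.u

# Assumed contexts, reconstructed from the text of slides 20-21.
state2 = {Y: frozenset("d"), Z: frozenset("c")}
state8 = {Y: frozenset("e"), Z: frozenset("d")}
state2_altered = {Y: frozenset("a"), Z: frozenset("c")}   # with X → aYa

def u_successor_conflicts(state):
    """Symbols on which both Z → tu and V → ε would reduce in the u-successor.

    Assumes Y → tW, W → uV and V → ε, so V → . inherits the contexts of
    Y → t.W, while Z → tu. keeps the contexts of Z → t.u.
    """
    return state[Y] & state[Z]

print(compatible(state2, state8))                                     # False
print(sorted(u_successor_conflicts(merge(state2, state8))))           # ['d']
print(compatible(state2_altered, state8))                             # True
print(sorted(u_successor_conflicts(merge(state2_altered, state8))))   # []
```

Neither state 2 nor state 8 has a conflict in its own u-successor under these assumptions; the conflict on d appears only after the incompatible states are merged.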

Page 22

The journal paper describing this method of combining states contains a formal proof of its correctness. But since ours is a practically oriented course, we will just consider an informal justification, based on a few examples, to give a flavor of the reasoning involved.

Page 23

The main argument is that if the parsing machine containing the states S and S′, as described in the defn. of compatibility, has no conflicts, and S and S′ are compatible, then the parsing machine obtained by combining them will also have no conflicts.

Page 24

The argument is by contradiction. Let's consider examples of the various ways that two configurations in the combination of S and S′ could have conflicts, or lead to conflicts between other pairs of configurations in states reachable from the combined state. In each case we hope to show that either the parsing machine as it was before S and S′ were combined contained conflicts in the first place, or that S and S′ could not in fact have been compatible.

Page 25

Case 1. Let configs 1 and 2 of the combined state formed from states S and S′ be:

A → r B.uv {a,b}

C → t B.uv {a,c}

Since the machine as it was before the combination contained no conflicts, and in particular contained no conflict in the uv-successors of these states, a cannot have appeared in both configs of the same state. So either (1) state S contained a in its version of config 1, while state S′ contained a in its version of config 2, or (2) vice versa.

Page 26

Case 1 contd.

A → r B.uv {a,b}

C → t B.uv {a,c}

In either case, neither condition (a) nor condition (b) of the defn. would then be true for the two configs, and since condition (c) is also not true, states S and S′ could not have been compatible in the first place.

Page 27

Case 2. Let configs 1 and 2 of the combined state be:

A → r B.uv {a,b}

D → t B.Ca

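C → .uv {a}

Either S or S′ must contain A → r B.uv with a in its context set. Since both states also contain D → t B.Ca, and hence C → .uv {a}, the original parsing machine would then have had a conflict on a (between the reductions by A → r Buv and C → uv) at that state's uv-successor. This contradicts our assumption that the original parsing machine was conflict-free.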

Page 28

Case 3. Let configs 1 and 2 be:

A → s B.Ea

E → .uv {a}

D → t B.Ca

C → .uv {a}

Here again the original parsing machine would have had conflicts in the uv-successors of both S and S′, since each of them contains both E → .uv {a} and C → .uv {a}.

Page 29

Case 4. Let configs 1 and 2 be:

A → r B.uv

D → t B.uvr

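Suppose the combined state has r in the context set of A → r B.uv. Then at least one of S and S′ had r in its version of that context set, and since both states contain D → t B.uvr, the original parsing machine would have had a conflict in the uv-successor of that state. In this case the conflict would have been between a reduction and a transition on r.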

Page 30

EXERCISE

Construct an LR(1) parsing machine for the grammar on the next slide, combining compatible states as you encounter them.

Page 31

program → main ; statement_list end main;

statement_list → statement_list statement

| statement

statement → assign_statement

| while_statement

| do_statement

assign_statement → identifier = identifier

while_statement → while ( condition ) statement_list wend

condition → identifier = identifier

do_statement → do identifier = number to number ; statement_list end do ;