Programming Language Syntax and Language Implementationblk/cs3490/ch02/ch02slides.pdf · –A small...

Chapter 2 Programming Language Syntax and Language Implementation

Programming Languages and Paradigms

J. Fenwick, B. Kurtz, C. Norris

(to be published in 2012)

Introduction • In this chapter you will learn about

– A small programming language named Wren

– How to generate Wren Intermediate Code by hand

– Specification of language syntax using BNF

– Lexical analysis and parsing techniques

– Compiling and code generation

– Interpretation of intermediate code

• We start off with a case study that introduces the programming language Wren and Wren Intermediate Code, a stack-based assembly-like language

Finding the GCD • We will use three simple arithmetic programs

– Finding the greatest common divisor

– Finding the product and the quotient/remainder of integers

• Euclid’s Algorithm

while m != n

if m < n then

n := n - m

else

m := m - n

return m

m n

84 35

49 35

14 35

14 21

14 7

7 7

GCD = 7
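The pseudocode and trace above can be checked with a minimal Python sketch. The function name `gcd_by_subtraction` and the positive-input check are assumptions added for illustration; they are not part of the slides.

```python
def gcd_by_subtraction(m, n):
    """Return (gcd, trace) for positive integers m and n.

    trace holds the (m, n) pair seen at each loop test, like the table above.
    """
    if m <= 0 or n <= 0:
        # Assumption: the subtraction form only terminates for positive inputs.
        raise ValueError("both inputs must be positive")
    trace = [(m, n)]
    while m != n:
        if m < n:
            n = n - m
        else:
            m = m - n
        trace.append((m, n))
    return m, trace


result, steps = gcd_by_subtraction(84, 35)
for m, n in steps:
    print(m, n)
print("GCD =", result)
```

Running it on 84 and 35 reproduces the six rows of the trace and the result 7.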

A gcd program in Wren

program gcd is

var m,n : integer;

begin

read m; read n;

while m <> n do

if m < n then

n := n - m

else

m := m - n

end if

end while;

write m

end

Declarations appear at the top of the program

Read is used for console input

Describe some of the syntactic differences with C-based languages

Write is used for console output

Alternative Algorithms for GCD • Consider the given algorithm

– Why would this algorithm be slow for certain input values? (give a specific example)

– How could the algorithm be made more efficient? Be specific

• Writing a gcd algorithm in Java

– By using recursion it is possible to write a two line method in Java to find the gcd

– Try to write that method now
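One possible way to explore the efficiency question is to count loop iterations. The sketch below is in Python rather than Java and is iterative rather than recursive, so it is not the two-line method asked for above. The function names are hypothetical, and the remainder-based variant is the standard form of Euclid's algorithm, swapped in here for comparison.

```python
def subtraction_steps(m, n):
    """Subtraction-based gcd from the slides; returns (gcd, iterations)."""
    steps = 0
    while m != n:
        if m < n:
            n -= m
        else:
            m -= n
        steps += 1
    return m, steps


def remainder_steps(m, n):
    """Remainder-based gcd; returns (gcd, iterations)."""
    steps = 0
    while n != 0:
        m, n = n, m % n
        steps += 1
    return m, steps


print(subtraction_steps(1000, 1))  # many iterations when one input is tiny
print(remainder_steps(1000, 1))
```

With inputs 1000 and 1 the subtraction version loops 999 times, while the remainder version finishes in a single iteration.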

A Second Wren Program

program product is

var a,b,p : integer;

begin

read a; read b; p := 0;

while b > 0 do

if (b - (b/2) * 2) > 0 then

p := p + a

end if;

a := a * 2;

b := b / 2

end while;

write p

end

Express this test condition logically

a b p

5 13 5

10 6 5

20 3 25

40 1 65

80 0 65
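A Python sketch of the same doubling-and-halving loop, recording one (a, b, p) row per iteration as in the trace above. The function name `product` mirrors the Wren program; the non-negative check on b and the trace list are assumptions for illustration.

```python
def product(a, b):
    """Multiply a by a non-negative b using doubling and halving."""
    if b < 0:
        raise ValueError("b must be non-negative")
    p = 0
    trace = []
    while b > 0:
        # Same test as (b - (b/2) * 2) > 0 with integer division.
        if b - (b // 2) * 2 > 0:
            p = p + a
        trace.append((a, b, p))
        a = a * 2
        b = b // 2
    trace.append((a, b, p))  # final row, after the loop exits
    return p, trace


p, rows = product(5, 13)
for row in rows:
    print(*row)
```

For a = 5 and b = 13 the rows match the table and the final product is 65.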

A Third Wren Program

program quotient is

var x,y,r,q,w : integer;

begin

read x;

read y;

r:=x;

q:=0;

w:=y;

while w <= r do

w := 2 * w

end while;

while w > y do

q := q * 2;

w := w / 2;

if w <= r then

r := r - w;

q := q + 1

end if

end while;

write q;

write r

end

w q r

8 0 42

16 0 42

32 0 42

64 0 42

32 1 10

16 2 10

8 5 2

X = 42 Y = 8
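The quotient program can be sketched in Python the same way. The function name and the input checks are assumptions; Python's `//` stands in for Wren's integer division on non-negative values.

```python
def quotient(x, y):
    """Return (q, r) with x = q * y + r, using only doubling and halving."""
    if x < 0 or y <= 0:
        raise ValueError("requires x >= 0 and y > 0")
    r, q, w = x, 0, y
    # Scale w up to the first doubling of y that exceeds r.
    while w <= r:
        w = 2 * w
    # Scale w back down, building the quotient one binary digit at a time.
    while w > y:
        q = q * 2
        w = w // 2
        if w <= r:
            r = r - w
            q = q + 1
    return q, r


print(quotient(42, 8))
```

For X = 42 and Y = 8 this yields quotient 5 and remainder 2, the last row of the trace.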

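Wren Intermediate Code

get A

get B

push A

push A

mul

push B

push B

mul

sub

pop Result

put Result

halt

The above program produces the following interaction:

enter a > 5

enter b > 3

Result = 16

program halted

Stack after each instruction (bottom of stack on the left):

Instruction    Stack
push A         5
push A         5 5
mul            25
push B         25 3
push B         25 3 3
mul            25 9
sub            16
pop Result     empty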

If Command • The label structure for the two-alternative if command is:

<code for Boolean expression and test>

jf L1 % jump to else when false

<code for the true alternative of if command>

j L2 % jump unconditionally to L2

L1 label % the else code is next

<code for the false alternative of if command>

L2 label % end if is here

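Wren Intermediate Code

get A

get B

push A

push B

sub

tstlt

jf L1

push B

pop MAX

j L2

L1 label

push A

pop MAX

L2 label

put MAX

halt

The above program produces the following interaction:

enter a > 5

enter b > 3

Max = 5

program halted

Final symbol table:

A      5
B      3
MAX    5

Stack after each instruction (bottom of stack on the left):

Instruction    Stack
(start)        empty
push A         5
push B         5 3
sub            2
tstlt          0
jf L1          empty
push A         5
pop MAX        empty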

Test instructions compare the top of stack with zero

While Command

L1 label % top of the while loop

<code for Boolean expression and test>

jf L2 % jump out of loop if false

<code for the body of the while command>

j L1 % jump unconditionally to top of loop

L2 label % end while is here

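Wren Intermediate Code

get num

push 0

pop count

L1 label

push num

push 0

sub

tstgt

jf L2

push num

push 2

div

pop num

push count

push 1

add

pop count

j L1

L2 label

put count

halt

The above program produces the following interaction:

enter num > 20

count = 5

program halted

Partial stack trace (bottom of stack on the left):

Instruction    Stack
push 0         0
pop count      empty
push num       20
push 0         20 0
sub            20
tstgt          1
jf L2          empty
push num       20
push 2         20 2
div            10
pop num        empty
push count     0
push 1         0 1
add            1
pop count      empty
push num       10
push 0         10 0
sub            10
tstgt          1
jf L2          empty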

As a homework exercise complete the stack trace for this program

Code generation for the gcd program - 1

(Wren source on the left, generated WIC on the right)

begin

read m; get m

read n; get n

<<code for while command>>

while m <> n do

if m < n then

n := n - m

else

m := m - n

end if

end while;

write m put m

end halt

begin

read m; get m

read n; get n

L1 label

while m <> n do push m

push n

sub

tstne

jf L2

if m < n then <<code for if command>>

n := n - m

else

m := m - n

end if

end while; j L1

L2 label

write m put m

end halt

begin

read m; get m

read n; get n

L1 label

while m <> n do push m

push n

sub

tstne

jf L2

if m < n then push m

push n

sub

tstlt

jf L3

n := n - m <<code for assignment>>

j L4

else L3 label

m := m - n <<code for assignment>>

end if L4 label

end while; j L1

L2 label

write m put m

end halt

Notice that label numbers must be unique throughout the program

begin

read m; get m

read n; get n

L1 label

while m <> n do push m

push n

sub

tstne

jf L2

if m < n then push m

push n

sub

tstlt

jf L3

n := n - m push n

push m

sub

pop n

j L4

else L3 label

m := m - n push m

push n

sub

pop m

end if L4 label

end while; j L1

L2 label

write m put m

end halt
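The hand-compilation steps above follow fixed templates, so they can be sketched as a small Python code generator. Everything here is an assumption for illustration: the tuple-based command format, the function names, and the restriction of write to a single variable. The test mnemonics other than tstne, tstlt and tstgt are spelled by analogy with those three.

```python
import itertools

ARITH = {"+": "add", "-": "sub", "*": "mul", "/": "div"}
TESTS = {"=": "tsteq", "<>": "tstne", "<": "tstlt",
         "<=": "tstle", ">": "tstgt", ">=": "tstge"}


def gen_expr(e):
    """Expressions are a name, a literal, or a tuple (op, left, right)."""
    if isinstance(e, tuple):
        op, left, right = e
        return gen_expr(left) + gen_expr(right) + [ARITH[op]]
    return [f"push {e}"]


def gen_test(t):
    """Comparisons subtract the operands and test the result against zero."""
    op, left, right = t
    return gen_expr(left) + gen_expr(right) + ["sub", TESTS[op]]


def gen_cmds(cmds, labels):
    code = []
    for c in cmds:
        kind = c[0]
        if kind == "read":
            code.append(f"get {c[1]}")
        elif kind == "write":
            code.append(f"put {c[1]}")
        elif kind == "assign":
            code += gen_expr(c[2]) + [f"pop {c[1]}"]
        elif kind == "while":
            top, out = next(labels), next(labels)
            code += [f"{top} label"] + gen_test(c[1]) + [f"jf {out}"]
            code += gen_cmds(c[2], labels) + [f"j {top}", f"{out} label"]
        elif kind == "if":
            els, end = next(labels), next(labels)
            code += gen_test(c[1]) + [f"jf {els}"]
            code += gen_cmds(c[2], labels) + [f"j {end}", f"{els} label"]
            code += gen_cmds(c[3], labels) + [f"{end} label"]
        else:
            raise ValueError(f"unknown command {kind!r}")
    return code


def compile_program(cmds):
    # One shared label supply keeps label numbers unique across the program.
    labels = (f"L{i}" for i in itertools.count(1))
    return gen_cmds(cmds, labels) + ["halt"]


GCD = [
    ("read", "m"), ("read", "n"),
    ("while", ("<>", "m", "n"), [
        ("if", ("<", "m", "n"),
         [("assign", "n", ("-", "n", "m"))],
         [("assign", "m", ("-", "m", "n"))]),
    ]),
    ("write", "m"),
]

print("; ".join(compile_program(GCD)))
```

Because the while command takes its two labels before its body is generated, the nested if receives L3 and L4, which matches the hand-compiled listing.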

Homework exercise: Hand compile the product program into WIC

U.S. telephone numbers

G2.2.A = (T, N, P, S) where

T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, -, ) },

N = {<PhoneNumber>, <CountryCode>, <AreaCode>, <Prefix>, <Extension>, <Digit>},

P = {

<PhoneNumber> ::= <CountryCode> <AreaCode> <Prefix> - <Extension>

<PhoneNumber> ::= <AreaCode> <Prefix> - <Extension>

<CountryCode> ::= <Digit> <Digit>

<AreaCode> ::= ( <Digit> <Digit> <Digit> )

<Prefix> ::= <Digit> <Digit> <Digit>

<Extension> ::= <Digit> <Digit> <Digit> <Digit>

<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

}, and

S = <PhoneNumber>.

Leftmost Derivation

Step 1: <PhoneNumber>

Step 2: <AreaCode> <Prefix> - <Extension>

Step 3: ( <Digit> <Digit> <Digit> ) <Prefix> - <Extension>

Step 4: ( 8 <Digit> <Digit> ) <Prefix> - <Extension>

Step 5: ( 8 0 <Digit> ) <Prefix> - <Extension>

Step 6: ( 8 0 0 ) <Prefix> - <Extension>

Step 7: ( 8 0 0 ) <Digit> <Digit> <Digit> - <Extension>

Step 8: ( 8 0 0 ) 5 <Digit> <Digit> - <Extension>

Step 9: ( 8 0 0 ) 5 5 <Digit> - <Extension>

Step 10: ( 8 0 0 ) 5 5 5 - <Extension>

Step 11: ( 8 0 0 ) 5 5 5 - <Digit> <Digit> <Digit> <Digit>

Step 12: ( 8 0 0 ) 5 5 5 - 1 <Digit> <Digit> <Digit>

Step 13: ( 8 0 0 ) 5 5 5 - 1 2 <Digit> <Digit>

Step 14: ( 8 0 0 ) 5 5 5 - 1 2 1 <Digit>

Step 15: ( 8 0 0 ) 5 5 5 - 1 2 1 2

How can you tell that this is the proper choice?

Parse Tree

• A rightmost derivation would produce the same parse tree:

Step 1: <PhoneNumber>

Step 2: <AreaCode> <Prefix> - <Extension>

Step 3: <AreaCode> <Prefix> - <Digit> <Digit> <Digit> <Digit>

And so forth

A Grammar for Expressions – version 1

G2.2.x = (T, N, P, S) where

T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, / },

N = {<Expression>, <Operator>, <Number>, <Digit>},

P = {

<Expression> ::= <Number>

<Expression> ::= <Expression> <Operator> <Expression>

<Operator> ::= + | - | * | /

<Number> ::= <Digit> | <Digit> <Number>

<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

}, and

S = <Expression>.

Do you spot a potential problem with this grammar?

Grammar Ambiguity

Review of Operator Precedence • Multiplicative operations have higher precedence

than additive operations

• Where in the parse tree would multiplicative operations appear relative to additive operations?

• How could the grammar be modified to make this happen?

A Grammar for Expressions – version 2

G2.2.F = (T, N, P, S) where

T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, / },

N = {<Expression>, <Term>, <Number>, <Digit>},

P = {

<Expression> ::= <Term>

<Expression> ::= <Term> + <Expression>

<Expression> ::= <Term> - <Expression>

<Term> ::= <Number>

<Term> ::= <Number> * <Term>

<Term> ::= <Number> / <Term>

<Number> ::= <Digit> | <Digit> <Number>

<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

}, and

S = <Expression>.

A Remaining Problem • How are operators of the same precedence

processed?

– Consider 34 – 12 + 16

– What would be the result if minus is done first?

– What would be the result if plus is done first?

– Which choice is correct?

• What associativity does version 2 of the grammar have?

• How can this be corrected?

A Grammar for Expressions – version 3

G2.2.G = (T, N, P, S) where

T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, / },

N = {<Expression>, <Term>, <Number>, <Digit>},

P = {

<Expression> ::= <Term>

<Expression> ::= <Expression> + <Term>

<Expression> ::= <Expression> - <Term>

<Term> ::= <Number>

<Term> ::= <Term> * <Number>

<Term> ::= <Term> / <Number>

<Number> ::= <Digit> | <Digit> <Number>

<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

}, and

S = <Expression>.
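The effect of the recursion direction on same-precedence operators can be sketched in Python. The two hypothetical functions below evaluate a flat token list the way a right-recursive rule (version 2 shape) and a left-recursive rule (version 3 shape) would group it. Only + and - are handled, and the token-list input format is an assumption.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def eval_right_recursive(tokens):
    """Grouping implied by <Expression> ::= <Term> op <Expression>."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[1]](tokens[0], eval_right_recursive(tokens[2:]))


def eval_left_recursive(tokens):
    """Grouping implied by <Expression> ::= <Expression> op <Term>."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[-2]](eval_left_recursive(tokens[:-2]), tokens[-1])


tokens = [34, "-", 12, "+", 16]
print(eval_right_recursive(tokens))  # groups as 34 - (12 + 16)
print(eval_left_recursive(tokens))   # groups as (34 - 12) + 16
```

The right-recursive grouping gives 6, while the left-recursive grouping gives 38, the conventional left-associative result.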

Homework Exercise • Draw the complete parse tree for the expression 2 – 3 + 4 * 5

• Use <E>, <T>, <N> and <D> for the internal nodes

A Homework Exercise • Suppose exponentiation (written with a ^) is added

to a programming language

– What is the precedence of exponentiation?

– Consider the expression 2 * 3 ^ 2

– How would the BNF change to give the correct result?

– What is the associativity for exponentiation?

– Consider the expression 4 ^ 3 ^ 2

– How would the BNF change to give the correct result?

• Your answer to this homework exercise will be version 4 of the grammar for expressions that handles exponentiation correctly

Syntax Diagrams

An alternative to BNF. Sometimes used in CS1/CS2 level classes to express the syntax of a programming language.

Extended BNF (EBNF) • EBNF uses special meta-symbols

– “*” to indicate 0 or more occurrences of a symbol

– “+” indicates 1 or more occurrences

– This replaces recursion with repetition

• Some Examples

– BNF notation: <Number> ::= <Digit> | <Digit> <Number>

– EBNF notation: <Number> ::= <Digit>+

– BNF notation: <AlphanumSeq> ::= empty | <Alphanum> <AlphanumSeq>

<Identifier> ::= <Alpha> <AlphanumSeq>

– EBNF notation: <Identifier> ::= <Alpha> <Alphanum>*
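The EBNF repetition symbols correspond directly to the `+` and `*` of regular expressions. A minimal Python sketch, assuming <Digit> means 0 through 9, <Alpha> means an ASCII letter, and <Alphanum> means an ASCII letter or digit (the slides do not define the last two):

```python
import re

# <Number> ::= <Digit>+
NUMBER = re.compile(r"[0-9]+")

# <Identifier> ::= <Alpha> <Alphanum>*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_number(text):
    """True when the whole string is one or more digits."""
    return NUMBER.fullmatch(text) is not None


def is_identifier(text):
    """True when the whole string is a letter followed by letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_number("123"), is_identifier("count"), is_identifier("1x"))
```

`fullmatch` is used so that the entire string must fit the rule, not just a prefix of it.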

BNF for Wren - 1

BNF for Wren - 2

Introduction to Lexical Analysis • A lexical analyzer, also known as a scanner, processes the source code one character at a time

– The goal is to produce a sequence of tokens that can be passed to a parser in order to build an abstract syntax tree

– A single character, such as ‘;’, may produce a token named semicolon

– An identifier will produce an identifier token, such as ide(count)

– A number will produce a number token, such as num(123)

– An FSA (finite state automaton) is a convenient way to represent the scanning process

– Here is an FSA for an identifier: it starts with an alphabetic character followed by any sequence of alphanumerics

– What would the FSA for a number look like?

Handling of Reserved Words • Every programming language contains reserved words

– Normally a declared identifier name cannot be a reserved word

– Consider the reserved word int: it must be distinguished from variable names that are a proper prefix of it, such as the variable i, and from variable names for which int is a proper prefix, such as integral

– The following FSA handles this situation

• The scanner as a large FSA

– The production of tokens must be unambiguous

– Some character sequences, such as 123a, should produce errors since they are neither an identifier nor a number
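A hand-written scanner along these lines can be sketched in Python. It reads the longest run of alphanumerics before deciding whether the word is reserved, which is one simple way to separate a reserved word from its prefixes and extensions. The token spellings ide(...), num(...) and eop follow the slides; the tables below cover only a small subset of Wren and, like the function name `scan`, are assumptions for illustration.

```python
RESERVED = {"program", "is", "var", "integer", "begin", "read", "write",
            "while", "do", "if", "then", "else", "end"}
TWO_CHAR = {":=": "assign", "<>": "neq"}
ONE_CHAR = {"<": "less", ",": "comma", ":": "colon",
            ";": "semicolon", "-": "minus"}


def scan(source):
    """Turn Wren source text into a list of token names."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            # Identifier state: consume the longest run of alphanumerics.
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            word = source[start:i]
            tokens.append(word if word in RESERVED else f"ide({word})")
        elif ch.isdigit():
            # Number state: digits only; a letter right after is an error.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            if i < len(source) and source[i].isalpha():
                raise ValueError(f"bad token near {source[start:i + 1]!r}")
            tokens.append(f"num({source[start:i]})")
        elif source[i:i + 2] in TWO_CHAR:
            tokens.append(TWO_CHAR[source[i:i + 2]])
            i += 2
        elif ch in ONE_CHAR:
            tokens.append(ONE_CHAR[ch])
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    tokens.append("eop")  # end-of-program marker
    return tokens


print(scan("while m <> n do n := n - 1 end while"))
```

Two-character symbols are tried before one-character symbols so that `:=` and `<>` are not split into two tokens.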

The Output of a Scanner for Wren

Enter name of source file: gcd.wren

program gcd is

var m,n : integer;

begin

read m; read n;

while m <> n do

if m < n then n := n - m

else m := m - n

end if

end while;

write m

end

Scan successful

[program,ide(gcd),is,var,ide(m),comma,ide(n),colon,integer,

semicolon,begin,read,ide(m),semicolon,read,ide(n),semicolon,

while,ide(m),neq,ide(n),do,if,ide(m),less,ide(n),then,ide(n),

assign,ide(n),minus,ide(m),else,ide(m),assign,ide(m),minus,

ide(n),end,if,end,while,semicolon,write,ide(m),end,eop]

Recall our phone number example

G2.2.A = (T, N, P, S) where

T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, -, ) },

N = {<PN>, <CC>, <AC>, <P>, <E>, <D>},

P = {

1: <PN> ::= <CC> <AC> <P> - <E>

2: <PN> ::= <AC> <P> - <E>

3: <CC> ::= <D> <D>

4: <AC> ::= ( <D> <D> <D> )

5: <P> ::= <D> <D> <D>

6: <E> ::= <D> <D> <D> <D>

7-16: <D> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

}, and

S = <PN>.

Here is the Parsing Stack

Parsing stack         Input stream    Commentary
(top on left)

<PN>                  (800)555-1212   Initial state
<AC><P>-<E>           (800)555-1212   Replaced <PN>: two possible productions, but the next input character is ( so choose production #2.
(<D><D><D>)<P>-<E>    (800)555-1212   Replaced <AC> by production #4
<D><D><D>)<P>-<E>     800)555-1212    Top of stack is a terminal that matches the next input character: pop stack, read character.
8<D><D>)<P>-<E>       800)555-1212    Replaced <D>: ten possibilities, but the input character suggests using <D> ::= 8.
<D><D>)<P>-<E>        00)555-1212     Matched top of stack with input: pop and read.
0<D>)<P>-<E>          00)555-1212     Replaced <D>
<D>)<P>-<E>           0)555-1212      Stack matches input: pop, read.
0)<P>-<E>             0)555-1212      Replaced <D>
)<P>-<E>              )555-1212       Stack matches input: pop, read.
<P>-<E>               555-1212        Stack matches input: pop, read.
<D><D><D>-<E>         555-1212        Replaced <P>
5<D><D>-<E>           555-1212        Replaced <D>
<D><D>-<E>            55-1212         Stack matches input: pop, read.
5<D>-<E>              55-1212         Replaced <D>
<D>-<E>               5-1212          Stack matches input: pop, read.
5-<E>                 5-1212          Replaced <D>
-<E>                  -1212           Stack matches input: pop, read.
<E>                   1212            Stack matches input: pop, read.
<D><D><D><D>          1212            Replaced <E>
1<D><D><D>            1212            Replaced <D>
<D><D><D>             212             Stack matches input: pop, read.
2<D><D>               212             Replaced <D>
<D><D>                12              Stack matches input: pop, read.
1<D>                  12              Replaced <D>
<D>                   2               Stack matches input: pop, read.
2                     2               Replaced <D>
Empty-stack           End-of-file     Stack matches input: pop, read.

Parse successful! Stack is empty and input consumed.
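The stack discipline traced above can be sketched in Python. The list `stack` keeps its top at index 0 to mirror "top of stack on left", and one character of lookahead picks the production. The function name, the table layout, and the use of bare strings such as "PN" and "P" (for the phone number and prefix nonterminals) are assumptions for illustration.

```python
DIGITS = set("0123456789")

# Nonterminals with exactly one production.
PRODUCTIONS = {
    "CC": ["D", "D"],
    "AC": ["(", "D", "D", "D", ")"],
    "P": ["D", "D", "D"],
    "E": ["D", "D", "D", "D"],
}


def parse_phone(text):
    """Return True when text is derivable from the phone number grammar."""
    stack = ["PN"]
    pos = 0
    while stack:
        top = stack.pop(0)
        look = text[pos] if pos < len(text) else None
        if top == "PN":
            if look == "(":
                stack[:0] = ["AC", "P", "-", "E"]        # production 2
            elif look in DIGITS:
                stack[:0] = ["CC", "AC", "P", "-", "E"]  # production 1
            else:
                return False       # nonterminal cannot be replaced
        elif top in PRODUCTIONS:
            stack[:0] = PRODUCTIONS[top]
        elif top == "D":
            if look not in DIGITS:
                return False       # nonterminal cannot be replaced
            stack.insert(0, look)  # replace the digit nonterminal
        else:
            if look != top:
                return False       # terminal does not match the input
            pos += 1               # pop (already done) and read
    return pos == len(text)        # stack empty: input must be used up


print(parse_phone("(800)555-1212"))
```

The three `return False` paths and the final comparison correspond to the ways a parse can fail: a nonterminal that cannot be replaced, a terminal mismatch, and leftover input.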

Unsuccessful Parses • The parse on the previous slide was successful

• Not all parses will be successful; a parse fails if

– A non-terminal on the stack cannot be replaced

– A terminal on the stack is not matched by the next terminal in the input

– The stack becomes empty but the input has not been completely consumed

More on Parsing Techniques • The textbook covers the following topics

– Use of FIRST sets for each nonterminal

– Left factoring

– Elimination of left recursion

– Recursive descent parsing

– Bottom up parsing

• These topics are important if you study compilers but we will not cover them in lecture at this time

• We will cover elimination of left recursion and recursive descent parsing when we build a scanner and parser in Prolog

Abstract Syntax Tree

A Simplified Tree • The Wren statement is if a <> 0 then x := x / a end if

• The simplified tree, shown below, contains all the necessary information for code generation

Abstract Production Rules for Wren • Notice the use of EBNF notation such as * and +

Output of a Wren Parser

Parse successful

prog([dec(integer,[m,n])],[read(m),read(n),while(bexp(neq,

ide(m),ide(n)),[if(bexp(less,ide(m),ide(n)),

[assign(n,exp(minus,ide(n),ide(m)))],

[assign(m,exp(minus,ide(m),ide(n)))])]),write(ide(m))])

Parse tree (children indented under their parent):

prog
  decSeq
    dec
      integer
      varList
        m
        n
  cmdSeq
    read
      m
    read
      n
    while
      <finish>
    write
      m

Complete the parse tree for the while loop as a homework problem

Three Address Intermediate Code • Given the Wren command

if a <> 0 then x := x / a end if

• This can be represented by the following three-address code:

(<>, t1, a, 0), // perform the test

(jump-false, t1, Label1, nil), // if the test failed (t1 is false), jump over the then block

(/, x, x, a), // here is the then block

(label, Label1, nil, nil) // end of the if-then

Stack-based Intermediate Code • Used by virtual machines such as the JVM (Java bytecode) and the .NET runtime (Common Intermediate Language)

• Advantages

– Compact object code

– Simple compilers and interpreters

– Minimal processor state

• Disadvantages

– More memory references

– More instructions, slower interpretation

– Does not take direct advantage of registers

Hand compiled code vs. machine compiled

get m; get n; L1 label; push m; push n; sub; tstne; jf L2;

push m; push n; sub; tstlt; jf L3; push n; push m; sub; pop n;

j L4; L3 label; push m; push n; sub; pop m; L4 label; j L1;

L2 label; put m; halt;

[[get,m],[get,n],[L1,label],[push,m],[pop,T1],[push,n],[pop,T2],

[push,T1],[push,T2],sub,tstne,[jf,L2],[push,m],[pop,T1],

[push,n],[pop,T2],[push,T1],[push,T2],sub,tstlt,[jf,L3],

[push,n],[pop,T1],[push,m],[pop,T2],[push,T1],[push,T2],sub,

[pop,n],[j,L4],[L3,label],[push,m],[pop,T1],[push,n],[pop,T2],

[push,T1],[push,T2],sub,[pop,m],[L4,label],[j,L1],[L2,label],

[push,m],[pop,T1],[put,T1],halt]

• Why is the machine compiled code so much longer?

Code Generation for if Command

Homework Exercise • A single alternative if command contains a true task

but no false task

• The BNF is if <boolean expression> then <command sequence> end if

• Draw the flow diagram and then propose the code generation scheme for the single alternative if

Code generation for a while command

Context Sensitivity • BNF is NOT context sensitive

– Each production rule has a single nonterminal on the left-hand side

– This means it cannot be context sensitive

• Name several instances where context sensitivity is essential for a programming language

• Context sensitive grammars are possible but they are overly complex

• Attribute grammars can solve these problems and retain the simplicity of a BNF grammar

What are attribute grammars? • Can transport information anywhere within an abstract syntax tree

– Inherited attributes pass information to nodes at lower levels in the tree

– Synthesized attributes pass information from child nodes back up to parent nodes

• Use of attribute grammars in Wren

– Ensure label numbers are unique for the entire program; uses inherited and synthesized attributes

– Ensure temporary variable numbers are unique within a single expression; uses inherited attributes

Using Attributes for Label Numbers

Executing a WIC program

(0) get m  (1) get n  (2) L1 label  (3) push m  (4) pop T1  (5) push n

(6) pop T2  (7) push T1  (8) push T2  (9) sub  (10) tstne  (11) jf L2

(12) push m  (13) pop T1  (14) push n  (15) pop T2  (16) push T1

(17) push T2  (18) sub  (19) tstlt  (20) jf L3  (21) push n  (22) pop T1

(23) push m  (24) pop T2  (25) push T1  (26) push T2  (27) sub  (28) pop n

(29) j L4  (30) L3 label  (31) push m  (32) pop T1  (33) push n

(34) pop T2  (35) push T1  (36) push T2  (37) sub  (38) pop m

(39) L4 label  (40) j L1  (41) L2 label  (42) push m  (43) pop T1

(44) put T1  (45) halt

• Instructions are numbered for reference purposes

• There are two stages for executing this program

– Read in the WIC code and perform all necessary preprocessing steps

– Execute the program until a halt instruction is encountered

Preprocessing

• Reading WIC from a file

– Put the instructions into an indexed list of instructions

– Put the label instruction in <operator, operand> format

• Building a Symbol Table

– Create a table for all variables and temps

– Initialize all values to zero

• Building a Jump Table

– Every label has a unique name

– The jump table stores the location of the label

• Creating the Runtime Stack and the Program Counter

Symbol Table

m     0
n     0
T1    0
T2    0

Jump Table

L1    2
L2    41
L3    30
L4    39

Running the Interpreter

The basic operation of the interpreter is:

(1) fetch the instruction specified by the program counter (PC)

(2) increment the PC in anticipation of sequential instruction execution

(3) execute the current instruction; some instructions change values in the symbol table (ST) or on the runtime stack, and the jump instructions may change the PC value

(4) if the instruction just executed is not halt, go to step (1)

Groups of Instructions - 1 • Input, Output and Halt

– get prompts the user to enter a value and stores it in the symbol table (ST)

– put prints out the value of a variable from the ST

– halt stops the interpreter

• Push and Pop

– push places either a literal value (e.g., push 0) or a variable value from the ST on top of the runtime stack

– pop removes a value from the top of the runtime stack and stores it in the specified variable in the ST

Groups of Instructions - 2 • Arithmetic Instruction operation

– Pop the right hand operand off the stack

– Pop the left hand operand off the stack

– Perform the operation (check for 0 divisors)

– Push the result back onto the stack

• Logical Instruction operation

– The and and or instructions perform the same four steps as arithmetic instructions

– The not instruction negates the value on top of the stack (0 → 1 or 1 → 0)

Groups of Instructions - 3 • Test Instruction operation

– Pop the value off the top of the stack

– Perform the indicated comparison of that value with 0 (choices: eq, ne, lt, le, gt, ge)

– Either push a 0 or a 1 onto the top of the stack based on whether the comparison is false or true

• Jump Instructions

– Unconditional jump: look up the label location in the jump table and place it in the program counter

– Conditional jump: remove the value from the top of the stack

• If it is false (= 0), look up the label location in the jump table and place it in the program counter

• If it is true (= 1), do nothing
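The preprocessing step and the fetch, increment, execute cycle described above can be sketched as a small Python interpreter. This is an illustration under stated assumptions, not the course's implementation: the function names are hypothetical, input values come from a list instead of a console prompt, output is collected in a list, literals are non-negative integers, and Python's `//` stands in for WIC's div.

```python
import operator

ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TESTS = {"tsteq": operator.eq, "tstne": operator.ne, "tstlt": operator.lt,
         "tstle": operator.le, "tstgt": operator.gt, "tstge": operator.ge}


def preprocess(source):
    """Return (indexed instruction list, symbol table, jump table)."""
    code, symbols, jumps = [], {}, {}
    lines = [line.split() for line in source.splitlines() if line.strip()]
    for index, parts in enumerate(lines):
        if len(parts) == 2 and parts[1] == "label":
            parts = ["label", parts[0]]      # <operator, operand> format
            jumps[parts[1]] = index
        elif parts[0] in ("get", "put", "push", "pop") and not parts[1].isdigit():
            symbols[parts[1]] = 0            # variables and temps start at zero
        code.append(tuple(parts))
    return code, symbols, jumps


def run(source, inputs):
    """Execute WIC text; inputs feeds get, the result lists every put."""
    code, st, jt = preprocess(source)
    stack, output, pc = [], [], 0
    inputs = iter(inputs)
    while True:
        op, *args = code[pc]                 # (1) fetch
        pc += 1                              # (2) increment
        if op == "halt":                     # (3) execute
            return output
        elif op == "get":
            st[args[0]] = next(inputs)
        elif op == "put":
            output.append(st[args[0]])
        elif op == "push":
            stack.append(int(args[0]) if args[0].isdigit() else st[args[0]])
        elif op == "pop":
            st[args[0]] = stack.pop()
        elif op in ARITH or op in ("div", "and", "or"):
            right = stack.pop()
            left = stack.pop()
            if op == "div":
                if right == 0:
                    raise ZeroDivisionError("WIC div by zero")
                stack.append(left // right)
            elif op == "and":
                stack.append(1 if left and right else 0)
            elif op == "or":
                stack.append(1 if left or right else 0)
            else:
                stack.append(ARITH[op](left, right))
        elif op == "not":
            stack.append(1 - stack.pop())
        elif op in TESTS:
            stack.append(1 if TESTS[op](stack.pop(), 0) else 0)
        elif op == "j":
            pc = jt[args[0]]
        elif op == "jf":
            if stack.pop() == 0:
                pc = jt[args[0]]
        elif op != "label":                  # a label does nothing at run time
            raise ValueError(f"unknown instruction {op!r}")


SQUARES = """get A
get B
push A
push A
mul
push B
push B
mul
sub
pop Result
put Result
halt"""

print(run(SQUARES, [5, 3]))
```

With inputs 5 and 3 the first WIC example from this chapter puts the single value 16.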

Preprocess and Interpret this WIC

get A

get B

push A

push B

sub

tstlt

jf L1

push B

pop MAX

j L2

L1 label

push A

pop MAX

L2 label

put MAX

halt

1. Show the indexed instruction list after preprocessing

2. Show the initial Symbol Table

3. Show the Jump Table

4. Execute the instructions one-by-one

a. give the instruction being executed

b. show the symbol table, stack, and program counter after the instruction is executed

Summary of Homework Problems • Complete the stack trace of the program on slide 12

• Hand compile the product program into WIC

• Draw the complete parse tree for the expression 2 – 3 + 4 * 5 Use <E>, <T>, <N> and <D> for the internal nodes

• Develop version 4 with exponentiation (^) of the grammar for expressions

• Complete the parse tree for the while loop in the gcd program

• Specify code generation for a single alternative if

• Complete the preprocessing and interpretation of the program shown on the previous slide
