CPS 506 Comparative Programming Languages
Compiling Process Steps
• Program → Lexical Analysis
– Convert characters into a stream of tokens
• Lexical Analysis → Syntactic Analysis
– Send tokens to develop an abstract representation or parse tree
Compiling Process Steps (cont’d)
• Syntactic Analysis → Semantic Analysis
– Send the parse tree to be analyzed for semantic consistency and converted to run efficiently on the target architecture (optimization)
• Semantic Analysis → Machine Code
– Convert the abstract representation to executable machine code using code generation
Formal Methods and Language Processing
• Meta-Language
– A language to define other languages
• BNF (Backus-Naur Form)
– A set of rewriting rules ρ
– A set of terminal symbols Σ
– A set of non-terminal symbols N
– A start symbol S ∈ N
– ρ : A → ω
– A ∈ N and ω ∈ (N ∪ Σ)*
– Right-hand side: a sequence of terminal and non-terminal symbols
– Left-hand side: a non-terminal symbol
BNF (cont’d)
• The words in N: grammatical categories
– Identifier, Expression, Loop, Program, …
– S: principal grammatical category
– Symbols in Σ: the basic alphabet
– Example 1:
binaryDigit → 0
binaryDigit → 1
• or
binaryDigit → 0 | 1
– Example 2:
Integer → Digit | Integer Digit
Digit → 0|1|2|3|4|5|6|7|8|9
BNF (cont’d)
• Derivation
Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 2 Digit Digit ⇒ 28 Digit ⇒ 281
• Parse Tree
Integer
    Integer
        Integer
            Digit
                2
        Digit
            8
    Digit
        1
BNF (cont’d)
• Lexeme: the lowest-level syntactic unit
• Tokens: the set of grammatical categories that classify strings of non-blank characters (lexical syntax)
– Identifier (variable names, function names,…)
– Literal (integer and decimal numbers,…)
– Operator (+,-,*,/,…)
– Separator (;,.,(,),{,},…)
– Keyword (int, if, for, where,…)
BNF (cont’d)
// comments …
void main ( ) {
    float p;
    p = 3.14 ;
}
• Comment: // comments …
• Keyword: void, float
• Identifier: main, p
• Operator: =
• Separator: ( ) { } ;
• Literal: 3.14
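The classification above can be sketched as a small tokenizer. This is only an illustration, not the course's lexer: the class name SimpleLexer, the keyword list, and the exact token patterns are assumptions chosen to cover the example program, and Java's regular-expression library is used as a convenience.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleLexer {
    // Hypothetical keyword list, just enough for the example program.
    static final Set<String> KEYWORDS = Set.of("void", "float", "int", "if", "for");

    // One alternative per token category. Order matters: a comment must be
    // tried before the "/" operator, and a literal before the "." separator.
    static final Pattern TOKEN = Pattern.compile(
          "(?<comment>//[^\\n]*)"
        + "|(?<literal>[0-9]+(\\.[0-9]+)?)"
        + "|(?<word>[a-zA-Z][a-zA-Z0-9]*)"
        + "|(?<operator>[-+*/=])"
        + "|(?<separator>[;.(){}])"
        + "|(?<space>\\s+)");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            // Match exactly at the current position so no input is skipped silently.
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            String text = m.group();
            if (m.group("comment") != null) {
                tokens.add("Comment(" + text + ")");
            } else if (m.group("literal") != null) {
                tokens.add("Literal(" + text + ")");
            } else if (m.group("word") != null) {
                // A word is a keyword only if it appears in the keyword list.
                tokens.add((KEYWORDS.contains(text) ? "Keyword(" : "Identifier(") + text + ")");
            } else if (m.group("operator") != null) {
                tokens.add("Operator(" + text + ")");
            } else if (m.group("separator") != null) {
                tokens.add("Separator(" + text + ")");
            }
            // Whitespace is consumed but produces no token.
            pos = m.end();
        }
        return tokens;
    }
}
```

Calling SimpleLexer.tokenize on the program text returns one labelled string per lexeme, in source order, starting with the comment and ending with the closing brace.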
Regular Expressions
• An alternative to BNF for defining a language’s lexical rules
– x : A character
– “abc” : A literal string
– A | B : A or B
– A B : Concatenation of A and B
– A* : Zero or more occurrences of A
– A+ : One or more occurrences of A
– A? : Zero or one occurrence of A
– [a-zA-Z] : Any alphabetic character
– [0-9] : Any digit
– . : Any single character
• Example
Integer : [0-9]+
Identifier : [a-zA-Z][a-zA-Z0-9]*
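As a rough illustration, the two example rules can be tried out with Java's java.util.regex package. The class and method names below are hypothetical. The character classes are written without spaces inside the brackets, because a space inside a Java character class would itself be matched.

```java
import java.util.regex.Pattern;

class LexicalRules {
    // Integer : one or more digits
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    // Identifier : a letter followed by zero or more letters or digits
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // matches() succeeds only if the whole string fits the pattern.
    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }
}
```

With these definitions, "281" is accepted as an Integer and "x1" as an Identifier, while "1x" is rejected by both rules.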
Syntactic Analysis
• Primary tool: BNF
• Input: Tokens from lexical analysis
• Output: Parse tree
• Syntactic categories
– Program
• Declaration
• Assignment
• Expression
• Loop
• Function definition
Syntactic Analysis (cont’d)
• Example
Arithmetic Expression → Term | Arithmetic Expression + Term | Arithmetic Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → Identifier | Literal | ( Arithmetic Expression )
Syntactic Analysis (cont’d)
• Example
2 * a - 3
Arithmetic Expression
    Arithmetic Expression
        Term
            Term
                Factor
                    Literal
                        Integer
                            2
            *
            Factor
                Identifier
                    Letter
                        a
    -
    Term
        Factor
            Literal
                Integer
                    3
Syntactic Analysis (cont’d)
• BNF limitations
– Declaration of identifiers?
– Initial value of identifiers?
• In statically typed languages
– The type system is used for the first problem
– Detected at compile time or at run time
Ambiguous Grammar
• A string can be parsed into two or more different trees
• Example
Exp → Identifier | Literal | Exp - Exp
Input: A - B - C
Output:
1- A - (B - C)
2- (A - B) - C
• Another example is the “dangling else”, which can be resolved by
– Using BNF rules
– Using extra-grammatical rules
Operator Precedence
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = B * (C + A)
Solution
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = (B * C) + A
Associativity of Operators
A + B + C    A * B * C    A / B / C    …
• Left Associativity
– Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
<expr> → <expr> + <term> | <term>
A + B + C is parsed as (A + B) + C
• Right Associativity
– Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | <id>
A + B ** C is parsed as A + (B ** C)
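To see how the shape of the rules fixes precedence and associativity, the sketch below parses an expression and prints it fully parenthesized. It assumes single-letter identifiers and combines the +, * and ** rules from the slides. The class name Grouping is hypothetical. Left-recursive rules are written as loops that fold to the left, while the right-recursive ** rule is a direct recursive call.

```java
class Grouping {
    private final String src;
    private int pos = 0;

    private Grouping(String src) {
        this.src = src.replace(" ", ""); // blanks carry no meaning here
    }

    // Returns the input with every binary operation parenthesized.
    static String group(String input) {
        Grouping g = new Grouping(input);
        String result = g.expr();
        if (g.pos != g.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + g.pos);
        }
        return result;
    }

    // <expr> -> <expr> + <term> | <term>   (left recursion written as a loop)
    private String expr() {
        String left = term();
        while (peek("+")) {
            pos++;
            left = "(" + left + " + " + term() + ")";
        }
        return left;
    }

    // <term> -> <term> * <factor> | <factor>   (also left associative)
    private String term() {
        String left = factor();
        while (peek("*") && !peek("**")) {
            pos++;
            left = "(" + left + " * " + factor() + ")";
        }
        return left;
    }

    // <factor> -> <exp> ** <factor> | <exp>   (right recursion)
    private String factor() {
        String base = exp();
        if (peek("**")) {
            pos += 2;
            return "(" + base + " ** " + factor() + ")";
        }
        return base;
    }

    // <exp> -> ( <expr> ) | <id>
    private String exp() {
        if (peek("(")) {
            pos++;
            String inner = expr();
            if (!peek(")")) {
                throw new IllegalArgumentException("Expected ) at " + pos);
            }
            pos++;
            return inner;
        }
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            return String.valueOf(src.charAt(pos++));
        }
        throw new IllegalArgumentException("Expected identifier at " + pos);
    }

    private boolean peek(String s) {
        return src.startsWith(s, pos);
    }
}
```

For example, Grouping.group("A + B + C") yields ((A + B) + C), while Grouping.group("A ** B ** C") yields (A ** (B ** C)).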
Extended BNF (EBNF)
• Optional part of an RHS
<if_stmt> → if ( <expr> ) <statement> [ else <statement> ]
• Repetition, replacing recursion, in part of an RHS
<id_list> → <id> { , <id> }
• Multiple choice option of an RHS
<term> → <term> ( * | / | % ) <factor>
• Optional use of * and +
<id_list> → <id> { , <id> }*
<integer> → {0 | … | 9}+
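One reason EBNF is convenient is that its repetition braces map directly onto a loop in a hand-written parser. The sketch below assumes a comma-separated identifier list, written here as <id_list> → <id> { , <id> }, and a token list that has already been split. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IdListParser {
    // Parses <id_list> -> <id> { , <id> } and returns the identifiers found.
    static List<String> parseIdList(List<String> tokens) {
        List<String> ids = new ArrayList<>();
        int pos = 0;
        ids.add(expectId(tokens, pos++));
        // The EBNF braces { , <id> } become a while loop: zero or more repetitions.
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos++; // consume ","
            ids.add(expectId(tokens, pos++));
        }
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
        return ids;
    }

    // Returns the token at pos if it is an identifier, otherwise fails.
    private static String expectId(List<String> tokens, int pos) {
        if (pos >= tokens.size() || !tokens.get(pos).matches("[a-zA-Z][a-zA-Z0-9]*")) {
            throw new IllegalArgumentException("Expected identifier at token " + pos);
        }
        return tokens.get(pos);
    }
}
```

An optional part in square brackets would become an if statement in the same way.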
Extended BNF (EBNF) (cont’d)
• opt subscript
Conditional Statement → if ( Expr ) Statement { else Statement }opt
• Syntax Diagram
[Figure: syntax diagram relating Term and Factor through the operators * | /]
Case Study
• A BNF or EBNF for one grammar, such as Expression, different Literals, or if Statement in Java, C, C++, or Pascal
• BNF or EBNF for floating point numbers in Java, C, C++
• BNF or EBNF for loop statements in one language
Abstract Syntax
• Consider the following code:
• Pascal
while i < 10 do
begin
    i := i + 1;
end;
• C or Java
while (i < 10) {
    i = i + 1;
}
Although the syntax differs, the two loops are essentially equivalent
• Abstract syntax is a way to show the essential elements of a language
Abstract Syntax (cont’d)
• General Form
Abstract Syntax Class = list of essential components
• Example
Loop = Expression test; Statement body
• A Java class for abstract syntax of loop
class Loop extends Statement {
    Expression test;
    Statement body;
}
(Slide callouts on the declarations above: Member, Element)
Abstract Syntax (cont’d)
• More examples
Assignment = Variable target; Expression source
• A Java class for abstract syntax of Assignment
class Assignment extends Statement {
    Variable target;
    Expression source;
}
(Slide callouts on the declarations above: Member, Element)
Abstract Syntax Tree
• A tree that shows the abstract syntax of a piece of code
• Example
x = 2;    x := 2;
Assignment = Variable target; Expression source
Statement
    Assignment
        Variable
            x
        Expression
            Value
                2
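A minimal sketch of how such a tree could be built in Java follows, reusing the Assignment class shape from the slides. The Variable and Value classes, the constructors, and the show, toC and toPascal helpers are assumptions added for illustration. In particular, treating Variable as a kind of Expression is a design choice made here, not something the slides state.

```java
abstract class Statement { }

abstract class Expression {
    // Renders the expression as source text (hypothetical helper).
    abstract String show();
}

class Variable extends Expression {
    final String name;

    Variable(String name) {
        this.name = name;
    }

    @Override
    String show() {
        return name;
    }
}

class Value extends Expression {
    final int intValue;

    Value(int intValue) {
        this.intValue = intValue;
    }

    @Override
    String show() {
        return Integer.toString(intValue);
    }
}

// Assignment = Variable target; Expression source
class Assignment extends Statement {
    final Variable target;
    final Expression source;

    Assignment(Variable target, Expression source) {
        this.target = target;
        this.source = source;
    }

    // The same abstract tree can be printed in either concrete syntax.
    String toC() {
        return target.show() + " = " + source.show() + ";";
    }

    String toPascal() {
        return target.show() + " := " + source.show() + ";";
    }
}
```

Building new Assignment(new Variable("x"), new Value(2)) gives one tree whose toC() and toPascal() results are x = 2; and x := 2; respectively, which is the point of abstract syntax: the tree keeps only the essential parts.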
Recursive Descent Parser
• A top-down parser that verifies the syntax of a stream of text from left to right
• It contains several recursive methods, each of which implements a rule of the grammar
• More details and parsing algorithms are covered in the Compiler course
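As a sketch of the idea, the recognizer below has one method per rule of the arithmetic expression grammar used earlier (Arithmetic Expression, Term, Factor). It only checks syntax and builds no tree. The class name and helper methods are hypothetical. Because a recursive descent method cannot call itself first without looping forever, the left-recursive rules are rewritten in their EBNF repetition form.

```java
class ExprRecognizer {
    private final String src;
    private int pos = 0;

    private ExprRecognizer(String src) {
        this.src = src;
    }

    // Returns true if the whole input is a valid arithmetic expression.
    static boolean accepts(String input) {
        ExprRecognizer p = new ExprRecognizer(input);
        try {
            p.arithmeticExpression();
            p.skipSpaces();
            return p.pos == p.src.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // ArithmeticExpression -> Term { (+ | -) Term }
    private void arithmeticExpression() {
        term();
        while (next('+') || next('-')) {
            term();
        }
    }

    // Term -> Factor { (* | /) Factor }
    private void term() {
        factor();
        while (next('*') || next('/')) {
            factor();
        }
    }

    // Factor -> Identifier | Literal | ( ArithmeticExpression )
    private void factor() {
        if (next('(')) {
            arithmeticExpression();
            if (!next(')')) {
                throw new IllegalStateException("expected ) at " + pos);
            }
            return;
        }
        skipSpaces();
        int start = pos;
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            // Identifier: a letter followed by letters or digits
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        } else {
            // Literal: one or more digits
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected factor at " + pos);
        }
    }

    // Consumes c (after optional blanks) if it is the next character.
    private boolean next(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }
}
```

ExprRecognizer.accepts("2 * a - 3") returns true, while inputs such as "2 * - 3" or "(a + b" are rejected.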
Exercises
1. Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
Exercises
2. Consider the following grammar:
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
1. baab
2. bbbab
3. bbaaaaa
4. bbaab
Exercises
3. Convert the following EBNF to BNF:
S → A { bA }
A → a [b]A
4. Using the grammar in question 1, add the ++ and -- unary operators of Java.
5. Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:
a) A = (A+B) * C
b) A = B * (C * (A + B))