Semantic Analysis -...

CS2210: Compiler Construction

Semantic Analysis

CS2210: Compiler Construction Semantic Analysis

Compiler Phases and Errors

o Lexical analysis

â source code→ tokensâ detects inputs with illegal tokensâ does the input program use words defined in the

"dictionary" of the language?

o Syntax analysis

â tokens→ abstract syntax tree (AST)â detects inputs with incorrect structureâ is the input program grammatically correct?

o Semantic analysis

â AST→ (Modified) AST + Symbol Tableâ detects semantic errors (errors in meaning)â does the input program "make sense"?



o Lexical analysisâ source code→ tokensâ detects inputs with illegal tokensâ does the input program use words defined in the


o Syntax analysis

â tokens→ abstract syntax tree (AST)â detects inputs with incorrect structureâ is the input program grammatically correct?

o Semantic analysis






o Syntax analysisâ tokens→ abstract syntax tree (AST)â detects inputs with incorrect structureâ is the input program grammatically correct?

o Semantic analysis






o Syntax analysisâ tokens→ abstract syntax tree (AST)â detects inputs with incorrect structureâ is the input program grammatically correct?

o Semantic analysisâ AST→ (Modified) AST + Symbol Tableâ detects semantic errors (errors in meaning)â does the input program "make sense"?


Why is Semantic Analysis Required?

o Because programs use symbols (a.k.a. identifiers)â Identifiers require context to figure out the meaning

o Consider the English sentence: "He ate it."â This sentence is syntactically correctâ But it makes sense only in the context of a previous

sentence: "Sam bought a fish."

o Semantic analysisâ Associates identifiers with objects they refer to

"He"→ "Sam""it"→ "fish"

â Checks whether identifiers are used correctly"He" and "it" refer to some object: def-use check"it" is a type of object that can be eaten: type check


Why is Semantic Analysis Required?

o Context cannot be analyzed using a CFG parser

o Associating IDs to objects require expressing the pattern:

{wcw |w ∈ (a|b)∗}

The 1st w represents the definition of the ID,The c represents arbitrary intervening code,The 2nd w represents the use of that ID.This cannot be expressed using a context free grammar.

o Requires semantic analysis in a separate pass to analyze:â Def-use relationshipsâ Type consistency when using IDsâ Inheritance relationships between typesâ ...


A Simple Semantic Check

“Matching identifier definitions with uses”

o Important analysis in most languages

o If there are multiple definitions, which one to match?

void foo(){

char x;...{

int x;

...}x = x + 1;

}

?

?


A Simple Semantic Check

“Matching identifier definitions with uses”

o Important analysis in most languages

o If there are multiple definitions, which one to match?void foo(){

char x;...{

int x;

...}x = x + 1;

}

?

?


Scope

o A binding is the association of a use of an identifier to thedefinition of that identifierâ Which variable (or function) an identifier is referring to

o The scope of a definition is a portion of a program in whichthe binding is validâ Uses of identifier in the scope is bound to that definition

o Some properties of scopesâ The definition of an identifier is restricted by its scopeâ Scopes for the same identifier can never overlap - there is

always exactly one binding per identifier useâ If an identifier use is not within the scope of any definition,

the identifier is said to be undefined

o Two types: static scope and dynamic scope


Static Scope

o Static scope depends on the program text, not run-timebehavior (also known as lexical scoping)â C/C++, Java, Objective-C

o Rule: Refer to the closest enclosing definition

void foo(){

char x;

...{

int x;

...}x = x + 1;

}


Dynamic Scope

o Dynamic scoping depends on bindings formed during theexecution of the programâ Perl, Bash, LISP, Scheme

o Rule: Refer to the closest binding in the current executionvoid foo(){

(1) char x;(2) if (...) {(3) int x;(4) ...

}(5) x = x + 1;

}

o Which x ’s definition is the closest?â case (a): ...(1)...(2)...(5)â case (b): ...(1)...(2)...(3)...(4)...(5)


Static vs. Dynamic Scoping

o Most languages that started with dynamic scoping (LISP,Scheme, Perl) added static scoping afterwards

o Why?â With dynamic scoping...

There may be multiple possible bindings for a variableImpossible to determine bindings at compile timeAll bindings have to be done at execution time

â Static scoping is easier for human beings to understandBindings readily apparent from lexical structure of code

â Statis scoping is easier for compilers to understandCompiler can determine bindings at compile timeCompiler can translate identifier to a single memory locationResults in generation of efficient code


Symbol Table


Symbol Table

o A compiler data structure that tracks information about allidentifiers (symbols) in a programâ Each entry represents a definition of that identifierâ Maps use of IDs to definitions at any given point in programâ Contents updated whenever scopes are entered or exitedâ Built by either...

Traversing the parse tree from top to bottomUsing semantic actions as an integral part of parsing process

o Usually discarded after generating binary codeâ By then, all references to symbols have been mapped to

memory locations alreadyâ For debugging, symbols may be included in binary

To map memory locations back to symbol names whenusing debuggersFor GCC, using ‘gcc -g ...” includes debug symbol tables


Maintaining Symbol Table

o Basic ideaint x; ... void foo() { int x; ... x=x+1; } ... x=x+1 ...

â Before processing foo:Add definition of x , overriding old definition of x if any

â After processing foo:Remove definition of x , restoreng old definition of x if any

o Operationsenter_scope() start a new nested scopeexit_scope() exit current scope

find_symbol(x) find the information about xadd_symbol(x) add a symbol x to the symbol tablecheck_symbol(x) true if x is defined in current scope


Symbol Table Structure

o What data structure to choose ?â Listâ Binary treeâ Hash table

o Tradeoffs: time vs spaceâ Let us first consider the organization w/o scope


List

o Linked list or arrayid1 info1 id2 info2

id1 info1

id2 info2

... ...

o Array: no space wasted, insert/delete: O(n), search: O(n)

o Linked list: extra pointer space, insert/delete: O(1), search: O(n)

â Optimization: move recently used identifier to the headâ Frequently used identifiers are found more quickly


Binary Tree

v_city

v_salary v_age

Discussion:

o Use more space than array/list

o But insert/delete/search is O(log n) on balanced tree

o In the worst case, tree may reduce to linked list

â Then insert/delete/search becomes O(n)


Binary Tree

v_city

v_salary v_age

v_age

v_city

v_salary

Discussion:

o Use more space than array/list

o But insert/delete/search is O(log n) on balanced tree

o In the worst case, tree may reduce to linked listâ Then insert/delete/search becomes O(n)


Hash Table

id1 info1 id2 info2 nil

id3 info3

o hash(id_name)→ indexâ A hash function decides mapping from identifier to indexâ Conflicts resolved by chaining multiple IDs to same index

o Memory consumption from hash table (N << M)â M: the size of hash tableâ N: the number of stored identifiers

o But insert/delete/search in O(1) timeâ Can become O(n) with frequent conflicts and long chains


Adding Scope Information to the Symbol Table

o To handle multiple scopes in a program,â (Conceptually) need an individual table for each scopeâ Symbols added to the table may not be deleted just

because you exited a scope

class X { ... void f1() {...} ... }class Y { ... void f2() {...} ... }X v;call v.f1();

â Without deleting symbols, how are scoping rules enforced?� Keep a list of all scopes in the entire program� Keep a stack of active scopes at a given point


Dealing with Scopes Using Stack

P1P2

P3

P4P1’s Symbol Table

o Scopes are defined by nested lexical structuresâ Push pointer to symbol table when entering scopeâ Pop pointer to symbol table when exiting scopeâ Search from top of the active symbol table stack

o Above shows example with nested function definitions



P1P2

P3


P2’s Symbol Table

P3’s Symbol Table





P1P2

P3


P2’s Symbol Table

P3’s Symbol Table

P4’s Symbol Table




Dealing with Scopes Using Chaining

o Disadvantages of stacking symbol tablesâ Inefficient searching with deeply nested scopesâ Inefficient use of memory due to multiple hash tables

(Must size hash tables for avg anticipated size of scope)

o Solution: Single symbol table for all scopes using chaining

h(k)

h(i)

k info1 nl=2 k info2 nl=1 nil

i info3 nl=2

o Does not maintain list of all scopesâ Cannot refer back to old scopes laterâ But still useful for implementing block scopes


Dealing with Scopes Using Chaining

o Operation:â Insert: insert (ID, current nesting level) at front of chainâ Search: fetch ID at the front of chainâ Delete: when exiting level k , remove all symbols with level k

(IDs for each nesting level are maintained in a separate list.not shown in pic)

- Exiting scope discards it (suitable for block scopes)- Exiting scopes becomes slightly more expensive+ Lookup requires only a single hash table access+ Savings in memory due to single large hash table


What Information is Stored in the Symbol Table

o Entry in symbol Table:string kind attributes

â String — the name of identifierâ Kind — function, variable, type name, parameter

o Attributes vary with the kind of symbolsâ variable→ type, address of variableâ function→ prototype, address of function body

o Vary with the languageâ Fortran’s array→ type, dimensions, dimension sizes

real A(3,5) /* size required for static allocation */â C’s array→ type, dimensions, optional dimension size

int A[][5] /* A can be dynamically allocated */


Symbol Table Attribute List

o Type information might be arbitrarily complicatedâ In C: struct {

int a[10];char b;real c;

}

o Store all relevant attributes in an attribute listid array type 1st dimension lower bound upper bound

2nd dimension lower bound upper bound

id struct total size field1 type size

field2 type size


Now we have a Symbol Table with type information.How is this information useful for semantic analysis?


Type Checking


Types and Type Checking

o What is a type?Type = a set of values + a set of operations on these values

o What is type checking?Verifying and enforcing type consistencyâ Only legal values are assigned to a typeâ Only legal operations are performed on a type

o Some type checking examples:â Given char *str = "Hello";, str[2] is legal, str/2 is notâ Given const int pi = 3.14;, pi/2 is legal, pi=2 is not


Static Type Checking vs. Dynamic Type Checking

o Static type checking: type checking at compile timeâ Infers program is type consistent through code analysisâ E.g. int a, b, c; a = b + c; can be proven type consistent

because the addition of two ints is an int

o Dynamic type checking: type checking at execution time

â Type consistency by checking types of runtime values(Usually stored in the form of "type tags" alongside values)

â E.g. Java array bounds check:Is int A[10], i; ... A[i] = i; type consistent?

â E.g. C++/Java downcasting to a subclass:Is dynamic_cast<Child*>(parent); type consistent?

o Static type checking is always more desirable. Why?

â Always desirable to catch errors before runtimeâ Dynamic type checking carries runtime overhead





o Dynamic type checking: type checking at execution timeâ Type consistency by checking types of runtime values

(Usually stored in the form of "type tags" alongside values)â E.g. Java array bounds check:

Is int A[10], i; ... A[i] = i; type consistent?â E.g. C++/Java downcasting to a subclass:

Is dynamic_cast<Child*>(parent); type consistent?

o Static type checking is always more desirable. Why?

â Always desirable to catch errors before runtimeâ Dynamic type checking carries runtime overhead





o Dynamic type checking: type checking at execution timeâ Type consistency by checking types of runtime values

(Usually stored in the form of "type tags" alongside values)â E.g. Java array bounds check:

Is int A[10], i; ... A[i] = i; type consistent?â E.g. C++/Java downcasting to a subclass:

Is dynamic_cast<Child*>(parent); type consistent?

o Static type checking is always more desirable. Why?â Always desirable to catch errors before runtimeâ Dynamic type checking carries runtime overhead


Static vs. Dynamic Typing

o Statically typed: C/C++, Javaâ E.g. int x; /* type of x is int */â Types are explicitly defined or can be inferred from code+ Most type checking happens at compile time→ fast code

o Dynamically typed: Python, JavaScript, PHP

â E.g. var x; /* type of x unknown */â Type is a runtime property decided only during executionâ Same variable may go through different types in its lifetime

- Most type checking must happen at runtime→ slow code+ Flexibility results in shorter code / less programmer effort→ Suitable for scripting or prototyping languages

o In this lecture, we will focus on static type checking ofstatically typed languages, but same techniques can beapplied to dynamic typing


Static vs. Dynamic Typing

o Statically typed: C/C++, Javaâ E.g. int x; /* type of x is int */â Types are explicitly defined or can be inferred from code+ Most type checking happens at compile time→ fast code

o Dynamically typed: Python, JavaScript, PHPâ E.g. var x; /* type of x unknown */â Type is a runtime property decided only during executionâ Same variable may go through different types in its lifetime

- Most type checking must happen at runtime→ slow code+ Flexibility results in shorter code / less programmer effort→ Suitable for scripting or prototyping languages

o In this lecture, we will focus on static type checking ofstatically typed languages, but same techniques can beapplied to dynamic typing


Rules of Inference

o What are rules of inference?â Inference rules have the form

if Precondition is true, then Conclusion is trueâ Below concise notation used to express above statement

PreconditionConclusion

â For example: Given E3→ E1 + E2,if expressions E1, E2 are int types (Precondition),expression E3 is legal and has int type (Conclusion)

o Type checking via inferenceâ Start from variable and constant types at bottom of treeâ Repeatedly apply inference rules while working up the tree

until entire program is inferred legal


Notation for Inference Rules

o By tradition inference rules are written as

Precondition1, ..., Preconditionn

Conclusion

â The precondition/conclusion has the form “e:T”

o Meaningâ If Precondition1 and ... and Preconditionn are true,

then Conclusion is true.â “e:T” indicates “e is legal and is of type T”â Example: rule-of-inference for add operation

e1: inte2: int

e1+e2:int

Rule: If e1, e2 are ints then e1+e2 is legal and is an int


Two Simple Rules

[Constant]i is an integer

i: int

[Add operation]e1: inte2: int

e1+e2:int

o Example: given “2 is an integer” and “3 is an integer”, isthe expression “2+3” legal? Then, what is the type?

2 is an integer2: int


2+3:int

o This type of reasoning can be applied to the entire program


More Rules

[New]

new T: T

[Not]e: Boolean

not e: Boolean

o However,[Var?]

x is an identifierx: ?

â the expression itself insufficient to determine typeâ solution: provide context for this expression


Type Environment

o A type environment gives type info for free variablesâ A variable is free if not defined inside the expressionâ It is a function mapping Identifiers to Types

Set of definitions active at the current point in codei.e. Set of symbols and their types in symbol table


Type Environment Notation

Let O be a function (mapping) from Identifiers to Types,the sentence O e:T

is read as “under environment O, expression e has type T”


O e1: intO e2: int

O e1+e2: intO(x) = TO x: T

– “if 5 is an integer constant, expression 5 is an integer underany environment”– “if e1 and e2 are int under O, then e1+e2 is int under O”

– “if variable x is mapped to int in O, expression x is int under O”


Definition Rule

o Notation for definition: let x: T in e(Meaning: define x as type T in expression e)

[Definition w/o initialization]

O[T0/x] e1: T1O let x: T0 in e1: T1

O[T0/x] means, O is modified to return T0 on argument xand behave as O on all other arguments:O[T0/x](x) = T0

O[T0/x](y) = O(y) when x6=y

o Translation: If expression e1 is type T1 when x is mappedto type T0 in the current environment, then expression “letx: T0 in e1” is type T1 in the current environment.


Definition Rule with Initialization

[Definition with initialization (initial try)]

O[T0/x] e1: T1

O e0: T0

O let x: T0 ← e0 in e1: T1

o The rule is too strict (i.e. correct but not complete)Example

class C inherits P ...let x:P← new C in ...

� the above rule does not allow this code


Subtyping

o Subtyping is a relation ≤ on typesâ X ≤ Xâ if X inherits from Y, then X ≤ Yâ if X ≤ Y and Y ≤ Z, then X ≤ Z

o An improvement of our previous rule

[Definition with initialization]

O[T0/x] e1: T1

T ≤ T0

O e0: T

O let x: T0 ← e0 in e1: T1

â Both versions of definition rules are correctâ The improved version checks more programs


Assignment

o A correct but too strict rule[Assignment]

O(id) = T0O e1: T1T1 ≤ T0

O id← e1: T0

â The rule does not allow the below codeclass C inherits P { ... }let x: C, y: P inx← y← new C

â Type of y← new C becomes Pâ x← y is illegal, since P ≤ C not satisfied


Assignment

o An improved rule[Assignment]

O(id) = T0O e1: T1T1 ≤ T0

O id← e1: T1

â Now the rule allows the below codeclass C inherits P { ... }let x: C, y: P inx← y← new C

â Type of y← new C becomes C


If-then-else

o Considerif e0 then e1 else e2

The result can be either e1 or e2

The type is either e1’s type or e2’s type

â The best that we can do (statically) is the super type largerthan e1’s type and e2’s type

o Least upper bound (LUB)â Z = lub(X,Y) — Z is the least upper bound of X and Y iff

X≤Z ∧ Y≤Z ; Z is an upper boundX≤W ∧ Y≤W =⇒ Z≤W ; Z is least among all upper bounds


If-then-else

[If-then-else]

O e0: BoolO e1: T1O e2: T2

O if e0 then e1 else e2 fi: lub(T1,T2)

o The rule allows the below codelet x:float, y:int, z:float inx← if (...) then y else z/* Assuming lub(int, float) = float */


Implementing Type Checking on AST


Error Recovery

o Just like other errors, we should recover from type errorsâ One type error can have a cascade effect

let y: int← x+2 in y+3If x is undefined —- reports error “x type undefined”then, x+2 is illegal —- reports error “x+2 type undefined”then, y+3 is illegal —- reports error “y+3 type undefined”...

o Solution: assign no-type to first ill-typed expressionâ no-type is compatible with all types

Can be converted to any type to satisfy the type rulePrevents cascade of type errors due to one error

â Report the place where no-type is generatedReport the root cause of the problem


Wrong Definition Rule (case 1)

o Consider a hypothetical let rule

[Wrong Definition with initialization (case 1)]

O e0: TT ≤ T0

O e1: T1O let x: T0 ← e0 in e1: T1

â How is it different from the the correct rule?O e1: T1 instead of O[T0/x] e1: T1

â The following good program does not pass checklet x: int← 0 in x+1

â Reason: Precondition O e1: T1 (O x+1: int) is not satisfiedâ With correct rule: Precondition O[int/x] x+1: int is satisfied





O e0: TT0 ≤ T

O[T0/x] e1: T1O let x: T0 ← e0 in e1: T1

â How is it different from the the correct rule?T0 ≤ T instead of T ≤ T0

â The following bad program passes the checkclass C inherits P { only_in_C() { ... } }let x: C← new P in x.only_in_C()

â Reason: Precondition T0 ≤ T (C ≤ P) is satisfiedâ With correct rule: Precondition P ≤ C is not satisfied





O e0: TT ≤ T0

O[T/x] e1: T1O let x: T0 ← e0 in e1: T1

â How is it different from the the correct rule?O[T/x] e1: T1 instead of O[T0/x] e1: T1

â The following good program does not pass the checkclass C inherits P { ... }let x: P← new C in x← new P

â Reason: Precondition O[T/x] e1: T1 (O[C/x] x← new P: P)is not satisfied, since P ≤ C is not satisfied

â With correct rule: O[P/x] x← new P: P is satisfied


Discussion

o Type rules must walk a tight rope betweenâ making the type system unsound

(bad programs are accepted as well typed)â or, making the type system less usable

(good programs are rejected)

o What do we mean by a “good” program?â A program where all operations performed on all values are

type consistent at runtimeâ Runtime value types may differ from defined variables typesâ let x: C← new P in x.only_in_C() is a bad programâ let x: C← new P in x.in_P() is a good programâ Most type systems will reject both, even the good program

o Q: Can type systems be perfectly sound and usable?


Designing a Good Type System

o Type system has two conflicting design goalsâ Soundness: prevent “bad” programs from passingâ Usability: allow programmer to easily express a “good”

program within the boundaries of the type system

o Goal: Maximum usability without compromising soundness

â An example where the two goals collideclass Count {

int i = 0;Count inc() { i=i+1; return this; }

}class Stock inherits Count { ... }class Main {

Stock a← (new Stock).inc(); // Type error!}


What Went Wrong?

o What is (new Stock).inc()’s type?â Dynamic type — Stockâ Static type — Count

â The type checker “looses” the type informationâ This makes inheriting inc() useless

Redefining inc() for each subclass to return the correct typedefeats the whole purpose of inheritance

o SELF_TYPE can make type system more usable


SELF_TYPE: More Usability

o What is SELF_TYPE?â The type (or class) on which the method is calledâ If we define inc() as SELF_TYPE inc() { i=i+1; return this; },

Type of (new Stock).inc() would be Stock, not Countâ SELF_TYPE is a static type (apparent from program text)→ can be checked at compile time

o Usable: Allows more good programs to pass withoutburden to programmer

o Sound: Does not compromise type consistency

o In practice,â Designed into type systems of some languages (e.g. Scala)â Or emulated by others using templates / generics (e.g.

C++, Java...)


Can Static Type Systems ever be Perfect?

o Fundamentally impossible to express all runtime behaviorin a type system due to control divergenceâ E.g. Type of if-then-else is LUB of two types, a conservative

estimate of runtime behaviorx← if (y > 0) then 10 else "Hello"z← if (y > 0) then x/2 else x.append("World") // Type error?

â Above is a good program but will not pass type check

o Reason why statically typed languages sometimes resortto dynamic type checkingâ Why C++ / Java programmers are sometimes forced to

downcast with dynamic type checkingâ Programmer knows object is of child type at runtime in that

particular execution path, but type system doesn’t


Can Static Type Systems ever be Perfect?

o Must choose between usability and soundness

o Dynamically typed languages (Python, JavaScript, PHP ...)

+ Usable: No fiddling with type system to make programs run- Sound: No type checking done before running

o Statically typed languages (Java, C++, ...)+ Sound: Can discover many type errors before running- Usable: Must sometimes devise elaborate type systems

(using inheritance, generic programming, ...), to makeprograms run

o In between: gradual typing (Cython, TypeScript, ...)â Allows a choice between static and dynamic typing


Syntax Directed Translation


What is Syntax Directed Translation?

o To drive semantic analysis tasks based on the language’ssyntactic structure

o What is meant by semantic analysis tasks?â Generate / maintain symbol tableâ Add symbol definitions to symbol table with type informationâ Map symbol uses in AST to symbol definitions in tableâ Type check each expression based on mappingâ Generate intermediate representation (IR)

o What is meant by syntactic structure?â Structure of program given by context free grammar (CFG)


How is Syntax Directed Translation Performed?

o How?â Attach attributes to grammar symbols/parse treeâ Evaluate attribute values using semantic actions

o We already did some of this in Project 2:â Attached attributes to grammar symbols

tptr — “tree pointer” of a non-terminal symbolBy the time program.tptr is evaluated, the parse tree is built

â Evaluated attributes using semantic actions{ ... $$=makeTree(ProgramOp, leftChild, rightChild); ... }


Attributes?

o Attributes can represent anything depending on taskâ A stringâ A typeâ A numberâ A memory location

o An attribute grammar is a grammar augmented byassociating attributes (and rules to compute them) witheach grammar symbol


Specification and Implementation

o Specification: Syntax Directed Definitions (SDD)â Set of semantic rules attached to each productionâ Semantic rules: compute attributes or side-effects

E.g., computing the type of an expression (attribute)E.g., inserting a definition to the symbol table (side-effect)In essence, attribute grammar + side-effects

o Implementationâ Using Parse Trees

Apply semantic actions after parsingActions applied at each node during parse tree traversal

â Using Syntax Directed Translation Scheme (SDTS)Apply semantic actions in the course of parsingAugment grammar with semantic actions embedded withinRHS of productions


Specification and Implementation: Example

o Computing a prefix notation of expression (assume || is theconcatenation operator)

o Specification using semantic rulesE→ E1 + E2 RULE: E.out = ’+’ || E1.out || E2.outE→ id RULE: E.out = id.lexval

o One implementation using semantic actionsE→ E1 + E2 {E.out = ’+’ || E1.out || E2.out}E→ id {E.out = id.lexval}

o Another implementation using semantic actionsE.in — current prefix before producing EE.out — current prefix after producing EE→ {E1.in = E.in || ’+’} E1 + {E2.in = E1.out} E2 {E.out = E2.out}E→ id {E.out = E.in || id.lexval}


Specification: Syntax Directed Definition (SDD)

o Two types of attributes:â Synthesized attributes: attributes are computed from

attributes of children nodesP.synthesized_attr = f(C1.attr, C2.attr, C3.attr)

â Inherited attributes: attributes are computed fromattributes of sibling and parent nodes

C2.inherited_attr = f(P1.attr, C1.attr, C3.attr)

P

C1 C2 C3

Synthesized attribute

P

C1 C2 C3

Inherited attribute


Specification: Syntax Directed Definition (SDD)

o Attribute grammar

A → α β γ

Inherited

SynthesizedSDD has rule of the form for each CFG production

b = f(c1, c2, ..., cn)either

1. If b is a synthesized attributed of A,ci (1≤i≤n) are attributes of grammar symbols of its RightHand Side (RHS); or

2. If b is an inherited attribute of one of the symbols of RHS,cis are attribute of A and/or other symbols on the RHS


Synthesized Attribute Example

o Exampleâ Each non-terminal symbol is associated with val attributeâ Each grammar rule is associated with a semantic rule

Production Rules

L → EE → E1 + TE → TT → T1 * FT → FF → ( E )F → digit

Semantic Rules

print(E.val)E.val = E1.val + T.valE.val = T.valT.val = T1.val * F.valT.val = F.valF.val = E.valF.val = digit.lexval


Inherited Attribute Example

o Example:We can use inherited attributes to track type informationâ T — synthesized attribute “type”â L — has inherited attribute “in”

Production Rules

D → T LT → intT → realL → L1 , idL → id

Semantic Rules

L.in = T.typeT.type = integerT.type = realL1.in = L.in, addtype (id.entry, L.in)addtype(id.entry, L.in)


Attribute Parse Tree

o Parse tree showing values of attributesâ Parse tree annotated or decorated with attributesâ Attributes computed at each node

o Properties of attribute parse tree:â Terminal symbols — have synthesized attributes only,

which are usually provided by the lexical analyzerâ Start symbol — does not have any inherited attributes

E

E

T

T

F

digit2

* F

digit3

+ T

F

digit4

E.val=10

E.val=6

T.val=6

T.val=3

F.val=3

digit=3

F.val=3

digit=3

T.val=4

F.val=4

digit=4


Implementation 1: Using Parse Trees

o Creates an attribute parse tree from the given parse treeâ Traverse tree in a certain order and evaluate semantic

actions at each nodeâ Traversal order can be arbitrary as long as it adhers to

dependency relationships between attribute values


Dependency Graph

o Directed graph where edges are dependency relationshipsbetween attributesâ Needs to be acyclic such that there exists a traversal order

for evaluationi.e. all necessary information must be ready when evaluatingan attribute at a node

D

T

int

L1

L2

L3

id

, id

, id


Dependency Graph

o Directed graph where edges are dependency relationshipsbetween attributesâ Needs to be acyclic such that there exists a traversal order

for evaluationi.e. all necessary information must be ready when evaluatingan attribute at a node

D

T

int

L1

L2

L3

id

, id

, id

T.type=int L1.in=T.type

L2.in=L1.in

L3.in=L2.in

addtype(id.entry, L3.in)




Implementation 2: Using SDTS

o Using parse trees+ Flexible: works for all SDDs except for dependency cycles- Inefficient: needs (one or more) traversals of parse tree

o Is it possible to perform evaluation while parsing?â Embed semantic actions in grammar using SDTSâ What are some potential problems?

Parser may not have even "seen" some nodes yetSome dependencies may not exist at time of evaluation

â Different parsing schemes see nodes in different ordersTop-down parsing — LL(k) parsingBottom-up parsing — LR(k) parsing

o For certain classes of SDDs, using SDTS is feasibleâ if dependencies of SDD are amenable to parse orderâ In other words, a Left-Attributed Grammar


Left-Attributed Grammar

o An SDD is L-attributed if each of its attributes is

eitherâ a synthesized attribute of A in A→ X1...Xn,

orâ an inherited attribute of Xj in A→ X1...Xn that

depends on attributes of symbols to its left i.e. X1...Xj−1

and/or depends on inherited attributes of Acannot depend on attributes of symbols to its right i.e.Xj+1...Xn


Left-Attributed Grammar

o An L-Attributed grammarâ may have synthesized attributesâ may have inherited attributes but only from left sibling

attributes or inherited attributes of the parent

o Evaluation orderâ Left-to-right depth-first traversal of the parse tree

Evaluate inherited attributes while going down the treeEvaluate synthesized attributes while going up the tree

â Order for both top-down and bottom-up parsers

o Can be evaluated using SDTS w/o parse tree


Syntax Directed Translation Scheme (SDTS)

o Syntax: A→ α { action1 } β { action2 } ...

o Meaning: Actions are executed “at that point” in the RHSâ Top-down: Right after previous symbol consumedâ Bottom-up: Right after previous symbol pushed to stack

(when the ’dot’ reaches the action)

o Actions for synthesized attributes occur at end of RHSâ Children attributes for A are only available at the endâ What we did when building parse tree for project 2, since all

parse tree attributes are synthesized

o Actions for inherited attributes occur at middle of RHSâ Inherited attributes for β can be computed at action1

Inherited attribute of A available from beginningLeft sibling attributes for α available by then



o Example: arithmetic expression after left-recursion removalE → T RR → + T RR → - T RR → εT → ( E )T → num

o Both inherited and synthesized attributes are usedâ T — synthesized attribute T.valâ R — inherited attribute R.i

synthesized attribute R.sâ E — synthesized attribute E.val



o Evaluating attributes using SDTS

E → T {R.i=T.val} R {E.val=R.s}R → + T {R1.i=R.i+T.val} R1 {R.s=R1.s}R → - T {R1.i=R.i-T.val} R1 {R.s=R1.s}R → ε {R.s=R.i}T → ( E ) {T.val=E.val}T → num {T.val=num.val}

E

T

num

R1

+ T R2

num + T R3

num ε

T.val=num R1.i=T.val

num





E

T

num

R1

+ T R2

num + T R3

num ε

R1.i=T.val

T.val = num R2.i=R1.i+T.val

num





E

T

num

R1

+ T R2

num + T R3

num ε

E.val=R1.s

R1.i=T.val R1.s=R2.s

R2.i=R1.i+T.val R2.s= R3.s

T.val = num R3.i=R2.i+T.val R3.s= R3.i

num


Syntax Directed Translation SchemeImplementation


Translation Scheme for Bottom-Up Parsing

When using LR parsing (bottom-up parsing),

o it is natural and easy to evaluate synthesized attributes

T

T * F digit4

F digit3

digit2

T.val=6

T.val=3 F.val=3 digit4

F.val=3 digit3

digit2

parsing stack:

(state) (symbol) (attribute)

S? $ -S? T T.val=6





E

T

T * F digit4

F digit3

digit2

E.val=6

T.val=6


F.val=3 digit3

digit2

parsing stack:


S? $ -S? E E.val=6





E +

T F

T * F digit4

F digit3

digit2

E.val=6

T.val=6 F.val=4


F.val=3 digit3

digit2

parsing stack:


S? $ -S? E E.val=6

S? + -S? F F.val=4





E + T

T F

T * F digit4

F digit3

digit2

E.val=6 T.val=4

T.val=6 F.val=4


F.val=3 digit3

digit2

parsing stack:


S? $ -S? E E.val=6

S? + -S? T T.val=4





E

E + T

T F

T * F digit4

F digit3

digit2

E.val=10

E.val=6 T.val=4

T.val=6 F.val=4


F.val=3 digit3

digit2

parsing stack:


S? $ -S? E E.val=10




o it is not natural to evaluate inherited attributes

int , id

, id

id

T

D

L1

L2

L3

int


L2.in=L1.in

L3.in=L2.in

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

parsing stack:


S? $ -





int , id

, id

id

parsing stack:


S? $ -





int , id

, id

id

T

int

T.type=int

parsing stack:


S? $ -S? T T.type=int





int , id

, id

id

T

int

T.type=int

addtype(id,L3.in)

parsing stack:



S? id id.type=L3.in





int , id

, id

id

T

int

T.type=int

addtype(id,L3.in)

?

parsing stack:



S? id id.type=L3.in





int , id

, id

id

T

L2

L3

int

T.type=int

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

parsing stack:



S? id id.type=L3.in





int , id

, id

id

T

L2

L3

int

T.type=int

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

?parsing stack:



S? id id.type=L3.in


Evaluating Inherited Attributes using LR

o Recall

o Only applies to L-Attributed grammars� What is L-attributed grammar?

o Claim: the information is in the stack, we just do not knowthe exact location

o Solution: let us hack the stack to find the location


D → T LT → int {stack[top]=integer}T → real {stack[top]=real}L → L , id {addtype(stack[top],stack[top-3])}L → id {addtype(stack[top],stack[top-1])}

int , id

, id

id

T

D

L1

L2

L3

int


L2.in=L1.in

L3.in=L2.in

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

parsing stack:


S? $ -



int , id

, id

id

parsing stack:


S? $ -



int , id

, id

id

T

int

T.type=int

parsing stack:





int , id

, id

id

T

int

T.type=int

addtype(id,L3.in)

parsing stack:



S? id id.type=stack[top-1]



int , id

, id

id

T

int

T.type=int

addtype(id,L3.in)

stack[top-1]

parsing stack:






int , id

, id

id

T

L2

L3

int

T.type=int

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

parsing stack:



S? L3 L3.in=int

S?,




int , id

, id

id

T

L2

L3

int

T.type=int

addtype(id,L3.in)

addtype(id,L1.in)

addtype(id,L2.in)

stack[top-3]parsing stack:



S? L3 L3.in=int

S?,



Marker

o Given the following SDD, where |α| != |β|A→ X α Y | X β YY→ γ {... = f(X.s)}

o Problem: cannot generate stack location for X.s since X isat different relative stack locations from Y

o Solution: introduce markers M1 and M2 that are at thesame relative stack locations from YA→ X α M1 Y | X β M2 YY→ γ {... = f(M12.s)}M1→ ε {M1.s = X.s}M2→ ε {M2.s = X.s}(M12 = relative stack locations of M1 and M2, or top-|γ|)

o A marker intuitively marks a stack location that isequidistant from the reduced non-terminal


Example

o How to add the marker ?

Example 1:S → a A { C.i = A.s } CS → b A B { C.i = A.s } CC → c { C.s = f(C.i) }

Solution:S → a A { C.i = A.s } CS → b A B { M.i=A.s } M { C.i = M.s } CC → c { C.s = f(C.i) }M → ε { M.s = M.i }

That is:S → a A CS → b A B M CC → c { C.s = f(stack[top-1]) }M → ε { M.s = stack[top-2] }


How to Add the Marker?

1. Identify the stack location(s) to find the desired attribute2. Is there a conflict of location?

â Yes, add a marker;â No, no need to add.

3. Add the marker in the place to remove locationinconsistency

Example:

S → a A B C E DS → b A F B C F DC → c {/* C.s = f(A.s) */}D → d {/* D.s = f(B.s) */}


Answer

S → a A B C E DS → b A D M B C F DC → c {/* C.s = f(stack[top-2]) */}D → d {/* D.s = f(stack[top-3]) */}M → ε {/* M.s = f(stack[top-2]) */}

o Regarding C.s, from stack[top-2], and stack[top-3].... add a Marker

o Regarding D.s, always from stack[top-3]... no need to add


o How about Top-Down Parsing?


Translation Scheme for Top-Down Parsing

o Predictive Recursive Descent Parsers: Straightforwardâ Synthesized Attribute: Return value of function call for

non-terminal is synthesized attributeAll function calls for children nodes would have completed bythe time this function call returnsAll dependent values would have been computed

â Inherited Attribute: Pass as argument to function call fornon-terminal inheriting attribute

L-Attributed grammar guarantees that dependent attributescome from left sibling attributes or parent inherited attributesLeft sibling function calls would have completed and parentinherited attribute would have been passed in as argumentAll dependent values would have been computed

o Now let’s focus on table-driven LL Parsers


Translation Scheme for LL Parsing

When using LL parsing (top-down parsing),

o it is natural to evaluate inherited attributes

D

T L1

int id , L2

id , L3

id


int L2.in=L1.inaddtype(id,L1.in)

L3.in=L2.inaddtype(id,L2.in)

addtype(id,L3.in)

parsing stack:

(symbol) (attribute)





Dparsing stack:


D





D

T L1T.type=int L1.in=T.type

parsing stack:


L1 L1.in=()

T T.type=int





D

T L1T.type=int L1.in=T.type

parsing stack:


L1 L1.in=int





D

T L1

id , L2


L2.in=L1.inaddtype(id,L1.in)

parsing stack:


L2 L2.in=L1.in

,id id.type=L1.in




o it is not natural to evaluate synthesized attributes

E1

E2 + T1

T2 F1

T3 * F2 digit4

F3 digit3

digit2

E.val=10

E.val=6 T.val=4

T.val=6 F.val=4


F.val=3 digit3

digit2

parsing stack:


o Solution

1. Always push a ’dummy’ stack item below a non-terminal tohold RHS with place holders for children attributes

2. Update dummy item whenever a child attribute is computedand popped from stack

3. When all children attributes have been computed, computesynthesized attribute and pop from stack





E1

parsing stack:


E1

o Solution








E1

parsing stack:


E1 E1.val=?

E.val=?

o Solution








E1

E2 + T1

parsing stack:


E2 E2.val=?

+

T1 T1.val=?

E.val=?

E.val=? T.val=?

o Solution








E1

E2 + T1

parsing stack:


E2 E2.val=?

+

T1 T1.val=?

E.val=?

E.val=? T.val=?

o Solution1. Always push a ’dummy’ stack item below a non-terminal to

hold RHS with place holders for children attributes2. Update dummy item whenever a child attribute is computed

and popped from stack3. When all children attributes have been computed, compute

synthesized attribute and pop from stack





E1

parsing stack:


E1.val ???

E1









E1

E2 + T1

parsing stack:


E1.val E2.val + T1.val

E2.val ???

E2

+

T1.val ???

T1





Semantic Analysis -...

Documents

Transcript of Semantic Analysis -...