Joey Paquet, 2000, 2002, 2007, 2008 1
Concordia University
Department of Computer Science
COMP 442/6421 Compiler Design
Course Description
• Instructor
  – Name: Dr. Joey Paquet
  – Office: EV-3-221
  – Phone: 7831
  – e-mail: [email protected]
  – Web: www.cse.concordia.ca/~paquet
Course Description
• Topic
  – Compiler organization and implementation.
  – Lexical, syntax and semantic analysis. Code generation.
• Outline
  – Design and implementation of a simple compiler.
  – Lectures related to the project.
Course Description
• Grading
  – Assignments (4): 40%
  – Final Examination: 30%
  – Final Project: 30%
• Late assignment penalty: 50% per working day.
• Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.
Project Description
• Design and coding of a simple compiler
  – Individual work
  – Divided into four assignments
  – Final project is graded at the end of the semester, during a final demonstration
  – Testing is VERY important and up to you
Project Description
• A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code.
• Programming one will push you beyond your limits.
• It draws on most of the theoretical foundations of computer science.
• It will probably be the most complex program you have ever written.
Introduction to Compilation
• A compiler is a translation system.
• It translates programs written in a high-level language into a lower-level language, generally machine (binary) language.
[Figure: source code (source language) → compiler (translator) → target code (target language)]
Introduction to Compilation
• The only language that the processor understands is binary.
000100 0001 0011 1111
a      b    c    d
  a: Register addition (from a symbol table)
  b: First operand (R1)
  c: Second operand (R3)
  d: Third operand (R15)
Introduction to Compilation
• Assembly language was the first higher-level programming language.
• 000100000100111111 <=> Add R1,R3,R15
• There is a one-to-one correspondence between lines of assembly code and lines of machine code.
• An op-code table is sufficient to translate assembly language into machine code.
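As a minimal sketch of such a one-to-one translator, the Python code below assembles the `Add R1,R3,R15` example with an op-code table. The field layout (a 6-bit op-code followed by three 4-bit register numbers) is an assumption inferred from the bit pattern `000100000100111111`, and `OPCODE_TABLE`, `encode_register` and `assemble_line` are hypothetical names.

```python
# Hypothetical instruction format inferred from the slide's example:
# a 6-bit op-code followed by three 4-bit register numbers (18 bits in total).
OPCODE_TABLE = {"ADD": 0b000100}  # op-code table: mnemonic -> op-code


def encode_register(name: str) -> str:
    """Encode a register name such as 'R15' as a 4-bit binary string."""
    number = int(name.strip().upper().lstrip("R"))
    if not 0 <= number <= 15:
        raise ValueError(f"register out of range: {name}")
    return format(number, "04b")


def assemble_line(line: str) -> str:
    """Translate one line of assembly into exactly one line of machine code."""
    mnemonic, operands = line.split(maxsplit=1)
    opcode = format(OPCODE_TABLE[mnemonic.upper()], "06b")
    return opcode + "".join(encode_register(r) for r in operands.split(","))


print(assemble_line("Add R1,R3,R15"))  # 000100000100111111
```

Each input line yields exactly one output line, so a table lookup per mnemonic is enough; nothing about the rest of the program has to be analyzed.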
Introduction to Compilation
• Compared to binary, it greatly improved the productivity of programmers. Why?
• Though a great improvement, it is not ideal:
  – Not easy to write
  – Even less easy to read and understand
  – Extremely architecture-dependent
Introduction to Compilation
• A compiler translates a given high-level language into assembler or machine code.
X=Y+Z;
  L  3,Y     Load working register with Y
  A  3,Z     Add Z to working register
  ST 3,X     Store the result in X
000010010010110001001001010100100100101001
FORTRAN: The first compiler
• The problems with assembly led to the development of the first compiler: FORTRAN.
• Stands for FORmula TRANslation.
• Developed between 1954 and 1957 at IBM by a team led by John Backus.
• This was an incredible feat, as the theory of compilation was not available at the time.
Paving the way
• In parallel, Noam Chomsky was investigating the structure of natural languages.
• His studies led to the classification of languages according to their complexity (the Chomsky hierarchy).
• This classification was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem.
• These solutions have been used ever since.
• As the parsing solutions became well understood, efforts were devoted to the development of parser generators.
• The most commonly known is YACC (Yet Another Compiler-Compiler), developed by Steve Johnson in 1975 for the Unix system.
Compilation vs. Interpretation
• A compiler translates high-level instructions into machine code. An interpreter instead executes the program directly, statement by statement.
  – Advantage: immediate response
  – Drawbacks: inefficient with loops, restricted to single-file programs
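To illustrate "statement by statement", here is a minimal interpreter sketch in Python for a hypothetical toy language of additions. Nothing is translated ahead of time: each statement is analyzed and executed as soon as it is reached. The language and the `interpret` function are assumptions made for illustration.

```python
def interpret(program: str, env: dict[str, int]) -> dict[str, int]:
    """Execute a program statement by statement, without translating it first.

    Hypothetical toy language: statements are separated by ';' and have the
    form 'name = operand + operand + ...', where an operand is a variable
    name or a non-negative integer literal.
    """

    def value(operand: str) -> int:
        operand = operand.strip()
        return int(operand) if operand.isdigit() else env[operand]

    for statement in program.split(";"):
        if not statement.strip():
            continue
        target, expression = statement.split("=", 1)
        # Each statement is analyzed and executed on the spot; in a loop,
        # this analysis would be repeated on every iteration.
        env[target.strip()] = sum(value(term) for term in expression.split("+"))
    return env


print(interpret("Y = 2; Z = 3; X = Y + Z;", {}))  # {'Y': 2, 'Z': 3, 'X': 5}
```

Because the analysis is redone every time a statement runs, a loop body would be re-analyzed on each iteration; a compiler does that work once, before execution.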
Compiler’s Environment
• Building an executable from multiple files
[Figure: source code → compiler → object code; object code, compiled modules and run-time libraries → linker → executable code]
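The linker's job can be sketched as symbol resolution plus relocation. In the simplified Python version below, the `ObjectModule` layout and the `@symbol` notation for external references are illustrative assumptions rather than any real object-file format: modules are placed one after another, each defined symbol gets an absolute address, and references are then patched.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical, simplified object file produced by the compiler."""
    name: str
    code: list[str]  # instruction words; an external reference is written '@symbol'
    defines: dict[str, int] = field(default_factory=dict)  # symbol -> offset in module


def link(modules: list[ObjectModule]) -> list[str]:
    """Combine object modules into one executable image."""
    # Pass 1: lay the modules out one after another and record symbol addresses.
    addresses: dict[str, int] = {}
    base = 0
    for module in modules:
        for symbol, offset in module.defines.items():
            if symbol in addresses:
                raise ValueError(f"multiply defined symbol: {symbol}")
            addresses[symbol] = base + offset
        base += len(module.code)

    # Pass 2: copy the code, replacing each '@symbol' with its absolute address.
    executable: list[str] = []
    for module in modules:
        for word in module.code:
            if word.startswith("@"):
                symbol = word[1:]
                if symbol not in addresses:
                    raise ValueError(f"undefined symbol: {symbol}")
                word = str(addresses[symbol])
            executable.append(word)
    return executable


main = ObjectModule("main", ["CALL", "@print", "HALT"], {"main": 0})
runtime = ObjectModule("runtime", ["OUT", "RET"], {"print": 0})
print(link([main, runtime]))  # ['CALL', '3', 'HALT', 'OUT', 'RET']
```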
Phases of a Compiler
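[Figure: source code → lexical analysis → token stream → syntactic analysis → syntax tree → semantic analysis → annotated tree → high-level optimization → intermediate code → target code generation → target code → low-level optimization → optimized target code. The front-end spans lexical analysis through high-level optimization; the back-end spans target code generation and low-level optimization.]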
Lexical analysis
• Transforms the initial stream of characters into a stream of tokens:
  – keywords: while, to, do, int, main
  – identifiers: i, max, total, i1, i2
  – literals: 123, 12.34, “Hello”
  – operators: +, *, and, >, <
  – punctuation: {, }, [, ], ;
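A common way to implement this step is a scanner driven by regular expressions. The sketch below uses Python's `re` module; the token classes come from the slide, but the patterns themselves (including treating `=` as an operator and parentheses as punctuation) are assumptions.

```python
import re

# Token classes follow the slide; the exact regular expressions are assumptions.
KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}
TOKEN_SPEC = [
    ("literal", r'\d+\.\d+|\d+|"[^"]*"'),  # float, integer and string literals
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("operator", r"[+*<>=]"),
    ("punctuation", r"[{}\[\];()]"),
    ("whitespace", r"\s+"),
    ("error", r"."),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Turn a character stream into a list of (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "whitespace":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        elif kind == "identifier" and lexeme in WORD_OPERATORS:
            kind = "operator"
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("while (i < max) total = total + 12.34;"))
```

Listing the floating-point alternative before the integer one lets `12.34` match as a single literal instead of `12` followed by an unexpected `.`.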
Syntactic analysis
• Attempts to build a valid parse tree from the grammatical description of the language.
[Figure: parse tree for "Distance = rate * time;": S → id = E ;, with E → E * E and each inner E → id]
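A recursive-descent parser is one standard way to build such a tree. The Python sketch below assumes the grammar `S -> id = E ;` with `E -> id { (* | +) id }`. The slide's `E -> E * E` is ambiguous and left-recursive, which recursive descent cannot use directly, so it is rewritten with repetition while the resulting tree keeps the `E -> E * E` shape. The tokenizer, `Parser` and `parse` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Minimal tokenizer for this example; characters it does not know are skipped."""
    return re.findall(r"[A-Za-z_]\w*|[=*+;]", text)


class Parser:
    """Recursive-descent parser for the assumed grammar

        S -> id '=' E ';'
        E -> id { ('*' | '+') id }
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, ok, what: str) -> str:
        token = self.peek()
        if token is None or not ok(token):
            raise SyntaxError(f"expected {what}, found {token!r}")
        self.pos += 1
        return token

    def identifier(self):
        return ("id", self.expect(str.isidentifier, "identifier"))

    def statement(self):
        target = self.identifier()
        self.expect(lambda t: t == "=", "'='")
        expr = self.expression()
        self.expect(lambda t: t == ";", "';'")
        return ("S", target, "=", expr, ";")

    def expression(self):
        node = ("E", self.identifier())
        while self.peek() in ("*", "+"):
            op = self.expect(lambda t: True, "operator")
            node = ("E", node, op, ("E", self.identifier()))  # left-associative
        return node


def parse(text: str):
    parser = Parser(tokenize(text))
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("Distance = rate * time;"))
```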
Semantic Analysis
• The semantics of a program is its meaning.
• It is possible to have a syntactically valid program that does not have any meaning.
• Semantic analysis has two parts:
  – Semantic checking: validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes).
  – Semantic translation: giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.
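Semantic checking can be sketched as computing a `type` attribute for each expression node. The Python code below assumes a strictly typed toy language with no implicit conversions; the node classes and `type_of` are hypothetical. `rate * name` is syntactically valid, but it is rejected because it has no meaning when `name` is a string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Num:
    value: int | float


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: Num | Var | BinOp
    right: Num | Var | BinOp


def type_of(node: Num | Var | BinOp, declared: dict[str, str]) -> str:
    """Compute the 'type' attribute of an expression (semantic checking).

    Assumes a strictly typed toy language with no implicit conversions.
    """
    if isinstance(node, Num):
        return "float" if isinstance(node.value, float) else "int"
    if isinstance(node, Var):
        if node.name not in declared:
            raise TypeError(f"undeclared identifier: {node.name}")
        return declared[node.name]
    left, right = type_of(node.left, declared), type_of(node.right, declared)
    if left != right:
        raise TypeError(f"type mismatch: {left} {node.op} {right}")
    return left


declared = {"rate": "float", "time": "float", "name": "string"}
print(type_of(BinOp("*", Var("rate"), Var("time")), declared))  # float
# 'rate * name' parses fine but has no meaning: type_of raises TypeError.
```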
Semantic Translation: example
• Breaks the statements into small pieces corresponding roughly to machine instructions.
  x = a*y+z;
is translated into:
  t1 = a*y;
  t2 = t1+z;
  x = t2;
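This translation can be produced by a post-order walk of the expression tree that creates a fresh temporary for every operator. In the Python sketch below, the tuple encoding of trees and the names `gen_expr` and `translate_assignment` are assumptions; for `x = a*y+z` it yields the same three statements as the slide.

```python
import itertools


def gen_expr(tree, code: list[str], temps) -> str:
    """Emit three-address code for an expression; return the name holding its value.

    Assumed tree encoding: a leaf is a variable name, an inner node is a
    tuple (operator, left, right).
    """
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    left_name = gen_expr(left, code, temps)
    right_name = gen_expr(right, code, temps)
    temp = f"t{next(temps)}"  # fresh temporary: t1, t2, ...
    code.append(f"{temp} = {left_name}{op}{right_name};")
    return temp


def translate_assignment(target: str, tree) -> list[str]:
    code: list[str] = []
    result = gen_expr(tree, code, itertools.count(1))
    code.append(f"{target} = {result};")
    return code


# x = a*y+z;  prints t1 = a*y;  then t2 = t1+z;  then x = t2;
for line in translate_assignment("x", ("+", ("*", "a", "y"), "z")):
    print(line)
```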
High-Level Optimization
• The generated intermediate representation is often inefficient because of bad structure or redundancy.
• This kind of optimization is not bound to the target machine’s architecture.
  t1 = a*y;
  t2 = t1+z;
  x = t2;
can be optimized into:
  t1 = a*y;
  x = t1+z;
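This rewrite removes a copy: `x = t2` disappears and the statement that computed `t2` assigns `x` directly. A minimal Python sketch of that single transformation follows, assuming straight-line code in which each temporary is assigned once; the `(dest, expr)` statement encoding and the function names are hypothetical.

```python
import re


def is_temp(name: str) -> bool:
    return re.fullmatch(r"t\d+", name) is not None


def uses(expr: str, name: str) -> bool:
    return re.search(rf"\b{re.escape(name)}\b", expr) is not None


def fold_copies(code: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Eliminate copies 'x = tN' by making tN's definition assign x directly.

    Assumes straight-line code in which every temporary is assigned once;
    the fold is only done when tN is not used anywhere after the copy.
    """
    result: list[tuple[str, str]] = []
    for index, (dest, expr) in enumerate(code):
        if (is_temp(expr) and result and result[-1][0] == expr
                and not any(uses(later, expr) for _, later in code[index + 1:])):
            _, defining_expr = result.pop()
            result.append((dest, defining_expr))
        else:
            result.append((dest, expr))
    return result


before = [("t1", "a*y"), ("t2", "t1+z"), ("x", "t2")]
print(fold_copies(before))  # [('t1', 'a*y'), ('x', 't1+z')]
```

Nothing here refers to registers or instructions, which is what makes this a machine-independent optimization.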
Target Code Generation
• Translates the optimized intermediate representation into the target code (normally machine language or assembler).
  t1 = a*y;
  x = t1+z;
is translated into:
  LE  4,a     a in register 4
  ME  4,y     multiply by y
  AE  4,z     add z
  STE 4,x     store register 4 in x
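A naive code generator for this step can keep one working register and remember which name it currently holds, so `t1` never has to be stored. The Python sketch below emits the System/370-style instructions used in the example; the statement encoding, the default of register 4, and the rule that temporaries stay in the register are assumptions.

```python
import re

# System/370 short floating-point instructions, as in the slide's example.
OPCODES = {"+": "AE", "-": "SE", "*": "ME", "/": "DE"}


def generate(code: list[tuple[str, str]], reg: int = 4) -> list[str]:
    """Naive one-register code generator for statements (dest, 'left op right').

    Assumption: each temporary (t1, t2, ...) is used only as the left operand
    of the next statement, so it stays in the register and is never stored.
    Plain copies such as ('x', 'y') are not handled by this sketch.
    """
    out: list[str] = []
    in_register = None  # name whose value the register currently holds
    for dest, expr in code:
        left, op, right = re.fullmatch(r"\s*(\w+)\s*([-+*/])\s*(\w+)\s*", expr).groups()
        if in_register != left:
            out.append(f"LE {reg},{left}")  # load left operand
        out.append(f"{OPCODES[op]} {reg},{right}")  # operate with right operand
        in_register = dest
        if not re.fullmatch(r"t\d+", dest):
            out.append(f"STE {reg},{dest}")  # store named variables only
    return out


# prints LE 4,a / ME 4,y / AE 4,z / STE 4,x
for instruction in generate([("t1", "a*y"), ("x", "t1+z")]):
    print(instruction)
```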
Passes, Front End and Back End
• A pass consists of reading a high-level version of the program and writing a new, lower-level version.
• Several passes are often needed:
  – To resolve forward references
  – To limit the memory used by the different phases
Low-Level Optimization
• The generated target code is analyzed for inefficiencies such as dead code or code redundancy.
• Care is taken to exploit the CPU’s capabilities as much as possible.
• This phase is heavily architecture-dependent.
• This very complex area is still the subject of much research.
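Peephole optimization is one classic technique for this phase: it slides a small window over the generated instructions and replaces wasteful patterns. The Python sketch below handles a single pattern, a load of a variable immediately after storing it from the same register; the pattern and the text-based, System/370-style instruction format are assumptions.

```python
import re


def peephole(code: list[str]) -> list[str]:
    """Remove 'LE r,x' when it directly follows 'STE r,x' (redundant load).

    Hypothetical sketch: a real peephole optimizer checks many patterns and
    must also make sure no jump target sits between the two instructions.
    """
    result: list[str] = []
    for instr in code:
        load = re.fullmatch(r"LE (\d+),(\w+)", instr)
        if load and result:
            store = re.fullmatch(r"STE (\d+),(\w+)", result[-1])
            if store and store.groups() == load.groups():
                continue  # the register already holds x: drop the load
        result.append(instr)
    return result


before = ["LE 4,a", "ME 4,y", "STE 4,t1", "LE 4,t1", "AE 4,z", "STE 4,x"]
print(peephole(before))  # ['LE 4,a', 'ME 4,y', 'STE 4,t1', 'AE 4,z', 'STE 4,x']
```

The store of `t1` is kept because the value might still be read later; removing it would require a separate dead-store analysis.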
Passes, Front End and Back End
• The front-end is composed of lexical, syntactic and semantic analysis, and high-level optimization.
• In most compilers, most of the front-end is driven by the syntactic analyzer.
• The syntactic analyzer calls the lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized.
• The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process.
• The front-end has little or no concern with the target machine.
Passes, Front End and Back End
• The back-end is composed of: Code generation and low-level optimization.
• Uses the intermediate representation generated by the front-end to generate target machine code.
• Heavily dependent on the target machine.
• Independent of the programming language compiled.
System Support
• Symbol table
  – Central repository of identifiers (variable or function names) used in the compiled program.
  – Contains information such as the data type, or the value in the case of constants.
  – Used to identify undeclared or multiply declared identifiers, as well as type mismatches.
  – Provides temporary variables for intermediate code generation.
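A minimal single-scope symbol table covering these four duties might look like the Python sketch below. The class and method names are hypothetical, and real compilers usually also handle nested scopes, which this sketch ignores.

```python
import itertools


class SymbolTable:
    """Minimal single-scope symbol table (a hypothetical design for illustration)."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}
        self._temp_numbers = itertools.count(1)

    def declare(self, name: str, data_type: str, value=None) -> None:
        if name in self.entries:
            raise NameError(f"multiply declared identifier: {name}")
        self.entries[name] = {"type": data_type, "value": value}  # value: constants only

    def lookup(self, name: str) -> dict:
        if name not in self.entries:
            raise NameError(f"undeclared identifier: {name}")
        return self.entries[name]

    def check_assignment(self, target: str, value_type: str) -> None:
        expected = self.lookup(target)["type"]
        if expected != value_type:
            raise TypeError(f"type mismatch: cannot assign {value_type} to {target} ({expected})")

    def new_temp(self, data_type: str) -> str:
        """Create a fresh temporary variable for intermediate code generation."""
        while True:
            name = f"t{next(self._temp_numbers)}"
            if name not in self.entries:
                self.declare(name, data_type)
                return name


table = SymbolTable()
table.declare("rate", "float")
table.declare("MAX", "int", 100)
print(table.lookup("MAX"))  # {'type': 'int', 'value': 100}
print(table.new_temp("float"))  # t1
```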
System Support
• Error handling procedures
  – Implement the compiler’s response to errors in the code it is compiling.
  – Provide useful insight to the user about where the error is and what it is.
  – Should find all errors in the whole program.
  – Can attempt to correct some errors and only give a warning.
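Finding all errors requires the compiler to recover after each one. A simple strategy, in the spirit of panic-mode recovery, is to skip ahead to a synchronizing token such as `;` and resume. The Python sketch below applies this to a hypothetical language of one-line assignment statements; the statement pattern and `check_program` are assumptions.

```python
import re

# Assumed statement shape: name = operand { op operand }, operands being names or integers.
STATEMENT = re.compile(r"\s*[A-Za-z_]\w*\s*=\s*\w+(\s*[-+*/]\s*\w+)*\s*")


def check_program(source: str) -> list[str]:
    """Report every malformed statement instead of stopping at the first one.

    ';' acts as the synchronizing token: after an error, checking simply
    resumes with the next statement.
    """
    errors: list[str] = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        *statements, rest = line.split(";")
        for statement in statements:
            if statement.strip() and not STATEMENT.fullmatch(statement):
                errors.append(f"line {line_number}: malformed statement {statement.strip()!r}")
        if rest.strip():
            errors.append(f"line {line_number}: missing ';' after {rest.strip()!r}")
    return errors


for message in check_program("x = a + b;\ny = * c;\nz = 1"):
    print(message)
```

Each message states where the error is (the line) and what it is, and the error on line 2 does not prevent the one on line 3 from being found.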
System Support
• Run-time system
  – Some programming language concepts raise the need for dynamic memory allocation. What are they?
  – The running program must then be able to manage its own memory use.
  – Some will require a stack, others a heap. These are managed by the run-time system.
Writing of Early Compilers
• The first C compiler
[Figure: (1) minimal C compiler source → assembler → executable C compiler (minimal); (2) full C compiler source → C compiler (minimal) → executable C compiler (full)]
Writing Cross-Compilers
• A Unix-Macintosh C cross-compiler
[Figure: (1) Mac C compiler source code in Unix C → Unix C compiler → Mac C compiler usable on Unix; (2) Mac C compiler source code in Unix C → Mac C compiler usable on Unix → Mac C compiler usable on Mac]
Writing Retargetable Compilers
• Two methods:
  – Make a strict distinction between front-end and back-end, then use different back-ends.
  – Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.
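The first method can be sketched as one front-end output consumed by interchangeable back-ends. In the Python code below, the quadruple IR, the stack virtual machine and the accumulator-style register machine instruction sets are all hypothetical; only the back-end changes between targets.

```python
# Hypothetical intermediate representation from a shared front-end:
# quadruples (dest, left, op, right) for x = a*y+z.
IR = [("t1", "a", "*", "y"), ("x", "t1", "+", "z")]


def stack_vm_backend(ir) -> list[str]:
    """Back-end for an assumed stack-based virtual machine."""
    ops = {"+": "ADD", "*": "MUL"}
    code: list[str] = []
    for dest, left, op, right in ir:
        code += [f"PUSH {left}", f"PUSH {right}", ops[op], f"POP {dest}"]
    return code


def register_backend(ir, reg: str = "r1") -> list[str]:
    """Back-end for an assumed accumulator-style register machine."""
    ops = {"+": "add", "*": "mul"}
    code: list[str] = []
    for dest, left, op, right in ir:
        code += [f"load {reg}, {left}", f"{ops[op]} {reg}, {right}", f"store {reg}, {dest}"]
    return code


BACKENDS = {"stack-vm": stack_vm_backend, "register": register_backend}


def compile_ir(ir, target: str) -> list[str]:
    """The front-end output is reused unchanged; only the back-end varies."""
    return BACKENDS[target](ir)


print(compile_ir(IR, "stack-vm"))
print(compile_ir(IR, "register"))
```

Adding a new target means writing one more back-end function and registering it; the front-end is untouched.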
Summary
• The first compiler was the assembler, a one-to-one direct translator.
• Complex compilers were written incrementally, first using assemblers.
• The fundamental compilation techniques have been well known since the 1960s and early 1970s.