Chap. 10, Intermediate Representations J. H. Wang Dec. 14, 2015.

Post on 19-Jan-2016


Chap. 10, Intermediate Representations

J. H. Wang, Dec. 14, 2015

Outline

• Overview
• Java Virtual Machine
• Static Single Assignment Form

Overview

• Ch.7: AST
• Ch.8-9: Semantic analysis
• Ch.10: Intermediate representation
• Ch.11: Code synthesis for virtual machines
• Ch.12: Runtime support
• Ch.13: Target code generation

Overview

• Semantic gap between high-level source languages and target machine language

• Examples
  – Early C++ compilers
    • cpp: preprocessor
    • cfront: translates C++ into C
    • C compiler

Another Example

• LaTeX
  – TeX: designed by Donald Knuth
  – dvi: device-independent intermediate representation
  – ps: PostScript
  – pixels

• Portability enhanced

Challenges

• Challenges
  – An intermediate language (IL) must be precisely defined
  – Translators and processors must be crafted for an IL
  – Connections must be made between levels so that feedback from intermediate steps can be related to the source program
• Other concerns
  – Efficiency

The Middle-End

• Front-end: parser
• Back-end: code generator
• Middle-end: components between the front- and back-ends
• Compiler suites that host multiple source languages and target multiple instruction sets obtain great leverage from a middle-end
  – Ex: s source languages, t target languages
    • s*t separate compilers without a shared IL vs. s+t components (s front-ends and t back-ends) with one

Additional Advantages

• An IL allows various system components to interoperate by facilitating access to information about the program
  – E.g. variable names and types, and source line numbers could be useful in the debugger
• An IL simplifies development and testing of system components
• The middle-end contains phases that would otherwise be duplicated among the front- and back-ends
• It allows components and tools to interface with other products
• It can simplify the pioneering and prototyping of new ideas
• An IL and its interpreter can serve as a reference definition of a language
• Interpreters written for a well-defined IL are helpful in testing and porting compilers
• An IL enables the crafting of a retargetable code generator, which greatly enhances the compiler's portability
  – Pascal: P-code
  – Java: JVM
  – Ada: DIANA (Descriptive Intermediate Attributed Notation for Ada)

Java Virtual Machine

• Java class files: binary encodings of the data and instructions in a Java program
• Design principles (borrowed from the JVM reference)
  – Compactness
    • Instructions in nearly zero-address form
      – A runtime stack is used
      – Operands are implicit
        » E.g.: the iadd instruction pops two items and pushes their sum onto TOS (top of stack)
      – The cost is a loss of runtime performance
    • Multiple instructions to accomplish the same effect
      – To push 0 on TOS
        » iconst_0: 1 byte
        » ldc_w 0: 3 bytes
  – Safety
    • An instruction can reference storage only if it’s of the type allowed by the instruction, and only if the storage is located in an area appropriate for access
    • From security’s point of view, purely zero-address form is problematic
      – The registers that could be accessed by a load instruction may not be known until runtime
      – JVM: not zero-address
        » E.g. iload 5
    • When a class file is loaded, many other checks are performed by the bytecode verifier
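The stack-based, nearly zero-address style can be sketched with a toy interpreter. This is only an illustration, not how a real JVM is implemented: the class `TinyStackMachine` and its method names are hypothetical, and only int values are modeled. Note that `iadd` takes no operands, while `iload` and `istore` name their register explicitly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an operand stack plus numbered registers (hypothetical names).
final class TinyStackMachine {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final int[] registers;

    TinyStackMachine(int registerCount) {
        registers = new int[registerCount];
    }

    // Analogue of iconst_<n> / ldc: push a constant on top of stack (TOS).
    void iconst(int value) { stack.push(value); }

    // Analogue of "iload n": the register number is an explicit operand.
    void iload(int n) { stack.push(registers[n]); }

    // Analogue of "istore n": pop TOS into register n.
    void istore(int n) { registers[n] = stack.pop(); }

    // Analogue of iadd: operands are implicit; pop two ints, push their sum.
    void iadd() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    int top() { return stack.peek(); }

    // Models the sequence: iconst_2, istore 5, iload 5, iconst_3, iadd
    static int demo() {
        TinyStackMachine m = new TinyStackMachine(8);
        m.iconst(2);
        m.istore(5);
        m.iload(5);
        m.iconst(3);
        m.iadd();
        return m.top();
    }
}
```

Calling `TinyStackMachine.demo()` leaves 5 on top of the stack.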

Contents of a Class File

• A class file is organized into sections called attributes that contain various information about the compiled class
  – Types: primitive and reference types (Fig. 10.4)
    • Primitive type: a single character
    • Reference type t: Lt;
      – E.g.: String type in java.lang package: Ljava/lang/String;
    • Array of type a: [a
  – Constant pool
    • Tagged union
      – int, float, java.lang.String
    • An entry is referenced by its ordinal position, not its byte offset
      – 1 byte for some instructions, e.g. ldc
      – 2 bytes for some instructions, e.g. ldc_w
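The type encoding above (single character for primitives, `Lt;` for reference types, `[a` for arrays) can be sketched as a small helper. The class `Descriptors` and its method `of` are hypothetical names introduced for this illustration.

```java
// Builds JVM-style type descriptors from Class objects (hypothetical helper).
final class Descriptors {
    static String of(Class<?> type) {
        if (type.isArray()) {
            // Array of a: "[" followed by the descriptor of a.
            return "[" + of(type.getComponentType());
        }
        if (type.isPrimitive()) {
            // Primitive types are encoded as a single character.
            if (type == int.class) return "I";
            if (type == long.class) return "J";
            if (type == float.class) return "F";
            if (type == double.class) return "D";
            if (type == boolean.class) return "Z";
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == short.class) return "S";
            return "V"; // void
        }
        // Reference type t: "L", the name with '/' separators, then ";".
        return "L" + type.getName().replace('.', '/') + ";";
    }
}
```

For example, `Descriptors.of(String.class)` gives `Ljava/lang/String;` and `Descriptors.of(int[].class)` gives `[I`.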

JVM Instructions

• Arithmetic
• Register traffic
• Registers and types
• Static fields
• Instance fields
• Branching
• Other method calls
• Stack operations

Arithmetic

• Popping operands from the runtime stack, computing the result, and pushing the result on TOS
  – E.g. iadd
    • int: 32-bit, 2’s complement
  – For other primitive types
    • fadd (float)
    • ladd (long)
    • dadd (double)
  – Subtraction, multiplication, division, …
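The 32-bit two's-complement behavior of int addition is observable from Java source. A minimal sketch follows; the class and method names are hypothetical, and the instruction names in the comments indicate which arithmetic instruction each method relies on.

```java
final class IntArithmetic {
    // int addition (iadd): the result wraps around at 32 bits.
    static int addInt(int a, int b) {
        return a + b;
    }

    // long addition (ladd): 64-bit, so the same inputs do not overflow.
    static long addLong(long a, long b) {
        return a + b;
    }
}
```

With these definitions, `addInt(Integer.MAX_VALUE, 1)` wraps to `Integer.MIN_VALUE`, while `addLong(Integer.MAX_VALUE, 1)` yields 2147483648.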

Register Traffic

• JVM has an unlimited number of virtual registers
  – Usually allocated in a method’s stack frame
• JVM registers typically host a method’s local variables
  – Registers starting from 0 are set aside for a method’s parameters
• JVM registers are untyped
  – iload 2: push the int in register 2 (2 bytes)
    • iload_2: abbreviated form (1 byte)
  – istore 10: pop TOS into register 10
  – fload n: for float values
  – aload and astore: for reference types (32 bits for object references)
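How parameters and locals map onto registers can be seen in a small static method. The bytecode in the comments is what javac typically emits for this code; the exact output may vary by compiler and can be inspected with `javap -c`. The class name is hypothetical.

```java
final class RegisterTraffic {
    // Static method: parameters a and b occupy registers 0 and 1,
    // and the local variable c gets the next free register, 2.
    // (In an instance method, register 0 would hold "this".)
    static int sum(int a, int b) {
        int c = a + b;   // iload_0, iload_1, iadd, istore_2
        return c;        // iload_2, ireturn
    }
}
```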

Registers and Types

• Static analysis (or bytecode verification)
  – To ensure that values flow in and out of registers without compromising Java’s type system
• JVM appears to be stricter than the Java language
  – E.g. type conversion
    • i2f: from 2’s complement to IEEE floating point format
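The int-to-float case illustrates the difference in strictness: Java source allows the conversion implicitly, whereas at the bytecode level an explicit i2f instruction is needed to change representation. A minimal sketch with hypothetical names:

```java
final class Conversions {
    // Java accepts this implicit widening; the compiled method
    // converts the int with i2f before returning the float.
    static float toFloat(int i) {
        return i;
    }
}
```

Because a float carries only 24 bits of significand, the conversion is not always exact: `toFloat(16777217)` (that is, 2^24 + 1) yields 16777216.0f.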

Static Fields

• A class’s static fields are shared by all instances of the class: storage is allocated once per class, not per instance

• getstatic name type: push
  – E.g.: getstatic java/lang/System/out Ljava/io/PrintStream;
  – Only 3 bytes in representation
    • One: getstatic opcode
    • Two: 16-bit integer specifying a constant-pool entry
• putstatic: pop

Instance Fields

• A class can declare instance fields, for which instance-specific storage is allocated
• getfield name type: push
  – E.g.: getfield Point/x I
• putfield: pop
  – E.g.: putfield Point/x I
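The difference between static and instance field storage can be sketched in Java. The class `InstanceCounter` is hypothetical; the comments name the field-access instructions involved, written in the same notation as the slides.

```java
final class InstanceCounter {
    // One copy for the whole class: accessed with getstatic / putstatic,
    // e.g. getstatic InstanceCounter/created I
    static int created = 0;

    // One copy per object: accessed with getfield / putfield,
    // e.g. putfield InstanceCounter/id I
    int id;

    InstanceCounter() {
        created++;      // updates the single shared static field
        id = created;   // stores into this object's own field
    }
}
```

Creating two objects increments `created` twice, while each object keeps its own `id`.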

Branching

• Instructions to alter the control flow of the executing program
  – Unconditional
    • goto: 3 bytes
    • goto_w: 5 bytes
  – Conditional branches
    • Comparison against 0: ifeq, ifne, iflt, ifle, ifgt, ifge
    • Comparison of two int values: if_icmpeq, if_icmpne, if_icmplt, if_icmple, if_icmpgt, if_icmpge
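The two families of conditional branches correspond to two shapes of Java condition. In the sketch below, the comments show what javac typically emits, which is an assumption about compiler behavior rather than a guarantee; compilers usually branch on the negated condition to skip the guarded block. Class and method names are hypothetical.

```java
final class Branching {
    static int sign(int x) {
        if (x < 0) {     // comparison against 0: typically iload_0, ifge <skip>
            return -1;
        }
        if (x == 0) {    // comparison against 0: typically iload_0, ifne <skip>
            return 0;
        }
        return 1;
    }

    static int max(int a, int b) {
        if (a < b) {     // two int values: typically iload_0, iload_1, if_icmpge <skip>
            return b;
        }
        return a;
    }
}
```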

Static Method Calls

• Static methods are common to all instances of some type t
  – E.g.: Math.pow(double a, double b)
• invokestatic
  – invokestatic java/lang/Math/pow(DD)D
  – Parameters are pushed on the stack in left-to-right order
  – 3 bytes

• Method signature: (DD)D
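A method signature string such as (DD)D can be produced from Java with the standard `java.lang.invoke.MethodType` class. The wrapper class `Signatures` and its method `descriptor` are hypothetical names for this sketch.

```java
import java.lang.invoke.MethodType;

final class Signatures {
    // Returns the JVM method descriptor for the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }
}
```

For Math.pow, `Signatures.descriptor(double.class, double.class, double.class)` gives `(DD)D`; for a method taking a boolean and returning nothing, `Signatures.descriptor(void.class, boolean.class)` gives `(Z)V`.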

Instance-Specific Method Calls

• invokevirtual
  – invokevirtual java/io/PrintStream/print(Z)V
  – An instance must be pushed on the stack before the method’s parameters: this
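An ordinary instance method call in Java illustrates invokevirtual: the receiver is pushed first, and the method that runs depends on the receiver's runtime type. The classes below are hypothetical examples.

```java
class Shape {
    String name() { return "shape"; }
}

class Circle extends Shape {
    @Override
    String name() { return "circle"; }
}

final class VirtualDispatch {
    // s is pushed as the receiver ("this"), then name() is called
    // with invokevirtual; the override of the actual object runs.
    static String describe(Shape s) {
        return s.name();
    }
}
```

`VirtualDispatch.describe(new Circle())` returns "circle" even though the parameter's declared type is Shape.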

Other Method Calls

• invokespecial
• A constructor call is special
  – An uninitialized reference to an object instance is pushed on TOS
    • <init> method
  – No return value
• Methods called by invokevirtual are dispatched based on the actual (runtime) type of the instance; methods called by invokespecial are selected statically rather than by the instance’s runtime type
• invokespecial can also be used to invoke a private method, for efficiency reasons only
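Two common sources of invokespecial in Java code are constructor calls and super calls. The sketch below uses hypothetical classes; the instruction sequences in the comments describe typical javac output, not a guaranteed encoding.

```java
class Base {
    String who() { return "base"; }
}

class Derived extends Base {
    @Override
    String who() { return "derived"; }

    // super.who() is compiled with invokespecial, so Base's method runs
    // even though this object's runtime type overrides who().
    String parentWho() {
        return super.who();
    }
}

final class SpecialCalls {
    static Derived make() {
        // Typically: new Derived, dup, invokespecial Derived/<init>()V
        return new Derived();
    }
}
```

On the same object, `who()` returns "derived" while `parentWho()` returns "base".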

Stack Operations

• Instructions specifically for manipulating items near the TOS
  – To facilitate shorter instruction sequences for common program fragments
  – dup2: duplicates the top two cells to accommodate long and double types
  – dup: nicely accommodates multiple assignments
    • x=y=z=value
  – pop, swap
  – dup_x1: used in embedded assignment
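The multiple-assignment case can be written directly in Java. The comment shows the sequence javac typically produces for local variables, which is an assumption about compiler output: the value is loaded once, and dup keeps a copy on the stack for each further store. The class name is hypothetical.

```java
final class ChainedAssignment {
    // Registers: value = 0, x = 1, y = 2, z = 3
    static int[] assignAll(int value) {
        int x, y, z;
        x = y = z = value;   // typically: iload_0, dup, istore_3, dup, istore_2, istore_1
        return new int[] { x, y, z };
    }
}
```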

Static Single Assignment Form

• (omitted)

Thanks for Your Attention!