Optimizing Compilers CISC 673 Spring 2011 Inlining

Post on 24-Feb-2016


UNIVERSITY OF DELAWARE • COMPUTER & INFORMATION SCIENCES DEPARTMENT

Optimizing Compilers
CISC 673
Spring 2011

Inlining

John Cavazos
University of Delaware

Background

• Inlining is important
  - Removes call overhead
  - Enables optimization opportunities
• Can be detrimental
  - Increased compilation time
  - Increased register pressure
  - Cache effects

Interprocedural Optimization

• Some optimizations are disrupted by calls
  - Constant propagation might stop at call site
• Possible solution: interprocedural optimization
  - Optimization that involves more than one function
  - Gets complicated (e.g., when functions are not in the same file)

Inlining

• Replace a function call with the body of the called function
• Assumed to be beneficial up to a certain point
• Enables optimizations
  - Constant folding, common subexpression elimination, better global register allocation
• Benefit of the enabled optimizations can outweigh the call overhead reduction

Inlining Advantages

• Eliminates call disruption
  - No register save/restore required
  - Call overhead removed
• Allows context-specific tailoring
• Eliminates call barrier for analysis/optimizations

Inlining Disadvantages

• Eliminates benefits of calls
  - A call resets state for register allocation; inlining increases register pressure
  - Procedure calls (reuse) keep code size small
• Compilation time increases
  - Larger functions
  - Code bloat

Inlining for Object-Oriented Languages

• Plays a particularly important role in optimization of OO languages
  - High ratio of calls (and overhead)
  - Many methods are short (e.g., setter/getter)
• Issues mapping virtual calls to concrete implementations
  - Requires inserting a run-time type test
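
The run-time type test mentioned above is often called guarded inlining. A minimal Python sketch of the idea follows; the classes `Circle` and `Square` and both function names are hypothetical, chosen only for illustration, and a real JIT would emit the guard in compiled code rather than in source.

```python
import math

class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        return math.pi * self.r * self.r

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

def total_area_virtual(shapes):
    # Baseline: every call goes through dynamic dispatch.
    return sum(s.area() for s in shapes)

def total_area_guarded(shapes):
    # Guarded inlining: Circle is assumed to be the likely receiver type.
    total = 0.0
    for s in shapes:
        if type(s) is Circle:
            # Type test passed: inlined body of Circle.area.
            total += math.pi * s.r * s.r
        else:
            # Type test failed: fall back to the ordinary virtual call.
            total += s.area()
    return total
```

Both functions compute the same result; the guarded version only avoids the dispatch when the predicted type is actually seen.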

Inlining example [figures not reproduced]

Inlining Transformation Easy

• Actual transformation is easy
  - Rewrite call site with callee's body
  - Rewrite formal parameter names with actual parameter names
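
The two rewrite steps above can be sketched on a toy expression tree. This is an illustrative sketch, not the deck's implementation: the tuple-based node encoding, the function `scale`, and all helper names are assumptions, and it handles only side-effect-free, non-recursive callees (a real compiler would typically bind actuals to temporaries). A small constant folder is included to show how inlining exposes folding opportunities.

```python
def substitute(expr, env):
    """Replace formal parameter names in expr with actual argument expressions."""
    kind = expr[0]
    if kind == "num":
        return expr
    if kind == "var":
        return env.get(expr[1], expr)
    if kind in ("add", "mul"):
        return (kind, substitute(expr[1], env), substitute(expr[2], env))
    if kind == "call":
        return ("call", expr[1], [substitute(a, env) for a in expr[2]])
    raise ValueError(f"unknown node kind: {kind}")

def inline(expr, functions):
    """Rewrite every call site with the callee's body (non-recursive callees only)."""
    kind = expr[0]
    if kind in ("num", "var"):
        return expr
    if kind in ("add", "mul"):
        return (kind, inline(expr[1], functions), inline(expr[2], functions))
    if kind == "call":
        formals, body = functions[expr[1]]
        actuals = [inline(a, functions) for a in expr[2]]
        bound = substitute(body, dict(zip(formals, actuals)))
        return inline(bound, functions)  # the body may itself contain calls
    raise ValueError(f"unknown node kind: {kind}")

def fold(expr):
    """Constant-fold add/mul nodes whose operands are both numbers."""
    kind = expr[0]
    if kind in ("add", "mul"):
        left, right = fold(expr[1]), fold(expr[2])
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if kind == "add" else left[1] * right[1]
            return ("num", value)
        return (kind, left, right)
    return expr

# Hypothetical callee: scale(x, k) = x * (k + 1)
functions = {"scale": (["x", "k"], ("mul", ("var", "x"), ("add", ("var", "k"), ("num", 1))))}
call_site = ("call", "scale", [("var", "a"), ("num", 2)])

inlined = inline(call_site, functions)
folded = fold(inlined)
```

Before inlining, the constant `2` is hidden behind the call; after inlining, `k + 1` becomes `2 + 1`, which the folder reduces to `3`.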

Inlining Decision Hard

• Resource constraint decision
  - Code size: must consider whole program and procedure
  - Excessive code growth leads to excessive compilation time (important for JITs!)
• Profitability depends on specific context
  - Can callee be tailored and optimized?
• Each decision affects profitability and resources available later!

Inlining Decision Hard

• Consider the following call graph [figure not reproduced]
  - Assign each edge a type {inline, no-inline}
• Choice at each edge affects other decisions
• Each decision has a profit and a cost (in terms of resources)
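
To make the profit/cost framing concrete, here is a deliberately simple greedy sketch that labels call-graph edges under a code-growth budget. It is not the method from these slides: the `CallEdge` type, the profit and cost numbers, and the ranking by profit per unit cost are all assumptions for illustration. It also treats edges as independent, which the slide warns is not true in practice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    profit: float  # estimated benefit if inlined (hypothetical units)
    cost: int      # estimated code growth if inlined (hypothetical units, > 0)

def label_edges(edges, budget):
    """Greedily label each edge 'inline' or 'no-inline' within a growth budget.

    Assumes at most one edge per (caller, callee) pair.
    """
    labels = {}
    remaining = budget
    # Consider the most profitable edges per unit of cost first.
    ranked = sorted(edges, key=lambda e: e.profit / e.cost, reverse=True)
    for e in ranked:
        if e.cost <= remaining:
            labels[(e.caller, e.callee)] = "inline"
            remaining -= e.cost
        else:
            labels[(e.caller, e.callee)] = "no-inline"
    return labels

# Hypothetical call graph with made-up estimates.
edges = [
    CallEdge("main", "parse", 40.0, 20),
    CallEdge("main", "log", 5.0, 10),
    CallEdge("parse", "next_token", 90.0, 15),
    CallEdge("parse", "error", 1.0, 30),
]
labels = label_edges(edges, budget=40)
```

With a budget of 40, the two edges with the best profit-to-cost ratio are inlined and use 35 units, so the remaining two no longer fit. An earlier choice thus constrains later ones even in this simplified model.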

Inlining Decision Procedures

• Some decisions are obvious
  - Inline small procedures (code smaller than linkage)
  - Inline procedures called only once
• Still lots of experimental work to do!
  - Cavazos 2005, Waterman 2006
  - Cooper, Hall, & Torczon or Davidson & Holler

Adaptive Decision Making

• How should we determine a good decision heuristic?
• Cavazos proposed an adaptive solution
  - Train a heuristic
  - Specialized for a given hardware or benchmark
• Prior art
  - Ad hoc (manually constructed) heuristics based on program properties
  - Combine ad hoc heuristics into a single test applied at each call site, applied in a fixed order

Proposed Solution

• Use machine learning
  - Features predict which methods to inline
  - Heuristic function controls inlining
• Tune heuristic to:
  - Different compilation scenario
  - Different architecture

Applying Genetic Algorithms

• Cross-validation
  - Evolve heuristic over set of benchmarks
  - Test on a different set of benchmarks
  - Average high performance
• Self-validation
  - Evolve heuristic for one benchmark
  - Best performance for benchmark

High Performance Compiler

• IBM Jikes RVM
  - Java JIT compiler
  - Tuned for server applications
• Commercial quality
• Used by several hundred researchers
• Over 100 publications
• Several papers on inlining

Default Inlining Heuristic

• Small methods
  - Always inline
• Medium-sized methods
  - Use static heuristic (IBM)
• Large methods
  - Never inline

Default Inlining Heuristic

    if (calleeSize > CALLEE_MAX_SIZE)
        return NO
    if (calleeSize < ALWAYS_INLINE_SIZE)
        return YES
    if (inlineDepth > MAX_INLINE_DEPTH)
        return NO
    if (callerSize > CALLER_MAX_SIZE)
        return NO
    return YES
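
The pseudocode above translates directly into a runnable Python sketch. The slides do not give threshold values, so the defaults in `InlineParams` are hypothetical placeholders; only the order of the four tests follows the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InlineParams:
    # Hypothetical values for illustration; not taken from the slides.
    callee_max_size: int = 50
    always_inline_size: int = 10
    max_inline_depth: int = 5
    caller_max_size: int = 2000

def should_inline(callee_size, caller_size, inline_depth, p=InlineParams()):
    """Apply the four tests in the same fixed order as the slide."""
    if callee_size > p.callee_max_size:
        return False   # large callee: never inline
    if callee_size < p.always_inline_size:
        return True    # small callee: always inline
    if inline_depth > p.max_inline_depth:
        return False   # already inlined too deeply
    if caller_size > p.caller_max_size:
        return False   # caller has grown too large
    return True        # medium-sized callee that passed every test
```

Because the tests run in a fixed order, a small callee is inlined even when the depth or caller-size limits are exceeded, as in `should_inline(5, 5000, 9)`.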

Genetic Algorithms

• Tune parameters of IBM heuristic
• Individual
  - Vector of integers
• Fitness is benchmark running time
• Tuning time
  - Few hours per benchmark
  - Few days per suite
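
A minimal sketch of such a tuning loop follows, under several assumptions: the individual is taken to hold four integer thresholds like those in the default heuristic, the ranges in `BOUNDS` are invented, and `synthetic_running_time` is a made-up stand-in for actually running a benchmark (the real fitness evaluation is what makes tuning take hours). The selection, crossover, and mutation scheme is a generic one, not necessarily the one used in the work described here.

```python
import random

# Hypothetical search range for each integer in the individual.
BOUNDS = [(1, 100), (1, 50), (1, 15), (100, 4000)]

def synthetic_running_time(ind):
    # Stand-in fitness (lower is better); NOT a real benchmark measurement.
    target = (40, 12, 4, 1500)  # arbitrary optimum, for illustration only
    return sum(abs(x - t) / (hi - lo) for x, t, (lo, hi) in zip(ind, target, BOUNDS))

def random_individual(rng):
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b, rng):
    # One-point crossover: prefix from a, suffix from b.
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(ind, rng, rate=0.2):
    # Each gene is independently replaced by a random in-range value.
    return [rng.randint(lo, hi) if rng.random() < rate else x
            for x, (lo, hi) in zip(ind, BOUNDS)]

def evolve(fitness, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # elitism: best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=fitness), history

best, history = evolve(synthetic_running_time)
```

Because the best half of each generation is carried over unchanged, the best fitness never gets worse from one generation to the next.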

Parameters Tuned by GA [table not reproduced]

Metric to Evaluate an Individual [content not reproduced]

Genetic Algorithms Primer [figure not reproduced]

Scenarios and Metrics

• Scenarios
  - Adaptive
  - Optimizing
• Metrics
  - Running time
  - Total time

Experimental Setup

• High-performance Java compiler
  - Jikes RVM 2.3.3
• Intel Pentium 4, 2.6 GHz
• PowerPC G4, 500 MHz (not shown)
• Training set
  - SPEC JVM benchmarks
• Test set
  - DaCapo benchmarks + SPEC JBB

Adaptive Scenario (SPEC JVM98) [chart not reproduced]

Adaptive Scenario (DaCapo+JBB) [chart not reproduced]

Optimizing Scenario (SPEC JVM98) [chart not reproduced]

Optimizing Scenario (DaCapo+JBB) [chart not reproduced]

Self-Tuned Results [chart not reproduced]

Conclusions

• Outperforms well-tuned heuristic
  - 37% total time reduction on Intel
  - 7% total time reduction on PowerPC
• Automatically tunes compiler heuristic
  - Compilation scenario
  - Different architectures