Learning A Better Compiler

Predicting Unroll Factors using Supervised Classification

Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

Predicting Unroll Factors

• Loop unrolling is sensitive to the unroll factor

• Current solution: expert design
– Difficult: hand-tuned heuristics
– Must be rewritten frequently

• Predict parameters with machine learning
– Easy: data collection takes ~1 week
– No human time
– Algorithm does not change with compiler

Loop Unrolling

• Combines multiple iterations of the loop body into one

• Fewer iterations, so less branching

• Allows other transformations:
– Exposes adjacent memory locations
– Allows instruction reordering across iterations
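As a rough illustration of the transformation (Python standing in for compiler output; the function names are hypothetical), unrolling a summation loop by a factor of 4 puts four copies of the body in each iteration and needs an epilogue loop when the trip count is not a multiple of the factor:

```python
def sum_rolled(xs):
    """Original loop: one loop test and branch per element."""
    total = 0
    i = 0
    while i < len(xs):
        total += xs[i]
        i += 1
    return total


def sum_unrolled4(xs):
    """Same loop unrolled by a factor of 4 (hypothetical example)."""
    n = len(xs)
    total = 0
    i = 0
    # Main loop: four copies of the body, one loop test per four elements.
    # The accesses xs[i]..xs[i+3] are adjacent and visible in one body.
    while i + 4 <= n:
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
        i += 4
    # Epilogue: handles the n % 4 leftover elements.
    while i < n:
        total += xs[i]
        i += 1
    return total
```

The interpreter gains nothing from this; the point is the shape a compiler produces. A larger factor removes more loop tests but makes the body (and the number of values live at once) grow, which is the tradeoff the unroll factor controls.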

Unroll Factors

• How many iterations to combine?

• Too small?
– Provides little benefit

• Too large?
– Increased cache pressure
– Longer live ranges, hence increased register pressure


Optimal Unroll Factors

Classification Problems

• Input: a vector of features
– E.g. nest depth, # of branches, # of ops

• Output: a class
– E.g. unroll factor, 1-8

• No prior knowledge required:
– Meaning of features/classes
– Relevance of features
– Relationships between features

Nearest Neighbors

• Paper describes a kernel density estimator

• All dimensions normalized to [0,1]

• Given a test point p:
– Consider training points "close" to p
  • Within a fixed distance, e.g. 0.3
– Majority vote among qualifying training points
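The voting rule above can be sketched as follows. Several details are assumptions, not taken from the paper: Euclidean distance, per-feature min/max bounds taken from the training set, test values clamped into [0, 1], and a fallback class when no training point lies within the radius. The function names and the toy feature vectors (nest depth, # branches, # ops) labelled with unroll factors are hypothetical.

```python
import math
from collections import Counter


def min_max_scale(train):
    """Return a function mapping feature vectors into [0, 1] per dimension."""
    dims = len(train[0])
    lo = [min(x[d] for x in train) for d in range(dims)]
    hi = [max(x[d] for x in train) for d in range(dims)]

    def scale(x):
        # Constant dimensions map to 0 (avoids division by zero);
        # values outside the training range are clamped into [0, 1].
        return [
            0.0 if hi[d] == lo[d]
            else min(1.0, max(0.0, (x[d] - lo[d]) / (hi[d] - lo[d])))
            for d in range(dims)
        ]

    return scale


def radius_vote(train_x, train_y, p, radius=0.3, default=1):
    """Majority vote among training points within `radius` of p, after scaling."""
    scale = min_max_scale(train_x)
    q = scale(p)
    votes = Counter(
        y for x, y in zip(train_x, train_y)
        if math.dist(scale(x), q) <= radius
    )
    if not votes:
        return default  # assumption: fall back to a default unroll factor
    return votes.most_common(1)[0][0]


# Hypothetical training data: (nest depth, # branches, # ops) -> unroll factor.
train_x = [[1, 0, 10], [1, 0, 12], [1, 1, 11], [3, 4, 80], [3, 5, 90]]
train_y = [8, 8, 4, 1, 1]
```

For example, `radius_vote(train_x, train_y, [1, 0, 11])` finds three neighbours within 0.3 (two labelled 8, one labelled 4) and returns 8. Normalizing first matters because otherwise a feature like "# ops" would dominate the distance purely because of its larger numeric range.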

Nearest Neighbors

Support Vector Machine

• Assume two classes (easily generalized to more)

• Transform the data
– Make the classes linearly separable

• Find the line that maximizes the separation margin

• For a test point:
– Perform the same transformation
– Classify based on which side of the learned line it falls
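One way to find a maximal-margin line is subgradient descent on the regularized hinge loss (the soft-margin SVM primal). This is a plain-Python sketch with labels in {-1, +1}; the learning rate, regularization constant, epoch count and toy data are arbitrary assumptions, and a real system would more likely use a QP solver or an SVM library.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent."""
    dims = len(xs[0])
    n = len(xs)
    w = [0.0] * dims
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularizer; hinge terms are added below.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point inside the margin or misclassified
                for d in range(dims):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the learned line x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable 2-D data.
xs = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
```

For more than two classes (e.g. unroll factors 1-8), a common generalization is one-vs-rest: train one such classifier per class and pick the class whose decision value is largest. Whether the paper uses this particular scheme is not stated here.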

Maximal Margin

Non-Linear SVM
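The "transform data" step can be made concrete with a degree-2 polynomial feature map (an illustrative choice, not necessarily the transform used in the paper). The sketch checks that the kernel k(x, z) = (x·z)^2 equals the dot product of explicitly transformed vectors, which is what lets an SVM work in the transformed space without building it, and shows that points separable only by a circle become separable by a plane after the map. The plane here is picked by hand rather than learned.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2


# Class -1 inside the unit circle, class +1 outside: no line separates
# them in 2-D, but in phi-space the plane phi_1 + phi_3 = 1 does.
inner = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.5)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.0, -2.0)]


def classify_circle(x):
    """Linear decision in phi-space with hand-picked w = (1, 0, 1), b = -1."""
    f = phi(x)
    return 1 if f[0] + f[2] - 1.0 > 0 else -1
```

For example, `poly_kernel((1, 2), (3, 4))` is 11^2 = 121, and the dot product of `phi((1, 2))` and `phi((3, 4))` gives the same value up to rounding.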

Some Features

• # operands
• Live range size
• Critical path length
• # operations
• Known trip count
• # floating-point ops
• Loop nest level
• # branches
• # memory ops
• Instruction fan-in in DAG
• # instructions
• Language: C, Fortran
• # implicit instructions
• & more (38 total)

Results: No Software Parallelism

Results: With Software Parallelism

Big Idea: Easy Maintenance

• Performance improvements are modest
– Sometimes worse, sometimes much better
– Usually little change

• Requires no re-tuning when the compiler changes
– Gathering data takes ~1 week, no human time

• General mechanism
– Can be applied to all parameters
– No model of the system needed

• Can be applied to new transformations where expert knowledge is unavailable

Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

Dynamic Voltage Control

• Monitor the system

• When activity is low, reduce power
– Also reduces computational capacity
– May need more energy if the work takes longer
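The last point can be checked with a toy energy model. Assume (a simplification, not taken from the paper) that power is a static part plus a dynamic part proportional to V^2·f, and that a job needs a fixed number of cycles, so runtime is cycles/f. All constants below are hypothetical.

```python
def job_energy(cycles, freq_hz, volts, c_eff=1e-9, p_static=0.5):
    """Energy in joules for a job of `cycles` at one frequency/voltage setting.

    Hypothetical model: P = p_static + c_eff * V^2 * f, t = cycles / f.
    """
    runtime = cycles / freq_hz
    power = p_static + c_eff * volts ** 2 * freq_hz
    return power * runtime
```

With the default constants, a 1e9-cycle job costs about 1.69 J at 2 GHz / 1.2 V and 1.5 J at 1 GHz / 1.0 V, so slowing down saves energy. With `p_static=2.0` the same two settings cost about 2.44 J and 3.0 J: when static power dominates, stretching the runtime costs more than the dynamic-power saving.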

Multiple Clock Domains

• Adjust separate components independently

• Better performance/power
– E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance

• Requires a more complex DVM policy

Motivation

• Applications go through phases

• Frequencies/voltages should change with them

• Focus on core and L2 cache
– Together they consume a large fraction of total power

• Best policy may change over time
– On battery: conserve power
– Plugged in: maximize performance

Learning a DVM Policy

• Compiler automatically instruments code
– Inserts sampling code to record performance counters
– Instruments code only to gather data

• Use machine learning to create policy

• Implement policy in microcontroller

ML Parameters

• Features
– Clock cycles per instruction (CPI)
– L2 accesses per instruction (L2PI)
– Memory accesses per instruction

• Select voltage to minimize:
– Total energy
– Energy × delay
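A sketch of how the three features could be derived from sampled counter deltas, and how a setting is chosen under either objective. The counter names, the candidate (core MHz, cache MHz) settings and their energy/delay figures are hypothetical; in practice each candidate's energy and delay would have to come from measurement or simulation of the training runs.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical per-interval counter deltas recorded by the instrumentation."""
    cycles: int
    instructions: int
    l2_accesses: int
    mem_accesses: int


def features(s: CounterSample):
    """Return (CPI, L2PI, memory accesses per instruction) for one interval."""
    n = s.instructions
    return (s.cycles / n, s.l2_accesses / n, s.mem_accesses / n)


def best_setting(candidates, objective="energy"):
    """Pick the setting minimizing energy or energy*delay.

    candidates maps a (core_mhz, cache_mhz) setting to (energy_j, delay_s).
    """
    def cost(item):
        energy, delay = item[1]
        return energy if objective == "energy" else energy * delay

    return min(candidates.items(), key=cost)[0]


# Hypothetical measurements for one program interval.
candidates = {
    (2000, 1000): (3.0, 1.0),
    (2000, 500): (2.4, 1.5),
    (1000, 500): (2.0, 2.2),
}
```

Here minimizing energy picks the slowest setting `(1000, 500)`, while minimizing energy × delay picks `(2000, 1000)`, which is why the objective has to be fixed before learning. Pairing each interval's features with its best setting would give (features, class) examples for the rule learner.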

Machine Learning Algorithm

• Automatically learn a set of if-then rules
– E.g.: if (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz

• Compact, expressive

• Can be implemented in hardware
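Learned rules of this kind reduce to threshold comparisons, which is why they map well onto a microcontroller. The sketch evaluates an ordered, first-match rule list (one common form for such rule sets). The thresholds and frequencies are made up for illustration and are not the rules learned in the paper.

```python
# Each rule: (condition on (cpi, l2pi, mpi), (core_mhz, cache_mhz)).
# Thresholds and settings are hypothetical, for illustration only.
RULES = [
    # Heavy L2 traffic and stalls: the core can slow down.
    (lambda cpi, l2pi, mpi: l2pi >= 0.05 and cpi >= 2.0, (1000, 1000)),
    # Barely touches L2: the cache can slow down.
    (lambda cpi, l2pi, mpi: l2pi < 0.01, (2000, 500)),
]
DEFAULT = (2000, 1000)


def choose_frequencies(cpi, l2pi, mpi, rules=RULES, default=DEFAULT):
    """Return the setting of the first rule whose condition holds."""
    for condition, setting in rules:
        if condition(cpi, l2pi, mpi):
            return setting
    return default
```

In hardware, each condition could be a few comparators against constants, evaluated once per sampling interval.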

Results

• Compared to independently managing core and L2:
– Saves 22% on average, 46% max

• Learns effective rules from few features
• Compiler modifications instrument code
• Policy learned offline
• Policy implemented in microcontroller

Conclusion

• Machine learning derives models from data automatically

• Allows easy maintenance of heuristics

• Creates models that are more effective than hand-tuned heuristics
