Machine Learning for Computer Systems

Henry Hoffmann

iacoma.cs.uiuc.edu/mcat/ml.pdf

Machine Learning Overview

• What kind of answers do we want?

• What kind of data can we gather?

• What linear algebra do we use?

• How do you formulate the problems so you don’t have to “stir” too much?

Example Problem Formulation

Meet latency constraints with minimal energy via system configs

Require the power and performance profile for applications

Learn to estimate these values

Config: an allocation of hardware resources to an application
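
The formulation above can be sketched as a small selection routine. This is only an illustration under stated assumptions: the `Config` type, the configuration names, and all numbers are hypothetical, and energy per job is assumed to be estimated latency times estimated power.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    """Hypothetical estimates for one allocation of hardware resources."""
    name: str
    latency_s: float  # estimated latency in seconds
    power_w: float    # estimated power in watts


def min_energy_config(configs, latency_limit_s) -> Optional[Config]:
    """Return the lowest-energy config that meets the latency limit."""
    feasible = [c for c in configs if c.latency_s <= latency_limit_s]
    if not feasible:
        return None  # no configuration meets the constraint
    # Assumed energy model: energy = latency * power.
    return min(feasible, key=lambda c: c.latency_s * c.power_w)


# Invented numbers, for illustration only.
configs = [
    Config("4 cores, high clock", 0.5, 20.0),  # 10 J, meets 1 s
    Config("2 cores, high clock", 0.8, 10.0),  # 8 J, meets 1 s
    Config("2 cores, low clock", 1.5, 4.0),    # 6 J, too slow
]
best = min_energy_config(configs, latency_limit_s=1.0)
print(best.name)
```

The lowest-energy configuration overall is rejected because it misses the latency limit, which is why both latency and power estimates are needed for every configuration.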

Example of a System Configuration Space 𝑪

[Figure: example configuration space. Recoverable labels: Clock Speed, Cores, Memory controller (Memory Controller 1, Memory Controller 2).]

Machine Learning Overview II

• We want to drive the system to a certain power or performance

• We want to build a model that can map measured features to a configuration

• Configuration should achieve the target

• Note: ignoring reinforcement learning for now

[Diagram labels: Targets; Data, measurements; Learned Model; configuration; Performance; Power]

What Data Can We Measure?

• High-level outcomes:
  • Metrics of importance to architecture users: throughput, latency, energy, etc.

• Low-level outcomes:
  • Metrics of importance to architects: IPC, MPKI, branch mispredicts, etc.

• Parameters:
  • The things in the architecture that we can change to affect outcomes

What Answers Can We Obtain?

• Prediction:
  • Given some subset of measurements, what values will the remaining measurements take?

• Structure:
  • How do the measured values interact and affect each other?

• A note on correlation vs causation:
  • Correlations may be helpful for prediction accuracy
  • Causal relationships should be more robust but require structural learning

What Linear Algebra Can We Use?

• Too many options!

• Some examples to come:
  • Regression
  • Regularized Regression
  • Recommender Systems/Matrix Completion
  • Reinforcement Learners: Neural Networks and others
  • Evolutionary Learners

Regression Modeling (Lee and Brooks, ASPLOS 2006)

[Diagram: Microarchitectural Parameters feed a Learned Model (weights for each feature) that outputs Performance and Power. No target.]

• Performance: low-level outcome (IPC)

• Features: architectural parameters (L1 size, LLC size, issue width, register file size)

• Model: weights on the parameters (w0, w1, w2, w3)

Problem: Reduce number of simulations required during microarchitectural design time by predicting performance

Regression Modeling with Too Many Features (Zhu and Reddi, HPCA 2013)

[Diagram: Web page content feeds a Learned Model (weights for each feature). Targets: 1s latency, minimize energy. Output configuration: big or LITTLE core, clock speed. Outcomes: Performance, Power.]

• Predicted value: Latency

• Features: Tag, DOM nodes, Image Sizes, Image

• Model: weights w0, w1, w2, w3

Problem: Energy efficient rendering of webpages on mobile devices

Regularized regression adds constraints and turns this into an optimization problem:

$\sum_{i=0}^{n} w_i^2 \le \text{threshold}$
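
As a rough illustration of regression with a constraint on the squared weights, the sketch below fits weights by least squares with an L2 penalty (ridge regression), which is the usual penalized counterpart of a sum-of-squares constraint. It is not the method of either cited paper. The function names, the penalty `lam`, and the data are assumptions made for this example; setting `lam=0.0` gives ordinary least squares.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Weights minimizing ||Xw - y||^2 + lam * ||w||^2."""
    d = len(X[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


# Invented data generated from y = 2*x1 + 3*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
w_plain = ridge_fit(X, y, lam=0.0)  # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)  # penalized: smaller sum of squared weights
print(w_plain, w_ridge)
```

Raising `lam` shrinks the sum of squared weights, which plays the same role as tightening the threshold in the constraint.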

Reinforcement Learning (Bitirgen et al., MICRO 2010)

[Diagram: Features (low-level outcomes and parameters) feed a Learned Model (performance predictions). Target: minimize energy. Output configuration: processor speed, cache capacity, memory bandwidth. Outcome: Performance/Power.]

Problem: Find the most energy efficient configuration of a processor for multiapp workloads


Recommender Systems (Delimitrou and Kozyrakis, ASPLOS 2013, 2014)

[Diagram: Features (other applications' high-level outcomes; a small number of the new application's high-level outcomes) feed a Learned Model (application preferences). Targets: QoS, minimize cost. Output configuration: processor type, co-location. Outcomes: Performance, Power.]

Problem: Assignment of jobs to machines in a heterogeneous datacenter

Recommender Systems

[Figure: a matrix of ratings, 1 to 5, indexed by Movies and Users. Some ratings are unknown; one is marked "?".]

[Figure: the same ratings matrix, relabeled for systems. The ratings correspond to Performance/Power (high-level outcomes), and the unknown rating corresponds to an unknown value.]
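
A minimal sketch of filling in unknown matrix entries, assuming the matrix is close to rank 1. It uses alternating least squares on the observed entries. This is a simpler stand-in for illustration, not the nuclear-norm completion used in the cited work, and the function name and data are hypothetical.

```python
def complete_rank1(matrix, iterations=50):
    """Fill None entries of `matrix` with a rank-1 fit u[i] * v[j].

    Assumes every row and every column has at least one observed entry.
    """
    rows, cols = len(matrix), len(matrix[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iterations):
        # Fix v, solve a least-squares problem for each row factor.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            u[i] = (sum(matrix[i][j] * v[j] for j in obs)
                    / sum(v[j] ** 2 for j in obs))
        # Fix u, solve a least-squares problem for each column factor.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            v[j] = (sum(matrix[i][j] * u[i] for i in obs)
                    / sum(u[i] ** 2 for i in obs))
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]


# Invented data: rows times columns of [1, 2, 3, 4] x [1, 2, 3], two hidden.
ratings = [
    [None, 2.0, 3.0],
    [2.0, None, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
]
filled = complete_rank1(ratings)
print(filled[0][0], filled[1][1])
```

In the systems setting, rows and columns would be applications and configurations, and the hidden entries would be performance or power values that were never measured.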

Reinforcement Learning (Ipek et al., ISCA 2008)

[Diagram: Features (memory bus utilization) feed a Learned Model (reward for taking an action in a given state). Target: maximize utilization. Output configuration: next command to issue. Outcome: Utilization.]

Problem: Schedule DRAM commands to maximize utilization
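
A learned model of "reward for taking an action in a given state" can be sketched with a tabular SARSA update. This is a generic textbook form, not the memory controller's actual design; the state and action names, the reward value, and the learning rate and discount are all hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-values default to 0.0 for unseen (state, action) pairs.
q = defaultdict(float)

# Hypothetical transition: issuing a read while the bus is idle earns reward 1.
first = sarsa_update(q, "bus_idle", "issue_read", 1.0,
                     "bus_busy", "issue_write")
second = sarsa_update(q, "bus_idle", "issue_read", 1.0,
                      "bus_busy", "issue_write")
print(first, second)
```

A scheduler built on such a table would pick, in each state, the command with the highest learned value, with some exploration.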


Summary of Examples

Approach | Inputs | Outputs | Key Technique | References
Regression | Microarchitectural Parameters | Performance, Power | Cubic Regression with Splines | Lee and Brooks. ASPLOS 2006
Regularized Regression | Webpage Features | Performance, Power | Lasso, ElasticNet, Cubic Regression | Zhu and Reddi. HPCA 2013
Neural Network Regression | Performance, Miss Rates, Resource Allocation | Aggregate Performance for Application Mix | Neural Network | Bitirgen, Ipek, and Martinez. MICRO 2010
Recommender Systems | High-level Outcomes | Performance | Nuclear Norm Matrix Completion | Delimitrou and Kozyrakis. ASPLOS 2013 & 2014
Reinforcement Learning | Memory-bus Utilization | Memory-bus Utilization | SARSA model | Ipek et al. ISCA 2008

Also: Penney and Chen. “A Survey of Machine Learning Applied to Computer Architecture Design.” arXiv 2019

A Note on Overhead

• Overhead is not monolithic

• Two components of overhead:
  • Number of samples required
  • Computation per sample

• These generally work against each other:
  • Fewer samples → more work to extract meaning from samples
  • Less computation per sample → more samples needed to learn

Summary of Examples

• Diversity:
  • Inputs could be any of high-level features, low-level features, or parameters
  • Same for outputs
  • All use different linear algebra
  • For each example, we could find another paper that solves the same problem with a different underlying technique

• Commonality:
  • Predictions are only part of solving systems problems
  • Predictor is a module that is used by the rest of the system to make decisions
  • Predictor is not aware of the underlying problem structure

Predicting vs. Structural Learning (Ding et al., ISCA 2019)

 | Model A | Model B
Built for | Prediction | Structure
Optimal points | Just far enough | True data
Non-optimal points | True data | Very far
Goodness of fit | 99% | 0
Energy over optimal | 22% ❌ | 0 ✅

Key Insight: High accuracy != good system results
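
The Model A versus Model B contrast can be reproduced with made-up numbers. In this sketch, all energy values are invented for illustration and are not from the cited paper. Model A is exact everywhere except the optimal configuration, and Model B is exact only at the optimal configuration. Goodness of fit is assumed to be R².

```python
def r_squared(true, pred):
    """Coefficient of determination of predictions against true values."""
    mean = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def energy_over_optimal(true, pred):
    """Extra true energy paid by trusting the model's best configuration."""
    chosen = min(range(len(pred)), key=lambda i: pred[i])
    return (true[chosen] - min(true)) / min(true)


# Invented true energy per configuration; index 0 is optimal.
true_energy = [10.0, 12.2, 15.0, 20.0, 30.0]
# Model A: exact except the optimal point, pushed just far enough to lose.
model_a = [12.5, 12.2, 15.0, 20.0, 30.0]
# Model B: exact at the optimal point, very far off everywhere else.
model_b = [10.0, 100.0, 100.0, 100.0, 100.0]

for name, model in (("A", model_a), ("B", model_b)):
    print(name, round(r_squared(true_energy, model), 3),
          round(energy_over_optimal(true_energy, model), 3))
```

Model A fits the data far better yet selects a configuration that costs about 22% more energy, while Model B fits poorly and still selects the optimum.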

Case Study

• Problem:
  • Meet latency with minimal energy through resource management

• Learning:
  • Use published methods to estimate latency and power
  • Mostly recommender based, but also Bayesian methods

• Structure:
  • Constrained optimization problem

• Known Issues:
  • Improve accuracy with small data sets

Generating Data with a GMM

[Figure, built up over five slides: a data matrix of Computer System Configurations by Known Applications, with plots of Density versus Behavior. The steps shown, in order:]

• Divide Known Data

• Learn GMMs

• Swap Max and Min

• Generate new data

• Concatenate (matrix labels: Known Applications, New Application)
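
Two of the steps above, learning a GMM and generating new data from it, can be sketched for one-dimensional behavior values. This is a hedged illustration only: it fits a two-component mixture with plain EM, the data are invented, and the function names are hypothetical. The "Swap Max and Min" step is not specified in the text, so it is left out.

```python
import math
import random


def fit_gmm_1d(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]  # simple initialization
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v))
                    / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against collapse
    return weights, means, variances


def sample_gmm(weights, means, variances, n, rng):
    """Draw n new values from the fitted two-component mixture."""
    out = []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1
        out.append(rng.gauss(means[k], math.sqrt(variances[k])))
    return out


# Invented behavior values forming two clusters.
behavior = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
weights, means, variances = fit_gmm_1d(behavior)
new_data = sample_gmm(weights, means, variances, 5, random.Random(0))
print(means, new_data)
```

The generated values could then be concatenated with the known data to enlarge a small training set.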

Multi-phase Sampling

Input: Configuration-Application data matrix, Sampling budget N

[Figure, built up over four slides: a matrix of Computer System Configurations by Known Applications, plus a column for the New Application. The configurations in the new application's estimated column are numbered 1 to 8. The steps shown, in order:]

• Matrix Completion with Sample Size N/2

• Estimated Behavior for New Application

• Select N/2 Best Configs

• Matrix Completion with N/2 original samples and N/2 estimated best configs
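
The multi-phase procedure can be sketched as a short routine. Everything here is an assumption made for illustration: lower behavior values are treated as better, the `measure` and `complete` callbacks are hypothetical, and `scale_known_profile` is a toy stand-in for a real matrix completion algorithm.

```python
def multi_phase_sample(known, measure, complete, budget, first_phase):
    """Two-phase sampling for a new application (lower behavior is better).

    known: behavior of known applications, one list per application
    measure(c): measured behavior of the new application in config c
    complete(known, samples): estimated behavior for every config
    first_phase: the budget // 2 configs sampled up front
    """
    # Phase 1: measure half the budget and estimate the rest.
    samples = {c: measure(c) for c in first_phase}
    estimate = complete(known, samples)
    # Phase 2: measure the configs estimated to be best, then re-estimate.
    unsampled = [c for c in range(len(estimate)) if c not in samples]
    best = sorted(unsampled, key=lambda c: estimate[c])
    best = best[: budget - len(samples)]
    samples.update({c: measure(c) for c in best})
    return complete(known, samples), samples


def scale_known_profile(known, samples):
    """Toy stand-in for matrix completion: rescale one known profile."""
    profile = known[0]
    ratio = sum(v / profile[c] for c, v in samples.items()) / len(samples)
    return [samples.get(c, ratio * p) for c, p in enumerate(profile)]


# Invented data: one known application and the new application's true values.
known = [[5.0, 3.0, 8.0, 1.0, 6.0, 2.0, 7.0, 4.0]]
true_new = [10.0, 6.0, 16.0, 2.5, 12.0, 4.0, 14.0, 8.0]

final, samples = multi_phase_sample(
    known, measure=lambda c: true_new[c], complete=scale_known_profile,
    budget=4, first_phase=[0, 2])
best_config = min(range(len(final)), key=lambda c: final[c])
print(sorted(samples), best_config)
```

The second phase spends its samples near the estimated optimum, so the final estimate is most reliable where the decision is made.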

Experimental Setup

 | Mobile | Server
System | Ubuntu 14.04 | Linux 3.2.0 system
Architecture | ARM big.LITTLE | Intel Xeon E5-2690
# Applications | 21 | 22
# Configurations | 128 | 1024

Learning Models and Frameworks

Learning Models | Category
MCGD | MC
MCMF | MC
Nuclear | MC
WNNM | MC
HBM | Bayesian

Frameworks | Definitions
Vanilla | Basic learners
GM | Generative model
MP | Multi-phase sampling
MP-GM | Combine GM and MP

First comprehensive study of matrix completion (MC) algorithms for systems optimization task

Improve Prediction Accuracy w/ GM

[Figure: results for Mobile and Server. Axis: average percentage points of accuracy improvement. Higher is better.]

Improve Energy Savings w/ MP

[Figure: results for Mobile and Server. Axis: average energy improvement. Lower is better.]

Summary

• Applying ML to systems requires:
  • Data
  • Answers
  • Linear Algebra

• Many Examples of Learning for Systems:
  • Huge diversity of techniques
  • Common thread is that predictions by themselves are not enough

• Structure:
  • Often constrained optimization problem

• Structure of many systems problems means that:
  • Accuracy alone does not guarantee better systems results
  • Understanding the structure will lead to better outcomes