Using Small-Scale History Data to Predict Large-Scale ...

31
Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application Wenju Zhou University of Science and Technology of China 2020.05.12

Transcript of Using Small-Scale History Data to Predict Large-Scale ...

Page 1: Using Small-Scale History Data to Predict Large-Scale ...

Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application

Wenju Zhou

University of Science and Technology of China

2020.05.12

Page 2: Using Small-Scale History Data to Predict Large-Scale ...

Outline

ustc 2

Background

Two-level model

Experiment results and analysis

Conclusions

Page 3: Using Small-Scale History Data to Predict Large-Scale ...

Background

Page 4: Using Small-Scale History Data to Predict Large-Scale ...

High Performance Computing (HPC)

ustc 4

- architecture:

computing nodes,

interconnect, ...

- usage:

physical simulations,

molecular modeling, ...

- effect:

provide computing power,

reduce experimental risk, ...fig 1. architecture of HPC

Page 5: Using Small-Scale History Data to Predict Large-Scale ...

Performance Modeling

ustc 5

∙ tuning

∙ scheduling

∙ energy saving

···

features

· system settings

· application input

· execution options

· compiler flags

···

Model

performance

· execution time

· memory

· throughput

· energy

···

fig 2. architecture of HPC

Page 6: Using Small-Scale History Data to Predict Large-Scale ...

Performance Modeling

ustc 6

categories desciption advantaes limitations

simulative methods simulating execution

by simulators

concept machine overhead

analytical methods analysis of code and

system by experts

accuracy &

Interpretability

overhead

empirical methods statistical analysis of

historical data

automation cold start

∙ system complexity

∙ application complexity

∙ interaction between

system and application

attention on

empirical methods

Page 7: Using Small-Scale History Data to Predict Large-Scale ...

Machine Learning in Performance Modeling

ustc 7

A technology to learn

knowledge and experience

from historical data.

A powerful approach for

empirical modeling

methods.

D. N. Hieu, T. T. Minh, T. Van Quang, B. X. Giang, and T. Van Hoai, “A machine learning-

based approach for predicting the execution time of cfd applications on cloud computing

environment,” in International Conference on Future Data and Security Engineering, pp. 40–52,

Springer, 2016

P. Malakar, P. Balaprakash, V. Vishwanath, V. Morozov, and K. Kumaran, “Benchmarking

machine learning methods for performance modeling of scientific applications,” in 2018

IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance

Computer Systems (PMBS), pp. 33–44, IEEE, 2018.

Machine Learning

Page 8: Using Small-Scale History Data to Predict Large-Scale ...

Extrapolation Issue of Machine Learning

ustc 8

- Extrapolation:

testset feature subspace outside trainset feature

subspace.

- Issue

high-accuracy interpolations but low-accuracy

extrapolations —— machine learning can achieve

high-accuracy predictions in trainset subspace, but

prediction accuracy reduces drastically outside

trainset subspace.

Page 9: Using Small-Scale History Data to Predict Large-Scale ...

Two-level Model

Page 10: Using Small-Scale History Data to Predict Large-Scale ...

Problem Description

ustc 10

),,...,,,(: X 321 pxxxx n

Features Label

y

execution time,

is known

from historical

logs, is to

be predicted.

input parameters,

Identically and

Independent

Distributed in

.and testtrain XX

number of

processors,

.cb

,X [c,d] in p

,X [a,b] in p

test

train

trainy

testy

Page 11: Using Small-Scale History Data to Predict Large-Scale ...

Machine Learning Mechanism

ustc 11

Approximate model:

Target model:

Independent and

Identically Distributed

(IID) hypothesis

Page 12: Using Small-Scale History Data to Predict Large-Scale ...

Motivation of Two-Level Model

ustc 12

Issues of one-level model

- Simple algorithms:

· own extrapolation ability in

some way

· cannot learn complex

relations between input

parameters and performance

- Sophisticated algorithms:

· learn accurate relations

between input parameters and

performance

· overfitting on small-scale data

Overview of two-level model

- Interpolation level:

learn accurate interpolation

model and predict small-scale

performance under input

parameters in

- Extrapolation level:

construct model owning

extrapolation ability and predict

with corresponding small-

scale performance predictions

testX

testy

Page 13: Using Small-Scale History Data to Predict Large-Scale ...

Workflow of Two-Level Model

ustc 13

fig 3. workflow of two-level model

Page 14: Using Small-Scale History Data to Predict Large-Scale ...

Prediction Specification

ustc 14

𝑋𝑡𝑒𝑠𝑡𝑖 ∶ (𝑥1𝑖 , 𝑥2𝑖 , 𝑥3𝑖 , . . . , 𝑥𝑛𝑖, 𝑝𝑖)

𝑋𝑡𝑒𝑠𝑡𝑖1 ∶ 𝑥1𝑖 , 𝑥2𝑖 , 𝑥3𝑖 , . . . , 𝑥𝑛𝑖, 𝑝 − 𝑙𝑖𝑠𝑡1

𝑋𝑡𝑒𝑠𝑡𝑖2 ∶ 𝑥1𝑖 , 𝑥2𝑖 , 𝑥3𝑖 , . . . , 𝑥𝑛𝑖, 𝑝 − 𝑙𝑖𝑠𝑡2

𝑋𝑡𝑒𝑠𝑡𝑖𝑡 ∶ 𝑥1𝑖 , 𝑥2𝑖 , 𝑥3𝑖 , . . . , 𝑥𝑛𝑖, 𝑝 − 𝑙𝑖𝑠𝑡𝑡

.

.

.

𝑌𝑡𝑒𝑠𝑡𝑖1

𝑌𝑡𝑒𝑠𝑡𝑖2...

𝑌𝑡𝑒𝑠𝑡𝑖𝑡

𝑦𝑡𝑒𝑠𝑡𝑖

interpolation

extrapolation𝑝 − 𝑙𝑖𝑠𝑡

Page 15: Using Small-Scale History Data to Predict Large-Scale ...

Analysis of Interpolation Level

ustc 15

- Task:

small-scale performance predictions as extrapolation

level training data

- Requirement:

· high accuracy

· random error distribution

fig 4. influence of error distribution

Page 16: Using Small-Scale History Data to Predict Large-Scale ...

Interpoaltion Level Model

ustc 16

· feature randomness

· sample randomness

· high accuracy

· random error distribution

fig 5. random forest

Page 17: Using Small-Scale History Data to Predict Large-Scale ...

Analysis of Extrapolation Level

ustc 17

- Task:

· predict large-scale performance with small-scale

predictions

- Chanllenge:

· extrapolation (scalability) —— scalability may change

with input parameters

· interpolation error —— only several data points for

every input parameter combinations, easy to overfit

causes error amplification

Page 18: Using Small-Scale History Data to Predict Large-Scale ...

Extrapolation Level Model

ustc 18

Performance Modeling Normal Form(PMNF):

Interpolation Error

Scalability

Multi-Task Lasso:

Page 19: Using Small-Scale History Data to Predict Large-Scale ...

Experiment Results & Analysis

Page 20: Using Small-Scale History Data to Predict Large-Scale ...

Experiment

ustc 20

Platform NodeMonte Carlo Benchmark (MCB) Features

Kripke Features

Page 21: Using Small-Scale History Data to Predict Large-Scale ...

Evaluation

ustc 21

Baseline· Random Forest

· Multi-Layer Perceptron

· EPMNF

· Log Rgression

Metrics· MAE

· MAPE

· RMSE

Page 22: Using Small-Scale History Data to Predict Large-Scale ...

Result: Comparison of Different Methods

ustc 22

· extrapolations are harder than interpolations

· the performance of the same method in diffrent applications

varies greatly

· two-level model perform better than one-level model

Page 23: Using Small-Scale History Data to Predict Large-Scale ...

Result: Comparison of Different Methods

ustc 23

fig 6. case study of different methods

Page 24: Using Small-Scale History Data to Predict Large-Scale ...

Result: Single-Task v.s. Multi-Task

ustc 24

Multi-Task Learning

· data amplification

· feature selection

fig 7. residuals of random forest

Page 25: Using Small-Scale History Data to Predict Large-Scale ...

Result: Single-Task v.s. Multi-Task

ustc 25

fig 8. case study of multi-task learning

Page 26: Using Small-Scale History Data to Predict Large-Scale ...

Result: Clustering or Not

ustc 26

K-means Cluster- distance:

- effect:

partition tasks into cluster by distance (relatedness) to

learn high-related tasks jointly.

reasons for insignificance· input sensitivity

· experimental feature values range

· clustering algorithm

Page 27: Using Small-Scale History Data to Predict Large-Scale ...

Result: Clustering or Not

ustc 27

fig 9. case study of clustering

Page 28: Using Small-Scale History Data to Predict Large-Scale ...

Conclusions

Page 29: Using Small-Scale History Data to Predict Large-Scale ...

Conclusions

ustc 29

- analyze the extrapolation problem and issues of one-

level model

- propose a two-level model to predict large-scale

performance with only small-scale historical data

- conduct experiment to validate the effectiveness of

two-level model

Page 30: Using Small-Scale History Data to Predict Large-Scale ...

Future Work

ustc 30

- improve two-level model by choosing more fitting

clustering and multi-task learing algorithms

- improve scalability models with considering system

information to model cross-platform performance

- research whether two-level model works for

extrapoaltion problem caused by input parameter

Page 31: Using Small-Scale History Data to Predict Large-Scale ...

Thanks.Welcome further communication.

Wenju Zhou

[email protected]