RAMSES: Robust Analytical Models for Science at Extreme …

RAMSES: Robust Analytical Models for Science at Extreme Scale Presenter: Raj Kettimuthu (Argonne) PI: Ian Foster (Argonne) Co-PIs: Gagan Agrawal (Ohio State), Nagi Rao (ORNL), Brad Settlemyer (LANL), Brian Tierney (LBL), and Don Towsley (UMass)


Page 1: RAMSES: Robust Analytical Models for Science at Extreme …

RAMSES: Robust Analytical Models for Science at Extreme Scale

Presenter: Raj Kettimuthu (Argonne)

PI: Ian Foster (Argonne)

Co-PIs: Gagan Agrawal (Ohio State), Nagi Rao (ORNL), Brad Settlemyer (LANL), Brian Tierney (LBL), and Don Towsley (UMass)

Page 2: RAMSES: Robust Analytical Models for Science at Extreme …

Project Overview

§ Tools: Develop easy-to-use tools to provide end users with actionable advice
§ Estimation: Develop and apply data-driven estimation methods: differential regression, surrogate models, etc.
§ Modeling: Develop, evaluate, and refine component and end-to-end models
§ Experiments: Conduct extensive, automated experiments to test models and build database

[Diagram labels: Experiments, Database, Modeling, Estimation, Advisor, Estimators, Evaluators, Tester, Tools]

Page 3: RAMSES: Robust Analytical Models for Science at Extreme …

Exemplar Science Workflows

§ Five science workflows
§ Span a broad range of DOE science domains and modeling problems
§ File Transfer
§ Light Source Workflows
– Tomographic Reconstruction, Diffuse Scattering
§ Distributed MapReduce
§ In-situ Analysis
§ Exascale Simulations

Page 4: RAMSES: Robust Analytical Models for Science at Extreme …

TCP Throughput Profiles

[Figure: Throughput (Gbps) vs. RTT (ms), with a concave region and a convex region marked]

§ Most common TCP throughput profile: convex function of RTT
§ Observed dual-mode profiles: emulated connections with 0-366 ms RTT
– CUBIC, STCP: smaller RTT, concave region; larger RTT, convex region
§ Concave regions very desirable
– Throughput does not decay as fast, rate of decrease slows down as rtt
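One way to locate the concave and convex regions of a measured profile is to check the sign of the change in slope between neighboring samples. The sketch below does this; the function name and the sample numbers are hypothetical and are not measurements from the project.

```python
def classify_regions(rtts_ms, throughputs_gbps, tol=1e-9):
    """Label each interior sample 'concave', 'convex', or 'linear'.

    The label comes from the change in slope across the sample:
    slope decreasing -> concave, slope increasing -> convex.
    """
    labels = []
    for i in range(1, len(rtts_ms) - 1):
        x0, x1, x2 = rtts_ms[i - 1], rtts_ms[i], rtts_ms[i + 1]
        y0, y1, y2 = (throughputs_gbps[i - 1], throughputs_gbps[i],
                      throughputs_gbps[i + 1])
        left_slope = (y1 - y0) / (x1 - x0)
        right_slope = (y2 - y1) / (x2 - x1)
        change = right_slope - left_slope
        if change < -tol:
            label = "concave"
        elif change > tol:
            label = "convex"
        else:
            label = "linear"
        labels.append((x1, label))
    return labels


# Hypothetical dual-mode profile (illustrative values only).
rtts = [0, 11, 22, 45, 91, 183, 366]
tput = [9.9, 9.8, 9.5, 8.6, 5.0, 2.6, 1.3]
regions = classify_regions(rtts, tput)
```

With these made-up samples, the small-RTT points are labeled concave and the large-RTT points convex, which is the dual-mode shape the slide describes.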

Page 5: RAMSES: Robust Analytical Models for Science at Extreme …

Models of single TCP connections

[Figure panels: STCP, CUBIC]

• Models account for increase/decrease rules, RTT, link capacity, max receive window size
• Validation against measurements
• Useful for selecting best version and troubleshooting
• Future directions: other versions (e.g., UDT), multiple connections, account for I/O interactions
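As an illustration of how the listed ingredients (increase/decrease rules, RTT, link capacity, maximum receive window) can be combined, here is a minimal round-based sketch for a single STCP-like flow. It is not the project's model: the per-RTT growth factor of 1.01 and the 0.125 multiplicative decrease follow the commonly cited Scalable TCP constants, and the loss rule, buffer size, slow-start handling, and all parameter defaults are simplifying assumptions.

```python
def stcp_average_throughput_gbps(rtt_ms, capacity_gbps=10.0,
                                 max_rwnd_pkts=200_000, buffer_pkts=1000,
                                 mss_bytes=1460, duration_s=60.0):
    """Average throughput of one flow, simulated one RTT at a time."""
    rtt_s = rtt_ms / 1000.0
    pkt_bits = mss_bytes * 8
    # Packets the link can carry in one RTT (bandwidth-delay product).
    bdp_pkts = capacity_gbps * 1e9 * rtt_s / pkt_bits

    cwnd = 1.0
    in_slow_start = True
    sent_bits = 0.0
    elapsed = 0.0
    while elapsed < duration_s:
        in_flight = min(cwnd, max_rwnd_pkts)       # receive window cap
        sent_bits += min(in_flight, bdp_pkts) * pkt_bits
        elapsed += rtt_s
        if in_flight > bdp_pkts + buffer_pkts:     # assumed loss rule
            in_slow_start = False
            cwnd = in_flight * (1 - 0.125)         # multiplicative decrease
        elif in_slow_start:
            cwnd = in_flight * 2                   # assumed slow start
        else:
            cwnd = in_flight * 1.01                # +0.01 per ACK, per RTT
    return sent_bits / elapsed / 1e9


short_rtt = stcp_average_throughput_gbps(10)
long_rtt = stcp_average_throughput_gbps(300)
```

In this sketch the long-RTT flow is limited by the receive window rather than by the link, so its average throughput falls below that of the short-RTT flow.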

Page 6: RAMSES: Robust Analytical Models for Science at Extreme …

UDP-Based Transport: UDT

§ For dedicated 10G links, UDT provides higher throughput than CUBIC (Linux default)
§ TCP and UDT throughput transition point depends on connection parameters
– RTT, loss rate, host
– NIC parameters, IP and UDP parameters
§ Disk-to-disk transfers (xdd) have lower transfer rate

[Figure legend: xdd-read, xdd-write, CUBIC single stream, UDT]

Page 7: RAMSES: Robust Analytical Models for Science at Extreme …

Data Driven Models for File Transfer

§ Combines historical data with a correction term for current external load
§ Takes three pieces of input
§ Signature for a given transfer
– Concurrency level
– Total known concurrency at source (“known load at source”)
– Total known concurrency at destination (“known load at destination”)
– File size
§ Historical data
– Transfer concurrency, known loads, and observed throughput for the source-destination pair
§ Signatures and observed throughputs from the most recent transfers for the source-destination pair
§ It produces an estimated throughput as an output.
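The three inputs and single output described above can be sketched as follows. The slide does not give the estimator itself, so everything here is an assumption made for illustration: the nearest-neighbor lookup, the distance function, and the multiplicative form of the correction term are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Signature:
    concurrency: int
    load_src: int        # known load at source
    load_dst: int        # known load at destination
    file_size_gb: float


def distance(a, b):
    # Hypothetical similarity measure between two signatures.
    return (abs(a.concurrency - b.concurrency) + abs(a.load_src - b.load_src)
            + abs(a.load_dst - b.load_dst)
            + abs(a.file_size_gb - b.file_size_gb))


def historical_estimate(sig, history, k=3):
    """Mean observed throughput of the k most similar past transfers."""
    nearest = sorted(history, key=lambda rec: distance(sig, rec[0]))[:k]
    return mean(tput for _, tput in nearest)


def estimate_throughput(sig, history, recent, k=3):
    """Historical estimate scaled by how recent transfers deviated from it."""
    base = historical_estimate(sig, history, k)
    if not recent:
        return base
    # Ratio < 1 suggests external load that the known loads do not capture.
    correction = mean(obs / historical_estimate(s, history, k)
                      for s, obs in recent)
    return base * correction


# Made-up history: (signature, observed throughput in MB/s).
history = [(Signature(4, 4, 4, 10.0), 800.0),
           (Signature(4, 8, 8, 10.0), 600.0),
           (Signature(8, 8, 8, 10.0), 1000.0)]
recent = [(Signature(4, 8, 8, 10.0), 300.0)]
estimate = estimate_throughput(Signature(4, 4, 4, 10.0), history, recent, k=1)
```

Here the most recent transfer ran at half its historical rate, so the estimate for the new transfer is halved as well.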

Page 8: RAMSES: Robust Analytical Models for Science at Extreme …

Data Driven Models for File Transfer

§ Transfer Scheduling Algorithms
– SEAL: Schedule transfers to minimize average transfer slowdown
– STEAL: Minimize slowdown for best-effort transfers and maximize bandwidth utilization for batch transfers

Destination    ≤1GB    >1GB, ≤10GB    >10GB    Overall
Gordon         4.26    9.3            6.67     8.31
Mason          3.55    9.4            8.22     8.76
Yellowstone    2.78    8.0            8.1      6.84
Blacklight     5.96    4.41           5.27     4.93
Darter         7.70    4.03           2.63     4.73
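To make the slowdown objective named in the SEAL bullet concrete, the sketch below computes average slowdown for two orderings of the same transfers. This is not the SEAL algorithm. It assumes slowdown means completion time divided by the unloaded transfer time, that all transfers arrive at time 0, and that they run one at a time on a fixed-rate link. The sizes are made up.

```python
def average_slowdown(sizes_gb, rate_gb_per_s=1.0):
    """Average slowdown when transfers run one at a time in the given order."""
    clock = 0.0
    slowdowns = []
    for size in sizes_gb:
        ideal = size / rate_gb_per_s   # time the transfer would take alone
        clock += ideal                 # completion time in this ordering
        slowdowns.append(clock / ideal)
    return sum(slowdowns) / len(slowdowns)


sizes = [10.0, 1.0, 1.0]
fifo = average_slowdown(sizes)
shortest_first = average_slowdown(sorted(sizes))
```

In this toy case, running the small transfers first lowers the average slowdown from 8.0 to about 1.4, which is why the ordering matters for this metric.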

Page 9: RAMSES: Robust Analytical Models for Science at Extreme …

SEAL Evaluation – Turnaround Time 60% Load

Page 10: RAMSES: Robust Analytical Models for Science at Extreme …

Modeling In-situ Analysis

§ How often should we perform the analyses?
§ How often should the analyses output be written?
§ Analyses parameters
– Time (Initialization, Auxiliary, Output)
– Memory (Fixed, Auxiliary, Output)
– Minimum interval between consecutive steps
– Importance
– Threshold time for analyses
§ System parameters
– I/O bandwidth
– Rate of computation
– Available memory

[Figure axis labels: Problem Size; Network bandwidth/Process count]
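The "how often" question can be illustrated as a small constrained selection problem: choose how many times to run each analysis so that total importance is maximized without exceeding the time threshold. The slide does not state the project's formulation. The brute-force search, the dictionary keys, and the numbers below are hypothetical, and memory limits are ignored.

```python
from itertools import product


def best_frequencies(analyses, total_steps, threshold_s):
    """Pick per-analysis invocation counts maximizing total importance.

    Each analysis is a dict with 'time_s' (cost per invocation),
    'importance' (value per invocation), and 'min_interval'
    (minimum number of simulation steps between invocations).
    """
    limits = [total_steps // a["min_interval"] for a in analyses]
    best_counts, best_score = None, -1.0
    for counts in product(*(range(limit + 1) for limit in limits)):
        cost = sum(n * a["time_s"] for n, a in zip(counts, analyses))
        if cost > threshold_s:
            continue  # exceeds the allowed analysis time
        score = sum(n * a["importance"] for n, a in zip(counts, analyses))
        if score > best_score:
            best_counts, best_score = counts, score
    return best_counts, best_score


# Made-up analyses for a 1000-step run with a 100 s threshold.
analyses = [
    {"time_s": 1.0, "importance": 1.0, "min_interval": 100},
    {"time_s": 20.0, "importance": 5.0, "min_interval": 250},
    {"time_s": 10.0, "importance": 3.0, "min_interval": 200},
]
counts, score = best_frequencies(analyses, total_steps=1000, threshold_s=100.0)
```

Exhaustive search is only practical for a handful of analyses. A larger problem would need an integer-programming solver or a heuristic.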

Page 11: RAMSES: Robust Analytical Models for Science at Extreme …

Results: Scheduling Analyses within Threshold

Total threshold (sec)   R1   R2   R3   % within threshold
200                     10   4    7    94.59
100                     10   2    3    85.99
60                      10   1    2    86.01
20                      10   1    0    86.11
10                      10   0    0    0.3

R1: Radius of gyration; R2: Membrane density profile 2D histogram; R3: Protein density profile 2D histogram

Table: Analysis frequencies, analysis times, and corresponding thresholds for 1 billion atoms rhodopsin simulation (1000 steps) in LAMMPS on 32768 cores (2048 nodes) of Mira.

Simulation: Rhodopsin protein benchmark, which consists of a protein embedded in a membrane and solvated with water and ions using LAMMPS.


Observation: More than 80% of the allowed threshold is used for analyses when the threshold is ≥ 20 s.

[Figure: Time and Memory for R1, R2, R3]

Page 12: RAMSES: Robust Analytical Models for Science at Extreme …

Two In-Situ Modes


Time Sharing Mode: Minimizes memory consumption

Space Sharing Mode: Enhances resource utilization when simulation reaches its scalability bottleneck

§ Model computational part (MapReduce-like processing)
§ Model memory
– Data locality between simulation and analytics (initial work only)

Page 13: RAMSES: Robust Analytical Models for Science at Extreme …

Performance Modeling with Disk Model for K-means with MATE (File Size = 1 GB, K = 50, Number of Iterations = 1)

Modeling Computational Component in MATE/Smart

Page 14: RAMSES: Robust Analytical Models for Science at Extreme …

Modeling Computation Time for Parallel Tomographic Reconstruction

§ Computation
– Number of intersected rays, and horizontal and vertical lines
– t × col^2 × (|sin(θ)| + |cos(θ)|)

[Figure: series Horizontal, Vertical, Total]

[Figure: series Estimated, Real, Error Ratio]

[Diagram labels: P0, P1, P2, …, Pn; T0, T1, T2, …, Tn]
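The computation estimate from this slide, t × col^2 × (|sin(θ)| + |cos(θ)|), can be evaluated per projection angle as in the sketch below. The slide does not define t, so it is treated here as an unspecified scale factor with a default of 1. The function names and the example angles are hypothetical.

```python
import math


def angle_cost(theta_deg, col, t=1.0):
    """Estimated work for one projection angle.

    t is an unspecified scale factor (assumed); col is the number of columns.
    """
    theta = math.radians(theta_deg)
    return t * col ** 2 * (abs(math.sin(theta)) + abs(math.cos(theta)))


def total_cost(angles_deg, col, t=1.0):
    """Sum of the per-angle estimates over a set of projection angles."""
    return sum(angle_cost(angle, col, t) for angle in angles_deg)


axis_aligned = angle_cost(0, col=100)
diagonal = angle_cost(45, col=100)
sweep = total_cost(range(0, 180, 10), col=100)
```

The factor |sin(θ)| + |cos(θ)| ranges from 1 at axis-aligned angles to √2 at 45°, so the estimated work per angle varies by up to about 41% across a sweep.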

Page 15: RAMSES: Robust Analytical Models for Science at Extreme …

Estimated Execution time vs. Real Reconstruction Time

RAMSES Meeting

[Figure: series Real Time, Estimated Exec Time (wrt 2K), Estimated Computation]

Page 16: RAMSES: Robust Analytical Models for Science at Extreme …

Questions?

Page 17: RAMSES: Robust Analytical Models for Science at Extreme …

Additive increase and additive decrease (AIAD) for optimal stream


For every step c (fixed number of epochs), do the following:
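The per-step actions are not included in this transcript. As a generic illustration of an additive-increase, additive-decrease search over the number of parallel streams (not the slide's procedure), a minimal sketch could look like this. The function name, the bounds, and the stand-in throughput curve are all hypothetical.

```python
def aiad_tune(measure_throughput, start=4, lo=1, hi=32, steps=20):
    """Move the stream count by +1 or -1 per step, reversing on a drop."""
    streams = start
    direction = 1
    last = measure_throughput(streams)
    for _ in range(steps):
        candidate = max(lo, min(hi, streams + direction))
        current = measure_throughput(candidate)
        if current < last:
            direction = -direction   # throughput fell: step the other way
        streams, last = candidate, current
    return streams


def toy_throughput(n):
    # Stand-in for a real measurement; peaks at 8 streams.
    return 100 - (n - 8) ** 2


chosen = aiad_tune(toy_throughput)
```

Because each move is a single additive step, the controller keeps oscillating within one stream of the peak rather than settling exactly on it. A real deployment would measure throughput over each step instead of calling a fixed function.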

Page 18: RAMSES: Robust Analytical Models for Science at Extreme …

Tubes (ANL) to DMZ (UChicago)
