Smart Data Structures

SAN FRANCISCO, CA, USA Smart Data Structures Jonathan Eastep David Wingate Anant Agarwal 06/3/2012

Page 1: Smart Data Structures

SAN FRANCISCO, CA, USA

Smart Data Structures

Jonathan Eastep, David Wingate, Anant Agarwal

06/3/2012

Page 2: Smart Data Structures

Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments

Multicores are Complex!

• The Problem
  – System complexity is skyrocketing!
  – Multicore architecture is a moving target
  – The best algorithm and algorithm settings depend
  – Application inputs and workloads can be dynamic
  – Online tuning is necessary but typically absent

Page 3: Smart Data Structures

The Big Picture

• Developed a dynamic optimization framework to auto-tune software and minimize programmer burden

• Framework is based on online machine learning technologies

• Demonstrated the framework by designing “Smart Data Structures” for parallel programs

• The framework is general; could apply to systems such as Clouds, OS, Runtimes

Page 4: Smart Data Structures

Smart Data Structures

• Smart Data Structures are parallel data structures that self-optimize to minimize programmer burden

• They use online machine learning to adapt to changing app or system needs and achieve the best performance

• A library of Smart Data Structures is open-sourced on GitHub (GPL)
  – github.com/mit-carbon/Smart-Data-Structures

• Publications: [1], [2], [3], [4]

Page 5: Smart Data Structures

A Sketch of the Benefits of SDS

• Use a Smart Lock to optimize a master-worker program
  – Measure rate of completed work items
  – Emulate dynamic frequency scaling due to Intel Turbo Boost®
  – Workload 1: Worker 0 @ 3GHz, others @ 2GHz
  – Workload 2: Worker 3 @ 3GHz, others @ 2GHz

[Figure: heart rate (items per second / 1e6) vs. time (seconds) over the phases Workload #1, Workload #2, Workload #1. Series: Optimal, Smartlock, Priority lock: policy 1, Priority lock: policy 2, Spin-lock: Reactive Lock, Spin-lock: Test and Set. Annotations mark the gap between Ideal, Smartlock, and Baseline.]

Page 6: Smart Data Structures

Outline

• Smart Data Structures
  – Anatomy of a Smart Data Structure
  – Implementation Example
  – Research Challenges and Solutions
  – Online Machine Learning Algorithm
  – Empirical Benchmark Results
  – Empirical Scalability Studies
  – Future Directions
  – Conclusions

Page 7: Smart Data Structures

What are Smart Data Structures?

• Self-aware computing applied to data structures
• Data Structures that self-optimize using online learning
• We can optimize knobs in other systems too

[Diagram: a conventional Data Structure (e.g. Queue) next to a Smart Data Structure (e.g. Smart Queue). Both serve threads t1, t2, …, tn through an Interface (add, remove, peek) on top of an Algorithm and Storage. The conventional structure is static, with knobs that are hand-tuned per system and per app. The Smart Data Structure adds an Online Learning component, so its knobs are self-tuned automatically at runtime.]

Page 8: Smart Data Structures

Smart Data Structure Library

• C++/C Library of Popular Parallel Data Structures
• Supported:
  – Smart Lock
  – Smart Queue
  – Smart SkipList
  – Smart PairHeap
  – Smart Stack
• Future Work:
  – Smart DHT
• ML Optimization Type:
  – Lock Acquisition Scheduling
  – Tuning Flat Combining
  – Dynamic Load-Balancing

Page 9: Smart Data Structures

Smart Queue, SkipList, PairHeap, Stack

• Implementation should leverage best-performing prior work
• What are the best? Determine with experiments.
• Result: Flat Combining Data Structures from Hendler et al.
• This is contrary to conventional wisdom
• Reason: FC Algorithm minimizes synchronization overheads by combining data structure ops and applying multiple ops at once

Page 10: Smart Data Structures

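Flat Combining Primer

[Diagram: threads post requests (enq a, enq b, enq c, enq d). One thread holds the Lock and acts as the combiner while the others keep working. The combiner scans the posted requests, with the Scancount counting down 3, 2, 1, 0, and applies the collected ops to the Serial Data Structure.]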
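The primer above can be sketched in code. The following is a minimal, illustrative flat-combining queue in Python, written for this transcript and not taken from the SDS library or from Hendler et al.'s implementation. The class name, the record layout, and the 1 ms wait are all assumptions; only the idea (one lock holder scans the request records `scancount` times and applies the pending ops to a serial queue) comes from the slides.

```python
import threading
from collections import deque


class FlatCombiningQueue:
    """Toy flat-combining FIFO queue (hypothetical names, not the SDS API)."""

    def __init__(self, scancount=3):
        self.scancount = scancount            # passes the combiner makes over the records
        self._lock = threading.Lock()         # the single combiner lock
        self._records = []                    # one published request record per thread
        self._records_guard = threading.Lock()
        self._local = threading.local()
        self._serial = deque()                # the underlying serial data structure

    def _record(self):
        # Lazily create this thread's request record and publish it.
        rec = getattr(self._local, "rec", None)
        if rec is None:
            rec = {"op": None, "arg": None, "result": None,
                   "done": threading.Event()}
            self._local.rec = rec
            with self._records_guard:
                self._records.append(rec)
        return rec

    def _apply(self, op, arg):
        # Runs only while holding self._lock, so the serial queue needs no locking.
        if op == "enq":
            self._serial.append(arg)
            return None
        return self._serial.popleft() if self._serial else None

    def _combine(self):
        # Scan the request records scancount times (at least once),
        # serving every pending op found on each pass.
        for _ in range(max(1, self.scancount)):
            with self._records_guard:
                snapshot = list(self._records)
            for rec in snapshot:
                if rec["op"] is not None:
                    rec["result"] = self._apply(rec["op"], rec["arg"])
                    rec["op"] = None
                    rec["done"].set()

    def _submit(self, op, arg=None):
        rec = self._record()
        rec["done"].clear()
        rec["arg"] = arg
        rec["op"] = op                        # setting op makes the request visible
        while not rec["done"].is_set():
            if self._lock.acquire(blocking=False):
                try:
                    self._combine()           # this thread is the combiner for now
                finally:
                    self._lock.release()
            else:
                rec["done"].wait(0.001)       # another thread is combining; wait briefly
        return rec["result"]

    def enqueue(self, item):
        self._submit("enq", item)

    def dequeue(self):
        return self._submit("deq")
```

A thread calls `enqueue` or `dequeue` as with any queue; whichever thread wins the lock serves the requests of the others. A larger `scancount` lets one lock acquisition serve more ops, at the cost of keeping the combiner busy longer.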

Page 11: Smart Data Structures

Smart Queue, SkipList, PairHeap, Stack

• Here the application of learning is to auto-tune a performance-critical knob called the scancount

[Diagram: threads t1, t2, …, tn call the Smart Queue Interface (e.g. enqueue, dequeue, peek). Inside, a Lock protects the Serial Queue and the thread Request Records. The knob is the Scancount, the number of scans over request records. Reinforcement Learning (of a discrete variable) adjusts it to dynamically tune the time spent combining.]

Page 12: Smart Data Structures

Why Does the Scancount Matter?

• Scancount controls how long threads spend as the combiner

• Increasing scancount allows combiner to do more data structure ops within the same lock

• But, increasing scancount increases latency of the combiner’s op

• It’s good to increase scancount up to a point, but after that latency can hurt performance

• Smart Data Structures use online learning to find the ideal scancount at any given time

Page 13: Smart Data Structures

SDS Implementation

• Goal: minimize application disruption

• Internal lightweight statistics or external application-specific reward signal

• Number of learning threads is one by default; it runs learning engines for all SDS

[Diagram: application threads t1, t2, …, tn call the Smart Data Structure (e.g. Smart Queue) through its Interface (add, remove, peek) on top of an Algorithm and Storage. A Learning Thread runs the Online Learning component. Its Reward comes either from an internal stat such as throughput (ops/s) or from an External Perf. Monitor, e.g. Heartbeats.]
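The arrangement described on this slide can be sketched as follows. This is a hedged illustration, not the library's code: `ThroughputReward`, `learning_loop`, the `engines` list, and the sampling period are hypothetical names and values. It shows an internal lightweight statistic (ops/s) and a single helper thread that feeds each structure's reward to its learning step, so application threads only pay for a counter increment.

```python
import threading
import time


class ThroughputReward:
    """Internal lightweight statistic: completed operations per second."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                   # injectable for testing
        self._ops = 0
        self._guard = threading.Lock()
        self._last_ops = 0
        self._last_time = clock()

    def count_op(self):
        # Called by application threads on every completed operation.
        with self._guard:
            self._ops += 1

    def sample(self):
        # Called by the learning thread: ops/s since the previous sample.
        now = self._clock()
        with self._guard:
            ops = self._ops
        elapsed = now - self._last_time
        rate = (ops - self._last_ops) / elapsed if elapsed > 0 else 0.0
        self._last_ops, self._last_time = ops, now
        return rate


def learning_loop(engines, stop, period_s=0.01):
    """Body of the single helper thread.

    engines is a list of (reward, step) pairs, one per smart data structure;
    step is whatever callable updates that structure's knobs from a reward.
    """
    while not stop.is_set():
        time.sleep(period_s)
        for reward, step in engines:
            step(reward.sample())
```

In use, the loop would run in its own thread, for example `threading.Thread(target=learning_loop, args=(engines, stop), daemon=True).start()`, and be shut down by calling `stop.set()`. An external monitor could replace `ThroughputReward` with any object exposing a `sample()` method.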

Page 14: Smart Data Structures

SDS Implementation

– Machine learning co-optimization framework
– Supports joint optimization: multiple knobs
– Supports discrete, Gaussian, boolean, permutation knobs
– Designed explicitly to support systems other than SDS
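To make the four knob types concrete, here is one hypothetical way to represent them as stochastic policies that can be sampled jointly. None of these class names or parameterizations come from the slides or the library; the soft-max, log-odds, and score-based (Plackett-Luce style) forms are assumptions chosen because each has learnable real-valued parameters.

```python
import math
import random


def softmax(prefs):
    # Numerically stable soft-max over a list of real-valued preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class DiscreteKnob:
    def __init__(self, values):
        self.values = list(values)
        self.prefs = [0.0] * len(self.values)    # learnable preferences

    def sample(self, rng):
        return rng.choices(self.values, weights=softmax(self.prefs))[0]


class BooleanKnob:
    def __init__(self):
        self.logit = 0.0                          # learnable log-odds of True

    def sample(self, rng):
        return rng.random() < 1.0 / (1.0 + math.exp(-self.logit))


class GaussianKnob:
    def __init__(self, mean=0.0, std=1.0):
        self.mean, self.std = mean, std           # learnable mean, fixed spread

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)


class PermutationKnob:
    def __init__(self, items):
        self.items = list(items)
        self.scores = [0.0] * len(self.items)     # learnable per-item scores

    def sample(self, rng):
        # Draw an ordering by repeatedly picking the next item by soft-max score.
        remaining = list(range(len(self.items)))
        order = []
        while remaining:
            probs = softmax([self.scores[i] for i in remaining])
            pick = rng.choices(remaining, weights=probs)[0]
            remaining.remove(pick)
            order.append(self.items[pick])
        return order


def sample_joint(knobs, rng):
    """Joint optimization draws a setting for every knob at once."""
    return {name: knob.sample(rng) for name, knob in knobs.items()}
```

For example, `sample_joint({"scancount": DiscreteKnob(range(1, 11)), "order": PermutationKnob(["t1", "t2", "t3"])}, random.Random(0))` returns one setting per knob; the knob names here are placeholders.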

Page 15: Smart Data Structures

Major SDS Research Challenges

1. How do you find knob settings with best long-term effects?

2. How do you measure if a knob setting is helping?

3. How do you optimize quickly enough to not miss opportunities?

4. How do you manage a potentially intractable search space?

[Side labels: items 1 and 2 are Quality Challenges; items 3 and 4 are Timeliness Challenges.]

Page 16: Smart Data Structures

Addressing Other Quality Challenges

1. How do you find settings with best long-term effects?
   – Leverage one of the machine learning technologies for planning
   – Use online RL to adapt to workload or phase changes

2. How do you measure if a knob setting is helping?
   – Extensible reward signal interface for performance monitoring
   – Heartbeats Framework for application-specific perf. evaluations

Page 17: Smart Data Structures

Addressing Timeliness Challenges

3. How to optimize fast enough not to miss opportunities?
   – Choose a fast gradient-based machine learning algorithm
   – Use learning helper thread to decouple learning from app threads

4. How to manage potentially intractable search space?
   – Relax potentially exponential discrete action space into continuous one
   – Use a stochastic soft-max policy which enables gradient-based learning
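As a hedged illustration of the relaxation in item 4: for a single knob with K discrete settings, a soft-max policy replaces the discrete choice with K real-valued preferences θ, which can be adjusted by gradient steps. The symbols θ, K, and π are notation introduced here, not taken from the slides.

```latex
\pi_\theta(a) = \frac{e^{\theta_a}}{\sum_{k=1}^{K} e^{\theta_k}},
\qquad
\frac{\partial \log \pi_\theta(a)}{\partial \theta_j} = \mathbf{1}[a = j] - \pi_\theta(j)
```

With n knobs of K settings each, the joint space has K^n combinations, while one such distribution per knob needs only nK parameters.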

[Cartoon: “Sorry I’m late dear…have you been waiting long?”]

Page 18: Smart Data Structures

Reinforcement Learning Algorithm

• Goal: optimize rate of reward (e.g. heart rate)
• Method: Policy Gradients Algorithm
  – Online, model-free, handles exponential knob spaces
  – Learn a stochastic policy which will give a probability distribution over knob settings for each knob
  – Sample settings for each knob from the policy, try them empirically, and listen to performance feedback signal
  – Improve the policy using a method analogous to gradient ascent
    • I.e. estimate gradient of the reward w.r.t. the policy and step the policy in the gradient direction to get maximum reward
• Balance exploration vs. exploitation + make policy differentiable via stochastic soft-max policy
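The loop described on this slide (sample a setting, try it, observe reward, step the policy along the gradient) can be sketched for a single discrete knob. This is a generic soft-max policy-gradient update with a running-average baseline, written as an illustration and not claimed to match the SDS learning engine. The function names, learning rate, baseline rate, and the `toy_throughput` reward are assumptions; the reward in particular is a stand-in, not measured data.

```python
import math
import random


def softmax(prefs):
    # Numerically stable soft-max over a list of real-valued preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def tune_knob(settings, reward_fn, steps=5000, lr=0.2, seed=0):
    """Learn soft-max preferences over `settings` from sampled rewards."""
    rng = random.Random(seed)
    prefs = [0.0] * len(settings)          # policy parameters, one per setting
    baseline = None                        # running average reward (variance reduction)
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample a setting from the stochastic policy (exploration is built in).
        i = rng.choices(range(len(settings)), weights=probs)[0]
        r = reward_fn(settings[i])         # try it and observe the reward
        if baseline is None:
            baseline = r
        baseline += 0.05 * (r - baseline)
        advantage = r - baseline
        for j in range(len(prefs)):
            # Gradient of log pi(i) w.r.t. prefs[j] is 1[i == j] - probs[j].
            grad = (1.0 if j == i else 0.0) - probs[j]
            prefs[j] += lr * advantage * grad   # gradient ascent step
    return softmax(prefs)


def toy_throughput(scancount):
    # Stand-in reward, NOT measured data: rises up to scancount 4, then falls.
    return 1.0 - 0.2 * abs(scancount - 4)
```

Calling `tune_knob(list(range(1, 9)), toy_throughput)` returns the learned probability of each scancount from 1 to 8. In a live system the reward would come from the performance monitor rather than a fixed function, and the loop would keep running so the policy can follow workload changes.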

Page 19: Smart Data Structures

How Does SDS Perform?

• Full sweep over SDS, load: compare against Static Oracle
• Result: near-ideal performance in many cases
• Result: Quality Challenge is met

[Figure: three panels (Smart Queue, Smart Pair Heap, Smart Skip List) plotting throughput (ops/ms) against post computation (ns) at 14 threads. Series: Static Oracle (Ideal Static), SDS Dynamic, Static Avg.]

Page 20: Smart Data Structures

What if Workload Changes Rapidly?

• Inject changes in the data structure “load” (i.e. post computation between ops)

• Sweep over SDS, random load schedules, frequencies
• Result: Good benefit even when load changes every 10μs
• Result: Quality and Timeliness Challenges are met

[Figure: three panels (Smart Queue: Sched. 1, Smart Skip List: Sched. 1, Smart Pairing Heap: Sched. 1) plotting throughput (ops/ms) against interval frequency (1/µs) at 14 threads. Series: Dynamic Oracle, SDS Dynamic, Dynamic Average.]

Page 21: Smart Data Structures

Future Directions

• Extend this work to a common framework to coordinate tuning across all system layers
  – E.g.: application -> runtime -> OS -> HW
  – Scalable, decentralized optimization methods

Page 22: Smart Data Structures

Conclusions

• Developed a framework to dynamically tune systems and minimize programmer burden via online machine learning

• Demonstrated the framework through a case study of self-tuning “Smart Data Structures”

• Now looking at uses in systems beyond data structures
  – Contact: jonathan dot eastep at gmail

• Reinforcement Learning will play an increasingly important role in the development of future software and hardware

Page 23: Smart Data Structures

Presentation References

[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE International Conference on Autonomic Computing (ICAC’10), 2010. Best Student Paper Award.

[2] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT CSAIL Technical Report, MIT-CSAIL-TR-2009-055, November 2009.

[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A Reinforcement Learning Approach to Multicore Data Structures,” 8th IEEE International Conference on Autonomic Computing (ICAC’11), 2011.

[4] J. Eastep, “Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures,” Doctoral Dissertation, MIT, May 2011.