Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts...
Transcript of Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts...
![Page 1: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/1.jpg)
Targeted Resource Management in Multi-tenant Distributed Systems
Jonathan MaceBrown University
Peter BodikMSR Redmond
Rodrigo FonsecaBrown University
Madanlal MusuvathiMSR Redmond
![Page 2: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/2.jpg)
Resource Management in Multi-Tenant Systems
2
![Page 3: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/3.jpg)
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Resource Management in Multi-Tenant Systems
2
![Page 4: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/4.jpg)
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
2
![Page 5: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/5.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
2
![Page 6: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/6.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
2
Shared storage layer bottlenecks circumvent resource management layer• 2014 – Communication with Cloudera
![Page 7: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/7.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
2
Degraded performance, Violated SLOs, system outages
Shared storage layer bottlenecks circumvent resource management layer• 2014 – Communication with Cloudera
![Page 8: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/8.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
3
Degraded performance, Violated SLOs, system outages
Shared storage layer bottlenecks circumvent resource management layer• 2014 – Communication with Cloudera
![Page 9: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/9.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
3
Degraded performance, Violated SLOs, system outages
Shared storage layer bottlenecks circumvent resource management layer• 2014 – Communication with Cloudera
OSHypervisor
![Page 10: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/10.jpg)
Code change increases database usage, shifts bottleneck to unmanaged application-level lock• Nov. 2014 – Visual Studio Online outage
Failure in availability zone cascades to shared control plane, causes thread pool starvation for all zones• April 2011 – Amazon EBS Failure
Aggressive background task responds to increased hardware capacity with deluge of warnings and logging• Aug. 2012 – Azure Storage Outage
Resource Management in Multi-Tenant Systems
3
Degraded performance, Violated SLOs, system outages
Shared storage layer bottlenecks circumvent resource management layer• 2014 – Communication with Cloudera
OSHypervisor
![Page 11: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/11.jpg)
Containers / VMs
4
![Page 12: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/12.jpg)
Containers / VMs
Shared Systems:Storage, Database,
Queueing, etc.
4
![Page 13: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/13.jpg)
5
![Page 14: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/14.jpg)
5
![Page 15: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/15.jpg)
Monitors resource usage of each tenant in near real-time
Actively schedules tenants and activities
5
![Page 16: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/16.jpg)
Monitors resource usage of each tenant in near real-time
Actively schedules tenants and activities
High-level, centralized policies:Encapsulates resource management logic
5
![Page 17: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/17.jpg)
Monitors resource usage of each tenant in near real-time
Actively schedules tenants and activities
High-level, centralized policies:Encapsulates resource management logic
Abstractions – not specific to resource type, system
Achieve different goals: guarantee average latencies, fair share a resource, etc.
5
![Page 18: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/18.jpg)
Hadoop Distributed File System (HDFS)
6
![Page 19: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/19.jpg)
Hadoop Distributed File System (HDFS)
HDFS NameNode
Filesystem metadata6
![Page 20: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/20.jpg)
HDFS DataNodeHDFS DataNode
HDFS DataNode
Hadoop Distributed File System (HDFS)
Replicated block storage
HDFS NameNode
Filesystem metadata6
![Page 21: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/21.jpg)
HDFS DataNodeHDFS DataNode
HDFS DataNode
Hadoop Distributed File System (HDFS)
Replicated block storage
HDFS NameNode
Filesystem metadata
Rename
6
![Page 22: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/22.jpg)
HDFS DataNodeHDFS DataNode
HDFS DataNode
Hadoop Distributed File System (HDFS)
Replicated block storage
HDFS NameNode
Filesystem metadata
ReadRename
6
![Page 23: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/23.jpg)
HDFS DataNodeHDFS DataNode
HDFS DataNode
Hadoop Distributed File System (HDFS)
Replicated block storage
HDFS NameNode
Filesystem metadata
ReadRename
6
![Page 24: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/24.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
7
![Page 25: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/25.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
7
time
500
0
![Page 26: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/26.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
7
time
500
0
![Page 27: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/27.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
8
500
0
![Page 28: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/28.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
8
500
012
0
![Page 29: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/29.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
8
500
012
0
![Page 30: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/30.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
9
500
012
0
![Page 31: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/31.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
9
500
012
0500
0
![Page 32: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/32.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
10
500
012
0500
0
![Page 33: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/33.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
10
500
012
0500
0
![Page 34: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/34.jpg)
HDFS NameNode HDFS DataNodeHDFS DataNode
HDFS DataNode
10
500
012
0500
0500
0
![Page 35: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/35.jpg)
Local storage
HDFS DataNode
11
![Page 36: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/36.jpg)
Local storage
HDFS DataNode
11
![Page 37: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/37.jpg)
Local storage
HBase RegionServer
HDFS DataNode
11
![Page 38: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/38.jpg)
Local storage
HBase RegionServer
HDFS DataNode
11
![Page 39: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/39.jpg)
Local storage
HBase RegionServer
HDFS DataNode
MapReduceShuffler
Yarn ContainerYarn ContainerHadoop YARN
Container
MapReduce Tasks
11
![Page 40: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/40.jpg)
Local storage
HBase RegionServer
HDFS DataNode
MapReduceShuffler
Yarn ContainerYarn ContainerHadoop YARN
Container
MapReduce Tasks
11
![Page 41: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/41.jpg)
Local storage
HBase RegionServer
HDFS DataNode
Hadoop YARNNodeManager
Yarn ContainerYarn ContainerHadoop YARN Container
MapReduce
Local storage
HBase RegionServer
HDFS DataNode
Hadoop YARNNodeManager
Yarn ContainerYarn ContainerHadoop YARN Container
MapReduce
Local storage
HBase RegionServer
HDFS DataNode
MapReduceShuffler
Yarn ContainerYarn ContainerHadoop YARN
Container
MapReduce Tasks
11
![Page 42: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/42.jpg)
Goals
12
![Page 43: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/43.jpg)
Co-ordinated control across processes, machines, and services
Goals
12
![Page 44: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/44.jpg)
Co-ordinated control across processes, machines, and services
Handle system and application level resources
Goals
12
![Page 45: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/45.jpg)
Co-ordinated control across processes, machines, and services
Handle system and application level resources
Principals: tenants, background tasks
Goals
12
![Page 46: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/46.jpg)
Co-ordinated control across processes, machines, and services
Handle system and application level resources
Principals: tenants, background tasks
Real-time and reactive
Goals
12
![Page 47: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/47.jpg)
Co-ordinated control across processes, machines, and services
Handle system and application level resources
Principals: tenants, background tasks
Real-time and reactive
Efficient: Only control what is needed
Goals
12
![Page 48: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/48.jpg)
Architecture
13
![Page 49: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/49.jpg)
14
![Page 50: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/50.jpg)
Tenant Requests
14
![Page 51: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/51.jpg)
Workflows
Purpose: identify requests from different users, background activities
eg, all requests from a tenant over time
eg, data balancing in HDFS
15
![Page 52: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/52.jpg)
Workflows
Purpose: identify requests from different users, background activities
eg, all requests from a tenant over time
eg, data balancing in HDFS
Unit of resource measurement, attribution, and enforcement
15
![Page 53: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/53.jpg)
Workflows
Purpose: identify requests from different users, background activities
eg, all requests from a tenant over time
eg, data balancing in HDFS
Unit of resource measurement, attribution, and enforcement
Tracks a request across varying levels of granularityOrthogonal to threads, processes, network flows, etc.
15
![Page 54: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/54.jpg)
Workflows
16
![Page 55: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/55.jpg)
Workflows
16
![Page 56: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/56.jpg)
Workflows Resources
Purpose: cope with diversity of resources
16
![Page 57: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/57.jpg)
Workflows Resources
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
16
![Page 58: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/58.jpg)
Workflows Resources
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
2. Identify culprit workflows
16
![Page 59: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/59.jpg)
Workflows Resources
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
Slowdown Ratio of how slow the resource is now compared to its baseline performance with no contention.
2. Identify culprit workflows
16
![Page 60: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/60.jpg)
Workflows Resources
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
Slowdown Ratio of how slow the resource is now compared to its baseline performance with no contention.
2. Identify culprit workflows
Load Fraction of current utilization that we can attribute to each workflow
16
![Page 61: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/61.jpg)
Workflows Resources
17
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
Slowdown Ratio of how slow the resource is now compared to its baseline performance with no contention.
2. Identify culprit workflows
Load Fraction of current utilization that we can attribute to each workflow
![Page 62: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/62.jpg)
Workflows Resources
17
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
Slowdown Ratio of how slow the resource is now compared to its baseline performance with no contention.
2. Identify culprit workflows
Load Fraction of current utilization that we can attribute to each workflow
![Page 63: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/63.jpg)
Workflows Resources
Slowdown (queue time + execute time) / execute timeeg. 100ms queue, 10ms execute
=> slowdown 11
Load time spent executingeg. 10ms execute
=> load 10
17
Purpose: cope with diversity of resources
What we need:
1. Identify overloaded resources
Slowdown Ratio of how slow the resource is now compared to its baseline performance with no contention.
2. Identify culprit workflows
Load Fraction of current utilization that we can attribute to each workflow
![Page 64: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/64.jpg)
Workflows Resources
17
![Page 65: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/65.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
18
![Page 66: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/66.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
18
![Page 67: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/67.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 68: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/68.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 69: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/69.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 70: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/70.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 71: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/71.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 72: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/72.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 73: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/73.jpg)
Workflows Resources Control Points
Goal: enforce resource management decisions
Decoupled from resources
Rate-limits workflows, agnostic to underlying implementation e.g.,
token bucket
priority queue
19
![Page 74: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/74.jpg)
Workflows Resources Control Points
20
![Page 75: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/75.jpg)
Pervasive Measurement
1. Pervasive MeasurementAggregated locally then reported centrally once per second
Workflows Resources Control Points
20
![Page 76: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/76.jpg)
Re
tro
Co
ntr
oll
er
AP
I
Pervasive Measurement
1. Pervasive MeasurementAggregated locally then reported centrally once per second
2. Centralized ControllerGlobal, abstracted view of the system
Workflows Resources Control Points
20
![Page 77: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/77.jpg)
Re
tro
Co
ntr
oll
er
AP
I
Pervasive Measurement
1. Pervasive MeasurementAggregated locally then reported centrally once per second
2. Centralized ControllerGlobal, abstracted view of the system
Policies run in continuous control loop
Po
licy
Po
licy
Po
licy
Workflows Resources Control Points
20
![Page 78: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/78.jpg)
Re
tro
Co
ntr
oll
er
AP
I
Pervasive Measurement
1. Pervasive MeasurementAggregated locally then reported centrally once per second
2. Centralized ControllerGlobal, abstracted view of the system
Policies run in continuous control loop
3. Distributed EnforcementCo-ordinates enforcement using distributed token bucket
Po
licy
Po
licy
Po
licy
Workflows Resources Control Points
Distributed Enforcement
20
![Page 79: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/79.jpg)
Re
tro
Co
ntr
oll
er
AP
I
Distributed Enforcement
Pervasive Measurement
Workflows Resources Control Points
Po
licy
Po
licy
Po
licy
“Control Plane” for resource management
Global, abstracted view of the systemEasier to write
Reusable
21
![Page 80: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/80.jpg)
22
Example: LatencySLOPolicy
![Page 81: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/81.jpg)
22
Example: LatencySLOH High Priority Workflows
Policy
“200ms average request latency”
![Page 82: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/82.jpg)
22
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
“200ms average request latency”
(use spare capacity)
![Page 83: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/83.jpg)
22
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
“200ms average request latency”
(use spare capacity)monitor latencies
![Page 84: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/84.jpg)
22
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
“200ms average request latency”
(use spare capacity)monitor latencies
attribute interference
![Page 85: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/85.jpg)
22
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
“200ms average request latency”
(use spare capacity)monitor latencies
throttle interfering workflows
![Page 86: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/86.jpg)
23
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
![Page 87: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/87.jpg)
23
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
![Page 88: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/88.jpg)
24
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Select the high priority workflow Wwith worst performance
![Page 89: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/89.jpg)
24
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Select the high priority workflow Wwith worst performance
Weight low priority workflows by their interference with W
![Page 90: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/90.jpg)
24
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Select the high priority workflow Wwith worst performance
Weight low priority workflows by their interference with W
Throttle low priority workflows proportionally to their weight
![Page 91: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/91.jpg)
25
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Weight low priority workflows by their interference with W
Select the high priority workflow Wwith worst performance
Throttle low priority workflows proportionally to their weight
![Page 92: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/92.jpg)
26
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Throttle low priority workflows proportionally to their weight
Select the high priority workflow Wwith worst performance
Weight low priority workflows by their interference with W
![Page 93: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/93.jpg)
27
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Select the high priority workflow Wwith worst performance
Weight low priority workflows by their interference with W
Throttle low priority workflows proportionally to their weight
![Page 94: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/94.jpg)
27
1 foreach candidate in H2 miss[candidate] = latency(candidate) / guarantee[candidate]3 W = candidate in H with max miss[candidate]
4 foreach rsrc in resources() // calculate importance of each resource for hipri5 importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
6 foreach lopri in L // calculate low priority workflow interference7 interference[lopri] = Σrsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
8 foreach lopri in L // normalize interference9 interference[lopri] /= Σk interference[k]
10 foreach lopri in L11 if miss[W] > 1 // throttle12 scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]13 else // release14 scalefactor = 1 + β
15 foreach cpoint in controlpoints() // apply new rates16 set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri)
Example: LatencySLOH High Priority Workflows
Policy
L Low Priority Workflows
Select the high priority workflow Wwith worst performance
Weight low priority workflows by their interference with W
Throttle low priority workflows proportionally to their weight
![Page 95: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/95.jpg)
Other types of policy…
28
![Page 96: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/96.jpg)
Other types of policy…
Bottleneck FairnessDetect most overloaded resource
Fair-share resource between tenants using it
Policy
28
![Page 97: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/97.jpg)
Other types of policy…
Bottleneck FairnessDetect most overloaded resource
Fair-share resource between tenants using it
Dominant Resource FairnessEstimate demands and capacities from measurements
Policy
Policy
28
![Page 98: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/98.jpg)
Other types of policy…
Bottleneck FairnessDetect most overloaded resource
Fair-share resource between tenants using it
Dominant Resource FairnessEstimate demands and capacities from measurements
Policy
Policy
ConciseAny resources can be bottleneck (policy doesn’t care)Not system specific
28
![Page 99: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/99.jpg)
Evaluation
29
![Page 100: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/100.jpg)
Instrumentation
30
![Page 101: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/101.jpg)
Instrumentation
Retro implementation in JavaInstrumentation Library
Central controller implementation
30
![Page 102: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/102.jpg)
Instrumentation
Retro implementation in JavaInstrumentation LibraryCentral controller implementation
To enable RetroPropagate Workflow ID within application (like X-Trace, Dapper)Instrument resources with wrapper classes
30
![Page 103: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/103.jpg)
Instrumentation
Retro implementation in JavaInstrumentation LibraryCentral controller implementation
To enable RetroPropagate Workflow ID within application (like X-Trace, Dapper)Instrument resources with wrapper classes
Overheads
31
![Page 104: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/104.jpg)
Instrumentation
Retro implementation in JavaInstrumentation LibraryCentral controller implementation
To enable RetroPropagate Workflow ID within application (like X-Trace, Dapper)Instrument resources with wrapper classes
OverheadsResource instrumentation automatic using AspectJ
31
![Page 105: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/105.jpg)
Instrumentation
Retro implementation in JavaInstrumentation LibraryCentral controller implementation
To enable RetroPropagate Workflow ID within application (like X-Trace, Dapper)Instrument resources with wrapper classes
OverheadsResource instrumentation automatic using AspectJ
Overall 50-200 lines per system to modify RPCs
31
![Page 106: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/106.jpg)
Instrumentation
Retro implementation in JavaInstrumentation LibraryCentral controller implementation
To enable RetroPropagate Workflow ID within application (like X-Trace, Dapper)Instrument resources with wrapper classes
OverheadsResource instrumentation automatic using AspectJ
Overall 50-200 lines per system to modify RPCs
Retro overhead: max 1-2% on throughput, latency
31
![Page 107: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/107.jpg)
Experiments
32
![Page 108: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/108.jpg)
ExperimentsYARN MapReduce
ZooKeeper
Systems
HDFSHBase
32
![Page 109: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/109.jpg)
ExperimentsYARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
HDFSHBase
32
![Page 110: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/110.jpg)
ExperimentsYARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
ResourcesCPU, Disk, Network (All systems)Locks, Queues (HDFS, HBase)
HDFSHBase
32
![Page 111: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/111.jpg)
ExperimentsYARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
ResourcesCPU, Disk, Network (All systems)Locks, Queues (HDFS, HBase)
PoliciesLatencySLO
Bottleneck Fairness
Dominant Resource Fairness
HDFSHBase
32
![Page 112: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/112.jpg)
Experiments
Policies for a mixture of systems, workflows, and resources
Results on clusters up to 200 nodes
YARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
ResourcesCPU, Disk, Network (All systems)Locks, Queues (HDFS, HBase)
PoliciesLatencySLO
Bottleneck Fairness
Dominant Resource Fairness
See paper for full experiment results
HDFSHBase
32
![Page 113: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/113.jpg)
Experiments
Policies for a mixture of systems, workflows, and resources
Results on clusters up to 200 nodes
YARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
ResourcesCPU, Disk, Network (All systems)Locks, Queues (HDFS, HBase)
PoliciesLatencySLO
Bottleneck Fairness
Dominant Resource Fairness
See paper for full experiment results
This talk LatencySLO policy results
HDFSHBase
32
![Page 114: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/114.jpg)
Experiments
Policies for a mixture of systems, workflows, and resources
Results on clusters up to 200 nodes
YARN MapReduce
ZooKeeper
Systems
Workflows MapReduce Jobs (HiBench)HBase (YCSB)HDFS clientsBackground Data ReplicationBackground Heartbeats
ResourcesCPU, Disk, Network (All systems)Locks, Queues (HDFS, HBase)
PoliciesLatencySLO
Bottleneck Fairness
Dominant Resource Fairness
See paper for full experiment results
This talk LatencySLO policy results
HDFSHBase
32
![Page 115: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/115.jpg)
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
33
![Page 116: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/116.jpg)
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
33
![Page 117: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/117.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
33
![Page 118: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/118.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
33
![Page 119: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/119.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
33
![Page 120: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/120.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
33
![Page 121: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/121.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
33
![Page 122: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/122.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 123: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/123.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
Disk
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 124: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/124.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
Disk
CPU
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 125: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/125.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
Disk
CPU
HDFS NN Lock
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 126: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/126.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
Disk
CPU
HDFS NN Lock
HDFS NN Queue
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 127: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/127.jpg)
0
5
10
15
0 5 10 15 20 25 30
Slo
wd
ow
n
Time [minutes]
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
Disk
CPU
HDFS NN Lock
HDFS NN Queue
HBase Queue
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
SLOtarget
34
![Page 128: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/128.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
+ SLO Policy Enabled
SLO target
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
35
SLOtarget
![Page 129: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/129.jpg)
1
10
100
1000
0 5 10 15 20 25 300.1
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
+ SLO Policy Enabled
SLO target
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
35
SLOtarget
![Page 130: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/130.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 131: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/131.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 132: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/132.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 133: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/133.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 134: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/134.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 135: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/135.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 136: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/136.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 137: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/137.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.5
1 HBaseCached Table Scan
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 138: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/138.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.5
1 HBaseCached Table Scan
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 139: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/139.jpg)
0.1
1
10
0 5 10 15 20 25 30
SLO
-No
rma
lize
d L
ate
ncy
Time [minutes]
HDFS read 8k
HBase read 1 cached row
0.5
1 HBaseCached Table Scan
0.2
1 HBaseTable Scan
0.5
1 HDFS mkdir
HBase read 1 row
HBaseCached Table Scan
HDFS mkdir
HBaseTable Scan
36
![Page 140: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/140.jpg)
Conclusion
37
![Page 141: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/141.jpg)
Resource management for shareddistributed systems
Conclusion
37
![Page 142: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/142.jpg)
Resource management for shareddistributed systems
Centralized resource management
Conclusion
37
![Page 143: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/143.jpg)
Resource management for shareddistributed systems
Centralized resource management
Conclusion
Comprehensive: resources, processes, tenants, background tasks
37
![Page 144: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/144.jpg)
Resource management for shareddistributed systems
Centralized resource management
Conclusion
Comprehensive: resources, processes, tenants, background tasks
Abstractions for writing concise, general-purpose policies:
WorkflowsResources (slowdown, load)Control points
37
![Page 145: Retro: Targeted Resource Management in Multi …...Code change increases database usage, shifts bottleneck to unmanaged application-level lock • Nov. 2014 –Visual Studio Online](https://reader036.fdocuments.net/reader036/viewer/2022080718/5f78675c3836c9773c7c5c6a/html5/thumbnails/145.jpg)
Resource management for shareddistributed systems
Centralized resource management
Conclusion
http://cs.brown.edu/~jcmace
Comprehensive: resources, processes, tenants, background tasks
Abstractions for writing concise, general-purpose policies:
WorkflowsResources (slowdown, load)Control points