![Page 1: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/1.jpg)
Reining in the Outliers in Map-Reduce Clusters using Mantri
Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, Edward Harris @Microsoft
Presenter: Weiyue Xu
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
![Page 2: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/2.jpg)
Credits
• Modified version of
– http://www.cs.duke.edu/courses/fall10/cps296.2/lectures/
– research.microsoft.com/en-us/UM/people/srikanth/data/Combating%20Outliers%20in%20Map-Reduce.web.pptx
![Page 3: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/3.jpg)
Outline
• Introduction
• Cause of the Outlier
• Mantri
• Evaluation
• Discussion and Conclusion
![Page 4: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/4.jpg)
[Figure: log(size of dataset) (GB 10^9, TB 10^12, PB 10^15, EB 10^18) versus log(size of cluster) (1 to 10^5), with regions labeled "HPC, || databases" and "mapreduce"]
MapReduce
• Decouples customized data operations from mechanisms to scale
• Widely used
– Cosmos (based on SVC’s Dryad) + Scope @ Bing
– MapReduce @ Google
– Hadoop inside Yahoo! and on Amazon’s Cloud (AWS)
e.g., the Internet, click logs, bio/genomic data
![Page 5: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/5.jpg)
An Example
Goal: Find frequent search queries to Bing
What the user says:
SELECT Query, COUNT(*) AS Freq FROM QueryTable HAVING Freq > X
How it works:
[Figure: Read, Map, and Reduce stages. A job manager assigns work to tasks and gets progress; tasks read file blocks 0-3, write map output locally, and produce output blocks 0 and 1]
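To make the example concrete, here is a minimal single-process sketch of the same query expressed as a map step and a reduce step. The function names (`map_block`, `reduce_counts`), the sample blocks, and the threshold value are assumptions made for illustration; they are not part of Cosmos, Scope, or any real framework.

```python
from collections import Counter

def map_block(block):
    """Map step: count queries within one file block (partial counts)."""
    return Counter(block)

def reduce_counts(partials, threshold):
    """Reduce step: merge partial counts, keep queries with Freq > threshold."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {query: freq for query, freq in total.items() if freq > threshold}

# Hypothetical input: each inner list stands for one file block of logged queries.
blocks = [
    ["weather", "news", "weather"],
    ["news", "weather", "maps"],
]
partials = [map_block(b) for b in blocks]  # in a cluster: one map task per block
frequent = reduce_counts(partials, threshold=1)
print(frequent)
```

In a real cluster each `map_block` call would run as a separate task, which is why a single slow task can hold up the reduce step behind the barrier.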
![Page 6: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/6.jpg)
Outliers slow down map-reduce jobs
[Figure: example job with phases Map.Read 22K, Map.Move 15K, Map 13K, and Reduce 51K, with a barrier between phases and the file system as input]
We find that:
By tackling outliers, we can speed up jobs while using resources efficiently:
• Quicker response improves productivity
• Predictability supports SLAs
• Better resource utilization
![Page 7: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/7.jpg)
From a phase to a job
• A job may have many phases
• An outlier in an early phase has a cumulative effect
• Data loss may cause multi-phase recompute outliers
![Page 8: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/8.jpg)
Why outliers?
Due to unavailable input, tasks have to be recomputed
[Figure: map, sort, and reduce phases, showing the delay due to a recompute]
Delay due to a recompute readily cascades
![Page 9: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/9.jpg)
Frequency of Outliers
• stragglers: tasks that take more than 1.5 times as long as the median task in that phase
• recomputes: tasks that are re-run because their output was lost
• 50% of phases have > 10% stragglers and no recomputes
• 10% of the stragglers take > 10X longer
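The straggler definition above can be sketched as a small check over the task durations of one phase. This is only an illustration: the function name `find_stragglers`, the sample durations, and the reading of "1.5 times the median" as a strict "more than" threshold are assumptions.

```python
from statistics import median

def find_stragglers(durations, factor=1.5):
    """Return indices of tasks whose duration exceeds factor x the phase median."""
    cutoff = factor * median(durations)
    return [i for i, d in enumerate(durations) if d > cutoff]

# Hypothetical task durations (seconds) for the tasks of a single phase.
phase = [10, 12, 11, 9, 40, 10, 18]
print(find_stragglers(phase))  # median is 11, so the cutoff is 16.5
```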
![Page 10: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/10.jpg)
Cost of Outliers
At the median, jobs are slowed down by 35% due to outliers
![Page 11: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/11.jpg)
Previous solution
• The original MapReduce paper observed the problem but did not solve it in depth
• Current schemes (e.g. Hadoop, LATE) duplicate long-running tasks based on some metrics
• Drawbacks
– Some duplicates may be unnecessary
– Use extra resources
– Placement may be a problem
![Page 12: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/12.jpg)
What this Paper is About
• Identify fundamental causes of outliers
• Mantri: A Cause-, Resource-Aware mitigation scheme
– Case by case analysis: takes distinct actions based on cause
– Considers opportunity cost of actions
• Results from a production deployment
![Page 13: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/13.jpg)
Causes of Outliers
• Data Skew: data size varies across tasks in a phase.
![Page 14: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/14.jpg)
Causes of Outliers
• Crossrack Traffic
– Uneven placement is typical in production
– Reduce tasks are placed at first available slots
[Figure: map output and reduce tasks spread across racks]
![Page 15: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/15.jpg)
Causes of Outliers
• Bad and Busy Machines
– 50% of recomputes happen on 5% of the machines
– Recompute increases resource usage
![Page 16: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/16.jpg)
Causes of Outliers
• 70% of cross-rack traffic is reduce traffic
• Reduce reads from every map, so tasks in a spot with a slow network run slower
• Tasks compete for the network among themselves
• 50% of phases take 62% longer to finish than with ideal placement
![Page 17: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/17.jpg)
• Outliers cluster by time
– Resource contention might be the cause
• Recomputes cluster by machines
– Data loss may cause multiple recomputes
![Page 18: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/18.jpg)
Mantri
• Cause aware, and resource aware
• Fix each problem with different strategies
• Runtime = f (input, network, dataToProcess, ...)
![Page 19: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/19.jpg)
Mantri [Avoid Recomputations]
Why outliers? Delay due to a recompute readily cascades
[Figure: map, sort, and reduce phases, showing the delay due to a recompute]
![Page 20: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/20.jpg)
Mantri [Avoid Recomputations]
• Idea: Replicate intermediate data, use copy if original is unavailable
• Challenge: What data to replicate?
• Insight: Cost to recompute vs cost to replicate
[Figure: tasks on machines M1 and M2]
$t_{\text{redo}} = r_2\,(t_2 + t_{1,\text{redo}})$
where t = predicted runtime of task, r = predicted probability of recomputation at machine, and $t_{1,\text{redo}}$ is the recompute cost of the upstream task
• Cost to recompute depends on data loss probabilities and time taken, and also recursively looks at prior phases.
• Replicate when $t_{\text{redo}} > t_{\text{rep}}$; Mantri preferentially acts on more costly inputs
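The recursive recompute cost can be sketched as follows. This is a minimal illustration, assuming a simple linear chain of dependent tasks in which the earliest task has no upstream recompute cost. The names `redo_cost` and `should_replicate` and the sample numbers are hypothetical.

```python
import math

def redo_cost(chain):
    """Expected recompute cost for the last task in a chain of dependent tasks.

    chain: list of (t, r) pairs ordered from the earliest phase to the latest,
    where t is the predicted runtime and r the predicted probability of
    recomputation at the machine holding that task's output.
    Applies t_redo = r * (t + upstream t_redo) at each step.
    """
    t_redo = 0.0  # assumption: nothing upstream of the first task can be lost
    for t, r in chain:
        t_redo = r * (t + t_redo)
    return t_redo

def should_replicate(chain, t_rep):
    """Replicate the output only if recomputing is expected to cost more."""
    return redo_cost(chain) > t_rep

# Hypothetical two-phase chain: task on M1 (t=100, r=0.1), then M2 (t=50, r=0.2).
chain = [(100.0, 0.1), (50.0, 0.2)]
cost = redo_cost(chain)  # 0.2 * (50 + 0.1 * 100)
print(cost, should_replicate(chain, t_rep=5.0))
```

Because the cost accumulates through earlier phases, outputs that sit late in a long chain on unreliable machines are the ones most worth replicating.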
![Page 21: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/21.jpg)
Mantri [Network Aware Placement]
• Crossrack Traffic
– Uneven placement is typical in production
– Reduce tasks are placed at first available slots
[Figure: map output and reduce tasks spread across racks]
![Page 22: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/22.jpg)
Mantri [Network Aware Placement]
• Idea: Avoid hot-spots, keep traffic on a link proportional to bandwidth
• Challenges: Global co-ordination, congestion detection
• Insights:
– Local control is a good approximation (each job balances its traffic)
– Link utilizations average out on the long task and are steady on the short task
If rack $i$ has $d_i$ map output, needs $d_u^i$ and $d_v^i$ for upload and download, and has bandwidths $b_u^i$ and $b_v^i$ available, place a fraction $a_i$ of the reduces such that:
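One plausible way to pick the fractions is sketched below. It rests on several assumptions that are not stated on the slide: the objective is to minimize the slowest per-rack transfer time, the data rack i must upload is d_i(1 - a_i), and the data it must download is a_i(D - d_i), where D is the total map output. The function `place_reduces` and its bisection approach are illustrative, not Mantri's actual implementation.

```python
def place_reduces(map_output, up_bw, down_bw, iters=60):
    """Return per-rack fractions a_i of reduce tasks (summing to 1) that
    minimize the slowest upload or download, under the assumptions above."""
    total = sum(map_output)

    def bounds(t):
        """Per-rack [lo, hi] range of a_i for which transfers finish within t."""
        lo, hi = [], []
        for d, bu, bv in zip(map_output, up_bw, down_bw):
            # Upload time d * (1 - a) / bu <= t   =>   a >= 1 - t * bu / d
            lo.append(max(0.0, 1.0 - t * bu / d) if d > 0 else 0.0)
            # Download time a * (total - d) / bv <= t   =>   a <= t * bv / (total - d)
            rest = total - d
            hi.append(min(1.0, t * bv / rest) if rest > 0 else 1.0)
        return lo, hi

    def feasible(t):
        lo, hi = bounds(t)
        return all(l <= h for l, h in zip(lo, hi)) and sum(lo) <= 1.0 <= sum(hi)

    # Any t that covers every full upload and full download is feasible.
    t_lo = 0.0
    t_hi = max(max(d / bu, (total - d) / bv)
               for d, bu, bv in zip(map_output, up_bw, down_bw))
    for _ in range(iters):  # bisection on the completion time t
        mid = (t_lo + t_hi) / 2
        if feasible(mid):
            t_hi = mid
        else:
            t_lo = mid

    # Start from the lower bounds and spread the remainder over the slack.
    lo, hi = bounds(t_hi)
    slack = 1.0 - sum(lo)
    room = sum(h - l for l, h in zip(lo, hi))
    if room <= 0:
        return lo
    return [l + slack * (h - l) / room for l, h in zip(lo, hi)]

# Hypothetical racks: equal map output and bandwidth, then a fully skewed case.
even = place_reduces([50, 50], [1, 1], [1, 1])
skewed = place_reduces([100, 0], [1, 1], [1, 1])
print(even, skewed)
```

With equal racks the reduces split evenly, while in the skewed case they all land on the rack that already holds the map output, which avoids cross-rack traffic entirely.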
![Page 23: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/23.jpg)
Mantri [Data Aware Task Ordering]
• Data Skew - About 25% of outliers occur due to more dataToProcess (workload imbalance)
• Ignoring these is better than duplicating (state-of-the-art)
![Page 24: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/24.jpg)
Mantri [Data Aware Task Ordering]
• Problem: Workload imbalance causes tasks to straggle.
• Idea: Restarting outliers that are lengthy is counter-productive.
• Insights:
– Build an estimator T ~ function (dataToProcess)
– Schedule tasks in descending order of dataToProcess
– Theorem [Graham, 1969]: Scheduling tasks with longest processing time first is at most 33% worse than the optimal schedule.
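The ordering rule can be illustrated with a short longest-processing-time-first sketch. It assumes that runtime is proportional to dataToProcess and that every slot is identical. The function name `lpt_schedule` and the sample sizes are hypothetical.

```python
import heapq

def lpt_schedule(data_sizes, num_slots):
    """Assign tasks to slots in descending order of data size, always picking
    the currently least-loaded slot. Returns (assignment, makespan)."""
    order = sorted(range(len(data_sizes)),
                   key=lambda i: data_sizes[i], reverse=True)
    heap = [(0, slot) for slot in range(num_slots)]  # (load, slot id)
    heapq.heapify(heap)
    assignment = {}
    for task in order:
        load, slot = heapq.heappop(heap)
        assignment[task] = slot
        heapq.heappush(heap, (load + data_sizes[task], slot))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical data sizes; the best possible split over 3 slots finishes at 9
# (5+4, 5+4, 3+3+3), and the greedy order stays within Graham's bound of that.
sizes = [3, 5, 4, 3, 5, 4, 3]
assignment, makespan = lpt_schedule(sizes, num_slots=3)
print(makespan)
```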
![Page 25: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/25.jpg)
Mantri [Resource Aware Restart]
• Problem: 25% of outliers remain, likely due to contention@machine
• Idea: Restart tasks elsewhere in the cluster asap
• Challenge: restart or duplicate?
[Figure: three scenarios (a), (b), (c) on a timeline starting at "now", comparing a running task's remaining time (trem) with a potential restart (tnew); one scenario is marked √ and two are marked ×]
Save time and resources iff $P(c \cdot t_{rem} > (c+1) \cdot t_{new}) > \delta$
• If there is pending work, duplicate iff doing so saves both time and resources
• Else, duplicate if expected savings are high
• Continuously observe and kill wasteful copies
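The duplicate test can be sketched by estimating the probability empirically from samples. This is an illustration under stated assumptions: c is taken to be the number of copies of the task currently running (the slide does not define it), t_rem and t_new are treated as independent, and the sample values and the δ used below are made up. The function name `should_duplicate` is hypothetical.

```python
from itertools import product

def should_duplicate(t_rem_samples, t_new_samples, copies, delta):
    """Estimate P(c * t_rem > (c + 1) * t_new) over all sample pairs and
    compare it with the threshold delta."""
    pairs = list(product(t_rem_samples, t_new_samples))
    wins = sum(1 for t_rem, t_new in pairs
               if copies * t_rem > (copies + 1) * t_new)
    return wins / len(pairs) > delta

# Hypothetical estimates (seconds): a badly stuck task versus a healthy one.
stuck = should_duplicate([100, 120], [10, 15], copies=1, delta=0.25)
healthy = should_duplicate([10, 12], [10, 15], copies=1, delta=0.25)
print(stuck, healthy)
```

The factor (c+1)/c makes the test stricter when few copies are running: with one copy, a duplicate is only worthwhile if the new attempt is expected to take less than half the remaining time.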
![Page 26: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/26.jpg)
Summary
• Reduce recomputation:
– preferentially replicate costly-to-recompute tasks
• Poor network:
– each job locally avoids network hot-spots
• DataToProcess:
– schedule in descending order of data size
• Bad machines:
– quarantine persistently faulty machines
• Others:
– restart or duplicate tasks, cognizant of resource cost.
![Page 27: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/27.jpg)
Evaluation Methodology
• Mantri run on production clusters
• Baseline is results from Dryad
• Use trace-driven simulations to compare with other systems
![Page 28: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/28.jpg)
Comparing Jobs in the Wild
w/ and w/o Mantri for one month of jobs in Bing production cluster
340 jobs that each repeated at least five times during May 25-28 (release) vs. Apr 1-30 (pre-release)
![Page 29: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/29.jpg)
In Production, Restarts…
![Page 30: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/30.jpg)
In Trace-Replay Simulations, Restarts…
[Figure: two panels; axis label: CDF % cluster resources]
![Page 31: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/31.jpg)
Protecting Against Recomputes
[Figure; axis label: CDF % cluster resources]
![Page 32: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/32.jpg)
Conclusion
• Outliers are a significant problem
• They happen due to many causes
• Mantri: cause- and resource-aware mitigation outperforms prior schemes
![Page 33: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/33.jpg)
Discussion
• Mantri does a case-by-case analysis for each cause. What if the causes are inter-dependent?
![Page 34: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/34.jpg)
Questions or Comments?
Thanks!
![Page 35: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/35.jpg)
Estimation of t_rem and t_new
• d: input data size
• d_read: the amount read
![Page 36: Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649efd5503460f94c1109e/html5/thumbnails/36.jpg)
Estimation of tnew
• processRate: estimated from all tasks in the phase
• locationFactor: machine performance
• d: input size
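The two estimates can be sketched as below. The formulas are an assumption based on the extrapolation described in the published Mantri paper, since the slides list only the variables: t_rem = t_elapsed * d / d_read + t_wrapup and t_new = processRate * locationFactor * d + schedLag. The names `t_elapsed`, `t_wrapup`, and `sched_lag` do not appear on the slides, and processRate is assumed here to be in time per unit of data.

```python
def estimate_t_rem(t_elapsed, d, d_read, t_wrapup=0.0):
    """Extrapolate remaining time from progress so far:
    the task took t_elapsed to read d_read out of d units of input."""
    if d_read <= 0:
        raise ValueError("d_read must be positive to extrapolate")
    return t_elapsed * d / d_read + t_wrapup

def estimate_t_new(process_rate, location_factor, d, sched_lag=0.0):
    """Predict the runtime of a fresh copy from the phase-wide process rate,
    scaled by how the candidate machine performs relative to the others."""
    return process_rate * location_factor * d + sched_lag

# Hypothetical numbers: 30 s spent reading 25 of 100 units of input.
t_rem = estimate_t_rem(t_elapsed=30.0, d=100.0, d_read=25.0)
t_new = estimate_t_new(process_rate=0.5, location_factor=2.0, d=100.0)
print(t_rem, t_new)
```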