# Piccolo – Paper Discussion

Big Data Reading Group, 9/20/2010
## Motivation / Goals

- Rising demand for distributed computation
  - PageRank, K-Means, N-Body simulation
- Data-centric frameworks simplify programming
  - Existing models (e.g. MapReduce) are insufficient: designed for large-scale data analysis rather than in-memory computation
- Make in-memory computations fast
- Enable asynchronous computation
## Overview

- Global in-memory key-value tables for sharing state
- Concurrently running instances of kernel applications modify the global state
- Locality-optimized (user-specified policies)
- Reduced synchronization (accumulation, global barriers)
- Checkpoint-based recovery
## System Design
## Table interface
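The interface itself was shown as a figure on the slide. As a rough illustration only, here is a minimal Python sketch of a partitioned key-value table with a user-defined accumulator in the spirit of the paper; all names and signatures here are assumptions, not the real C++ API:

```python
# Illustrative sketch (not the real Piccolo API): a partitioned
# key-value table whose updates are merged by a user-supplied
# accumulator instead of overwriting each other.

class Table:
    def __init__(self, num_partitions, accumulator):
        self.num_partitions = num_partitions
        self.accumulator = accumulator          # resolves concurrent updates
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key):
        # Default sharding; Piccolo lets users supply their own policy.
        return hash(key) % self.num_partitions

    def put(self, key, value):
        self.partitions[self._partition(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition(key)][key]

    def update(self, key, delta):
        # Accumulate rather than overwrite: write/write conflicts vanish.
        p = self.partitions[self._partition(key)]
        p[key] = self.accumulator(p[key], delta) if key in p else delta

t = Table(num_partitions=4, accumulator=lambda a, b: a + b)
t.update("x", 1)
t.update("x", 2)   # concurrent updates merge via the accumulator
```

With a sum accumulator the two updates above commute, which is exactly what lets Piccolo skip pairwise synchronization between kernels.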
## Optimization

- Ensure locality
  - Group kernel instances with partitions
  - Group partitions
  - Guarantee: each partition resides entirely on a single machine
- Reduce synchronization
  - Accumulation avoids write/write conflicts
  - No pairwise kernel synchronization; global barriers suffice
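Because a partition is guaranteed to live entirely on one machine, the locality policy can be sketched very simply; the names below (`assign_kernels`, `partition_owner`) are illustrative assumptions, not from the implementation:

```python
# Minimal sketch of the locality grouping above: launch each kernel
# instance on the worker that owns the partition it operates on, so
# its table accesses stay local.

def assign_kernels(num_partitions, partition_owner):
    # partition_owner: partition index -> worker id (one machine per
    # partition, per the guarantee above).
    return {k: partition_owner[k] for k in range(num_partitions)}

owners = {0: "w0", 1: "w0", 2: "w1", 3: "w1"}
placement = assign_kernels(4, owners)
```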
## Load balancing

- Assigning partitions
  - Round robin
  - Optimized for data location
- Work stealing
  - Biggest task first (the master estimates task size from the number of keys in the partition)
  - The master decides
- Restrictions
  - A running task cannot be killed (it modifies shared state; restoring is very expensive)
  - Partitions need to be moved
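The master's "biggest task first" choice can be sketched as a max-heap over its size estimates; this is an assumed representation for illustration, not the actual scheduler code:

```python
import heapq

# Sketch of the stealing decision above: the master estimates each
# pending task's size by the number of keys in its partition and hands
# an idle worker the biggest one first.

def pick_task_to_steal(pending_key_counts):
    # pending_key_counts: partition id -> key count (size estimate).
    # heapq is a min-heap, so negate counts to pop the biggest first.
    heap = [(-n, pid) for pid, n in pending_key_counts.items()]
    heapq.heapify(heap)
    _, pid = heapq.heappop(heap)
    return pid

stolen = pick_task_to_steal({0: 100, 1: 5000, 2: 700})
```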
## Table migration

Migrating a table partition from worker wa to worker wb:

1. The master sends message M1 to all workers.
2. All workers flush pending writes to wa.
3. All workers send new requests to wb.
4. wb buffers all incoming requests.
5. wa sends its paused state to wb.
6. Once all workers acknowledge phase 1, the master sends M2 to wa and wb.
7. wa flushes its remaining state to wb and stays "paused".
8. wb first applies the buffered requests, then resumes normal operation.
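The two phases can be re-enacted with a toy model; every class and field name below is illustrative, not from the Piccolo implementation:

```python
# Toy re-enactment of the two-phase migration: phase 1 flushes
# outstanding writes to wa and redirects new requests to wb (which only
# buffers them); phase 2 hands wa's state to wb, which then drains its
# buffer and resumes normal operation.

class Worker:
    def __init__(self, pending):
        self.pending = dict(pending)   # writes not yet flushed
        self.target = None             # where new requests go

    def flush(self):
        out, self.pending = self.pending, {}
        return out

class TableHost:
    def __init__(self, state=None):
        self.state = dict(state or {})
        self.buffer = []               # requests held during migration
        self.paused = False

def migrate(wa, wb, workers):
    # Phase 1 (M1): flush to wa, redirect to wb, pause wa.
    for w in workers:
        wa.state.update(w.flush())
        w.target = wb
    wa.paused = True

    # Phase 2 (M2), after all workers acknowledge phase 1: wa's state
    # moves to wb, which applies buffered requests in arrival order.
    wb.state.update(wa.state)
    for key, value in wb.buffer:
        wb.state[key] = value
    wb.buffer.clear()

wa = TableHost({"a": 1})
wb = TableHost()
w1 = Worker({"b": 2})
wb.buffer.append(("c", 3))   # a request that arrived during phase 1
migrate(wa, wb, [w1])
```

Buffering at wb is what makes the handoff safe: no request is lost or applied out of order while wa's state is in flight.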
## Fault tolerance

- User-assisted checkpoint / restore (Chandy-Lamport style)
  - Asynchronous -> periodic checkpoints
  - Synchronous -> checkpoint at the barrier
- Problem: when to start the barrier checkpoint
  - The replay log might get very long
  - The checkpoint might not use enough free CPU time before the barrier
- Solution: start when the first worker has finished all its jobs
- No checkpoint during table migration, and vice versa
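The start heuristic above fits in a few lines; the data layout here is an assumption for illustration:

```python
# Sketch of the checkpoint trigger: begin the barrier checkpoint as
# soon as the first worker has no tasks left (so checkpointing overlaps
# the stragglers), but never while a table migration is in progress.

def should_start_checkpoint(tasks_remaining, migrating):
    # tasks_remaining: worker id -> number of unfinished tasks.
    if migrating:
        return False
    return any(n == 0 for n in tasks_remaining.values())

start = should_start_checkpoint({"w0": 3, "w1": 0, "w2": 5}, migrating=False)
```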
## Applications

- PageRank, k-means, n-body, matrix multiplication
  - Parallel, iterative computations
  - Local reads + local/remote writes, or local/remote reads + local writes
  - Can be implemented as multiple MapReduce jobs
- Distributed web crawler
  - Idempotent operation
  - Cannot be realized in MapReduce
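To make the access pattern concrete, here is a hedged sketch of PageRank in the Piccolo style (local reads of the current ranks, sum-accumulated writes of contributions, a swap at the barrier). The real kernels are C++; plain dicts stand in for the "curr" and "next" rank tables:

```python
# PageRank sketch mirroring the Piccolo access pattern: each page reads
# its own rank locally and accumulates contributions into a "next"
# table with a sum accumulator; tables swap at the global barrier.

def pagerank(links, iterations=10, d=0.85):
    n = len(links)
    curr = {p: 1.0 / n for p in links}         # current rank table
    for _ in range(iterations):
        nxt = {p: (1 - d) / n for p in links}  # next table, sum-accumulated
        for page, outlinks in links.items():
            share = d * curr[page] / len(outlinks)
            for tgt in outlinks:
                nxt[tgt] += share              # update() with sum accumulator
        curr = nxt                             # swap at the barrier
    return curr

ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

Because the contributions commute, remote writes need no locking, which is why this maps onto Piccolo so much more naturally than onto a chain of MapReduce jobs.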
## Scaling

- Fixed input size
- Scaled input size
## Comparison with Hadoop / MPI

- PageRank and k-means (vs. Hadoop)
  - Piccolo is 4x and 11x faster, respectively
  - For PageRank, Hadoop spends 50% of its time sorting (joining data streams) and 15% on (de)serialization and HDFS reads/writes
- Matrix multiplication (vs. MPI)
  - Piccolo is 10% faster
  - MPI waits for the slowest node many times
## Work stealing / slow worker / checkpoints

- Work stealing / slow worker
  - PageRank has skewed partitions
  - One slow worker (50% CPU)
- Checkpoints
  - Naïve: start after all workers have finished
  - Optimized: start after the first worker has finished
## Checkpoint limits / scalability

- Hypothetical data center
  - Typical machine uptime of one year
  - Worst-case scenario
  - Optimistic?
- Looked different on some older slides
## Distributed Crawler

- 32 machines saturate 100 Mbps
- There are single servers that achieve this
- Piccolo would scale higher
## Summary

- Piccolo provides an easy-to-use distributed shared memory model
- It imposes many restrictions
  - Simple interface
  - Reduced synchronization (relaxed consistency, accumulation)
  - Locality
- But it performs well
  - Iterative computations
  - Avoids going to disk, unlike MapReduce
- A specialized tool for data-intensive in-memory computing