Dryad
Distributed Data-Parallel Programs from Sequential
Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly of Microsoft Research, Silicon Valley
Presented by: Thomas Hummel
Agenda
  Introduction
  System Overview
  Dryad Graph
  Program Development
  Program Execution
  Experimental Results
  Future Work
Introduction
  Problem
    How to write efficient distributed programs easily?
  Environment
    Parallel Processors
    High Speed Links
    Administered Domain
    Ignore Low Level Issues
Introduction
  Parallel Execution
    Faster Execution
    Automatic Specification
    Manual Specification
  GPU Shader, Distributed Databases, MapReduce
Introduction
  Graph Model
    Vertices Are Programs
    Edges Are Communication Links
  Forced Parallelism Mindset
    Necessary Abstraction
Introduction
  GPU Shader
    Low Level
    Hardware Specific
  MapReduce
    Simplicity Paramount
    Performance Sacrificed
  Database
    Implicit Communication
    Algebra Optimized
Introduction
  Dryad
    Fine Communication Control
    Multiple Input/Output Sets
    Must Consider Resources
  Execution Engine
    Executes DAG Of Programs
    Outputs Directed To Inputs
    No Recursion
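The execution-engine idea on this slide can be sketched in a few lines of Python. This is an illustrative toy, not Dryad's API: `run_dag`, the vertex functions, and the dictionary layout are all hypothetical. Each vertex is an ordinary sequential function, each edge directs one vertex's output to another vertex's input, and a graph with a cycle is rejected because the job must be a DAG.

```python
from graphlib import TopologicalSorter, CycleError


def run_dag(vertices, edges):
    """Run each vertex function once, feeding it its predecessors' outputs.

    vertices: dict mapping a name to a function that takes a list of inputs
    edges:    list of (producer, consumer) name pairs
    (Both layouts are assumptions made for this sketch.)
    """
    preds = {name: [] for name in vertices}
    for src, dst in edges:
        preds[dst].append(src)

    # static_order() raises CycleError if the graph is not a DAG.
    order = list(TopologicalSorter(preds).static_order())

    outputs = {}
    for name in order:
        inputs = [outputs[p] for p in preds[name]]
        outputs[name] = vertices[name](inputs)
    return outputs


vertices = {
    "read_a": lambda ins: [3, 1, 2],
    "read_b": lambda ins: [6, 5, 4],
    "sort_a": lambda ins: sorted(ins[0]),
    "sort_b": lambda ins: sorted(ins[0]),
    "merge": lambda ins: sorted(ins[0] + ins[1]),
}
edges = [
    ("read_a", "sort_a"),
    ("read_b", "sort_b"),
    ("sort_a", "merge"),
    ("sort_b", "merge"),
]

result = run_dag(vertices, edges)
```

The toy runs vertices one at a time; the point is only that a topological order always exists for a DAG, which is what lets an engine schedule independent vertices in parallel.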
System Overview
  Dryad Job
    DAG
    Data Passed On Edges
    Vertex Is a Program
  Message Structure
    User Defined
    Shared Memory
    TCP
    Files
System Overview
  System Organization
    Job Manager
    Name Server
    Daemon (Work Nodes)
Dryad Graph
  Graph Description Language
    “Embedded” in C++
    Combine Sub-Graphs
  C++ Class Inherited By Vertex Program
    Program Name
    Program Factory
Dryad Graph
  Vertex Creation
    C++ Class Inherited By Vertex Program
    Program Name
    Program Factory
    One Vertex Is a Graph
  Factory Called
    Program Specific Arguments Applied
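The vertex-creation steps on this slide might look roughly like the following. The slides describe a C++ interface; this Python sketch only mirrors the shape of the idea, and `VertexProgram`, `VertexFactory`, `GrepVertex`, and the dictionary used as a graph are all hypothetical names chosen for illustration.

```python
class VertexProgram:
    """Base class that every vertex program inherits from (hypothetical)."""

    def __init__(self, *args):
        self.args = args

    def run(self, inputs):
        raise NotImplementedError


class GrepVertex(VertexProgram):
    """Example vertex program: keep the lines that contain a pattern."""

    def run(self, inputs):
        (pattern,) = self.args
        return [line for line in inputs if pattern in line]


class VertexFactory:
    """Pairs a program name with the class used to instantiate it."""

    def __init__(self, name, program_cls):
        self.name = name
        self.program_cls = program_cls

    def create(self, *args):
        # Calling the factory applies the program-specific arguments.
        program = self.program_cls(*args)
        # A single vertex is itself a (one-node) graph.
        return {"vertices": [(self.name, program)], "edges": []}


grep_factory = VertexFactory("grep", GrepVertex)
graph = grep_factory.create("error")
name, program = graph["vertices"][0]
matches = program.run(["error: disk full", "ok", "fatal error"])
```

Returning a one-vertex graph rather than a bare vertex means the same composition operations can be applied to single vertices and to larger sub-graphs.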
Dryad Graph
  Edge Creation
    Composition (Combine) Operation
    Two Graphs
    Varying Assignment Methods
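To make "composition of two graphs with varying assignment methods" concrete, here is a small Python sketch. It assumes two assignment methods, a pointwise one (outputs matched to inputs one by one, wrapping round-robin when the counts differ) and a complete bipartite one (every output connected to every input). The `Graph` class and all function names are hypothetical and are not Dryad's C++ interface.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Graph:
    vertices: list
    edges: list = field(default_factory=list)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def single(name):
    """A one-vertex graph: the vertex is both its input and its output."""
    return Graph([name], [], [name], [name])


def merge(a, b):
    """Place two graphs side by side without connecting them."""
    return Graph(a.vertices + b.vertices, a.edges + b.edges,
                 a.inputs + b.inputs, a.outputs + b.outputs)


def replicate(prefix, n):
    """n unconnected copies of a vertex, named prefix0, prefix1, ..."""
    return reduce(merge, (single(f"{prefix}{i}") for i in range(n)))


def _compose(a, b, new_edges):
    # The composed graph keeps a's inputs and b's outputs.
    return Graph(a.vertices + b.vertices, a.edges + b.edges + new_edges,
                 list(a.inputs), list(b.outputs))


def pointwise(a, b):
    """Connect a's outputs to b's inputs one by one, wrapping round-robin."""
    n = max(len(a.outputs), len(b.inputs))
    new_edges = [(a.outputs[i % len(a.outputs)], b.inputs[i % len(b.inputs)])
                 for i in range(n)]
    return _compose(a, b, new_edges)


def bipartite(a, b):
    """Connect every output of a to every input of b."""
    new_edges = [(out, inp) for out in a.outputs for inp in b.inputs]
    return _compose(a, b, new_edges)


stage = pointwise(replicate("read", 3), replicate("sort", 3))
job = bipartite(stage, single("merge"))
```

Here `stage` pairs each reader with its own sorter, and `job` then fans all three sorters into a single merge vertex.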
Dryad Graph
  Communication Channel
    File I/O By Default
    TCP
    Shared Memory
      Pitfall: Connected Vertices Must Be In The Same Process
  Deadlock Avoidance
    DAG Architecture
Program Development
  Vertex Program Development
    C++ Base Classes
    Status And Errors Reported to Job Manager
    Standard “Main” Method
    Channel Readers/Writers Supplied Via Argument List
  Legacy Programs
    C++ Wrapper
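The division of labour on this slide (a base class that handles reporting, and a user-written "main" that receives channel readers and writers as arguments) can be sketched as follows. Everything here is a hypothetical Python stand-in: `Channel` is an in-memory queue rather than a file, TCP, or shared-memory channel, and the `report` callback stands in for status messages sent to the job manager.

```python
import queue


class Channel:
    """In-memory stand-in for a file, TCP, or shared-memory channel."""

    _END = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._q = queue.Queue()

    def write(self, item):
        self._q.put(item)

    def close(self):
        self._q.put(self._END)

    def read_all(self):
        while True:
            item = self._q.get()
            if item is self._END:
                return
            yield item


class VertexBase:
    """Base class: runs the user's main() and reports status or errors."""

    def __init__(self, report):
        self.report = report  # hypothetical job-manager callback

    def main(self, readers, writers):
        raise NotImplementedError

    def execute(self, readers, writers):
        try:
            self.main(readers, writers)
            self.report("ok", None)
        except Exception as exc:
            self.report("error", repr(exc))
        finally:
            for writer in writers:
                writer.close()


class Upper(VertexBase):
    """User code: plain sequential logic over its readers and writers."""

    def main(self, readers, writers):
        for item in readers[0].read_all():
            writers[0].write(item.upper())


inp, out = Channel(), Channel()
inp.write("a")
inp.write("b")
inp.close()

status = []
Upper(lambda state, detail: status.append((state, detail))).execute([inp], [out])
result = list(out.read_all())
```

The user-written `main` contains no networking or error-reporting code, which is the sense in which the building blocks stay sequential.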
Program Development
  Pipelined Execution
    Assuming Sequential Code
    Event Based Programming
    Channels Are Asynchronous
    Thread Pool Optimized For Vertices
Program Execution
  Job Manager
    Job Ends If JM Machine Fails
    Different Schemes Possible To Avoid This
    Versioning System For Execution Instances
  Vertex Execution
    Starts When All Input Channels Ready
    User Can Specify Execution Machine
    Can Be Re-Run On Failures
  Job Ends After All Vertices Have Run
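The vertex-execution rules on this slide (start once every input is ready, re-run on failure, finish when all vertices have run) can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not the job manager's actual algorithm: `run_job`, the `max_attempts` limit, and the per-vertex attempt counter (a loose analogue of versioned execution instances) are all hypothetical.

```python
def run_job(vertices, edges, max_attempts=3):
    """Toy scheduler for a DAG of vertex functions.

    vertices: dict mapping a name to a function that takes a dict of
              {predecessor name: predecessor output}
    edges:    list of (producer, consumer) name pairs
    """
    preds = {v: set() for v in vertices}
    for src, dst in edges:
        preds[dst].add(src)

    done = {}                             # vertex name -> output
    attempts = {v: 0 for v in vertices}   # execution attempts per vertex

    # The job ends after all vertices have run successfully.
    while len(done) < len(vertices):
        # A vertex is ready when all of its inputs have been produced.
        ready = [v for v in vertices
                 if v not in done and preds[v].issubset(done)]
        if not ready:
            raise RuntimeError("no runnable vertex: the graph has a cycle")
        for v in ready:
            attempts[v] += 1
            try:
                done[v] = vertices[v]({p: done[p] for p in preds[v]})
            except Exception:
                if attempts[v] >= max_attempts:
                    raise
                # Otherwise v stays unfinished and is re-run on the next pass.
    return done, attempts


calls = {"n": 0}


def flaky(inputs):
    """Fails on its first execution to simulate a machine failure."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("simulated machine failure")
    return inputs["load"] * 2


job_vertices = {"load": lambda inputs: 21, "double": flaky}
done, attempts = run_job(job_vertices, [("load", "double")])
```

In this run `double` fails once and succeeds on its second attempt, while `load` is not re-executed because its output is still available.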
Program Execution
  Fault Tolerance
    Re-Run Vertex If Failed
    Channel Re-Creation (File Recreation)
    TCP/Shared Memory Failures Cause Failures On All Connected Vertices
    Staged Execution Allows Intermediate Error Checking
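One way to read the fault-tolerance rules above: a file channel keeps its data on disk, so a failure stops at that edge, whereas TCP and shared-memory channels tie their endpoints together. The Python sketch below computes which vertices would need to be re-run after a failure. It assumes that "connected" extends transitively along TCP and shared-memory edges; that assumption, the function name, and the edge tuple format are all made up for illustration.

```python
from collections import deque


def vertices_to_rerun(failed, edges):
    """Return the set of vertices that must be re-run when `failed` fails.

    edges: list of (src, dst, kind) with kind in {"file", "tcp", "shm"}
    """
    # Only non-persistent channels propagate a failure, in both directions.
    neighbours = {}
    for src, dst, kind in edges:
        if kind in ("tcp", "shm"):
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)

    # Breadth-first search over the TCP/shared-memory edges.
    rerun = {failed}
    pending = deque([failed])
    while pending:
        vertex = pending.popleft()
        for other in neighbours.get(vertex, ()):
            if other not in rerun:
                rerun.add(other)
                pending.append(other)
    return rerun


pipeline = [
    ("read", "parse", "file"),
    ("parse", "filter", "tcp"),
    ("filter", "aggregate", "shm"),
    ("aggregate", "write", "file"),
]

after_filter_failure = vertices_to_rerun("filter", pipeline)
after_read_failure = vertices_to_rerun("read", pipeline)
```

In this example a failure in `filter` drags in `parse` and `aggregate`, while a failure in `read` affects only `read` because its sole channel is a file.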
Experimental Results
  SQL Operation
    10 Computer Cluster
    Gigabit Connections
  Data Mining Operation
    1800 Computer Cluster
    10 TB Data Set
    11 Minute Execution Time
Future Work
  Scripting Language
    Nebula
    Additional Abstraction
  SSIS Integration
    SQL Server Integration Services
  Distributed SQL Queries
    Query Optimizer