The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador 💛 Elephant, Thanks to Hamster
Milind Bhandarkar Chief Scientist, Pivotal Software, Inc.
About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)
Hamster
• Hadoop and MPI on the same cluster
• Runtime for OpenMPI applications on YARN
• Available on Pivotal HD
Why MPI?
• Hadoop dataflow paradigms (MapReduce, Tez, etc.) are not suitable for iterative applications
• Message Passing Interface (MPI)
• Mature standard
• Used extensively in HPC
• Huge ecosystem
MPI in Science & Engineering
• Earth, Atmosphere, Chemistry, Biology, Math, Nuclear
MPI in Industry
• Mechanical, Car, Finance/Banking, Oil Exploration, Cryptography, Spacecraft
OpenMPI
• Mature Open Source implementation of MPI 3.0 Standard (mpi-forum.org)
• New BSD license
• 30+ contributing organizations from academia, research and industry
• http://open-mpi.org
OpenMPI Architecture
Pluggable
Hamster Design
• YARN as Resource Manager
• Hamster Application Manager
• Manages MPI jobs
• (tries to) Implement Gang-Scheduling
• Leverages OMPI/ORTE strengths
• Wire-up, Task monitoring, Fast Interconnect
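The gang-scheduling goal listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hamster's code: the `GangAllocator` class, its method name, and the container IDs are all hypothetical. It models only the all-or-nothing rule, where no MPI process is launched until every requested container has been granted.

```python
from dataclasses import dataclass, field


@dataclass
class GangAllocator:
    """Hypothetical all-or-nothing allocator: hold containers until the gang is complete."""
    needed: int
    held: list = field(default_factory=list)

    def on_containers_granted(self, containers):
        """Record newly granted containers; return the launch set once the gang is full."""
        self.held.extend(containers)
        if len(self.held) < self.needed:
            # Keep waiting: a partially launched MPI job cannot complete wire-up.
            return None
        # Launch exactly `needed` containers; extras remain in `held` for the caller to release.
        launch, self.held = self.held[:self.needed], self.held[self.needed:]
        return launch


allocator = GangAllocator(needed=4)
first = allocator.on_containers_granted(["c1", "c2"])
second = allocator.on_containers_granted(["c3", "c4", "c5"])
print(first)   # None
print(second)  # ['c1', 'c2', 'c3', 'c4']
```

The first grant returns nothing because only two of four containers are available; the second grant completes the gang and leaves one surplus container held.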
Hamster Architecture
[Diagram: components shown are the Client; the Resource Manager (Scheduler, AMService); several Node Managers, each with Aux Services, a Framework Daemon (NS), and Proc/Containers; and the MPI AM (MPI Scheduler, HNP). Labeled links: Client-RM, RM-AM, AM-NM, RM-NodeManager.]
Hamster AppMaster
• Master daemon for MPI (similar to JobTracker in MapReduce)
• Implements and participates in the YARN-RM App lifecycle protocol
• Maintains heartbeat with RM to ensure liveness
• MPI Scheduler - Negotiates resource allocation with YARN-RM
• Head Node Process (HNP) - manages job execution
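The heartbeat-based liveness check mentioned above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `LivenessMonitor` class, the 10-second timeout, and the application ID are hypothetical, and the sketch does not reproduce the actual YARN protocol. It shows only the general idea that the RM treats an AppMaster as live while heartbeats keep arriving within a timeout.

```python
import time


class LivenessMonitor:
    """Hypothetical RM-side tracker: an AppMaster is live if it sent a heartbeat recently."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the example can use a fake clock
        self.last_seen = {}

    def heartbeat(self, app_id):
        """Record the time of the latest heartbeat from this AppMaster."""
        self.last_seen[app_id] = self.clock()

    def is_live(self, app_id):
        """True if a heartbeat was recorded within the last `timeout_s` seconds."""
        seen = self.last_seen.get(app_id)
        return seen is not None and self.clock() - seen <= self.timeout_s


# Drive the monitor with a fake clock instead of sleeping.
now = [0.0]
monitor = LivenessMonitor(timeout_s=10.0, clock=lambda: now[0])
monitor.heartbeat("mpi-job-1")
now[0] = 5.0
live_at_5 = monitor.is_live("mpi-job-1")    # within the timeout
now[0] = 20.0
live_at_20 = monitor.is_live("mpi-job-1")   # heartbeat is now stale
print(live_at_5, live_at_20)  # True False
```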
Hamster Node Service
• User-level daemon per MPI job
• Manages task execution
• Coarse-grained container management
• Bootstrapped by YARN-NM
• Implemented as YARN Auxiliary Service
Why GraphLab on Hadoop ?
• Graph analytics and machine learning are only one stage in an end-to-end (E2E) data pipeline
• ETL/Preprocessing
• Building Graphs from fact & dimension tables
• Publishing analytics results, post-processing
GraphLab 2.2
• Communication patterns based on Data
• Several Toolkits (Graph Analytics + ML Algorithms) available
• Graph-Programming API
• Uses MPI for communication
Pivotal HD
[Diagram: Pivotal HD Enterprise stack, with a legend distinguishing Apache and Pivotal components]
• Hadoop components: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper, Oozie
• Command Center: Configure, Deploy, Monitor, Manage
• HAWQ, Advanced Database Services (ANSI SQL + Analytics): Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining
• GemFire XD, Real-Time Database Services (ANSI SQL + In-Memory): Distributed In-memory Store, Query Transactions, Ingestion Processing, Hadoop Driver (Parallel with Compaction)
• Also shown: Spring XD, Spring, Virtual Extensions, MADlib Algorithms, GraphLab, Open MPI
Performance
Test Environment
• Pivotal Analytics Workbench Cluster
• Pivotal HD 1.1 (Apache Hadoop 2.0.5)
• Hamster - 1.0, OpenMPI-1.7.2
• 515 nodes
• 2x6-core Westmere, 48GB RAM, 12x2TB SATA, Mellanox FDR Infiniband
Null Job
• Measures overhead of launching MPI jobs
• Tests scalability of resource allocation, launching and wire-up
• Sub-linear scalability (slightly worse than O(log N))
• Overhead of launching 15000 processes = 1 minute
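The launch overhead quoted above can be put in per-process terms with a little arithmetic. The sketch below uses only the two figures from the slide (15000 processes, 1 minute). The binary fan-out depth is an assumption added to give a feel for what O(log N) means at this scale; the slides do not say how launch and wire-up are structured.

```python
import math

processes = 15000      # from the slide
overhead_s = 60.0      # "1 minute" from the slide

# Amortized launch overhead per process, in milliseconds.
per_process_ms = overhead_s / processes * 1000

# Assumption for illustration only: depth of a binary fan-out tree over N processes.
tree_depth = math.ceil(math.log2(processes))

print(f"{per_process_ms:.1f} ms per process")  # 4.0 ms per process
print(tree_depth)                              # 14
```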
Total Runtime
[Chart: end-to-end (E2E) time in seconds (y-axis, 5 to 60) versus number of processes (x-axis, 0 to 16000)]
Allocation Time
[Chart: allocation time in seconds (y-axis, 1 to 6) versus number of processes (x-axis, 0 to 16000)]
Launch Time
[Chart: launch time in seconds (y-axis, 0 to 30) versus number of processes (x-axis, 0 to 16000)]
Comparison with OpenMPI
• HPL (High-Performance Linpack, the Top500 benchmark)
• Number of processes: 50 to 1000
• Hamster is 1% slower than OpenMPI
HPL - Hamster vs OpenMPI
[Chart: time in seconds (y-axis, 0 to 120) for runs with 1000, 500, 200, and 50 processes]
GraphLab ALS
• Wikipedia dataset
• 4.3M terms, 3.3M documents, 513M occurrences
• 17 Processes
• 5 Iterations
GraphLab ALS
[Chart: time in seconds (y-axis, 0 to 1340), Hamster vs. OpenMPI]
GraphLab PageRank
• Twitter dataset
• 4.1M nodes, 1.4B edges
• Data size: 26 GB
• NP = 17
• 50 iterations: 297 seconds
• 100 iterations: 339 seconds
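The two PageRank timings above allow a rough split between fixed cost and per-iteration cost. The sketch below assumes runtime is linear in the number of iterations (a fixed startup and data-loading cost plus a constant cost per iteration). That model is an assumption, not something the slides state, and with only two data points the result is an estimate rather than a measurement. The helper name `fit_line` is hypothetical.

```python
def fit_line(x1, y1, x2, y2):
    """Return (slope, intercept) of the line through two points."""
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# Data points from the slide: (iterations, seconds).
per_iter_s, fixed_s = fit_line(50, 297, 100, 339)

print(f"{per_iter_s:.2f} s per iteration")  # 0.84 s per iteration
print(f"{fixed_s:.0f} s fixed cost")        # 255 s fixed cost
```

Under this assumption, most of the measured time is fixed cost such as job launch and loading the 26 GB graph, while each additional iteration adds less than a second.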
Questions?