Flintrock: A Faster, Better spark-ec2 by Nicholas Chammas
Flintrock: A faster, better spark-ec2
Nicholas Chammas, Spark Summit East 2016
1 / 26
Motivation
Common developer problem:
  Give me a working cluster
  Don't bother me too much with the details
  Make it quick
2 / 26
spark-ec2
Single-purpose command-line tool
Launch and manage Spark clusters on EC2
Common use cases:
  Prototyping
  Spark performance testing (spark-perf)
4 / 26
spark-ec2
Problems:
  Slow launch times: ~9 minutes to launch 2-node cluster
  Poor UX
    e.g. Having to type this out over and over again...
      ./spark-ec2 launch my-cluster \
        --identity-file ~/.ssh/identity.pem \
        --key-pair my-key \
        --instance-type m3.medium \
        --region us-east-1
  Internals difficult to refactor
  Much time has already been spent trying to make spark-ec2 faster:
    SPARK-4325
    SPARK-5189
  spark-ec2 was created as a convenience side-tool
  Not originally intended to stand as its own project
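
The launch command above repeats the same options on every invocation. One common workaround, sketched here as an illustration and not as part of spark-ec2 itself, is a small shell function that keeps the options in one place. The function name `launch_cluster` and the `SPARK_EC2` variable are hypothetical; the option values are the ones from the slide.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of spark-ec2.
# SPARK_EC2 is the command to invoke; set it to "echo ./spark-ec2" for a dry run.
# It is expanded unquoted on purpose, so this relies on sh/bash word splitting.
SPARK_EC2="${SPARK_EC2:-./spark-ec2}"

launch_cluster() {
    cluster_name="$1"
    # Options copied from the slide; only the cluster name varies per call.
    $SPARK_EC2 launch "$cluster_name" \
        --identity-file "$HOME/.ssh/identity.pem" \
        --key-pair my-key \
        --instance-type m3.medium \
        --region us-east-1
}
```

After sourcing the file, `launch_cluster my-cluster` stands in for the full command. This only hides the repetition in one shell session's setup; it does nothing about launch time or the internals.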
8 / 26
Why a new tool?
Plenty of great options out there:
  Databricks
  Spark on EMR
  Apache Bigtop, Ubuntu Juju, Terraform, Ansible, etc.
It's fun (for me to build)
Perhaps you don't want a framework
  You want a single-purpose tool
Perhaps you don't want to be tied to something proprietary
12 / 26
Flintrock
Features
Obsessive focus on speed
  e.g. Launching a cluster with 100 slaves
    spark-ec2 takes more than 1 hour
    Flintrock can do it in under 5 minutes
Empathy for user
  Persist your configuration to a file. Then, all you need to launch a cluster is:
    flintrock launch test-cluster
Accessibility
  Install via pip - Python 3.5+ required
    pip install flintrock
  Standalone packages - Python not required!
    https://github.com/nchammas/flintrock/releases
16 / 26
Flintrock
Commands: launch, login, describe, stop, start, destroy, run-command, copy-file
Examples:
  run-command
    flintrock run-command cluster 'sudo yum install -y expect'
  copy-file
    flintrock copy-file cluster small-file.json /tmp/
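
The two example commands can be chained into a simple provisioning step. The sketch below uses only the invocation shapes shown on the slide; the function name `provision` and the `FLINTROCK` variable are hypothetical, and running it for real assumes `flintrock` is installed and a cluster with the given name is already up.

```shell
#!/bin/sh
# Hypothetical helper built only from the copy-file and run-command examples.
# FLINTROCK is the command to invoke; set it to "echo flintrock" for a dry run.
# It is expanded unquoted on purpose, so this relies on sh/bash word splitting.
FLINTROCK="${FLINTROCK:-flintrock}"

provision() {
    cluster="$1"
    # Copy the local file to /tmp/ on the cluster; stop if the copy fails.
    $FLINTROCK copy-file "$cluster" small-file.json /tmp/ || return 1
    # Then run the install command on the cluster.
    $FLINTROCK run-command "$cluster" 'sudo yum install -y expect'
}
```

After sourcing the file, `provision cluster` runs both steps in order and skips the install if the copy fails.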
17 / 26
Flintrock
Warning: Flintrock is not a drop-in replacement for spark-ec2
  Scope is more limited
  Commands and options are clearly different
Beta-phase of development -- currently at version 0.3
Already used to launch 200+ node clusters
Support for OS X and Linux
  Windows support possible in the future
Architecture supports multiple providers
  Perhaps support for Google Compute Engine will be added later this year
100% open source; Apache 2.0 licensed
Not company-backed
Contribute!
  We have unit and acceptance tests
  Development done entirely on GitHub
24 / 26
Nicholas Chammas
https://github.com/nchammas/flintrock
Slideshow created using remark / gistdeck.
26 / 26