MIRANTIS 2013
The State of OpenStack Data Processing: Sahara, Now and in Juno
Sergey Lukjanov (Mirantis)
Matthew Farrellee (Red Hat)
John Speidel (Hortonworks)
OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing stack and associated management interfaces.
- Provision and operate Hadoop clusters
- Schedule and operate Hadoop jobs
Hadoop - Big Data Platform
http://hortonworks.com/hadoop/yarn/
Trends http://www.google.com/trends/
Use cases
- Self-service provisioning of Hadoop clusters
- Utilization of unused compute capacity for bursty workloads
- Dev -> Stage -> Prod lifecycle
- Run Hadoop workloads in a few clicks without expertise in Hadoop ops
Architecture overview
[Architecture diagram. Component labels shown: Horizon, Savanna Pages, Savanna Python Client, REST API, Auth, Keystone, Data Access Layer, Cluster Configuration Manager, Resources Orchestration Manager, Vendor Plugins, Job Manager, Job Sources, Data Sources, Swift, Hadoop VMs, Heat, Nova, Glance, Cinder, Neutron, Trove, DB]
Sahara status
- Official integrated OpenStack project
- Supported Hadoop distros:
  - Vanilla Apache Hadoop
  - Hortonworks Data Platform
  - Intel Distribution
  - Cloudera Distribution (in blueprint)
- Included in OpenStack distros:
  - RDO - openstack.redhat.com
  - Mirantis OpenStack - software.mirantis.com
Icehouse release
- Sahara easily deployed with DevStack
- Hadoop 2 available via all plugins (http://hortonworks.com/hadoop/yarn/)
- HBase (and Sqoop) available via HDP plugin
- Spark images w/ diskimage-builder (full plugin in review)
- Heat for provisioning
- i18n translation started
- Neutron namespaces w/ rootwrap
- Guest agent implementation started
Elastic Data Processing update
Elastic Data Processing (EDP) is Sahara's take on data processing workflow management.
Goal: let end users (those w/ high-value questions to answer) get answers about data without having to know a single thing about cluster management.
Customers launch millions of Amazon EMR clusters every year. (http://aws.amazon.com/elasticmapreduce/)
- Available with the Hortonworks Data Platform plugin
- Support for external HDFS
- MapReduce.Streaming and Java actions
- Job relaunch, with new data and parameters
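
The "job relaunch, with new data and parameters" item can be pictured with a small data model. This is only an illustrative sketch, not Sahara's implementation: the class names (`DataSource`, `JobRun`), the `relaunch` helper, and the example URLs are all hypothetical, chosen to mirror the terms used on the slides.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a job input or output location."""
    name: str
    url: str  # e.g. a Swift object path or an external HDFS path


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one execution of a stored job template."""
    job_template: str            # name of the job template being run
    job_type: str                # e.g. "MapReduce.Streaming" or "Java"
    input: Optional[DataSource]
    output: Optional[DataSource]
    params: Dict[str, str] = field(default_factory=dict)


def relaunch(run: JobRun, *, input: Optional[DataSource] = None,
             output: Optional[DataSource] = None,
             params: Optional[Dict[str, str]] = None) -> JobRun:
    """Return a new run of the same job template with new data and/or parameters.

    Anything not overridden is carried over from the earlier run.
    """
    merged = {**run.params, **(params or {})}
    return replace(run,
                   input=input or run.input,
                   output=output or run.output,
                   params=merged)


# Example values below are made up for illustration.
first = JobRun(
    job_template="wordcount",
    job_type="MapReduce.Streaming",
    input=DataSource("logs-monday", "swift://example-container/monday"),
    output=DataSource("counts-monday", "hdfs://example-namenode:8020/out/monday"),
    params={"mapper": "cat"},
)
second = relaunch(
    first,
    input=DataSource("logs-tuesday", "swift://example-container/tuesday"),
    params={"reducer": "wc"},
)
```

The point of the sketch is that the job template stays fixed while the data sources and parameters vary per run, which is what lets an end user rerun an analysis without touching cluster configuration.
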
Command line interface overview
- If you can do it with the Dashboard, you can do it from the command line
- Blueprint: python-savannaclient-cli
Command line interface overview: Image management
$ sahara ...
Positional arguments:
  image-add-tag       Add a tag to an image.
  image-list          Print a list of available images.
  image-register      Register an image from the Image index.
  image-remove-tag    Remove a tag from an image.
  image-show          Show details of an image.
  image-unregister    Unregister an image.
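
One way to see why images carry tags is to treat tags as a selection filter. The sketch below is a hypothetical illustration only: the `images_with_tags` helper, the image names, and the tag values are assumptions made for the example and are not part of the `sahara` CLI or API.

```python
from typing import Dict, Iterable, List, Set


def images_with_tags(images: Dict[str, Set[str]],
                     required: Iterable[str]) -> List[str]:
    """Return the names of images that carry every required tag, sorted."""
    wanted = set(required)
    return sorted(name for name, tags in images.items() if wanted <= tags)


# Hypothetical registry: image name -> set of tags attached to it.
registered = {
    "centos-hdp-1.3.2": {"hdp", "1.3.2"},
    "centos-hdp-2.0.6": {"hdp", "2.0.6"},
    "centos-plain": set(),
}

usable = images_with_tags(registered, ["hdp", "2.0.6"])
```

Under this assumption, adding or removing a tag changes which images a given plugin and version combination can pick up, without re-registering the image.
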
Command line interface overview: Node group, cluster and job templates
$ sahara
  node-group-template-create    Create a node group...
  node-group-template-delete    Delete a node group...
  node-group-template-list      Print a list of available...
  node-group-template-show      Show details of a node...
  cluster-template-create       Create a cluster template.
  cluster-template-delete       Delete a cluster template.
  cluster-template-list         Print a list of available...
  cluster-template-show         Show details of a cluster...
  job-template-create           Create a job template.
  job-template-delete           Delete a job template.
  job-template-list             Print a list of job...
  job-template-show             Show details of a job...
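
The relationship between node group templates and cluster templates can be sketched as plain composition: a cluster template names a plugin and version and lists node groups with instance counts. The classes, field names, and example values below are assumptions for illustration and are not taken from Sahara's code; the process names come from the HDP 1.3.2 component list later in this deck.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeGroupTemplate:
    """Hypothetical node group: one VM shape plus the processes it runs."""
    name: str
    flavor: str
    node_processes: Tuple[str, ...]


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster template: node groups paired with instance counts."""
    name: str
    plugin: str
    hadoop_version: str
    node_groups: Tuple[Tuple[NodeGroupTemplate, int], ...]

    def instance_count(self) -> int:
        """Total number of VMs the template would provision."""
        return sum(count for _, count in self.node_groups)

    def processes(self) -> Dict[str, int]:
        """Number of instances running each process across the cluster."""
        totals: Dict[str, int] = {}
        for group, count in self.node_groups:
            for proc in group.node_processes:
                totals[proc] = totals.get(proc, 0) + count
        return totals


master = NodeGroupTemplate("master", "m1.large", ("NameNode", "Job Tracker"))
worker = NodeGroupTemplate("worker", "m1.medium", ("DataNode", "Task Tracker"))
template = ClusterTemplate("small-hdp", "hdp", "1.3.2",
                           ((master, 1), (worker, 3)))
```

In this model, scaling a cluster amounts to changing the count attached to a node group, while the node group template itself is reused unchanged.
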
Command line interface overview: Data sources and job binaries
$ sahara ...
  data-source-create    Create a data source that provides job input or receives job output.
  data-source-delete    Delete a data source.
  data-source-list      Print a list of available data...
  data-source-show      Show details of a data source.
  job-binary-create     Record a job binary.
  job-binary-delete     Delete a job binary.
  job-binary-list       Print a list of job binaries.
  job-binary-show       Show details of a job binary.
Command line interface overview: Clusters and jobs
$ sahara ...
  cluster-create    Create a cluster.
  cluster-delete    Delete a cluster.
  cluster-list      Print a list of available clusters.
  cluster-show      Show details of a cluster.
  job-create
  job-delete        Delete a job.
  job-list          Print a list of jobs.
  job-show          Show details of a job.
HDP Plugin Overview
- Full support for all Sahara functionality:
  - Nova and Neutron network
  - Cluster Scaling (Scale Up)
  - Swift Integration
  - Cinder Support
  - Data Locality
  - EDP
- Apache Ambari REST APIs used for cluster provisioning
- Monitoring/Management of clusters via Ambari
- Full support for multiple HDP stacks
- HDP pre-installed or generic VM images
HDP Plugin Stack Support
- HDP 1.3.2 (Available): NameNode, Secondary NameNode, DataNode, HDFS, ZooKeeper, Ambari Server/Agent, HCatalog, Sqoop, Job Tracker, Task Tracker, MapReduce, Hive, MySQL, Pig, WebHCat Server, Oozie, Ganglia, Nagios, HBase
- HDP 2.0.6 (Available): History Server, MapReduce 2 / YARN, Resource Manager, YARN Client
- HDP 2.1 (Coming Soon!): Storm, Falcon
- Roadmap: HDP 2.1 + SOLR, Cascading
HDP Disk Images
- Disk Image Builder offers a consistent approach for image creation
- HDP Plugin provides images and scripts for (CentOS, RHEL):
  - Plain
  - 1.3.2
  - 2.0.6
  - 2.1 (coming soon)
- Pre-packaged images (1.3.2, 2.0.6) provide images with HDP packages pre-installed for accelerated provisioning and reduced network traffic
- Image build scripts allow images to be customized:
  - Security
  - Custom Packages
  - O/S Settings
Ambari Blueprints
- Two primary goals of Ambari Blueprints:
  - Ability to export a complete description of a running cluster
  - Provide API-based cluster installations based on a self-contained cluster description
- Blueprints contain cluster topology and configuration information
- Enables interesting use cases between physical and virtual, including OpenStack/Sahara
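
To make "cluster topology and configuration information" concrete, here is a rough sketch of a blueprint-shaped document built in Python. The key names follow the general layout of Ambari blueprint JSON as commonly documented (`Blueprints`, `host_groups`, `components`, `cardinality`, `configurations`), but the blueprint name, group names, and component choices are made up, and the `topology` helper is hypothetical; check the Ambari documentation before relying on any field.

```python
import json
from typing import Any, Dict, List

# Illustrative blueprint-shaped document; values are assumptions, not an
# export from a real cluster.
blueprint: Dict[str, Any] = {
    "Blueprints": {
        "blueprint_name": "example-blueprint",
        "stack_name": "HDP",
        "stack_version": "2.0.6",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
    "configurations": [],  # configuration overrides would be listed here
}


def topology(bp: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each host group name to the sorted component names it hosts."""
    return {
        group["name"]: sorted(c["name"] for c in group["components"])
        for group in bp["host_groups"]
    }


# Round-trip through JSON to show the description is self-contained text.
exported = json.dumps(blueprint, indent=2)
restored = json.loads(exported)
layout = topology(restored)
```

Because the description is a single self-contained document, the same text could in principle describe a physical cluster or a set of VMs, which is the physical-to-virtual use case the slide alludes to.
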
Juno roadmap
- Further integration with OpenStack ecosystem
- Distributed architecture
- Guest agents
- EDP enhancements
- Merge dashboard into Horizon
To be discussed and confirmed at Design Summit