Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform


description

A long time ago in a galaxy far, far away, only the chosen few could deploy and operate a fully functional Hadoop cluster. Vendors took pride in rationalizing this experience for their customers by creating various distributions that included Apache Hadoop. It all changed when Cloudera decided to support Apache Bigtop as the first 100% community-driven bigdata management distribution of Apache Hadoop. Today, most major commercial distributions of Apache Hadoop are based on Bigtop. Bigtop has won the Hadoop distribution war and offers a superset of packaged components. In this talk we will focus on practical advice on how to deploy and start operating a Hadoop cluster using Bigtop’s packages and deployment code. We will dive into the details of using the Hadoop ecosystem packages provided by Bigtop and of building data management pipelines in support of your enterprise applications.

Transcript of Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Page 1: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Apache Bigtop: a crash course in deploying a Hadoop platform

Page 2: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

“3 + 7 + 9 + 1”
The complexity of the stack

Page 3: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Hadoop ecosystem
• Many components
• Gazillions of versions
• Lots of patches if you like it hot and dirty

Page 4: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Presenters
• Dr. Konstantin Boudnik, Roman Shaposhnik
• Initial co-inventors of Apache Bigtop
• Active contributors, committers, and PMC members on multiple Apache TLPs in the Hadoop ecosystem
• Expertise in
  • Compilers
  • Operating systems, virtual machines, JVM
  • Distributed systems
  • Complex software stacks

Page 5: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Page 6: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Hadoop ≠ HDFS + MR
• Today's Hadoop is more than storage + MR
• Baby elephant outgrew his cradle
  • HBase
  • SQL frontends
  • In-memory processing
  • Storage caching
  • Connectors
  • DSL languages
  • <your name is here>

Page 7: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Complexity in the extreme

Page 8: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

“One to bring them all”
Bigtop is a simple answer

Page 9: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Bigtop stack: take it & go
• Modify a stack BOM
• Build
• Deploy
• Configure w/ Puppet (included)
• Test (scenarios are provided / easy to add)
• Grab an appliance if short of hardware
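The first step, modifying the stack BOM (bill of materials), amounts to pinning a version for each component. The sketch below models that idea in plain Python. It is not Bigtop's actual BOM format: the component names and version numbers are illustrative placeholders, and `override_versions` is a hypothetical helper.

```python
# Hypothetical model of a stack BOM: component name -> pinned version.
# Names and versions are placeholders, not taken from a Bigtop release.
BASE_BOM = {
    "hadoop": "2.0.0",
    "hbase": "0.94.0",
    "pig": "0.11.0",
}


def override_versions(bom, overrides):
    """Return a new BOM with some component versions replaced.

    Unknown components are rejected so that a typo does not silently
    add a component that nobody builds.
    """
    unknown = sorted(set(overrides) - set(bom))
    if unknown:
        raise KeyError("not in BOM: " + ", ".join(unknown))
    new_bom = dict(bom)        # leave the base BOM untouched
    new_bom.update(overrides)
    return new_bom
```

For example, `override_versions(BASE_BOM, {"hbase": "0.94.1"})` returns a copy in which only the HBase pin differs, which is the kind of small, reviewable change the slide has in mind.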

Page 10: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Rinse and repeat

Page 11: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Field case studies:
Pivotal
WANdisco

Page 12: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

From 0 to full stack in 28 days:
WANdisco case study

Page 13: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

New distro in 4 weeks
• Major challenges for a new player:
  • Need to have a stable development platform
  • Offering Apache Hadoop certified binaries
  • Team with no prior expertise in the field
  • Very complex Hadoop ecosystem landscape

Page 14: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Bigtop to help
• Define component versions in the BOM
• Make component changes if needed
• Run Bigtop build
• Deploy the cluster w/ provided Puppet
• Test the cluster with integration suite
• Rinse and repeat as needed
• Seamless integration into CI
• Easy provisioning and incremental updates
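The build, deploy, test cycle above can be sketched as a loop that stops at the first failing stage, so that a fix can be made and the cycle rerun. This is a minimal illustration under stated assumptions: the stages are stand-in callables rather than Bigtop commands, and `run_cycle` and `rinse_and_repeat` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_cycle(stages: List[Stage]) -> Optional[str]:
    """Run stages in order and return the name of the first failing
    stage, or None if every stage succeeded."""
    for name, action in stages:
        if not action():
            return name
    return None


def rinse_and_repeat(stages: List[Stage],
                     fix: Callable[[str], None],
                     max_rounds: int = 5) -> Optional[int]:
    """Rerun the whole cycle until it passes or max_rounds is reached.

    `fix` is called with the failing stage name between rounds.
    Returns the number of rounds used, or None if it never passed.
    """
    for round_no in range(1, max_rounds + 1):
        failed = run_cycle(stages)
        if failed is None:
            return round_no
        fix(failed)
    return None
```

Because each round reruns every stage from the start, the same loop is what a CI job would execute on each change to the BOM or to a component.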

Page 15: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Page 16: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Page 17: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Page 18: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Page 19: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Hadoop in the cloud
Pivotal case study

Page 20: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Hadoop @Pivotal: history
• Started at Greenplum (GPHD)
  • A stand-alone Hadoop distribution
  • Based on a fork of Bigtop 0.3-incubating
  • Hadoop 1.0.1 based ecosystem
  • No community interaction
• Graduated as a PHD at Pivotal
  • A PivotalONE vision
  • Based on a fork of Bigtop 0.4
  • Hadoop 2.x based ecosystem
  • Integrated with HAWQ

Page 21: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Lessons for Pivotal
• Think of Bigtop as “Fedora”
• Become part of the community
• Say “no” to forking
• Work on custom requirements upstream
• Participate in release planning
• Leverage Bigtop's infrastructure internally

Page 22: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Lessons for Bigtop
• Promote common build and RE infrastructure
  • “Codifying” it
• Avoid broken windows syndrome
  • Bigtop releases don't patch, but vendors do
• Make tests easier to use
• Invest in documentation
  • Whitepapers, demos, etc.

Page 23: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Road ahead

Page 24: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Packaging/deployment
• Deployment environments:
  • vanilla servers == packages ?
  • vanilla VMs/containers == JeOS/baking ?
  • specialized VMs == OSv
• Baking vs frying
• Rolling upgrades
  • side-by-side install
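The slides do not say how a side-by-side install would be implemented. One common approach, assumed here purely for illustration, is to unpack each version into its own directory and switch a symbolic link between them, so that an upgrade or a rollback is a single rename. The function names and the `<component>-<version>` directory layout below are hypothetical, and the sketch targets POSIX systems.

```python
import os


def install_side_by_side(root, component, version):
    """Create <root>/<component>-<version>/ without touching other versions."""
    path = os.path.join(root, component + "-" + version)
    os.makedirs(path, exist_ok=True)
    return path


def activate(root, component, version):
    """Repoint the <root>/<component> symlink at the chosen version.

    The new link is created under a temporary name and then renamed
    over the old one, so readers never see a missing link.
    """
    target = component + "-" + version
    if not os.path.isdir(os.path.join(root, target)):
        raise FileNotFoundError(target)
    link = os.path.join(root, component)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)   # relative link, resolved inside root
    os.replace(tmp, link)     # atomic rename on POSIX
```

With both versions left on disk, a rolling upgrade can move nodes to the new version one at a time, and rolling back a node is another call to `activate` with the old version.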

Page 25: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Validation
• Investing in iTest
  • Easier test management and execution
  • Cluster discovery and management framework
  • More test cases
• Lowering entry-level barriers
  • Less rigid user-facing interfaces: Gradle
  • Mix-n-match built-in integration tests
• Becoming a TCK for the Hadoop ecosystem
• Engaging more in trunk testing

Page 26: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Growing the ecosystem
• 1st Hadoop distribution including Apache Spark
• But there's more:
  • GridGain
  • HBase indexer
  • Lipstick (not for Pig)
  • Ambrose
  • Launch it all to the Stratosphere?

Page 27: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Growth in last 6 months
• Apache Spark: in-memory analytics
• Phoenix: HBase SQL frontend
• Groovy
• Unification of user-facing interfaces
  • Gradle build system
• Received proposal to include GridGain in-memory platform

Page 28: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Community
• 23 committers
• Over 100 contributors
• Quarterly releases
• Hackathons and more

Page 29: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Demo (worth 1k words)

Page 31: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Q & A

Page 32: Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

Thank you
@c0sin
@rhatr