Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr....

32
Apache Bigtop: a crash course in deploying a Hadoop platform Apache Bigtop: a crash course in deploying a Hadoop platform

Transcript of Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr....

Page 1: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Apache Bigtop: a crash course in deploying a Hadoop platformApache Bigtop: a crash course in deploying a Hadoop platform

Page 2: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

“3 + 7 + 9 + 1”The complexity of the stack

“3 + 7 + 9 + 1”The complexity of the stack

Page 3: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Hadoop ecosystem• Many components• Gazillions of versions• Lot of patches if you like it hot and dirty

Page 4: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Presenters• Dr. Konstantin Boudnik, Roman Shaposhnik• Initial co-inventors of Apache Bigtop• Active contributors, committers, and PMCs on

multiple Apache TLPs in Hadoop ecosystem• Expertise in

• Compilers• Operating systems, Virtual machines, JVM• Distributed systems• Complex software stacks

Page 5: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •
Page 6: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Hadoop ≠ HDFS + MR• Today's Hadoop is more than storage + MR• Baby elephant outgrew his cradle

• Hbase• SQL frontends• In-memory processing• Storage caching• Connectors• DSL languages• <your name is here>

Page 7: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Complexity in the extreme

Page 8: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

“One to bring them all”Bigtop is a simple answer “One to bring them all”

Bigtop is a simple answer

Page 9: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Bigtop stack: take it & go• Modify a stack BOM

• Build• Deploy• Configure w/ Puppet (included)• Test (scenarios are provided / easy to add)• Grab an appliance if short of hardware

Page 10: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Rinse and repeat

Page 11: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Field case studies:Pivotal

WANdisco

Field case studies:Pivotal

WANdisco

Page 12: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

From 0 to full stack in 28 days:WANdisco case study

From 0 to full stack in 28 days:WANdisco case study

Page 13: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

New distro in 4 weeks• Major challenges for a new player:

• Need to have a stable development platform

• Offering Apache Hadoop certified binaries• Team with no prior expertise in the field• Very complex Hadoop ecosystem

landscape

Page 14: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Bigtop to help• Define component versions in the BOM• Make component changes if needed

• Run Bigtop build• Deploy the cluster w/ provided puppet• Test the cluster with integration suite• Rinse and repeat as needed

• Seamless integration into CI• Easy provisioning and incremental updates

Page 15: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •
Page 16: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •
Page 17: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •
Page 18: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •
Page 19: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Hadoop in the cloudPivotal case study

Hadoop in the cloudPivotal case study

Page 20: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Hadoop @Pivotal: history• Started at Greenplum (GPHD)

• A stand-alone Hadoop distribution• Based on a fork of Bigtop 0.3-incubating• Hadoop 1.0.1 based ecosystem• No community interaction

• Graduated as a PHD at Pivotal• A PivotalONE vision• Based on a fork of Bigtop 0.4• Hadoop 2.x based ecosystem• Integrated with HAWQ

Page 21: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Lessons for Pivotal• Think of Bigtop as “Fedora”• Become part of the community • Say “no” to forking• Work on custom requirements upstream• Participate in release planning• Leverage Bigtop's infrastructure internally

Page 22: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Lessons for Bigtop• Promote common build and RE infrastructure

• “Codifying” it• Avoid broken windows syndrome

• Bigtop releases don't patch, but vendors do• Make tests easier to use• Invest in documentation

• Whitepapers, demos, etc.

Page 23: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Road aheadRoad ahead

Page 24: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Packaging/deployment• Deployment environments:

• vanilla servers == packages ?• vanilla Vms/containers == JeOS/baking ?• specializes VMs == Osv

• Baking vs frying• Rolling upgrades

• side-by-side install

Page 25: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Validation• Investing in iTest

• Easier test management and execution• Cluster discovery and management

framework• More test cases

• Lowering entry-level barriers• Less rigid user-facing interfaces: Gradle• Mix-n-match built-in integration tests

• Becoming a TCK for Hadoop ecosystem• Engaging more into trunk testing

Page 26: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Growing the ecosystem• 1st Hadoop distribution including Apache

Spark• But there's more:

• GridGain• HBase indexer• Lipstick (not for Pig)• Ambrose• Launch it all to the Stratosphere?

Page 27: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Growth in last 6 months• Apache Spark: In-memory analytics• Phoenix: Hbase SQL frontend• Groovy• Unification of user-facing interfaces

• Gradle build system• Received proposal in include GridGain

in-memory platform

Page 28: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Community• 23 committers• Over 100 contributors• Quarterly releases• Hackathons and more

Page 29: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Demo (worth 1k words)

Page 31: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Q & AQ & A

Page 32: Apache Bigtop: a crash course in deploying a Hadoop platform · 2017-12-14 · Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop •

Thank you@c0sin@rhatr

Thank you@c0sin@rhatr