Overview SCALE14x 2016. Agenda/Schedule -Apache Bigtop Overview -Apache Spark Overview/Getting...

Post on 19-Jan-2018

Overview

SCALE14x 2016

Agenda/Schedule

-Apache Bigtop Overview
-Apache Spark Overview/Getting Started
-Lunch Break
-Apache Ignite
-Workshop, tutorial, open time

http://workshops.bigtop.rocks (click on Agenda button)

What is Bigtop?

Setting the standard for testing, packaging and integration of leading big/fast data components

and many others…

Components as Building Blocks

Dependency Hell!!

[Figure: grid of components: hdfs, zookeeper, hbase, kafka, spark, ..., mapred, oozie, hive, etc.]
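One way to picture the "Dependency Hell" problem is as a build-ordering question: each component can only be built and tested once the components it relies on are in place. The sketch below is illustrative only. The `DEPENDS_ON` edges are simplified assumptions made up for this example, not Bigtop's actual dependency data, and `build_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical, simplified dependency edges (component -> components it needs).
# These are assumptions for illustration, not Bigtop's real dependency graph.
DEPENDS_ON = {
    "hdfs": set(),
    "zookeeper": set(),
    "hbase": {"hdfs", "zookeeper"},
    "kafka": {"zookeeper"},
    "mapred": {"hdfs"},
    "spark": {"hdfs"},
    "hive": {"hdfs", "mapred"},
    "oozie": {"mapred", "hive"},
}


def build_order(depends_on):
    """Return the components so that every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # A cycle means no valid build order exists for this set of components.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDS_ON)))
```

Every component added to the stack adds edges to this graph, which is why a shared, tested integration layer becomes more valuable as the component list grows.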

Build all the Things!!!

The BOM

Build of Materials (BOM)

* List of >=1 components
* Gradle for build/actions
* Produce sets of debs/rpms
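The three BOM bullets can be sketched as a small data model: a non-empty list of components, each mapped to a Gradle task that yields a deb or rpm package. This is a minimal sketch and not Bigtop's real BOM format. `Component`, `gradle_tasks`, and the version strings are hypothetical, and the `<component>-deb` / `<component>-rpm` task naming is an assumption based on Bigtop's usual Gradle conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One BOM entry (hypothetical model, not Bigtop's actual schema)."""
    name: str
    version: str  # placeholder values only


def gradle_tasks(bom, pkg_type):
    """Map each BOM component to an assumed '<name>-<pkg_type>' Gradle task."""
    if not bom:
        raise ValueError("a BOM needs at least one component")
    if pkg_type not in ("deb", "rpm"):
        raise ValueError("pkg_type must be 'deb' or 'rpm'")
    return [f"{component.name}-{pkg_type}" for component in bom]


if __name__ == "__main__":
    # Placeholder versions, chosen only to make the example concrete.
    bom = [Component("hadoop", "x.y.z"), Component("spark", "x.y.z")]
    print("./gradlew " + " ".join(gradle_tasks(bom, "deb")))
```

Under these assumptions, a two-component BOM turns into a single Gradle invocation that produces one package set per component.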

Bigtop Origins

Yahoo!, 2010

Created, fostered early Hadoop community

Working on Hadoop 0.20 stack

2011

Yahoo!’s to Cloudera, solving early problems of packaging and maintaining the first commercially supported Hadoop distro

Early value add

Provide a common foundation for proper integration of growing number of Hadoop family components

Foundation provides solid base for validating applications running on top of the stack(s)

Provide neutral packaging and deployment/config

Early Mission Accomplished

Foundation for commercial Hadoop distros/services

Leveraged by app providers…

What now?

We are done, right?!?!

Industry/Ecosystem Evolution & New Community Needs/Ideas

Where should we spend our time? Which users should benefit?

Moving beyond out-of-the-box (OOB) MapReduce…

Lambda/Stream Architectures

HDFS + Zookeeper +

Get out from the Apache dome

New focus and target end users

Data engineers vs distro builders

Enhance Operations/Deployment

Reference implementations & tutorials

Laying new foundation with 1.0+

Self-starter, non-kitchen sink building
-Making gradle tooling smarter
-Jenkins job autogen
-leveraging containers for parallelization

Data data data…

Smarter/Realistic test data
-bigpetstore
-bigtop-bazaar
-weather data gen

Tutorial/Learning Data sets
-githubarchive.org
-more tbd…

Deployment/Mgmt

Updated puppet modules
-newest best practices
-next level enhanced security options

Wider range of starter deployment topologies

Include some handling of test/tutorial data

More components…

Sounds interesting, how can I help?

*Join mailing list, ask questions, suggest features, etc

*Contribute (components, tutorials, docs)

*Report bugs

Thank You, Q&A

Nate D’Amico
kaiyzen@apache.org
@kaiyzen