Stratosphere - Next Generation Big Data Analytics Platform from … · 2014-05-14 · The 10...

Stratosphere

Next Generation Big Data Analytics Platform from Europe

Márton Balassi

Data Mining and Search Group and Big Data Business Intelligence Group

Computer and Automation Research Institute of the Hungarian Academy of Sciences

May 11, 2014

Table of Contents

Motivation

The 10 commandments for Big Data Analytics

Project info

Motivation

The Big Data scene

What’s all the hype for?

- Data acquisition is cheap (Eva Andreasson, Cloudera, 2014)
- Data storage is cheap (Matthew Komorowski, 2014)
- Data Science is the Sexiest Job of the 21st century (Harvard Business Review, 2012)
- It’s a piece of cake... Or is it?

[Figure: the Big Data scene. Image courtesy of Matt Turck and Shivon Zilis]

The 10 commandments for Big Data Analytics

Stratosphere in one slide

A European project just accepted to the Apache Incubator

- Declarative programming (native language bindings)
- Schema-on-read (HDFS, databases, ...)
- Rich programming model (beyond MapReduce)
- User-defined functions as first-class citizens
- Automated parallelization and optimization
- Efficient and scalable execution engine
- Intensive development

Contributors

1. Thou shalt use declarative programming

K-Means Clustering in Stratosphere’s Scala front-end
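
The code shown on the original slide is not preserved in this transcript. As a rough illustration only, the following plain-Python sketch expresses K-Means as "assign each point to its nearest centroid, then average each group", which is the kind of record-level and group-level description a declarative front-end works with. It does not use Stratosphere's API; the function names `nearest`, `kmeans_step` and `kmeans` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans_step(points, centroids):
    """One iteration: group points by nearest centroid, then average each group."""
    groups = defaultdict(list)
    for p in points:
        groups[nearest(p, centroids)].append(p)
    # A centroid with no assigned points is left where it was.
    return [
        tuple(sum(coord) / len(groups[i]) for coord in zip(*groups[i]))
        if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of iterations (an assumption; no convergence test)."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids
```

The sketch states what happens to each point and each group and says nothing about how the work is split across machines; in a system like the one described here, that part would be left to the platform.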

2. Thou shalt accept external (dynamic) sources

“In situ” data – no load

3. Thou shalt use rich primitives

Beyond MapReduce
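
The figures for this slide are not preserved in the transcript. To give a feel for what a primitive beyond map and reduce can look like, here is a minimal plain-Python sketch of two binary operators over keyed data: a co-group and an inner join built on top of it. This is an illustration under assumptions, not Stratosphere's API; the names `cogroup` and `join` and their signatures are hypothetical.

```python
from collections import defaultdict


def cogroup(left, right):
    """Group two keyed datasets by key.

    Yields (key, left_values, right_values) for every key seen on either side.
    """
    lgroups, rgroups = defaultdict(list), defaultdict(list)
    for key, value in left:
        lgroups[key].append(value)
    for key, value in right:
        rgroups[key].append(value)
    for key in sorted(set(lgroups) | set(rgroups)):
        yield key, lgroups.get(key, []), rgroups.get(key, [])


def join(left, right):
    """Inner equi-join: pair every left value with every right value per key."""
    for key, lvals, rvals in cogroup(left, right):
        for lval in lvals:
            for rval in rvals:
                yield key, lval, rval
```

With only map and reduce, a two-input operation like this has to be encoded by tagging and merging the inputs by hand; a richer operator set lets the program state the intent directly.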

4. Thou shalt deeply embed UDFs

Flexible and transparent

5. Thou shalt optimize

Auto-parallelization and optimization as in relational databases

6. Thou shalt iterate

Needed for most interesting analysis cases

7. Thou shalt use a scalable and efficient execution engine

Reliable and robust infrastructure

8. Thou shalt tackle streaming

Integration of low latency jobs

9. Thou shalt provide a common API through the whole framework

Batch? BSP? Streaming? You just write the same code...

10. Thou shalt support the lambda architecture

Combine the reliability of batch and the speed of streaming to enable real-time queries on large datasets

[Figure: timeline of input segments (first hour of input, ..., 1 to 2 hours old input, less than an hour old input) processed by Batch_1 ... Batch_(n−1) and Streaming_1 ... Streaming_(n−1), Streaming_n, feeding a single Output]
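
The idea of combining a batch result with a streaming result at query time can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not Stratosphere code: events are `(key, timestamp)` pairs, the computed quantity is a per-key count, and the names `batch_view`, `SpeedLayer` and `query` are hypothetical.

```python
from collections import Counter


def batch_view(events):
    """Recompute per-key counts from scratch over the older, complete input."""
    return Counter(key for key, _timestamp in events)


class SpeedLayer:
    """Incrementally maintained counts for recent events not yet in a batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, key):
        """Update the streaming view as each new event arrives."""
        self.counts[key] += 1


def query(key, batch, speed):
    """Answer a query by adding the batch view and the streaming view."""
    return batch[key] + speed.counts[key]
```

A short usage example: the batch view covers the old input, the speed layer absorbs events that arrived since the last batch run, and a query sees both.

```python
old_input = [("a", 1), ("b", 2), ("a", 3)]
batch = batch_view(old_input)

speed = SpeedLayer()
speed.on_event("a")

total_a = query("a", batch, speed)  # two old events plus one recent event
```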

Project info

Where to look for us

Project homepage

The project can be found at stratosphere.eu. The homepage served as a source for the code and most of the pictures presented on these slides.

Data Mining and Search & Big Data BI Groups

The webpages of the Budapest team members’ research groups can be found at dms.sztaki.hu and at bigdatabi.sztaki.hu.

Márton Balassi

[email protected]