Who Needs Batch?

92
SPRINGONE2GX WASHINGTON, DC Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Who Needs Batch? By Michael Minella and Gunnar Hillert @michaelminella, @ghillert

Transcript of Who Needs Batch?

Page 1: Who Needs Batch?

SPRINGONE2GX WASHINGTON, DC

Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/

Who Needs Batch? By Michael Minella and Gunnar Hillert

@michaelminella, @ghillert

Page 2: Who Needs Batch?
Page 3: Who Needs Batch?

Michael Minella Twitter: @michaelminella Podcast: http://javaOffHeap.com or @OffHeap Website: http://spring.io

Gunnar Hillert Twitter: @ghillert Website: http://blog.hillert.com Conference: http://devnexus.com

Page 4: Who Needs Batch?
Page 5: Who Needs Batch?

https://github.com/ghillert/s1-2gx2015-who-needs-batch

Page 6: Who Needs Batch?

W E H A V E A N ANNOUNCEMENT

Page 7: Who Needs Batch?

WE HAVE A DISEASE

Page 8: Who Needs Batch?

OBJECTIVIUS SHINIUM SYNDROMUS

Page 9: Who Needs Batch?

Shiny Object Syndrom

SHINY OBJECT SYNDROME

Page 10: Who Needs Batch?

IT MAKES US INNOVATE

Page 11: Who Needs Batch?

ROAD TO RECOVERY

Page 12: Who Needs Batch?

COOL KIDS ARE DOING STREAM PROCESSING

Page 13: Who Needs Batch?

Stream Analytics

Cloud Dataflow

Page 14: Who Needs Batch?
Page 15: Who Needs Batch?
Page 16: Who Needs Batch?

LET’S TAKE A CLOSER LOOK

Page 17: Who Needs Batch?
Page 18: Who Needs Batch?

BATCH AND STREAM DEFINITIONS

Page 19: Who Needs Batch?

LOOK AT COMMON USE CASES

Page 20: Who Needs Batch?

CONCLUSIONS

Page 21: Who Needs Batch?

DEFINITION: STREAM PROCESSING

Page 22: Who Needs Batch?

A system for processing an unbounded set of data in an asynchronous manor.

Page 23: Who Needs Batch?

DEFINITION: BATCH PROCESSING

Page 24: Who Needs Batch?

DEPENDS ON WHO YOU ASK

Page 25: Who Needs Batch?

“Batch … applications run … as special cases of stream processing applications”

- Apache Flink Documentation

Page 26: Who Needs Batch?

Batch processing is the processing of a bounded data set without interruption

or interaction

Page 27: Who Needs Batch?

KEY DIFFERENCES

Page 28: Who Needs Batch?

1 BOUNDED VS UNBOUNDED

Page 29: Who Needs Batch?

2 SYNCHRONOUS VS ASYNCHRONOUS

Page 30: Who Needs Batch?

3 STATEFUL VS STATELESS

Page 31: Who Needs Batch?

STATE OF STREAMING

Page 32: Who Needs Batch?

Stream Analytics

Cloud Dataflow

Page 33: Who Needs Batch?

Stream Analytics

Cloud Dataflow

Page 34: Who Needs Batch?

STATE OF BATCH

Page 35: Who Needs Batch?

JSR-352

Page 36: Who Needs Batch?

WHY THE END OF BATCH?

Page 37: Who Needs Batch?

LATENCY

Page 38: Who Needs Batch?

WHAT DOES THE END OF BATCH MEAN?

Page 39: Who Needs Batch?

L IMIT LATENCY

Page 40: Who Needs Batch?

“ENTERPRISE GRADE”

Page 41: Who Needs Batch?

LOOK AT COMMON USE CASES

Page 42: Who Needs Batch?

E.T.L.

Page 43: Who Needs Batch?

DATABASE TO HDFS

Page 44: Who Needs Batch?

DEMO

Page 45: Who Needs Batch?

APACHE FLINK

Streaming Dataflow Runtime

DataSet API (Batch) DataStream API

Hadoop Table ML Dataflow … Table Dataflow …

Local Remote Yarn Tez Embedded

Page 46: Who Needs Batch?

STREAMING CAN BE USEFUL IN INGESTION USE CASES

Page 47: Who Needs Batch?

TWITTER TO HDFS

Page 48: Who Needs Batch?

DATA SCIENCE

Page 49: Who Needs Batch?

BATCH TRAINING PREDICTIVE MODELS

Page 50: Who Needs Batch?

CONNECTED

CAR

Page 51: Who Needs Batch?

transformer http filter hdfs

type-transformer

python

Gemfire

REST

Hadoop

Gemfire

spark

Page 52: Who Needs Batch?

STREAMING APROXIMIATIONS EXIST ≈

Page 53: Who Needs Batch?

BREADTH AND ACCURACY DON’T MATCH BATCH ≠

Page 54: Who Needs Batch?

WORKFLOW ORCHESTRATION

Page 55: Who Needs Batch?
Page 56: Who Needs Batch?

forEachDir

start

getDirectoryInfo

join

dirSubProcss dirSubProcss …

Page 57: Who Needs Batch?

Condition 1

start

getDirAgeAndNumberOfFiles

Ingest

Condition 2

sendReminderEmail

Archive End

Page 58: Who Needs Batch?

DEMO

Page 59: Who Needs Batch?

HOW IS THIS DONE VIA STREAMING?

Page 60: Who Needs Batch?

NON-INTERACTIVE PROCESSING

Page 61: Who Needs Batch?

SAME AS ETL PROCESSING

Page 62: Who Needs Batch?

PORT SCANNER

Page 63: Who Needs Batch?

DEMO

Page 64: Who Needs Batch?

STREAMING DOESN’T MAKE SENSE

Page 65: Who Needs Batch?

STREAMING DOES MAKE SENSE

Page 66: Who Needs Batch?

WHERE LATENCY IS A PRIORITY

Page 67: Who Needs Batch?

DATA LOSS IS ACCEPTABLE

Page 68: Who Needs Batch?

UNBOUNDED DATASET

Page 69: Who Needs Batch?

BUT DOES NOT IN OTHER USE CASES

Page 70: Who Needs Batch?

HIGHLY COMPLEX

Page 71: Who Needs Batch?

ERROR HANDLING NOT AS ROBUST

Page 72: Who Needs Batch?

DATA GUARANTEES NOT AS ROBUST

Page 73: Who Needs Batch?

WHERE BATCH IS BETTER

Page 74: Who Needs Batch?

F IN ITE DATASET

Page 75: Who Needs Batch?

ROBUST ERROR HANDLING

Page 76: Who Needs Batch?

DATA GUARANTEES BATTLE TESTED

Page 77: Who Needs Batch?

BATCH LOGO SLIDE?

Page 78: Who Needs Batch?

RESOURCES ARE A CONCERN

Page 79: Who Needs Batch?

NOT EITHER OR

Page 80: Who Needs Batch?

COMBINING THE TWO PROVIDES THE MOST POWER

Page 81: Who Needs Batch?

E.T.L.v2

Page 82: Who Needs Batch?

mail http mysql2hdfs

Database to HDFS v2

Page 83: Who Needs Batch?

DEMO

Page 84: Who Needs Batch?

CONSISTENT PROGRAMMING MODEL

Page 85: Who Needs Batch?

SHARE CODE BETWEEN PARADIGMS

Page 86: Who Needs Batch?
Page 87: Who Needs Batch?

BATCH HAS LASTED BECAUSE IT MAKSE SENSE

Page 88: Who Needs Batch?

SPRING XD/DATA FLOW BEST OF BOTH WORLDS

Page 89: Who Needs Batch?

ENTERPRISE GRADE STREAM PROCESSING

Page 90: Who Needs Batch?

BEST BATCH OPTION ON THE JVM

Page 91: Who Needs Batch?
Page 92: Who Needs Batch?

Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 92

Check out Spring Cloud Data Flow on https://spring.io

Apache Spark for Big Data Processing – Salon N-P Up next! High Performance Stream Processing – Salon N-P 8:30AM Tomorrow

Spring Integration Extensions Ecosystem – Here! 12:45 Tomorrow

Learn More. Stay Connected.

@springcentral Spring.io/video