Big data analysis in java world

Big Data Analysis in Java World by Serhiy Masyutin


Page 1: Big data analysis in java world

Big Data Analysis in Java World by Serhiy Masyutin

Page 2: Big data analysis in java world

Agenda

The Big Data Problem
Map-Reduce
MPP-based Analytical Database
In-Memory Data Grid
Real-Life Project
Q&A

Page 3: Big data analysis in java world

The Big Data Problem

http://www.datameer.com/images/product/big_data_hadoop/img_bigdata.png

Page 4: Big data analysis in java world

The Big Data Problem

Columns: Map-Reduce, MPP AD, IMDG

When do I need it?
Map-Reduce: In an hour
MPP AD: In a minute
IMDG: Now

What do I need to do with it?
Map-Reduce: Exploratory analytics
MPP AD: Structured analytics
IMDG: Singular event processing (some analytics), transactions

How will I query and search?
Map-Reduce: Unstructured
MPP AD: Ad hoc SQL
IMDG: Structured

How do I need to store it?
Map-Reduce: I do, but not required to
MPP AD: I must and I am required to
IMDG: Temporarily

Where is it coming from?
Map-Reduce: File/ETL
MPP AD: File/ETL
IMDG: Event/Stream/File/ETL

http://blog.pivotal.io/pivotal/products/exploring-big-data-solutions-when-to-use-hadoop-vs-in-memory-vs-mpp

Page 5: Big data analysis in java world

The Big Data Problem

Map-Reduce

MPP AD IMDG

Transactions

Customer records

Geo-spatial

Sensors

Social Media

XML, JSON

Raw Logs

Text

Image

Video

more processing

http://blog.pivotal.io/big-data-pivotal/products/exploratory-data-science-when-to-use-an-mpp-database-sql-on-hadoop-or-map-reduce

Page 6: Big data analysis in java world

The Big Data Problem

Data is not Information

- Clifford Stoll

Page 7: Big data analysis in java world

Map-Reduce

http://jeremykun.files.wordpress.com/2014/10/mapreduceimage.gif?w=1800

Page 8: Big data analysis in java world

Map-Reduce

https://anonymousbi.files.wordpress.com/2012/11/hadoopdiagram.png

Page 9: Big data analysis in java world

Map-Reduce

http://hadoop.apache.org/docs/r1.2.1/images/hdfsarchitecture.gif

Page 11: Big data analysis in java world

Map-Reduce

Volume: Medium-Large
Variety: Unstructured data
Velocity: Batch processing
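The map, shuffle, and reduce phases pictured on the preceding slides can be illustrated with a toy word count. This is a minimal single-JVM sketch using plain Java streams, not the Hadoop API; the class name `MapReduceSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy, single-JVM illustration of the map -> shuffle -> reduce phases.
// This is NOT the Hadoop API; all names here are hypothetical.
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Stream.of(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1));
    }

    // Shuffle + reduce: group the pairs by key, then sum the values per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(MapReduceSketch::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Data is not information", "Big data is big");
        // prints {big=2, data=2, information=1, is=2, not=1}
        System.out.println(wordCount(lines));
    }
}
```

In a real cluster the map calls run on the nodes that hold the input blocks and the grouping step moves data across the network, which is one reason the slide classifies this model as batch processing.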

Page 12: Big data analysis in java world

MPP Analytical Database

http://www.ndm.net/datawarehouse/images/stories/greenplum/gp-dia-3-0.png

Page 13: Big data analysis in java world

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagram.png

Page 14: Big data analysis in java world

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagramOneNodeDown.png

Page 15: Big data analysis in java world

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagramTwoNodesDown.png

Page 16: Big data analysis in java world

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/DataK-Safety-K2Nodes2And3Failed.png
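The K-safety diagrams referenced above can be approximated with a small model. The sketch below assumes a ring of nodes in which every data segment has K + 1 copies on consecutive nodes, and checks whether each segment still has a live copy after some nodes fail. It is a simplified illustration, not Vertica's implementation; a real database may apply further conditions (for example a quorum of live nodes) that are not modeled here. The class and method names are hypothetical.

```java
import java.util.Set;

// Simplified model of K-safety: N nodes in a ring, each data segment has
// K + 1 copies placed on consecutive nodes. NOT Vertica's implementation.
class KSafetySketch {

    // True if every segment still has at least one copy on a live node.
    static boolean allSegmentsAvailable(int nodeCount, int k, Set<Integer> failedNodes) {
        for (int segment = 0; segment < nodeCount; segment++) {
            boolean hasLiveCopy = false;
            for (int copy = 0; copy <= k; copy++) {
                int node = (segment + copy) % nodeCount; // wrap around the ring
                if (!failedNodes.contains(node)) {
                    hasLiveCopy = true;
                    break;
                }
            }
            if (!hasLiveCopy) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSegmentsAvailable(5, 1, Set.of(1)));    // true: one node down
        System.out.println(allSegmentsAvailable(5, 1, Set.of(1, 3))); // true: non-adjacent nodes down
        System.out.println(allSegmentsAvailable(5, 1, Set.of(1, 2))); // false: adjacent nodes down
        System.out.println(allSegmentsAvailable(5, 2, Set.of(1, 2))); // true: K = 2 tolerates it
    }
}
```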

Page 17: Big data analysis in java world

MPP Analytical Database

JDBC

http://www.ndm.net/datawarehouse/images/stories/greenplum/gp-dia-3-0.png
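Since the slide names JDBC as the access path from Java, here is a hedged sketch of what an analytical query through the standard `java.sql` API might look like. The table `sensor_readings` and its columns `device_id`, `reading`, and `recorded_at` are assumptions made up for this example; the connection itself would come from the vendor's JDBC driver and is passed in by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an aggregate query over JDBC. Table and column names are
// hypothetical; they are not taken from the talk.
class JdbcAnalyticsSketch {

    static final String AVG_PER_DEVICE_SQL =
            "SELECT device_id, AVG(reading) AS avg_reading "
          + "FROM sensor_readings WHERE recorded_at >= ? "
          + "GROUP BY device_id ORDER BY device_id";

    // The database does the heavy lifting in parallel; the client only
    // receives one aggregated row per device.
    static Map<String, Double> averagePerDevice(Connection connection, Timestamp since)
            throws SQLException {
        Map<String, Double> result = new LinkedHashMap<>();
        try (PreparedStatement statement = connection.prepareStatement(AVG_PER_DEVICE_SQL)) {
            statement.setTimestamp(1, since);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    result.put(rows.getString("device_id"), rows.getDouble("avg_reading"));
                }
            }
        }
        return result;
    }
}
```

A caller would typically obtain the `Connection` with `DriverManager.getConnection(url, user, password)`, where the URL format depends on the database vendor.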

Page 18: Big data analysis in java world

MPP Analytical Database

Volume: Small-Medium-Large
Variety: Structured data
Velocity: Interactive

ASTER DATABASE

Matrix

Page 19: Big data analysis in java world

In-Memory Data Grid

https://ignite.incubator.apache.org/images/in_memory_data.png

Page 21: Big data analysis in java world

In-Memory Data Grid

https://ignite.incubator.apache.org/images/in_memory_compute.png

Page 22: Big data analysis in java world

In-Memory Data Grid

http://hazelcast.com/wp-content/uploads/2013/12/IMDGEmbeddedMode_w1000px.png

Page 23: Big data analysis in java world

In-Memory Data Grid

Volume: Small-Medium
Variety: Structured data
Velocity: (Near) Real-Time
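The two ideas behind an in-memory data grid, partitioning data across nodes and running computations where the data lives, can be sketched in a few lines. The example below is a toy single-JVM model in which each "node" is just a map; it does not use the Hazelcast or Ignite APIs, and the class `DataGridSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy single-JVM model of a partitioned in-memory grid: each "node" is a map.
// NOT the Hazelcast or Ignite API; all names are hypothetical.
class DataGridSketch {

    private final List<Map<String, Double>> nodes = new ArrayList<>();

    DataGridSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ConcurrentHashMap<>());
        }
    }

    // The key's hash decides which node owns the entry.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, double value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    Double get(String key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    // "Send the computation to the data": every node sums its own entries,
    // and only the partial sums are combined.
    double sum() {
        return nodes.parallelStream()
                .mapToDouble(node -> node.values().stream()
                        .mapToDouble(Double::doubleValue)
                        .sum())
                .sum();
    }
}
```

A real grid adds replication, rebalancing when nodes join or leave, and network transport, none of which this sketch attempts.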

Page 24: Big data analysis in java world

Real-Life Project

Sensor data
The number of devices currently doubles every year
Data flow: ~200 GB/month
Target data flow: ~500 GB/month
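The figures above allow a rough estimate of how soon the target is reached. The sketch below assumes that the data flow grows in proportion to the number of devices, that is, that it also doubles every year; the slide does not state this, so treat the result as a back-of-the-envelope number only. The class name is hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope estimate. Assumption (not from the slides): data flow
// scales linearly with the device count, so it doubles every year as well.
class DataFlowGrowthSketch {

    // Solves current * 2^years = target for years.
    static double yearsToReach(double currentGbPerMonth, double targetGbPerMonth) {
        return Math.log(targetGbPerMonth / currentGbPerMonth) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double years = yearsToReach(200.0, 500.0);
        // prints "1.32 years (about 16 months)"
        System.out.printf(Locale.ROOT, "%.2f years (about %d months)%n",
                years, Math.round(years * 12));
    }
}
```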

Page 25: Big data analysis in java world

Real-Life Project

Requirements

When do I need it? In a minute
What do I need to do with it? Structured analytics
How will I query and search? Ad hoc SQL
How do I need to store it? I must and I am required to
Where is it coming from? XML

Page 26: Big data analysis in java world

Real-Life Project

Time-series data
RESTful API
Extendable analytics
Scalability
Speed to Market

Page 27: Big data analysis in java world

Real-Life Project

Page 28: Big data analysis in java world

Real-Life Project

Architecture diagram (component labels only):
Availability Zone A, Availability Zone B, Availability Zone C
Devices
Collector
Raw message store
Processor
Post-Processor
Recent data store
Permanent data store
Analytic Executor Pool
Analytics API
Client API
UI
Clients
3rd Party Services

Page 29: Big data analysis in java world

Real-Life Project

Vertica stores time-series data only
Append-only data store
Store organizational data separately
Use Vertica’s ExternalFilter for data load
R analytics as UDFs on Vertica
Scale Vertica cluster accordingly
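The requirements name XML as the data source, so a filter step in the load path would plausibly turn each XML message into delimited rows that a bulk load can append. The sketch below shows only that transformation with the JDK's built-in DOM parser; it does not use Vertica's SDK, and the message layout (a `message` element with a `device` attribute and `reading` children carrying a `time` attribute) is an assumption invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Converts one XML sensor message into pipe-delimited rows.
// The XML layout is hypothetical; this is not Vertica SDK code.
class XmlToRowsSketch {

    static List<String> toRows(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String deviceId = document.getDocumentElement().getAttribute("device");
            NodeList readings = document.getElementsByTagName("reading");
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < readings.getLength(); i++) {
                Element reading = (Element) readings.item(i);
                // One row per reading: device|timestamp|value
                rows.add(deviceId + "|" + reading.getAttribute("time")
                        + "|" + reading.getTextContent().trim());
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed sensor message", e);
        }
    }
}
```

Emitting one flat row per reading fits the append-only, time-series-only use of the store described above, since every row is self-contained and never updated.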

Page 30: Big data analysis in java world

Real-Life Project

Choose the right tool for the job, late changes are expensive

You can do everything yourself. Should you?

Page 31: Big data analysis in java world

Q&A