Data Lake, beyond the Data Warehouse


Transcript of Data Lake, beyond the Data Warehouse

Page 1: Data Lake, beyond the Data Warehouse

Data Lake, beyond the Warehouse

Cheow Lan Lake, Thailand

โกเมษ จันทวิมล, February 3, 2016

Komes Chandavimol

Data Science Thailand Meetup #4

Shifting to the 3rd gen platform with Data Lake

Page 2

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

Page 3

The Growth of Data

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

Page 4

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

Page 5

Can these tools support Big Data?

Spreadsheet? Database? Data Mart? Data Warehouse?

Source: Forrester Research’s James Kobielus

Page 6

The Emergence of Big Data Tools

http://blogs.forrester.com/category/hadoop

http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf

Page 7

HADOOP

http://opensource.com/life/14/8/intro-apache-hadoop-big-data

Page 8

Analytics 3.0

Data Mining Tools

Data Discovery and Visualization Tools

Tableau.com, RapidMiner.com

Page 9

How can this apply to the current environment?

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Page 10

Traditional Data Warehouse

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Page 11

New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Page 12

New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Page 13

Data Lake

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 14
Page 15

Data Lake

A single place to store every type of data in its native format with no fixed limits on account size or file size, high throughput to increase analytic performance and native integration with the Hadoop ecosystem.

Reference: James Serra's Blog

Data Lake Development with Big Data, Pradeep Pasupuleti (2015)

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
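The definition above emphasizes keeping every type of data in its native format. A common consequence is schema-on-read: no structure is enforced when data lands, and each consumer applies its own schema when it reads. The following is a minimal Python sketch of that idea, assuming an in-memory dict stands in for lake storage; `RAW_FILES` and `read_with_schema` are hypothetical names, not part of Hadoop or any data lake product.

```python
import csv
import io
import json

# Hypothetical raw "lake" contents: files kept in their native formats,
# with no schema enforced when they were written.
RAW_FILES = {
    "clicks.json": '{"user": "a", "page": "/home", "ms": "120"}\n'
                   '{"user": "b", "page": "/cart"}\n',
    "orders.csv": "user,amount\na,19.90\nb,5.00\n",
}


def read_with_schema(name, schema):
    """Parse a raw file and apply `schema` (field -> type) at read time.

    Fields missing from a record become None; extra fields are ignored.
    """
    text = RAW_FILES[name]
    if name.endswith(".json"):
        records = [json.loads(line) for line in text.splitlines() if line]
    elif name.endswith(".csv"):
        records = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"unsupported format: {name}")
    typed = []
    for rec in records:
        row = {}
        for field, cast in schema.items():
            value = rec.get(field)
            row[field] = cast(value) if value is not None else None
        typed.append(row)
    return typed


# Two consumers read the same raw files with the schemas they need.
clicks = read_with_schema("clicks.json", {"user": str, "ms": int})
orders = read_with_schema("orders.csv", {"user": str, "amount": float})
print(clicks)  # [{'user': 'a', 'ms': 120}, {'user': 'b', 'ms': None}]
print(orders)  # [{'user': 'a', 'amount': 19.9}, {'user': 'b', 'amount': 5.0}]
```

A warehouse, by contrast, typically validates structure when data is written, so the click record without `ms` would be rejected or fixed before loading rather than surfacing as `None` at read time.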

Page 16

Data Lake Processes

www.emc.com

Page 17

Data Lake and Data Warehouse

Hadoop Distributed Compared, BlazeClan Technology, 2015

Page 18

Data Lake and Data Warehouse

Hadoop Distributed Compared, BlazeClan Technology, 2015

Page 19

Data Lakes

http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

Page 20

Data Lake

Type of Data: Raw Data, Derived Data, Aggregated Data

Type of Environment: Discovery Environment, Production Environment

The Definition of Data Lake, John O’Brien (2015)
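One way to read this slide is as a pipeline of zones: raw data is kept exactly as it arrived, derived data is cleaned and typed, and aggregated data is summarized for reporting. The sketch below illustrates that reading only (it is not necessarily O’Brien's own model), in plain Python with hypothetical store and product records. The raw zone is left untouched, so analysts in a discovery environment can re-derive it differently later, while the aggregated output is the kind of result a production environment would consume.

```python
from collections import defaultdict

# Hypothetical raw zone: events exactly as they arrived, including a bad record.
raw = [
    {"store": "BKK-01", "sku": "A", "qty": "3", "price": "10.0"},
    {"store": "BKK-01", "sku": "B", "qty": "1", "price": "25.0"},
    {"store": "CNX-02", "sku": "A", "qty": "2", "price": "10.0"},
    {"store": "CNX-02", "sku": "A", "qty": "oops", "price": "10.0"},
]


def derive(records):
    """Derived data: cleaned, typed records with a computed revenue column."""
    out = []
    for r in records:
        try:
            qty, price = int(r["qty"]), float(r["price"])
        except ValueError:
            continue  # drop records that cannot be parsed; raw keeps them
        out.append({"store": r["store"], "sku": r["sku"], "revenue": qty * price})
    return out


def aggregate(records):
    """Aggregated data: total revenue per store."""
    totals = defaultdict(float)
    for r in records:
        totals[r["store"]] += r["revenue"]
    return dict(totals)


derived = derive(raw)         # new list; `raw` itself is not modified
summary = aggregate(derived)
print(summary)  # {'BKK-01': 55.0, 'CNX-02': 20.0}
```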

Page 21

How does the Data Lake work?

http://www.clearpeaks.com/blog/category/tableau

Traditional Enterprise Data Warehouse

Page 22

New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Page 23

http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html

Page 24
Page 25

Data Lake Maturity

The Definition of Data Lake, John O’Brien (2015)

Page 26

4 Maturity Stages of a Data Lake

Stage 1 – Pilot Project (Understand the Technology)
Stage 2 – Productionize Hadoop and its capabilities
Stage 3 – Proactively consolidate data for (Big) Data Analytics
Stage 4 – Establish the Data Lake as a core competency platform

The Definition of Data Lake, John O’Brien (2015)

Putting the Data Lake to Work, Teradata, Hortonworks (2015)

Page 27

Stage 1 – Pilot Project

Handling data at scale. This stage involves getting the plumbing in place and learning to acquire and transform data at scale. The analytics may be quite simple, but much is learned about making Hadoop work the way you want.

The Definition of Data Lake, John O’Brien (2015)

Putting the Data Lake to Work, Teradata, Hortonworks (2015)
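Much of Stage 1 is learning how Hadoop processes data. As a single-process illustration only (no Hadoop APIs are used and nothing is distributed), the sketch below mimics the map, shuffle, and reduce phases of a MapReduce word count; the function names are hypothetical. On a cluster, the framework runs many mappers and reducers in parallel and moves the grouped data between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts for one word."""
    return key, sum(values)


lines = ["data lake data", "Data warehouse"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'data': 3, 'lake': 1, 'warehouse': 1}
```

Because each mapper sees only its own line and each reducer only one key's values, the phases can be spread across machines without changing the logic, which is the property that lets Hadoop scale this pattern.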

Page 28

Stage 2 – Productionize Hadoop and its capabilities

This stage involves improving the ability to transform and analyze data: finding the tools most appropriate to the team's skill set, acquiring more data, and building applications.

The Definition of Data Lake, John O’Brien (2015)

Putting the Data Lake to Work, Teradata, Hortonworks (2015)

Page 29

Stage 3 – Proactively consolidate data for (Big) Data Analytics

This stage involves getting data and analytics into the hands of as many people as possible.

It is in this stage that the data lake and the enterprise data warehouse start to work in unison, each playing its role.

Some companies that started with a data lake eventually added an enterprise data warehouse to operationalize their data.

The Definition of Data Lake, John O’Brien (2015)

Putting the Data Lake to Work, Teradata, Hortonworks (2015)
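When the data lake and the enterprise data warehouse work in unison, a typical hand-off is loading curated lake output into a warehouse table with a declared schema, where operational SQL reporting runs. A minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for a real warehouse; the `daily_sales` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical curated output from the data lake (e.g. an aggregated zone).
curated = [
    ("2016-02-01", "BKK-01", 55.0),
    ("2016-02-01", "CNX-02", 20.0),
    ("2016-02-02", "BKK-01", 40.0),
]

# In-memory SQLite stands in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
# Schema-on-write: the table's columns and constraints are declared up front.
# SQLite enforces NOT NULL and PRIMARY KEY here, though its column typing is
# looser than most warehouse engines.
conn.execute(
    """CREATE TABLE daily_sales (
           sale_date TEXT NOT NULL,
           store     TEXT NOT NULL,
           revenue   REAL NOT NULL,
           PRIMARY KEY (sale_date, store)
       )"""
)
conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", curated)
conn.commit()

# Operational reporting then runs as plain SQL against the warehouse table.
rows = conn.execute(
    "SELECT store, SUM(revenue) FROM daily_sales GROUP BY store ORDER BY store"
).fetchall()
print(rows)  # [('BKK-01', 95.0), ('CNX-02', 20.0)]
```

The division of labor follows the slide: the lake keeps the raw and exploratory data, and only the curated, well-structured result is loaded into the warehouse for operational use.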

Page 30

Big Data Analytics

http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html

Page 31

Data Lake and Big Data Analytics

http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/

Page 32

Stage 4 – Establish the Data Lake as a core competency platform

Enterprise capabilities are added to the data lake. Few companies have reached this level of maturity, but many will as the use of big data grows. This stage requires data governance, compliance, security, and auditing, incorporated into the company's data strategy.

The Technology of the Business Data Lake, Capgemini (2013)
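Stage 4 calls for governance, security, and auditing. A minimal sketch of two of those pieces, a role-based read policy per lake zone and an audit trail of every access attempt, assuming hypothetical role names, zone names, and an in-memory list as the audit store. Real deployments would rely on dedicated security and governance tooling rather than code like this.

```python
from datetime import datetime, timezone

# Hypothetical access policy: which roles may read which lake zones.
POLICY = {
    "raw": {"data_engineer"},
    "derived": {"data_engineer", "data_scientist"},
    "aggregated": {"data_engineer", "data_scientist", "analyst"},
}

audit_log = []  # every access attempt is recorded, allowed or not


def read_zone(user, role, zone):
    """Check the policy, record the attempt, and return whether access is granted."""
    allowed = role in POLICY.get(zone, set())  # unknown zones deny by default
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "zone": zone,
        "allowed": allowed,
    })
    return allowed


print(read_zone("user1", "analyst", "aggregated"))  # True
print(read_zone("user1", "analyst", "raw"))         # False
print(len(audit_log))                               # 2
```

Logging denied attempts as well as granted ones is what makes the trail useful for compliance reviews.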

Page 33

Business Data Lake

The Technology of the Business Data Lake, Capgemini (2014)

Page 34

https://shefsite.files.wordpress.com/2014/04/where.jpg

Page 35

Page 36

http://image.slidesharecdn.com/mapr-db-in-hadoop-nosql-overview-150929062856-lva1-app6892/95/maprdb-the-first-inhadoop-document-database-12-638.jpg?cb=1443536326

Page 37

http://www.predictiveanalyticstoday.com/waterline-data-self-service-for-the-hadoop-data-lake/

Page 38

The Data Lake Unifies Data Discovery, Data Science, and BI 3.0

Figure: word cloud of related terms (Big Data, Hadoop, Spark, YARN, Hive, Data Lake, Data Science, Machine Learning, Deep Learning, Feature Engineering, Predictive Analytics, Graph Analytics, Data Visualization, Visual Analytics, Business Discovery, Self Serve Business, Business Intelligence 3.0)

Page 39
Page 40

20+ posts related to “Data Lake”: search for “Data Science Thailand” “Data Lake”

Page 41

Page 42

http://www.clearpeaks.com/blog/category/tableau

Traditional Enterprise Data Warehouse

Page 43

Questions?

Page 44