Big Data vs Data Warehousing

Thomas Kejser [email protected] http://blog.kejser.org @thomaskejser Bigdata vs. Data Warehousing Synergy or Conflict?

description

An attempt to fi

Transcript of Big Data vs Data Warehousing

Page 1: Big Data vs Data Warehousing

Thomas Kejser

[email protected]

http://blog.kejser.org

@thomaskejser

Bigdata vs. Data Warehousing

Synergy or Conflict?

Page 2: Big Data vs Data Warehousing

Thomas Kejser
http://blog.kejser.org
@thomaskejser

• Formerly: Lead SQLCAT EMEA
• Now: CTO FusionIo EMEA
• 15 years of database experience
• Performance Tuner

Who is this Guy?

Page 3: Big Data vs Data Warehousing

[Chart: billions of humans (y-axis, 5 to 10) by year (x-axis, 2000 to 2250)]

Source: United Nations Projections

Human Consciousness Doesn’t Scale

Page 4: Big Data vs Data Warehousing

Text Messages in a Table

CREATE TABLE AllTexts (
    Sender           BIGINT        -- 8 B
  , Receiver         BIGINT        -- 8 B
  , SenderLocation   BIGINT        -- 8 B
  , ReceiverLocation BIGINT        -- 8 B
  , Time             DATETIME      -- 8 B
  , SMS              VARCHAR(140)  -- 140 B
)

= 180 bytes per row

Page 5: Big Data vs Data Warehousing

How much do we text?

• World Average
  • 6.1 Trillion Text Messages / year
  • About 80% cell phone coverage
  • 7 billion people
  • 3 messages/day/person
• But:
  • Teenagers: 50 messages/day

Source: Pew Internet Research 2010 & ITU
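The per-person figure on this slide can be checked with a back-of-the-envelope calculation. The following is a minimal sketch that uses only the numbers on the slide; the variable names are illustrative and a 365-day year is an assumption.

```python
# Figures taken from the slide; the 365-day year is an assumption.
texts_per_year = 6.1e12      # 6.1 trillion text messages per year
world_population = 7e9       # 7 billion people
coverage = 0.80              # about 80% cell phone coverage

# People who can actually send a text.
phone_users = world_population * coverage

# Average number of texts each of them sends per day.
texts_per_user_per_day = texts_per_year / phone_users / 365

print(f"{texts_per_user_per_day:.1f} messages/day/person")
```

The result lands very close to the 3 messages/day/person quoted on the slide.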

Page 6: Big Data vs Data Warehousing

How much will we EVER text?

• 9B people acting like teenagers (in 2050)
  • 50 texts/day
• That’s 450 billion texts/day
  • 164 Trillion texts/year (about 27x today's 6.1 trillion)
  • 180 bytes each
  • Assume x3 compression

• Approximation: 10 Petabytes/year in 2050
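The 10 petabyte approximation follows directly from the numbers on this slide and the 180-byte row from the table slide. A minimal sketch of the arithmetic, assuming decimal petabytes (10^15 bytes) and a 365-day year; the variable names are illustrative.

```python
# Inputs from the slides.
people = 9_000_000_000          # 9B people in 2050
texts_per_person_per_day = 50   # everyone texts like a teenager
bytes_per_text = 180            # row size of the AllTexts table
compression_ratio = 3           # "assume x3 compression"

PB = 10**15                     # decimal petabyte (assumption)

texts_per_day = people * texts_per_person_per_day
texts_per_year = texts_per_day * 365              # 365-day year (assumption)
raw_bytes_per_year = texts_per_year * bytes_per_text
compressed_pb_per_year = raw_bytes_per_year / compression_ratio / PB

print(f"{texts_per_day:,} texts/day")
print(f"{texts_per_year:,} texts/year")
print(f"{compressed_pb_per_year:.1f} PB/year after compression")
```

Uncompressed, the same volume would be roughly three times larger, so the compression assumption carries a lot of the estimate.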

Page 7: Big Data vs Data Warehousing

[Chart: hard drive capacity in GB (log scale) by year]

Can it be done?

Moore’s Hard Drives

Page 8: Big Data vs Data Warehousing

How Large is this/year?

Hard Disk (4TB) : 2.5”

About 1500 Wine Bottles

Wine Bottle (75cl): 4.0”

Page 9: Big Data vs Data Warehousing

• Calculating:
  • 2U Storage = 24 Disks (includes compute)
  • 4TB per Disk
  • 100TB in 2U (a bit less)
  • 10PB = 200U storage
• About six racks

In the Data Center
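The rack estimate can be reproduced in a few lines. This is a sketch under stated assumptions: the slide does not say how many rack units per rack are usable, so `usable_u_per_rack = 36` is a hypothetical value (a 42U rack with some space left for switches and power). With all 42U filled, the same arithmetic gives five racks.

```python
import math

# Inputs from the slide.
disks_per_unit = 24        # one 2U storage unit holds 24 disks
tb_per_disk = 4            # 4TB per disk
unit_height_u = 2          # each storage unit is 2U high
target_tb = 10_000         # 10PB, in decimal terabytes

# Raw capacity per 2U unit: the slide rounds this up to 100TB ("a bit less").
raw_tb_per_unit = disks_per_unit * tb_per_disk

# Follow the slide's rounding: 100TB per unit.
units_needed = target_tb // 100
rack_units_needed = units_needed * unit_height_u

# Hypothetical: usable rack units per rack (not stated on the slide).
usable_u_per_rack = 36
racks = math.ceil(rack_units_needed / usable_u_per_rack)

print(f"{raw_tb_per_unit}TB raw per 2U unit")
print(f"{rack_units_needed}U of storage, about {racks} racks")
```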

Page 10: Big Data vs Data Warehousing

Warehouses Serve us Well..

Page 11: Big Data vs Data Warehousing

• Good Management Interfaces

• Standard SQL
  • with a few extensions
• Appliances
• Support system
• Homogeneous HW
• In chunks

… And it is Becoming a Commodity

Page 12: Big Data vs Data Warehousing

vs.

Page 13: Big Data vs Data Warehousing

PDW vs. Hive – Scan/seek

Query 1: SELECT count(*) FROM lineitem

Query 2: SELECT max(l_quantity) FROM lineitem WHERE l_orderkey > 1000 AND l_orderkey < 100000 GROUP BY l_linestatus

[Bar chart: run time in seconds (y-axis, 0 to 1400) for Query 1 and Query 2, Hive vs. PDW]

Page 14: Big Data vs Data Warehousing

[Bar chart: run time in seconds (y-axis, 0 to 4000) for Hive, PDW-U and PDW-P]

PDW vs. Hive - Joins

PDW-U:
• orders partitioned on o_custkey
• lineitem partitioned on l_partkey

PDW-P:
• orders partitioned on o_orderkey
• lineitem partitioned on l_orderkey

SELECT max(l_orderkey) FROM orders JOIN lineitem ON l_orderkey = o_orderkey
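Why the partitioning choice matters for this join can be illustrated with a toy model. The sketch below is not how PDW or Hive are implemented; it only hash-partitions two small, made-up tables across four hypothetical nodes and counts how many lineitem rows sit on a different node than their matching order. The column names follow TPC-H naming (`o_custkey` is assumed for the customer key on orders), and all data and helper names are invented for illustration.

```python
from collections import defaultdict

NODES = 4  # hypothetical cluster size


def partition(rows, key, nodes=NODES):
    """Hash-partition rows (dicts) across nodes on the given column."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return parts


def rows_to_move(orders_parts, lineitem_parts):
    """Count lineitem rows stored on a different node than their order."""
    order_home = {
        o["o_orderkey"]: node
        for node, part in orders_parts.items()
        for o in part
    }
    return sum(
        1
        for node, part in lineitem_parts.items()
        for li in part
        if order_home[li["l_orderkey"]] != node
    )


# Made-up toy data: 100 orders with 3 line items each.
orders = [{"o_orderkey": k, "o_custkey": k % 7} for k in range(1, 101)]
lineitem = [
    {"l_orderkey": k, "l_partkey": (k * 31 + n) % 50}
    for k in range(1, 101)
    for n in range(3)
]

# PDW-P style: both tables partitioned on the join key.
moved_p = rows_to_move(
    partition(orders, "o_orderkey"), partition(lineitem, "l_orderkey")
)

# PDW-U style: tables partitioned on columns unrelated to the join.
moved_u = rows_to_move(
    partition(orders, "o_custkey"), partition(lineitem, "l_partkey")
)

print(f"PDW-P style: {moved_p} of {len(lineitem)} rows must move")
print(f"PDW-U style: {moved_u} of {len(lineitem)} rows must move")
```

When both tables are partitioned on the join key, every node can join its own slice locally; otherwise rows have to be shipped between nodes before the join can run.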

Page 15: Big Data vs Data Warehousing

• Thread startup times
• Co-location awareness
• Files vs. optimized DB memory structures
• Column stores and other DB tech

Generic is good…

… but when there is structure, make use of it!

What does Big Data need to Catch up?

Page 16: Big Data vs Data Warehousing

• What is Bigdata

Very Unstructured Data

Page 17: Big Data vs Data Warehousing

How many Pictures of Cats?

• Flickr Today:
  • 300MB/month
  • 2GB/year
  • 51M users (too small?)

• Estimate: 102 PB / year

• 10 x text messages

Source: Wikipedia
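The picture estimate is a single multiplication. A minimal sketch, assuming decimal units (1GB = 10^9 bytes, 1PB = 10^15 bytes) and using the 2GB/year per-user figure and the 10PB/year text message estimate from the slides; the variable names are illustrative.

```python
# Inputs from the slides.
users = 51_000_000                     # 51M Flickr users
bytes_per_user_per_year = 2 * 10**9    # 2GB/year, decimal (assumption)
text_messages_pb_per_year = 10         # text message estimate from earlier

PB = 10**15                            # decimal petabyte (assumption)

photos_pb_per_year = users * bytes_per_user_per_year / PB
ratio_to_texts = photos_pb_per_year / text_messages_pb_per_year

print(f"{photos_pb_per_year:.0f} PB/year of pictures")
print(f"about {ratio_to_texts:.0f}x the text message volume")
```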

Page 18: Big Data vs Data Warehousing

How big is this in wine bottles?

Page 19: Big Data vs Data Warehousing

We have learned how to store it!

Page 20: Big Data vs Data Warehousing

• Distributed File System

• Open Source• No more SAN

• The Failure Unit is the Server

What is HDFS?

Page 21: Big Data vs Data Warehousing

Fully unstructured data is boring

…Unless you get money for storing it

Page 22: Big Data vs Data Warehousing

Acquiring Personal Information

Your Semi-structured Data, the Old Fashioned Way

Page 23: Big Data vs Data Warehousing

The Social Angle

Who do you talk to and how often?

Page 24: Big Data vs Data Warehousing

The Reasons

Why do you own a cell phone?

Page 25: Big Data vs Data Warehousing

Your Semi-structured Data, For Free

- at The Pub
Saturday, 1:39am

Page 26: Big Data vs Data Warehousing

Big Value

Extraction of meaning and insight from semi-structured data

Page 27: Big Data vs Data Warehousing

Extracting Meaning from Humans

Method | Examples
Turn semi-structure to structure | Image recognition, network proximity and super nodes, social media
Needle in a haystack | Extract outliers, fraud
Herd behaviors | Clustering, pattern recognition, “Customers who bought this also bought”
Text classification and search | Text indexes, syntactic counting, PageRank
Text to structure | Semantic analysis, loose structure into structure
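As an illustration of the "herd behaviors" method, the “Customers who bought this also bought” idea can be sketched as simple co-purchase counting. This is one possible approach under stated assumptions, not the method any particular retailer uses; the `also_bought` helper and the basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def also_bought(baskets):
    """For each item, count how often every other item shares a basket with it."""
    co_counts = {}
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts.setdefault(a, Counter())[b] += 1
            co_counts.setdefault(b, Counter())[a] += 1
    return co_counts


# Hypothetical purchase history: one list of items per customer basket.
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["case", "screen protector"],
    ["phone", "case", "screen protector"],
]

co_counts = also_bought(baskets)
top_for_phone = co_counts["phone"].most_common(2)

print("Customers who bought a phone also bought:", top_for_phone)
```

At real scale the pair counting is the expensive part, which is where a distributed engine earns its keep.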

Page 28: Big Data vs Data Warehousing

Find New Customers

“Michael, who is respected among his peers, often talks about his new, cool gadgets”

Michael

Thomas

Tommy

Page 29: Big Data vs Data Warehousing

Cross Sell

“Families who own an Aston Martin will often buy a Mini Cooper too”

Page 30: Big Data vs Data Warehousing

Free Information

Page 31: Big Data vs Data Warehousing

Need: Lots of CPU Cores!

Page 32: Big Data vs Data Warehousing

Need: Data Centers!

Page 33: Big Data vs Data Warehousing

Provisioning has to be REALLY fast

Page 34: Big Data vs Data Warehousing

• Get good at
  • Statistics (again)
  • Distributed Algorithms
  • Tuning

• Understand Physical Constraints

• Acquire deep domain knowledge

Things to Learn for the Future

Page 35: Big Data vs Data Warehousing

Something is Changing

You

Today | Tomorrow
CAPEX Hardware | OPEX Hardware

Page 36: Big Data vs Data Warehousing

The Mother of All Stovepipes

Page 37: Big Data vs Data Warehousing

Data you are afraid to lose

Big Data / Staging (No Model)

Delivery (Model)

Data You actually need

Page 38: Big Data vs Data Warehousing

Synergy

Create Structure for me

Here is a table

Warehouse

Page 39: Big Data vs Data Warehousing

Applying Social Media to Structure

Page 40: Big Data vs Data Warehousing

Data Warehouse

• There is a model
• Seek Co-location
• Respond in seconds
• Calculate first, query after
• Expensive HW
• Optimise for target HW
• Homogeneous HW
• Pay vendor, expect optimised

Big Data

• Don’t bother modeling!
• Optional Co-Location
• Respond in minutes
• Calculate while querying
• Cheap HW
• Good enough on all HW
• Heterogeneous HW
• Free license, optimise yourself

Summary
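The contrast between "calculate first, query after" and "calculate while querying" can be shown with a toy example. This is only a sketch of the two access patterns, not of any specific product; the event data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (country, number of texts sent).
events = [("DK", 3), ("UK", 5), ("DK", 4), ("SE", 1), ("UK", 2)]

# Data Warehouse style: calculate first, query after.
# The aggregate is built once at load time, so each query is a lookup.
totals = defaultdict(int)
for country, texts in events:
    totals[country] += texts


def query_precomputed(country):
    """Answer from the precomputed aggregate."""
    return totals.get(country, 0)


# Big Data style: calculate while querying.
# Nothing is prepared up front, so each query scans the raw events.
def query_scan(country):
    """Answer by scanning all raw events at query time."""
    return sum(texts for c, texts in events if c == country)


print(query_precomputed("DK"), query_scan("DK"))
```

Both give the same answer; the difference is where the work happens, which lines up with the "respond in seconds" versus "respond in minutes" rows of the summary.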

Page 41: Big Data vs Data Warehousing

Q & A