Oracle, Hadoop, and the Big Data Revolution

Post on 15-Jan-2022


Guy Harrison, Executive Director, R&D Information Management

Introductions

Web: guyharrison.net Email: guy.harrison@software.dell.com Twitter: @guyharrison

But Seriously

Oracle OpenWorld 2013

What is Big Data?

Three or Four “V”s

Volume: Terabytes, Petabytes, Exabytes, Zettabytes

Variety: Structured, Unstructured, Human Generated, Machine Generated

Velocity: User populations x Transaction rates x Machine data

Value: Competitive or Collective advantage

Data volumes have always been increasing….

2006 Perspective

Though the absolute volumes are boggling…

[Chart: data volumes in bytes on a log scale from 1.E+09 to 1.E+21 (Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte). Bars: Human Brain, Google, Living Human Genomes, Digital information 2008, Total Digital capacity, Digital information created 2011. Data labels: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21.]

Velocity

Variety

– or the industrial Revolution of data

Data: now and then

1993

• Generated internally

• Key to operational efficiency

2013

• Generated externally

• Key to competitiveness

• Source of product innovation

• Changing our world

Big Data is the culmination of cloud, social and mobile

Big Data can be deadly

Will Big Data kill retail?

Prevalence of Showrooming

[Bar chart: prevalence of showrooming in Pct (axis 0 to 70) for Consumer Electronics and Home Improvement.]

Source: Gartner Research G00249458, “Survey Analysis: Focus on Customer Basics to Challenge Amazon, as 'Showrooming' Is Universal but Not Unbeatable”, published 12 February 2013

Why showrooming?

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences?

Some novel defenses

Web analytics for retail

First mover advantage

• The first vendor to offer you a product at a good price has the advantage

• It is totally insufficient to lay a bunch of products on a table in a building

• Only big data analytics can provide this first mover advantage

There’s a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

Security

Finance

Government

Science

Healthcare

Insurance

Telecom

Advertising

The Revolution is not over yet

Willy Bowman

Nationality: German

Don’t Mention the WAR!

Buying choices:

Amazon softcover: $45.99

Oracle Performance Survival Guide

Amazon Kindle: $39.99

Say “screw you bookseller” to buy kindle version

Brain Control

Muze

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS

• Storage

• Pulse, temp monitor

• Silent alarms

• Pedometer, sleep monitoring

• Compass

• Camera

• Mike/earphones

• Heads up display

• Emotion/Attention monitor

The instrumented world

All of which accelerates what we call Big Data

Big Database technologies

Pioneers of Big Data

Google Software Architecture

[Diagram: Google Applications; Map Reduce and BigTable; Google File System (GFS).]

Map Reduce

[Diagram: a Start node, many parallel Map tasks, and a Reduce step.]

Multi-stage Map-Reduce

[Diagram: Client, HDFS, several stages of MAPPER tasks (SCAN, SORT, AGGREGATE), and a final REDUCE.]
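The multi-stage idea can be illustrated with a small in-memory sketch. This is not Hadoop code: `run_job`, the mapper and reducer names, and the sample lines are all hypothetical, and a real cluster would spread the map and reduce tasks across many nodes. It only shows how the output of one map-reduce job can become the input of the next.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one map-reduce job in memory: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: emit (key, value) pairs
            groups[key].append(value)       # shuffle: group values by key
    # reduce: one call per key, in sorted key order
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Stage 1: count occurrences of each word.
def count_mapper(line):
    return [(word.lower(), 1) for word in line.split()]

def count_reducer(word, ones):
    return (word, sum(ones))

# Stage 2: group words by how often they occurred.
def freq_mapper(word_count):
    word, count = word_count
    return [(count, word)]

def freq_reducer(count, words):
    return (count, sorted(words))

lines = ["big data big", "data velocity"]        # hypothetical input
word_counts = run_job(lines, count_mapper, count_reducer)
by_frequency = run_job(word_counts, freq_mapper, freq_reducer)
print(word_counts)
print(by_frequency)
```

Each stage is a complete job, so the second stage cannot start until the first has finished and its output is available.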

Schema on Read vs Schema on Write

[Diagram, Schema on Write: Data; Extract, Transform, Load (Cleanse, Normalize, Aggregate; Code); Data Warehouse; Analyse; Utilize.]

[Diagram, Schema on Read: Data; Load; Hadoop; Cleanse, Analyse (Code); Utilize.]
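A rough way to see the difference between the two approaches is sketched below. The record layout, the field names and the function names are assumptions made up for illustration. Schema on write validates and types every record at load time; schema on read stores the raw lines and applies only as much structure as a given query needs.

```python
import json

# Hypothetical raw input: one JSON document per line, not all well formed.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',   # no "hits" field
    'not json at all',                      # malformed record
]

def load_schema_on_write(lines):
    """Schema on write: parse and validate at load time; reject bad rows."""
    table, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append({"user": rec["user"], "site": rec["site"],
                          "hits": int(rec["hits"])})
        except (ValueError, KeyError):
            rejected.append(line)
    return table, rejected

def load_schema_on_read(lines):
    """Schema on read: keep the raw lines untouched at load time."""
    return list(lines)

def query_sites(raw_store):
    """Apply just enough structure at query time to answer one question."""
    sites = []
    for line in raw_store:
        try:
            sites.append(json.loads(line)["site"])
        except (ValueError, KeyError):
            continue  # skip records this query cannot interpret
    return sites

table, rejected = load_schema_on_write(RAW_EVENTS)
print(len(table), "rows loaded,", len(rejected), "rejected")
print(query_sites(load_schema_on_read(RAW_EVENTS)))
```

In this sketch the record without a "hits" field is lost to the warehouse-style load but is still usable by the query that only needs the site name.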

Hadoop: Open Source Map-Reduce Stack

Hadoop at Yahoo

Yahoo! Hadoop cluster:

• 4000 nodes

• 16 PB disk

• 64 TB of RAM

• 32,000 cores
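Dividing the cluster totals by the node count gives a feel for how modest each machine is. The short calculation below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000

# Assumption: decimal units and resources spread evenly over all nodes.
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(f"{disk_tb_per_node:g} TB disk, {ram_gb_per_node:g} GB RAM, "
      f"{cores_per_node:g} cores per node")
```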

Hadoop File System (HDFS)

Map Reduce / YARN

HBase (Database)

ZooKeeper (Locking)

SQOOP (RDBMS loader)

Hive (Query)

Pig (Scripting)

Flume (Log Loader)

Oozie (Workflow manager)

Hadoop 1.0 Architecture

[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with a JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with a NAME NODE and a SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER.]

Hadoop 2.0 YARN*

*Yet Another Resource Negotiator

[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; several NODE MANAGERs, each with a CONTAINER; an APPLICATION MASTER.]

Tez¹

¹ Hindi for “fast”

[Diagram: chains of MAP and REDUCE stages over HDFS, shown as three separate jobs (Job 1, Job 2, Job 3) and as a single job (Job 1).]

HBase: a real-time database built on Hadoop

[Diagram: Oracle and HBase storage architectures side by side.]

Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks

HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks

HBase Data Model

Relational, denormalized:

Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

Relational, normalized:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

HBase, one row per name with one column per site:

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   ...              675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   ...              856,767
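The wide-row layout can be mimicked with nested dictionaries. The sketch below is plain Python, not the HBase API, and `to_wide_rows` is a hypothetical helper. It pivots the (Name, Site, Counter) rows from the slide into one entry per name, with a column only for the sites that name actually has.

```python
# (Name, Site, Counter) rows as shown on the slide.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot relational rows into one sparse, wide row per name."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter  # column created on demand
    return wide

wide = to_wide_rows(rows)
for name, columns in wide.items():
    print(name, columns)
```

Jane's row simply has no Ebay column; nothing is stored for a site a name never visited, which is what makes very wide, sparse rows practical.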

Hive

[Diagram labels: SQL, JAVA, RESULTS.]

Other SQL-like Hadoop Interfaces

• Cloudera Impala

• MapR Drill

• Aster

• Greenplum (Pivotal HD)

• Paraccel

• Hadapt

• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

Pig

Pig Latin

SQL or Hive QL

Flume and SQOOP

[Diagram: FLUME moves WebLogs into HDFS; SQOOP moves RDBMS tables (CUSTOMERS, PRODUCTS) between the RDBMS and HDFS.]

Oracle Exadata

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

Economies

[Bar chart: Exadata vs Hadoop, $/TB (hardware only), axis $0 to $6,000. Bars: Exadata, Hadoop. Data labels: $4,911, $750.]

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers

– 48 GB RAM per node (864 GB total)

– 2 x 6-core CPUs per node (216 cores total)

– 12 x 2 TB HDD per node (216 spindles, 432 TB)

– 40 Gb/s InfiniBand between nodes

– 10 Gb/s Ethernet to datacentre

• Competitive Pricing

www.oracle.com/us/bigdata/index.html
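The rack totals follow directly from the per-node figures, as the short check below shows. It uses only the numbers quoted for the appliance; the variable names are arbitrary. Note that 216 drives of 2 TB each come to 432 TB of raw disk.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6     # two 6-core CPUs
DISKS_PER_NODE = 12
DISK_TB = 2

total_ram_gb = NODES * RAM_GB_PER_NODE
total_cores = NODES * CORES_PER_NODE
total_spindles = NODES * DISKS_PER_NODE
total_raw_tb = total_spindles * DISK_TB   # raw capacity, before replication

print(total_ram_gb, "GB RAM,", total_cores, "cores,",
      total_spindles, "spindles,", total_raw_tb, "TB raw disk")
```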

Big Data Appliance Software

• Cloudera Enterprise

• Oracle Enterprise R

• Oracle NoSQL

• Oracle Big Data Connectors

Generating competitive advantage through “Big Data analytics”

Machine Learning: programs that evolve with “experience”

Collective Intelligence: programs that use inputs from “crowds” to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics, AKA Data Science

Collective Intelligence

Google Flu Trends

Collective Intelligence outsmarts Artificial Intelligence?

Artificial Intelligence Strikes back

Watson is big data AI

Predictive Analytics

[Scatter plot with fitted trend line y = 0.9715x + 0.7191; x axis 0 to 120, y axis -20 to 120.]
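A trend line of this kind is typically obtained by ordinary least squares, and prediction is then just evaluating the line beyond the observed range. The sketch below assumes that is how the slide's line was produced. `fit_line`, `predict` and the four sample points are illustrative and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))   # → (2.0, 1.0)

# Coefficients quoted on the slide.
SLOPE, INTERCEPT = 0.9715, 0.7191

def predict(x):
    """Extrapolate with the slide's trend line."""
    return SLOPE * x + INTERCEPT

print(predict(150))   # a point beyond the charted x range of 0 to 120
```

Extrapolating past the observed range assumes the linear relationship continues to hold, which is exactly the risk in any predictive model.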

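Supervised Machine Learning

[Diagram: Raw Data is cleaned (Clean) and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model; Validate Model uses the Validation Set and leads to a Production Model; New Data fed to the Production Model yields a Prediction. Other labels: Existing Business, New Business.]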
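The train, validate and predict loop can be sketched in a few lines. Everything here is an assumption for illustration: the single numeric feature, the "stay" and "churn" labels, the 75/25 split, and the nearest-centroid model, which stands in for whatever algorithm a real project would choose.

```python
import random

def split(records, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned records and split into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(training_set):
    """Candidate model: the mean feature value per label (nearest centroid)."""
    sums, counts = {}, {}
    for x, label in training_set:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def accuracy(model, validation_set):
    """Fraction of validation records the model labels correctly."""
    hits = sum(predict(model, x) == label for x, label in validation_set)
    return hits / len(validation_set)

# Hypothetical, well-separated data: (feature value, known outcome).
records = ([(float(x), "stay") for x in range(1, 11)] +
           [(float(x), "churn") for x in range(101, 111)])

training_set, validation_set = split(records)
model = train(training_set)
print("validation accuracy:", accuracy(model, validation_set))
print("prediction for new data 105.0:", predict(model, 105.0))
```

The validation set is never shown to `train`, so its accuracy gives an honest estimate of how the model will behave on new data before it is promoted to production.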

Inmaps.linkedin.com

Unsupervised learning

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Security

• Vulnerability

• Penetration Detection

Fraud Detection

CRM

• Churn

• Defaults

Medical

• Risk analysis

• Diagnosis

• Prognosis

Game optimization

Advertising

• Targeting

• Tailoring

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout, are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue?

Data Scientists to the rescue?

Kitenga Analytics Suite

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

SharePlex® for Hadoop

[Diagram: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; outputs: batched HDFS file copy (audit / change data) and HBase real-time replication.]

Toad BI Suite

Recommendations For your business

How could data and algorithms transform your business?

What are the technologies that will be most important?

• Mobility

• Cloud

• Hadoop

• Big Data Analytics

Where is the data?

• Start collecting now!

Recommendations For your career

Hadoop and NoSQL create strong career opportunities for DBAs and developers

• Demand will exceed supply for the foreseeable future

Lots of opportunities for those with Math & Statistics

• Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R?)

Easy to get started with Hadoop

• SQOOP

• Hive

• Pig

Dell Toad Party!

Dell/TOAD Party, Tue 6:30-9:30p, Tonga Room, Fairmont Hotel

Web: guyharrison.net Email: guy.harrison@software.dell.com

Twitter: @guyharrison

www.toadworld.com