Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your Most Important Data...


description

Are you storing more data than necessary in your data warehouse, RDBMS, and line-of-business applications? Are you spending a large portion of your budget on Teradata or Netezza, with costs climbing continually as data volumes grow? Are you getting the right ROI for all the data you store in your data warehouses? Read this deck to find out: what it costs to store your critical Big Data assets; which workloads are best suited to data warehouses, which to Hadoop, and why; the advantages of running Hadoop on scale-out NAS; the importance of security and data governance for critical data assets; and how to maintain data warehouse performance even with high growth rates.

Transcript of Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your Most Important Data...

Page 1: Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your Most Important Data cost_webcast_final[1]


Examine the Real Cost of Storing & Analyzing your Big Data

Page 2:

Speakers

John Mallory, CTO - Analytics, EMC Isilon

Jyothi Swaroop, Director Product Marketing & Alliances

Page 3:

Structured vs. Unstructured Data Growth

Total Capacity Shipped, Worldwide

Year: 2013, 2015, 2017

Total capacity: 37 EB, 71 EB, 133 EB

Unstructured data: 67%, 74%, 80%

Source: IDC

Page 4:

Hadoop – “New Gateway Drug to Big Data”

Mature Platform

Adoption Speed-up

Enterprise Solutions

Page 5:

Storage: NAS, SAN, CLOUD, TAPE, DAS, OBJECT

TRADITIONAL WORKLOADS: HPC, FILE SHARES, BACKUP/ARCHIVE

EMERGING WORKLOADS: ANALYTICS, CLOUD APPS, MOBILE

VALUE?

© Copyright 2014 EMC Corporation. All rights reserved.

Page 6:

Cost of Storing Big Data - TCO

Source: Winter Corp Report: Big Data – What Does it Really Cost? 2014

Page 7:

Cost of Storing Big Data – 5 yrs

Source: Winter Corp Report: Big Data – What Does it Really Cost?

Page 8:

Big Data – Cost to Scale vs. Performance

[Chart: query response (Hrs, Mins, Secs) against low cost to scale (TB, 10TB, 200TB, PB), positioning the traditional (row/columnar) data warehouse and Hadoop]

Big Data Volume (50TB - PB)

Fast Data Load & Massive Scale

Fast Query Across Large Scale

Flexible Deployment Options

??

Page 9:

Storage: NAS, SAN, TAPE, OBJECT, CLOUD, DAS

TRADITIONAL WORKLOADS: HPC, FILE SHARES, BACKUP/ARCHIVE

EMERGING WORKLOADS: ANALYTICS, CLOUD APPS, MOBILE

RainStor-Isilon Active Archive

Page 10:

RainStor®

Derive Business Value from Your Historical Data and Meet Regulatory Demands.

The Data Archive

Page 11:

RainStor® - Proven

20 of World’s Largest Communications Providers

15 Strategic Solution & Technology Partners

10 of World’s Biggest Banks & Financial Institutions

Page 12:

EMC Isilon Scale-Out NAS Environment

Clients and Applications

RESTful API: GET, PUT, POST, DELETE

Network: Gig-E, 10 Gig-E

OneFS Operating Environment

Multi-Protocol Client/Application Layer

Ethernet Layer

Protocols: SMB, NFS, FTP, HTTP, HDFS for Hadoop, REST for Object

Intra-cluster Communication

Page 13:

EMC Isilon - Industry Recognition

Isilon Systems is a successful acquisition for EMC

IDC Marketscape names EMC Isilon a Leader in Scale-Out File Storage Market

- Worldwide Scale-Out File-Based Storage, December 2012

- Critical Capabilities for Scale-Out File System Storage, January 2013

EMC Isilon “Outstanding” in Critical Capabilities for Scale-Out File

- Vendor Rating – EMC, May 2014


Page 14:

Solutions

Page 15:

Solutions: Analytical Archive (DW Offload) | Compliance Archive (Tape Avoidance)

Teradata, Netezza, Oracle Ex, Sybase IQ

Data In | Store | Query | Govern

Data In | Store | Query | Govern | Comply: WORM, SEC 17a-4; Dodd Frank

Source: App, EDW, DB, Tape

Page 16:

Analytical Archive: End-to-end

IN | STORE | QUERY | GOVERN

DW Source: Move

LOAD/VALIDATE: Billions Records/Day

COMPRESS: 10-40X (90%+)

SCALE – EMC Isilon

AVAILABILITY: Replication

QUERY/ANALYZE: SQL, BI Tools; Hive, MapReduce

RETAIN / DISPOSE: Rules Based

SECURE - Enterprise-grade

Page 17:

Database Storage - Compression: Up to 40X

[Bar chart, axis 0 to 50: compression ratios versus raw data for Hadoop LZO, Compressed Relational (e.g. Oracle), Flatfile Gzip, Columnar (e.g. Vertica), and RainStor; bar labels 3X, 6X, 7X, 8X, and 40X]

Source: Ratios vs. Raw – RainStor Benchmarks using customer data (2012-13)

Page 18:

Simplicity and Ease of Use

Single volume and file system that spans nodes

– Directories and files striped across the cluster

Automation:

– NO manual intervention

– NO reconfiguration

– NO server or client mount point or application changes

– NO data migrations

– NO RAID

EFFICIENCY

Page 19:

More scalable than traditional storage systems

Largest and Most Scalable File System

OneFS scales from 18 TB to 20 PB in a single file system, single volume

Under 1 min to scale with no downtime

Page 20:

Query - Pick the Best Tool for the Job

Document Query: XQUERY

BI Analytics, Ad-Hoc Query: Interactive SQL-92, SQL 2013; BI TOOLS, DASHBOARD

Hadoop Tools: MAPREDUCE, PIG, HIVE

Hadoop on Scale-out NAS

Page 21:

Hadoop & Big Data

LOW VALUE DATA

Recommendation Engines, Data Sandboxing, Log Processing

Social Media, Logs, Clickstreams

HIGH VALUE DATA

Audits, Regulatory Reporting (e.g. SEC, SOX), Lawful Intercept

Credit Card, Trade, Personal Information

SECURITY?

Page 22:

Security Capabilities & Features

Secure Large Volumes of Data on Hadoop

Privacy: Data Encryption, Data Masking, Views

Trust: Kerberos Authentication, Authorization, LDAP / Active Directory, Linux PAM Support

Integrity: Tamper-proofing, Audit Trail, Record-level Delete, Data Disposition

Page 23:

RainStor-Isilon Architecture Overview

Legend: Apache Projects, RainStor, Vendor Specific

Top of Stack (Visualization Layer): BI Tools, Dashboards (ODBC/JDBC Connectivity)

Programming Languages: Standard SQL (with Oracle, SQLServer, SybaseIQ extensions); Hive, Pig, Java

Computation: MapReduce – Batch (Distributed Programming Framework)

Security: Security and Compliance (Encryption, Masking, Audit Trail, Data Disposition, Kerberos, LDAP/Active Directory, Immutable)

Database Storage: RainStor Database (up to 40X Data Compression); HDFS (Hadoop Distributed File System)

Object/Hardware Storage: NAS, SAN, CAS, NFS (On-premise, Cloud); EMC Isilon

Page 24:

RainStor: Hadoop 2.0 Distro Certifications

Cloudera CDH 5.0 – Certified April 2014

Hortonworks HDP 2.1 – April 2014

“We are delighted with the wide range of technology solution partners that have certified on CDH 5 …it is testament to the maturity of the platform but also the overall market demand,”

Tim Stevens, VP of Business & Corporate Development

Page 25:

Solution: Compliance Archive

Page 26:

SEC 17a-4(f) Compliance Archive Requirements

Records stored in non-erasable media (WORM)

Recording process must be verifiable

Fully Accessible to Authorities & Backed-up

Records should be Recognizable & Identifiable

Downloadable to any acceptable medium

Page 27:

Case Studies

Page 28:

Compliance Archiving: Global Investment Bank

Lower Compliant Data Retention Costs by a Factor of 10

CONFIDENTIAL

“It’s like shrink-wrapping your data…forever!” – VP, Technology

Challenges

- Cost: Data volumes in disparate trading applications growing at 70-100% / Year - Storage costs rising @ 60% / Year

- Compliance: Must provide high performance EBS and other queries for SEC

Solution

- A RainStor Archive for storing and reporting against historical trade data

- 13 years of history loaded from Sybase IQ

- Daily feed from trading application to RainStor

- Runs on low-cost NAS Tier 3 storage and VMs

- RainStor completely replaced Sybase IQ

90% cost savings - $5MM ROI

6 Projects live - 13 more in Progress

BENEFITS

- 90% Storage Cost Reduction

- 30X Data Compression

- 3X Faster Query Compared to Sybase

- Enterprise Standard for Data Retention with Faster Analytics
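The growth rates cited in the challenges for this case study compound year over year. As a rough illustration only (these are not figures from the deck), the sketch below projects a starting volume forward at the quoted 70% and 100% annual rates. The 100 TB baseline, the five-year horizon, and the function name are all assumptions chosen for the example.

```python
def project_volume(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume by a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


START_TB = 100.0  # hypothetical baseline, not taken from the deck
YEARS = 5         # assumed planning horizon

# 0.70 and 1.00 are the 70% and 100% per-year growth rates quoted on the slide
for rate in (0.70, 1.00):
    volume = project_volume(START_TB, rate, YEARS)
    print(f"{rate:.0%} growth/year: {START_TB:.0f} TB -> {volume:,.0f} TB after {YEARS} years")
```

At 100% annual growth the volume doubles every year, which is a 32-fold increase over five years; this is why storage cost per terabyte dominates the retention economics.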

Page 29:

Analytical Archiving: Large Multi-national Bank

Retain Trading Data, Stay Compliant at Lowest Cost

Challenges

- Cost: Fast data growth and costly EDWs (Teradata & Netezza) - offload history

- Compliance: Must meet SEC compliance and retain equities data for query - run on approved WORM / CAS Storage (EMC)

- Avoid data on offline tape - reinstate older Teradata data (BAR) and stay compliant.

Solution

- 43 Equities apps (Oracle; SQL Server) offload history to RainStor

- History offload from Netezza - run on WORM

- Re-instate Tape and bring online for audits.

[Diagram labels: 43 Apps; Equities; Trades 200TB; BAR 400TB; FastConnect™; FastForward™; RainStor Active Archive; EMC WORM Storage]

BENEFITS

- 25X Compression

- Meets Query SLAs

- Enterprise Standard for Compliance Driven Analysis

- Runs on EMC Centera & Isilon (WORM)

- Tape Avoidance

Page 30:

RainStor + Isilon + Hadoop – TCO

Compression rate: 32X (>96% cost savings)

Utilization Rate: >80%

Scalability: Up to 20 PB per cluster

Query Performance: >= Hadoop on DAS

RainStor + Hadoop + Isilon = Lowest 5yr TCO!
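The compression figures quoted in this deck relate to storage savings by simple arithmetic: an N-to-1 ratio leaves 1/N of the raw footprint, so the space saved is 1 - 1/N. The sketch below is illustrative only. The function name is an assumption, and it treats the saving purely as a function of the compression ratio, ignoring hardware, licensing, and replication overheads that a full TCO model would include.

```python
def space_saved(compression_ratio: float) -> float:
    """Fraction of raw storage eliminated by an N:1 compression ratio."""
    if compression_ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return 1.0 - 1.0 / compression_ratio


# Ratios quoted in the deck: 10X and 40X (range), 32X (TCO slide)
for ratio in (10, 32, 40):
    print(f"{ratio}X compression -> {space_saved(ratio):.2%} less raw capacity")
```

For the 32X rate this gives a little under 97%, consistent with the ">96%" figure on the slide, and 10X corresponds to the 90% floor quoted for the 10-40X range.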

Page 31:

Why RainStor-Isilon?

Flexible Architecture – Hadoop, Cloud

Extract EDW data for Active Archiving

Lower Storage Costs by at least 90%

Gain Deeper Insights – SQL, Hive, Pig, Search, BI tools

Reliable – High Availability, Disaster Recovery

Purpose-built Security and Compliance features

First SQL Compatible, Enterprise-grade Database (native to Hadoop) to run on Isilon Scale-out NAS.

Page 32:

Thank You [email protected]

The ActiveArchive

Page 33:

Where Big Data & Archive Come Together

Network EDWApps TapePlatforms

RainStor – EMC Isilon Solution

Page 34:

RainStor for Teradata Solution - 3 Components

FastForward™

Reinstates from Offline Tape Archives

Handles V2R4, V2R5, V2R6, TD12, TD13

Eliminate Tape.

FastConnect™

Offload history to Active Archive on continuous basis.

Run on Hadoop for Low Cost Scale.

RainStor Core Database:

• Highly Efficient Data Store - 20-40X Compression.


Page 37:

Next Steps

Contact RainStor to find out more about the joint solution: [email protected]

Contact EMC to find out more: