Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your Most Important Data...
Examine the Real Cost of Storing & Analyzing your Big Data
Speakers
John Mallory, CTO - Analytics, EMC Isilon
Jyothi Swaroop, Director, Product Marketing & Alliances
Structured vs. Unstructured Data Growth
Year: 2013 | 2015 | 2017
Total Capacity Shipped, Worldwide: 37 EB | 71 EB | 133 EB
Unstructured Data: 67% | 74% | 80%
Source: IDC
Hadoop – “New Gateway Drug to Big Data”
Mature Platform | Adoption Speed-up | Enterprise Solutions
Storage: NAS, SAN, CLOUD, TAPE, DAS, OBJECT
TRADITIONAL WORKLOADS: HPC, FILE SHARES, BACKUP/ARCHIVE
EMERGING WORKLOADS: ANALYTICS, CLOUD APPS, MOBILE
© Copyright 2014 EMC Corporation. All rights reserved.
VALUE?
Cost of Storing Big Data - TCO
Source: Winter Corp Report: Big Data – What Does it Really Cost? 2014
Cost of Storing Big Data – 5 yrs
Source: Winter Corp Report: Big Data – What Does it Really Cost?
Big Data – Cost to Scale vs. Performance
[Chart: Query Response (Hrs, Mins, Secs) plotted against Low Cost to Scale (TB, 10TB, 200TB, PB). Regions shown: Traditional (Row/Columnar) Data Warehouse; Hadoop; ??]
Big Data Volume (50TB - PB)
Fast Data Load & Massive Scale
Fast Query Across Large Scale
Flexible Deployment Options
Storage: NAS, SAN, TAPE, OBJECT, CLOUD, DAS
TRADITIONAL WORKLOADS: HPC, FILE SHARES, BACKUP/ARCHIVE
EMERGING WORKLOADS: ANALYTICS, CLOUD APPS, MOBILE
RainStor-Isilon Active Archive
RainStor®
Derive Business Value from Your Historical Data and Meet Regulatory Demands.
The Data Archive
RainStor® - Proven
20 of World’s Largest Communications Providers
15 Strategic Solution & Technology Partners
10 of World’s Biggest Banks & Financial Institutions
EMC Isilon Scale-Out NAS Environment
Client/Application Layer: Clients and Applications (Multi-Protocol)
Ethernet Layer: Gig-E / 10 Gig-E Network
Protocols: SMB, NFS, FTP, HTTP, HDFS for Hadoop, REST for Object
RESTful API: GET, PUT, POST, DELETE
OneFS Operating Environment
Intra-cluster Communication
EMC Isilon - Industry Recognition
IDC MarketScape names EMC Isilon a Leader in Scale-Out File Storage Market
- Worldwide Scale-Out File-Based Storage, December 2012
EMC Isilon “Outstanding” in Critical Capabilities for Scale-Out File
- Critical Capabilities for Scale-Out File System Storage, January 2013
Isilon Systems is a successful acquisition for EMC
- Vendor Rating – EMC, May 2014
Solutions
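Solutions: Analytical Archive (DW Offload) | Compliance Archive (Tape Avoidance)
Analytical Archive (DW Offload)
- Sources: Teradata, Netezza, Oracle Ex, Sybase IQ
- Stages: Data In, Store, Query, Govern
Compliance Archive (Tape Avoidance)
- Sources: Source App, EDW, DB, Tape
- Stages: Data In, Store, Query, Govern, Comply (WORM; SEC 17a-4; Dodd Frank)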
Analytical Archive: End-to-end
DW Source (Move)
IN: LOAD/VALIDATE - Billions of Records/Day
STORE: COMPRESS - 10-40X (90%+)
QUERY: QUERY/ANALYZE - SQL, BI Tools; Hive, MapReduce
GOVERN: RETAIN/DISPOSE - Rules Based
SCALE – EMC Isilon
AVAILABILITY - Replication
SECURE - Enterprise-grade
Database Storage - Compression: Up to 40X
[Bar chart: compression ratio vs. raw data for Hadoop LZO, Compressed Relational (e.g. Oracle), Flatfile Gzip, Columnar (e.g. Vertica), and RainStor. Bar labels: 3X, 6X, 7X, 8X, and 40X (RainStor). Axis: 0 to 50.]
Source: Ratios vs. Raw – RainStor Benchmarks using customer data (2012-13)
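The compression figures quoted in this deck (10-40X, "90%+") are related by simple arithmetic: a ratio of N:1 leaves 1/N of the raw volume, so the reduction is 1 - 1/N. A minimal Python sketch of that relationship follows; the function name `reduction_pct` is illustrative and not part of any RainStor tooling.

```python
def reduction_pct(ratio: float) -> float:
    """Percent of raw volume eliminated at a given compression ratio (40 means 40X)."""
    if ratio < 1:
        raise ValueError("compression ratio must be >= 1")
    return (1 - 1 / ratio) * 100


# Ratios taken from the slides: 3X (lowest bar), 10X and 40X (RainStor range).
for ratio in (3, 10, 40):
    print(f"{ratio}X -> {reduction_pct(ratio):.1f}% smaller than raw")
```

Under this arithmetic, 10X corresponds to a 90% reduction and 40X to 97.5%. Reading that as a cost saving additionally assumes storage cost scales linearly with stored volume.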
Simplicity and Ease of Use
Single volume and file system that spans nodes
- Directories and files striped across the cluster
Automation:
- NO manual intervention
- NO reconfiguration
- NO server or client mount point or application changes
- NO data migrations
- NO RAID
EFFICIENCY
More scalable than traditional storage systems
Largest and Most Scalable File System
OneFS scales from 18 TB to 20 PB in a single file system, single volume
Under 1 min to scale with no downtime
Query - Pick the Best Tool for the Job
Document Query: XQUERY
BI Analytics | Ad-Hoc Query
Interactive SQL-92, SQL 2013
BI TOOLS, DASHBOARD
Hadoop Tools: MAPREDUCE, PIG, HIVE
Hadoop on Scale-out NAS
Hadoop & Big Data
LOW VALUE DATA
- Social Media, Logs, Clickstreams
- Recommendation Engines, Data Sandboxing, Log Processing
HIGH VALUE DATA
- Credit Card, Trade, Personal Information
- Audits, Regulatory Reporting (e.g. SEC, SOX), Lawful Intercept
SECURITY?
Security Capabilities & Features
Secure Large Volumes of Data on Hadoop
Privacy: Data Encryption, Data Masking, Views
Trust: Kerberos Authentication, Authorization, LDAP / Active Directory, Linux PAM Support
Integrity: Tamper-proofing, Audit Trail, Record-level Delete, Data Disposition
RainStor-Isilon Architecture Overview
Legend: Apache Projects | RainStor | Vendor Specific
Top of Stack (Visualization Layer): BI Tools, Dashboards (ODBC/JDBC Connectivity)
Programming Languages: Standard SQL (with Oracle, SQL Server, Sybase IQ extensions); Hive, Pig, Java
Computation: MapReduce – Batch (Distributed Programming Framework)
Security: Security and Compliance (Encryption, Masking, Audit Trail, Data Disposition, Kerberos, LDAP/Active Directory, Immutable)
Database Storage: RainStor Database (up to 40X Data Compression)
Object/Hardware Storage: HDFS (Hadoop Distributed File System); NAS, SAN, CAS, NFS (On-premise, Cloud); EMC Isilon
RainStor: Hadoop 2.0 Distro Certifications
Cloudera CDH 5.0 – Certified April 2014
Hortonworks HDP 2.1 – April 2014
“We are delighted with the wide range of technology solution partners that have certified on CDH 5 … it is testament to the maturity of the platform but also the overall market demand,”
Tim Stevens, VP of Business & Corporate Development
Solution: Compliance Archive
SEC 17a-4(f) Compliance Archive Requirements
Records stored in non-erasable media (WORM)
Recording process must be verifiable
Fully Accessible to Authorities & Backed-up
Records should be Recognizable & Identifiable
Downloadable to any acceptable medium
Case Studies
Compliance Archiving: Global Investment Bank
Lower Compliant Data Retention Costs by a Factor of 10
Challenges
- Cost: Data volumes in disparate trading applications growing at 70-100% per year; storage costs rising at 60% per year
- Compliance: Must provide high-performance EBS and other queries for the SEC
Solution
- A RainStor Archive for storing and reporting against historical trade data
- 13 years of history loaded from Sybase IQ
- Daily feed from trading application to RainStor
- Runs on low-cost NAS Tier 3 storage and VMs
- RainStor completely replaced Sybase IQ
BENEFITS
- Enterprise Standard for Data Retention with Faster Analytics
- 90% cost savings - $5MM ROI
- 6 projects live - 13 more in progress
- 90% Storage Cost Reduction
- 30X Data Compression
- 3X Faster Query Compared to Sybase
“It’s like shrink-wrapping your data…forever!” – VP, Technology
CONFIDENTIAL
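The growth rates quoted in this case study compound quickly. As a rough illustration, the sketch below projects the relative data volume and storage cost after five years at the quoted annual rates. The five-year horizon is an assumption borrowed from the deck's TCO slides, the rates are assumed constant, and `compound` is a hypothetical helper name.

```python
def compound(rate: float, years: int) -> float:
    """Growth multiple after `years` at a constant annual `rate` (0.7 means 70%)."""
    return (1 + rate) ** years


YEARS = 5  # assumption: same horizon as the 5-year TCO figures in the deck

for label, rate in [
    ("data volume at 70%/yr", 0.70),
    ("data volume at 100%/yr", 1.00),
    ("storage cost at 60%/yr", 0.60),
]:
    print(f"{label}: {compound(rate, YEARS):.1f}x after {YEARS} years")
```

At these rates, volume grows roughly 14x to 32x over five years and storage cost roughly 10x, which is the pressure the archive is meant to relieve.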
Analytical Archiving: Large Multi-national Bank
Retain Trading Data, Stay Compliant at Lowest Cost
[Diagram: Equities, Trades (200TB), and BAR (400TB) sources; FastConnect™ and FastForward™; RainStor Active Archive; EMC WORM Storage]
Challenges
- Cost: Fast data growth and costly EDWs (Teradata & Netezza); offload history
- Compliance: Must meet SEC compliance and retain equities data for query; run on approved WORM / CAS storage (EMC)
- Avoid data on offline tape; reinstate older Teradata data (BAR) and stay compliant
Solution
- 43 Equities apps (Oracle; SQL Server) offload history to RainStor
- History offload from Netezza; run on WORM
- Reinstate tape and bring online for audits
BENEFITS
- Enterprise Standard for Compliance Driven Analysis
- 43 Apps
- 25X Compression
- Meets Query SLAs
- Runs on EMC Centera & Isilon (WORM)
- Tape Avoidance
RainStor + Isilon + Hadoop – TCO
Compression rate: 32X (>96% cost savings)
Utilization rate: >80%
Scalability: Up to 20 PB per cluster
Query performance: >= Hadoop on DAS
RainStor + Hadoop + Isilon = Lowest 5yr TCO!
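To see how the compression and utilization figures on this slide might combine, here is a rough sizing sketch. It assumes (this is an assumption, not a formula from the slides) that provisioned capacity equals raw data volume divided by the compression ratio and then by the utilization rate. The function name and the 1,000 TB input are purely illustrative.

```python
def provisioned_tb(raw_tb: float, compression: float = 32.0, utilization: float = 0.80) -> float:
    """Estimate capacity to provision for `raw_tb` of uncompressed data.

    Assumption: stored size = raw / compression, and only the `utilization`
    fraction of provisioned capacity is usable for data.
    """
    if compression <= 0 or not 0 < utilization <= 1:
        raise ValueError("compression must be > 0 and utilization in (0, 1]")
    return raw_tb / compression / utilization


raw = 1000.0  # hypothetical raw data volume in TB
print(f"{raw:.0f} TB raw -> {provisioned_tb(raw):.1f} TB provisioned")
```

With the slide's 32X and 80% figures, 1,000 TB of raw data would need about 39 TB of provisioned capacity under this simplified model. A real TCO comparison would also need hardware, licensing, and operating costs, which the slides attribute to the Winter Corp report.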
Why RainStor-Isilon?
Flexible Architecture – Hadoop, Cloud
Extract EDW data for Active Archiving
Lower Storage Costs by at least 90%
Gain Deeper Insights – SQL, Hive, Pig, Search, BI tools
Reliable – High Availability, Disaster Recovery
Purpose-built Security and Compliance features
First SQL Compatible, Enterprise-grade Database (native to Hadoop) to run on Isilon Scale-out NAS.
Where Big Data & Archive Come Together
Network | EDW | Apps | Tape | Platforms
RainStor – EMC Isilon Solution
RainStor for Teradata Solution - 3 Components
FastForward™
- Reinstates from Offline Tape Archives
- Handles V2R4, V2R5, V2R6, TD12, TD13
- Eliminate Tape.
FastConnect™
- Offload history to Active Archive on continuous basis.
- Run on Hadoop for Low Cost Scale.
RainStor Core Database
- Highly Efficient Data Store - 20-40X Compression.
Next Steps
Contact RainStor to find out more about the joint solution: [email protected]
Contact EMC to find out more: