EMC Big Data | Hadoop Starter Kit | EMC Forum 2014

Delivering Hadoop-as-a-Service To Your Organization

© Copyright 2014 EMC Corporation. All rights reserved.

Why Hadoop?

• Oil Exploration

• Medical Imaging

• Video Surveillance

• Mobile Sensors

• Smart Grids

• Social Media

• Internet of Things

• Dark Data

A Fast, Cheap Way To Exploit Massive Amounts of Data From New Sources

Why Hadoop?

Improve Company Performance

Increase Revenue

Increase Demand

Increase Spend Efficiency

Ad Optimization

Hyper Targeting

Campaign Optimization

Ad Effectiveness Analytics

Market Mix Modeling

Coupon Redemption

Increase Customer Acquisition

Purchase Funnel Analysis

Increase Customer Engagement

Customer Segmentation

Churn Prevention

Customer Lifetime Value

Increase Basket Size

Affinity Analytics

Next Best Offer

Cross-Sell / Upsell

Manage Demand

Demand Analysis

Price Optimization

Build Brand Equity

Increase Reach

Digital Marketing

Social Media

Improve Customer Loyalty

Social Graph / Influencers

Loyalty Program Analytics

Customer Satisfaction

Customer Care Analytics

Reduce Costs

Click Fraud

Transaction Anomaly Detection

Production Cost / Efficiency

Supply / Demand Forecasting

General and Administrative

Workforce Analytics

Employee Churn

IT / Security Analytics

Save Money Or Make Money

Hadoop Overview

Hadoop is an open-source framework from Apache for parallel batch processing of very large data sets.

MapReduce is the Hadoop processing model that divides a workload into tasks so that multiple nodes can process it in parallel.

HDFS is the file system for the data. It provides data protection and data locality by storing multiple copies of each block (usually three).
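To make the map and reduce steps concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and the "chunks" stand in for the input splits that a real cluster would hand to different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # On a cluster each chunk could be mapped on a different node;
    # in this sketch the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data locality"]))
```

The point of the split is that `map_phase` needs only its own chunk, so the work can be spread across nodes that hold the data locally.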

IT Challenges With Hadoop

• Deployment is time consuming and complex, which creates shadow IT

• Bare metal capacity utilization is low

• Multiple Hadoop distribution deployments create data silos

Typical Enterprise Deployment

• Multiple, siloed clusters to manage

• Redundant common data in separate clusters

• Peak compute and I/O resources are limited to the number of nodes in each independent cluster

[Diagram: Dept A (recommendation engine) and Dept B (ad targeting) each run separate Production, Test, and Experimentation clusters. Data sources: log files, social data, historical customer behavior.]

What If You Consolidate & Virtualize?

One physical platform to support multiple virtual big data clusters

[Diagram: the separate Production, Test, and Experimentation clusters are replaced by virtual clusters on one platform: Production recommendation engine, Production ad targeting, Test/Dev (recommendation engine, ad targeting), and Experimentation.]
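A rough way to see what consolidation changes is to compare the two layouts with simple arithmetic. The sketch below is illustrative only: the cluster names, node counts, and data size are hypothetical, and it assumes that on a shared platform a single job could, at best, be given every node, which is an upper bound rather than a promise.

```python
def siloed_view(clusters, common_data_tb):
    """Every silo holds its own copy of the common data and can only use its own nodes."""
    return {
        "common_data_tb": common_data_tb * len(clusters),
        "peak_nodes": max(clusters.values()),
    }


def consolidated_view(clusters, common_data_tb):
    """One shared platform: a single copy of the data and one pool of nodes."""
    return {
        "common_data_tb": common_data_tb,
        "peak_nodes": sum(clusters.values()),  # upper bound, see assumptions above
    }


if __name__ == "__main__":
    # Hypothetical node counts per cluster.
    clusters = {"production": 20, "test": 10, "experimentation": 5}
    print("siloed:      ", siloed_view(clusters, common_data_tb=50))
    print("consolidated:", consolidated_view(clusters, common_data_tb=50))
```

With these made-up numbers the siloed layout stores the common data three times and caps any one job at 20 nodes, while the shared layout stores it once and has 35 nodes in the pool.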

EMC Hadoop Starter Kit

• Support for major Hadoop distributions

• Quickly deploy, manage, and scale Hadoop clusters

• GUI simplifies management tasks

• Elastic scaling optimizes cluster performance and resource utilization

Consolidated And Virtualized Hadoop With EMC Isilon And VMware

[Diagram labels: HDFS, NameNode, DataNode, Apache]

Why Shared Storage For Hadoop?

Hadoop Bare-Metal Deployment

Hadoop DAS Environment

1 Dedicated Storage Infrastructure

– One-off for Hadoop only

2 Lacking Enterprise Data Protection

– No snapshots, replication, or backup

3 Poor Storage Efficiency

– 3X mirroring

4 Fixed Scalability

– Rigid compute to storage ratio

5 Manual Import/Export

– No protocol support

[Diagram: a NameNode and DataNodes with direct-attached storage; blocks are labeled 1x, 2x, and 3x to show the mirrored copies.]
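The "rigid compute to storage ratio" point can be shown with a small calculation. This is a sketch under stated assumptions: every DAS node is taken to bring a fixed bundle of disk and CPU, and the per-node figures and workload sizes below are hypothetical.

```python
import math


def das_nodes_needed(storage_tb, cores, tb_per_node, cores_per_node):
    """Nodes required when storage and compute can only be bought together."""
    by_storage = math.ceil(storage_tb / tb_per_node)
    by_compute = math.ceil(cores / cores_per_node)
    # The larger requirement decides the node count; the other resource is overbought.
    return max(by_storage, by_compute)


if __name__ == "__main__":
    # Hypothetical workload: 480 TB of raw capacity but only 64 cores of compute.
    nodes = das_nodes_needed(storage_tb=480, cores=64, tb_per_node=24, cores_per_node=16)
    idle_cores = nodes * 16 - 64
    print(f"nodes: {nodes}, cores bought but not needed: {idle_cores}")
```

In this example storage alone forces 20 nodes, so 256 of the 320 purchased cores are not needed by the workload.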

Hadoop On EMC Isilon Scale Out NAS

1 Scale-Out Storage Platform

– Multiple applications & workflows

2 End-to-End Data Protection

– SnapshotIQ, SyncIQ, NDMP Backup

3 Industry-Leading Storage Efficiency

– >80% Storage Utilization

4 Independent Scalability

– Add compute & storage separately

5 Multi-Protocol

– Industry standard protocols

– NFS, CIFS, FTP, HTTP, HDFS
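The storage-efficiency claim can be checked with back-of-the-envelope arithmetic. The sketch below compares the raw capacity needed for the same usable data under 3X mirroring and at 80% utilization. The 100 TB figure is hypothetical, and because the slide says ">80%", the second result is best read as an upper bound on the raw capacity required.

```python
def raw_tb_with_mirroring(usable_tb, copies=3):
    """Raw capacity when every block is stored `copies` times."""
    return usable_tb * copies


def raw_tb_at_utilization(usable_tb, utilization=0.80):
    """Raw capacity when `utilization` of the raw space holds usable data."""
    return usable_tb / utilization


if __name__ == "__main__":
    usable_tb = 100  # hypothetical data set size
    print("3X mirroring:    ", raw_tb_with_mirroring(usable_tb), "TB raw")
    print("80% utilization: ", raw_tb_at_utilization(usable_tb), "TB raw")
```

Put differently, 3X mirroring corresponds to roughly 33% utilization, since one third of the raw space holds unique data.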

EMC Isilon Addresses Hadoop Challenges

1 Dedicated Storage Infrastructure (one-off for Hadoop only)

– Isilon: Scale-Out Storage Platform (multiple applications & workflows)

2 Lacking Enterprise Data Protection (no snapshots, replication, or backup)

– Isilon: End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup)

3 Poor Storage Efficiency (3X mirroring)

– Isilon: Industry-Leading Storage Efficiency (>80% storage utilization)

4 Fixed Scalability (rigid compute to storage ratio)

– Isilon: Independent Scalability (add compute & storage separately)

5 Manual Import/Export (no protocol support)

– Isilon: Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS)

Why Virtualize Hadoop?

Hadoop with Virtualization

Hadoop in VM (combined storage and compute in one VM)

• VM lifecycle determined by DataNode

• Limited elasticity

• Limited to Hadoop multi-tenancy

Separate Storage (compute VMs separate from storage)

• Separate compute from data

• Elastic compute

• Enable shared workloads

• Raise utilization

Separate Compute Tenants (compute VMs for tenants T1 and T2 on shared storage)

• Compute cluster per tenant

• Stronger VM-grade security and resource isolation

• Enable deployment of multiple Hadoop runtime versions

Elastic, Multi-Tenant

Virtualized Hadoop Performance

[Chart: Native vs. Virtual, 32 hosts, 16 disks/host]

Source: http://www.vmware.com/resources/techresources/10360

Example Deployment With Pivotal HD

• Prerequisites

– Isilon OneFS version 6.5.5 or higher

– VMware vSphere 5.0 (or later) Enterprise or Enterprise Plus

• Download VMware Big Data Extensions (Free)

• Configure Isilon cluster for HDFS (Free license)

• Configure Big Data Extensions to use Pivotal HD

• Deploy Hadoop Cluster

• Run a simple program to test
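Before starting the steps above, the version prerequisites can be checked with a few lines of code. This is a minimal sketch, not part of the kit: the helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical, it assumes plain dotted numeric version strings, and it does not check the vSphere edition (Enterprise or Enterprise Plus).

```python
# Minimum versions taken from the prerequisites list.
MINIMUMS = {"onefs": "6.5.5", "vsphere": "5.0"}


def parse_version(text):
    """Turn '6.5.5' into (6, 5, 5). Assumes dotted numeric components only."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '5' compares equal to '5.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites(installed):
    """Map each component name to whether its installed version is new enough."""
    return {
        name: meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


if __name__ == "__main__":
    # Hypothetical installed versions.
    print(check_prerequisites({"onefs": "7.0.2", "vsphere": "5.1"}))
```

Comparing tuples of integers avoids the string-comparison trap where "10.0" would sort before "6.5.5".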

Hadoop Data Services

Real-time, Interactive, And Batch Processing

Customer Profile: WGSN (Retail)

Challenges

• Rapidly launch new market intelligence service for fashion retailers

• Support large and growing volumes of Big Data

Solution

• Pivotal Greenplum Database

• Pivotal HD

• EMC Isilon

• Pivotal Data Science Labs

Results

• Fast deployment with native Hadoop integration, enabling rapid launch of new service

• Delivered high performance scalability

• Simplified platform administration

“Performance, scalability, and tight integration with Hadoop were the key reasons we chose Isilon. We also felt very comfortable with the partnership between EMC and Pivotal. In the end, the EMC and Pivotal solution offered the ideal balance of storage and compute with the right level of support.”

Download Hadoop Starter Kit Now

• Rapid provisioning

• High availability

• Elasticity

• Multi-tenancy

• Portability

https://community.emc.com/docs/DOC-26892