EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
Transcript of EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
© Copyright 2014 EMC Corporation. All rights reserved.
Delivering Hadoop-as-a-Service To Your Organization
Why Hadoop?
• Oil Exploration
• Medical Imaging
• Video Surveillance
• Mobile Sensors
• Smart Grids
• Social Media
• Internet of Things
• Dark Data
A Fast and Cheap Way to Exploit Massive Amounts of New Data Sources
Why Hadoop?
Improve Company Performance
Increase Revenue
Increase Demand
Increase Spend Efficiency
Ad Optimization
Hyper Targeting
Campaign Optimization
Ad Effectiveness Analytics
Market Mix Modeling
Coupon Redemption
Increase Customer Acquisition
Purchase Funnel Analysis
Increase Customer Engagement
Customer Segmentation
Churn Prevention
Customer Lifetime Value
Increase Basket Size
Affinity Analytics
Next Best Offer
Cross-Sell / Upsell
Manage Demand
Demand Analysis
Price Optimization
Build Brand Equity
Increase Reach
Digital Marketing
Social Media
Improve Customer Loyalty
Social Graph / Influencers
Loyalty Program Analytics
Customer Satisfaction
Customer Care Analytics
Reduce Costs
Click Fraud
Transaction Anomaly Detection
Production Cost / Efficiency
Supply / Demand Forecasting
General and Administrative
Workforce Analytics
Employee Churn
IT / Security Analytics
Save Money Or Make Money
Hadoop Overview
Hadoop is an open-source framework from Apache for parallel batch processing of very large data sets.
MapReduce is the Hadoop processing model. It splits a job into map and reduce tasks so that many nodes can work on it in parallel.
HDFS is Hadoop's distributed file system. It provides data protection and data locality by storing multiple replicas of each block (three by default).
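To make the MapReduce idea concrete, here is a minimal single-process sketch in Python of the three phases (map, shuffle, reduce) applied to a word count. It is only an illustration of the model and does not use Hadoop itself; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, the step a MapReduce
    framework performs between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence in one process.
    On a real cluster each phase is spread across many nodes."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data", "big cluster"])` counts "big" twice and the other words once. In Hadoop, each map task would typically run on a node that holds a local copy of its input block, which is where the data locality mentioned above comes in.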
IT Challenges With Hadoop
• Deployment is time-consuming and complex, which encourages shadow IT
• Capacity utilization on bare metal is low
• Multiple Hadoop distribution deployments create data silos
Typical Enterprise Deployment
• Multiple, siloed clusters to manage
• Redundant common data in separate clusters
• Peak compute and I/O resources are limited to the number of nodes in each independent cluster
[Diagram: Dept A (recommendation engine) and Dept B (ad targeting) each run separate Production, Test, and Experimentation clusters. Data sources: log files, social data, historical customer behavior.]
What If You Consolidate & Virtualize?
[Diagram: the separate Production, Test, and Experimentation clusters for the recommendation engine and ad targeting are consolidated into virtual clusters: Experimentation, Production recommendation engine, Production ad targeting, and Test/Dev.]
One physical platform to support multiple virtual big data clusters
EMC Hadoop Starter Kit
• Support for major Hadoop distributions
• Quickly deploy, manage, and scale Hadoop clusters
• GUI simplifies management tasks
• Elastic scaling optimizes cluster performance and resource utilization
Consolidate and Virtualize Hadoop With EMC Isilon and VMware
[Diagram labels: HDFS, NameNode, DataNode, Apache]
Why Shared Storage For Hadoop?
Hadoop Bare-Metal Deployment
Hadoop DAS Environment
1 Dedicated Storage Infrastructure
– One-off for Hadoop only
2 Lacking Enterprise Data Protection
– No Snapshots, replication, backup
3 Poor Storage Efficiency
– 3X mirroring
4 Fixed Scalability
– Rigid compute to storage ratio
5 Manual Import/Export
– No protocol support
[Diagram: NameNode and data blocks with replicas labeled 1x, 2x, and 3x]
Hadoop On EMC Isilon Scale Out NAS
1 Scale-Out Storage Platform
– Multiple applications & workflows
2 End-to-End Data Protection
– SnapshotIQ, SyncIQ, NDMP Backup
3 Industry-Leading Storage Efficiency
– >80% Storage Utilization
4 Independent Scalability
– Add compute & storage separately
5 Multi-Protocol
– Industry standard protocols
– NFS, CIFS, FTP, HTTP, HDFS
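The storage-efficiency contrast between 3X mirroring on DAS and the ">80% storage utilization" claimed for Isilon can be worked through with simple arithmetic. The sketch below is illustrative only: the helper names and the 100 TB data set are assumptions made for this example, and 0.80 is used as a lower bound because the slide states the figure only as "more than 80%".

```python
def raw_for_replication(usable_tb: float, replicas: int) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_tb * replicas


def raw_for_utilization(usable_tb: float, utilization: float) -> float:
    """Raw capacity needed when `utilization` (0 to 1) of raw space is usable."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


USABLE_TB = 100.0            # hypothetical data set size, not from the slides
HDFS_REPLICAS = 3            # "3X mirroring" on the DAS slide
ISILON_UTILIZATION = 0.80    # lower bound taken from the ">80%" claim

das_raw = raw_for_replication(USABLE_TB, HDFS_REPLICAS)
isilon_raw = raw_for_utilization(USABLE_TB, ISILON_UTILIZATION)

print(f"DAS with 3x mirroring: {das_raw:.0f} TB raw "
      f"({USABLE_TB / das_raw:.0%} utilization)")
print(f"Shared storage at 80% utilization: {isilon_raw:.0f} TB raw")
```

Under these assumptions, 3x mirroring needs about 300 TB of raw disk for 100 TB of data (roughly 33% utilization), while 80% utilization needs about 125 TB. Actual figures depend on the protection level and cluster size chosen.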
EMC Isilon Addresses Hadoop Challenges
Hadoop DAS challenge → EMC Isilon answer
1 Dedicated Storage Infrastructure (one-off for Hadoop only)
→ Scale-Out Storage Platform (multiple applications & workflows)
2 Lacking Enterprise Data Protection (no snapshots, replication, backup)
→ End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup)
3 Poor Storage Efficiency (3X mirroring)
→ Industry-Leading Storage Efficiency (>80% storage utilization)
4 Fixed Scalability (rigid compute to storage ratio)
→ Independent Scalability (add compute & storage separately)
5 Manual Import/Export (no protocol support)
→ Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS)
Why Virtualize Hadoop?
Hadoop with Virtualization
Combined Storage/Compute (Hadoop in VM)
• VM lifecycle determined by Datanode
• Limited elasticity
• Limited to Hadoop Multi-Tenancy
Separate Storage
• Separate compute from data
• Elastic compute
• Enable shared workloads
• Raise utilization
Separate Compute Tenants
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation
• Enable deployment of multiple Hadoop runtime versions
Elastic, Multi-Tenant
Virtualized Hadoop Performance
Native vs. Virtual, 32 hosts, 16 disks/host
Source: http://www.vmware.com/resources/techresources/10360
Example Deployment With Pivotal HD
• Prerequisites
– Isilon OneFS version 6.5.5 or higher
– VMware vSphere 5.0 (or later) Enterprise or Enterprise Plus
• Download VMware Big Data Extensions (free)
• Configure Isilon cluster for HDFS (free license)
• Configure Big Data Extensions to use Pivotal HD
• Deploy Hadoop cluster
• Run a simple program to test
Hadoop Data Services
Real-time, Interactive, And Batch Processing
Customer Profile: WGSN (Retail)
Challenges
• Rapidly launch new market intelligence service for fashion retailers
• Support large and growing volumes of Big Data
Solution
• Pivotal Greenplum Database
• Pivotal HD
• EMC Isilon
• Pivotal Data Science Labs
Results
• Fast deployment with native Hadoop integration, enabling rapid launch of new service
• Delivered high performance scalability
• Simplified platform administration
“Performance, scalability, and tight integration with Hadoop were the key reasons we chose Isilon. We also felt very comfortable with the partnership between EMC and Pivotal. In the end, the EMC and Pivotal solution offered the ideal balance of storage and compute with the right level of support.”
Download Hadoop Starter Kit Now
• Rapid provisioning
• High availability
• Elasticity
• Multi-tenancy
• Portability
https://community.emc.com/docs/DOC-26892