Slides: EMC Data Value Tutorial
1© Copyright 2016 EMC Corporation. All rights reserved.
INTRODUCTION TO DATA VALUE
STEVE TODD, EMC FELLOW, VP OF STRATEGY AND INNOVATION
JULY 6, 2016, BIS 2016, LEIPZIG, GERMANY
Slide 2
SESSIONS
1. Introduction to Data Value
2. Data Value IT Architectures
Slide 3
SESSION 1
1. Introduction to Data Value
- Introductions: What is Data Value?
- San Diego Supercomputer Center
- Product to Data Value
- Industry Challenge
- Emerging Use Cases
- Audience Discussion
2. Data Value IT Architectures
Slide 4
DR. JIM SHORT, SAN DIEGO SUPERCOMPUTER CENTER
ARCHITECTING FOR VALUE
Valuation IT Architectures
Valuation Business Processes
Slide 5
Analysis of the vast collections of GIS data we have or have access to is not merely generating new exploration, it has become a salable service on its own right.
— Australia Oil & Gas
CAPGEMINI EMC BIG DATA REPORT
FROM PRODUCT VALUE TO DATA VALUE
Source: Forbes, https://www.capgemini.com/news/new-global-study-by-capgemini-and-emc-shows-big-data-driving-market-disruption-leaving-many
Among our respondents, 63% consider that the monetization of data could eventually become as valuable to their organizations as their existing products and services.
— CapGemini EMC Big Data Report
Slide 6
BABOLAT EXAMPLE: PRODUCT TO DATA VALUE
Slide 7
ADIDAS SMART BALL EXAMPLE: PRODUCT TO DATA VALUE
Slide 8
LANTMATERIET EXAMPLE: PRODUCT TO DATA VALUE
Slide 9
ARE WE GETTING A GOOD DEAL?
Companies have better accounting for their office furniture than their information assets.
Source: WSJ, http://www.wsj.com/article_email/whats-all-that-data-worth-1413157156-lMyQjAxMTE1OTE1NDUxMDQ5Wj
Slide 10
VALUATION BUSINESS PROCESSES
M&A
CREDITOR VALUATION
DATA INSURANCE
DATA MONETIZATION
DATA SALE
Slide 11
DATA VALUE AND ACQUISITIONS
Source: https://press.linkedin.com/site-resources/news-releases/2015/linkedin-to-acquire-lyndacom
CALCULATIONS
Acquisition price: $1,500,000,000
90% of acquisition price: $1,350,000,000
Acquisition cost per tutorial: $9,269.24
Acquisition cost per GB (uncompressed): $112.88
Lynda.com’s extensive library of premium video content helps empower people to develop the skills needed to accelerate their careers.
— Jeff Weiner, CEO of LinkedIn
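The per-unit figures on this slide follow a simple pattern: attribute a share of the purchase price to the data asset, then divide by a unit count. The slide does not state what the 90% represents or how many tutorials and gigabytes were counted, so this sketch assumes the 90% is the share of the $1.5B price attributed to the video library and only back-solves the unit counts that the quoted per-unit costs would imply under that assumption.

```python
def attributed_value(price: float, data_share: float) -> float:
    """Portion of an acquisition price attributed to the data asset."""
    return price * data_share

def cost_per_unit(value: float, units: float) -> float:
    """Attributed value divided by a unit count (tutorials, GB, ...)."""
    return value / units

def implied_units(value: float, per_unit_cost: float) -> float:
    """Back-solve the unit count that a quoted per-unit cost implies."""
    return value / per_unit_cost

# Price and per-unit costs are from the slide; reading 90% as the
# data share of the price is an assumption.
value = attributed_value(1_500_000_000, 0.90)
tutorials = implied_units(value, 9_269.24)
gigabytes = implied_units(value, 112.88)

print(f"Attributed value: ${value:,.0f}")
print(f"Implied tutorials: {tutorials:,.0f}")
print(f"Implied GB: {gigabytes:,.0f}")
```

If the per-unit costs were instead computed from the full $1.5B, the implied counts would be about 11% higher.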
Slide 12
UPDATE: MICROSOFT ACQUISITION
Source: http://www.wsj.com/articles/microsoft-to-acquire-linkedin-in-deal-valued-at-26-2-billion-1465821523
Sales representatives using Microsoft’s Dynamics software for managing customer relationships could pick up useful tidbits of background on potential customers from LinkedIn data. Microsoft also sees opportunities in Lynda.com, a channel for training videos that LinkedIn bought for $1.5 billion last year. Microsoft will be able to offer Lynda’s videos inside its own software, such as Excel spreadsheets.
— Wall Street Journal, June 14 2016
Slide 13
BANKRUPTCY
Source: http://www.wsj.com/articles/in-caesars-fight-data-on-players-is-real-prize-1426800166
DEFENDANT’S LOOTING OF CEOC’S VALUABLE OPERATING ASSETS
Date of Transfer | Asset Transferred | Conservative Estimated Equity Value | Equity Value Attributed | Equity Valuation Shortfall ($) | Equity Valuation Shortfall (%)
May 2014 | Total Rewards | $1.0BN | None | $1.0BN | 100%
Total | | $5.9BN | $2.4BN | $3.6BN | 60%
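The two shortfall columns in the table are derived from the first two value columns: shortfall in dollars is estimated value minus attributed value, and the percentage is that shortfall divided by the estimated value. This is a minimal sketch of that arithmetic, assuming "None" attributed means $0.

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    asset: str
    estimated_equity_bn: float   # conservative estimated equity value, $BN
    attributed_equity_bn: float  # equity value attributed to the transfer, $BN

    def shortfall_bn(self) -> float:
        return self.estimated_equity_bn - self.attributed_equity_bn

    def shortfall_pct(self) -> float:
        return 100 * self.shortfall_bn() / self.estimated_equity_bn

total_rewards = Transfer("Total Rewards", 1.0, 0.0)  # "None" treated as $0
all_transfers = Transfer("Total", 5.9, 2.4)

for t in (total_rewards, all_transfers):
    print(f"{t.asset}: shortfall ${t.shortfall_bn():.1f}BN ({t.shortfall_pct():.0f}%)")
```

With the rounded totals shown, the Total row works out to $3.5BN (about 59%) rather than the slide's $3.6BN (60%), so the slide's figures were presumably computed from unrounded values.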
Slide 14
DATA MONETIZATION
• 23andMe
– 800,000 customer DNA kits since 2006
– $99 per test ($79.2 million)
• Genentech
– Upfront payment of $10 million
– Further milestone payments of as much as $50 million
Source: Forbes, http://www.forbes.com/sites/matthewherper/2015/01/06/surprise-with-60-million-genentech-deal-23andme-has-a-business-plan/
…this single deal with one large drug company could generate almost as much revenue as doubling 23andMe’s customer base.
— Forbes Article
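The Forbes comparison can be checked with the slide's own numbers. This sketch assumes the full $50 million in milestones is paid (the slide says "as much as") and that doubling the customer base at $99 per test would add another $79.2 million in kit revenue.

```python
kits_sold = 800_000
price_per_kit = 99  # USD per test
kit_revenue = kits_sold * price_per_kit

genentech_upfront = 10_000_000
genentech_milestones_max = 50_000_000  # upper bound from the slide
deal_max = genentech_upfront + genentech_milestones_max

# Doubling the customer base at the same price adds one more kit_revenue.
extra_from_doubling = kit_revenue
ratio = deal_max / extra_from_doubling

print(f"Kit revenue: ${kit_revenue:,}")
print(f"Maximum deal value: ${deal_max:,}")
print(f"Deal as a share of the doubling revenue: {ratio:.0%}")
```

At its maximum, the single data deal is worth roughly three quarters of what doubling the customer base would bring in.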
Slide 15
DATA SALE
Source: http://adexchanger.com/ecommerce-2/tesco-eyes-sale-of-dunnhumby-its-nearly-1-billion-shopper-data-business/
Tesco said it has appointed Goldman Sachs as its adviser to explore “strategic options” for the US$756 million business
[dunnhumby]…has a unique frame of reference on the purchase habits of 770 million shoppers
Slide 16
DATA INSURANCE
Liberty Mutual
• 30% increase in primary data insurance policies between 2013 and 2014
Source: Boston Globe, http://www.bostonglobe.com/business/2014/02/17/more-companies-buying-insurance-against-hackers-and-privacy-breaches/9qYrvlhskcoPEs5b4ch3PP/story.html
TJX
• 46 million credit/debit cards
• Estimated cost $180 million
Slide 17
• What data-related business processes add revenue to the bottom line?
• What data-related business processes subtract costs from the bottom line?
• What characteristics of data can be used to calculate value?
AUDIENCE DISCUSSION
Slide 18
COMPARE YOUR ANSWERS
PLUS
• Sell
• Rent (or provide data services)
• Monetize (analyze to increase revenue or cut costs)
• Data/Cyber Insurance claim
MINUS
• Cost to process data
• Cost to store data
• Cost of premiums for data/cyber insurance
• Cost to purchase a data asset
• Cost to acquire a company’s data
• Regulatory fines for data violations
• Data science staff
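The PLUS and MINUS lists amount to a ledger: data-related processes that add revenue on one side and data-related costs on the other. A minimal sketch of netting them, where every amount is a hypothetical placeholder rather than a figure from the slides:

```python
from dataclasses import dataclass, field

@dataclass
class DataValueLedger:
    """Hypothetical ledger: PLUS items add to the bottom line, MINUS items subtract."""
    plus: dict[str, float] = field(default_factory=dict)
    minus: dict[str, float] = field(default_factory=dict)

    def net_value(self) -> float:
        return sum(self.plus.values()) - sum(self.minus.values())

# Illustrative amounts only; categories are taken from the two lists.
ledger = DataValueLedger(
    plus={"Sell": 500_000, "Monetize": 1_200_000},
    minus={"Cost to store data": 150_000, "Data science staff": 900_000},
)
print(f"Net data value: ${ledger.net_value():,.0f}")
```

Keeping the items named, rather than summing them directly, lets each business process be traced back to its contribution.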
Slide 19
SESSION 2
1. Introduction to Data Value
2. Data Value IT Architectures
– Introduction to Data Lakes
• Data Lake Architecture
• Data Lake Industry use cases
– Adding Valuation to Data Lake Architectures
Slide 20
MORE DATA MEANS MORE COMPLEX RELATIONSHIPS TO ANALYZE IN REAL TIME AT A LARGE SCALE
Amount of data: 4.4 ZB in 2013, growing to 44 ZB by 2020 (16+ ZB hot data). Source: IDC 2014
The Data Multiplier Effect: Business (database data, 1X), Human (enterprise/external data, 10X), Machine (sensor/external data, 100X), with volume, variety, and velocity increasing at each step
Example sources: satellite imaging, sensors, video recording, M2M log files, bio-informatics, email, documents, web logs, social
• More Data Needs To Be Captured Faster
• Real-time Analytics For Business Insights
• Existing Applications Are Taxed
• Evolving New Applications & Architectures
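The IDC figures imply a growth rate that makes the capture-faster pressure concrete: a tenfold increase from 4.4 ZB to 44 ZB over the seven years from 2013 to 2020. A compound annual growth rate calculation, using only the slide's numbers:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

rate = cagr(4.4, 44.0, 2020 - 2013)
print(f"Implied annual growth: {rate:.1%}")
```

That is roughly 39% per year, which is close to doubling every two years.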
Slide 21
BRING TOGETHER DATA, ANALYTICS, & APPS
ANALYZE ANYTHING: All of the data; more sophisticated analyses; new combinations and correlations
STORE EVERYTHING: Structured, unstructured, dark; generated by the enterprise, imported from outside; historic & real-time
SPEED | ANALYTICS | APPS | DATA
BUILD THE RIGHT THING: Deliver data consistently & in a standardized way; get at the data quickly; build the views and applications each user really needs
Slide 22
DATA LAKE - ATTRIBUTES
INGEST: Ingest data in real time, near real time, or in batch/micro-batch.
STORE: Open HDFS storage allows access from the full stack of analytics tools.
ANALYZE: Apply the latest machine learning and data science techniques.
SURFACE: An open platform for visualization of results and data products.
ACT: An application development platform to act on findings.
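The INGEST attribute mentions micro-batching, where a continuous stream is grouped into small batches before it is stored or analyzed. This is a minimal count-based sketch; it is not taken from any product on these slides, and production streaming engines typically also bound batches by time interval.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Group a (possibly unbounded) event stream into fixed-size micro-batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(events)
    # islice pulls at most batch_size items; an empty list means the stream ended.
    while batch := list(islice(it, batch_size)):
        yield batch

for batch in micro_batches(range(7), batch_size=3):
    print(batch)
```

Because the function is a generator, it works on unbounded sources and holds only one batch in memory at a time.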
Slide 23
Slide 24
Data Streaming Reference Architecture
Data Feeds | Transactional Apps | Analytic Apps
Data Stream Pipeline
Distributed Computing | Real-Time Data | Expert Systems & Machine Learning | Advanced Analytics
HDFS Data Lake
Slide 25
FEDERATION BUSINESS DATA LAKE PLATFORM
DATA & ANALYTICS CATALOG
(THIRD PARTY APPLICATIONS)
HADOOP
OPEN DATA PLATFORM
PIVOTAL BIG DATA SUITE
ADVANCED ANALYTICS
DATA PROCESSING
APPS AT SCALE
GREENPLUM DATABASE | HAWQ
PIVOTAL HD | SPARK | SPRING XD
DATA SERVICES MANAGEMENT
ANALYTICS TOOLBOX
REDIS | RABBITMQ | GEMFIRE
BDS ON PIVOTAL
DATA MANAGER
DATA GOVERNOR
INGEST | INDEX & SEARCH | POLICY MGMT | SECURITY & ACCESS CONTROL
VIRTUALIZATION | PIVOTAL CLOUD FOUNDRY
EMC II STORAGE
DATA LAKE FOUNDATION: ISILON | ECS
VCE VBLOCK | XTREMIO
Slide 26
DATA LAKE ACCESS METHODS
FILE
HPC | Backup/Archive | Analytics | Mobile | File Shares | Cloud Apps
Slide 27
DATA: ANALYTICS-READY STORAGE CHOICES
ISILON, ECS
• Scale compute & storage independently
• HDFS-enable existing data
• No single point of failure
• Easy import & export via next-generation protocols, including HDFS, S3, Swift, and Atmos API support
• Fault-tolerant, end-to-end data protection
• Self-service provisioning
• Storage hardware choice: enterprise, commodity
CONSOLIDATE DATA STORAGE THROUGH MULTI-PROTOCOL ACCESS
Real-time | Batch | Hadoop | Analytics | Surface | Act | Cloud | Archive | Mobile | HPC | Shares
Slide 28
DATA LAKE ARCHITECTURE/USE CASES
8 Petabytes of Sequenced Genome Data
• 20,000 Oncology Samples per Year
• 40-50 GB per Sequencing Session
• Historical Data for Cancer Analysis
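The bullets give enough to estimate how fast this repository grows. This sketch assumes one sequencing session per oncology sample (the slide does not say) and uses decimal units, 1 PB = 1,000,000 GB.

```python
samples_per_year = 20_000
gb_per_session = (40, 50)  # range given on the slide

# Assumption: one sequencing session per oncology sample.
low_pb, high_pb = (samples_per_year * gb / 1_000_000 for gb in gb_per_session)
print(f"New sequencing data per year: {low_pb:.1f} to {high_pb:.1f} PB")
```

Under that assumption, the lab adds roughly 0.8 to 1.0 PB of sequence data every year on top of the historical archive.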
Slide 29
Mission statement
“To provide knowledge, services and solutions to fulfil Radboudumc research ICT needs, in a way that fits the individual study. The solutions includes a Digital Research Environment (DRE) which allows researchers to import, merge, optimize, store, analyse, archive and share data from various sources (local and (inter)national) in a single scalable digital environment per study. The use of this digital environment increases research study efficiency and output, thereby increasing scientific impact of the Radboudumc. The sustainable, secure, law compliant infrastructure places the Radboudumc in a key position as scientific partner.”
The mission
Slide 30
DRE positioning
Education: Digital Learning Environment (DLE)
Care: Electronic Medical Record (EMR)
Research: Digital Research Environment (DRE)
Slide 31
DRE-mandate: valorize the new EMR ‘Epic’
EMR
DRE
‘Paddy Field’
Slide 32
DRE: Import, Merge, Optimize, Store, Analyse, Archive, Share
DRE-mandate: increase efficiency and output
Slide 33
Data management at this moment
© Caspar Terheggen, Radboud Universiteit
& LAW
DRE-mandate: modernize
Slide 34
© Caspar Terheggen, Radboud Universiteit
Virtual research workspace
DRE-mandate: modernize
Slide 35
Local key management
Exchange with colleagues (Sharing)
High performance computing
Data ponds
Secure Research Environment
Standards
Analysis & Reporting
Archiving
Multi-center sources
Source disclosure
Pseudonymisation, Merging
DRE: study example
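The study example combines local key management, multi-center sources, and pseudonymisation before merging. The slide does not say which technique the DRE uses; one common approach, shown here as an assumption, is keyed hashing (HMAC-SHA256): the same patient ID and key always yield the same pseudonym, so records from different centers can be joined, while the mapping cannot be recomputed without the locally held key. Field names, IDs, and the key are hypothetical.

```python
import hashlib
import hmac

def pseudonymise(patient_id: str, key: bytes) -> str:
    """Derive a stable pseudonym from a patient ID using a locally held key."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def merge_by_pseudonym(*sources: list[dict], key: bytes) -> dict[str, dict]:
    """Merge records from several sources on their pseudonym, dropping raw IDs."""
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            fields = {k: v for k, v in record.items() if k != "patient_id"}
            pseudonym = pseudonymise(record["patient_id"], key)
            merged.setdefault(pseudonym, {}).update(fields)
    return merged

# Hypothetical key and records; a real key would come from local key management.
local_key = b"example-key-from-local-key-management"
center_a = [{"patient_id": "NL-001", "diagnosis": "C50"}]
center_b = [{"patient_id": "NL-001", "genotype": "BRCA1+"}]
print(merge_by_pseudonym(center_a, center_b, key=local_key))
```

Keeping the key local means a research workspace can receive merged, pseudonymised records without ever holding the raw identifiers.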
Slide 36
Hybrid Cloud: Private, Public
Platform as a Service: Light Open Source Frameworks, Services, Data and Analytics
Orchestration: Automated Provisioning and IT Infrastructure Portal
Converged Infrastructure: IT Transformation to Service Delivery
Standardize, Virtualize, Automate
Research as a Service: Building new apps for competitive advantage in the market, the new business
Research Self-Service Portal: End-User Self-Service with Measured SLAs
Vendor inventory: architecture
Slide 37
BITBW (State Service Provider for the State of Baden-Württemberg): EMC ECS electronic record archiving in the Criminal Justice Department
– Cooperation with an ISV and its software (PDV Systeme)
– Starting 1 January 2018, communication between lawyers and justice courts will be fully electronic, without any media break
– Justice courts are fully digital
Slide 38
I. Traditional / Horizontal Use Cases
– Email Archives
– File Shares / Home Directories
– VDI (user data)
– vCAD Workstation Virtualization
– Backup + Archive
– Video Surveillance
II. Engineering Use Cases
– Computer-Aided Design
– Computer-Aided Engineering
– Advanced Driver Assistance Systems (ADAS)
III. Emerging Use Cases
– Hardware in the Loop / Simulation
– Connected Cars
– Analytics
IT IN AUTOMOTIVE
Source: Pictures by Bosch
Slide 39
Car Supplier - Typical Workflow
Labeling
SIL-HIL computing
Data Ingest
Label-data
Validation
Developer
Tape library
Import, Cut & Compress
144*X400
SIL = Software in the Loop
HIL = Hardware in the Loop
Slide 40
CAR SUPPLIER INFRASTRUCTURE (GLOBAL)
144*X400
3*X400
Plymouth
Germany
IP
California
Replication of selected data over Aspera*
3*X400
Replication of selected data over Aspera *
Slide 41
Typical HiL Environment for ADAS Development
SMB / NFS
Write (Ingest)
Read (simulation)
2PB – 20 PB HiL Server
Slide 42
VALUATION APPROACHES
Application Agility
Content Workflow
Content Processing
Data Protection Ecosystem
Content Ingest
Slide 43
CONTENT PROCESSING
NLP, Translation, Stemming, Tokenization
Domain A Domain B Domain C Domain D
Valuation Algorithms
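The slide shows NLP steps (tokenization, stemming, translation) feeding domain-specific valuation algorithms, without specifying the algorithms. This sketch assumes a deliberately simple stand-in: a document's value to a domain is approximated by the share of its stemmed tokens that match that domain's vocabulary. The domains, vocabularies, and crude stemmer are all hypothetical, and translation is omitted.

```python
import re
from collections import Counter

# Hypothetical, already-stemmed domain vocabularies.
DOMAIN_TERMS = {
    "A": {"patient", "diagnos", "treatment"},
    "B": {"payment", "contract", "account"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token: str) -> str:
    """Crude suffix stripper, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "is", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def domain_scores(text: str) -> dict[str, float]:
    """Share of a document's stemmed tokens that hit each domain's vocabulary."""
    stems = Counter(stem(t) for t in tokenize(text))
    total = sum(stems.values()) or 1  # avoid dividing by zero on empty text
    return {
        domain: sum(stems[term] for term in terms) / total
        for domain, terms in DOMAIN_TERMS.items()
    }

print(domain_scores("Patients were diagnosed and treatment started; payments follow."))
```

Swapping in a proper stemmer or a learned classifier would change only `stem` or `domain_scores`; the tokenize, normalize, then score per domain structure stays the same.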
Slide 44
DATA PROTECTION ECOSYSTEM
Backup Schedule/Catalog
Backup Data
Valuation Algorithms
Mappings Between Primary/Protection System
P1 → B1
P2 → B2
Schedule, Num Copies (x)
Catalog
V1, V2, V3
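The diagram feeds the backup schedule, catalog, and primary-to-protection mappings into valuation algorithms that produce values V1, V2, V3, but the formula itself is not given. As a hypothetical heuristic only: data that its owners back up more often and keep more copies of (the slide's "Num Copies (x)") is assumed to matter more to the business, so protection effort serves as a value signal.

```python
from dataclasses import dataclass

@dataclass
class ProtectionPolicy:
    primary: str            # primary system, e.g. "P1"
    backup: str             # protection system it maps to, e.g. "B1"
    backups_per_day: float  # from the backup schedule
    num_copies: int         # the slide's "Num Copies (x)"

def protection_score(policy: ProtectionPolicy) -> float:
    """Hypothetical value proxy: backup frequency times number of copies."""
    return policy.backups_per_day * policy.num_copies

# Illustrative catalog entries; the numbers are invented.
catalog = [
    ProtectionPolicy("P1", "B1", backups_per_day=24, num_copies=3),
    ProtectionPolicy("P2", "B2", backups_per_day=1, num_copies=1),
]
ranked = sorted(catalog, key=protection_score, reverse=True)
for p in ranked:
    print(p.primary, "->", p.backup, protection_score(p))
```

The appeal of this kind of signal is that backup catalogs already exist, so a first-cut ranking needs no new data collection.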
Slide 45
CONTENT INGEST
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
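Spouts and bolts are Apache Storm terms: spouts emit a stream of tuples, and bolts consume, transform, or aggregate them. This sketch is not the Storm API. It mimics the data flow with Python generators in a single process, with no parallelism or tuple acknowledgement, and the wiring (two spouts merged into a parse bolt, then a counting bolt) is a hypothetical example because the slide does not show the connections.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Iterator

def spout_files() -> Iterator[str]:
    """Spout 1 (hypothetical): emits lines from an ingested file feed."""
    yield from ["sensor=a temp=20", "sensor=b temp=99"]

def spout_messages() -> Iterator[str]:
    """Spout 2 (hypothetical): emits messages from a queue."""
    yield from ["sensor=a temp=21"]

def parse_bolt(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Bolt: turn 'key=value' text into a tuple-like dict."""
    for line in lines:
        yield dict(field.split("=", 1) for field in line.split())

def count_bolt(records: Iterable[dict[str, str]]) -> Counter:
    """Terminal bolt: count records per sensor."""
    return Counter(r["sensor"] for r in records)

counts = count_bolt(parse_bolt(chain(spout_files(), spout_messages())))
print(counts)
```

Because each stage is a generator, records flow through one at a time, which is the property that lets a real topology process unbounded streams.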
Slide 46
APPLICATION AGILITY
APPS
DATA
SPEED
ANALYTICS
1. Commit Code Change
2. Automate Build & Test (Unit Test, Static Code Analysis)
3. Store Binaries & Build Artifacts
4. Automated Integration Testing
5. Acceptance, Performance & Load
6. Zero Downtime Upgrade to Production
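The six steps form a gated pipeline: each stage runs only if every earlier one passed, so a failing change never reaches the production upgrade. A minimal sketch of that gating logic, where the stage checks are placeholder lambdas standing in for real build, test, artifact, and deployment tooling:

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]

def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed: list[str] = []
    for name, check in stages:
        if not check():
            print(f"FAILED at: {name}")
            break
        passed.append(name)
    return passed

# Placeholder checks; the integration-test failure is simulated.
stages: list[Stage] = [
    ("Commit code change", lambda: True),
    ("Automated build & test", lambda: True),
    ("Store binaries & build artifacts", lambda: True),
    ("Automated integration testing", lambda: False),
    ("Acceptance, performance & load", lambda: True),
    ("Zero downtime upgrade to production", lambda: True),
]
print(run_pipeline(stages))
```

Here the simulated integration failure stops the run after three stages, so the later acceptance and production-upgrade steps are never attempted.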
Slide 47
CONTENT WORKFLOW
Diagram: End Users, Drivers, and Sources
Slide 48
DATA SCIENCE EXAMPLE
Final Report & Business Recommendation: $29M
Labeled Disk Array Data: disk array data in which each disk drive is assigned a label indicating failure or activity
Disk Array Data | Product Tables | Customer History Table | Genealogy
GPO Genealogy: provides a listing of all items in ven… with tracking of their…
TCE History Table: provides history tracking for drives & other parts in vendor equipment
Product Tables: contain information at the part item number level, i.e., configuration parameters & metadata
Disk Array Data: data collected from arrays installed at customer sites; contains many pieces of information regarding configuration & parts, as well as error information for disk drives
Disk Array Data Enriched with EMC SN: disk array data in which each row's disk drive serial number is mapped to the EMC internal disk drive serial number
Data Scientist: John Smith | Tool: Greenplum DB | Actions:
• Identify disk drives & map raw drive serial numbers to serial numbers
• Map serial numbers
• Join the product metadata with the drive data
• Distinguish failed from non-failed drives
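The workflow above maps raw drive serial numbers to EMC serial numbers, joins product metadata, and labels each drive as failed or active. On the slide these steps run as SQL in Greenplum DB; this sketch reproduces the same join-then-label logic with in-memory Python dicts. All rows and column names are hypothetical, and the set of failed serial numbers stands in for whatever history table the original analysis used.

```python
# Hypothetical rows; column names are illustrative, not the original schema.
disk_array_data = [
    {"raw_sn": "R-100", "errors": 0},
    {"raw_sn": "R-200", "errors": 57},
]
sn_mapping = {"R-100": "EMC-1", "R-200": "EMC-2"}  # raw SN -> EMC SN
product_tables = {"EMC-1": {"model": "X"}, "EMC-2": {"model": "Y"}}
failed_sns = {"EMC-2"}  # assumed to come from part history tracking

def label_drives(rows, mapping, products, failed):
    """Map serial numbers, join product metadata, and attach a failure label."""
    labeled = []
    for row in rows:
        emc_sn = mapping.get(row["raw_sn"])
        if emc_sn is None:
            continue  # drive could not be identified, so it is excluded
        enriched = {**row, "emc_sn": emc_sn, **products.get(emc_sn, {})}
        enriched["label"] = "failed" if emc_sn in failed else "active"
        labeled.append(enriched)
    return labeled

for drive in label_drives(disk_array_data, sn_mapping, product_tables, failed_sns):
    print(drive)
```

The labeled output is the kind of training set a failure-prediction model would be fit on; the slide does not say which model produced the $29M recommendation.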
Slide 49
• Async replication is never perfect
– Inherent data lag between production and replica
– This data is unprotected and will be lost in a disaster
– Data loss drives monetary loss
– We aim to minimize monetary loss in case of disaster
• Optimize replication resources to minimize the cost of data loss
VALUE-DRIVEN DISASTER RECOVERY
MINIMIZE MONETARY LOSS IN DISASTER
ASSUME DATA A IS X TIMES MORE IMPORTANT THAN APP B
Figure: business damage in disaster over time (secs), non-optimized vs. business-optimized replication (x = 2, 4, 10)
Peleg Yiftachel, Udi Shemer, Omer Sagi
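The slide does not describe the optimization the authors used, so this sketch uses a simplified linear model as an assumption. Each application writes at some rate, replication bandwidth is limited, and any write not replicated adds to the unprotected backlog. Damage accumulates at the sum over apps of (value per MB × unreplicated MB/s). Because that objective is linear, giving bandwidth to the highest-value app first, up to its write rate, minimizes it. Write rates, bandwidth, and values are hypothetical; x = 4 matches one of the plotted cases.

```python
def allocate_bandwidth(write_rates: dict[str, float], values: dict[str, float],
                       total_bw: float) -> dict[str, float]:
    """Greedy replication bandwidth allocation (MB/s), highest value per MB first."""
    alloc = {app: 0.0 for app in write_rates}
    remaining = total_bw
    for app in sorted(write_rates, key=values.get, reverse=True):
        alloc[app] = min(write_rates[app], remaining)
        remaining -= alloc[app]
    return alloc

def damage_rate(write_rates: dict[str, float], values: dict[str, float],
                alloc: dict[str, float]) -> float:
    """Business damage added per second of lag: value times unreplicated MB/s."""
    return sum(values[a] * (write_rates[a] - alloc[a]) for a in write_rates)

x = 4  # app A's data is x times more important than app B's
writes = {"A": 60.0, "B": 60.0}     # MB/s, hypothetical
values = {"A": float(x), "B": 1.0}  # relative damage per unreplicated MB
bandwidth = 80.0                    # MB/s available for replication, hypothetical

# Value-blind baseline: split bandwidth in proportion to write rate.
blind = {a: bandwidth * w / sum(writes.values()) for a, w in writes.items()}
optimized = allocate_bandwidth(writes, values, bandwidth)
print("value-blind:", damage_rate(writes, values, blind))
print("optimized:  ", damage_rate(writes, values, optimized))
```

With constant rates, the damage at disaster time t is t times the damage rate, which is why the optimized curves rise more slowly and the gap widens as x grows.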
Slide 50
Learn about the value, opportunity, and insights that Big Data provides. Get introduced to the Federation Business Data Lake solution to leverage the full power of big data to drive major business strategies.
https://educast.emc.com/learn/data-lakes-for-big-data-archive-2015
http://stevetodd.typepad.com
EMC HELPS WITH MOOC, BLOG
MASSIVE OPEN ONLINE CURRICULUM