@isanuage
Achieving Agility and Scale for Your Data Lake
Isabelle Nuage, Product Marketing
Cyril Sonnefraud, Product Management
©2017 Talend Inc #TalendConnect
Poll
• Who’s using Talend Big Data today?
• Who has a data lake in production?
• Who is deploying or planning a data lake project within 12 months?
• Who is implementing a data lake in the Cloud?
<Digital Transformation Stats>
By end 2017, > 70% of G500
By 2020, 50% of the G2000
Digital Transformation is No Longer an Option. Are You Prepared?
But only 26% of Organizations
Sources: Accenture and Forrester, "Digital Transformation in the Age of the Customer" study; IDC FutureScape
The Data Lake is the New Digital Backbone
• Break down data silos
• Structured and unstructured
• Granular data
• Machine learning
Why Create Data Lakes?
[Diagram: vertical axis labeled Business Value]
Reduce costs
• Offload EDW
• Cheaper storage
• Access to archived data
Generating new opportunities
• Customer acquisition, retention..
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
Challenges
Complex Technology
Limited Access
Data Swamps
How to achieve Agility & Scale?
DATA LAKES
People Doing it the OLD Way…
2017 Lenovo Internal. All rights reserved.
Change is the Only Constant
[Chart: Business Value (vertical axis) over Time (horizontal axis)]
Stages shown: Reporting, Measurement, Business Insights, Optimization, Predictive Analytics, Automation, Prescriptive Analytics, Cognitive Analytics
Time periods shown: Pre FY-07, FY-07/10, FY-11/12, FY-13/14, FY-15/17, FY-17/18
• Any innovation
• Any platform
• Any use case
• Any speed
• Any user
The Agile Data Lake
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Examples
Smart Data Quality
Smart Data Pipelines
Demo flow
Data Lake
Incoming Lead Data (Raw)
Amazon EMR Cluster Data Lake
Output Lead Data (Processed) With Segmentation
1. Ingestion with Smart Data Quality
2. Smart Data Pipeline with Machine Learning
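The two demo steps can be sketched in a few lines of Python. This is only an illustration of the flow (raw leads in, cleaned and segmented leads out), not the Talend jobs shown in the demo. The field names, the email check, and the segment thresholds are all assumptions, and a simple rule stands in for the machine-learning segmentation model.

```python
import re

# Hypothetical email pattern used for the data-quality check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_lead(raw):
    """Step 1 (data quality): normalize fields, reject leads without a valid email."""
    email = (raw.get("email") or "").strip().lower()
    if not EMAIL_RE.match(email):
        return None
    return {
        "name": (raw.get("name") or "").strip().title(),
        "email": email,
        "employees": int(raw.get("employees") or 0),
    }


def segment(lead):
    """Step 2 (segmentation): rule-based stand-in for a machine-learning model."""
    if lead["employees"] >= 1000:
        return "enterprise"
    if lead["employees"] >= 100:
        return "mid-market"
    return "small-business"


def process(raw_leads):
    """Run both steps: drop invalid leads, then attach a segment to the rest."""
    processed = []
    for raw in raw_leads:
        lead = clean_lead(raw)
        if lead is not None:
            processed.append({**lead, "segment": segment(lead)})
    return processed


# Made-up sample data standing in for the incoming raw lead file.
raw_leads = [
    {"name": "  ada lovelace ", "email": "Ada@Example.com ", "employees": "2500"},
    {"name": "no email", "email": "not-an-email", "employees": "10"},
    {"name": "grace hopper", "email": "grace@example.org", "employees": 120},
]
processed_leads = process(raw_leads)
```

In a real data lake the same two steps would run as distributed jobs on the cluster rather than over an in-memory list.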
Architecture Guidelines
On-premise Data Lakes
[Diagram]
Data sources: On-Premise, Cloud
Stages: Ingest, Prepare, Process, Access, Consume
Layers: Governance, Processing, Storage
Environment: On-prem Data Lake
Hybrid Data Lakes
[Diagram]
Data sources: On-Premise, Cloud
Stages: Ingest, Prepare, Process, Access, Consume
Layers: Governance, Cloud Processing, Processing, Cloud Storage, Storage
Environments: On-prem Data Lake, Cloud Data Lake
Distribute
Cloud Data Lakes – A Concrete Example
[Diagram]
Data sources: On-Premise, Cloud
Stages: Ingest, Prepare, Process, Access, Consume
Layers: Governance, Cloud Processing, Cloud Storage
Services: S3, EMR, Cloud Storage, Cloud Dataflow, Azure DL Store, HDInsight
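The services named on this slide pair up as one storage service and one processing service per cloud provider. A minimal sketch of that pairing as a lookup table follows. The grouping by provider and the names `CLOUD_DATA_LAKE_SERVICES` and `describe` are assumptions added for illustration; the slide itself only lists the six services.

```python
# Storage and processing services as listed on the slide, grouped by the
# provider that offers them (the grouping is an assumption, not on the slide).
CLOUD_DATA_LAKE_SERVICES = {
    "aws": {"storage": "S3", "processing": "EMR"},
    "google_cloud": {"storage": "Cloud Storage", "processing": "Cloud Dataflow"},
    "azure": {"storage": "Azure DL Store", "processing": "HDInsight"},
}


def describe(provider):
    """Return a one-line summary of the storage and processing layers for a provider."""
    services = CLOUD_DATA_LAKE_SERVICES[provider]
    return f"{provider}: store in {services['storage']}, process with {services['processing']}"


summaries = [describe(p) for p in CLOUD_DATA_LAKE_SERVICES]
```

Keeping the pairing in one table makes it easy to see that the Ingest, Prepare, Process, Access, and Consume stages stay the same while only the storage and processing layers change per provider.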
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Deliver Value Along The Way
Start with quick wins & business outcomes in mind
Establish a cadence of constantly delivering value
Focus on game-changing value drivers
Get the company onboard