Consuming The Data Lake - awstrainingday.com
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming The Data Lake
Data Lake, Reporting, Analytics, Machine Learning
AWS Data Lake
• Central Storage (scalable, secure, cost-effective): Amazon S3
• Catalog & Search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
• Access & User Interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
• Manage & Secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
• Data Ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service
• Analytics & Serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
Anti-Pattern
Everything
Query
Also an Anti-Pattern
Everything
Query
One tool to rule them all
Where do I start?
• Understand your data
• Data structure, access patterns & characteristics, temperature, cost, size
• Know your audience
• Business Users, Data Scientists, Developers
• Select the right service
Understand your Data
[Chart: storage categories (In-memory, NoSQL, Search, Warehouse, Object, Archival) plotted against data temperature (Hot data, Warm data, Cold data) and data structure (Low to High). Scales shown along the temperature axis: Latency, Data volume, Request rate, Cost / GB.]
Understand your Data
[Chart: storage categories by data temperature (Hot data, Warm data, Cold data) and data structure (Low to High), with an example service for each category. Scales shown: Latency, Data volume, Request rate, Cost / GB.]
• In-memory: Amazon ElastiCache
• Search: Amazon ES
• NoSQL: Amazon DynamoDB
• Warehouse: Amazon Redshift
• Object: Amazon S3
• Archival: Amazon Glacier
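The pairing of storage categories and services on this slide can be written down as a small lookup. The following is a minimal sketch in Python; the helper name `pick_storage` is hypothetical, and the table only restates the six categories named on the slide rather than any official selection rule.

```python
# Storage categories and the example service the slide pairs with each one.
STORAGE_SERVICES = {
    "in-memory": "Amazon ElastiCache",
    "search": "Amazon ES",
    "nosql": "Amazon DynamoDB",
    "warehouse": "Amazon Redshift",
    "object": "Amazon S3",
    "archival": "Amazon Glacier",
}


def pick_storage(category: str) -> str:
    """Return the slide's example service for a storage category (hypothetical helper)."""
    key = category.strip().lower()
    if key not in STORAGE_SERVICES:
        raise ValueError(f"unknown storage category: {category!r}")
    return STORAGE_SERVICES[key]


if __name__ == "__main__":
    for name in ("NoSQL", "Archival"):
        print(name, "->", pick_storage(name))
```

In practice the choice also depends on the other dimensions the chart names (latency, request rate, data volume, cost per GB), which this lookup deliberately leaves out.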
Who is your audience?
ROLE, PRIORITIES, NEEDS
• Data Visualizer
  Priorities: Creates engaging visual and narrative journeys for analytical solutions.
• Data Product Manager
  Priorities: Manages data as a product. Ensures freshness and consistency of data; understands lineage and compliance needs; treats DS as customers.
• DevOps Engineer
  Priorities: Monitors for reliability; quickly diagnoses deployment or availability issues.
  Needs (listed together for the three roles above): Visualization; Dashboards; Reporting; Reports (data quality, errors); Ad hoc querying; Dashboards.
• Data Scientist
  Priorities: Makes sense of data, generates and communicates insights to improve or create business processes, and creates predictive ML models to support them.
  Needs: Ad hoc querying; Robust ML tools.
• Data Engineer
  Priorities: Builds scalable pipelines; transforms and loads data into structures, complete with metadata, that can be readily consumed by DS.
  Needs: Ad hoc querying; Quick visualization.
• Business Sponsor
  Priorities: Vets prioritization and ROI, funds projects, and provides ongoing feedback.
  Needs: Reporting; Dashboards.
Enabling your Consumers
Dashboards – Reports – Ad-Hoc Analysis – Machine Learning
Dashboards
Visual Representation of key metrics that change over time
• Data structure - Low
• Usage - Near real-time visualization
• Data temperature - Hot
Available Services: AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Elasticsearch Service
Dashboards – Near Real-time
[Diagram: Data Lake on Amazon S3 (Raw Bucket) → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket → DynamoDB → Web serving layer (EC2, Containers or Serverless) → Users]
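To make the ETL-to-DynamoDB step of this flow concrete, here is a minimal sketch of rolling raw events up into per-minute metric items that a dashboard could read by key. The event fields (`metric`, `ts`, `value`) and the key names (`pk`, `sk`) are assumptions for illustration, not part of the slide; writing the items to DynamoDB is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup_per_minute(events):
    """Aggregate raw events into one item per (metric, minute).

    Each event is a dict with hypothetical fields:
    metric (str), ts (Unix seconds), value (number).
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        minute = int(event["ts"]) // 60 * 60  # truncate to the start of the minute
        bucket = totals[(event["metric"], minute)]
        bucket["count"] += 1
        bucket["sum"] += event["value"]

    items = []
    for (metric, minute), agg in sorted(totals.items()):
        stamp = datetime.fromtimestamp(minute, tz=timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
        # pk/sk are assumed partition and sort key names for the dashboard table.
        items.append({"pk": metric, "sk": stamp, "count": agg["count"], "sum": agg["sum"]})
    return items
```

Keying items by metric name and minute lets the web serving layer fetch a time range for one metric with a single key-range read, which suits hot, low-structure dashboard data.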
Dashboards + Search
[Diagram: Data Lake on Amazon S3 (Raw Bucket) → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket → DynamoDB → Users. DynamoDB Streams, AWS Lambda and Amazon Kinesis Firehose connect DynamoDB to Amazon Elasticsearch.]
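One way the stream-to-search step of this flow could look inside a Lambda function is sketched below: it converts a DynamoDB Streams event into a newline-delimited body for the Elasticsearch bulk API. The function names and the default index name are hypothetical, the sketch assumes the stream is configured to include new images, and it does not send anything (the slide also shows Kinesis Firehose as a delivery path). Older Elasticsearch versions may additionally require a `_type` in each action.

```python
import json


def _plain(attr):
    """Convert one DynamoDB-typed attribute ({"S": "x"}, {"N": "1"}, ...) to a Python value."""
    (kind, value), = attr.items()
    if kind == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if kind == "M":
        return {k: _plain(v) for k, v in value.items()}
    if kind == "L":
        return [_plain(v) for v in value]
    if kind == "NULL":
        return None
    return value  # S, BOOL and other types are passed through unchanged


def to_bulk_body(event, index="dashboard"):
    """Build an Elasticsearch bulk request body from a DynamoDB Streams event."""
    lines = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        # Document id: key attribute values joined in attribute-name order.
        doc_id = "|".join(str(_plain(v)) for _, v in sorted(change["Keys"].items()))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY; assumes NewImage is present in the stream record
            doc = {k: _plain(v) for k, v in change["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""
```

Using the table key as the document id keeps the search index idempotent: a replayed stream record overwrites the same document instead of creating a duplicate.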
Ad Hoc Analysis
Information sought on an as-needed basis
• Usage - Dynamic Data Querying
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, Amazon Elasticsearch Service
Reports and Ad-Hoc Analysis
[Diagram: Data Lake on Amazon S3 (Raw Bucket) → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket. Query and reporting services shown: Athena, Amazon Redshift, Amazon Redshift Spectrum, Amazon QuickSight.]
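As an illustration of ad hoc querying against the transformed bucket, the sketch below starts an Athena query and polls until it finishes. The `athena` argument is expected to behave like `boto3.client("athena")`; the function name, database name, and output location used in any call are assumptions, and error handling is kept minimal.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start a query and wait until it finishes; return (query id, final state).

    `athena` is expected to behave like boto3.client("athena").
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, and lets the caller decide on region and credentials when creating the real client.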
Machine Learning
Data labeled with outcomes to train prediction models
• Usage - Machine learning data preparation
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon EMR, Amazon SageMaker
Machine Learning
[Diagram: Data Lake on Amazon S3 (Raw Bucket) → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket. Users work with the data through Amazon EMR and Amazon SageMaker.]
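The data preparation step for labeled data can be sketched as a reproducible train/validation split followed by writing header-less CSV with the label in the first column, a layout that some SageMaker built-in algorithms accept for CSV training input (check the documentation of the algorithm you use). The function names, field names, and the 80/20 split are assumptions for illustration.

```python
import csv
import io
import random


def split_labeled_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def to_label_first_csv(rows, label_key, feature_keys):
    """Render rows as header-less CSV text with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_key]] + [row[k] for k in feature_keys])
    return buffer.getvalue()
```

The resulting text could then be uploaded to the transformed data bucket for training; that upload and the training job itself are outside this sketch.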
Reports
Static representations of data rendered at a point in time
• Usage - Point in time data extraction
• Data structure - High
• Data temperature - Medium - cold
Available Services: Amazon Redshift, Amazon Athena, Amazon QuickSight
Amazon Redshift
[Diagram: Data Lake on Amazon S3 (Raw Bucket) → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket. Consumers shown: Data Scientists & Developers (Data Scientist, BI/BA Engineer) and Business Users. Services shown: Amazon Redshift, Amazon QuickSight, Amazon EMR.]
Processing & Analytics
• Transactional & RDBMS: DynamoDB (NoSQL DB), Aurora (Relational Database)
• BI & Data Visualization
• Kinesis Streams & Firehose
• Batch: EMR (Hadoop, Spark, Presto), Redshift (Data Warehouse), Athena (Query Service), AWS Batch
• Predictive
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• ElastiCache, DAX
Amazon AI – Predictive
Summary
AWS enables you to build sophisticated big data applications
• Retrospective, Real-time, Predictive
Understand who the user is
• Business users, Data Scientists, Developers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Thank you!