Consuming The Data Lake - awstrainingday.com

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Consuming The Data Lake Data Lake, Reporting, Analytics, Machine Learning

Transcript of Consuming The Data Lake - awstrainingday.com

Page 1: Consuming The Data Lake - awstrainingday.com

Consuming The Data Lake

Data Lake, Reporting, Analytics, Machine Learning

Page 2: Consuming The Data Lake - awstrainingday.com

AWS Data Lake

• Central Storage (scalable, secure, cost-effective): Amazon S3
• Catalog & Search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
• Access & User Interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
• Manage & Secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
• Data Ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service
• Analytics & Serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS

Page 3: Consuming The Data Lake - awstrainingday.com

Anti-Pattern

Everything

Query

Page 4: Consuming The Data Lake - awstrainingday.com

Also an Anti-Pattern

Everything

Query

Page 5: Consuming The Data Lake - awstrainingday.com

One tool to rule them all

Page 6: Consuming The Data Lake - awstrainingday.com

Where do I start?

• Understand your data
  • Data structure, access patterns and characteristics, temperature, cost, size
• Know your audience
  • Business users, data scientists, developers
• Select the right service

Page 7: Consuming The Data Lake - awstrainingday.com

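Understand your Data

[Figure: store types (In-memory, NoSQL, Search, Object, Warehouse, Archival) plotted against data temperature (hot, warm, cold) and data structure (low to high). From hot to cold data: latency low to high, data volume low to high, request rate high to low, cost/GB high to low.]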

Page 8: Consuming The Data Lake - awstrainingday.com

Understand your Data

• In-memory: Amazon ElastiCache
• Search: Amazon ES
• NoSQL: Amazon DynamoDB
• Object: Amazon S3
• Archival: Amazon Glacier
• Warehouse: Amazon Redshift

[Figure: the services above plotted against data temperature (hot, warm, cold) and data structure (low to high), with latency, data volume, request rate, and cost/GB as axes.]
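One reading of this slide pairs each store type with a service, and that pairing can be written down as a simple lookup. The following is a minimal sketch: the dictionary holds only the names that appear on the slide, while the helper name `service_for` and its error handling are assumptions made for illustration.

```python
# Store type -> AWS service, following the names shown on the slide.
STORE_TYPE_TO_SERVICE = {
    "in-memory": "Amazon ElastiCache",
    "search": "Amazon ES",
    "nosql": "Amazon DynamoDB",
    "object": "Amazon S3",
    "archival": "Amazon Glacier",
    "warehouse": "Amazon Redshift",
}


def service_for(store_type: str) -> str:
    """Return the service paired with a store type (hypothetical helper)."""
    key = store_type.strip().lower()
    if key not in STORE_TYPE_TO_SERVICE:
        raise ValueError(f"unknown store type: {store_type!r}")
    return STORE_TYPE_TO_SERVICE[key]


if __name__ == "__main__":
    for store_type in ("In-memory", "NoSQL", "Archival"):
        print(store_type, "->", service_for(store_type))
```

A lookup like this only captures the store type; the other dimensions on the slide (latency, request rate, data volume, cost per GB) still have to be weighed for each workload.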

Page 9: Consuming The Data Lake - awstrainingday.com

Who is your audience?

Page 10: Consuming The Data Lake - awstrainingday.com

ROLE: Data Visualizer
PRIORITIES: Creating engaging visual and narrative journeys for analytical solutions

ROLE: Data Product Manager
PRIORITIES: Manages data as a product. Ensures freshness and consistency of data; understands lineage and compliance needs; treats data scientists as customers

ROLE: DevOps Engineer
PRIORITIES: Monitoring for reliability, quickly diagnosing deployment or availability issues

NEEDS (listed together for the three roles above): Visualization; Dashboards; Reporting; Reports – data quality, errors; Ad hoc querying; Dashboards

ROLE: Data Scientist
PRIORITIES: Makes sense of data, generates and communicates insights to improve or create business processes, creates predictive ML models to support them
NEEDS: Ad hoc querying; Robust ML tools

ROLE: Data Engineer
PRIORITIES: Builds scalable pipelines, transforms and loads data into structures complete with metadata that can be readily consumed by data scientists
NEEDS: Ad hoc querying; Quick visualization

ROLE: Business Sponsor
PRIORITIES: Vetting the prioritization and ROI, funding projects, providing ongoing feedback
NEEDS: Reporting; Dashboards

Page 11: Consuming The Data Lake - awstrainingday.com

Enabling your Consumers

Dashboards – Reports – Ad-Hoc Analysis – Machine Learning

Page 12: Consuming The Data Lake - awstrainingday.com

Dashboards

Visual Representation of key metrics that change over time

• Data structure - Low

• Usage - Near real-time visualization

• Data temperature - Hot

Available Services: AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Elasticsearch Service

Page 13: Consuming The Data Lake - awstrainingday.com

Dashboards – Near Real-time

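[Architecture diagram: in the data lake, ETL (Amazon EMR or AWS Glue) moves data from the Amazon S3 raw bucket to the transformed data bucket. The transformed data is loaded into DynamoDB, and users reach it through a web serving layer (EC2, containers, or serverless).]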
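To make the hand-off between the ETL step and DynamoDB in this architecture more concrete, the sketch below rolls raw events up into per-minute counts that a dashboard could read. It is an illustration under stated assumptions, not the deck's implementation: the event fields (`metric`, `ts`), the item layout, and the function names are all hypothetical, and writing the items to a table is left out.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(epoch_seconds):
    """Truncate a Unix timestamp to its UTC minute, formatted as YYYY-MM-DDTHH:MM."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M")


def aggregate_per_minute(events):
    """Count events per (metric, minute) and return one item per pair.

    Each item is shaped for a key-value table keyed by metric (partition key)
    and minute (sort key); that key design is an assumption for this sketch.
    """
    counts = Counter((e["metric"], minute_bucket(e["ts"])) for e in events)
    return [
        {"metric": metric, "minute": minute, "count": count}
        for (metric, minute), count in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample_events = [
        {"metric": "page_view", "ts": 0},
        {"metric": "page_view", "ts": 59},
        {"metric": "page_view", "ts": 60},
        {"metric": "signup", "ts": 61},
    ]
    for item in aggregate_per_minute(sample_events):
        print(item)
```

Pre-aggregating in the ETL step keeps the web serving layer's reads small, which is one way to keep a dashboard close to real time.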

Page 14: Consuming The Data Lake - awstrainingday.com

Dashboards + Search

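[Architecture diagram: in the data lake, ETL (Amazon EMR or AWS Glue) moves data from the Amazon S3 raw bucket to the transformed data bucket, which feeds DynamoDB for users. DynamoDB Streams, AWS Lambda, and Amazon Kinesis Firehose carry changes on to Amazon Elasticsearch for search.]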
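One plausible way to connect DynamoDB Streams to Elasticsearch in this architecture is a Lambda function that turns stream records into bulk-index actions. The sketch below shows only that conversion step. The index name, the key attribute `id`, and the function names are assumptions for illustration; it also assumes the stream is configured to include new images, and it leaves out signing and sending the request to the domain. Depending on the Elasticsearch version, the action lines may also need a `_type` field.

```python
import json


def plain_value(attr):
    """Convert one DynamoDB-typed attribute, e.g. {"N": "3"}, to a Python value."""
    ((type_tag, raw),) = attr.items()
    if type_tag == "S":
        return raw
    if type_tag == "N":
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [plain_value(v) for v in raw]
    if type_tag == "M":
        return {k: plain_value(v) for k, v in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_bulk_lines(event, index="dashboard-metrics", key_attr="id"):
    """Turn a DynamoDB Streams event into newline-delimited bulk API lines."""
    lines = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        doc_id = str(plain_value(change["Keys"][key_attr]))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY: index the new version of the item
            doc = {k: plain_value(v) for k, v in change["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return lines


def handler(event, context):
    """Lambda entry point: build the bulk request body (sending it is omitted)."""
    return "\n".join(to_bulk_lines(event)) + "\n"
```

Using the item key as the document ID makes the indexing idempotent, so a retried batch overwrites documents instead of duplicating them.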

Page 15: Consuming The Data Lake - awstrainingday.com

Ad Hoc Analysis

Information sought on an as-needed basis

• Usage - Dynamic Data Querying

• Data structure - Case based

• Data temperature - Medium - cold

Available Services: Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, Amazon Elasticsearch Service

Page 16: Consuming The Data Lake - awstrainingday.com

Reports and Ad-Hoc Analysis

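[Architecture diagram: in the data lake, ETL (Amazon EMR or AWS Glue) moves data from the Amazon S3 raw bucket to the transformed data bucket. Amazon Redshift (with Redshift Spectrum) or Athena queries the transformed data, and Amazon QuickSight presents the results.]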
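For the Athena path in this architecture, an ad hoc query can be submitted from code as well as from the console. The sketch below builds the request for boto3's Athena `start_query_execution` call. The database, table, partition column, and results bucket are hypothetical names chosen for illustration, and `boto3` is imported only inside `run_query` so the request-building part can be read and tested without AWS access.

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical table in the transformed data bucket, partitioned by a `dt` column.
TOP_PAGES_SQL = """
SELECT page, COUNT(*) AS views
FROM page_views
WHERE dt = '2019-03-01'
GROUP BY page
ORDER BY views DESC
LIMIT 10
"""

if __name__ == "__main__":
    request = build_athena_request(
        TOP_PAGES_SQL, "datalake_transformed", "s3://example-athena-results/"
    )
    print(request["QueryExecutionContext"])
```

Filtering on a partition column, as the sample query does, limits how much data Athena has to scan, which matters because the transformed bucket holds warm-to-cold data at volume.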

Page 17: Consuming The Data Lake - awstrainingday.com

Machine Learning

Data labeled with outcomes to train prediction models

• Usage - Machine learning data preparation

• Data structure - Case based

• Data temperature - Medium - cold

Available Services: Amazon EMR, Amazon SageMaker

Page 18: Consuming The Data Lake - awstrainingday.com

Machine Learning

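[Architecture diagram: in the data lake, ETL (Amazon EMR or AWS Glue) moves data from the Amazon S3 raw bucket to the transformed data bucket. Users work on the transformed data with Amazon EMR and Amazon SageMaker.]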
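Before training, labeled data from the transformed bucket usually has to be split and laid out in the format the training tool expects. The sketch below is one such preparation step, assuming a header-less CSV with the label in the first column, a layout that some SageMaker built-in algorithms (XGBoost, for example) accept. The field names, the 80/20 split, and the function names are assumptions for illustration; uploading the files to S3 is left out.

```python
import csv
import io
import random


def split_rows(rows, validation_fraction=0.2, seed=42):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def to_label_first_csv(rows, label, features):
    """Render rows as header-less CSV text with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label]] + [row[name] for name in features])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical labeled records: `churned` is the outcome to predict.
    records = [{"churned": i % 2, "visits": i, "spend": i * 1.5} for i in range(10)]
    train, validation = split_rows(records)
    print(len(train), "training rows,", len(validation), "validation rows")
    print(to_label_first_csv(train[:2], "churned", ["visits", "spend"]))
```

Fixing the shuffle seed keeps the split reproducible, so a model retrained on the same data sees the same training and validation rows.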

Page 19: Consuming The Data Lake - awstrainingday.com

Reports

Static representations of data rendered at a point in time

• Usage - Point in time data extraction

• Data structure - High

• Data temperature - Medium - cold

Available Services: Amazon Redshift, Amazon Athena, Amazon QuickSight

Page 20: Consuming The Data Lake - awstrainingday.com

Amazon Redshift

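[Architecture diagram: in the data lake, ETL (Amazon EMR or AWS Glue) moves data from the Amazon S3 raw bucket to the transformed data bucket. Business users reach the data through Amazon Redshift or Amazon QuickSight; data scientists and developers (data scientist, BI/BA, engineer) work with Amazon Redshift and Amazon EMR.]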

Page 21: Consuming The Data Lake - awstrainingday.com

Processing & Analytics

• Transactional & RDBMS: DynamoDB (NoSQL DB), Aurora (relational database), ElastiCache, DAX
• Kinesis Streams & Firehose
• Batch: EMR (Hadoop, Spark, Presto), Redshift (data warehouse), Athena (query service), AWS Batch
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• Predictive
• BI & Data Visualization

Page 22: Consuming The Data Lake - awstrainingday.com

Amazon AI – Predictive

Page 23: Consuming The Data Lake - awstrainingday.com

Summary

AWS enables you to build sophisticated big data applications

• Retrospective, Real-time, Predictive

Understand who the user is

• Business user, Data Scientist & Developers

Use the right tool for the job

• Data structure, latency, throughput, access patterns

Leverage AWS managed services

• Scalable/elastic, available, reliable, secure, no/low admin

Page 24: Consuming The Data Lake - awstrainingday.com

Thank you!