Welcome & AWS Big Data Solution Overview

AWS Big Data Solution Overview Ivan Cheng (鄭志帆) AWS Solutions Architect

Transcript of Welcome & AWS Big Data Solution Overview

Page 1: Welcome & AWS Big Data Solution Overview

AWS Big Data Solution Overview

Ivan Cheng (鄭志帆)

AWS Solutions Architect

Page 2: Welcome & AWS Big Data Solution Overview

What is Big Data?

When your data sets become so large and complex

you have to start innovating around how to

collect, store, process, analyze, and share them.

Page 3: Welcome & AWS Big Data Solution Overview

[Chart: data volume growing from GB and TB through PB and EB to ZB]

Big Data: Unconstrained Growth

Unstructured data growth is explosive

95% of the 1.2 zettabytes of data in the digital universe is unstructured

Machine data and IoT will only steepen the curve

70% of this data is user-generated content

Source: IDC, The Internet of Things: Getting Ready to Embrace Its Impact on the Digital Economy, March 2016.

Page 4: Welcome & AWS Big Data Solution Overview

The Cloud Was Built for Big Data

Page 5: Welcome & AWS Big Data Solution Overview

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= The cloud removes constraints

Page 6: Welcome & AWS Big Data Solution Overview

Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & insights

Start here with a business case

Time to answer (latency)

Cost

Page 7: Welcome & AWS Big Data Solution Overview

Evolution of Analytics

Retrospective analysis and reporting

Here-and-now real-time processing and dashboards

Predictions to enable smart applications

Page 8: Welcome & AWS Big Data Solution Overview

AWS Big Data Benefits

Immediate Availability. Deploy instantly. No hardware to procure, no infrastructure to maintain & scale.

Broad & Deep Capabilities. Over 50 services and 100s of features to support virtually any big data application & workload.

Trusted & Secure. Designed to meet the strictest requirements. Continuously audited, including certifications such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.

Hundreds of Partners & Solutions. Get help from a consulting partner or choose from hundreds of tools and applications across the entire data management stack.

Page 9: Welcome & AWS Big Data Solution Overview

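[Diagram: AWS big data services grouped under Collect, Store, and Analyze]

Services shown: AWS Data Pipeline, AWS Database Migration Service, Amazon EMR, Amazon Glacier, Amazon S3, Amazon Kinesis, AWS Direct Connect, Amazon Machine Learning, Amazon Redshift, Amazon DynamoDB, AWS IoT, AWS Snowball, Amazon QuickSight, Amazon Athena, Amazon EC2, Amazon Elasticsearch Service, AWS Lambda, AWS Glue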

Page 10: Welcome & AWS Big Data Solution Overview

Key AWS Certifications and Assurance Programs

Page 11: Welcome & AWS Big Data Solution Overview

AWS Big Data Customer Success

Page 12: Welcome & AWS Big Data Solution Overview

AWS Big Data Partners

Page 13: Welcome & AWS Big Data Solution Overview

AWS Big Data Service Overview

Page 14: Welcome & AWS Big Data Solution Overview

AWS Database Migration Service

AWS Direct Connect

AWS Import/Export & Snowball

AWS Storage Gateway

Data Movement

Page 15: Welcome & AWS Big Data Solution Overview

Storage and Databases

Page 16: Welcome & AWS Big Data Solution Overview

• Store an unlimited number of objects

• Designed for 99.999999999% durability

• Serves as a data lake, integrating with other AWS services (Amazon Kinesis, Amazon Redshift, Amazon EMR, etc.)

• Low cost with tiered storage (Standard, IA, Amazon Glacier) via lifecycle policies

• Secure – SSL in transit, client-side or server-side encryption at rest

Amazon S3
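
The tiered-storage bullet can be sketched as a lifecycle configuration. This is a minimal sketch, not an official example: the rule ID, the `logs/` prefix, and the 30-day and 90-day thresholds are illustrative assumptions. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the call itself is left as a comment so the snippet runs without AWS credentials.

```python
import json


def build_lifecycle_config(prefix, ia_after_days, glacier_after_days):
    """Build a lifecycle configuration that tiers objects down over time.

    All argument values are hypothetical; tune them to your access patterns.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-" + prefix.strip("/"),  # illustrative rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Standard -> Infrequent Access -> Glacier
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_config("logs/", ia_after_days=30, glacier_after_days=90)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the configuration could be
# applied roughly like this (the bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the policy as plain data makes it easy to review and version before it is applied to a bucket.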

Page 17: Welcome & AWS Big Data Solution Overview

• Fully managed NoSQL database

• Fast, consistent performance (single-digit millisecond latency at any scale)

• Highly scalable - automatic scaling of throughput capacity

• Highly available and durable

• Store an unlimited amount of data

Amazon DynamoDB

Page 18: Welcome & AWS Big Data Solution Overview

• Fully Managed Relational Database Service

• MySQL and PostgreSQL compatible relational database with up to 5x better performance running on the same hardware

• Security, availability, and reliability of commercial databases at 1/10th the cost

• Designed to offer greater than 99.99% availability

• Automatically grows storage as needed, from 10 GB up to 64 TB

• Achieve up to 500,000 reads and 100,000 writes per second

Amazon Aurora

Page 19: Welcome & AWS Big Data Solution Overview

• Fully managed, petabyte-scale, relational, massively parallel processing (MPP) data warehouse

• Built-in end-to-end security, including SSL connections and cluster encryption

• Fault-tolerant - automatically recovers from disk and node failures

• Data automatically backed up to Amazon S3

• $1,000/TB/year; start at $0.25/hour. Provision in minutes; scale from 160 GB to 2 PB of compressed data with just a few clicks

Amazon Redshift

Page 20: Welcome & AWS Big Data Solution Overview

Analytic Frameworks

Page 21: Welcome & AWS Big Data Solution Overview

• Managed Hadoop framework

• Apache Hadoop, Hive, Spark, Zeppelin, Presto, HBase, Phoenix, Tez, Flink, etc.

• Auto Scaling clusters with support for On-Demand and Spot pricing

• Support for end-to-end encryption, IAM/VPC, and S3 client-side encryption with customer-managed keys and AWS KMS

• Integrates with Amazon S3, Amazon DynamoDB, Amazon Kinesis, and Amazon Redshift

Amazon EMR

Page 22: Welcome & AWS Big Data Solution Overview

[Diagram: Amazon EMR (including Pig) accessing Amazon S3 through EMRFS]

Amazon EMR

Page 23: Welcome & AWS Big Data Solution Overview

• Fully managed, reliable, and scalable Elasticsearch service

• Support for the ELK stack (Elasticsearch, Logstash, Kibana)

• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)

• Use cases: log analytics, full-text search, application monitoring, and more

Amazon Elasticsearch Service

Page 24: Welcome & AWS Big Data Solution Overview

• Serverless query service for querying data in S3 using standard SQL, with no infrastructure to manage

• Supports multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, and Parquet

• Pay per query, based on the amount of data scanned. If you compress your data, you pay less and your queries run faster

Amazon

Athena
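
The pay-per-data-scanned point can be made concrete with a little arithmetic. The sketch below is illustrative only: the price per terabyte, the table size, and the 4:1 compression ratio are assumptions that do not come from the slide, so check current pricing before relying on any number.

```python
TB = 1024 ** 4  # bytes per terabyte (binary), used for the estimate


def estimate_query_cost(bytes_scanned, price_per_tb_usd):
    """Estimate the cost of one query from the amount of data it scans."""
    return bytes_scanned / TB * price_per_tb_usd


PRICE_PER_TB_USD = 5.0   # assumption, not from the slide
raw_bytes = 2 * TB       # hypothetical uncompressed table size
compression_ratio = 4    # hypothetical, depends on data and codec

uncompressed = estimate_query_cost(raw_bytes, PRICE_PER_TB_USD)
compressed = estimate_query_cost(raw_bytes / compression_ratio, PRICE_PER_TB_USD)

print(f"full scan, uncompressed: ${uncompressed:.2f}")
print(f"full scan, compressed:   ${compressed:.2f}")
```

Under these assumptions the compressed scan costs a quarter as much, because the charge follows bytes scanned rather than rows returned.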

Page 25: Welcome & AWS Big Data Solution Overview

Familiar Technologies Under the Covers

Presto: used for SQL queries

In-memory distributed query engine

ANSI-SQL compatible with extensions

Hive: used for DDL functionality

Complex data types

Multitude of formats

Supports data partitioning

Page 26: Welcome & AWS Big Data Solution Overview

• Fast and cloud-powered Business Analytics

• Easy to use, no infrastructure to manage

• Quick calculations with SPICE

• 1/10th the cost of legacy BI software

• Accessed from any browser or mobile device

Amazon QuickSight

Page 27: Welcome & AWS Big Data Solution Overview

• Fully managed ETL (extract, transform, load) service

• Integrated data catalog, automatic schema discovery, ETL code generation, flexible job scheduler

• Integrated across a wide range of AWS services (Amazon RDS, databases running on Amazon EC2, Amazon Athena, etc.)

AWS Glue

Page 28: Welcome & AWS Big Data Solution Overview

1. Build your data catalog

2. Generate and Edit Transformations

3. Schedule and Run Your Jobs

How AWS Glue Works
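
To make step 1 more concrete, the toy sketch below infers a column-to-type mapping from sample records, which is the general idea behind automatic schema discovery. It is an illustration only and does not use or reproduce AWS Glue's crawlers; `infer_schema`, the type names, and the sample records are hypothetical.

```python
def infer_type(value):
    """Map a Python value to a coarse column type name."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Infer one type per column, widening when records disagree."""
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                schema.setdefault(column, None)  # remember the column, type unknown
                continue
            seen = schema.get(column)
            current = infer_type(value)
            if seen is None or seen == current:
                schema[column] = current
            elif {seen, current} == {"bigint", "double"}:
                schema[column] = "double"  # widen integers to floating point
            else:
                schema[column] = "string"  # fall back to the most general type
    # Columns that only ever held None default to string
    return {column: (col_type or "string") for column, col_type in schema.items()}


records = [
    {"user_id": 1, "amount": 10, "country": "TW", "vip": False},
    {"user_id": 2, "amount": 12.5, "country": "US", "vip": True},
    {"user_id": 3, "amount": None, "country": "JP", "vip": False},
]
schema = infer_schema(records)
print(schema)
```

A catalog built this way gives the later steps (generating transformations, scheduling jobs) a schema to work against.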

Page 29: Welcome & AWS Big Data Solution Overview

Real-time Analytics

Page 30: Welcome & AWS Big Data Solution Overview

• Fully managed streaming application

• Scalable – handle any amount of streaming data

• Ingest, buffer and process data in real-time

• React quickly – derive insight in seconds

Amazon

Kinesis

Page 31: Welcome & AWS Big Data Solution Overview

Amazon Kinesis

Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data

Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service

Amazon Kinesis Analytics: Easily analyze data streams using standard SQL queries

Page 32: Welcome & AWS Big Data Solution Overview

Amazon Kinesis Streams

• Reliably ingest and durably store streaming data at low

cost

• Build custom real-time applications to process

streaming data
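
A custom producer for a stream mostly comes down to choosing a partition key. The sketch below builds the parameters of a `PutRecord` request (`StreamName`, `Data`, `PartitionKey`) and shows how a partition key is hashed with MD5 into a 128-bit hash key that selects a shard. The stream name, the event fields, and the assumption that shards split the hash key space evenly are illustrative; after resharding, real shard ranges may be uneven.

```python
import hashlib
import json

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def hash_key(partition_key):
    """Hash a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming shards split the hash key space evenly."""
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE


def build_put_record(stream_name, event, partition_key):
    """Build the parameters for a PutRecord request (not sent here)."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": partition_key,
    }


# Hypothetical stream and event
request = build_put_record(
    "example-clickstream", {"device": "device-42", "action": "view"}, "device-42"
)
print(request["PartitionKey"], "-> shard", shard_for("device-42", shard_count=4))
```

Records with the same partition key always land on the same shard, so a key such as a device or user ID preserves per-key ordering, while a skewed key can create a hot shard.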

Page 33: Welcome & AWS Big Data Solution Overview

Amazon Kinesis Firehose

Reliably ingest and deliver batched, compressed, and encrypted

data to S3, Amazon Redshift, and Amazon Elasticsearch Service
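
The batching and compression behavior can be illustrated with a small buffer that flushes one gzip-compressed batch whenever a record-count or size threshold is reached. This is a conceptual sketch, not Firehose's implementation: `BatchBuffer`, its thresholds, and the `deliver` callback are hypothetical, and time-based flushing and encryption are left out for brevity.

```python
import gzip


class BatchBuffer:
    """Collects records and delivers one gzip-compressed batch per threshold."""

    def __init__(self, max_records, max_bytes, deliver):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.deliver = deliver  # callback standing in for the delivery destination
        self._records = []
        self._size = 0

    def add(self, record):
        """Buffer one record (bytes) and flush if a threshold is reached."""
        self._records.append(record)
        self._size += len(record)
        if len(self._records) >= self.max_records or self._size >= self.max_bytes:
            self.flush()

    def flush(self):
        """Compress and deliver whatever is buffered, then reset."""
        if not self._records:
            return
        payload = gzip.compress(b"\n".join(self._records) + b"\n")
        self.deliver(payload)
        self._records = []
        self._size = 0


delivered = []
buffer = BatchBuffer(max_records=3, max_bytes=1024, deliver=delivered.append)
for i in range(7):
    buffer.add(f'{{"event": {i}}}'.encode("utf-8"))
buffer.flush()  # deliver the final partial batch

print(len(delivered), "batches delivered")
```

Larger batches mean fewer, bigger objects at the destination, at the price of higher delivery latency.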

Page 34: Welcome & AWS Big Data Solution Overview

Amazon Kinesis Analytics

Interact with streaming data in real time using SQL
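
A typical streaming query is a windowed aggregate, for example a count per key per minute. The sketch below shows the idea of a tumbling (fixed, non-overlapping) window in plain Python; it is a stand-in for what would be written as SQL in the service, and the function name and sample events are hypothetical.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (epoch_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [(0, "click"), (12, "view"), (59, "click"), (60, "click"), (125, "view")]
result = tumbling_window_counts(events, window_seconds=60)
for start, counts in result.items():
    print(start, counts)
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from a sliding one.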

Page 36: Welcome & AWS Big Data Solution Overview

Modern Data Analytics Architecture on AWS

Page 37: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

Page 38: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

Page 39: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

Page 40: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Scale (Batch): Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, Amazon S3 (Staged Data / Data Lake)

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS

Page 41: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Scale (Batch): Amazon S3 (Raw Data), Amazon EMR (ETL), Amazon S3 (Staged Data / Data Lake)

Advanced Analytics: MLlib, Deep Learning, Amazon ML

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS

Page 42: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Scale (Batch): Amazon S3 (Raw Data), Amazon EMR (ETL), Amazon S3 (Staged Data / Data Lake)

Advanced Analytics: MLlib, Deep Learning, Amazon ML

Serving: Data Warehouse (Amazon Redshift), Legacy Apps (Amazon RDS), Schemaless (Amazon Elasticsearch), Direct Query (Amazon Athena), Near-Zero Latency (Amazon DynamoDB), Semi/Unstructured (Amazon EMR)

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS

Page 43: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Scale (Batch): Amazon S3 (Raw Data), Amazon EMR (ETL), Amazon S3 (Staged Data / Data Lake)

Advanced Analytics: MLlib, Deep Learning, Amazon ML

Serving: Data Warehouse (Amazon Redshift), Legacy Apps (Amazon RDS), Schemaless (Amazon Elasticsearch), Direct Query (Amazon Athena), Near-Zero Latency (Amazon DynamoDB), Semi/Unstructured (Amazon EMR)

Amazon QuickSight, Amazon API Gateway

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS

Page 44: Welcome & AWS Big Data Solution Overview

Modern data architecture: insights to enhance business applications, new digital services

Layers: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving

Data sources: Transactions, Web logs / cookies, ERP, Connected devices, Social media

Ingest: AWS Database Migration, AWS Direct Connect, Internet Interfaces, Amazon Kinesis

Speed (Real-time): Event Capture (Amazon Kinesis), Stream Analysis (Amazon EMR), Event Scoring (Amazon AI), Event Handler (AWS Lambda), Response Handler (AWS Lambda)

Scale (Batch): Amazon S3 (Raw Data), Amazon EMR (ETL), Amazon S3 (Staged Data / Data Lake)

Advanced Analytics: MLlib, Deep Learning, Amazon ML

Serving: Data Warehouse (Amazon Redshift), Legacy Apps (Amazon RDS), Schemaless (Amazon Elasticsearch), Direct Query (Amazon Athena), Near-Zero Latency (Amazon DynamoDB), Semi/Unstructured (Amazon EMR)

Amazon QuickSight, Amazon API Gateway

Data analysts, Data scientists, Business users, Engagement platforms, Automation / events

AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS

Page 46: Welcome & AWS Big Data Solution Overview

Reference Architecture

Page 47: Welcome & AWS Big Data Solution Overview

Sample Reference Architecture: Data Lake

Athena, Glue

Page 48: Welcome & AWS Big Data Solution Overview

[Diagram: data lake reference architecture]

Users and Data Providers

Auto Scaling EC2 services: Analytics Apps, Data Ingestion Services, Data Catalog & Lineage Services, Cluster Mgt & Workflow Services

EMR clusters: Query Clusters, Ad Hoc Query Cluster, Batch Analytic Clusters, Normalization ETL Clusters, Optimization ETL Clusters

S3: Source of Truth, Query Optimized

Amazon Redshift: Data Marts

RDS: Shared Metastore, Reference Data

Shared Data Services

>5 PB, up to 75 billion events per day

Page 49: Welcome & AWS Big Data Solution Overview

Enterprise Data Warehouse

[Diagram components: Data Sources, Amazon S3, Amazon EMR, Amazon S3, Amazon Redshift, Amazon QuickSight, Amazon Athena (shown twice)]

Page 50: Welcome & AWS Big Data Solution Overview

Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Outcomes & insights

Personalized recommendations within seconds (from 15-20 min)

Scale the expertise of stylists to all shoppers

Reduce costs by 2 orders of magnitude

[Diagram components: Mobile Users, Desktop Users, Online Stylist, Analytics Tools, Amazon Kinesis, AWS Lambda (shown twice), Amazon DynamoDB, Amazon S3 Data Storage, Amazon Redshift]

NORDSTROM
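
One plausible reading of the Kinesis, Lambda, and DynamoDB components in this slide is a Lambda function that consumes stream records and keeps a per-user view up to date. The sketch below decodes records in the documented Kinesis event shape for Lambda (`Records[].kinesis.data`, base64-encoded). Everything else is an assumption: the `user_id` and `item_id` fields, the `make_kinesis_event` helper, and the in-memory dictionary standing in for a DynamoDB table. It is not Nordstrom's actual implementation.

```python
import base64
import json


def decode_kinesis_records(event):
    """Yield the JSON payload of each Kinesis record in a Lambda event."""
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(payload)


def handler(event, context, store=None):
    """Append each viewed item to the user's recent-activity list.

    `store` is a plain dict standing in for a DynamoDB table (hypothetical).
    """
    store = {} if store is None else store
    for activity in decode_kinesis_records(event):
        store.setdefault(activity["user_id"], []).append(activity["item_id"])
    return store


def make_kinesis_event(payloads):
    """Build a minimal test event in the Kinesis-to-Lambda record shape."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(
                json.dumps(p).encode("utf-8")).decode("ascii")}}
            for p in payloads
        ]
    }


event = make_kinesis_event([
    {"user_id": "u1", "item_id": "shoe-9"},
    {"user_id": "u1", "item_id": "bag-2"},
    {"user_id": "u2", "item_id": "hat-5"},
])
recent = handler(event, context=None)
print(recent)
```

Processing each event as it arrives, rather than in periodic batches, is the kind of change that can move recommendations from minutes to seconds.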

Page 51: Welcome & AWS Big Data Solution Overview

Big Data on AWS:

https://aws.amazon.com/big-data/

Page 52: Welcome & AWS Big Data Solution Overview

Thank you!