Big “Serverless” Data - Big Data Days · Serverless in big data processing Amazon Kinesis Video...

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Eric Johnson Senior Developer Advocate - Serverless AWS @edjgeek Big “Serverless” Data Powering Big Data with Serverless Background Image by Эдуард Ризванов from Pixabay


Page 2:

Who am I?

• Sr. Developer Advocate – Serverless, AWS

• Serverless / Tooling / Automation Geek

• Software Architect / Solutions Architect

• Husband to Brigitte

• Father to Noah, Jake, Owen, Sophie Anne, & Gracie Mae

• Music lover

• Pizza / Diet Dr. Pepper fanatic

Page 3:

Why are we here?

Page 4:

Serverless in big data processing

Amazon Kinesis Video Streams

Amazon Kinesis Data Streams

Amazon Kinesis Data Firehose

Amazon Kinesis Data Analytics

Amazon Athena

AWS Lambda

Amazon Simple Storage Service

Amazon DynamoDB

Understanding the role Serverless plays in Big Data

Page 5:

Agenda

Ingestion

Real-time processing

Real-time analytics

Post processing

Page 6:

What is serverless?

No infrastructure provisioning, no management

Automatic scaling

Pay for value

Highly available and secure

Page 7:

Ingestion

Page 8:

Ingesting data at scale

Video ingestion: Amazon Kinesis Video Streams

Data ingestion: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose

Page 9:

Video ingestion

• Fully managed infrastructure that scales to load

• Offers SDK in C++ and Java

• Supports live and on-demand playback of streams

• Durable storage using Amazon S3

• Works with many forms of time-encoded data

• Supports multiple time-code-based formats

Amazon Kinesis Video Streams

Page 10:

Data ingestion – Kinesis Data Streams

• Uses shards to scale

• 1 MB or 1,000 records/second/shard ingress

• 2 MB/second/shard egress

• Works with Kinesis Data Analytics

• Can support connected consumers for enhanced fanout

• Can store data up to 168 hours (7 days)

Amazon Kinesis Data Streams
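The per-shard limits above turn into a simple sizing calculation. The following is a minimal sketch, assuming the figures quoted on the slide (1 MB/second or 1,000 records/second in, 2 MB/second out); the function name and the sample workload are hypothetical.

```python
import math

# Per-shard limits as quoted on the slide.
INGRESS_MB_PER_SHARD = 1.0
INGRESS_RECORDS_PER_SHARD = 1000
EGRESS_MB_PER_SHARD = 2.0


def shards_needed(ingress_mb_per_s, records_per_s, egress_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(ingress_mb_per_s / INGRESS_MB_PER_SHARD),
        math.ceil(records_per_s / INGRESS_RECORDS_PER_SHARD),
        math.ceil(egress_mb_per_s / EGRESS_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/s written, 2,500 records/s, 6 MB/s read.
    print(shards_needed(3.5, 2500, 6.0))
```

Whichever limit is hit first decides the shard count, which is why the function takes the maximum of the three.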

Page 11:

Data ingestion – Kinesis Firehose

• Auto-scales to meet load

• Different regions have different capacity

• US East: 5,000 records/second, 2,000 transactions/second, and 5 MiB/second

• Works with Kinesis Data Analytics

• Can transform data before delivery to target

• Stores data up to 24 hours on failed delivery

Amazon Kinesis Firehose
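Producers usually write to Firehose in batches. Here is a minimal sketch of that, assuming the documented PutRecordBatch limits of 500 records and 4 MiB per call; the helper names and the delivery stream name passed in are hypothetical, and retrying failed records is left out.

```python
MAX_RECORDS_PER_BATCH = 500            # PutRecordBatch record-count limit
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # PutRecordBatch payload limit (4 MiB)


def chunk_records(payloads):
    """Group byte payloads into batches that respect both limits."""
    batches, current, current_bytes = [], [], 0
    for data in payloads:
        too_many = len(current) >= MAX_RECORDS_PER_BATCH
        too_big = current_bytes + len(data) > MAX_BYTES_PER_BATCH
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"Data": data})
        current_bytes += len(data)
    if current:
        batches.append(current)
    return batches


def send(payloads, stream_name, client=None):
    """Send payloads with PutRecordBatch and return the number of failed records."""
    if client is None:
        import boto3  # imported lazily so chunk_records works without AWS access
        client = boto3.client("firehose")
    failed = 0
    for batch in chunk_records(payloads):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed
```

A non-zero return value means some records were rejected and would need to be resent.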

Page 12:

Data ingestion – Kinesis Firehose

Data Sources

• Firehose PUT APIs

• Amazon Kinesis Agent

• AWS IoT

• CloudWatch Logs

• CloudWatch Events

Targets

• Amazon S3

• Amazon Redshift

• Amazon Elasticsearch Service

Page 13:

Kinesis Data Streams vs. Kinesis Firehose

[Diagram: data producers feeding an Amazon Kinesis Data Stream, and data producers feeding Amazon Kinesis Data Firehose]

Page 14:

Kinesis Firehose

[Diagram: data producers feeding Amazon Kinesis Data Firehose]

Use Kinesis Firehose when you need:

• Ability to transform data in the stream

• Auto scaling for unpredictable load

• Multiple targets for final data
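Transforming data in the stream is typically done with a Lambda function attached to the delivery stream. The sketch below assumes the standard Firehose transformation contract (base64 `data` in, a `recordId` and `result` per record out); the `processed` field is a hypothetical enrichment, not something from the talk.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation handler: tag each JSON record, flag bad ones."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable input is reported back as failed, data unchanged.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        payload["processed"] = True  # hypothetical enrichment
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": encoded.decode("utf-8")})
    return {"records": output}
```

Every incoming `recordId` must appear in the response, which is why failed records are echoed back rather than skipped.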

Page 15:

Kinesis Data Streams

[Diagram: data producers feeding an Amazon Kinesis Data Stream]

Use Kinesis Data Streams when:

• You have semi-predictable traffic

• You need to perform real-time action on data in the stream

Page 16:

Real-time processing

Page 17:

Kinesis Data Stream + Lambda

[Diagram: Data Producers, Amazon Kinesis Data Stream, three Lambda functions, Amazon DynamoDB, Amazon Kinesis Data Stream, AWS IoT Core]

The Lambda service handles intermittent polling via the GetRecords API

Page 18:

Kinesis Data Stream + Lambda

[Diagram: Data Producers, Amazon Kinesis Data Stream, three Lambda functions, Amazon DynamoDB, Amazon Kinesis Data Stream, AWS IoT Core]

The Lambda service handles intermittent polling via the GetRecords API

All applications share 2 MB/second/shard egress
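Because the Lambda service does the polling, the function only sees batches of records. Here is a minimal sketch of such a handler, assuming JSON payloads; what is done with each reading afterwards (DynamoDB, another stream, IoT Core) is left out.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed readings."""
    readings = []
    for record in event["Records"]:
        # Kinesis delivers the record body base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        readings.append({
            "partitionKey": record["kinesis"]["partitionKey"],
            "sequenceNumber": record["kinesis"]["sequenceNumber"],
            "reading": json.loads(raw),
        })
    return readings
```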

Page 19:

Kinesis Data Stream + Enhanced Fanout + Lambda

[Diagram: Data Producers, Amazon Kinesis Data Stream, three Lambda functions, Amazon DynamoDB, Amazon Kinesis Data Stream, AWS IoT Core]

Functions triggered by consumers

Page 20:

Kinesis Data Stream + Enhanced Fanout + Lambda

[Diagram: Data Producers, Amazon Kinesis Data Stream, three Lambda functions, Amazon DynamoDB, Amazon Kinesis Data Stream, AWS IoT Core]

Functions triggered by consumers

Each consumer provides an individual 2 MB/second/shard egress
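The difference between shared and enhanced fan-out egress is easiest to see as arithmetic. This is a rough sketch using only the 2 MB/second/shard figure from the slides; the function name is hypothetical, and an even split between polling consumers is an assumption.

```python
EGRESS_MB_PER_SHARD = 2.0  # figure quoted on the slides


def per_consumer_mb_per_s(shards, consumers, enhanced_fanout):
    """Read throughput available to each consumer application, in MB/second."""
    total = shards * EGRESS_MB_PER_SHARD
    if enhanced_fanout:
        return total           # each registered consumer gets its own 2 MB/s/shard
    return total / consumers   # polling consumers split the shared limit


if __name__ == "__main__":
    # Three consumer applications reading a 4-shard stream.
    print(per_consumer_mb_per_s(4, 3, enhanced_fanout=False))
    print(per_consumer_mb_per_s(4, 3, enhanced_fanout=True))
```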

Page 21:

Video Processing

[Diagram: Amazon Kinesis Video Streams, Amazon Rekognition Video, Amazon SageMaker, S3 bucket]

Page 22:

Video Processing

[Diagram: Amazon Kinesis Video Streams, Amazon Rekognition Video, Amazon SageMaker, S3 bucket]

Real-time analysis and machine learning

Page 23:

Video Processing

[Diagram: Amazon Kinesis Video Streams, Amazon Rekognition Video, Amazon SageMaker, S3 bucket]

Real-time analysis and machine learning

HLS-compatible live or on-demand playback

Page 24:

Video Processing

[Diagram: Amazon Kinesis Video Streams, Amazon Rekognition Video, Amazon SageMaker, S3 bucket]

Real-time analysis and machine learning

HLS-compatible live or on-demand playback

Near real-time processing
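For the HLS playback path, a player needs a session URL for the stream. The sketch below shows one way to request it with boto3, assuming live playback; the `client_factory` parameter is a hypothetical hook added here so the function can be exercised without AWS access.

```python
def get_hls_url(stream_name, playback_mode="LIVE", client_factory=None):
    """Return an HLS playback URL for a Kinesis video stream."""
    if client_factory is None:
        import boto3  # imported lazily so the function can be tested with fakes
        client_factory = boto3.client
    # Step 1: ask Kinesis Video Streams which endpoint serves HLS for this stream.
    kvs = client_factory("kinesisvideo")
    endpoint = kvs.get_data_endpoint(
        StreamName=stream_name, APIName="GET_HLS_STREAMING_SESSION_URL"
    )["DataEndpoint"]
    # Step 2: request a streaming session URL from that endpoint.
    media = client_factory("kinesis-video-archived-media", endpoint_url=endpoint)
    response = media.get_hls_streaming_session_url(
        StreamName=stream_name, PlaybackMode=playback_mode
    )
    return response["HLSStreamingSessionURL"]
```

On-demand playback needs additional fragment-selection parameters that are not shown here.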

Page 25:

Real-time analytics

Page 26:

Amazon Kinesis Data Analytics

• Built-in functions to filter, aggregate, and transform streaming data

• Processes streaming data with sub-second latencies

• Build SQL queries that perform joins, aggregations over time windows and filters

• Includes open source libraries based on Apache Flink that enable you to build an application in hours instead of months

Amazon Kinesis Data Analytics

Page 27:

Real-time analytics

Amazon Kinesis Data Stream

Amazon Kinesis Data Firehose

Amazon Kinesis Data Analytics

Stream source can be Kinesis Data Stream or Firehose

Page 28:

Inside Kinesis Data Analytics

Stream data

-- Create Fail Stream --
CREATE OR REPLACE STREAM "FAIL_STREAM" (
    sensorId INT,
    currentTemperature INT,
    status VARCHAR(10)
);

CREATE OR REPLACE PUMP "FAIL_STREAM_PUMP" AS
    INSERT INTO "FAIL_STREAM"
    SELECT "sensorId", "currentTemperature", "status"
    FROM "SOURCE_SQL_STREAM_001"
    WHERE "status" SIMILAR TO '%FAIL%';

-- Create Warn Stream --
CREATE OR REPLACE STREAM "WARN_STREAM" (
    sensorId INT,
    currentTemperature INT,
    status VARCHAR(10)
);

CREATE OR REPLACE PUMP "WARN_STREAM_PUMP" AS
    INSERT INTO "WARN_STREAM"
    SELECT "sensorId", "currentTemperature", "status"
    FROM "SOURCE_SQL_STREAM_001"
    WHERE "status" SIMILAR TO '%WARN%';

FAIL_STREAM

WARN_STREAM
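To make the routing concrete, here is a plain-Python sketch of what the two pumps do to each incoming record. It only mirrors the filter logic and is not how Kinesis Data Analytics executes the SQL; the sample records are made up.

```python
def route(records):
    """Split sensor records into FAIL and WARN lists, like the two pumps."""
    streams = {"FAIL_STREAM": [], "WARN_STREAM": []}
    for record in records:
        status = record["status"]
        # The pumps are independent, so a record is tested against both filters.
        if "FAIL" in status:   # mirrors SIMILAR TO '%FAIL%'
            streams["FAIL_STREAM"].append(record)
        if "WARN" in status:   # mirrors SIMILAR TO '%WARN%'
            streams["WARN_STREAM"].append(record)
    return streams


if __name__ == "__main__":
    sample = [
        {"sensorId": 1, "currentTemperature": 70, "status": "OK"},
        {"sensorId": 2, "currentTemperature": 95, "status": "WARN"},
        {"sensorId": 3, "currentTemperature": 120, "status": "FAIL"},
    ]
    print(route(sample))
```

Records whose status matches neither pattern are simply not copied to either output stream.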

Page 29:

Inside Kinesis Data Analytics

Use SQL or Apache Flink to filter data

Page 30:

Inside Kinesis Data Analytics

FAIL_STREAM

AWS Lambda

• Alert

• Diagnose

• Remediate
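A Lambda function attached as the destination for FAIL_STREAM might look like the sketch below. It assumes the Kinesis Data Analytics output contract, in which each record carries a `recordId` and base64 `data` and the function answers `Ok` or `DeliveryFailed` per record; the `alert` helper is hypothetical and only prints.

```python
import base64
import json


def alert(row):
    """Hypothetical alert action; a real one might page someone or open a ticket."""
    print("ALERT:", json.dumps(row, sort_keys=True))


def handler(event, context):
    """Acknowledge each FAIL_STREAM row after raising an alert for it."""
    results = []
    for record in event["records"]:
        try:
            row = json.loads(base64.b64decode(record["data"]))
            alert(row)
            result = "Ok"
        except Exception:
            # Report the record as undelivered instead of failing the whole batch.
            result = "DeliveryFailed"
        results.append({"recordId": record["recordId"], "result": result})
    return {"records": results}
```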

Page 31:

Inside Kinesis Data Analytics

WARN_STREAM

Amazon Kinesis Data Stream

• Dashboards

• Consumer response

Page 32:

Real-time analytics

Amazon Kinesis Data Stream

Amazon Kinesis Data Firehose

Amazon Kinesis Data Analytics

Amazon Kinesis Data Stream

AWS Lambda

FAIL_STREAM

WARN_STREAM

What about the raw data?

Page 33:

Real-time analytics

Amazon Kinesis Data Stream

Amazon Kinesis Data Firehose

Amazon Kinesis Data Analytics

Amazon Kinesis Data Stream

Amazon Kinesis Data Firehose

AWS Lambda

FAIL_STREAM

WARN_STREAM

Raw Data Archive

Page 34:

Post processing

Page 35:

Serverless data storage

Amazon Simple Storage Service

Amazon DynamoDB

Amazon Timestream

Amazon Quantum Ledger Database

Amazon CloudWatch

Amazon Kinesis Data Firehose

Amazon Kinesis Data Streams

Amazon Kinesis Data Analytics

Page 36:

Serverless data storage

How you need to process your data determines where to store it

Page 37:

Serverless storage options

Amazon Simple Storage Service

• Object storage

• Unstructured data

Amazon DynamoDB

• NoSQL

• Key value or document data

Amazon Timestream

• Time series database

Amazon Quantum Ledger Database

• Immutable and transparent

• Cryptographically verifiable

Amazon CloudWatch

• Structured data

• Alerting built in

Page 38:

Post processing – Serverless Tools

Amazon Athena: Query S3 data with standard SQL expressions.

Amazon S3 Select: Retrieve subsets of object data, instead of the entire object.

AWS Glue: Extract, transform, and load (ETL) service that works across multiple services.

Page 39:

AWS Glue

[Diagram: five S3 buckets and five DynamoDB tables]

Other non-serverless services

• MariaDB

• Microsoft SQL Server

• MySQL

• Oracle

• PostgreSQL

Critical data can be stored in many places

Page 40:

AWS Glue

[Diagram: five S3 buckets, five DynamoDB tables, and other data stores, with a Crawler and a Data Catalog]

Page 41:

AWS Glue

[Diagram: five S3 buckets, five DynamoDB tables, and other data stores, with a Crawler and a Data Catalog]

What it is doing

• Classifies data to determine the format, schema, and associated properties of the raw data

• Groups data into tables or partitions; data is grouped based on crawler heuristics

• Writes metadata to the Data Catalog
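Setting up a crawler like the one described above can be scripted. The following is a minimal sketch with boto3, assuming S3 targets only; the crawler name, role ARN, database name, and paths are placeholders, and the optional `client` parameter is a hypothetical hook for testing.

```python
def crawler_request(name, role_arn, database, s3_paths):
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


def create_and_start(request, client=None):
    """Create the crawler, then kick off its first run."""
    if client is None:
        import boto3  # imported lazily so crawler_request works without AWS access
        client = boto3.client("glue")
    client.create_crawler(**request)
    client.start_crawler(Name=request["Name"])
```

Once the run finishes, the tables it discovered appear in the named Data Catalog database.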

Page 42:

AWS Glue

[Diagram: five S3 buckets, five DynamoDB tables, and other data stores, with a Crawler and a Data Catalog]

This catalog contains metadata about the data stores. How do I get the data itself in a meaningful way?

Page 43:

Enter: Amazon Athena

[Diagram: five S3 buckets, five DynamoDB tables, and other data stores, with a Crawler, a Data Catalog, and Amazon Athena]

Page 44:

Amazon Athena

[Diagram: five S3 buckets, five DynamoDB tables, and other data stores, with a Crawler, a Data Catalog, and Amazon Athena]

Other non-serverless services

• Amazon Aurora

• MariaDB

• Microsoft SQL Server

• MySQL

• Oracle

• PostgreSQL

Athena queries Glue Data Catalog

Glue returns data from data source

Page 45:

Amazon Athena
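Running an Athena query from code follows a start, poll, fetch pattern. Here is a minimal sketch with boto3; the database name, output location, and SQL passed in are up to the caller, and the optional `client` parameter is a hypothetical hook for testing.

```python
import time


def run_query(sql, database, output_location, client=None, poll_seconds=1.0):
    """Run an Athena query and return the first page of results as lists of strings."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a fake client
        client = boto3.client("athena")
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]
```

For a SELECT, the first returned row holds the column names; larger result sets need pagination, which this sketch skips.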

Page 46:

Question

I have HUGE compressed CSV files stored on Amazon S3.

How do I get small bits of data without reading the entire file?

Page 47:

Enter: Amazon S3 Select

import boto3

s3 = boto3.client('s3')

# Run the SQL expression against one CSV object; only matching rows come back.
r = s3.select_object_content(
    Bucket='jbarr-us-west-2',
    Key='sample-data/airportCodes.csv',
    ExpressionType='SQL',
    Expression="select * from s3object s where s.\"Country (Name)\" like '%United States%'",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

# The payload is an event stream: Records events carry rows, Stats events carry scan details.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
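The event loop above can be wrapped in a small helper that also returns the scan statistics. This is a sketch, not part of the original slide; the helper name is hypothetical. It joins the byte chunks before decoding so a multi-byte character split across two events still decodes cleanly.

```python
def read_select_payload(payload):
    """Collect rows and scan statistics from a select_object_content event stream."""
    chunks, stats = [], {}
    for event in payload:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Details holds BytesScanned, BytesProcessed, and BytesReturned.
            stats = event["Stats"]["Details"]
    text = b"".join(chunks).decode("utf-8")
    return text, stats
```

With the response `r` from the slide, `text, stats = read_select_payload(r['Payload'])` gives the matching rows as a single string, and comparing `stats['BytesReturned']` with `stats['BytesScanned']` shows how much data the query avoided transferring.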

Page 48:

Before S3 Select

[Diagram: Lambda function reading from a bucket]

Entire file returned

Page 49:

After S3 Select

[Diagram: Lambda function reading from a bucket]

Parsed value returned

Up to 400% faster and 80% cheaper

Page 50:

Questions?

Image Source: https://pixabay.com/illustrations/questions-font-who-what-how-why-2245264/

Page 51:

Eric Johnson

@edjgeek

Image Source: https://pixabay.com/illustrations/thank-you-polaroid-letters-2490552/