Build an SQL-based data processing

21

Transcript of Build an SQL-based data processing

Page 1: Build an SQL-based data processing
Page 2: Build an SQL-based data processing

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Build an SQL-based data processing pipeline in minutes

A N A 0 9

Melody Yang

Data & Analytics Specialist SA

Amazon Web Services

Page 3: Build an SQL-based data processing
Page 4: Build an SQL-based data processing

Delivery Speed Limited skillSecurity

Challenges

Page 5: Build an SQL-based data processing

10 3 Layers

70 Days7 Days

An example of the challenge

Page 6: Build an SQL-based data processing

Design principles

Requirements

Shorten analytics lifecycle

Repeatable and scalable

Use existing analysis skill to improve data quality

Design Principles

Microservices for batch and stream processes

Configuration driven and codeless ETL

SQL first approach

Page 7: Build an SQL-based data processing

An ETLframework

10 job Configuration files in S3

A job dependency definition

7 days effort70 days effort

The new solution

Page 8: Build an SQL-based data processing

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 9: Build an SQL-based data processing

Standardised API

Page 10: Build an SQL-based data processing

Standardised practice

Page 11: Build an SQL-based data processing

Security assurance

Networking Access control Visibility

• Private connection

• Isolated network

• Load balancer

• Role based control

• Secret protection

• Encryption

• Granular level logging

• Integrated alert service

• Readable business logic

Page 12: Build an SQL-based data processing

Amazon CloudWatch

Services

Amazon Elastic

Container Registry

AWS Secrets

Manager

Amazon S3

AWS Glue Data

Catalog

Amazon Athena

ARC Jupyter

Notebook services

ARC Job

services

High level architecture

Page 13: Build an SQL-based data processing

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 14: Build an SQL-based data processing

Solution workflow

AWS Open

Data Registry

PrepareBuild &

TestExecute

Store & Catalogue Consume

Page 15: Build an SQL-based data processing

Solution workflow

PrepareBuild &

TestExecute

Store & Catalogue Consume

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

Page 16: Build an SQL-based data processing

Solution workflow

PrepareBuild &

TestExecute

Store & Catalogue Consume

ARC Job

Services

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

Page 17: Build an SQL-based data processing

Solution workflow

AWS Glue

Data Catalog

Amazon S3

ARC Job

Services

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

PrepareBuild &

TestExecute

Store & Catalogue Consume

Page 18: Build an SQL-based data processing

Solution workflow

AWS Glue

Data Catalog

Amazon S3

ARC Job

Services

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

PrepareBuild &

TestExecute

Store & Catalogue Consume

Page 19: Build an SQL-based data processing

Build a data pipeline

in minutes, not weeks

Leverage

microservices

Highly secure

and transparent

• SQL first approach, no more

custom-code

• Empower business users with

self- service ETL

• Move away from monolithic

architecture with improved

productivity and speed

• Fault tolerance and easy

to scale

• Multiple layers of security

controls

• Granular level logging,

validation and alert

Summary

Page 20: Build an SQL-based data processing

Resources

AWS Reference Architecture

• SQL Based Data Processing in Amazon ECS

ARC reference

• Documentation and tutorial - https://arc.tripl.ai/tutorial/

• CI/CD Example - https://github.com/tripl-ai/deploy

• Forum - https://github.com/tripl-ai/question

• Source Code - https://github.com/tripl-ai/arc

• Docker Hub - https://hub.docker.com/u/triplai

Page 21: Build an SQL-based data processing

Thank you!

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Melody Yang

[email protected]