Build an SQL-based data processing

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Build an SQL-based data processing pipeline in minutes

A N A 0 9

Melody Yang

Data & Analytics Specialist SA

Amazon Web Services

Delivery Speed Limited skillSecurity

Challenges

10 3 Layers

70 Days7 Days

An example of the challenge

Design principles

Requirements

Shorten analytics lifecycle

Repeatable and scalable

Use existing analysis skill to improve data quality

Design Principles

Microservices for batch and stream processes

Configuration driven and codeless ETL

SQL first approach

An ETLframework

10 job Configuration files in S3

A job dependency definition

7 days effort70 days effort

The new solution

Standardised API

Standardised practice

Security assurance

Networking Access control Visibility

• Private connection

• Isolated network

• Load balancer

• Role based control

• Secret protection

• Encryption

• Granular level logging

• Integrated alert service

• Readable business logic

Amazon CloudWatch

Services

Amazon Elastic

Container Registry

AWS Secrets

Manager

Amazon S3

AWS Glue Data

Catalog

Amazon Athena

ARC Jupyter

Notebook services

ARC Job

services

High level architecture

Solution workflow

AWS Open

Data Registry

PrepareBuild &

TestExecute

Store & Catalogue Consume

Solution workflow

PrepareBuild &

TestExecute


ARC Jupyter

Notebook

Services

AWS Open

Data Registry

Solution workflow

PrepareBuild &

TestExecute


ARC Job

Services

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

Solution workflow

AWS Glue

Data Catalog

Amazon S3

ARC Job

Services

ARC Jupyter

Notebook

Services

AWS Open

Data Registry

PrepareBuild &

TestExecute


Build a data pipeline

in minutes, not weeks

Leverage

microservices

Highly secure

and transparent

• SQL first approach, no more

custom-code

• Empower business users with

self- service ETL

• Move away from monolithic

architecture with improved

productivity and speed

• Fault tolerance and easy

to scale

• Multiple layers of security

controls

• Granular level logging,

validation and alert

Summary

Resources

AWS Reference Architecture

• SQL Based Data Processing in Amazon ECS

ARC reference

• Documentation and tutorial - https://arc.tripl.ai/tutorial/

• CI/CD Example - https://github.com/tripl-ai/deploy

• Forum - https://github.com/tripl-ai/question

• Source Code - https://github.com/tripl-ai/arc

• Docker Hub - https://hub.docker.com/u/triplai

Thank you!


Melody Yang

[email protected]

Build an SQL-based data processing

Documents

Transcript of Build an SQL-based data processing