© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build an SQL-based data processing pipeline in minutes
ANA09
Melody Yang
Data & Analytics Specialist SA
Amazon Web Services
Challenges
• Delivery speed
• Limited skill
• Security
10 3 Layers
70 Days
7 Days
An example of the challenge
Design principles
Requirements
Shorten analytics lifecycle
Repeatable and scalable
Use existing analysis skill to improve data quality
Design Principles
Microservices for batch and stream processes
Configuration driven and codeless ETL
SQL first approach
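A minimal sketch of what "configuration driven" and "SQL first" can look like in practice, assuming a job is nothing more than an ordered list of named SQL stages. It uses Python's built-in sqlite3 module as a stand-in engine. The stage keys (`name`, `sql`), the table names and the `run_job` helper are hypothetical and are not the ARC configuration format.

```python
import sqlite3

# Hypothetical job configuration: the business logic lives in SQL, not in custom code.
JOB_CONFIG = {
    "stages": [
        {"name": "load_raw",
         "sql": "CREATE TABLE raw_trips (vendor TEXT, fare REAL)"},
        {"name": "seed_raw",
         "sql": "INSERT INTO raw_trips VALUES ('A', 10.0), ('A', 5.5), ('B', 7.0)"},
        {"name": "fare_by_vendor",
         "sql": "CREATE TABLE fare_by_vendor AS "
                "SELECT vendor, SUM(fare) AS total_fare "
                "FROM raw_trips GROUP BY vendor"},
    ]
}


def run_job(config, conn):
    """Execute each configured SQL stage in order and return the stage names run."""
    executed = []
    for stage in config["stages"]:
        conn.execute(stage["sql"])
        executed.append(stage["name"])
    conn.commit()
    return executed


conn = sqlite3.connect(":memory:")
stages_run = run_job(JOB_CONFIG, conn)
totals = dict(conn.execute("SELECT vendor, total_fare FROM fare_by_vendor"))
print(stages_run)
print(totals)
```

Changing the pipeline then means editing the configuration, not the runner, which is the property the design principles are after.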
An ETL framework
10 job configuration files in S3
A job dependency definition
7 days effort
70 days effort
The new solution
Standardised API
Standardised practice
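The job dependency definition mentioned for the new solution can be illustrated with a topological sort. This is a sketch only: the job names and the `execution_batches` helper are hypothetical, and it assumes the definition can be expressed as "job, then the jobs it depends on". It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency definition: each job maps to the jobs it depends on.
JOB_DEPENDENCIES = {
    "ingest_customers": set(),
    "ingest_orders": set(),
    "clean_orders": {"ingest_orders"},
    "customer_orders": {"ingest_customers", "clean_orders"},
    "daily_report": {"customer_orders"},
}


def execution_batches(dependencies):
    """Group jobs into batches; every job in a batch has all its dependencies done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the definition is circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(JOB_DEPENDENCIES)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Jobs that land in the same batch do not depend on each other, so a scheduler could run them in parallel.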
Security assurance
Networking
• Private connection
• Isolated network
• Load balancer
Access control
• Role based control
• Secret protection
• Encryption
Visibility
• Granular level logging
• Integrated alert service
• Readable business logic
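One way to picture "secret protection" alongside "granular level logging" is to keep credentials out of the job configuration and mask them before anything is logged. The sketch below assumes secret values reach the process as environment variables at runtime; the host name, the `DB_PASSWORD` variable and both helper functions are hypothetical, and this is not a description of how the ARC services handle secrets.

```python
import os
from string import Template

# Hypothetical job setting with a placeholder instead of a hard-coded password.
JDBC_URL_TEMPLATE = (
    "jdbc:postgresql://db.internal:5432/sales?user=etl&password=${DB_PASSWORD}"
)


def resolve_secrets(template_text, environment=None):
    """Fill ${NAME} placeholders from the environment; fail loudly if one is missing."""
    env = os.environ if environment is None else environment
    try:
        return Template(template_text).substitute(env)
    except KeyError as missing:
        raise RuntimeError(f"secret {missing} is not set") from None


def redact(text, environment):
    """Replace secret values with *** so the setting can be logged safely."""
    for value in environment.values():
        if value:
            text = text.replace(value, "***")
    return text


fake_env = {"DB_PASSWORD": "s3cr3t"}  # stand-in for values injected at runtime
resolved = resolve_secrets(JDBC_URL_TEMPLATE, fake_env)
print(redact(resolved, fake_env))
```

The configuration file stays safe to store and review, while the log line still shows which connection was used.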
High level architecture
Services
• Amazon CloudWatch
• Amazon Elastic Container Registry
• AWS Secrets Manager
• Amazon S3
• AWS Glue Data Catalog
• Amazon Athena
• ARC Jupyter Notebook services
• ARC Job services
Solution workflow
• Prepare: AWS Open Data Registry
• Build & Test: ARC Jupyter Notebook Services
• Execute: ARC Job Services
• Store & Catalogue: Amazon S3, AWS Glue Data Catalog
• Consume
Summary
Build a data pipeline in minutes, not weeks
• SQL first approach, no more custom code
• Empower business users with self-service ETL
Leverage microservices
• Move away from monolithic architecture with improved productivity and speed
• Fault tolerance and easy to scale
Highly secure and transparent
• Multiple layers of security controls
• Granular level logging, validation and alert
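The "validation and alert" point can be sketched as a data-quality rule written in SQL, in keeping with the SQL first approach. This is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the `trips` table, the rule itself and the `validate` helper are hypothetical. The query returns one row holding a pass/fail flag and a message that a pipeline could log or forward to an alerting service.

```python
import sqlite3

# Hypothetical validation rule, written as SQL so analysts can read and change it.
VALIDATION_SQL = """
SELECT
    SUM(CASE WHEN fare < 0 THEN 1 ELSE 0 END) = 0 AS valid,
    'rows=' || COUNT(*) || ', negative_fares=' ||
        SUM(CASE WHEN fare < 0 THEN 1 ELSE 0 END) AS message
FROM trips
"""


def validate(conn, sql):
    """Run a validation query and return (passed, message) for logging or alerting."""
    valid, message = conn.execute(sql).fetchone()
    return bool(valid), message


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vendor TEXT, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [("A", 10.0), ("B", -3.0)])

passed, message = validate(conn, VALIDATION_SQL)
print(passed, message)
```

A runner could stop the job when `passed` is false and include `message` in the alert, so the reason for the failure is visible without reading code.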
Resources
AWS Reference Architecture
• SQL Based Data Processing in Amazon ECS
ARC reference
• Documentation and tutorial - https://arc.tripl.ai/tutorial/
• CI/CD Example - https://github.com/tripl-ai/deploy
• Forum - https://github.com/tripl-ai/question
• Source Code - https://github.com/tripl-ai/arc
• Docker Hub - https://hub.docker.com/u/triplai
Thank you!
Melody Yang