AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
Introduction to Amazon DynamoDB · © 2020, Amazon Web Services, Inc. or its Affiliates. Pete...
Transcript of Introduction to Amazon DynamoDB · © 2020, Amazon Web Services, Inc. or its Affiliates. Pete...
© 2020, Amazon Web Services, Inc. or its Affiliates.
Pete NaylorSr Database Specialist SA - AWS
Introduction to Amazon DynamoDB- and some use case examples
© 2020, Amazon Web Services, Inc. or its Affiliates.
Agenda
• DynamoDB’s place in the history of purpose-built databases
• Key concepts
• How do I use DynamoDB?
- Basic data models and controls
• Challenges with changing and imbalanced loads
- And how DynamoDB deals with them
• Integrated DynamoDB architecture and data flows
© 2020, Amazon Web Services, Inc. or its Affiliates. © 2020, Amazon Web Services, Inc. or its Affiliates.
Database evolution and DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates.
2004 – Amazon.com operational database scaling woeshttps://www.cnet.com/news/amazon-com-hit-with-outages/
2007 – Whitepaper describing internal distributed database solutionhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
2012 – Reimagined implementation made available via AWS as DynamoDBhttps://aws.amazon.com/about-aws/whats-new/2012/01/18/aws-announces-dynamodb/
2018 – Amazon.com almost entirely migrated to Redshift, Aurora, DynamoDBhttps://twitter.com/ajassy/status/1060979175098437632
2019 – Continuing rapid innovation informed by unmatched experiencehttp://amzn.to/2fAJXH9
DynamoDB historyHow did we get here?
© 2020, Amazon Web Services, Inc. or its Affiliates.
1. Reduce dependence on commercially licensed relational database engines
2. Minimize operational complexity and administrative overhead
1. Reduce dependence on commercially licensed relational database engines
2. Minimize operational complexity and administrative overhead
3. Provide best possible customer experience across all data indexing needs
DynamoDB historyMotivations at Amazon
1. Reduce dependence on commercially licensed relational database engines
“A one size fits all database doesn't fit anyone”- Werner Vogels
(and elsewhere)
© 2020, Amazon Web Services, Inc. or its Affiliates.
Internet-scale applications
Users 1M+
Data volume TB-PB-EB
Locality Global
Performance Milliseconds-microseconds
Request rate Millions
Access Mobile, IoT, devices
Scale Incrementally based on traffic
Economics Pay as you go
Developer access Instant API access
© 2020, Amazon Web Services, Inc. or its Affiliates.
By: A
rturo
Par
davi
la II
Iht
tps:
//ww
w.fl
ickr
.com
/pho
tos/
apar
davi
la/a
lbum
s/72
1576
7462
4352
641
© 2020, Amazon Web Services, Inc. or its Affiliates.
Snap (Snapchat)
Database writes peak secondsafter Chicago Cubs winthe World Series.
© 2020, Amazon Web Services, Inc. or its Affiliates.
A migration from mainframe
• Retail business ran on mainframe,which became a bottleneck
• Migrated financial transaction data to DynamoDB
• Unbound scale for customers and app developers
“We built a secure and resilient cloud infrastructure that could solve the scalability and reliability problems with a serverless architecture.”
Srini UppalapatiCapital One
© 2020, Amazon Web Services, Inc. or its Affiliates.
SQL and DynamoDB side by side
Optimized for storage Optimized for compute
Normalized/relational Denormalized/hierarchical
Ad hoc queries Instantiated views for known patterns
Scale vertically Scale horizontally
Good for OLAP Built for OLTP at scale
Traditional SQL DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates.
Prioritiesü Securityü Durabilityü Scalabilityü Availabilityü Low Latencyü Easy to Useü Easy to Manage
Core Strengths of DynamoDBSounds cool, but when should I use it?
Considerations:• SQL / NoSQL• Relational / Non-Relational• Operational / Analytical
DynamoDB solves for:• Horizontal scaling• Decoupled compute/storage• Asynchronous roll-ups/aggregations• Availability/Durability/Latency• Well-known access patterns• Schema flexibility
© 2020, Amazon Web Services, Inc. or its Affiliates. © 2020, Amazon Web Services, Inc. or its Affiliates.
Key concepts
© 2020, Amazon Web Services, Inc. or its Affiliates.
Scaling databases
Traditional SQL NoSQL
DBDB
Scale up
DBP1
DBPn
DBP2
DBP3
Scale out to many shards
© 2020, Amazon Web Services, Inc. or its Affiliates.
Scaling NoSQL databases
Most NoSQL databases DynamoDB
DB
Servers and clusters DynamoDB: partitions
DBDB
DBDB
DBDB
DBDB
DB
DBP1
DBP1
DBP1
DBP1
DBP1
DBP1DB
P1DBP1
DBP1
DBP1
DBP1
DBP1DB
P1DBP1
DBP1
DBP1
DBP1
DBP1DB
P1DBP1
DBP1
DBP1
DBP1
DBP1DB
P25DBP26
DBP27
DBP28
DBP29
DBP30
Basic premise: There is a way to shard data that’s horizontally scalable.
© 2020, Amazon Web Services, Inc. or its Affiliates.
Incremental scaling with DynamoDB
Workload:data volume, reads, writes
DynamoDB resources:storage, read, and write capacity
© 2020, Amazon Web Services, Inc. or its Affiliates.
Server 1
T1.p1
Table 1 Table 2 Table 3
Server N
T1.pn
You work with tables…
1K WU or 3K RUup to 10 GB
DynamoDB does the rest under the hood…
© 2020, Amazon Web Services, Inc. or its Affiliates.
DynamoDB table
Items
Primary key – unique item identifier
Terminology
Sort key conditions:
all items==, <, >, >=, <=“begins with”“between”
sorted resultscountstop/bottom N
AttributesPartition key
- 1:1 relationships- distribute traffic- collection identity
Sort key(optional)
- 1:many relationships- collect related items- efficient filtering- sorting
© 2020, Amazon Web Services, Inc. or its Affiliates.
Global secondary index (GSI)
GSIsA5
(part.)A4
(sort)A1
(table key)A3
(projected)
Table
INCLUDE A3
A4(part.)
A5(sort)
A1(table key)
A2(projected)
A3(projected) ALL
A2(part.)
A1(table key) KEYS_ONLY
Capacity provisioned separately from table
Online indexing
A1(partition)
A2 A3 A4 A5
Alternate partition (+sort) key for alternate materialized view
Up to 20 GSIs per table
© 2020, Amazon Web Services, Inc. or its Affiliates.
Sharding/partitioning
Hash(1) = 7B
Orders
00
55
AA
FF
Partition A
Partition B
Partition C
Hash.MIN = 0
Hash.MAX = FFKeyspace
Hash(2) = 48
Hash(3) = CD
DynamoDB table
OrderId: 1CountryCode: 1ASIN: [B00X4WHP5E]
OrderId: 2CountryCode : 1ASIN: [B00OQVZDJM]
OrderId: 3CountryCode : 1ASIN: [B00U3FPN4U]
Denormalized Record
Related data is stored together for efficient access
Partition key
© 2020, Amazon Web Services, Inc. or its Affiliates.
Availability Zone A
Partition A
Host 4 Host 6
Availability Zone B Availability Zone C
Partition APartition A Partition CPartition C Partition C
Host 5
Partition B
Host 1 Host 3Host 2
Partition B
Host 7 Host 9Host 8
Partition B
CustomerOrdersTable
Across 3 Availability Zones in the Region -one replica is elected to serve as “leader”.
Three ReplicasOrderId: 1CustomerId: 1ASIN: [B00X4WHP5E]
Hash(1) = 7B
A view “from a different angle”
Partition A
© 2020, Amazon Web Services, Inc. or its Affiliates. © 2020, Amazon Web Services, Inc. or its Affiliates.
Modeling data for DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates.
Partition key
• A good sharding (partitioning) scheme affords even distribution of both data and workload as they grow
• Key concept: partition key as the dimension of scalability• Distribute traffic and data across partitions – horizontal scaling
• Ideal scaling conditions:• The partition key is from a high cardinality set (that grows)• Requests are evenly spread over the key space• Requests are evenly spread over time
© 2020, Amazon Web Services, Inc. or its Affiliates.
Denormalization
CART
CartId: 1,Version: v3
CART_ITEM
ItemId: 2, Qty: 1,CartId: 1
Traditional database
CART
CartId: 1,Version: v3,Items: [{ID: 2, Qty: 1},{ID: 5, Qty: 2}]
DynamoDB
Normalized schema: multiple items – one table per entity type Denormalized: one item
with flexible schema
© 2020, Amazon Web Services, Inc. or its Affiliates.
Ensuring data consistency on updates
Example: add/remove shopping cart items
1. get cart => vread = Version2. update cart:
IF Version = vreadadd/remove cart items++Version
ELSE go back to Step 1.
Use ConditionExpressionin DynamoDB
Optimistic concurrency control
© 2020, Amazon Web Services, Inc. or its Affiliates.
Updating cart: data consistency using OCC
CART
CartID:2, Version: 3, CartItems: [{ID: 2, Qty: 1},{ID: 5, Qty: 2}]
ü Use conditions to implement optimistic concurrency control ensuring data consistency
ü Single-item operations are ACIDü GetItem call can be eventually consistent
1. Get the cart: GetItem
{ "TableName": “Cart”, "Key": {“CartId”: {“N”: “2”}}
}
2. Update the cart: conditional PutItem (or UpdateItem)
{ "TableName": “Cart”, "Item": {
“CartID”: {“N”: “2”},“Version”: {“N”:”4”}, “CartItems”: {…}
},"ConditionExpression": “Version = :ver”, "ExpressionAttributeValues": {":ver": {“N”: “3”}}
}
© 2020, Amazon Web Services, Inc. or its Affiliates.
Yes, you can have strongly consistent read-after-write and concurrency control
with DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates.
One-to-one relationships or key-values
• Use a table or GSI with a partition key• Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes
Users tablePartition key Attributes
UserId = bob Email = [email protected], JoinDate = 2011-11-15UserId = fred Email = [email protected], JoinDate = 2011-12-01
Users-Email-GSIPartition key Attributes
Email = [email protected] UserId = bob, JoinDate = 2011-11-15Email = [email protected] UserId = fred, JoinDate = 2011-12-01
© 2020, Amazon Web Services, Inc. or its Affiliates.
One-to-many relationships or parent-children
• Use a table or GSI with a partition and sort key• Use the Query API to get multiple items
Example: Given a device, find all readings between epoch X, Y
Device-measurementsPart. key Sort key AttributesDeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
© 2020, Amazon Web Services, Inc. or its Affiliates.
Many-to-many relationships
• Use a table and GSI with the partition and sort key elements switched
• Use the Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-TablePart. key Sort keyUserId = bob GameId = Game1UserId = fred GameId = Game2UserId = bob GameId = Game3
Game-Users-GSIPart. key Sort keyGameId = Game1 UserId = bobGameId = Game2 UserId = fredGameId = Game3 UserId = bob
© 2020, Amazon Web Services, Inc. or its Affiliates.
Yes, you can model complex data relationships with DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates. © 2020, Amazon Web Services, Inc. or its Affiliates.
Challenges with growing datasets, variable throughput, imbalanced workloads
© 2020, Amazon Web Services, Inc. or its Affiliates.
Time-To-Live (TTL)Enabled TTL on Table
© 2020, Amazon Web Services, Inc. or its Affiliates.
Provisioned Mode with Auto Scaling
© 2020, Amazon Web Services, Inc. or its Affiliates.
On-demand Mode: Spiky workloads
© 2020, Amazon Web Services, Inc. or its Affiliates.
Imbalanced load and DynamoDB Adaptive Capacity
DBShard
1
DBShard
n
DBShard
2
DBShard
3
{k=X, v=Y}
Poor distribution of traffic across the indexed data
© 2020, Amazon Web Services, Inc. or its Affiliates.
Dealing with concentrated read throughput
DynamoDB Accelerator (DAX)
Fully managed, highly available: handles all software management, fault tolerant, replication across multi-AZs within a region
DynamoDB API compatible: seamlessly caches DynamoDB API calls, no application re-writes required
Write-through: DAX handles caching for writes
DynamoDB
YourApplications
DynamoDBAccelerator
© 2020, Amazon Web Services, Inc. or its Affiliates. © 2020, Amazon Web Services, Inc. or its Affiliates.
Integrating DynamoDB into your data flow
© 2020, Amazon Web Services, Inc. or its Affiliates.
Complex queries and analytics
• Use the service that best meets the requirements• Amazon Athena, Amazon Redshift, Amazon Elasticsearch Service…
• Deliver data updates reliably with DynamoDB Streams• Integrates with AWS Lambda• End-to-end serverless
DynamoDB Streams
DynamoDB table
AWS Lambda
© 2020, Amazon Web Services, Inc. or its Affiliates.
Querying in microservices architectures
AWS LambdaDynamoDB Streams
Different materialized views exposed as microservices
Query service 2
Amazon ES
Query service 1
Query service 3
Amazon Redshift
Amazon DynamoDB
Command side Query side
Amazon DynamoDB
© 2020, Amazon Web Services, Inc. or its Affiliates.
Closing summary
• Data management at internet scale gave rise to DynamoDB
- and to polyglot persistence: use the right database for each job
• DynamoDB: highly-automated (serverless) distributed database
- ideal for mission-critical OLTP use cases
• Remove scaling concerns – distribute your data and traffic
• You can have data consistency and integrity in DynamoDB
• You can model relationships beyond key-value in DynamoDB
• Distributed databases are hard to operate - use DynamoDB!
© 2020, Amazon Web Services, Inc. or its Affiliates.
Thank you!