Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks
![Page 1: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/1.jpg)
Andrey Zaychikov, Solutions Architect, EMEA, 21.02.2017
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS
![Page 2: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/2.jpg)
Typical algorithm for choosing the right options for NoSQL DB deployments
![Page 3: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/3.jpg)
What will we cover today?
![Page 4: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/4.jpg)
How do these databases differ?
DynamoDB
Cloud-based / Self-managed (EC2)
Key-value / Document-oriented / Graph
![Page 5: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/5.jpg)
Cassandra
![Page 6: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/6.jpg)
What is it?
• Dynamo model database + CQL
• Horizontally scalable
• No single point of failure
• Data is immutable and stored in collections
• JVM based
• A lot of management work is done in the background
• Relies on the gossip protocol
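The Dynamo model places data by hashing each partition key onto a token ring and storing it on the next nodes clockwise. The sketch below is a toy illustration of that idea only, assuming one token per node and SimpleStrategy-like placement; real Cassandra uses vnodes, the Murmur3 partitioner, and topology-aware strategies. `TokenRing` and its methods are hypothetical names.

```python
import bisect
import hashlib


class TokenRing:
    """Toy Dynamo-style token ring: one token per node, replicas walk clockwise."""

    def __init__(self, nodes):
        # Each node sits on the ring at the token derived from its name.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # Stand-in hash for illustration; Cassandra's default partitioner is Murmur3.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, partition_key, rf):
        """Return the rf nodes owning partition_key (first owner, then clockwise)."""
        if rf > len(self._ring):
            raise ValueError("replication factor exceeds node count")
        # A node owns the range (previous token, its own token], so the first
        # owner is the first node whose token is >= the key's token (with wrap).
        start = bisect.bisect_left(self._tokens, self._token(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]
```

For example, `TokenRing(["node-a", "node-b", "node-c", "node-d"]).replicas("user:42", 3)` returns three distinct nodes. Because placement depends only on hashing, any node can compute where a key lives without a central coordinator, which is what lets the cluster avoid a single point of failure.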
![Page 7: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/7.jpg)
Main concerns of the customers
Schema & usage pattern
Geo distribution
Background routines & specific optimizations
![Page 8: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/8.jpg)
How does it work?
![Page 9: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/9.jpg)
Choosing instance & storage capacity: 80% Writes
• For most workloads (especially with a 50/50 read/write ratio), M4 instances with EBS are the best option
• For write-heavy workloads with high RPS requirements, consider C4 with EBS
• When performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage
![Page 10: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/10.jpg)
Choosing instance & storage capacity: 80% Reads
• For most workloads, M4 instances with EBS are a good choice
• When performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage
• When performance requirements are high and the dataset is large, the best option is R4 instances with different EBS volume types
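The rules of thumb from the two instance-selection slides can be condensed into a small decision function. This is a sketch under stated assumptions: the 0.8 / 0.2 write-ratio thresholds and the boolean `high_performance` / `large_dataset` inputs are hypothetical stand-ins for the talk's "high" and "relatively small", and the instance families are the 2017 ones the talk names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    write_ratio: float       # fraction of requests that are writes (0.0 to 1.0)
    high_performance: bool   # are latency / RPS requirements "high"? (assumption)
    large_dataset: bool      # does the dataset outgrow local instance storage? (assumption)


def suggest_cassandra_instance(w: Workload) -> str:
    """Encode the talk's rules of thumb; not an AWS sizing tool."""
    # High performance with a relatively small dataset: local ephemeral disks.
    if w.high_performance and not w.large_dataset:
        return "I2 + ephemeral storage"
    # Write-heavy (~80% writes) with high RPS requirements: compute-optimized.
    if w.write_ratio >= 0.8 and w.high_performance:
        return "C4 + EBS"
    # Read-heavy (~80% reads), high performance, large dataset: memory-optimized.
    if w.write_ratio <= 0.2 and w.high_performance and w.large_dataset:
        return "R4 + EBS (different volume types)"
    # Default for most workloads, including a 50/50 read/write mix.
    return "M4 + EBS"
```

The checks are ordered so that the small-dataset case wins first, matching both slides, and everything that matches no special rule falls through to M4 with EBS. As the talk stresses, a PoC should confirm whatever this suggests.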
![Page 11: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/11.jpg)
FAQ: 2AZ cluster architecture
Hint: RetryPolicy for Cassandra Driver
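In a 2-AZ cluster, losing the AZ that holds most replicas can make QUORUM unsatisfiable, which is where a custom retry policy helps. The DataStax Python driver lets you subclass `cassandra.policies.RetryPolicy` and override hooks such as `on_unavailable`. The sketch below shows only the decision logic as plain Python, not the driver API. The `Consistency` enum and `on_unavailable` function here are hypothetical, and downgrading consistency trades correctness guarantees for availability.

```python
from enum import Enum
from typing import Optional


class Consistency(Enum):
    """Hypothetical subset of CQL consistency levels."""
    ONE = 1
    TWO = 2
    QUORUM = 3
    ALL = 4


def required_replicas(level: Consistency, rf: int) -> int:
    """Replicas that must respond for `level` at replication factor `rf`."""
    if level is Consistency.ONE:
        return 1
    if level is Consistency.TWO:
        return 2
    if level is Consistency.QUORUM:
        return rf // 2 + 1
    return rf  # ALL


def on_unavailable(level: Consistency, rf: int, alive_replicas: int,
                   retry_num: int, max_retries: int = 1) -> Optional[Consistency]:
    """Pick a weaker level the surviving replicas can satisfy; None means rethrow."""
    if retry_num >= max_retries:
        return None
    for candidate in (Consistency.QUORUM, Consistency.TWO, Consistency.ONE):
        needed = required_replicas(candidate, rf)
        # Only downgrade (strictly fewer replicas) and only to a level that can succeed.
        if needed < required_replicas(level, rf) and needed <= alive_replicas:
            return candidate
    return None
```

With RF=3 and only one replica left alive, a QUORUM request is retried once at ONE. A second failure, or zero live replicas, is rethrown to the application.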
![Page 12: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/12.jpg)
FAQ
Cassandra backup / restore
- Restore procedure for the whole cluster can be complicated
- Restore for a single node can be done with EBS snapshots
Auto Scaling of Cassandra clusters
- Auto-scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is extremely complicated
Cassandra in Containers
- Makes sense only for test / dev environments
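Single-node restore from EBS snapshots assumes the node's data volumes are snapshotted regularly. Below is a minimal sketch that takes one snapshot per data volume through the real boto3 EC2 client method `create_snapshot`. The function name, the `cluster` / `node` tag keys, and the assumption that you flush first (for Cassandra, e.g. `nodetool flush`) are illustrative choices, not part of the talk.

```python
from datetime import datetime, timezone


def snapshot_node_volumes(ec2, volume_ids, cluster, node):
    """Create one EBS snapshot per data volume of a single node; return snapshot IDs.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Assumption: the node's data was flushed to disk just before this runs.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"{cluster}/{node} backup {stamp}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                # Hypothetical tag keys, used later to find a node's snapshots.
                "Tags": [
                    {"Key": "cluster", "Value": cluster},
                    {"Key": "node", "Value": node},
                ],
            }],
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

If a node stripes its data across several volumes, these snapshots are taken one after another rather than at the same instant. EC2's multi-volume `CreateSnapshots` API exists for the crash-consistent case.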
![Page 13: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/13.jpg)
FAQ: Troubleshooting
JVM, Caching, Compaction
Disk I/O, CPU, Memory
![Page 14: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/14.jpg)
MongoDB
![Page 15: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/15.jpg)
What is it?
• Document-oriented database
• Horizontally scalable
• HA is based on primary / secondary replication (replica sets)
• Geo-distributed
• Lots of management work is done in the background
![Page 16: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/16.jpg)
Main concerns of the customers
Schema & usage pattern
Geo distribution and performance
Data consistency & partition tolerance
![Page 17: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/17.jpg)
How does it work?
![Page 18: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/18.jpg)
Choosing instance & storage
• MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big, the best option is either R3 or I2 (depending on the size of the dataset)
• If the dataset is big, consider R4 with different EBS volume types
• For hidden nodes, use M4 with EBS, as EBS snapshots make it easy to back up data
![Page 19: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/19.jpg)
FAQ: 2AZ cluster architecture
Best option: a replica set in one AZ and a hidden member in the other.
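The recommended 2-AZ layout can be expressed as a replica set configuration document of the kind passed to `rs.initiate()` in the mongo shell (or the `replSetInitiate` admin command). This is a sketch: the host names are hypothetical, and giving the hidden member `votes: 0` is an assumed design choice that keeps an odd number of voting members.

```python
def replica_set_config(name, primary_az_hosts, dr_host):
    """Replica set config: regular members in one AZ, a hidden member in the other.

    Hosts are hypothetical "hostname:port" strings.
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(primary_az_hosts)]
    members.append({
        "_id": len(members),
        "host": dr_host,
        "priority": 0,   # hidden members must have priority 0: never elected primary
        "hidden": True,  # invisible to client reads; a good target for EBS snapshots
        "votes": 0,      # assumption: keep the voting set odd (the members in the main AZ)
    })
    return {"_id": name, "members": members}
```

With three members in the main AZ, losing the other AZ does not affect elections at all. Losing the main AZ leaves the hidden member as an up-to-date copy for recovery rather than an automatic failover target.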
![Page 20: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/20.jpg)
FAQ
MongoDB backup / restore
- Hidden nodes with EBS and EBS snapshot backups
Querying large amounts of data
- Design the schema properly
- Avoid using MapReduce on the primary
MongoDB consistency
- Lots of improvements were made, but there are some edge cases
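One way to keep heavy analytical queries off the primary, and to narrow some consistency edge cases, is to set read preference and read/write concern in the connection string. The sketch below uses the real MongoDB URI options `replicaSet`, `w`, `readConcernLevel` and `readPreference`. The helper name, hosts, and the split into "analytics" and regular clients are hypothetical.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, replica_set, analytics=False):
    """Build a MongoDB connection string (hosts and names are hypothetical).

    Analytics clients read from secondaries so long-running queries do not
    compete with the primary; all clients use majority read/write concerns.
    """
    options = {
        "replicaSet": replica_set,
        "w": "majority",                 # acknowledge writes once a majority has them
        "readConcernLevel": "majority",  # read only majority-committed data
        "readPreference": "secondary" if analytics else "primary",
    }
    return f"mongodb://{','.join(hosts)}/?{urlencode(options)}"
```

Hidden members never serve client reads, so analytics traffic lands on regular secondaries; the hidden member stays dedicated to backups.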
![Page 21: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/21.jpg)
FAQ: Troubleshooting
Mongos performance, Long-running queries, Fragmentation
Disk I/O, CPU, Memory
![Page 22: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/22.jpg)
CouchDB
![Page 23: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/23.jpg)
What is it?
• Document-oriented database built on the Dynamo model
• Supports a RESTful API
• Eventual consistency
• Lockless, optimistic concurrency with conflict resolution
• Horizontally scalable (with constraints)
• Offline-first database
• MapReduce to prepare views
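CouchDB's lockless, optimistic concurrency works through document revisions. Every update must quote the current `_rev`, and a stale revision is rejected with HTTP 409 Conflict. The in-memory model below illustrates only that behaviour. The class names are hypothetical, and the MD5-of-body revision suffix is a simplification of how CouchDB actually computes revision IDs.

```python
import hashlib
import json


class Conflict(Exception):
    """Stand-in for CouchDB's HTTP 409 Conflict response."""


class RevisionedStore:
    """In-memory model of CouchDB-style optimistic (lockless) updates."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # The caller must present the revision it read; otherwise someone else won.
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        new_rev = f"{generation}-{digest}"  # "N-hash" shape, as in CouchDB
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return new_rev

    def get(self, doc_id):
        return dict(self._docs[doc_id])
```

On a conflict, the client re-reads the document, merges its change, and retries with the fresh `_rev`. No locks are held between read and write.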
![Page 24: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/24.jpg)
How does it work?
![Page 25: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/25.jpg)
Choosing instance & storage
![Page 26: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/26.jpg)
FAQ: 2AZ cluster architecture
• You plan the replication scheme yourself, so it is your responsibility to check how it behaves in a DR event
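Since the replication scheme is yours to design, a common 2-AZ starting point is continuous replication in both directions. The sketch below builds the two documents you would store in CouchDB's `_replicator` database, using its `source`, `target` and `continuous` fields. The helper name, URLs, and `_id` naming are hypothetical, and credentials are omitted.

```python
def bidirectional_replication(db, az_a_url, az_b_url):
    """Two _replicator documents that keep `db` in sync between two AZs.

    URLs are hypothetical base URLs such as "http://couch-a:5984".
    """
    def doc(src, dst, suffix):
        return {
            "_id": f"{db}-{suffix}",
            "source": f"{src}/{db}",
            "target": f"{dst}/{db}",
            "continuous": True,  # keep replicating as new changes arrive
        }

    return [doc(az_a_url, az_b_url, "a-to-b"), doc(az_b_url, az_a_url, "b-to-a")]
```

If both AZs accept writes to the same document, CouchDB keeps the conflicting revisions and the application must resolve them. This is part of what you should test for a DR event.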
![Page 27: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/27.jpg)
FAQ
Proper replication scheme
Indexed views & their performance
Proxy for requests
![Page 28: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/28.jpg)
Aerospike
![Page 29: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/29.jpg)
What is it?
• In-memory key-value database
• High and constant performance
• Shared-nothing architecture
• Geo-distributed (hash partitions)
• Master-slave replication
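Aerospike's hash partitioning splits each namespace into 4096 partitions and derives a record's partition from a RIPEMD-160 digest of its key. The sketch below illustrates the idea under assumptions: SHA-1 stands in for RIPEMD-160 (which `hashlib` does not always provide), and `partition_owners` is a hypothetical round-robin map, not Aerospike's real partition-assignment algorithm.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def partition_id(set_name, user_key):
    """Map a record key to a partition ID (SHA-1 as a stand-in for RIPEMD-160)."""
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def partition_owners(pid, nodes, replication_factor):
    """Hypothetical round-robin map: master node first, then replica nodes."""
    return [nodes[(pid + i) % len(nodes)] for i in range(replication_factor)]
```

Because every client can compute the partition locally and knows the partition map, requests go straight to the owning node. That is one reason the architecture can stay shared-nothing with predictable latency.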
![Page 30: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/30.jpg)
How does it work?
![Page 31: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/31.jpg)
Choosing instance & storage
• Aerospike is used when performance requirements are extreme. It needs a lot of memory and very fast disks, which is why EC2 instances with ephemeral (instance store) storage are the first choice for Aerospike deployments.
![Page 32: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/32.jpg)
FAQ: 2AZ cluster architecture
• If one AZ goes down, then depending on your replication factor, you will still have a copy of the data
• Aerospike can add more nodes and replicate data to them without putting much pressure on the existing nodes
• Replicating the data takes time
![Page 33: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/33.jpg)
FAQ
Aerospike backup / restore
- Restore procedure for the whole cluster can be complicated
- Restore for a single node can be done with EBS snapshots
Auto Scaling of Aerospike clusters
- Auto-scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is complicated
Aerospike in Containers
- Does not make sense
![Page 34: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/34.jpg)
FAQ: Troubleshooting
Disk I/O, CPU, Memory
![Page 35: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/35.jpg)
Neo4j
![Page 36: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/36.jpg)
What is it?
• Graph database
• JVM based
• Provides a REST API
• Two clustering modes: HA cluster & Causal Cluster
• Two types of nodes: Core nodes & Read replicas (Raft protocol)
• Uses the Cypher language for querying
Neo4j Causal Clustering
![Page 37: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/37.jpg)
How does it work?
![Page 38: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/38.jpg)
Choosing instance & storage
![Page 39: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/39.jpg)
FAQ: 2AZ cluster architecture
• If an AZ fails and the master node was in it, a new master election is initiated
• Core nodes in Causal Cluster mode vote by simple majority
• If the majority is unavailable, the cluster becomes read-only
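The simple-majority rule makes the 2-AZ outcome easy to compute ahead of time. The sketch below takes a hypothetical `cores_per_az` mapping (read replicas are not counted, since they do not vote) and reports whether the surviving core members can still form a majority after one AZ fails.

```python
def causal_cluster_after_az_failure(cores_per_az, failed_az):
    """Return "read-write" or "read-only" after losing every core member in failed_az.

    cores_per_az is a hypothetical mapping such as {"az-a": 2, "az-b": 1}.
    """
    total = sum(cores_per_az.values())
    surviving = total - cores_per_az.get(failed_az, 0)
    # Writes need a strict majority of all core members (Raft-style voting).
    return "read-write" if surviving > total // 2 else "read-only"
```

With cores split 2 + 1, losing the AZ with two cores leaves the cluster read-only, while losing the other AZ does not. With only two AZs, one of them always holds at least half of the cores, so some single-AZ failure always costs the majority.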
![Page 40: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/40.jpg)
FAQ: Troubleshooting
JVM, Page caching
Disk I/O, CPU, Memory
![Page 41: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/41.jpg)
NoSQL on EC2: Cost considerations
![Page 42: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/42.jpg)
General cost considerations
Usage pattern (R/W)
RPS
Size of the dataset
Traffic costs
Object size
Number of nodes
![Page 43: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/43.jpg)
Cost: Performance / Size
• If you want to stay cost-effective and efficient at all times, deployment is an ongoing journey
• Consider EBS as the main option for most workloads
• If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS
![Page 44: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/44.jpg)
Sum up
• There is no general solution for all cases
• Context matters, and the solution should follow the changing context
• Apps and code should be adapted to the way NoSQL DBs work
• The initial choice of deployment options can be changed
• The best way to make the initial deployment choice is a PoC
![Page 45: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks](https://reader035.fdocuments.net/reader035/viewer/2022070510/58acaa351a28ab68608b47e1/html5/thumbnails/45.jpg)
Thank you!