Post on 16-Apr-2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Michael Labib, Specialist Solutions Architect, AWS
Brian Kaiser, CTO, Hudl
November 29, 2016
DAT306
Amazon ElastiCache Deep Dive: Best Practices and Usage Patterns
What to Expect from the Session
• Why we’re here – In-Memory Data Stores
• Amazon ElastiCache Overview
• Usage Patterns
• Scale with Redis Cluster
• Best Practices
• Hudl Presentation
In-Memory Data Stores
Why we’re here
Amazon
ElastiCache
µs are the new ms
In-Memory Key-Value Store
High-performance
Redis and Memcached
Fully managed; Zero admin
Highly Available and Reliable
Hardened by Amazon
[Figure: AWS data stores compared along four axes (request rate, latency, structure, data volume), each running from low to high: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon CloudSearch and Amazon Elasticsearch Service, HDFS, Amazon S3, Amazon Glacier]
Memcached – Fast Caching
Slab allocator
In-memory key-value datastore
Supports strings, objects
Multi-threaded
Insanely fast!
Very established
No persistence
Open Source
Easy to Scale
Redis – The In-Memory Leader
Powerful: ~200 commands + Lua scripting
In-memory data structure server
Utility data structures: strings, lists, hashes, sets, sorted sets, bitmaps & HyperLogLogs
Simple
Atomic operations: supports transactions
Ridiculously fast! <1ms latency for most commands
Highly Available: replication
Persistence
Open Source
Redis Data Types - String
• Binary safe.
• Maximum value size is 512 MB.
• Great for storing counters, HTML, images, JSON objects, etc.
[Diagram: a key pointing to a single value]
Redis Data Types - Set
• A collection of unique, unordered String values
• Great for deduplicating and grouping related information
[Diagram: a key holding the values 75, 1, 39, and 63; a second 63 is flagged as a duplicate]
Redis Data Types - Sorted Set
• A collection of unique String values ordered by score
• Great for deduplicating, grouping, and sorting related information
[Diagram: a key holding members ordered by score: mike (50), dan (75), emma (79), lina (123), luke (350)]
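As a rough illustration of the leaderboard use case, the sketch below ranks the members shown on the Sorted Set slide. The two helper functions use the redis-py 3.x style calls `zadd(name, mapping)` and `zrevrange(name, start, end, withscores=True)`. The `InMemorySortedSets` class is a hypothetical stand-in (not part of any library) that mimics just those two calls so the example runs without a server, and the key name `leaderboard` is made up.

```python
class InMemorySortedSets:
    """Hypothetical stand-in for a Redis client; supports non-negative indexes only."""

    def __init__(self):
        self._sets = {}

    def zadd(self, key, mapping):
        # Adding an existing member simply updates its score, as in Redis.
        self._sets.setdefault(key, {}).update(mapping)

    def zrevrange(self, key, start, end, withscores=False):
        # Highest score first; `end` is inclusive, as in Redis.
        ranked = sorted(self._sets.get(key, {}).items(),
                        key=lambda pair: (pair[1], pair[0]), reverse=True)
        window = ranked[start:end + 1]
        return window if withscores else [member for member, _ in window]


def add_scores(client, key, scores):
    """Store member -> score pairs in a sorted set."""
    client.zadd(key, scores)


def top_members(client, key, count):
    """Return the `count` highest-scoring (member, score) pairs, best first."""
    return client.zrevrange(key, 0, count - 1, withscores=True)


# With a real cluster you would pass a redis-py client instead, for example
# redis.Redis(host="your-redis-endpoint", port=6379, decode_responses=True).
client = InMemorySortedSets()
add_scores(client, "leaderboard",
           {"mike": 50, "dan": 75, "emma": 79, "lina": 123, "luke": 350})
print(top_members(client, "leaderboard", 3))
```

A real Redis server returns the scores as floats; the stand-in keeps whatever numeric type was stored.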
Redis Data Types - List
• A collection of Strings stored in the order of their insertion
• Push and pop from the head or tail of the list
• Great for message queues and timelines
[Diagram: a key holding HEAD, value 1, value 2, value 3, TAIL]
Redis Data Types - Hashes
• A collection of unordered fields and values
• Great for representing objects
• Ability to add, GET, and DEL individual fields by key
[Diagram: a key holding Field 1: value 1, Field 2: value 2, Field 3: value 3, Field 4: value 4]
Memcached vs. Redis
Simple cache to offload database pressure and lower latency
Atomic counter support
Data sharding (supported in Redis 3.x)
Need support for advanced data types such as Lists, Sets, and Hashes
Multi-threaded architecture (takes full advantage of all CPU cores)
Need ability to auto-sort data to support ranking or leaderboards
Need Pub/Sub capabilities
High availability and failover
Persistence
Data volume max size: Redis 3.5 TiB; Memcached 4.7 TiB+
Max key/value size: Redis 512MB | 512MB; Memcached 256 bytes | 1MB
Amazon ElastiCache
ElastiCache - Customer Value
• Redis Multi-AZ with Automatic Failover
• Open-Source Compatible
• Fully Managed
• Enhanced Redis Engine
• Easy to Deploy, Use and Monitor
• No Cross-AZ Data Transfer Costs
• Extreme Performance at Cloud Scale
Enhanced Redis Engine – Hardened by Amazon
Optimized Swap Memory
• Mitigate the risk of increased swap usage during syncs and snapshots.
Dynamic write throttling
• Improved output buffer management when the node’s memory is close to being exhausted.
Smoother failovers
• Clusters recover faster as replicas avoid flushing their data to do a full re-sync with the primary.
Usage Patterns
Caching
[Diagram: clients, Elastic Load Balancing, Amazon EC2, and AWS Lambda, with cache reads/writes going to Amazon ElastiCache and DB reads/writes going to Amazon DynamoDB and Amazon RDS]
• Better performance: microsecond speed
• Cost effective
• Higher throughput: ~20M RPS
Caching
# Write Through
def save_user(user_id, values):
    record = db.query("update users ... where id = ?", user_id, values)
    cache.set(user_id, record, 300)  # TTL
    return record

# Lazy Load
def get_user(user_id):
    record = cache.get(user_id)
    if record is None:
        record = db.query("select * from users where id = ?", user_id)
        cache.set(user_id, record, 300)  # TTL
    return record

# App code
save_user(17, {"name": "Big Mike"})
user = get_user(17)
Write Through
1. Update DB
2. SET in cache
Lazy Load
1. GET from cache
2. If MISS, get from DB
3. Then SET in cache
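The two patterns can be tried end to end without a database or a cache server. The following is a minimal sketch under stated assumptions: `TTLCache` and `FakeDB` are hypothetical in-memory stand-ins written for this example (they are not ElastiCache or any client library), and the 300-second TTL mirrors the slide code.

```python
import time


class TTLCache:
    """Hypothetical in-memory cache with per-key expiry."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries count as a miss
            return None
        return value


class FakeDB:
    """Hypothetical database that counts reads so cache hits are visible."""

    def __init__(self):
        self.users = {}
        self.reads = 0

    def update_user(self, user_id, values):
        record = {**self.users.get(user_id, {}), **values}
        self.users[user_id] = record
        return record

    def select_user(self, user_id):
        self.reads += 1
        return self.users.get(user_id)


cache = TTLCache()
db = FakeDB()


def save_user(user_id, values):
    """Write through: update the DB, then SET the fresh record in the cache."""
    record = db.update_user(user_id, values)
    cache.set(user_id, record, 300)
    return record


def get_user(user_id):
    """Lazy load: GET from cache, fall back to the DB on a miss, then SET."""
    record = cache.get(user_id)
    if record is None:
        record = db.select_user(user_id)
        cache.set(user_id, record, 300)
    return record


save_user(17, {"name": "Big Mike"})
print(get_user(17), "DB reads:", db.reads)
```

Because the write-through call already populated the cache, the read that follows is served without touching the database. A record that reaches the database by another path costs one DB read on the first `get_user` call and none after that until the TTL expires.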
Session Caching: PHP Example
For situations where you need an external session store
• Especially needed when using Auto Scaling groups (ASGs)
• Cache is optimal for high-volume reads
1) Install PHP, Apache, and the PHP memcache client
   e.g. yum install php httpd php-pecl-memcache
2) Configure “php.ini”
   session.save_handler = memcache
   session.save_path = "tcp://node1:11211, tcp://node2:11211"
3) Configure “php.d/memcache.ini”
   memcache.hash_strategy = consistent
   memcache.allow_failover = 1
   memcache.session_redundancy = 3*
4) Restart httpd
5) Begin using session data
https://github.com/mikelabib/elasticache-memcached-php-demo
IoT Device Data
[Diagram: an AWS IoT device sends data through AWS IoT to Amazon EC2 and AWS Lambda. Hot data: Amazon ElastiCache. Longer retention: Amazon DynamoDB. Cold data: Amazon Kinesis Firehose into a data lake on Amazon S3 and Amazon Glacier]
Lambda Trigger for IoT Rule
var redis = require("redis");

exports.handler = function (event, context) {
    // Replace with your ElastiCache Redis endpoint.
    var client = redis.createClient("redis://your-redis-endpoint:6379");
    // The slide leaves `date` undefined; the current time in ms is assumed here.
    var date = Date.now();
    var multi = client.multi(); // transaction block start
    multi.zadd("SensorData", date, event.deviceId);
    multi.hmset(event.deviceId, "temperature", event.temperature,
        "deviceIP", event.deviceIP,
        "humidity", event.humidity,
        "awsRequestId", context.awsRequestId);
    multi.exec(function (err, replies) { // transaction block end
        client.quit(); // close the connection on success and on failure
        if (err) {
            console.log('error updating event: ' + err);
            context.fail('error updating event: ' + err);
        } else {
            console.log('updated event ' + replies);
            context.succeed(replies);
        }
    });
};
Between the transaction block start (multi) and the transaction block end (exec), the handler SETs two structures:
• Sorted Set
• Hash
https://github.com/mikelabib/IoT-Sensor-Data-and-Amazon-ElastiCache
Streaming Data
[Diagram: Amazon Kinesis Streams consumed by Amazon EC2 and AWS Lambda. Hot data: Amazon ElastiCache. Longer retention: Amazon DynamoDB]
Streaming Data Enrichment
[Diagram: data sources feed Amazon Kinesis Streams; AWS Lambda works with Amazon ElastiCache to de-duplicate, aggregate, sort, enrich, etc.; the cleansed stream goes to Amazon Kinesis Streams and Amazon Kinesis Analytics]
Streaming Data Analytics
[Diagram: data sources feed Amazon Kinesis Streams and Amazon EMR (Spark Streaming), which reaches Amazon ElastiCache through the Spark Redis Connector; also shown are Amazon EC2, Amazon Redshift, and a data lake on Amazon S3]
ElastiCache Redis with Multi-AZ
[Diagram: a primary and two replicas across Availability Zone A and Availability Zone B]
• Writes: use the primary endpoint
• Reads: use the read replicas
• Auto-failover: chooses the replica with the lowest replication lag; the DNS endpoint stays the same
ElastiCache for Redis Multi-AZ
• Automatic failover to a read replica in case of primary node failure
• ElastiCache automates snapshots for persistence
ElastiCache with Redis Multi-AZ
[Diagram: a Region with Availability Zone A and Availability Zone B; an Auto Scaling group spans both zones, and the ElastiCache cluster has a primary and a read replica]
Get ReplicationGroup Replica Endpoints
public List getReplicationGroupEndpoints(String replicationGroupId) {
    List<String> replicaEndpoints = new ArrayList<String>();
    if (replicationGroupId != null) {
        try {
            DescribeReplicationGroupsRequest request = new DescribeReplicationGroupsRequest();
            request.setReplicationGroupId(replicationGroupId);
            DescribeReplicationGroupsResult result = elastiCacheClient.describeReplicationGroups(request);
            Object[] nodeMembers;
            if (result != null) {
                for (ReplicationGroup replicationGroup : result.getReplicationGroups()) {
                    for (NodeGroup node : replicationGroup.getNodeGroups()) {
                        nodeMembers = node.getNodeGroupMembers().toArray();
                        for (int i = 0; i < nodeMembers.length; i++) {
                            String nodeDescriptions = nodeMembers[i].toString();
                            if (nodeDescriptions.contains("replica")) { …
DescribeReplicationGroups
https://github.com/mikelabib/ElastiCacheRedisLoadBalancer
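For comparison, here is a hedged Python sketch of the same idea: walking a DescribeReplicationGroups response and keeping only the replica members. It assumes the response is shaped like the dictionary that boto3's `describe_replication_groups` call returns, with `NodeGroups`, `NodeGroupMembers`, `CurrentRole`, and `ReadEndpoint` fields. The sample data and hostnames below are invented for illustration, and this is not the code from the linked repository.

```python
def replica_endpoints(response):
    """Collect "host:port" strings for every member whose role is "replica"."""
    endpoints = []
    for group in response.get("ReplicationGroups", []):
        for node_group in group.get("NodeGroups", []):
            for member in node_group.get("NodeGroupMembers", []):
                if member.get("CurrentRole") != "replica":
                    continue
                read_endpoint = member.get("ReadEndpoint")
                if read_endpoint:
                    endpoints.append("{}:{}".format(read_endpoint["Address"],
                                                    read_endpoint["Port"]))
    return endpoints


# Hand-written sample; with boto3 the response would instead come from
# boto3.client("elasticache").describe_replication_groups(ReplicationGroupId=...).
sample_response = {
    "ReplicationGroups": [{
        "ReplicationGroupId": "demo-group",
        "NodeGroups": [{
            "NodeGroupId": "0001",
            "NodeGroupMembers": [
                {"CacheClusterId": "demo-001", "CurrentRole": "primary",
                 "ReadEndpoint": {"Address": "demo-001.example.internal", "Port": 6379}},
                {"CacheClusterId": "demo-002", "CurrentRole": "replica",
                 "ReadEndpoint": {"Address": "demo-002.example.internal", "Port": 6379}},
                {"CacheClusterId": "demo-003", "CurrentRole": "replica",
                 "ReadEndpoint": {"Address": "demo-003.example.internal", "Port": 6379}},
            ],
        }],
    }],
}

print(replica_endpoints(sample_response))
```

Filtering on the role field avoids the string matching on `toString()` output used in the slide code, which would also match any other field that happens to contain the word "replica".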
What’s New!
New - October 2016: Redis 3.2 Support
Features
• Horizontal scale of up to 3.5 TiB per cluster
• Up to 20 million reads per second
• Up to 4.5 million writes per second
• Enhanced Redis Engine within ElastiCache
• Up to 4x faster failover than with Redis 2.8
• Cluster-level Backup and Restore
• Fully supported by AWS CloudFormation
• Available in all AWS Regions
Geospatial Commands
• GEOADD locations 87.6298 41.8781 chicago
• GEOADD locations 122.3321 47.6062 seattle
• ZRANGE locations 0 -1
  1) "chicago"
  2) "seattle"
• GEODIST locations chicago seattle mi
  "1733.4089"
• GEORADIUS locations 122.4194 37.7749 1000 mi WITHDIST
  1) 1) "seattle"
     2) "679.4848"
• GEOPOS locations chicago
  1) 1) "87.62979894876480103"
     2) "41.87809901914020116"
• GEORADIUSBYMEMBER locations chicago 2000 mi WITHDIST
  1) 1) "chicago"
     2) "0.0000"
  2) 1) "seattle"
     2) "1733.4089"
• GEOHASH locations chicago
• ZREM locations seattle
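To see where a GEODIST figure comes from, the distance can be approximated offline with the haversine formula on a spherical Earth. This is a sketch, not Redis itself: it uses the longitude/latitude pairs exactly as typed in the GEOADD commands above, the sphere radius constant from the Redis source, and 1609.34 meters per mile. Redis measures between the geohash-decoded positions, so the last digits can differ slightly.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # sphere radius used by Redis's geo code
METERS_PER_MILE = 1609.34


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters / METERS_PER_MILE


chicago = (87.6298, 41.8781)   # (longitude, latitude) as entered above
seattle = (122.3321, 47.6062)
print(round(haversine_miles(*chicago, *seattle), 1))
```

The result should land close to the 1733.4089 miles that GEODIST reports for the same pair.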
Scaling with Redis Cluster
Setting up Redis Cluster - Console
Cluster Mode
Redis Cluster – Automatic Client-Side Sharding
• 16384 hash slots per cluster
• Slot for a key is CRC16(key) modulo 16384
• Slots are distributed across the cluster into shards
• Developers must use a Redis cluster client!
• Clients are redirected to the correct shard
• Smart clients store a map
[Diagram: a client connected to five shards, S1 through S5]
Shard S1 = slots 0 – 3276
Shard S2 = slots 3277 – 6553
Shard S3 = slots 6554 – 9829
Shard S4 = slots 9830 – 13106
Shard S5 = slots 13107 – 16383
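The key-to-slot mapping is small enough to sketch directly. The example below assumes the CRC16 variant named in the Redis Cluster specification (XMODEM, which Python exposes as `binascii.crc_hqx` with an initial value of 0) and the specification's hash tag rule, where only the text inside the first non-empty `{...}` is hashed. The shard map is copied from the five-shard example on this slide. The helper names are made up for illustration.

```python
import binascii

TOTAL_SLOTS = 16384

# Inclusive slot ranges from the five-shard example on the slide.
SHARD_SLOTS = {
    "S1": (0, 3276),
    "S2": (3277, 6553),
    "S3": (6554, 9829),
    "S4": (9830, 13106),
    "S5": (13107, 16383),
}


def hash_tag(key):
    """Return the part of the key that is hashed (bytes in, bytes out)."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # require at least one character between the braces
            return key[start + 1:end]
    return key


def key_slot(key):
    """Hash slot for a key: CRC16 of the key (or its hash tag) modulo 16384."""
    return binascii.crc_hqx(hash_tag(key.encode("utf-8")), 0) % TOTAL_SLOTS


def shard_for(key):
    """Look up which shard in SHARD_SLOTS owns the key's slot."""
    slot = key_slot(key)
    for shard, (first, last) in SHARD_SLOTS.items():
        if first <= slot <= last:
            return shard
    raise ValueError("slot {} is not covered by the shard map".format(slot))


print(key_slot("foo"), shard_for("foo"))
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Keys that share a hash tag always land in the same slot, which is what makes multi-key operations possible in a sharded cluster. A cluster-aware client keeps an equivalent slot map and refreshes it when the server redirects a request.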
Redis Cluster – Architecture
Redis Cluster – Multi-AZ
• A cluster consists of 1 to 15 shards
• Each shard has a primary node and up to 5 replica nodes
[Diagram: a Redis Cluster across Availability Zones A, B, and C with three shards (slots 0 – 5454, 5455 – 10909, and 10910 – 16383); every shard has one node in each AZ, and successive slides highlight one shard at a time with its primary and two replicas]
Setting up Redis Cluster - Console
[Screenshots: console steps for Cluster Name, Redis Version, Instance, # of Shards, # of Replicas, Slots Distribution, and Select AZs]
Redis Failure Scenarios
[Diagram: a three-shard Redis Cluster with one node per shard in each of Availability Zones A, B, and C]
Scenario 1: Single Primary Shard Failure
Mitigation:
1. Promote Read Replica Node
2. Repair Failed Node
Scenario 2: Two Primary Shards Fail
Mitigation: Redis enhancements on ElastiCache
• Promote Read Replica Nodes
• Repair Failed Nodes
Migrating to a Cluster
1. Create new Cluster
2. Make snapshot of old CacheCluster
3. Restore snapshot to new Cluster
4. Update Client
5. Terminate old Cluster
[Diagram: a client moving from the old (< 3.2) cluster to the new cluster with shards S1 through S5]
Enhanced CloudFormation
• Support for Clusters
• Delete Policy: set as Snapshot
  • Take one last backup before deleting
• Replication Group tagging
• Replication Group: add more replicas
• User-defined resource identifiers
  • Use Cluster name, Replication Group ID, and Subnet group name to identify appropriate resources by assigning a Physical Resource Identifier
CloudFormation: Infrastructure as Code
[Diagram: an AWS CloudFormation template, AWS CloudFormation, and Amazon ElastiCache]
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "Test template for ReplicationGroup",
  "Resources" : {
    "BasicReplicationGroup" : {
      "Type" : "AWS::ElastiCache::ReplicationGroup",
      "Properties" : {
        "AutomaticFailoverEnabled" : true,
        "AutoMinorVersionUpgrade" : true,
        "CacheNodeType" : "cache.r3.large",
        "CacheSubnetGroupName" : { "Ref" : "CacheSubnetGroup" },
        "Engine" : "redis",
        "EngineVersion" : "3.2",
        "NumNodeGroups" : "2",
        "ReplicasPerNodeGroup" : "2",
        "Port" : 6379,
        "PreferredMaintenanceWindow" : "sun:05:00-sun:09:00",
        "ReplicationGroupDescription" : "CFN RG test",
        "SecurityGroupIds" : [
          { "Ref" : "RGSG" }
        ],
        "SnapshotRetentionLimit" : 5,
        "SnapshotWindow" : "10:00-12:00",
        …
Best Practices
Redis
• Avoid very short key names: a longer name does add bytes, but predictable key names simplify app development
• Create a logical schema such as [object]:[value]. Use a colon rather than “.” or “-”
• Hashes, Lists, and Sets are encoded to be much more efficient, so use them!
• Avoid storing many small String values, given the overhead of the data type; use Hashes instead
• Avoid the “KEYS” command and other long-running commands
• Max key size and max value size = 512MB
• Lists, Sets, and Hashes hold up to 2^32-1 elements
Architecting for Availability
• Upgrade to the latest engine version – 3.2.4
• Set reserved-memory to 30% of total available memory
• Swap usage should be zero or very low. Scale if not.
• Put read-replicas in a different AZ from the primary
• For important workloads use 2 read replicas per primary
• Write to the primary, read from the read-replicas
• Take snapshots from read-replicas
• For Redis Cluster, use an odd number of shards.
Monitoring Your Cluster
Key ElastiCache CloudWatch Metrics
• CPUUtilization
  • Memcached – up to 90% ok
  • Redis – divide by cores (ex: 90% / 4 = 22.5%)
• SwapUsage low
• CacheMisses / CacheHits ratio low / stable
• Evictions near zero
  • Exception: Russian doll caching
• CurrConnections stable
• Set up alarms with CloudWatch Metrics
Whitepaper: http://bit.ly/elasticache-whitepaper
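The "divide by cores" rule for Redis follows from Redis executing commands on a single thread, so a node-level CPUUtilization figure understates how busy that one core is. A minimal sketch of the arithmetic, with a hypothetical function name and the slide's 90% per-core limit as the default:

```python
def redis_cpu_alarm_threshold(cores, per_core_limit=90.0):
    """Node-level CPUUtilization (%) at which one core would sit at per_core_limit."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return per_core_limit / cores


print(redis_cpu_alarm_threshold(4))  # 22.5, matching the slide's 90% / 4 example
```

The returned value is what a CloudWatch alarm on CPUUtilization would use as its threshold for a Redis node with that many cores.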
ElastiCache Modifiable Parameters
• Maxclients: 65000 (unchangeable)
  • Use connection pooling
  • timeout – closes a connection after it has been idle for a given interval
  • tcp-keepalive – detects dead peers given an interval
• Databases: 16 (default)
  • Logical partition
• Reserved-memory: 0 (default)
  • Recommended: 50% of maxmemory before 2.8.22; 30% from 2.8.22 onward (ElastiCache)
• Maxmemory-policy:
  • The eviction policy for keys when maximum memory usage is reached
  • Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction
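The reserved-memory recommendation can be turned into a concrete byte value for a parameter group. The sketch below is an assumption-laden illustration: the function name is made up, the 10 GiB maxmemory figure is an arbitrary example rather than the size of any particular node type, and version 2.8.22 itself is treated as falling on the 30% side of the cutover.

```python
def recommended_reserved_memory(maxmemory_bytes, engine_version):
    """Bytes to reserve: 50% of maxmemory before Redis 2.8.22, 30% from 2.8.22 on."""
    version = tuple(int(part) for part in engine_version.split("."))
    percent = 50 if version < (2, 8, 22) else 30
    # Integer arithmetic avoids floating-point rounding in the byte count.
    return maxmemory_bytes * percent // 100


example_maxmemory = 10 * 1024 ** 3  # 10 GiB, an arbitrary example value
print(recommended_reserved_memory(example_maxmemory, "3.2.4"))
print(recommended_reserved_memory(example_maxmemory, "2.8.6"))
```

The result is the value to place in the reserved-memory parameter, which defaults to 0.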
Session Recap
• Amazon ElastiCache provides the performance needed for demanding real-time applications
• With a few lines of code, you can power your applications with an In-Memory datastore
• Redis Cluster allows you to scale to terabytes of data and support millions of IOPS
Brian Kaiser, CTO
11/29/2016
ElastiCache @ Hudl
130k teams
4.5M active users
> 2B videos on S3
35 hr/min of video
15k API requests/sec
[Diagram: an ELB and routing layer in front of a web Auto Scaling group; a Squad Cluster with MongoDb in AZ #1, AZ #2, and AZ #3; supporting services; Couchbase/Memcached]
public async Task<TResult> Get<TResult>(string key) where TResult : class
{
    if (!_redisEnabled.Value)
    {
        return default(TResult);
    }
    var value = await _connection.Database.StringGetAsync(key);
    if (!value.HasValue || value.IsNull)
    {
        return default(TResult);
    }
    return _serializer.Deserialize<TResult>(value);
}

public async Task Put(string key, object item, TimeSpan ttl)
{
    if (!_redisEnabled.Value || string.IsNullOrWhiteSpace(key))
    {
        return;
    }
    var data = _serializer.Serialize(item);
    await _connection.Database.StringSetAsync(key, data, ttl);
}

public async Task<TResult> GetAndPut<TResult>(string key, TimeSpan ttl, Func<TResult> valueAccessor)
    where TResult : class
{
    if (!_redisEnabled.Value)
    {
        return valueAccessor();
    }
    var cachedValue = await Get<TResult>(key);
    if (cachedValue != null)
    {
        return cachedValue;
    }
    cachedValue = valueAccessor();
    await Put(key, cachedValue, ttl);
    return cachedValue;
}
Basic Object Caching Examples
• Auth Token
• User information
• Team Information
The Feed
http://amzn.to/2fGS9nx
Distributed Locking
[Diagram: workers, S3, MongoDb, and ElastiCache]
[Diagram: a routing layer over Auto Scaling groups in AZ #1, AZ #2, and AZ #3, each with MongoDb in the Squad Cluster, alongside an ElastiCache – Redis Cluster with a primary and two replicas]
Some best practices
• Always Multi-AZ Replicas
• Set up predictive alerts
• Understand Eviction Policies
• Learn Redis data structures and Big O complexity
Thank you!
Remember to complete
your evaluations!