When every - Meetupfiles.meetup.com/7139612/Cassandra meetup Amsterdam 20.07... · 2016-08-01 ·...

Page 1: When every - Meetupfiles.meetup.com/7139612/Cassandra meetup Amsterdam 20.07... · 2016-08-01 · When every millisecond counts July 2016 Matija Gobec matija.gobec@smartcat.io @mad_max0204.

When every millisecond counts

July 2016

Matija Gobec matija.gobec@smartcat.io

@mad_max0204

Why this talk

We were challenged with an interesting requirement...

What makes a distributed system?

A bunch of stuff that magically works together

How to start?

Investigate the current setup (if any)

Understand your use case

Understand your data

Set a base configuration

Define the goal

Investigate the current setup

● What type of deployment are you working with?
● What is the available hardware?
  ○ CPU cores and threads
  ○ Memory amount and type
  ○ Storage size and type
  ○ Network interfaces amount and type
  ○ Limitations

Hardware configuration

8-16 cores
32GB RAM
Commit log SSD
Data drive SSD
1GbE
Placement groups
Availability zones
Enhanced networking

OS - Swap, storage, cpu

Swap is bad

● remove swap from fstab
● disable swap: swapoff -a

Optimize block layer

echo 1 > /sys/block/XXX/queue/nomerges
echo 8 > /sys/block/XXX/queue/read_ahead_kb
echo deadline > /sys/block/XXX/queue/scheduler

Disable cpu scaling

for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
do
  echo performance > "$sysfs_cpu/cpufreq/scaling_governor"
done
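Values written to sysfs with echo do not survive a reboot, so it can be useful to verify them before each test run. The following is a minimal sketch of such a check, not part of the original talk; the function names and the `sysfs_root` parameter are hypothetical, and the expected values are simply the ones from this slide.

```python
from pathlib import Path


def active_scheduler(text: str) -> str:
    """Return the active I/O scheduler from a sysfs 'scheduler' file.

    The kernel lists the available schedulers and marks the active one
    with brackets, e.g. 'noop [deadline] cfq'.
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()


def check_block_device(sysfs_root: Path, device: str) -> dict:
    """Compare a device's queue settings with the values from the slide.

    Returns {setting: (actual_value, matches_expected)}.
    """
    queue = sysfs_root / "block" / device / "queue"
    expected = {"nomerges": "1", "read_ahead_kb": "8", "scheduler": "deadline"}
    actual = {
        "nomerges": (queue / "nomerges").read_text().strip(),
        "read_ahead_kb": (queue / "read_ahead_kb").read_text().strip(),
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
    }
    return {key: (actual[key], actual[key] == expected[key]) for key in expected}


def cpus_not_in_performance(sysfs_root: Path) -> list:
    """List CPUs whose scaling governor is not 'performance'."""
    cpu_dir = sysfs_root / "devices" / "system" / "cpu"
    offenders = []
    for gov in sorted(cpu_dir.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        if gov.read_text().strip() != "performance":
            offenders.append(gov.parent.parent.name)
    return offenders
```

On a real node this would be called with `Path("/sys")` and the name of the data or commit log device. Taking the root as a parameter keeps the check testable against a fake directory tree.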

sysctl.d - network

# min / default / max TCP receive buffer, in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
# min / default / max TCP send buffer, in bytes
net.ipv4.tcp_wmem = 4096 65536 16777216
# disable explicit congestion notification
net.ipv4.tcp_ecn = 0
# enable window scaling (higher throughput)
net.ipv4.tcp_window_scaling = 1
# allowed local port range
net.ipv4.ip_local_port_range = 10000 65535
# fast TIME-WAIT recycling; breaks clients behind NAT and was removed in Linux 4.12
net.ipv4.tcp_tw_recycle = 1

# max socket receive buffer, in bytes
net.core.rmem_max = 16777216
# max socket send buffer, in bytes
net.core.wmem_max = 16777216
# max length of a socket's listen (accept) queue
net.core.somaxconn = 4096
# max packets queued on input when an interface receives faster than the kernel processes them
net.core.netdev_max_backlog = 16384
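One way to reason about the 16 MiB buffer maximums is the bandwidth-delay product: a TCP connection needs roughly bandwidth × round-trip time bytes in flight to keep a link full. The sketch below is an illustration rather than the speaker's calculation. The link speed is the 1GbE from the hardware slide, and the round-trip times are assumed values, not measurements.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_rtt_covered_s(buffer_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Largest round-trip time for which this buffer can keep the link full."""
    return buffer_bytes * 8 / bandwidth_bits_per_s


MAX_BUFFER = 16777216  # net.core.rmem_max / wmem_max from the slide (16 MiB)
LINK = 1_000_000_000   # 1GbE from the hardware slide, in bits per second

for rtt_ms in (0.5, 10, 100):  # assumed RTTs for illustration only
    print(f"RTT {rtt_ms} ms -> BDP {bdp_bytes(LINK, rtt_ms / 1000):,.0f} bytes")

print(f"16 MiB covers RTTs up to {max_rtt_covered_s(MAX_BUFFER, LINK) * 1000:.0f} ms")
```

Within a single availability zone the round-trip time is small, so the large maximum mostly matters for cross-region links; the default (middle) values of tcp_rmem and tcp_wmem govern the common case.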

sysctl.d - vm and fs

# tendency to swap; 1 keeps swapping to a minimum
vm.swappiness = 1
# max memory map areas a process can have
vm.max_map_count = 1073741824
# dirty memory (bytes) at which background writeback starts
vm.dirty_background_bytes = 10485760
# dirty memory (bytes) at which a writing process has to write back itself
vm.dirty_bytes = 1073741824
# system-wide max number of open file handles
fs.file-max = 1073741824
# minimum amount of memory the kernel keeps free, in kilobytes
vm.min_free_kbytes = 1048576

JVM - G1GC

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16" # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16" # Set to number of full cores

JVM - HotSpot

MAX_HEAP_SIZE="8G" # Good starting point
HEAP_NEWSIZE="2G" # Good starting point

JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"

# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
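To see what HEAP_NEWSIZE and SurvivorRatio mean together: SurvivorRatio is the size of eden relative to one survivor space, and the young generation consists of eden plus two survivor spaces. The small calculation below works that out for the starting values on this slide. The function name is hypothetical, and the result is only the nominal layout, not a measured one.

```python
def young_gen_layout_mb(new_size_mb: int, survivor_ratio: int) -> dict:
    """Split the young generation into eden and two survivor spaces.

    young = eden + 2 * survivor, and eden = survivor_ratio * survivor,
    so survivor = young / (survivor_ratio + 2).
    """
    survivor = new_size_mb / (survivor_ratio + 2)
    return {"eden_mb": survivor * survivor_ratio, "survivor_each_mb": survivor}


# HEAP_NEWSIZE="2G" and -XX:SurvivorRatio=2 from the slide
layout = young_gen_layout_mb(2048, 2)
print(layout)
```

With these values half of the young generation is eden and each survivor space gets a quarter, which pairs with the high MaxTenuringThreshold: objects can age through many young collections before being promoted.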

Cassandra yaml

concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

trickle_fsync: true
trickle_fsync_interval_in_kb: 1024

internode_compression: dc
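The memtable settings interact. According to the comments in the stock cassandra.yaml of that era, memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1), and a larger value means larger, less frequent flushes. The sketch below compares the configured 0.15 with that default and estimates the flush trigger point. It assumes the ratio is applied to the 2048 MB heap pool, because heap_buffers keeps memtables on heap; that assumption and the function names are illustrative, so check the documentation for your Cassandra version.

```python
def default_cleanup_threshold(flush_writers: int) -> float:
    """Default documented in cassandra.yaml: 1 / (memtable_flush_writers + 1)."""
    return 1 / (flush_writers + 1)


def flush_trigger_mb(cleanup_threshold: float, permitted_mb: int) -> float:
    """Memtable size at which the largest memtable is flushed.

    ASSUMPTION: the threshold is a ratio of the permitted pool size.
    """
    return cleanup_threshold * permitted_mb


FLUSH_WRITERS = 8        # memtable_flush_writers from the slide
CONFIGURED = 0.15        # memtable_cleanup_threshold from the slide
HEAP_SPACE_MB = 2048     # memtable_heap_space_in_mb from the slide

print(f"default threshold:    {default_cleanup_threshold(FLUSH_WRITERS):.3f}")
print(f"configured threshold: {CONFIGURED:.3f}")
print(f"flush trigger:        {flush_trigger_mb(CONFIGURED, HEAP_SPACE_MB):.1f} MB")
```

The configured value sits above the default that 8 flush writers would imply, which trades some flush concurrency for larger SSTables and less compaction work.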

Data model

Data model impacts performance a lot
Optimize so that you read from one partition
Make sure your data can be distributed
SSTable compression depending on the use case
Compaction strategy
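The first three points pull in different directions: a read should hit one partition, yet the data has to spread across many partitions. A common way to balance the two for time-ordered data is to add a time bucket to the partition key. The sketch below is a hypothetical illustration, not the data model from the talk; the sensor use case, the daily bucket and all names are assumptions.

```python
from datetime import datetime, timedelta, timezone


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (sensor_id, UTC day bucket)."""
    return (sensor_id, ts.astimezone(timezone.utc).date().isoformat())


def partitions_touched(sensor_id: str, start: datetime, end: datetime) -> list:
    """Partition keys a time-range read for one sensor would have to visit."""
    keys = []
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    while day <= last:
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


start = datetime(2016, 7, 20, 10, 0, tzinfo=timezone.utc)
same_day = partitions_touched("sensor-1", start, start + timedelta(hours=2))
overnight = partitions_touched("sensor-1", start, start + timedelta(hours=15))
print(len(same_day), len(overnight))
```

If the typical query covers a few hours, a daily bucket keeps most reads on a single partition while each sensor's history still spreads over many partitions. The bucket size should follow from the actual query pattern and write rate.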

Ok, what now?

After we set the base configuration it’s time for testing and observing

Test setup

Make sure you have repeatable tests
Fixed rate tests
Variable rate tests
Production-like tests
Cassandra Stress
Various loadgen tools (gatling, wrk, loader,...)
Coordinated omission
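Coordinated omission deserves a concrete illustration. When a load generator waits for a slow response before sending the next request, the requests that should have gone out during the stall are delayed but recorded as fast. In a fixed-rate test, one common correction is to measure latency from the time a request was scheduled rather than from the time it was actually sent. The simulation below is a simplified sketch with made-up service times; the function name is hypothetical.

```python
def simulate_fixed_rate(service_times_ms, interval_ms):
    """Simulate a single-threaded load generator at a fixed target rate.

    Returns (naive, corrected) latency lists in milliseconds.
    naive:     measured from the moment the request was actually sent
    corrected: measured from the moment it was scheduled to be sent
    """
    naive, corrected = [], []
    clock = 0.0
    for i, service in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # The generator cannot send until the previous request has finished.
        actual_start = max(clock, intended_start)
        finish = actual_start + service
        naive.append(finish - actual_start)
        corrected.append(finish - intended_start)
        clock = finish
    return naive, corrected


# Made-up data: 1 ms responses with a single 100 ms stall, one request per 10 ms.
service = [1.0] * 5 + [100.0] + [1.0] * 14
naive, corrected = simulate_fixed_rate(service, interval_ms=10.0)

naive_slow = sum(1 for x in naive if x > 10)
corrected_slow = sum(1 for x in corrected if x > 10)
print("requests over 10 ms (naive):    ", naive_slow)
print("requests over 10 ms (corrected):", corrected_slow)
```

The naive measurement reports a single slow request, while the corrected one shows every request that was queued behind the stall. Percentiles computed from the naive numbers will look much better than what a client sending at the target rate would experience.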

Tuning methodology

Metrics and reporting stack

OS metrics (SmartCat)
Metrics reporter config (AddThis)
Cassandra diagnostics (SmartCat)
Filebeat
Riemann
InfluxDB
Grafana
Elasticsearch
Logstash
Kibana

Grafana

Kibana

Slow queries

Track query execution times above some threshold
Gain insights into long-running queries
Relate that to what’s going on on the node
Compare app and cluster slow queries

https://github.com/smartcat-labs/cassandra-diagnostics
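On the application side, threshold-based tracking can be as simple as a timing wrapper around each query call. The sketch below shows the idea only; it is not the cassandra-diagnostics API. The class name, the 25 ms threshold and the sleep standing in for a driver call are all assumptions.

```python
import time
from contextlib import contextmanager


class SlowQueryLog:
    """Collects queries whose execution time exceeds a threshold."""

    def __init__(self, threshold_ms, clock=time.perf_counter):
        self.threshold_ms = threshold_ms
        self.clock = clock  # injectable so the timing can be tested
        self.entries = []   # (wall-clock timestamp, statement, elapsed ms)

    @contextmanager
    def track(self, statement):
        start = self.clock()
        try:
            yield
        finally:
            elapsed_ms = (self.clock() - start) * 1000
            if elapsed_ms > self.threshold_ms:
                # The timestamp lets you line the entry up with node-side events.
                self.entries.append((time.time(), statement, elapsed_ms))


log = SlowQueryLog(threshold_ms=25)  # assumed threshold
with log.track("SELECT ... WHERE id = ?"):
    time.sleep(0.05)  # stand-in for the real driver call
for timestamp, statement, elapsed_ms in log.entries:
    print(f"{elapsed_ms:.1f} ms  {statement}")
```

Recording the same threshold on the application and on the cluster makes the two lists comparable: a query that is slow only on the application side points at the network or the client rather than at the node.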

Slow queries - cluster

Slow queries - cluster vs app

Ops center

Pros:
Great when starting out
Everything you need in a nice GUI
Cluster metrics

Cons:
Metrics stored in the same cluster
Issues with some of the services (repair, slow query,...)
Additional agents on the nodes

AWS

AWS deployment

Choose your instance based on calculations
Cost limits come second
Use placement groups and availability zones
Don’t overdo it just because you can ($$$)
Go for EBS volumes (gp2)
You don’t need ephemeral storage (mostly)

EBS volumes

Pros:
3.4TB+ volume has 10,000 IOPS
Average latency is ~0.38ms
Durable across reboots
AWS snapshots
Can be attached/detached
Easy to recreate

Cons:
Rare latency spikes
Average latency is ~0.38ms
Degrading factor
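The IOPS figure follows from how gp2 volumes were provisioned at the time of the talk: as documented by AWS then, baseline performance scaled at 3 IOPS per GiB, with a floor of 100 and a cap of 10,000 IOPS per volume. The limits have changed since, so treat the defaults below as a 2016 snapshot; the function name is hypothetical.

```python
def gp2_baseline_iops(size_gib: int, per_gib: int = 3,
                      floor: int = 100, cap: int = 10_000) -> int:
    """Baseline IOPS of a gp2 volume under the 2016-era limits."""
    return min(cap, max(floor, size_gib * per_gib))


for size_gib in (33, 1000, 3334, 4000):
    print(f"{size_gib:>5} GiB -> {gp2_baseline_iops(size_gib):>6} IOPS")
```

Under these rules the cap is reached at 3,334 GiB, which is roughly in line with the slide's figure. Volumes beyond that size add capacity but no further baseline IOPS.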

EBS volume problems

End result

Did we meet our goal?
Can we go any further?
Torture testing
Failure scenarios
Latency and delay inducers
Automate everything

Q&A

Matija Gobec matija.gobec@smartcat.io

@mad_max0204

Thank you