Supercharging Cassandra - GOTO Amsterdam

Transcript of Supercharging Cassandra - GOTO Amsterdam

2. Before the Flood (1990): Small databases; BTree indexes; BTree file systems; RAID; Old hardware

3. Two Revolutions (2010): Distributed, shared-nothing databases; Write-optimised indexes; BTree file systems; RAID; New hardware

4. Bridging the Gap (2011): Distributed, shared-nothing databases; Castle; New hardware

5. [Stack diagram] Big Data Applications; Memcached; Open API; Management, Deployment, Monitoring; Acunu Storage Core; Cross-Cluster Management UI

6. 1. Predictability

7. Small random inserts: Inserting 3 billion rows [Chart: Acunu powered Cassandra vs. standard Cassandra]

8. Insert latency: While inserting 3 billion rows [Chart: Acunu powered Cassandra vs. standard Cassandra]

9. Small random range queries: Performed immediately after inserts [Chart: Acunu powered Cassandra vs. standard Cassandra]

10. Performance summary

                      Standard   Acunu    Benefits
    inserts rate      ~32k/s     ~45k/s   >1.4x
      95% latency     ~32s       ~0.3s    >100x
    gets rate         ~100/s     ~350/s   >3.5x
      95% latency     ~2s        ~0.5s    >4x
    range queries     ~0.4/s     ~40/s    >100x
      95% latency     ~15s       ~2s      >7.5x

11. Doubling Array: Inserts. Buffer arrays in memory until we have > B of them [Diagram of example inserts]

12. Doubling Array: Inserts [Diagram of arrays merging as more keys arrive, etc...]. Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ...

13. Demo

14. Update and range query costs

                      Update                        Range Query (Size Z)
    B-Tree            O(log_B N) random IOs         O(Z/B) random IOs
    Doubling Array    O((log N)/B) sequential IOs   O(Z/B) sequential IOs

    B = block size, say 8KB at 100 bytes/entry ~= 100 entries

    B-Tree:          ~ log(2^30) / log 100 = 5 IOs/update
                     8KB @ 100MB/s, w/ 8ms seek = 100 IOs/s
                     100 / 5 = 20 updates/s

    Doubling Array:  ~ log(2^30) / 100 = 0.2 IOs/update
                     8KB @ 100MB/s = 13k IOs/s
                     13k / 0.2 = 65k updates/s

15. Castle [Architecture diagram]
    Userspace: Acunu kernel userspace interface. Shared memory interface (keys, values, shared memory ring, shared buffers) and streaming interface.
    Kernelspace interface: key insert, key get, buffered value insert, buffered value get, range queries.
    Doubling Arrays: doubling array mapping layer, Bloom filters, insert queues, arrays, range queries, arrays management, merges.
    Mapping layer: modlist btree, btree, version tree (key insert, key get, range queries, value arrays).
    Cache: block mapping & caching layer, prefetcher, flusher, extent block cache.
    "Extent" layer: extent allocator & mapper, freespace manager.
    Linux Kernel: page cache, block layer, memory manager (Linux's block & MM layers).
    Open source (GPLv2, MIT for user libraries). Loadable Kernel Module, CentOS's 2.6.18.
    More: blogs/andy-twigg/why-acunu-kernel/

16. 2. Monitoring

17. jQuery VisualVM

18. mx4j: Rest-JMX adapter; Munin, Nagios etc

19. 3. Operations

20. -bash-3.2$ nodetool
    ...
    Available commands:
      ring - Print informations on the token ring
      join - Join the ring
      info - Print node informations (uptime, load, ...)
      cfstats - Print statistics on column families
      version - Print cassandra version
      tpstats - Print usage statistics of thread pools
      drain - Drain the node (stop accepting writes and flush all column families)
      decommission - Decommission the node
      compactionstats - Print statistics on compactions
      disablegossip - Disable gossip (effectively marking the node dead)
      enablegossip - Reenable gossip
      disablethrift - Disable thrift server
      enablethrift - Reenable thrift server
      netstats [host] - Print network information on provided host (connecting node by defau
      move - Move node on the token ring to a new token
      removetoken status|force| - Show status of current token removal, force completion of
      setcompactionthroughput - Set the MB/s throughput cap for compaction in the sys
      snapshot [keyspaces...] -t [snapshotName] - Take a snapshot of the specified keyspaces using
      clearsnapshot [keyspaces...] -t [snapshotName] - Remove snapshots for the specified keyspaces
      flush [keyspace] [cfnames] - Flush one or more column family
      repair [keyspace] [cfnames] - Repair one or more column family
      cleanup [keyspace] [cfnames] - Run cleanup on one or more column family
      compact [keyspace] [cfnames] - Force a (major) compaction on one or more column family
      scrub [keyspace] [cfnames] - Scrub (rebuild sstables for) one or more column family
      invalidatekeycache [keyspace] [cfnames] - Invalidate the key cache of one or more column fami
      invalidaterowcache [keyspace] [cfnames] - Invalidate the key cache of one or more column fami
      getcompactionthreshold - Print min and max compaction thresholds for a gi
      cfhistograms - Print statistic histograms for a given column family
      setcachecapacity - Set the key and
      setcompactionthreshold - Set the min and ma

21. SNAPSHOTS* (* And clones!)

22. [Version tree diagram: v0, v1, v2, v3, v4, v5, v6]

23. Rebuild

24. Disk Layout: RDA (random duplicate allocation) [Diagram of numbered blocks placed across disks]

25. Future

26. Memcache + Cassandra: 100k random inserts/sec! [Diagram: Cass client (get/insert) and memcached client (get/put) talk to Cassandra and memcache, each running on Castle on H/W]

27. [Version diagram: v1; v12, v13, v15; v16, v24]

28. Beware the write cliff... [Chart; marker at ~device capacity]

29. Castle: Predictable Performance for Big Data
    Monitoring: distributed, multi-master tools that give you an aggregated and summarised view of your cluster
    Snapshots & Clones: addressing real problems with new workloads
    RDA: lightning fast rebuilds for massive disks

30. Questions? Tom Wilkie @tom_wilkie [email protected]
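
The doubling array insert path described on slides 11 and 12 can be illustrated with a small sketch. This is a toy, in-memory model and not Castle's implementation: the class name `DoublingArray` and its methods are hypothetical, level i is assumed to be either empty or a sorted array of exactly 2^i keys, and the Bloom filters, on-disk blocks, and in-memory buffering of B arrays mentioned in the slides are left out. Each insert behaves like incrementing a binary counter: full levels are merged sequentially and carried upward, which is what keeps the IO sequential in the on-disk version.

```python
import bisect
from heapq import merge


class DoublingArray:
    """Toy doubling array (hypothetical names; not Castle's actual code).

    levels[i] is either None (empty) or a sorted list of exactly 2**i keys.
    """

    def __init__(self):
        self.levels = []

    def insert(self, key):
        # Start with a one-element array and carry it upward, merging with
        # every full level on the way, like adding 1 to a binary counter.
        carry = [key]
        for i, level in enumerate(self.levels):
            if level is None:
                self.levels[i] = carry
                return
            carry = list(merge(level, carry))  # sequential merge of two sorted runs
            self.levels[i] = None
        self.levels.append(carry)

    def contains(self, key):
        # Point lookup: binary search in each non-empty level.
        for level in self.levels:
            if level:
                j = bisect.bisect_left(level, key)
                if j < len(level) and level[j] == key:
                    return True
        return False

    def range_query(self, lo, hi):
        # Slice the matching run out of each level, then merge the runs.
        runs = []
        for level in self.levels:
            if level:
                a = bisect.bisect_left(level, lo)
                b = bisect.bisect_right(level, hi)
                runs.append(level[a:b])
        return list(merge(*runs))
```

For example, inserting the keys 2, 9, 11, 8 and 5 in that order leaves one key in level 0 and four keys in level 2, and `range_query(5, 9)` merges the matching slices from both levels. A point lookup has to probe every non-empty level, which is presumably why the architecture slide pairs the arrays with Bloom filters.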