Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

description

As we move into the world of Big Data and the Internet of Things, the systems architectures and data models we've relied on for decades are becoming a hindrance. At the core of the problem is the read-modify-write cycle. In this session, Al will talk about how to build systems that don't rely on RMW, with a focus on Cassandra. Finally, for those times when RMW is unavoidable, he will cover how and when to use Cassandra's lightweight transactions and collections.

Transcript of Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Page 1: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

©2014 DataStax Confidential. Do not distribute without consent.

@AlTobey Open Source Mechanic @ DataStax

Designing Commodity Storage

Page 2: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

What is commodity storage?
• software-defined storage
  • e.g. Cassandra, S3, GCE Persistent Disks
• Intel/AMD x86_64 architecture

Open Standards:
• PCI-Express
• Near-line SAS, Enterprise SATA, SATA SSD
• 1G/10G Ethernet

Page 3: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Definitely NOT this

Designed to solve different problems from a different era.

Page 4: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Not this either

Apart from SSDs, most “desktop” gear should be avoided in production deployments.

Page 5: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Enterprise

Page 6: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 7: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Rack & Stack
• Blades & 1U for high CPU with low storage density
• 2U for plenty of CPU & storage & air flow
• 3U-4U for high-latency / high-density storage
• “racks” don’t have to be literal
  • blade chassis
  • separate network/power is key

Page 8: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Vendors

Page 9: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Choosing Server Components
• CPU
• Memory
• Motherboards
• Host Bus Adapters
• Hard Drives
• Network Interface Cards

Page 10: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

CPU Pricing

[Bar chart: price in dollars (axis 0 to 2200) for the E5-2620, E5-2630, E5-2650, E5-2670, E5-2687W, and E5-2690. Spec labels on the chart: 6 cores 2.6GHz 80W; 6 cores 2.1GHz 80W; 8 cores 2.6GHz 95W; 10 cores 2.5GHz 115W (3.3GHz turbo); 8 cores 3.4GHz 150W; 8 cores 2.9GHz 135W (3.8GHz turbo). L3 cache labels: 15MB, 15MB, 20MB, 20MB, 25MB, 25MB.]

Page 11: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Processors

Source: http://en.wikipedia.org/wiki/Sandy_Bridge-E

Page 12: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 13: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Memory
• always get ECC!
  • ~5 single bit errors in 8 GB RAM per hour (top-end error rate)
  • unexplainable crashes
  • data corruption
• 8GB DIMMs are still the sweet spot
• Registered Memory: match to your CPU/motherboard
• Pretty much all server memory is ECC and Registered
• Speed: match to fastest rating of CPU/motherboard
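
To put the ECC bullet in perspective, here is a rough back-of-the-envelope sketch. It assumes the slide's top-end figure of ~5 single-bit errors per 8 GB per hour scales linearly with installed RAM, which is a simplification; the function name and the RAM sizes in the loop are illustrative and not from the talk.

```python
# Back-of-the-envelope: expected single-bit errors at the slide's top-end rate.
# Assumption (not from the talk): errors scale linearly with installed RAM.

ERRORS_PER_HOUR = 5.0   # top-end rate quoted on the slide
REFERENCE_GB = 8.0      # ...per this much RAM


def expected_bit_errors(ram_gb: float, hours: float) -> float:
    """Expected single-bit errors for ram_gb of RAM over the given hours."""
    return ERRORS_PER_HOUR * (ram_gb / REFERENCE_GB) * hours


if __name__ == "__main__":
    # Hypothetical node sizes, chosen only for illustration.
    for ram_gb in (8, 64, 128):
        per_day = expected_bit_errors(ram_gb, 24)
        print(f"{ram_gb:>4} GB: ~{per_day:,.0f} single-bit errors/day (top-end rate)")
```

Even if real-world rates are far below the top-end figure, the arithmetic suggests why uncorrected memory is a poor fit for a storage node.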

Page 14: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Motherboards
• Largely out of your control
• Dell / HP / etc.: you’re choosing a server model, e.g. DL380
• Supermicro: be very careful when picking your VAR
• Features to watch for:
  • Socket count (NUMA)
  • IPMI
  • onboard SAS or SATA port speed/count
  • PCIe speed & layout
  • RAM capacity

Page 15: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Storage Adapters
• Serial Attached SCSI
  • Bit Error Rate: 1 in 10^16 bits, or 1 bit in 1,250 TB
  • Supports SATA drives over STP
  • Near-line SAS drives are SATA chassis with SAS boards
  • Always use SAS if you need an expander
  • Check out enclosure services in Linux
• Serial ATA
  • Bit Error Rate: 1 in 10^15 bits, or 1 bit in 125 TB
  • Avoid expanders
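
The bit error rates on this slide can be turned into "data transferred per expected bit error" with simple arithmetic. The sketch below is only that conversion; the function names are illustrative. It prints both decimal terabytes and binary tebibytes, since the two are easy to confuse.

```python
# Convert a bit error rate of "1 error per N bits" into the amount of data
# transferred, on average, per bit error.

def bytes_per_bit_error(bits_per_error: float) -> float:
    """Bytes transferred per expected bit error (8 bits per byte)."""
    return bits_per_error / 8


def describe(name: str, bits_per_error: float) -> str:
    """Format the result in decimal TB and binary TiB."""
    b = bytes_per_bit_error(bits_per_error)
    tb = b / 10**12    # decimal terabytes
    tib = b / 2**40    # binary tebibytes
    return f"{name}: 1 bit error per ~{tb:,.0f} TB (~{tib:,.0f} TiB)"


if __name__ == "__main__":
    print(describe("SAS  (1 in 10^16 bits)", 1e16))
    print(describe("SATA (1 in 10^15 bits)", 1e15))
```

The tenfold gap between the two interfaces is the point of the slide: on a busy node, 125 TB of transfer is not a large number.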

Page 16: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Storage Adapters
• JBOD
  • cheap
  • OS manages drives
  • drivers usually shipped with OS
  • CPU overhead is negligible
• HW RAID is sometimes faster, usually comes with cache
  • write-through vs. write-back
  • write-back + BBU provides interesting performance options
  • driver + utilities management

Page 17: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 18: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 19: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Parity RAID

Page 20: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 21: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

RAID
• JBOD
  • mount every drive with individual filesystems
  • cheap
• RAID0
  • single drive failure means node rebuild
  • cheap
• RAID10
  • fast, protects against single disk failure
  • expensive

Page 22: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

RAID
• RAID 5 / 6 (and beyond)
  • parity data protection
  • performance heavily dependent on implementation
  • cheapest option for drive failure protection
• RAID 50 / 60
  • stripe across multiple RAID[56] volumes
  • mostly useful with large number of drives
  • can provide decent performance esp. on HW RAID
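
The cost trade-offs on the two RAID slides come down to how many drives' worth of capacity each layout gives up. The following is a minimal sketch of that arithmetic, not a sizing tool: the function name, the drive count, the drive size, and the `groups` parameter for RAID 50/60 are all hypothetical. "Failures survived" is the guaranteed worst case, where every failure lands in the same mirror pair or parity group.

```python
# Rough usable-capacity and fault-tolerance comparison for the layouts on the
# RAID slides. Drive count and size in the example are hypothetical.

PARITY_DRIVES = {"RAID5": 1, "RAID6": 2, "RAID50": 1, "RAID60": 2}


def raid_summary(level: str, drives: int, drive_tb: float, groups: int = 2):
    """Return (usable_tb, drive_failures_survived_in_worst_case).

    `groups` is only used for RAID50/RAID60 (number of striped parity groups).
    JBOD is reported as 0: losing a drive loses that drive's data.
    """
    if level in ("JBOD", "RAID0"):
        return drives * drive_tb, 0
    if level == "RAID10":
        if drives % 2:
            raise ValueError("RAID10 needs an even number of drives")
        return drives / 2 * drive_tb, 1
    if level not in PARITY_DRIVES:
        raise ValueError(f"unknown level: {level}")
    parity = PARITY_DRIVES[level]
    n_groups = groups if level in ("RAID50", "RAID60") else 1
    if drives % n_groups:
        raise ValueError("drives must divide evenly into groups")
    if drives // n_groups < parity + 2:
        raise ValueError("not enough drives per parity group")
    # Each parity group gives up `parity` drives' worth of capacity.
    return (drives - n_groups * parity) * drive_tb, parity


if __name__ == "__main__":
    for level in ("JBOD", "RAID0", "RAID10", "RAID5", "RAID6", "RAID50", "RAID60"):
        usable, survived = raid_summary(level, drives=8, drive_tb=4.0)
        print(f"{level:<7} usable={usable:>5.1f} TB  failures survived={survived}")
```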

Page 23: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 24: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Hard Drives
• SATA HDD
  • there’s only one head carriage
  • seeks kill
  • decent performance on sequential IO
  • bit errors
  • cheap!

Page 25: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 26: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Hard Drives
• SAS HDD
  • there’s only one head carriage
  • seeks kill
  • bit errors
  • expensive!
  • faster RPMs may help a little with seek latency
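
Why "seeks kill", and why faster RPMs help only a little, can be seen from a simple model of a random read: one head seek plus, on average, half a platter revolution. The sketch below ignores transfer time, caching, and queueing, and the seek times in the example are hypothetical values rather than figures from the talk.

```python
# Estimate random-read IOPS for a spinning disk from seek time and RPM.
# Model: one I/O = average seek + average rotational latency (half a turn).

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the sector to come around: half a revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Approximate random IOPS for a single drive with one head carriage."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


if __name__ == "__main__":
    # (label, hypothetical average seek in ms, RPM)
    for label, seek_ms, rpm in (("7.2k drive", 8.5, 7200), ("15k drive", 3.5, 15000)):
        print(f"{label}: ~{random_iops(seek_ms, rpm):.0f} random IOPS, "
              f"rotational latency {avg_rotational_latency_ms(rpm):.2f} ms")
```

Under this model a single spindle stays in the low hundreds of random IOPS whatever the RPM, which is why sequential IO is where HDDs remain useful.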

Page 27: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 28: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Hard Drives
• SATA SSD
  • very low latency seeks
  • slightly lower sequential IO throughput
  • more expensive than SATA HDD
  • vendors might not want to sell them to you!
  • sometimes called “value series” or similar
  • Cassandra runs fine on consumer-grade SSDs
  • make sure your SATA/SAS bus and HBA are up to the task
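
One way to check whether the bus and HBA are "up to the task" is to compare the aggregate throughput of the SSDs against the bandwidth of the HBA's PCIe slot. This is a rough sketch using approximate per-lane PCIe figures and the SATA 6 Gb/s link ceiling; the function name, the default per-drive rate, and the drive counts are hypothetical, and expander or controller limits are not modeled.

```python
# Compare aggregate SATA SSD throughput against the HBA's PCIe slot bandwidth.
# Bandwidths are approximate usable MB/s per direction.

SATA3_LINK_MB_S = 600.0                   # SATA 6 Gb/s after 8b/10b encoding
PCIE_LANE_MB_S = {2: 500.0, 3: 985.0}     # PCIe 2.0 / 3.0, per lane


def hba_oversubscription(ssd_count: int, pcie_gen: int, lanes: int,
                         per_ssd_mb_s: float = 500.0) -> float:
    """Ratio of aggregate SSD throughput to the HBA's PCIe bandwidth.

    Values above 1.0 mean the drives can collectively outrun the slot.
    per_ssd_mb_s is a hypothetical per-drive rate, capped at the SATA link speed.
    """
    per_drive = min(per_ssd_mb_s, SATA3_LINK_MB_S)
    demand = ssd_count * per_drive
    capacity = lanes * PCIE_LANE_MB_S[pcie_gen]
    return demand / capacity


if __name__ == "__main__":
    for count in (4, 8, 16):
        ratio = hba_oversubscription(count, pcie_gen=2, lanes=8)
        print(f"{count:>2} SSDs on a PCIe 2.0 x8 HBA: {ratio:.2f}x of slot bandwidth")
```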

Page 29: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Hard Drives
• Enterprise SSD
  • quite expensive
  • vendor supported
  • more reliable
  • often faster as well

Page 30: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 31: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Hard Drives
• PCIe SSD
  • e.g. FusionIO, ioSwitch
  • highest performance potential
  • not as expensive as you think
  • lots of new products entering the market
  • generally not hot-swappable

Page 32: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Page 33: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

Networking
• you don’t need 10gig
  • but it’s awesome
• Broadcom cards are common and commonly buggy
• Intel cards are expensive but a good bet
• Consider lesser-known add-in cards, e.g. Myricom
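
A quick way to see where 10gig pays off is to estimate how long it takes to move a node's worth of data over the wire, for example when a node has to be rebuilt. The sketch below is line-rate arithmetic only: the data sizes and the `efficiency` factor are hypothetical, and it ignores disk speed, protocol overhead beyond that factor, and any throttling.

```python
# Estimate hours needed to transfer a given amount of data over a network link.

def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb decimal terabytes over a link_gbps link.

    efficiency is a hypothetical fraction of line rate actually achieved.
    """
    bits = data_tb * 8e12                        # 1 TB = 8 * 10^12 bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    for data_tb in (1.0, 4.0):                   # hypothetical per-node data sizes
        for gbps in (1, 10):
            hours = transfer_hours(data_tb, gbps)
            print(f"{data_tb:.0f} TB over {gbps:>2} GbE: ~{hours:.1f} h at line rate")
```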

Page 34: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

To the Cloud!
• Amazon, Google, etc. all use similar gear under the VM
• same constraints apply, but you only get a fraction of the box
• pass-through PCIe devices for the best performance
• Avoid EBS in EC2, go with ephemerals
• GCE PDs may need additional read/write threads

Page 35: Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

@AlTobey

Q & A

Everybody is hiring, including DataStax!

Open Source Mechanic, DataStax