Cassandra Cluster Management by Japan Cassandra Community


  • Managing a Cassandra Cluster: Lessons Learned from 3M+ Node-Hours of Experience

    19 April 2016

  • Agenda

    About Instaclustr & presenters

    Foundation practices for a happy cluster

    The most common Cassandra issues and how to avoid them

    Important monitoring and health-check procedures

    Q & A

  • About Instaclustr

    Cassandra, Spark and Zeppelin Managed Service on AWS, Azure and SoftLayer

    500+ nodes; >80 clusters from 3 nodes to 50+

    >3M node-hours of C* experience

    Cassandra and Spark Consulting & Support

  • Ben Slater Chief Product Officer

    Aleks Lubiejewski VP Consulting & Security

    (prev VP Support)

  • Foundation Practices

    What do I need to get right at the beginning to have a happy cluster?

  • Pick the right Cassandra version

    Most stable -> Cassandra 2.1 (or, better yet, DSE)

    Want the latest features and can live with the cutting edge, or have a few months until production -> Cassandra 3.x. Odd-numbered releases (e.g. 3.5) should be more stable as these are defect-fix releases.

    Cassandra 2.2: same end of life as Cassandra 2.1, no DSE version (yet?). Only recommended if you really want the new features but don't want to jump to 3.x.

  • Appropriate Hardware Configuration

    Solid State Disks

    Lots of memory

    EBS can be used on AWS (and the equivalent on other platforms), but it needs very careful attention to configuration

    For the cloud, we prefer more, smaller nodes. In AWS, m4.xlarge with 800GB or 1.6TB of storage are our standard building blocks: a smaller proportionate impact when a node fails, and a reasonable time to replace a failed node or add new ones.

  • Estimating Costs in the Cloud

    Instances: cost of the base compute instances (e.g. m4.xl). Driver: number and size of nodes in the cluster.

    EBS Volume: cost of attached EBS volumes (where applicable). Driver: size of the EBS volume (e.g. 400GB).

    Network Public IP In/Out: loading/retrieving data via public IP. Driver: only applicable if accessing via public IP; dependent on the number of Cassandra reads/writes in a month and transaction size.

    Network Interzone In/Out: cross-availability-zone communication within the cluster. Driver: transaction volume and size, consistency level used for reads.

    Network VPC In/Out: loading/retrieving data via a peered VPC. Driver: only applicable if accessing via a peered VPC; dependent on the number of Cassandra reads/writes in a month and transaction size.

    S3 Storage: S3 space for storing backups. Driver: volume of data, length of backup retention, deduplication of backup files/data.

    S3 Operations: S3 calls for storing backups. Driver: number of sstables (volume of data + compaction strategy), backup strategy.

    S3 Data Transfer Out: S3 retrieval data transfer cost. Driver: only applicable if you need to copy data from S3 to a region other than US East to restore a backup.

    EBS and network costs can exceed instance cost in some circumstances

  • Cassandra-Specific Load Testing

    Short load tests from the application side can be misleading.

    Need to consider: Is there enough data on disk to overflow the operating system file cache?

    Does your data reflect production distributions, particularly for primary key fields?

    Are you performing deletes/updates to test for impact of tombstones (virtual deletes)?

    If you are using cassandra-stress, do you understand the available options and their impacts?
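    A minimal sketch of the kind of cassandra-stress run this implies (node addresses, row counts and thread counts are placeholders, not recommendations):

    # Load enough data to exceed the OS file cache, then run a mixed read/write workload
    cassandra-stress write n=50000000 cl=QUORUM -rate threads=100 -node 10.0.0.1,10.0.0.2
    cassandra-stress mixed ratio\(write=1,read=3\) duration=1h cl=QUORUM -rate threads=100 -node 10.0.0.1,10.0.0.2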

  • NetworkTopologyStrategy, RF=3, CL=QUORUM

    NetworkTopologyStrategy is the most future-proof strategy: it allows use of multiple data centres, which can be very useful for cluster operations such as upgrades and splitting out tables.

    Replication Factor = 3, Consistency Level = QUORUM provides the expected behaviour for most people:

    Strong consistency

    Availability through failure of a replica

    Other settings are valid for many use cases but need careful consideration of the impacts
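    A minimal CQL sketch of this recommendation (keyspace and data centre names are illustrative):

    -- Keyspace replicated 3 times within one data centre
    CREATE KEYSPACE IF NOT EXISTS my_app
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    -- cqlsh command to set the consistency level used for subsequent requests
    CONSISTENCY QUORUM;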

  • Most Common Issues

    What are the most common causes of things going wrong in production?

  • Data Modelling Issues

    Partition Keys: Cassandra primary keys consist of a partition key and a clustering key.

    E.g. PRIMARY KEY ((c1, c2), c3, c4)

    Partition keys determine how data is distributed around the nodes

    1 node per partition, many partitions per node

    Need to ensure there is a large number of partitions with a reasonable number of rows per partition

    A small number of partitions, or very uneven partitions, defeats the basic concept of Cassandra scaling

    Very large partitions can cause issues with garbage collection and excess disk usage
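    For illustration only, a hypothetical time-series table that follows these guidelines by bucketing the partition key per device per day:

    -- (device_id, day) is the partition key; ts is the clustering key.
    -- Bucketing by day keeps individual partitions from growing without bound.
    CREATE TABLE IF NOT EXISTS metrics.readings (
        device_id uuid,
        day       text,      -- e.g. '2016-04-19'
        ts        timestamp,
        value     double,
        PRIMARY KEY ((device_id, day), ts)
    );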

  • Data Modelling Issues (2)

    Tombstones: tombstones are entries created to mark the fact that a record has been deleted

    (updates to primary key will also cause tombstones)

    By default, tombstones are retained for 10 days before being removed by compactions

    High ratios of tombstones to live data can cause significant performance issues

    Secondary Indexes: secondary indexes are useful for a limited set of cases:

    only index low (but not too low) cardinality columns;

    don't index columns that are frequently updated or deleted

    Poor use of secondary indexes can result in poor read performance and defeat scalability
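    A hedged CQL illustration of both points (table, column and index names are hypothetical; 864000 seconds is the default 10-day tombstone retention):

    -- Deletes write tombstones that are kept for gc_grace_seconds before compaction discards them
    DELETE FROM metrics.readings
        WHERE device_id = 3f2504e0-4f89-41d3-9a0c-0305e82c3301 AND day = '2016-04-19';

    ALTER TABLE metrics.readings WITH gc_grace_seconds = 864000;  -- default: 10 days

    -- Secondary index: only sensible on a moderate-cardinality, rarely-updated column
    CREATE INDEX IF NOT EXISTS users_by_country ON my_app.users (country);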

  • Other Issues

    Write Overload: a cluster may be able to handle a write workload initially but then fail to keep up as compactions kick in

    The effects can go beyond high latency and cause crashes

    Garbage Collection (GC): long garbage collection pauses can often cause issues

    Typically a symptom of partitions being too big or general overload of the cluster

    Tuning GC settings and heap allocations can help, but you also need to address the root cause
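    If you do tune the heap, the relevant knobs live in cassandra-env.sh; the values below are purely illustrative, not a recommendation:

    # cassandra-env.sh: fix the heap size and young-generation size instead of letting them be auto-calculated
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"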

  • Running out of capacity

    Cassandra can scale out indefinitely, but adding new nodes to a cluster itself consumes processing capacity (existing nodes must stream data to the new ones).

    Therefore, you need to add capacity well before existing capacity is exhausted.

    This applies to both disk and processor/memory.

  • Fundamental Monitoring & Health Check Procedures

    How do I get advance warning if my cluster is going to hit issues?

  • Monitoring Basic Metrics (OS)

    Disk usage: less than 70% under normal running is a good starting guide

    this allows for bursts of usage by compactions and repairs

    extreme cases may require 50% free space

    Levelled compaction strategy and data spread across multiple column families can allow higher disk usage

    CPU Usage: again, 70% utilization is a reasonable target

    Keep a lookout for high iowait, which indicates a storage bottleneck
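    Two standard OS commands cover these checks (the data path below is the usual default; adjust for your install):

    # Disk usage of the Cassandra data volume
    df -h /var/lib/cassandra
    # Per-device utilisation and iowait, sampled every 5 seconds
    iostat -x 5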

  • Monitoring Basic Metrics (C*)

    Read/Write Latency: closely tied to user experience for most use cases

    monitor for significant changes

    be aware that read latency can vary greatly depending on the number of rows returned

    distinguish between changes impacting a specific column family (likely data modelling issues) and changes impacting a specific node (hardware or capacity issues)

    Pending Compactions: an increasing number of pending compactions indicates that a node is not keeping up with its workload
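    Both metrics can be spot-checked from nodetool; a hedged example of the relevant commands:

    # Read/write/range latency percentiles as seen by this coordinator
    nodetool proxyhistograms
    # Active and pending compactions on this node
    nodetool compactionstats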

  • Cassandra Logs

    Regularly inspecting Cassandra logs for warnings and errors is important for picking up issues. The issues you will find include:

    large batch warnings

    compacting large partitions warnings

    reading excess tombstones warnings

    plenty more!

    Apr 18 08:00:27 ip-10-224-111-138.ec2.internal docker[15521]: [Native-Transport-Requests:22756] WARN org.apache.cassandra.cql3.statements.BatchStatement Batch of prepared statements for [ks.col_family] is of size 59000, exceeding specified threshold of 5120 by 53880.

    ip-10-224-169-153.eu-west-1.compute.internal docker[25837]: WARN o.a.c.io.sstable.SSTableWriter Compacting large partition ks/col_family:c7d65814-1a58-4675-ad54-d6c92e10d1d7 (405357404 bytes)

    Mar 29 11:55:26 ip-172-16-151-148.ec2.internal docker[30099]: WARN o.a.c.db.filter.SliceQueryFilter Read 3563 live and 3520 tombstone cells in ks.col_family for key: 4364012 (see tombstone_warn_threshold). 5000 columns were requested, slices=[2016-03-28 11:55Z:!-]
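    A simple way to surface these warnings (the log path is the usual package default; adjust for your install):

    # Show recent warnings and errors from the Cassandra system log
    grep -E 'WARN|ERROR' /var/log/cassandra/system.log | tail -n 100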

  • cfstats and cfhistograms

    The Cassandra nodetool utility has many commands that help diagnose issues.

    nodetool cfstats and cfhistograms are two of the most important

    These tools can help you see:

    Large and uneven partitions

    Excess tombstones

    Too many sstables per read

    read and write latency by keyspace and column family

    many more
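    Typical invocations look like the following (keyspace and table names are placeholders; in Cassandra 3.x the same commands also exist as tablestats and tablehistograms):

    # Per-table statistics: partition sizes, tombstones per read, sstables per read, latencies
    nodetool cfstats my_ks.my_table
    # Latency, partition-size and cell-count histograms for one table
    nodetool cfhistograms my_ks my_table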

  • Summary

    Cassandra is an incredibly reliable and scalable technology

    if

    you design and build correctly from the start

    and follow basic management procedures.

  • Thank you for listening. QUESTIONS? Contact:

    www.instaclustr.com

    Hiro Komatsu hiro.komatsu@instaclustr.com

    Ben Slater ben.slater@instaclustr.com

    Aleks Lubiejewski aleks@instaclustr.com
