Cassandra Summit 2014: Successful Software Development with Apache Cassandra


description

Adding a new technology to your development process can be challenging, and the distributed nature of Apache Cassandra can make it daunting. However, recent improvements in drivers, utilities and tooling have simplified the process, making it easier than ever before to develop software with Apache Cassandra. In this presentation we will cover essential knowledge for all developers wanting to efficiently create reliable Apache Cassandra based solutions. Topics will include:
- Language and Driver selection
- Optimizing Driver configuration
- Productive Developer environments using ccm, Vagrant and DataStax DevCenter
- Creating appropriate test data
- Unit testing
- Automated integration testing
New and existing users will leave this presentation with the necessary knowledge to make their next Apache Cassandra project a success.

Transcript of Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Page 1: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

CASSANDRA-SF 2014

SUCCESSFUL SOFTWARE DEVELOPMENT WITH

CASSANDRA Nate McCall

@zznate #CassandraSummit

Co-Founder & Sr. Technical Consultant

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

Page 3: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

OVERVIEW

Page 4: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview:

What makes a software development project successful?

Page 5: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Page 6: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview:

Impedance mismatch: distributed systems development on a laptop.

Page 7: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

GETTING STARTED: FOLLOW THE PATH OF LEAST RESISTANCE

Page 8: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

JVM-Based if at all Possible.

Page 9: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

Python Otherwise.

https://github.com/datastax/python-driver

Page 10: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

C#?

https://github.com/datastax/csharp-driver

Page 11: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

Ruby?

https://github.com/datastax/ruby-driver

Page 12: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

ORM? maybe - only if it’s very simple

more later…

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Page 13: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

DATA MODELING

Page 14: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Data Modeling:

… a topic unto itself. But quickly:

Page 15: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Data Modeling - Quickly

• It’s Hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

Page 16: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

DEVELOPER PRODUCTIVITY

Page 17: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

use CQL

Page 18: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Page 19: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Use the Java Driver

Page 20: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

• Reference implementation
• Well written, extensive coverage
• open source

https://github.com/datastax/java-driver/

Page 21: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Page 22: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Guice Users: “GuicyFig:”

Archaius + Guice

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Page 23: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Page 24: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Page 25: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Two groups of configurations:

• policies
• connections

Page 26: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Three Policy Types:
• load balancing
• reconnection
• retry

Page 27: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Connection Options:
• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec

Page 28: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava
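
As a rough illustration of the fan-out style this slide argues for, the sketch below starts several lookups concurrently and gathers the results in order. It deliberately uses the JDK's `CompletableFuture` as a stand-in so that it runs without dependencies; it is neither RxJava nor the driver's own future type. `AsyncFanOut`, `fetchAll` and the simulated lookup are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AsyncFanOut {

    /** Starts one asynchronous lookup per key, then gathers the results in key order. */
    static <K, V> CompletableFuture<List<V>> fetchAll(
            List<K> keys, Function<K, CompletableFuture<V>> lookup) {
        // Kick off every lookup before waiting on any of them.
        List<CompletableFuture<V>> inFlight =
                keys.stream().map(lookup).collect(Collectors.toList());
        // allOf completes once every lookup has completed, so join() below does not block.
        return CompletableFuture
                .allOf(inFlight.toArray(new CompletableFuture<?>[0]))
                .thenApply(done -> inFlight.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Simulated query (assumption): a real application would call the driver here.
        Function<Integer, CompletableFuture<String>> lookup =
                id -> CompletableFuture.supplyAsync(() -> "row-" + id);
        System.out.println(fetchAll(Arrays.asList(1, 2, 3), lookup).join());
    }
}
```

If any single lookup fails, the combined future completes exceptionally, which is usually what a caller wants when all rows are required.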

Page 29: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

A note about User Defined Types (UDTs)

Page 30: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Using UDTs:

Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Page 31: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

Page 32: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Page 33: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Page 35: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Trace Frequently

Page 36: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Page 37: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Page 38: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

Page 40: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592

Page 41: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428

!!?! (more than 9 seconds between these two steps)
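
Spotting that kind of jump by eye gets harder on longer traces. The sketch below is one possible way to automate it, assuming each row has the four pipe-separated columns shown in the slides (activity, timestamp, source, elapsed microseconds) and that the elapsed column counts up separately on each node. `TraceGaps` and `largestGap` are hypothetical names. The search is limited to one node because the coordinator's largest gap mostly reflects time spent waiting on the replica.

```java
import java.util.Arrays;
import java.util.List;

final class TraceGaps {

    /** The step that ended the longest pause on a node, and how long that pause was. */
    static final class Gap {
        final String activity;
        final long micros;

        Gap(String activity, long micros) {
            this.activity = activity;
            this.micros = micros;
        }
    }

    /** Returns the largest gap between consecutive events on one node, or null if it has fewer than two. */
    static Gap largestGap(List<String> rows, String source) {
        Gap worst = null;
        long previous = -1;
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length != 4) {
                throw new IllegalArgumentException("expected 4 columns: " + row);
            }
            if (!cols[2].trim().equals(source)) {
                continue; // elapsed counters are only comparable within one node
            }
            long elapsed = Long.parseLong(cols[3].trim());
            if (previous >= 0 && (worst == null || elapsed - previous > worst.micros)) {
                worst = new Gap(cols[0].trim(), elapsed - previous);
            }
            previous = elapsed;
        }
        return worst;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605",
                "Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428",
                "Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423");
        Gap gap = largestGap(rows, "192.168.1.204");
        System.out.println(gap.activity + ": " + gap.micros + " us");
    }
}
```

On the rows above this reports the tombstone read, with a gap of 9243823 microseconds.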

Page 43: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Page 44: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

`nodetool settraceprobability`

Page 45: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

…then make sure you try it again with a node down!

Page 46: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Final note on tracing: do it sparingly

Page 47: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Page 48: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms

Page 49: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Page 50: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Page 51: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - proxyhistograms:

proxyhistograms: request latency as measured at the coordinator (includes inter-node round trips)

Page 52: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Running Cassandra during development

Page 53: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Local Cassandra
• easy to set up
• you control it
• but then you control it!

Page 54: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Page 55: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

Page 56: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
# One entry per VM: node1 at 192.168.2.10, node2 at 192.168.2.11, ...
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  # Evenly spaced tokens across the Murmur3 range (-2**63 .. 2**63 - 1)
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
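
The `initial_token` arithmetic above spaces tokens evenly across the Murmur3 range of -2^63 to 2^63 - 1. For JVM-based test harnesses that need the same values, a minimal Java sketch of that calculation follows. `InitialTokens` and `evenlySpaced` are hypothetical names; `BigInteger` is used because 2^64 does not fit in a `long`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class InitialTokens {

    private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(64);
    private static final BigInteger MIN_TOKEN = BigInteger.valueOf(2).pow(63).negate();

    /** Mirrors the Vagrantfile expression (2**64 / server_count * i) - 2**63 for each node index i. */
    static List<Long> evenlySpaced(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        // Integer division first, then multiply, to match the Ruby operator order.
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(nodeCount));
        List<Long> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            BigInteger token = step.multiply(BigInteger.valueOf(i)).add(MIN_TOKEN);
            tokens.add(token.longValueExact()); // always fits: result is below 2^63
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three nodes, as in the Vagrantfile (server_count = 3).
        System.out.println(evenlySpaced(3));
    }
}
```

For three nodes this yields -9223372036854775808, -3074457345618258603 and 3074457345618258602.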

Page 59: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}

Page 60: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

ENCAPSULATE ENVIRONMENTS

Page 61: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Environments:

Configuration Management is Essential

Page 62: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Environments:

Laptop to Production with NO Manual Modifications!

Page 63: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

TESTING

Page 64: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
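
A naming scheme like this is easy to enforce mechanically. The sketch below shows one way a build helper might map a class name to its test kind; `TestNames`, `Kind` and `classify` are hypothetical names, not part of any build tool. The `PITest` suffix has to be checked before `ITest`, because every name ending in `PITest` also ends in `ITest`.

```java
final class TestNames {

    enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNKNOWN }

    /** Classifies a test class by the suffix convention from the slide. */
    static Kind classify(String simpleClassName) {
        if (simpleClassName.endsWith("UnitTest")) {
            return Kind.UNIT;                 // no external resources
        }
        if (simpleClassName.endsWith("PITest")) {
            return Kind.PARALLEL_INTEGRATION; // must come before the ITest check
        }
        if (simpleClassName.endsWith("ITest")) {
            return Kind.INTEGRATION;          // uses external resources
        }
        return Kind.UNKNOWN;                  // does not follow the scheme
    }

    public static void main(String[] args) {
        for (String name : new String[] {"TokenUnitTest", "SessionITest", "SchemaPITest", "MiscTest"}) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Treating `UNKNOWN` as a build failure is one way to keep the scheme from eroding over time.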

Page 65: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Page 66: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Group tests into logical units (“suites”)

Page 67: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Suites:

Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles

Page 68: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Page 70: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Suites:

Using annotations for suites in code

Page 72: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Use Mocks where possible

Page 73: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Unit Integration Testing

Page 74: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Verify Assumptions: test failure scenarios explicitly

Page 75: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration:

Runtime Integrations:
• local
• in-process
• forked-process

Page 76: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

EmbeddedCassandra

Page 77: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)
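
A minimal sketch of the forked-process approach, under stated assumptions: Cassandra is unpacked under a directory passed in by the caller, `bin/cassandra -f` keeps it in the foreground, and a successful TCP connect to the native transport port (9042 by default) is treated as a rough readiness signal. `ForkedNode`, `foreground` and `waitForPort` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

final class ForkedNode {

    /** Builds (but does not start) a process that runs Cassandra in the foreground. */
    static ProcessBuilder foreground(File cassandraHome, File logFile) {
        ProcessBuilder builder =
                new ProcessBuilder(new File(cassandraHome, "bin/cassandra").getPath(), "-f");
        builder.directory(cassandraHome);
        builder.redirectErrorStream(true); // merge stderr into stdout
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return builder;
    }

    /** Polls until a TCP connect to host:port succeeds or the timeout elapses. */
    static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        do {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 250);
                return true;
            } catch (IOException notListeningYet) {
                Thread.sleep(100); // back off briefly before the next attempt
            }
        } while (System.nanoTime() < deadline);
        return false;
    }

    public static void main(String[] args) {
        // Only prints the command; a test fixture would call start() and then waitForPort().
        ProcessBuilder builder =
                foreground(new File("/opt/cassandra"), new File("cassandra-test.log"));
        System.out.println(builder.command());
    }
}
```

A fixture built on this would typically call `start()`, wait for the port, run the suite, and then `destroy()` the process in a teardown hook. An open port does not guarantee the node has finished starting, so treat the check as a heuristic.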

Page 78: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java
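
The general idea of delegating to CCM can be sketched as a thin wrapper that shells out to the `ccm` command line. This is not the code of the linked `CCMBridge` class; `CcmCommands` and its methods are hypothetical, and the sketch assumes `ccm` is installed and on the `PATH`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class CcmCommands {

    /** ccm create NAME -v VERSION -n NODES */
    static List<String> create(String cluster, String version, int nodes) {
        return Arrays.asList("ccm", "create", cluster, "-v", version, "-n", String.valueOf(nodes));
    }

    /** Starts every node of the current cluster. */
    static List<String> start() {
        return Arrays.asList("ccm", "start");
    }

    /** Stops a single node, for example to test behavior with a node down. */
    static List<String> stopNode(int nodeNumber) {
        return Arrays.asList("ccm", "node" + nodeNumber, "stop");
    }

    /** Removes the current cluster and its data. */
    static List<String> remove() {
        return Arrays.asList("ccm", "remove");
    }

    /** Runs a command, streaming its output to this process, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Prints the commands a test fixture would run, without executing them.
        System.out.println(String.join(" ", create("test", "2.0.8", 3)));
        System.out.println(String.join(" ", start()));
        System.out.println(String.join(" ", stopNode(2)));
        System.out.println(String.join(" ", remove()));
    }
}
```

Keeping command construction separate from execution makes the wrapper itself unit-testable without a running cluster.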

Page 79: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

Vagrant: delegate to vagrant cli

Page 80: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Page 81: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Page 82: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point
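
One simple way to turn "catch regressions" into an automated check is to compare a latency percentile from the current run against a stored baseline. The sketch below assumes latencies were collected as microsecond samples by whatever load tool is in use; `LatencyRegression`, `percentile` and `regressed` are hypothetical names, and the 10% tolerance is an arbitrary illustration.

```java
import java.util.Arrays;

final class LatencyRegression {

    /** Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it. */
    static long percentile(long[] samplesMicros, int percent) {
        if (samplesMicros.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and a percent in 1..100");
        }
        long[] sorted = samplesMicros.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100; // integer ceiling of percent/100 * n
        return sorted[rank - 1];
    }

    /** True when the current value exceeds the baseline by more than the given fraction. */
    static boolean regressed(long baselineMicros, long currentMicros, double tolerance) {
        return currentMicros > baselineMicros * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        long[] run = {900, 1100, 1000, 950, 40000};
        long p99 = percentile(run, 99);
        System.out.println("p99=" + p99 + "us regressed=" + regressed(1200, p99, 0.10));
    }
}
```

Running the same check from the build server after every load test keeps the comparison reproducible rather than anecdotal.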

Page 83: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Stress.java (lots of changes recently)

Page 84: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

CassandraJMeter

https://github.com/Netflix/CassJMeter

Page 85: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Workload recording and playback coming soon

https://issues.apache.org/jira/browse/CASSANDRA-6572

Page 86: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Page 87: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Summary:
• Go slowly with bite sized chunks
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Page 88: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Thanks.

Page 89: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant www.thelastpickle.com

#CassandraSummit