Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Post on 14-Dec-2014

description

Adding a new technology to your development process can be challenging, and the distributed nature of Apache Cassandra can make it daunting. However, recent improvements in drivers, utilities and tooling have simplified the process, making it easier than ever to develop software with Apache Cassandra. In this presentation we will cover essential knowledge for all developers wanting to efficiently create reliable Apache Cassandra based solutions. Topics will include:

- Language and Driver selection
- Optimizing Driver configuration
- Productive Developer environments using ccm, Vagrant and DataStax DevCenter
- Creating appropriate test data
- Unit testing
- Automated integration testing

New and existing users will leave this presentation with the necessary knowledge to make their next Apache Cassandra project a success.

Transcript of Cassandra Summit 2014: Successful Software Development with Apache Cassandra

CASSANDRA-SF 2014

SUCCESSFUL SOFTWARE DEVELOPMENT WITH CASSANDRA

Nate McCall

@zznate #CassandraSummit

Co-Founder & Sr. Technical Consultant

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

OVERVIEW

Overview:

What makes a software development project successful?

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Overview:

Impedance mismatch: distributed systems development on a laptop.

GETTING STARTED: FOLLOW THE PATH OF LEAST RESISTANCE

Getting Started:

JVM-Based if at all Possible.

Getting Started:

Python Otherwise.

https://github.com/datastax/python-driver

Getting Started:

C#?

https://github.com/datastax/csharp-driver

Getting Started:

Ruby?

https://github.com/datastax/ruby-driver

Getting Started:

ORM? Maybe, but only if it’s very simple (more later…)

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

DATA MODELING

Data Modeling:

… a topic unto itself. But quickly:

Data Modeling - Quickly:

• It’s Hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

DEVELOPER PRODUCTIVITY

Productivity:

Use CQL

Productivity - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Productivity:

Use the Java Driver

Productivity - Java Driver:

• Reference implementation
• Well written, extensive coverage
• Open source

https://github.com/datastax/java-driver/

Productivity - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Productivity - Java Driver:

Guice Users: “GuicyFig” (Archaius + Guice)

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Productivity - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Productivity - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Productivity - Java Driver - Configuration:

Two groups of configurations:

• policies
• connections

Productivity - Java Driver - Configuration:

Three Policy Types:

• load balancing
• reconnection
• retry

Productivity - Java Driver - Configuration:

Connection Options:

• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec
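To make the split between policy types concrete, here is a toy model in plain Python of how a load balancing policy and a retry policy divide the work: one decides which hosts to try and in what order, the other decides whether a failed attempt should move on to the next host. This is not the Java driver's API. The class names, the `send` callback and the IP addresses are all hypothetical, and the reconnection policy is left out.

```python
import itertools


class RoundRobinPolicy:
    """Load balancing: rotate the starting host for each query plan."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._start = itertools.cycle(range(len(self.hosts)))

    def query_plan(self):
        first = next(self._start)
        return self.hosts[first:] + self.hosts[:first]


class NextHostRetryPolicy:
    """Retry: allow up to max_retries extra attempts on other hosts."""

    def __init__(self, max_retries=1):
        self.max_retries = max_retries

    def should_retry(self, retries_so_far):
        return retries_so_far < self.max_retries


def execute(query, send, load_balancing, retry):
    """Try hosts in plan order; send(host, query) raises ConnectionError on failure."""
    last_error = None
    for retries, host in enumerate(load_balancing.query_plan()):
        try:
            return send(host, query)
        except ConnectionError as exc:
            last_error = exc
            if not retry.should_retry(retries):
                break
    raise last_error


# Hypothetical three-node cluster with the first node down.
hosts = ["192.168.2.10", "192.168.2.11", "192.168.2.12"]
down = {"192.168.2.10"}


def send(host, query):
    if host in down:
        raise ConnectionError(f"{host} is down")
    return f"{query} answered by {host}"


policy = RoundRobinPolicy(hosts)
result = execute("SELECT now() FROM system.local", send, policy,
                 NextHostRetryPolicy(max_retries=1))
print(result)
```

With one retry allowed, the request that first lands on the down node is answered by the next host in the plan. With `max_retries=0` the same request would fail, which is the kind of behavior worth knowing before production.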

Productivity - Java Driver:

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava

Productivity - Java Driver:

A note about User Defined Types (UDTs)

Productivity - Java Driver - Using UDTs:

Wait.

- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Productivity:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

Productivity:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Productivity - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/
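The Metrics library linked above is a Java library. As a language-neutral illustration of what instrumenting your own code means, the sketch below is a minimal timer in plain Python that records how long each call takes and reports a percentile. The `Timer` class, its method names and the metric name `documents.read` are assumptions made for this example, not part of any library mentioned in the talk.

```python
import math
import time
from contextlib import contextmanager


class Timer:
    """Collects durations in seconds for one named operation."""

    def __init__(self, name):
        self.name = name
        self.samples = []

    @contextmanager
    def timed(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even when the timed block raises, so failures show up too.
            self.samples.append(time.perf_counter() - start)

    def percentile(self, pct):
        """Nearest-rank percentile of the recorded durations (pct in 0..100)."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(len(ordered) * pct / 100))
        return ordered[rank - 1]


reads = Timer("documents.read")
for _ in range(5):
    with reads.timed():
        time.sleep(0.001)  # stand-in for a real query

print(f"{reads.name}: p99={reads.percentile(99) * 1000:.2f}ms "
      f"over {len(reads.samples)} calls")
```

In a real project the recorded values would be shipped to a collector such as a locally running Riemann rather than printed.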

Productivity:

Trace Frequently

Productivity - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = 'myapp'
   ... AND document_id = 'foo'
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
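Reading a trace by eye works for one query, but the same check is easy to script. The sketch below parses the rows from the second half of the trace above and reports the largest jump in elapsed microseconds between consecutive events on the same node. It assumes the last column is the per-node elapsed time in microseconds, as cqlsh reports it; the function and field names are made up for this example.

```python
from collections import namedtuple

TraceRow = namedtuple("TraceRow", "activity timestamp source elapsed_us")

# Rows copied from the trace above: activity | timestamp | source | elapsed (us).
TRACE = """\
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
"""


def parse_trace(text):
    """Split pipe-delimited trace rows into TraceRow tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip blank lines, headers and separators
        activity, timestamp, source, elapsed = parts
        rows.append(TraceRow(activity, timestamp, source, int(elapsed)))
    return rows


def slowest_step(rows):
    """Return (row, delta_us) for the largest gap between consecutive
    rows that come from the same source node."""
    last_seen = {}
    worst = None
    for row in rows:
        prev = last_seen.get(row.source)
        if prev is not None:
            delta = row.elapsed_us - prev
            if worst is None or delta > worst[1]:
                worst = (row, delta)
        last_seen[row.source] = row.elapsed_us
    return worst


row, delta = slowest_step(parse_trace(TRACE))
print(f"{row.activity}: {delta / 1e6:.1f}s on {row.source}")
```

For this trace the largest gap is the roughly 9.2 seconds before "Read 1 live and 2667 tombstoned cells" on 192.168.1.204, where the node scanned 2667 tombstoned cells to return one live cell.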

!!?!

Productivity - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Productivity - Tracing:

`nodetool settraceprobability`

Productivity - Tracing:

…then make sure you try it again with a node down!

Productivity - Tracing:

Final note on tracing: do it sparingly

Productivity:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Productivity:

nodetool for developers:

• cfstats
• cfhistograms
• proxyhistograms

Productivity - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Productivity - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Productivity - nodetool - proxyhistograms:

proxyhistograms: performance of inter-cluster requests

Productivity:

Running Cassandra during development

Productivity - Running Cassandra:

Local Cassandra
• easy to set up
• you control it
• but then you control it!

Productivity - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Productivity - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end

chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}
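The `initial_token` arithmetic in the Vagrantfile excerpt above spaces the nodes evenly across a signed 64-bit token range. The sketch below repeats the same calculation in Python so the resulting values can be inspected. It assumes the cluster uses the default Murmur3Partitioner, whose tokens span -2**63 to 2**63 - 1; the function name is made up for this example.

```python
def initial_tokens(server_count):
    """Evenly spaced tokens across the signed 64-bit range, one per node."""
    step = 2**64 // server_count  # integer division, as in the Ruby original
    return [step * i - 2**63 for i in range(server_count)]


tokens = initial_tokens(3)
for node, token in enumerate(tokens, start=1):
    print(f"node{node}: {token}")
```

For three nodes this yields -9223372036854775808, -3074457345618258603 and 3074457345618258602.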

ENCAPSULATE ENVIRONMENTS

Environments:

Configuration Management is Essential

Environments:

Laptop to Production with NO Manual Modifications!

TESTING

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
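A naming scheme pays off once tooling can rely on it. As a small illustration, the sketch below sorts source files into suites using the three suffixes listed above. The suite labels and the example file names are hypothetical.

```python
def classify_test(filename):
    """Map a test source file to its suite by naming-scheme suffix."""
    # Check the longest suffix first: "FooPITest.java" also ends in "ITest.java".
    if filename.endswith("PITest.java"):
        return "parallel-integration"
    if filename.endswith("ITest.java"):
        return "integration"
    if filename.endswith("UnitTest.java"):
        return "unit"
    return "unclassified"


files = ["SchemaUnitTest.java", "SessionITest.java",
         "ReadPathPITest.java", "Helper.java"]
for name in files:
    print(f"{name}: {classify_test(name)}")
```

The order of the checks matters because the parallel suffix contains the plain integration suffix.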

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Testing:

Group tests into logical units (“suites”)

Testing - Suites:

Benefits of Suites:

• share test data
• share Cassandra instance(s)
• build profiles

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Testing - Suites:

Using annotations for suites in code

Testing:

Use Mocks where possible
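As an illustration of mocking the database out of a unit test, the sketch below tests a small query helper against a `Mock` from Python's standard library instead of a live session. The helper `latest_version` and the simplified query are assumptions made for this example. The `session.execute(query, parameters)` call only mirrors the general shape of a driver call, and here it never reaches a cluster.

```python
from unittest.mock import Mock

QUERY = ("SELECT doc_version FROM data.documents_by_version "
         "WHERE application_id = %s AND document_id = %s LIMIT 1")


def latest_version(session, application_id, document_id):
    """Return the first doc_version for a document, or None if there is none."""
    rows = list(session.execute(QUERY, (application_id, document_id)))
    return rows[0].doc_version if rows else None


# The Mock stands in for a driver session, so no cluster is needed.
session = Mock()
session.execute.return_value = [Mock(doc_version=65856)]

assert latest_version(session, "myapp", "foo") == 65856
session.execute.assert_called_once_with(QUERY, ("myapp", "foo"))

# An empty result set must not raise.
session.execute.return_value = []
assert latest_version(session, "myapp", "missing") is None
```

Tests like this belong in the unit suite because they use no external resources.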

Testing:

Unit Integration Testing

Testing:

Verify Assumptions: test failure scenarios explicitly
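One way to test a failure scenario explicitly is to make the failure happen on purpose and assert on what the code does next. In the sketch below a mocked session times out on demand, and the test checks both the recovery path and the give-up path. `QueryTimeout` and `execute_with_retry` are hypothetical names for this example, not driver classes.

```python
from unittest.mock import Mock


class QueryTimeout(Exception):
    """Hypothetical stand-in for a driver timeout error."""


def execute_with_retry(session, query, attempts=2):
    """Run a query, retrying on timeout up to `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return session.execute(query)
        except QueryTimeout:
            if attempt == attempts:
                raise


# Scenario 1: the first call times out, the second succeeds.
flaky = Mock()
flaky.execute.side_effect = [QueryTimeout(), ["row"]]
assert execute_with_retry(flaky, "SELECT now() FROM system.local") == ["row"]
assert flaky.execute.call_count == 2

# Scenario 2: every call times out, so the error must reach the caller.
down = Mock()
down.execute.side_effect = QueryTimeout()
try:
    execute_with_retry(down, "SELECT now() FROM system.local")
    raise AssertionError("expected QueryTimeout")
except QueryTimeout:
    pass
assert down.execute.call_count == 2
```

The same assertions can later be repeated in an integration suite with a real node taken down, which checks that the mocked assumption matches cluster behavior.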

Testing - Integration:

Runtime Integrations:

• local
• in-process
• forked-process

Testing - Integration - Runtime:

EmbeddedCassandra

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java

Testing - Integration - Runtime:

Vagrant: delegate to vagrant cli

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Testing:

Load Testing Goals

• reproducible metrics
• catch regressions
• test to breakage point
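Catching regressions needs a comparison that a build can run without a human reading graphs. The sketch below is one possible check: compare the 99th percentile latency of the current run against a stored baseline and flag the run when it is more than a tolerance worse. The 10% tolerance, the function names and the synthetic latencies are assumptions made for this example.

```python
import statistics


def p99(latencies_ms):
    """99th percentile using the inclusive method from the statistics module."""
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[98]


def is_regression(baseline_ms, current_ms, tolerance=0.10):
    """True when the current p99 exceeds the baseline p99 by more than `tolerance`."""
    return p99(current_ms) > p99(baseline_ms) * (1 + tolerance)


# Synthetic latencies standing in for two load test runs.
baseline = [float(ms) for ms in range(1, 101)]
slightly_slower = [ms * 1.05 for ms in baseline]
much_slower = [ms * 1.5 for ms in baseline]

print("5% slower flagged:", is_regression(baseline, slightly_slower))
print("50% slower flagged:", is_regression(baseline, much_slower))
```

The check is only meaningful when both runs use the same workload and environment, which is where the reproducible metrics goal comes in.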

Testing - Load Testing:

Stress.java (lots of changes recently)

Testing - Load Testing:

CassJMeter

https://github.com/Netflix/CassJMeter

Testing - Load Testing:

Workload recording and playback coming soon

https://issues.apache.org/jira/browse/CASSANDRA-6572

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Summary:

• Go slowly with bite sized chunks
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Thanks.

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant www.thelastpickle.com

#CassandraSummit