Page 1: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

CASSANDRA-SF 2014

SUCCESSFUL SOFTWARE DEVELOPMENT WITH CASSANDRA

Nate McCall

Co-Founder & Sr. Technical Consultant

@zznate #CassandraSummit

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

Page 3: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

OVERVIEW

Page 4: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview:

What makes a software development project successful?

Page 5: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Page 6: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Overview:

Impedance mismatch: distributed systems development on a laptop.

Page 7: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

GETTING STARTED: FOLLOW THE PATH OF LEAST RESISTANCE

Page 8: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

JVM-based if at all possible.

Page 9: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

Python otherwise.

https://github.com/datastax/python-driver

Page 10: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

C#?

https://github.com/datastax/csharp-driver

Page 11: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

Ruby?

https://github.com/datastax/ruby-driver

Page 12: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Getting Started:

ORM? Maybe, but only if it’s very simple (more later…).

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Page 13: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

DATA MODELING

Page 14: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Data Modeling:

… a topic unto itself. But quickly:

Page 15: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Data Modeling - Quickly:

• It’s hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

Page 16: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

DEVELOPER PRODUCTIVITY

Page 17: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

use CQL

Page 18: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Page 19: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Use the Java Driver

Page 20: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

• Reference implementation
• Well written, extensive coverage
• Open source

https://github.com/datastax/java-driver/

Page 21: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Page 22: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Guice Users: “GuicyFig” (Archaius + Guice)

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Page 23: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Page 24: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Page 25: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Two groups of configurations:

• policies
• connections

Page 26: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Three Policy Types:

• load balancing
• reconnection
• retry
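As a rough illustration of what the second policy type (reconnection, in the driver's naming) decides, the sketch below computes an exponentially growing wait between attempts to reach a down node, capped at a maximum. It uses only the JDK and is not the driver's code: the class `ReconnectSchedule`, the method `delayMillis`, and the 1 s / 60 s values are assumptions made for this example. The Java driver ships its own implementation of the same idea as `ExponentialReconnectionPolicy`.

```java
// Conceptual sketch only; names and values are hypothetical, not driver API.
final class ReconnectSchedule {

    // Delay before reconnection attempt number `attempt` (0-based):
    // base, 2*base, 4*base, ... capped at maxDelayMs.
    static long delayMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        if (attempt < 0 || baseDelayMs <= 0 || maxDelayMs < baseDelayMs) {
            throw new IllegalArgumentException("need attempt >= 0 and 0 < base <= max");
        }
        long delay = baseDelayMs;
        for (int i = 0; i < attempt && delay < maxDelayMs; i++) {
            // Doubling past max/2 would exceed the cap (or overflow), so clamp instead.
            delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + " -> wait "
                    + delayMillis(attempt, 1_000, 60_000) + " ms");
        }
    }
}
```

With these assumed values the wait doubles from 1000 ms up to 32000 ms and then stays at the 60000 ms cap, which keeps a recovering node from being hammered by every client at once.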

Page 27: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Connection Options:

• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec

Page 28: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava
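To make the asynchronous style concrete without pulling in any dependency, the sketch below fans out several lookups and combines the results once all have completed. It deliberately swaps in the JDK's `CompletableFuture` for RxJava, so it shows the shape of the idea rather than the library the slide recommends. The class `AsyncFanOut`, the method `fetchAll`, and the `lookup` function are hypothetical; `lookup` stands in for an asynchronous driver call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch only: JDK futures as a stand-in for RxJava; all names are hypothetical.
final class AsyncFanOut {

    // Starts one asynchronous lookup per key, then completes with all results
    // in the same order as the keys.
    static <K, V> CompletableFuture<List<V>> fetchAll(
            List<K> keys, Function<K, CompletableFuture<V>> lookup) {
        List<CompletableFuture<V>> pending =
                keys.stream().map(lookup).collect(Collectors.toList());
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture[0]))
                // Every future is already complete here, so join() does not block.
                .thenApply(ignored -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> rows = fetchAll(
                Arrays.asList("a", "b", "c"),
                key -> CompletableFuture.supplyAsync(() -> "row-" + key));
        System.out.println(rows.join());
    }
}
```

The caller blocks only once, at the final `join()`, instead of once per query. If any single lookup fails, the combined future completes exceptionally, so failure handling has to be decided at this level too.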

Page 29: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

A note about User Defined Types (UDTs)

Page 30: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Using UDTs:

Wait.

- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Page 31: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

Page 32: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Page 33: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Page 34: Cassandra Summit 2014: Successful Software Development with Apache Cassandra
Page 35: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Trace Frequently

Page 36: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Page 37: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = 'myapp'
   ... AND document_id = 'foo'
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Page 38: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

Page 40: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592

!!?!
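The callout points at the tombstone row: the last column is elapsed microseconds on the source node, and it jumps from 38605 to 9282428 between two consecutive steps. A small helper can find such jumps mechanically. The sketch below is an assumption-laden illustration, not a cqlsh or driver feature: the class `TraceGaps`, the method `slowestStep`, and the pipe-separated row format (copied from the slide) are choices made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: parses rows shaped like "activity | timestamp | source | elapsed_us".
final class TraceGaps {

    // Returns the activity with the largest increase in elapsed time compared
    // with the previous row reported by the same source node.
    static String slowestStep(String[] rows) {
        Map<String, Long> lastElapsedBySource = new HashMap<>();
        String slowest = null;
        long largestGap = -1;
        for (String row : rows) {
            String[] cols = row.split("\\|");
            String activity = cols[0].trim();
            String source = cols[2].trim();
            long elapsed = Long.parseLong(cols[3].trim());
            // Elapsed time is per node, so only compare rows from the same source.
            Long previous = lastElapsedBySource.put(source, elapsed);
            if (previous != null && elapsed - previous > largestGap) {
                largestGap = elapsed - previous;
                slowest = activity + " (+" + largestGap + " us)";
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        String[] rows = {
            "Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605",
            "Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428",
            "Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423",
            "Request complete | 18:05:54,158 | 192.168.1.197 | 9335592"
        };
        // prints "Read 1 live and 2667 tombstoned cells (+9243823 us)"
        System.out.println(slowestStep(rows));
    }
}
```

Roughly nine seconds are spent reading past 2667 tombstoned cells to return one live cell, which is the kind of data-model problem that only shows up when tracing.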

Page 43: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Page 44: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

`nodetool settraceprobability`

Page 45: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

…then make sure you try it again with a node down!

Page 46: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Final note on tracing: do it sparingly

Page 47: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Logging verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Page 48: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

nodetool for developers:

• cfstats
• cfhistograms
• proxyhistograms

Page 49: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Page 50: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Page 51: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - nodetool - proxyhistograms:

proxyhistograms: performance of inter-cluster requests

Page 52: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity:

Running Cassandra during development

Page 53: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Local Cassandra:

• easy to set up
• you control it
• but then you control it!

Page 54: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

CCM:

• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Page 55: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Vagrant:

• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

Page 56: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
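The `initial_token` arithmetic above spaces the nodes evenly across the signed 64-bit range, which is the token range of Murmur3Partitioner. The same calculation can be worked through in Java as a check; because 2**64 does not fit in a `long`, the sketch uses `BigInteger`. The class `InitialTokens` and the method `forCluster` are hypothetical names chosen for this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch only: mirrors the Vagrantfile arithmetic; names are hypothetical.
final class InitialTokens {
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(64);   // 2**64
    private static final BigInteger RING_OFFSET = BigInteger.ONE.shiftLeft(63); // 2**63

    // One token per node: (2**64 / serverCount * i) - 2**63, with integer division.
    static List<BigInteger> forCluster(int serverCount) {
        if (serverCount < 1) {
            throw new IllegalArgumentException("serverCount must be at least 1");
        }
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(serverCount));
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).subtract(RING_OFFSET));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints, for three nodes:
        // -9223372036854775808
        // -3074457345618258603
        // 3074457345618258602
        for (BigInteger token : forCluster(3)) {
            System.out.println(token);
        }
    }
}
```

Fixed tokens like these make a throwaway test cluster reproducible: every rebuild of the three VMs yields the same data placement.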

Page 59: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}

Page 60: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

ENCAPSULATE ENVIRONMENTS

Page 61: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Environments:

Configuration Management is Essential

Page 62: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Environments:

Laptop to Production with NO Manual Modifications!

Page 63: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

TESTING

Page 64: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Use a Naming Scheme:

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
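One way to see how such a scheme can drive tooling is a small classifier over file names. The sketch below is illustrative only; the class `TestNaming`, the enum `Kind`, and the method `classify` are hypothetical and not part of any build tool. It also shows a detail of this particular scheme: `*PITest.java` names also match `*ITest.java`, so the longer suffix has to be checked first.

```java
// Sketch only: classifies test sources by the suffix convention from the slide.
final class TestNaming {

    enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNKNOWN }

    static Kind classify(String fileName) {
        // Order matters: "FooPITest.java" also ends with "ITest.java".
        if (fileName.endsWith("PITest.java")) {
            return Kind.PARALLEL_INTEGRATION;
        }
        if (fileName.endsWith("ITest.java")) {
            return Kind.INTEGRATION;
        }
        if (fileName.endsWith("UnitTest.java")) {
            return Kind.UNIT;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] names = {
            "UserDaoUnitTest.java", "UserDaoITest.java", "UserDaoPITest.java", "Helper.java"
        };
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

A build can use the same suffixes as include patterns, for example running only the unit kind on every commit and the integration kinds when a cluster is available.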

Page 65: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Page 66: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Group tests into logical units (“suites”)

Page 67: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Suites:

Benefits of Suites:

• share test data
• share Cassandra instance(s)
• build profiles

Page 68: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Page 70: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Suites:

Using annotations for suites in code

Page 71: Cassandra Summit 2014: Successful Software Development with Apache Cassandra
Page 72: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Use Mocks where possible

Page 73: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Unit Integration Testing

Page 74: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Verify Assumptions: test failure scenarios explicitly

Page 75: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration:

Runtime Integrations:

• local
• in-process
• forked-process

Page 76: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

EmbeddedCassandra

Page 77: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)
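A minimal sketch of the forked-process approach, using only the JDK's `ProcessBuilder`. The class `CassandraForker`, the method `nodeProcess`, and all paths are hypothetical. It assumes a Cassandra tarball layout with a `bin/cassandra` launcher that accepts `-f` to stay in the foreground, and that the launcher honors a `CASSANDRA_CONF` environment variable pointing at a per-node configuration directory; verify both against the version in use.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: builds the process definition; the caller decides when to start it.
final class CassandraForker {

    static ProcessBuilder nodeProcess(Path cassandraHome, Path confDir, File logFile) {
        String launcher = cassandraHome.resolve("bin").resolve("cassandra").toString();
        // -f keeps Cassandra in the foreground so the child can be stopped via Process.destroy().
        ProcessBuilder builder = new ProcessBuilder(launcher, "-f");
        // Assumption: the launcher script reads CASSANDRA_CONF for its configuration directory.
        builder.environment().put("CASSANDRA_CONF", confDir.toString());
        // Merge stderr into stdout and append both to one log file per node.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return builder;
    }

    public static void main(String[] args) {
        ProcessBuilder node1 = nodeProcess(
                Paths.get("/opt/cassandra"),        // hypothetical install location
                Paths.get("/tmp/node1/conf"),       // hypothetical per-node config
                new File("/tmp/node1/system.log")); // hypothetical log file
        // Only prints the command; call node1.start() in a test fixture to fork the node.
        System.out.println(node1.command());
    }
}
```

In a test suite, the fixture would call `start()` before the tests, keep the returned `Process`, and call `destroy()` afterwards. Forking several nodes means one configuration directory per node, each with its own addresses and data directories.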

Page 78: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java

Page 79: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

Vagrant: delegate to vagrant cli

Page 80: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Page 81: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Page 82: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Load Testing Goals:

• reproducible metrics
• catch regressions
• test to breakage point

Page 83: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Stress.java (lots of changes recently)

Page 84: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

CassandraJMeter

https://github.com/Netflix/CassJMeter

Page 85: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Workload recording and playback coming soon

https://issues.apache.org/jira/browse/CASSANDRA-6572

Page 86: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Page 87: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Summary:

• Go slowly with bite-sized chunks
• Segment your tests and use build profiles
• Monitor and instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Page 88: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Thanks.

Page 89: Cassandra Summit 2014: Successful Software Development with Apache Cassandra

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant

www.thelastpickle.com

#CassandraSummit