Page 1: Cassandra Day Atlanta 2015: Software Development with Apache Cassandra: A Walkthrough

CASSANDRA DAY ATLANTA 2015

SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH

Nate McCall
@zznate

Co-Founder & Sr. Technical Consultant
#CassandraDays

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

OVERVIEW

Overview:

What makes a software development project successful?

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Overview:

Impedance mismatch: distributed systems development on a laptop.

DATA MODELING

Data Modeling:

… a topic unto itself. But quickly:

Data Modeling - Quickly

• It’s hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

Data Modeling - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Data Modeling - DevCenter:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

WRITING CODE

Writing Code:

ORM? Maybe - only if it’s very simple

more later…

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Writing Code:

use CQL

Writing Code:

Use the Java Driver

Writing Code - Java Driver:

• Reference implementation
• Well written, extensive coverage
• open source
• dedicated resources

https://github.com/datastax/java-driver/

Writing Code - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Writing Code - Java Driver:

Guice Users: “GuicyFig”:

Archaius + Guice

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Writing Code - Java Driver:

Four rules for Writing Code
• one Cluster per physical cluster
• one Session per app per keyspace
• use PreparedStatements
• use Batches to reduce network IO
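The "one Session per app per keyspace" rule comes down to creating each session once and handing the same instance to every caller. A minimal, JDK-only sketch of that idea follows. `SessionRegistry` and its `connector` function are hypothetical names and not part of the Java driver; in a real application the connector would presumably be a method reference such as `cluster::connect` on the single shared `Cluster`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Caches one session-like object per keyspace (hypothetical helper, not a driver class). */
final class SessionRegistry<S> {
    private final ConcurrentMap<String, S> sessions = new ConcurrentHashMap<>();
    private final Function<String, S> connector;

    /** The connector is called at most once per keyspace name. */
    SessionRegistry(Function<String, S> connector) {
        this.connector = connector;
    }

    /** Returns the cached session for the keyspace, creating it on first use. */
    S sessionFor(String keyspace) {
        return sessions.computeIfAbsent(keyspace, connector);
    }
}
```

Callers then ask the registry for a session instead of connecting themselves, so repeated requests for the same keyspace never open a second session.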

Writing Code - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Writing Code - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Writing Code - Java Driver - Configuration:

Two groups of configurations

• policies
• connections

Writing Code - Java Driver - Configuration:

Three Policy Types:
• load balancing
• reconnection
• retry

Writing Code - Java Driver - Configuration:

Connection Options:
• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec

Writing Code - Java Driver:

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava
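The driver's asynchronous calls return futures, and the slide recommends composing them with RxJava. As a rough illustration of the same fan-out-then-combine idea using only the JDK, the sketch below swaps in `CompletableFuture` for RxJava. `AsyncFanOut` and the `lookup` function are hypothetical names; in a real application `lookup` would wrap an asynchronous query.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Fan-out helper built on CompletableFuture (a JDK stand-in for an RxJava pipeline). */
final class AsyncFanOut {
    /** Starts one asynchronous lookup per key, then gathers the results in key order. */
    static <K, V> CompletableFuture<List<V>> fetchAll(
            List<K> keys, Function<K, CompletableFuture<V>> lookup) {
        // Start every lookup before waiting on any of them.
        List<CompletableFuture<V>> pending =
                keys.stream().map(lookup).collect(Collectors.toList());
        // allOf completes once every lookup has completed, so join() below does not block.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(done -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

The point of the design is that no thread waits on an individual request: all lookups are in flight at once and the combined result is itself a future.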

Writing Code - Java Driver:

A note about User Defined Types (UDTs)

Writing Code - Java Driver - Using UDTs:

Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Writing Code:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Writing Code - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Writing Code:

Using Trace (and doing so frequently)

Writing Code - Tracing:

Trace per query via DevCenter

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Writing Code - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = 'myapp'
   ... AND document_id = 'foo'
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592

!!?! Reading 1 live and 2667 tombstoned cells took over 9 seconds (18:05:44,892 to 18:05:54,135).
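The slow step in a trace like this one can also be found mechanically: within the rows reported by one node, look for the largest jump in the last column (source_elapsed, in microseconds). The sketch below assumes rows in the pipe-separated layout shown in the trace, "activity | timestamp | source | source_elapsed"; `TraceGaps` and `largestGap` are hypothetical names, not part of cqlsh or the driver.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;

/** Scans pipe-separated trace rows for the biggest jump in source_elapsed on one node. */
final class TraceGaps {
    /**
     * Returns the activity that follows the largest jump in source_elapsed for the given
     * source address, paired with the size of the jump in microseconds.
     * Returns null when the source has fewer than two rows.
     */
    static Map.Entry<String, Long> largestGap(List<String> rows, String source) {
        String worstActivity = null;
        long worstGap = -1;
        long previous = -1; // -1 means no row seen yet for this source
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length != 4 || !cols[2].trim().equals(source)) {
                continue; // skip other nodes and anything that is not a trace row
            }
            long elapsed = Long.parseLong(cols[3].trim());
            if (previous >= 0 && elapsed - previous > worstGap) {
                worstGap = elapsed - previous;
                worstActivity = cols[0].trim();
            }
            previous = elapsed;
        }
        if (worstActivity == null) {
            return null;
        }
        return new AbstractMap.SimpleImmutableEntry<String, Long>(worstActivity, worstGap);
    }
}
```

Run against the rows from 192.168.1.204 above, this points at "Read 1 live and 2667 tombstoned cells" with a jump of 9243823 microseconds. Gaps are computed per node because the coordinator's own gap mostly reflects time spent waiting on the replica.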

Writing Code - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Writing Code - Tracing:

`nodetool settraceprobability`

Writing Code - Tracing:

…then make sure you try it again with a node down!

Writing Code - Tracing:

Final note on tracing: do it sparingly

Writing Code - Tracing:

Coming Soon: slow query log (client side)

https://github.com/datastax/java-driver/compare/java646
https://datastax-oss.atlassian.net/browse/JAVA-646

Writing Code:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Writing Code:

nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms

Writing Code - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Writing Code - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Writing Code - nodetool - proxyhistograms:

proxyhistograms: performance of inter-cluster requests

MANAGING ENVIRONMENTS

Managing Environments:

Configuration Management is Essential

Managing Environments:

Laptop to Production with NO Manual Modifications!

Managing Environments:

Running Cassandra during development

Managing Environments - Running Cassandra:

Local Cassandra
• easy to set up
• you control it
• but then you control it!

Managing Environments - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Managing Environments - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
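The `initial_token` expression in the Ruby snippet above spaces the tokens evenly across the signed 64-bit range, which is the token range of Murmur3Partitioner. The same arithmetic can be written out as a small Java sketch; `TokenRing` is a hypothetical helper name, and `BigInteger` is used because 2^64 does not fit in a `long`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Reproduces (2**64 / server_count * i) - 2**63 for each node index i. */
final class TokenRing {
    /** Returns evenly spaced initial tokens over the range [-2^63, 2^63). */
    static List<Long> initialTokens(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        // Integer division first, then multiply, matching the Ruby expression's order.
        BigInteger step = BigInteger.ONE.shiftLeft(64).divide(BigInteger.valueOf(nodeCount));
        BigInteger min = BigInteger.ONE.shiftLeft(63).negate();
        List<Long> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).add(min).longValueExact());
        }
        return tokens;
    }
}
```

For three nodes this gives -9223372036854775808, -3074457345618258603 and 3074457345618258602.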

chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}

TESTING

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
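One practical detail of this scheme is that every name ending in `PITest` also ends in `ITest`, so any tooling that sorts tests by suffix has to check the more specific suffix first. A minimal sketch of such a classifier follows; `TestNaming` and `Kind` are hypothetical names used only for illustration.

```java
/** Sorts test class names into the three groups of the naming scheme. */
final class TestNaming {
    enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, OTHER }

    /** Classifies a simple class name such as "SessionPITest" by its suffix. */
    static Kind classify(String className) {
        if (className.endsWith("UnitTest")) {
            return Kind.UNIT;
        }
        // PITest must be tested before ITest, because "PITest" also ends with "ITest".
        if (className.endsWith("PITest")) {
            return Kind.PARALLEL_INTEGRATION;
        }
        if (className.endsWith("ITest")) {
            return Kind.INTEGRATION;
        }
        return Kind.OTHER;
    }
}
```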

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Testing:

Group tests into logical units (“suites”)

Testing - Suites:

Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Testing - Suites:

Using annotations for suites in code

Testing - Suites:

Interesting test plumbing
• [Before|After]Suite
• [Before|After]Group
• Listeners

Testing:

Use Mocks where possible

Testing:

Unit Integration Testing

Testing:

Verify Assumptions: test failure scenarios explicitly

Testing - Integration:

Runtime Integrations:
• local
• in-process
• forked-process

Testing - Integration - Runtime:

EmbeddedCassandra

https://github.com/jsevellec/cassandra-unit/

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)
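A forked-process runtime can be sketched with nothing more than `java.lang.ProcessBuilder`. The helper below is an assumption-laden outline: `ForkedNode` is a hypothetical name, and the command line, working directory and log file are left to the caller because they depend on how Cassandra is installed (for example a launch script run in the foreground).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Starts and stops a child process for tests (hypothetical helper). */
final class ForkedNode {
    /** Starts the command in workDir with stdout and stderr appended to logFile. */
    static Process start(List<String> command, File workDir, File logFile) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workDir);
        builder.redirectErrorStream(true); // merge stderr into stdout
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return builder.start();
    }

    /** Asks the child to stop and waits up to the timeout; returns true if it exited. */
    static boolean stop(Process process, long timeout, TimeUnit unit) throws InterruptedException {
        process.destroy();
        return process.waitFor(timeout, unit);
    }
}
```

A test suite would call `start` once per node in its setup hook and `stop` in its teardown hook, keeping the log file around for failed runs.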

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java

Testing - Integration - Runtime:

Vagrant: delegate to vagrant CLI

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Testing:

Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point

Testing - Load Testing:

Stress.java (lots of changes recently)

https://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

Testing - Load Testing:

Workload recording and playback coming soon

one day

https://issues.apache.org/jira/browse/CASSANDRA-8929

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Summary:
• Go slowly with bite-sized chunks
• Segment your tests and use build profiles
• Monitor and instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Thanks.

Nate McCall
@zznate

Co-Founder & Sr. Technical Consultant
www.thelastpickle.com

#CassandraDays