Cassandra Day Atlanta 2015: Software Development with Apache Cassandra: A Walkthrough
Posted: 15-Jul-2015
Category: Technology
Transcript of Cassandra Day Atlanta 2015: Software Development with Apache Cassandra: A Walkthrough
CASSANDRA DAY ATLANTA 2015
SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH
Nate McCall @zznate
#CassandraDays
Co-Founder & Sr. Technical Consultant
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra based solutions.
Based in New Zealand & USA.
OVERVIEW
Overview:
What makes a software development project successful?
Overview: Successful Software Development
- it ships
- maintainable
- good test coverage
- check out and build
Overview:
Impedance mismatch: distributed systems development on a laptop.
DATA MODELING
Data Modeling:
A topic unto itself. But quickly:
Data Modeling - Quickly
- It's hard
- Do research
- #1 performance problem
- Tip: don't port your schema
Data Modeling - Using CQL:
- tools support
- easy tracing (and trace discovery)
- documentation*
*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile
Data Modeling - DevCenter:
Tools: DataStax DevCenter
http://www.datastax.com/what-we-offer/products-services/devcenter
WRITING CODE
Writing Code:
ORM? Maybe, but only if it's very simple.
More later.
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html
Writing Code:
Use CQL
Writing Code:
Use the Java Driver
Writing Code - Java Driver:
- Reference implementation
- Well written, extensive coverage
- Open source
- Dedicated resources
https://github.com/datastax/java-driver/
Writing Code - Java Driver:
Existing Spring users: Spring Data integration
http://projects.spring.io/spring-data-cassandra/
Writing Code - Java Driver:
Guice users: GuicyFig (Archaius + Guice)
https://stash.safehaus.org/projects/GFIG/repos/main/browse
Writing Code - Java Driver:
Four rules for writing code:
- one Cluster per physical cluster
- one Session per app per keyspace
- use PreparedStatements
- use Batches to reduce network IO
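The "one Session per app per keyspace" rule can be sketched without the driver on the classpath. The class below is a hypothetical helper (PerKeyspace is not part of the Java driver): it caches one instance per keyspace name and calls the connector only on first use. With the driver, the connector would presumably be a method reference such as cluster::connect, taken from the single shared Cluster; that wiring is an assumption and is not shown here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hands out exactly one shared instance per keyspace name.
 * Hypothetical helper for illustration; S stands in for the driver's Session type.
 */
final class PerKeyspace<S> {
    private final ConcurrentMap<String, S> sessions = new ConcurrentHashMap<>();
    private final Function<String, S> connector;

    PerKeyspace(Function<String, S> connector) {
        this.connector = connector;
    }

    /** Returns the cached instance for the keyspace, connecting on first use only. */
    S forKeyspace(String keyspace) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
        return sessions.computeIfAbsent(keyspace, connector);
    }

    /** Number of keyspaces connected so far. */
    int size() {
        return sessions.size();
    }
}
```

Application code would hold a single PerKeyspace for the lifetime of the process and ask it for a session wherever one is needed, rather than connecting per request.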
Writing Code - Java Driver:
Configuration is Similar to Other DB Drivers (with caveats**)
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html
Writing Code - Java Driver - Configuration:
Major difference: it's a Cluster!
Writing Code - Java Driver - Configuration:
Two groups of configurations:
- policies
- connections
Writing Code - Java Driver - Configuration:
Three Policy Types:
- load balancing
- reconnection
- retry
Writing Code - Java Driver - Configuration:
Connection Options:
- protocol*
- pooling
- socket
*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec
Writing Code - Java Driver:
Embrace Asynchronicity (but use RxJava)
https://github.com/ReactiveX/RxJava
Writing Code - Java Driver:
A note about User Defined Types (UDTs)
Writing Code - Java Driver - Using UDTs:
Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path
* https://issues.apache.org/jira/browse/CASSANDRA-7423
Writing Code:
Metrics API for your own code
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/
Writing Code - Instrumentation via Metrics API:
Run Riemann locally
http://riemann.io/
Writing Code:
Using Trace (and doing so frequently)
Writing Code - Tracing:
Trace per query via DevCenter
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
Writing Code - Tracing:
Trace per query via cqlsh
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = 'myapp'
   ... AND document_id = 'foo'
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856
Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b
Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
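!!?! One live cell returned after reading 2667 tombstoned cells, with about nine seconds lost on that step.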
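Reading a trace like the one above mostly means looking for the largest jump between consecutive steps on the same node. A minimal sketch of that check follows, assuming the last column is microseconds elapsed on the reporting node and that rows arrive in the pipe-delimited form cqlsh prints. The class and method names (TraceGaps, parse, slowestStep) are made up for this illustration and are not part of any Cassandra tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds the longest pause between consecutive trace events reported by one node. */
final class TraceGaps {

    /** One trace row: activity | timestamp | source | source_elapsed. */
    static final class Event {
        final String activity;
        final String source;
        final long elapsedMicros;

        Event(String activity, String source, long elapsedMicros) {
            this.activity = activity;
            this.source = source;
            this.elapsedMicros = elapsedMicros;
        }
    }

    /** The event that followed the longest pause, and the length of that pause. */
    static final class Gap {
        final Event event;
        final long micros;

        Gap(Event event, long micros) {
            this.event = event;
            this.micros = micros;
        }
    }

    /** Parses one pipe-delimited row; the timestamp column is not needed here. */
    static Event parse(String row) {
        String[] cols = row.split("\\|");
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 columns: " + row);
        }
        return new Event(cols[0].trim(), cols[2].trim(), Long.parseLong(cols[3].trim()));
    }

    /** Returns the largest gap on the given node, or null if it reported fewer than two events. */
    static Gap slowestStep(List<Event> events, String source) {
        Event previous = null;
        Gap worst = null;
        for (Event current : events) {
            if (!current.source.equals(source)) {
                continue; // elapsed counters are only compared within one node
            }
            if (previous != null) {
                long micros = current.elapsedMicros - previous.elapsedMicros;
                if (worst == null || micros > worst.micros) {
                    worst = new Gap(current, micros);
                }
            }
            previous = current;
        }
        return worst;
    }

    public static void main(String[] args) {
        // Three rows copied from the replica side of the trace above.
        List<Event> events = new ArrayList<>();
        events.add(parse("Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605"));
        events.add(parse("Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428"));
        events.add(parse("Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423"));
        Gap gap = slowestStep(events, "192.168.1.204");
        System.out.println(gap.micros + " us before: " + gap.event.activity);
    }
}
```

On those three rows the largest gap is 9243823 microseconds, ending at the "Read 1 live and 2667 tombstoned cells" step, which is the row the slide flags.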
Writing Code - Tracing:
Enable traces in the driver
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html
Writing Code - Tracing:
`nodetool settraceprobability`
Writing Code - Tracing:
Then make sure you try it again with a node down!
Writing Code - Tracing:
Final note on tracing: do it sparingly
Writing Code - Tracing:
Coming soon: slow query log (client side)
https://github.com/datastax/java-driver/compare/java646
https://datastax-oss.atlassian.net/browse/JAVA-646
Writing Code:
Logging verbosity can be changed dynamically**
** since 0.4rc1
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html
Writing Code:
nodetool for developers:
- cfstats
- cfhistograms
- proxyhistograms
Writing Code - nodetool - cfstats:
cfstats: per-table statistics about size and performance (single most useful command)
Writing Code - nodetool - cfhistograms:
cfhistograms: column count and partition size vs. latency distribution
Writing Code - nodetool - proxyhistograms:
proxyhistograms: performance of inter-cluster requests
MANAGING ENVIRONMENTS
Managing Environments:
Configuration Management is Essential
Managing Environments:
Laptop to production with NO manual modifications!
Managing Environments:
Running Cassandra during development
Managing Environments - Running Cassandra:
Local Cassandra:
- easy to set up
- you control it
- but then you c