Case Study: Troubleshooting Production Issues as a Developer
© 2015. All Rights Reserved.
1 Introductions
2 The problem
3 Show me the code!
4 Conclusions
5 Q & A
Carlos Alonso - About me
• Spanish Londoner
• MSc at Salamanca University, Spain
• Software Engineer @ MyDrive Solutions
• Enjoying Cassandra since 2014
• Cassandra Certified Developer
• DataStax Cassandra MVP 2015
• @calonso / http://mrcalonso.com
MyDrive Solutions - About
• World-leading driver profiling company.
• Using technology and data to understand how to improve driving behaviour.
• Enjoying Cassandra since 2012
• Recently acquired by The Generali Group.
• @_MyDrive
• http://www.mydrivesolutions.com
• We are hiring!
Troubleshooting performance issues in production as a developer
The program
trips.csv
driver_id | start_time | end_time | start_location | end_location
123456 | 2011-04-18 12:23:36 +0100 | 2011-04-18 14:11:22 +0100 | -11.06238284, -126.71490131 | 86.02409727, 72.13493018
…
import.rb
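The slides do not show import.rb itself. As a minimal sketch of what its first step could look like, the snippet below parses rows shaped like trips.csv into plain Ruby objects. The `Trip` struct, the `parse_trips` helper and the quoting of the location fields (needed because they contain commas) are assumptions made for illustration, not the original program.

```ruby
require 'csv'
require 'time'

# Hypothetical record type mirroring the trips.csv columns.
Trip = Struct.new(:driver_id, :start_time, :end_time, :start_location, :end_location)

# Assumed file layout: a header row, with location fields quoted
# because each one contains a comma.
SAMPLE = <<~TRIPS
  driver_id,start_time,end_time,start_location,end_location
  123456,2011-04-18 12:23:36 +0100,2011-04-18 14:11:22 +0100,"-11.06238284, -126.71490131","86.02409727, 72.13493018"
TRIPS

# Turn CSV text into Trip records, converting ids and timestamps
# so they match the int and timestamp column types of the table.
def parse_trips(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    Trip.new(
      Integer(row['driver_id']),
      Time.parse(row['start_time']),
      Time.parse(row['end_time']),
      row['start_location'],
      row['end_location']
    )
  end
end

parse_trips(SAMPLE).each do |trip|
  puts "driver #{trip.driver_id}: #{trip.start_location} -> #{trip.end_location}"
end
```

Each parsed record would then be written to the trips table; that step depends on the Cassandra client in use and is left out here.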
The Setup
• 2 environments
• 3-node clusters
• Cassandra 1.2.9
CREATE KEYSPACE drivers WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '2' };
CREATE TABLE trips (
  driver_id int,
  end_time timestamp,
  end_location text,
  start_location text,
  start_time timestamp,
  PRIMARY KEY (driver_id, end_time)
) WITH bloom_filter_fp_chance=0.010000
  AND caching='KEYS_ONLY'
  AND comment=''
  AND dclocal_read_repair_chance=0.000000
  AND gc_grace_seconds=864000
  AND read_repair_chance=0.100000
  AND replicate_on_write='true'
  AND populate_io_cache_on_flush='false'
  AND compaction={'class': 'SizeTieredCompactionStrategy'}
  AND compression={'sstable_compression': 'SnappyCompressor'};
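With PRIMARY KEY (driver_id, end_time), driver_id is the partition key and end_time the clustering column: all trips of one driver live in the same partition, ordered by end_time, and writing the same (driver_id, end_time) pair again overwrites the earlier row. The toy model below illustrates that behaviour in memory only. `TripsTableModel` and its methods are hypothetical names, and nothing here talks to Cassandra.

```ruby
# Toy in-memory model of the trips table's key structure (not a Cassandra client).
class TripsTableModel
  def initialize
    # partition key (driver_id) => { clustering key (end_time) => other columns }
    @partitions = Hash.new { |hash, driver_id| hash[driver_id] = {} }
  end

  # Like a CQL INSERT: writing an existing (driver_id, end_time) pair overwrites it.
  def insert(driver_id:, end_time:, **columns)
    @partitions[driver_id][end_time] = columns
    self
  end

  # Like SELECT * FROM trips WHERE driver_id = ?: one partition, ordered by end_time.
  def trips_for(driver_id)
    @partitions.fetch(driver_id, {})
               .sort_by { |end_time, _columns| end_time }
               .map { |end_time, columns| columns.merge(end_time: end_time) }
  end
end

table = TripsTableModel.new
table.insert(driver_id: 123456, end_time: Time.utc(2011, 4, 18, 13, 11, 22),
             start_location: '-11.06238284, -126.71490131')
table.insert(driver_id: 123456, end_time: Time.utc(2011, 4, 17, 9, 0, 0),
             start_location: 'hypothetical earlier trip')
# Same primary key as the first insert, so this replaces it.
table.insert(driver_id: 123456, end_time: Time.utc(2011, 4, 18, 13, 11, 22),
             start_location: 're-imported row')

puts table.trips_for(123456).map { |trip| trip[:start_location] }
```

One consequence for an importer is that re-running it over the same file should not create duplicate trips, since every row maps to the same primary key as before.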
Tools
Ruby-Prof https://github.com/ruby-prof/ruby-prof
Cassanity https://github.com/jnunemaker/cassanity
Conclusions
• Measure everything
  • Metrics and monitoring let us know we were facing an unexpected performance issue.
• Keep calm, read to the end
  • On the first profiling report I could have spotted that the issue was in Cluster#connect, but in my eagerness to find a fix I made a wrong assumption that cost more time (~2 days!).
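To make "measure everything" concrete, here is a small stdlib-only sketch that accumulates wall-clock time per labelled step and reports the steps slowest first, so a dominant step is hard to overlook. It is not the tooling used in the talk (that was ruby-prof plus metrics and monitoring); `StepTimer`, its methods and the step labels are hypothetical.

```ruby
# Accumulates elapsed time and call counts per labelled step.
class StepTimer
  attr_reader :totals, :calls

  # The clock is injectable so the timer can be exercised deterministically.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @totals = Hash.new(0.0)
    @calls = Hash.new(0)
  end

  # Runs the block, records how long it took under +label+,
  # and returns the block's own result.
  def measure(label)
    started = @clock.call
    yield
  ensure
    @totals[label] += @clock.call - started
    @calls[label] += 1
  end

  # One line per step, slowest first. Read it to the end before picking a culprit.
  def report
    @totals.sort_by { |_label, seconds| -seconds }.map do |label, seconds|
      format('%-10s %8.3fs in %d calls', label, seconds, @calls[label])
    end
  end
end

timer = StepTimer.new
timer.measure(:parse)  { 1_000.times { |i| i.to_s } }
timer.measure(:insert) { 1_000.times { |i| i * 2 } }
puts timer.report
```

In an importer, the blocks would wrap the real steps (for example connecting, parsing and inserting), which makes it visible when one of them accounts for most of the run time.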
Thank the community
• We have great tools such as:
  • Cassandra: Apache and DataStax
  • Statsd, Graphite
  • RubyProf: All members of RubyProf Organisation
• Special thanks to Patrick McFadin.
Thank you