Apache Cassandra at Wayin


Jamey Wood presents on Apache Cassandra at Wayin for the Colorado Cassandra Users Group on August 28th, 2013. http://www.meetup.com/Colorado-Cassandra-Meetup/

Transcript of Apache Cassandra at Wayin

  • 1. Cassandra in the Cloud. Jamey Wood, August 28, 2013

  • 2. Wayin: History
    Founded in 2011
    Located in beautiful Denver, Colorado
    Global clients in largest corporations, sports teams, agencies, and publishers
    $20M raised
    Co-founded by Scott McNealy
    Twitter Certified May 2013
    (Slide footer date: 8/30/2013)

  • 3. Wayin: Mission
    Transforming Social Media into Brand Experiences

  • 4. Why it Works
    Marketing is becoming more reactive, and the ability to own, brand, curate and customize relevant experiences in the moment is more valuable now than it has ever been.

  • 5. How it Works
    [Architecture diagram. Labels: Clients; Route 53; ELB Load Balancer; CloudFront; S3; SQS; API Server Scaling Group (auto-scaled based on machine load); Tracking Server Scaling Group (auto-scaled based on queue length); DB Server Scaling Groups (Cassandra, scaled based on data volume).]

  • 6. Challenge 1: Provisioning and Deployment
    CloudFormation, Auto Scaling Groups, and the Cassandra Ring
    [Diagram. Labels: Clients; CloudFormation; DB Auto Scaling Group: us-east-1a; DB Auto Scaling Group: us-east-1b; DB Auto Scaling Group: us-east-1c; Cassandra ring with nodes labeled 1a, 1b, 1c; time axis.]

  • 7. Challenge 1: Provisioning and Deployment
    Pitfalls and Opportunities
    Auto Scaling Groups are helpful for automatically replacing terminated instances, but certain actions can be problematic. Be familiar with as-suspend-processes options.
    Token management is important to keep the Cassandra ring balanced, properly distributed across availability zones, etc. It is also important to be able to bring up rings (and launch replacement servers) in a fully automated fashion.
    Netflix's Priam open source tool can provide this kind of token management (and more).

  • 8. Challenge 2: Migration
    [Diagram: MongoDB documents converted via Jackson into Cassandra rows.]
    MongoDB:
        { _id: abc, author: John Doe, body: some text, }
    Cassandra:
        id: abc  author: John Doe  data: { }
        id: def  author: Jane Doe  data: { }
        id: ghi  author: Jim Doe   data: { }
        id: jkl  author: Jill Doe  data: { }

  • 9. Challenge 3: Volatile Performance
    Managing EC2 I/O
    [Graph: IO Performance for 45 EC2 Instances over Time]
    Source for EC2 IO Performance Graph: http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/
    Mitigation: md(4) RAID0 across Ephemeral Disks

  • 10. Challenge 3: Volatile Performance
    Client Resiliency
    Astyanax Example: Configuring Latency Awareness
        new ConnectionPoolConfigurationImpl("MyConnectionPool")
            // Will resort hosts per token partition every 10 seconds
            .setLatencyAwareUpdateInterval(10000)
            // Will clear the latency every 10 seconds
            .setLatencyAwareResetInterval(10000)
            // Will sort hosts if a host is more than 100% slower than the best and always
            // assign connections to the fastest host, otherwise will use round robin
            .setLatencyAwareBadnessThreshold(2)
            // Uses last 100 latency samples. These samples are in a FIFO queue and
            // will just cycle themselves
            .setLatencyAwareWindowSize(100);

  • 11. Challenge 4: Sorting
    [Diagram: Cassandra ring with nodes labeled 1a, 1b, 1c]
    Single wide rows make it easy to code sorting/slicing logic, but can lead to performance hotspots. A good rule of thumb is to keep individual rows below 10MB in size [1].
    Our current solution involves using bucketed wide rows (spreading the data for a given sorting range across multiple keys/servers, and then collating that data during reads).
    More info:
    1. http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
    2. http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

  • 12. Challenge 5: Monitoring
    Nagios Reports
    [Graph: Nagios Report: RecentReadLatency]

  • 13. Challenge 5: Monitoring
    Nagios Setup
    Monitor Cassandra using JMX Nagios Plugin / NRPE (Nagios Remote Plugin Executor)
    http://wiki.apache.org/cassandra/JmxInterface
    ColumnFamilies/RecentReadLatencyMicros for some_table table:
        check_jmx -U service:jmx:rmi:///jndi/rmi:// org.apache.cassandra.db:columnfamily=some_table,keyspace=some_keyspace,type=ColumnFamilies

  • 14. Challenge 6: We're Hiring!
    Looking for great developers to work with Cassandra (amongst other things)
    http://www.wayin.com/about-us/careers
    Senior Software Engineer
    Work with great people and great technologies: Cassandra, JVM, Jetty, Jersey, Jackson, AWS
    Vice President of Sales
    Work with great brands and agencies: Denver Broncos, Atlanta Falcons, St. Louis Rams, San Jose Sharks, Chevrolet, Bank of America, Turtlewax
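
The bucketed wide rows mentioned on slide 11 (Challenge 4: Sorting) can be illustrated with a small in-memory sketch. This is not Wayin's implementation: the class name, the bucket count, the `range:bucket` row-key format, and the hash-based bucket choice are all assumptions made for illustration, and a `TreeMap` stands in for a Cassandra wide row with a reversed comparator. The idea it shows is the one on the slide: writes for one sorted range are spread across several row keys, and a read slices each bucket and collates the results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** In-memory stand-in for a column family of bucketed wide rows (illustration only). */
class BucketedWideRows {
    // Hypothetical bucket count: more buckets spread load wider but cost more reads.
    static final int BUCKETS = 4;

    // Row key -> columns sorted descending by sort value, mimicking a wide row.
    // A real schema would likely use a composite (sortValue, itemId) column name
    // so that equal sort values do not overwrite each other.
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    /** Row key for one bucket of a logical sorted range, e.g. "feed:3" (assumed format). */
    static String rowKey(String range, int bucket) {
        return range + ":" + bucket;
    }

    /** Picks a bucket from the item id so writes for one range hit several keys. */
    static int bucketFor(String itemId) {
        return Math.floorMod(itemId.hashCode(), BUCKETS);
    }

    void insert(String range, long sortValue, String itemId) {
        String key = rowKey(range, bucketFor(itemId));
        rows.computeIfAbsent(key,
                k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(sortValue, itemId);
    }

    /** Reads the top `limit` items of a range: slice each bucket, then collate. */
    List<String> top(String range, int limit) {
        // Max-heap over the entries gathered from every bucket.
        PriorityQueue<Map.Entry<Long, String>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(b.getKey(), a.getKey()));
        for (int b = 0; b < BUCKETS; b++) {
            NavigableMap<Long, String> row = rows.get(rowKey(range, b));
            if (row == null) continue;
            int taken = 0;
            for (Map.Entry<Long, String> e : row.entrySet()) {
                if (taken++ == limit) break; // each bucket contributes at most `limit`
                heap.add(e);
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty() && result.size() < limit) {
            result.add(heap.poll().getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        BucketedWideRows store = new BucketedWideRows();
        String[] ids = {"a", "b", "c", "d", "e", "f", "g", "h"};
        for (int i = 0; i < ids.length; i++) {
            store.insert("feed", i + 1, ids[i]); // sort values 1..8
        }
        System.out.println(store.top("feed", 3)); // prints [h, g, f]
    }
}
```

Taking `limit` entries from every bucket is enough to guarantee a correct global top `limit`, at the cost of reading up to `BUCKETS * limit` columns per query. That read amplification is the price paid for avoiding a single hot row.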
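
The token management point on slide 7 (a balanced ring, distributed across availability zones) can also be sketched. The example below is an assumption-laden simplification, not Priam's algorithm and not necessarily what Wayin ran: it assumes the RandomPartitioner token space of 2^127, spaces tokens evenly, and assigns ring positions to zones round-robin so that neighbouring positions fall in different zones, as in the 1a/1b/1c ring on slide 6. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Evenly spaced tokens with zones interleaved around the ring (illustration only). */
class RingTokens {
    // Assumption: RandomPartitioner, whose token space has size 2^127.
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Token for ring position `position` out of `nodeCount` evenly spaced nodes. */
    static BigInteger tokenFor(int position, int nodeCount) {
        return RING_SIZE.multiply(BigInteger.valueOf(position))
                        .divide(BigInteger.valueOf(nodeCount));
    }

    /** Round-robin zone assignment so adjacent ring positions use different zones. */
    static String zoneFor(int position, String[] zones) {
        return zones[position % zones.length];
    }

    /** One "zone token" line per node, in ring order. */
    static List<String> plan(int nodeCount, String[] zones) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            lines.add(zoneFor(i, zones) + " " + tokenFor(i, nodeCount));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] zones = {"us-east-1a", "us-east-1b", "us-east-1c"};
        for (String line : plan(6, zones)) {
            System.out.println(line);
        }
    }
}
```

With a node count that is a multiple of the zone count, each zone owns an equal share of positions. A tool such as Priam additionally has to handle replacement nodes reclaiming the token of the instance they replace, which this sketch does not attempt.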