Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments



Presenter: Claudiu Barbura, Senior Director of Engineering at Atigeo

xPatterns is a big data analytics platform-as-a-service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to Cassandra and SolrCloud clusters for real-time access through automatically generated low-latency, high-throughput APIs; and dashboard and visualization APIs and tools that leverage the available data and models.

In this talk I'll share some of the hard lessons we've learned in the past three years while leveraging Cassandra (and Hector) in large-scale enterprise-grade deployments. We will focus on three specific areas in which we identified consistent best practices and design patterns: (1) data model optimization as a result of exporting data from HDFS/Hive/Shark into Cassandra through Spark/Hadoop MR jobs under Mesos with throttling, instrumentation and resilience features; (2) automatically publishing geo-replicated, instrumented and monitored REST APIs on top of the exported Cassandra data; and (3) lessons learned from running Cassandra at scale from 0.6 to 2.0.6, including performance tuning and tips and tricks.

You will see live demos of our Publish to NoSql tools (Spark/Shark, Mesos, Hive, Cassandra), a dashboard application built on top of generated data APIs (D3.js, Cassandra) and xPatterns' monitoring and instrumentation consoles (Graphite, Ganglia, Nagios).

Transcript of Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

Page 1: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments
Page 2: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

Cassandra in xPatterns

Cassandra Summit, Sept 2014

Page 3: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

Agenda

• xPatterns Architecture
• Export to NoSql API (Demo)
• Monitoring, instrumentation (Demo)
• xPatterns application (Demo)
• Data Modeling
• Lessons Learned since 0.6 till 2.0.6

Page 4: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments


Page 5: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments


Page 6: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments


Demos  …  

Page 7: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

Lessons learned 0.6 - 2.0.6

• NTP: synchronize ALL clocks (servers and clients)
• Schema disagreement: lock cluster (Zk) before CF create/delete
• Reduce the number of CFs (avoid OOM … memtable_total_space_in_mb)
• Do not drop CFs before emptying them (truncate/compact first)
• Monitoring, instrumentation, automatic restarts
• ConsistencyLevel: ONE is best … for our use cases
• Key cache, Snappy (LZ4) compression, vnodes
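The NTP item on the lessons-learned slide follows from how Cassandra reconciles conflicting writes: the cell with the highest write timestamp wins, and with Thrift clients such as Hector that timestamp normally comes from the client's clock. The toy simulation below is a minimal sketch of that last-write-wins rule, not Cassandra code. The `Replica` class, the `simulate` helper and the skew values are all hypothetical, and tie-breaking on equal timestamps is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds, supplied by the writer


class Replica:
    """Toy stand-in for a single column on a single replica (hypothetical)."""

    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def write(self, value: str, timestamp_us: int) -> None:
        # Last-write-wins: keep whichever cell carries the higher timestamp,
        # regardless of the order in which the writes actually arrived.
        if self.cell is None or timestamp_us > self.cell.timestamp_us:
            self.cell = Cell(value, timestamp_us)

    def read(self) -> Optional[str]:
        return None if self.cell is None else self.cell.value


def simulate(skew_a_us: int, skew_b_us: int) -> Optional[str]:
    """Client A writes at true time 1 s, client B at true time 2 s.

    Each client stamps its write with its own clock (true time + skew).
    """
    replica = Replica()
    replica.write("from A (older)", 1_000_000 + skew_a_us)
    replica.write("from B (newer)", 2_000_000 + skew_b_us)
    return replica.read()


if __name__ == "__main__":
    print(simulate(0, 0))          # synchronized clocks
    print(simulate(5_000_000, 0))  # client A's clock runs 5 s fast
```

With synchronized clocks the later write from B is the one that is read back. When A's clock runs five seconds fast, A's older write carries the higher timestamp and B's newer write is silently discarded, which is why client clocks need to be synchronized as well as server clocks.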

Page 8: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

Data Modeling

• Rows not too skinny and not too wide (avoid OOM)
  o Less memory pressure during high-throughput writes
  o Reduced network I/O, fewer rows, more column slices
  o Key cache & bloom filter index size affects perf
  o Efficient compaction, avoid hot spots
• Custom serialization and dynamic columns for maximum perf gain (40%)
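One common way to keep rows "not too skinny and not too wide" is to bucket the row key, so that each entity gets one row per time window instead of one row per event or one unbounded row. The slides do not say which scheme xPatterns uses, so the sketch below is only an illustration under assumed names: `BUCKET_SECONDS`, `row_key`, `group_into_rows` and `keys_for_range` are hypothetical, and the in-memory dict merely stands in for a column family with dynamic columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical bucket width: one row per entity per hour. A wider bucket gives
# fewer, wider rows; a narrower bucket gives more, skinnier rows.
BUCKET_SECONDS = 3600


def row_key(entity_id: str, event_ts: int) -> str:
    """Build a composite row key from the entity and the start of its time bucket."""
    bucket_start = event_ts - (event_ts % BUCKET_SECONDS)
    return f"{entity_id}:{bucket_start}"


def group_into_rows(
    events: Iterable[Tuple[str, int, str]]
) -> Dict[str, Dict[int, str]]:
    """Group (entity_id, event_ts, payload) events into bucketed rows.

    Each row maps a dynamic column name (the event timestamp) to its payload.
    """
    rows: Dict[str, Dict[int, str]] = defaultdict(dict)
    for entity_id, event_ts, payload in events:
        rows[row_key(entity_id, event_ts)][event_ts] = payload
    return rows


def keys_for_range(entity_id: str, start_ts: int, end_ts: int) -> List[str]:
    """List the row keys a reader must fetch to cover [start_ts, end_ts]."""
    first_bucket = start_ts - (start_ts % BUCKET_SECONDS)
    return [
        f"{entity_id}:{bucket}"
        for bucket in range(first_bucket, end_ts + 1, BUCKET_SECONDS)
    ]


if __name__ == "__main__":
    events = [("u1", 0, "a"), ("u1", 10, "b"), ("u1", 3600, "c"), ("u2", 5, "d")]
    for key, columns in sorted(group_into_rows(events).items()):
        print(key, columns)
    print(keys_for_range("u1", 100, 7300))
```

A reader then issues a small, predictable number of column-slice reads per time range, and the bucket width becomes the single knob that trades row count (key cache and bloom filter size) against row width (memory pressure and compaction cost).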

Page 9: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments


Q  &  A  

Page 10: Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns Deployments

© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.