Cassandra Summit 2014: Apache Cassandra Best Practices at eBay
Presenter: Feng Qu, Principal DBA at eBay. Cassandra has been widely adopted at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis, and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Transcript of Cassandra Summit 2014: Apache Cassandra Best Practices at eBay
- 1. Cassandra Best Practices at eBay Inc. Feng Qu, principal database engineer, eBay Inc. September 11, 2014. CassandraSummit2014 | #CassandraSummit
- 2. Agenda: eBay Inc. Cassandra footprint; NoSQL life cycle; Cassandra best practices; Q&A
- 3. eBay Inc.
- 4. eBay Inc. Database Platforms: We manage thousands of databases powering eBay and PayPal.
- 5. Why NoSQL? Challenges of traditional RDBMS: performance penalty to maintain ACID features; lack of native sharding and replication features; lack of linear scalability; cost of software/hardware; higher cost of commit. NoSQL used at eBay Inc.: Cassandra, Couchbase, and MongoDB managed by DBAs; HBase, Redis, and OpenTSDB managed by developers.
- 6. Cassandra @ eBay Inc.: Started in 2011 at eBay and later expanded to PayPal. Started with Apache Cassandra 0.8; now using Apache Cassandra 2.0 and DataStax Enterprise 4.0. Over a dozen production clusters on hundreds of servers across 3 data centers. A choice between a dedicated cluster for large/critical use cases and a multi-tenant cluster for small use cases. Over 20 billion daily reads/writes to Cassandra. Cluster size varies from 4 nodes to 80 nodes. 100 TB+ of user data on HDD, local SSD, and SSD arrays. One cluster is estimated to grow to over a few PB.
- 7. NoSQL Life Cycle: Use Case Analysis, Data Modeling, Capacity Planning, Deployment, Operation
- 8. Data Modeling Phase: The development team requests a review meeting with a data architect for a new use case. Once the data architect understands the requirements, they recommend a proper data store, which could be one of the RDBMS or NoSQL products we support. Both parties then work on data modeling together. The outputs of the engagement are a set of tickets, for tracking purposes, that capture project information and the data configuration for the chosen data store.
- 9. Data Modeling Best Practices: Data modeling for Cassandra is quite different from data modeling for a traditional RDBMS. Model around query patterns, not entities. De-normalize to improve read performance. Separate read-heavy data from write-heavy data. Store values in column names, since column names are already physically sorted. Former eBay architect Jay Patel published a few technical blogs on Cassandra data modeling.
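Modeling around the query and storing values in column names can be made concrete with a small in-memory model. This is a Python sketch under stated assumptions, not Cassandra code: `WideRow`, `views_by_user`, and the `(timestamp, item_id)` column names are hypothetical. It assumes a query-driven table with one partition per user whose column names carry the data (the column value is empty), so a time-range query becomes a contiguous slice of the sorted names.

```python
import bisect


class WideRow:
    """In-memory stand-in for one Cassandra partition (wide row).

    Column names are kept in sorted order, mirroring how Cassandra keeps
    column names sorted, so a range query is a contiguous slice of names.
    Timestamps are assumed to be integers.
    """

    def __init__(self):
        self._names = []   # sorted list of (timestamp, item_id) tuples
        self._values = {}  # column name -> value (empty: the data lives in the name)

    def insert(self, timestamp, item_id, value=b""):
        name = (timestamp, item_id)
        if name not in self._values:
            bisect.insort(self._names, name)  # keep names sorted on insert
        self._values[name] = value

    def slice_by_time(self, start_ts, end_ts):
        """Return column names with start_ts <= timestamp <= end_ts, in order."""
        # (ts,) sorts before every (ts, item_id), so both bounds are inclusive.
        lo = bisect.bisect_left(self._names, (start_ts,))
        hi = bisect.bisect_left(self._names, (end_ts + 1,))
        return self._names[lo:hi]


# Hypothetical query-driven table: one row per user, one column per viewed item.
views_by_user = {"user-42": WideRow()}
row = views_by_user["user-42"]
for ts, item in [(300, "item-c"), (100, "item-a"), (200, "item-b"), (200, "item-d")]:
    row.insert(ts, item)
print(row.slice_by_time(150, 250))  # [(200, 'item-b'), (200, 'item-d')]
```

Because the table is shaped around "items a user viewed in a time window", that query never needs a join or a scan; a different query pattern would get its own denormalized table.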
- 10. Data Modeling Best Practices: Indexing. Secondary index: + less overhead, as it is built in; + data and index are changed atomically; - does not scale well with high-cardinality data. Column family as index: + no hot spot; - index is maintained manually by the application; - index changes are not atomic. Avoid secondary indexes and use a column family as an index if possible.
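A minimal sketch of the column-family-as-index trade-off, using plain Python dicts in place of two column families. `ItemStore`, `items`, and `items_by_seller` are hypothetical names. The two assignments in `put_item` stand in for two separate Cassandra writes, so nothing makes them atomic; the reader therefore re-checks the base row to filter out stale index entries. Looking up the old row before writing is a read-before-write, which is relatively expensive in Cassandra; it appears here only so the sketch can clean up the old index entry.

```python
class ItemStore:
    """Sketch of the 'column family as index' pattern with in-memory dicts.

    items:           item_id -> {"seller": ..., "title": ...}  (base column family)
    items_by_seller: seller  -> set of item_ids                 (index column family)
    """

    def __init__(self):
        self.items = {}
        self.items_by_seller = {}

    def put_item(self, item_id, seller, title):
        old = self.items.get(item_id)  # read-before-write
        # Write 1: the base row.
        self.items[item_id] = {"seller": seller, "title": title}
        # Write 2: the index row, maintained by the application. If the
        # process failed between write 1 and write 2, the index would be stale.
        if old is not None and old["seller"] != seller:
            self.items_by_seller.get(old["seller"], set()).discard(item_id)
        self.items_by_seller.setdefault(seller, set()).add(item_id)

    def items_for_seller(self, seller):
        candidates = self.items_by_seller.get(seller, set())
        # Re-check each base row so a stale index entry is filtered out.
        return sorted(i for i in candidates
                      if self.items.get(i, {}).get("seller") == seller)


store = ItemStore()
store.put_item("i1", "seller-a", "lamp")
store.put_item("i2", "seller-a", "desk")
store.put_item("i1", "seller-b", "lamp")   # i1 moves to seller-b
print(store.items_for_seller("seller-a"))  # ['i2']
```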
- 11. Benchmark Testing: Benchmark testing is key to capacity planning. Establish a performance baseline with near-real traffic in a production-size environment: for different types of hardware, for different software releases, and for different use cases or workloads. It is a proactive and repetitive process.
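One way to make a baseline actionable is to compare latency percentiles from a new benchmark run against the stored baseline. The sketch below assumes latency samples in milliseconds, uses the nearest-rank percentile definition, and flags any percentile that is more than 10% slower than the baseline; the function names, the chosen percentiles, and the tolerance are all illustrative assumptions rather than anything from the talk.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (0 < pct <= 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def regressions(baseline, candidate, pcts=(50, 95, 99), tolerance=0.10):
    """Return {pct: (baseline_ms, candidate_ms)} for each percentile where the
    candidate run is more than `tolerance` slower than the baseline."""
    slower = {}
    for p in pcts:
        base = percentile(baseline, p)
        cand = percentile(candidate, p)
        if cand > base * (1 + tolerance):
            slower[p] = (base, cand)
    return slower


baseline_ms = list(range(1, 101))
candidate_ms = [2 * x for x in baseline_ms]  # e.g. a new release that doubled latency
print(regressions(baseline_ms, candidate_ms))  # {50: (50, 100), 95: (95, 190), 99: (99, 198)}
```

Comparing several percentiles rather than the mean keeps tail-latency regressions visible, which matters for end-user-facing reads.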
- 12. Capacity Planning Phase: Capacity planning is key to avoiding surprises in production. The concept behind capacity planning is simple, but the mechanics are harder. As business requirements grow, we need to forecast how much resource must be added to the system to ensure that the user experience continues uninterrupted. Input: a clearly defined capacity goal from business requirements and a performance baseline from benchmark tests. Output: the resources to be added, such as memory, CPU, storage, I/O, and network. Always prepare for peak load plus headroom.
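The "peak + headroom" rule reduces to simple arithmetic once the benchmark baseline provides a per-node throughput. This sketch assumes throughput and storage are the two binding resources (memory, CPU, and network would be checked the same way) and that disks are kept at most half full to leave room for compaction. That 50% target, the 30% default headroom, and all function names are assumptions, not figures from the talk.

```python
import math


def nodes_for_throughput(peak_ops_per_sec, ops_per_node, headroom=0.3):
    """Nodes needed to serve the forecast peak plus a headroom fraction.
    ops_per_node is assumed to come from the benchmark baseline."""
    if ops_per_node <= 0:
        raise ValueError("ops_per_node must be positive")
    return math.ceil(peak_ops_per_sec * (1 + headroom) / ops_per_node)


def nodes_for_storage(data_tb, replication_factor, disk_tb_per_node,
                      max_disk_utilization=0.5):
    """Nodes needed to hold every replica while keeping each disk below
    max_disk_utilization (assumed free space reserved for compaction)."""
    usable_tb = disk_tb_per_node * max_disk_utilization
    if usable_tb <= 0:
        raise ValueError("usable disk per node must be positive")
    return math.ceil(data_tb * replication_factor / usable_tb)


def plan_cluster_size(peak_ops_per_sec, ops_per_node, data_tb,
                      replication_factor, disk_tb_per_node, headroom=0.3):
    """The cluster must satisfy both throughput and storage, and needs at
    least replication_factor nodes to place every replica."""
    return max(nodes_for_throughput(peak_ops_per_sec, ops_per_node, headroom),
               nodes_for_storage(data_tb, replication_factor, disk_tb_per_node),
               replication_factor)


# Hypothetical inputs: 100k ops/s peak, 10k ops/s per node, 10 TB of data, RF 3, 4 TB disks.
print(plan_cluster_size(100_000, 10_000, 10, 3, 4, headroom=0.5))  # 15
```

Taking the maximum across resources reflects that the scarcest resource decides the cluster size.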
- 13. Deployment Best Practices: Software packages with customized optimizations for the kernel, JVM heap, and compaction. Deployment automation for efficiency. Multi-data-center deployment for load balancing and disaster recovery. Vnodes are a must for manageability. SSD as the default storage requires additional OS-level tuning.
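To illustrate one reason vnodes help manageability, the sketch below builds a token ring where each node holds `num_tokens` random tokens over the Murmur3Partitioner range (-2^63 to 2^63 - 1) and reports each node's share of the ring. With one randomly placed token per node, shares can be very uneven; with 256 tokens per node (the `num_tokens` value in the default cassandra.yaml of that era) each share tends to stay close to 1/N without manual token assignment. This is a simplified model of primary-range ownership only, ignoring replication, and the function names are hypothetical.

```python
import bisect
import random

RING_SIZE = 2 ** 64          # Murmur3Partitioner tokens span [-2**63, 2**63 - 1]
TOKEN_MIN = -(2 ** 63)
TOKEN_MAX = 2 ** 63 - 1


def build_ring(nodes, num_tokens, seed=0):
    """Give every node num_tokens random tokens; return sorted (token, node) pairs."""
    rng = random.Random(seed)
    ring = [(rng.randint(TOKEN_MIN, TOKEN_MAX), node)
            for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def owner(ring, token):
    """Node owning `token`: the first ring token >= token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def ownership(ring):
    """Fraction of the token space each node owns (primary ranges only).
    Each token T owns the range (previous token, T]."""
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    owned = {}
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] = owned.get(node, 0) + (token - prev) % RING_SIZE
    return {node: span / RING_SIZE for node, span in owned.items()}


nodes = ["node1", "node2", "node3", "node4"]
for n in (1, 256):
    shares = ownership(build_ring(nodes, n))
    print(n, {k: round(v, 3) for k, v in sorted(shares.items())})
```

Many small ranges per node also mean that adding or replacing a node moves data to and from many peers at once, rather than only its immediate ring neighbors.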
- 14. Operation Best Practices: Collect system and database metrics. Monitoring and alerting: event-driven and metrics-driven alerts. Operation runbook: reduce human error. Performance tuning runbook: nodetool tpstats for dropped requests; nodetool cfhistograms for latency distribution. Troubleshooting runbook: document previous incidents for future reference.
- 15. Operation Best Practices: Routine repair is not really needed if there are no deletes. You still need to run repair after bringing up a down node if it has been dead for a while. Use a CNAME in the client configuration to avoid client configuration changes when hardware is replaced with a new IP/name. Reduce gc_grace to reduce overall data size. Disable row cache, unless you have