Cassandra at NoSql Matters 2012

Posted 26-Jan-2015

Transcript of Cassandra at NoSql Matters 2012

1. Apache Cassandra: Real-world scalability, today!
   Jonathan Ellis, CTO

2. Cassandra Job Trends [chart]
   2012 DataStax

3. Big Data trend [chart]

4. Why Big Data Matters
   Research done by McKinsey & Company shows the eye-opening 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

5. Big data [diagram]: Realtime (NoSQL), Analytics (Hadoop), ?

6. Some Cassandra users [logos]

7. Industries & use cases
   Industries: Financial, Social media, Advertising, Entertainment, Energy, E-tail, Health care, Government
   Use cases: Time series data, Messaging, Ad tracking, Data mining, User activity streams, User sessions, anything requiring scalable, performant + highly available

8. Why Cassandra?
   • Fully distributed, no SPOF
   • Multi-master, multi-DC
   • Linearly scalable
   • Larger-than-memory datasets
   • Best-in-class performance (not just writes!)
   • Fully durable
   • Integrated caching
   • Tunable consistency

9. Availability
   "There is no such thing as standby infrastructure: there is stuff you always use and stuff that won't work when you need it." -- Ben Black: founder, Boundary; ex-AWS
   "The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test." -- Rick Branson: Instagram; ex-DataStax

10. Classic partitioning with SPOF [diagram: client -> router -> partitions 1 to 4]

11. Fully distributed, no SPOF [diagram: client and peer nodes holding partitions p1, p3, p6; p1 appears on three nodes]

12. [Image slide, no text]

13. Partitioning
      jim      age: 36   car: camaro   gender: M
      carol    age: 37   car: subaru   gender: F
      johnny   age: 12                 gender: M
      suzy     age: 10                 gender: F

14. Partitioning: Primary key determines placement* (same rows as slide 13)

15. MD5 hash
      PK       MD5 hash
      jim      5e02739678...
      carol    a9a0198010...
      johnny   f4eb27cea7...
      suzy     78b421309e...
    The MD5 hash operation yields a 128-bit number for keys of any size.

16. The token ring [diagram: Node A, Node B, Node C, Node D]

17. Token ranges
      Start            End              Node
      0xc000000000..   0x0000000000..   A      10
      0x0000000000..   0x4000000000..   B      10
      0x4000000000..   0x8000000000..   C      10
      0x8000000000..   0xc000000000..   D      10
    Keys: jim 5e02739678..., carol a9a0198010..., johnny f4eb27cea7..., suzy 78b421309e...

18-21. Same token table and key list as slide 17.

22. Replication [diagram: Node A, Node B, Node C, Node D; key carol, a9a0198010...]

23-24. Same ring diagram for carol (a9a0198010...).

25. Highlights
   • Adding capacity is application-transparent and requires no downtime
   • No SPOF, not even temporarily
   • No primary replica
   • Configurable synchronous/asynchronous
   • Tolerates node failure; never have to restart replication from scratch
   • Smart replication avoids correlated failures

26. What about performance?
   • Log-structured storage engine avoids random I/O
   • Excellent performance on both reads and writes
   • Row-level isolation via concurrent algorithms, no locking
   • Built-in compression improves cache hotness
   • Row cache can replace memcached

27. [Chart: reads/s and writes/s for Cassandra 0.6 vs. Cassandra 1.0; y-axis 0 to 35,000]

28. [Image slide, no text]

29. Netflix
   Application/Use Case
   • Manage subscriber interactions with downloaded movies
   • Need to handle distributed databases all over the world (40 countries)
   • Need better TCO than Oracle
   Why Cassandra?
   • Easy scale and multi-data center support for geographical data distribution
   • Data model a perfect fit for customer interaction data
   • Much better TCO than Oracle or SimpleDB
   "I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we're ready."

30. Constant Contact
   Application/Use Case
   • Manage marketing/email campaigns for small businesses
   • Needed a database to handle social media data that is very large in volume and must be maintained for a long time
   • Data is unstructured in nature
   Why Cassandra?
   • Cassandra is built for big data scale and able to persist, manage, and quickly query big data
   • Deployed the application on Cassandra in 1/3rd the time and 1/10th the cost of Oracle
   "Whenever we need new capacity, we just add new nodes online and we're able to meet whatever demand we have. Cassandra is great for that."

31. ReachLocal
   Application/Use Case
   • ReachLocal provides end-to-end Internet advertising services to small and medium-sized businesses in eight countries
   • Must track most or all user interaction with marketing campaigns on web sites
   Why Cassandra?
   • The amount of information was beyond the scalability limits of traditional RDBMSs
   • Has to replicate data to six data centers around the world
   • Needed integration with real-time data and analytics/search

32. Backupify
   Application/Use Case
   • Cloud-based utility that enables backups and searches of Google Apps, Gmail, Facebook, Twitter, Blogger and other content
   • Must write lots of data very quickly
   Why Cassandra?
   • Big data requirements necessitated easy scale-out and a continuously available database architecture
   • Strong community support of Cassandra
   • TCO was much better than others
   "Cassandra was just a better design all around: more truly horizontally scalable, with less management overhead, and there's no single point of failure. I looked at Cassandra's architecture and thought, 'Yeah, that's how you do it.'"

33. Openwave
   Application/Use Case
   • Openwave Messaging delivers a next-generation converged messaging platform with cloud and social integration capabilities
   Why Cassandra?
   • Needed a new database that would support geographic redundancy, continuous availability, and big data scale
   • Required high-IOPS database speed
   • Better TCO than the prior Oracle database
   "Here are the big checkbox items for us with Apache Cassandra: there is no single point of failure, it offers high read-and-write performance, and it has the ability to work on commodity hardware."

34. Healthx
   Application/Use Case
   • Develops and manages online portals for the healthcare market
   • Delivered via a cloud platform
   • Manages provider, patient, and other related data
   Why DataStax Enterprise?
   • Needed to scale, perform, and search data faster than the previous Microsoft SQL Server database farm
   • Integrated big data platform that provides one database cluster for all real-time and search data
   "We really like the integration with Solr. We get the full redundancy that you'd expect out of Cassandra as well as the full text indexing of Solr. The two things together make a win."

35. Big data [diagram, same as slide 5]: Realtime (NoSQL), Analytics (Hadoop), ?

36. The evolution of Analytics [diagram]: Analytics + Realtime

37. The evolution of Analytics [diagram]: Realtime, replication, Analytics

38. The evolution of Analytics [diagram]: ETL

39. Big data [diagram]: Realtime (Cassandra), Analytics (Hadoop), DataStax Enterprise

40. Reunification of realtime + analytics

41. [Image slide, no text]

42. Portfolio Demo [dataflow diagram]: Portfolios, Historical Prices, Live Prices for today, Intermediate Results, Largest loss

43. Better Hadoop than Hadoop
   Vanilla Hadoop
   • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...)
   • Single points of failure
   • Can't separate online and offline processing
   DataStax Enterprise
   • Single, simplified component
   • Self-organizes based on workload
   • Peer to peer
   • JobTracker failover

44. Enterprise search with Solr
      SELECT title FROM solr WHERE solr_query='title:natio*';

      title
      --------------------------------------------------------------------------
      Bolivia national football team 2002
      List of French born footballers who have played for other national teams
      Lithuania national basketball team at Eurobasket 2009
      Bolivia national football team 2000
      Kenya national under-20 football team
      Bolivia national football team 1999
      Israel men's national inline hockey team
      Bolivia national football team 2001

45. Managing & Monitoring Big Data
   DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations.

46. Questions?
   http://www.datastax.com/docs
   http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
   http://www.datastax.com/products/enterprise
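
Slides 13 to 24 place each row by hashing its primary key with MD5 onto a token ring shared by four nodes, then copying it to neighbouring nodes. The Python sketch below models that lookup under stated assumptions: node tokens (A = 0x00.., B = 0x40.., C = 0x80.., D = 0xc0..) are read off the slide 17 table, with each node owning the range that ends at its token; the replication factor of 3 and the "next nodes clockwise" replica rule are assumptions (roughly the idea behind Cassandra's SimpleStrategy); and the names token_for, replicas_for_token and replicas_for_key are hypothetical. It keeps the slides' simplification of treating the digest as a plain 128-bit number and is not Cassandra's partitioner code.

```python
from __future__ import annotations

import bisect
import hashlib

RING_BITS = 128                   # the slides treat the MD5 digest as a 128-bit number
QUARTER = 1 << (RING_BITS - 2)    # 0x4000...: one quarter of the ring, i.e. 4 equal ranges

# Hypothetical ring taken from the slide 17 table: each node's token is the
# "End" of its range, and it owns (previous node's token, its own token].
RING = [(0, "A"), (QUARTER, "B"), (2 * QUARTER, "C"), (3 * QUARTER, "D")]


def token_for(key: str) -> int:
    """Hash the primary key with MD5 and read the 16-byte digest as an unsigned int."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas_for_token(token: int, ring=RING, rf: int = 3) -> list[str]:
    """Return the owning node followed by the next rf-1 nodes clockwise.

    The owner is the first node whose token is >= the key's token; tokens past
    the last node wrap around to the first node. rf=3 is an assumed setting.
    """
    tokens = [t for t, _ in ring]  # the ring must be sorted by token
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))     # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def replicas_for_key(key: str, ring=RING, rf: int = 3) -> list[str]:
    """Hash the key, then find its owner and replicas on the ring."""
    return replicas_for_token(token_for(key), ring, rf)


if __name__ == "__main__":
    for name in ["jim", "carol", "johnny", "suzy"]:
        print(name, hex(token_for(name))[:12], replicas_for_key(name))
```

Plugging in the abbreviated hashes from slide 15 (padded with zeros to 128 bits), carol's token 0xa9a0198010... falls in D's range, so with the assumed rf=3 the replicas are D, A and B, matching the range table. Because placement depends only on the key's token, a client or any node can compute it locally, which is what lets the design avoid the router that slide 10 marks as a single point of failure.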
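
Slide 26 credits the log-structured storage engine with avoiding random I/O. The toy Python class below illustrates the general log-structured idea, borrowing Cassandra's terms (commit log, memtable, SSTable) but under heavy simplifying assumptions: writes are appended to a commit log and buffered in an in-memory memtable, a full memtable is flushed as an immutable sorted run standing in for an SSTable, and reads check the memtable first and then the runs from newest to oldest. The class TinyLSM and its methods are hypothetical; compaction, indexes, Bloom filters and on-disk formats are omitted, so this is not Cassandra's storage engine.

```python
from __future__ import annotations

import bisect


class TinyLSM:
    """Toy log-structured key-value store (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit: int = 4):
        self.commit_log: list[tuple[str, str]] = []      # append-only write log
        self.memtable: dict[str, str] = {}               # latest values, in memory
        self.sstables: list[list[tuple[str, str]]] = []  # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # A write only appends to the log and updates memory; nothing already
        # written to a run is modified in place.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out once, sorted by key, as a new immutable run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying

    def get(self, key: str) -> str | None:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):      # newest run wins
            keys = [k for k, _ in run]           # a real SSTable keeps an index instead
            i = bisect.bisect_left(keys, key)    # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

For example, with memtable_limit=2, writing jim and carol flushes one run; a later write of a new value for jim lands in a newer location, and get("jim") returns it because newer data is consulted first. The stale copy stays in the older run until something like compaction reclaims it, which is the price this design pays for never updating data in place.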