Apache Cassandra at Target - Cassandra Summit 2014
Pioneering NoSQL in a Big Enterprise. The problems we needed to solve, the journey we took to get there, and the lessons we learned along the way.
Transcript of Apache Cassandra at Target - Cassandra Summit 2014
- 1. Apache Cassandra at Target: Pioneering NoSQL in a Big Enterprise. Dan Cundiff (@pmotch), Target
- 2. Context: Target's API platform; mostly REST APIs (e.g. products, locations, inventory, etc.); consumers inside and outside of Target; wide variety of providing systems (legacy, in-house built, SaaS, packages, etc.)
- 3. Problems we needed to solve: slow providing systems; cost prohibitive to call directly; unable to scale from increased demand; need a place to aggregate data from multiple systems; some data wasn't even in a database to begin with!
- 4. Barriers with existing tools, part 1: cost too much; process for traditional DBs wasn't a fit; too few tools/vendors
- 5. Barriers with existing tools, part 2: RDBMS isn't distributed (multi-tenant), close to Guests (geographic distribution), distributed across our data centers, or distributed to the cloud!
- 6. Barriers with existing tools, part 3: lack of performance; lack of control (process, not owning it all, flexibility on changes like indexing, etc.); availability (systems before had outages, downtime, etc.); not automate-able
- 7. Discovering the solution
- 8. Taking the idea back: I just went and talked to Pete and we decided to do it! Tried other things in the past; show results by trying; succeed or fail fast
- 9. Reasons trying was attractive, part 1: fit 80% of our need; years in development; rich C* dev ecosystem
- 10. Reasons trying was attractive, part 2: google-able; strong community; a company that would support it
- 11. Reasons trying was attractive, part 3: chef-able; aligned well with existing investments; simple pricing model
- 12. Barriers to adoption: enterprise IT (the nature of it); selling it (NoSQL for the first time); automation (was happening at the time; scary to do); political
- 13. Challenges integrating: bulk loading data; keeping Cassandra in sync; many systems not event driven; packaged software; limited ways to integrate with providing systems
- 14. Challenges of standing it up, part 1: early distributed system (new to teams); needed local disk (always used SAN before); needed SSDs (always used spinning things); existing config conflicts (backups, monitoring, RAID, swap, etc.); use right-sized servers (don't settle for what your infra friends give you by default)
- 15. Challenges of standing it up, part 2: full stack ownership; it's new, don't hand it off; support response is quick because we own it; you're closest to the problem, so you're best suited to solve it; tuned to meet the needs of our APIs; data is modeled for API performance gains
- 16. Challenges of standing it up, part 3: skills supply is low (but getting better); train your people; be wary of promises from consultants; grill them on what they claim to know
- 17. Challenges of development, part 1: skills ramp-up (data modeling, DataStax driver, etc.); developers need to care; encourage tweaking, research, making things better; clients are equally important to get the most out of C*
- 18. Challenges of development, part 2: mind shift from RDBMS; started with Astyanax; switched to DataStax driver; DataStax supported newer features
- 19. Ops challenges, part 1: lots of machines; don't config by hand; wrote Chef cookbooks; support people saw these odd servers and turned on things we disabled (like swap); can't use legacy testing, Cassandra works differently; chaos stuff (turn off gossip, Thrift, etc.)
- 20. Ops challenges, part 2: made logging awesome; we can see anything; utilized C* JMX interface to send data in real time to Splunk; can correlate these events with the app tier (because app logs are in Splunk too!)
- 21. Ops challenges, part 3, useful MBeans: heap usage; specific read/write latencies; dropped reads/writes; bloom filter ratios; column count, size
- 22. Ops challenges, part 4, more useful MBeans: SSTables per read; tombstones; cache hits and ratios; misbehaving queries (range slice)
- 23. Open source cookbook! https://github.com/target/dse-cookbook by Danny Parker; pull requests encouraged
- 24. Blog post on tuning http://target.github.io/infrastructure/tuning-cassandra/ by Danny Parker (@dcparker88)
- 25. Results, part 1: from n00bs to production ready = 2 months! Infra, operation testing, app dev, and deployed! Just in time before peak season; today our highest volume APIs depend on it
- 26. Results, part 2: growth (functions + volume) = ~2000%; increased adoption of our APIs; C* unlocking things we couldn't do before; quick changes possible; makes Agile possible; gets us close to continuous delivery
- 27. Results, part 3: other teams are using it; more coming; sharing our cookbooks, lessons, etc.; opened the door to other distributed systems
- 28. Future, part 1: use across more of our APIs; remove remaining spinning disks
- 29. Future, part 2: move to cloud; automate full stack down to infra; scale, quick geo-distribute, flexibility to tweak new infra settings, etc.
- 30. Future, part 3: get better at data modeling designs; less bulk loading; remove compaction process overhead; weave in Spark, Kafka; more event-based updates
- 31. Future, part crazy: Docker + Cassandra?
- 32. We're hiring! Come talk to us
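
Slides 20 to 22 describe sending Cassandra JMX metrics to Splunk so they can be correlated with app-tier logs. The slides do not show how this was done, so the following is only a minimal sketch of one possible shape for it: turning a metrics sample into a single key=value log line, a format Splunk can usually field-extract at search time. The function names, the metric names, the node name, and the sample values are all hypothetical, and the step of actually reading values from the JMX interface is left out.

```python
import time


def hit_ratio(hits, requests):
    """Return hits / requests, or 0.0 when there were no requests."""
    return hits / requests if requests else 0.0


def to_splunk_event(node, metrics, timestamp=None):
    """Render one metrics sample as a single key=value log line.

    Keys are sorted so the output is stable and easy to compare.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    parts = [f"ts={ts}", f'node="{node}"']
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, float):
            parts.append(f"{name}={value:.4f}")
        else:
            parts.append(f"{name}={value}")
    return " ".join(parts)


# Hypothetical sample: in a real setup these values would be read from the
# node's JMX interface; here they are hard-coded for illustration only.
sample = {
    "heap_used_bytes": 3_221_225_472,
    "read_latency_ms": 1.8,
    "dropped_reads": 0,
    "sstables_per_read": 2,
    "key_cache_hit_ratio": hit_ratio(9_500, 10_000),
}

event = to_splunk_event("cass-node-01", sample, timestamp=1410000000)
print(event)
```

Putting every metric for a node on one timestamped line keeps each sample a single event, which makes it easier to line up with app-tier log events from the same time window.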