Cassandra Summit 2014: Cassandra at Instagram 2014


  • date post

    05-Dec-2014
  • Category

    Technology


description

Presenter: Rick Branson, Infrastructure Engineer at Instagram. As Instagram has scaled to over 200 million users, so has our use of Cassandra. We've built new features and rebuilt old ones on Cassandra, and it has become a mission-critical foundation of our production infrastructure. Rick will deliver a refresh of our use cases and go deep on the technical challenges we faced during our expansion.

Transcript of Cassandra Summit 2014: Cassandra at Instagram 2014

  • 1. CASSANDRA @ INSTAGRAM 2014. Rick Branson, Software Engineer, Backend Engineering (@rbranson). Cassandra Summit, September 11th, 2014
  • 2. 2013 in a Nutshell: Moving some log-style data from Redis to Cassandra. Saving $, better availability, more elasticity. Some problems, but we worked through them.
  • 3. Overview: "The Trouble with Redis". A whole new category of functionality. Useful consumer advice.
  • 4. "THE TROUBLE WITH REDIS"
  • 5. REDIS: THE GOOD PARTS
  • 6. The Network Part: It's a heap on the network with a great text-based protocol.
  • 7. Extremely Efficient: It does a lot with every ounce of CPU.
  • 8. Stable: Very much so. Sets the bar for the vast NoSQL space.
  • 9. Useful, Rich Types: You won't find them anywhere else.
  • 10. REDIS: ZDARKSIDE
  • 11. What's in there? Hello?
  • 12. Snapshots (BGSAVE)
  • 13. Single-Threaded (!)
  • 14. Memory Allocator
  • 15. (Lack of) Durability
  • 16. Unsafe Replication
  • 17. THE PHILOSOPHY
  • 18. CACHE OR DATABASE? Neither.
  • 19. Memory Cliff
  • 20. Unlikely to Change :(
  • 21. ZOOKEEPER???
  • 22. ZOZO: *VERY* few writes, extremely high volume, no-latency reads without hotspots.
  • 23. Diagram labels: zozo agent; /var/cache/zozo; Consumer Process (x3); WATCH /zozo/files; Writer; UPDATE /zozo/files/knobs; ZooKeeper Ensemble
  • 24. Examples "Knobs" key/integer live configuration Small ( ... (SEEN, 902394) (SEEN, 94083) (LIKE, 902394) 1000000 (POST_HEADER, 0) => ... (BUMPED, 0) => 2000000 (SEEN, 1298333) (SEEN, 848183) ...
  • 45. Justin's Pending 25000 24000 23000 22000 21000 20000 19000 ... 1234 Justin's Inbox ... (1234, POST..) (1234, LIKE..) (1234, SEEN..) (1234, SEEN..) (1010, POST..) (1010, SEEN..) (1010, COMM..) ... (1234, EXCLUDE, )
  • 46. Rick's Unseen 849300 738939 649019 641192 629012 610392 483201 348931 8
  • 47. DEPLOYMENT: 60 x hi1.4xlarge EC2 instances. 2X our highest estimates. Shrunk cluster after we knew load. Peace of mind. Data models held together.
  • 48. We ran C* in production. We learned some things.
  • 49. young gc fun!
  • 50. Young GC has problems: Reads = young gen garbage. Double-collection bug in JDK 1.7+. Best practices were for nodes with more writes.
  • 51. DEAR GOD ALMIGHTY!!! now we run: 10G Young Gen, 20G Heap
  • 52. This won't work out of the box.
  • 53. -XX:+CMSScavengeBeforeRemark
  • 54. -XX:CMSMaxAbortablePrecleanTime=60000
  • 55. -XX:CMSWaitDuration=30000
  • 56. -XX:+CMSEdenChunksRecordAlways -XX:+CMSParallelInitialMarkEnabled
  • 57. 20-core Ivy Bridge, 144GB RAM, PCIe Flash, 15,000 reads/sec, 2,000 writes/sec
  • 58. TWO ISSUES: Counters. Don't use them. Hundreds of thousands / millions of tombstones.
  • 59. GET 12349081 Async Storage Port Handler READ 12349081 Client RPC Thread Pool Queue ReadStage Thread Pool Queue
  • 60. THANK YOU.
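
Slides 51 through 56 list the heap sizes and CMS flags one per slide. As a rough sketch of how they might be collected in one place, the Python below assembles them into `JVM_OPTS` lines in the style of a `cassandra-env.sh` file. The five `-XX` flags are copied from the slides. Mapping "20G Heap" to `-Xms`/`-Xmx` and "10G Young Gen" to `-Xmn` is an assumption, as is setting the initial heap equal to the maximum; the function names are hypothetical.

```python
# Heap sizes from slide 51. Expressing them as -Xms/-Xmx/-Xmn is an
# assumption; the slides give only the sizes, not the flags used.
HEAP_SIZE = "20G"
YOUNG_GEN_SIZE = "10G"

# CMS tuning flags copied verbatim from slides 53-56.
CMS_FLAGS = [
    "-XX:+CMSScavengeBeforeRemark",
    "-XX:CMSMaxAbortablePrecleanTime=60000",
    "-XX:CMSWaitDuration=30000",
    "-XX:+CMSEdenChunksRecordAlways",
    "-XX:+CMSParallelInitialMarkEnabled",
]


def jvm_options(heap=HEAP_SIZE, young=YOUNG_GEN_SIZE):
    """Return the full option list: heap sizing first, then the CMS flags."""
    return [f"-Xms{heap}", f"-Xmx{heap}", f"-Xmn{young}", *CMS_FLAGS]


def env_lines(options):
    """Render each option as a shell line that appends to JVM_OPTS."""
    return [f'JVM_OPTS="$JVM_OPTS {opt}"' for opt in options]


if __name__ == "__main__":
    for line in env_lines(jvm_options()):
        print(line)
```

Slide 52 warns that a young generation this large will not work out of the box, so the sizes and flags are best read as a set rather than picked individually.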
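
Slides 22 and 23 describe zozo only in outline: very few writes, very high read volume, and what appears to be an agent that watches ZooKeeper and keeps a local copy under /var/cache/zozo for consumer processes. As a sketch of the consumer side only, the Python below assumes the knobs arrive as a local JSON file of name/integer pairs that is replaced atomically. The file format, the `KnobReader` and `write_knobs` names, and the `feed_page_size` knob are all hypothetical and not taken from the talk.

```python
import json
import os
import tempfile


def write_knobs(path, knobs):
    """Stand-in for the agent: write to a temp file, then rename over the
    target so readers never see a partially written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(knobs, handle)
    os.replace(tmp_path, path)


class KnobReader:
    """Reads key/integer knobs from a local file, reloading on change."""

    def __init__(self, path):
        self.path = path
        self._stamp = None   # (mtime, size, inode) of the last load
        self._knobs = {}

    def _reload_if_changed(self):
        try:
            info = os.stat(self.path)
        except FileNotFoundError:
            return  # keep serving the last known values
        stamp = (info.st_mtime_ns, info.st_size, info.st_ino)
        if stamp != self._stamp:
            with open(self.path) as handle:
                raw = json.load(handle)
            self._knobs = {name: int(value) for name, value in raw.items()}
            self._stamp = stamp

    def get(self, name, default=0):
        """Return the knob value, or the default if it is not set."""
        self._reload_if_changed()
        return self._knobs.get(name, default)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "knobs.json")
    write_knobs(demo_path, {"feed_page_size": 20})
    reader = KnobReader(demo_path)
    print(reader.get("feed_page_size"))
```

Reading from a local file costs one `stat` call per lookup and no network round trip, which is one way to get reads that stay cheap at high volume while writes remain rare.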