C*ollege Credit: Is My App a Good Fit for Cassandra?

Transcript of C*ollege Credit: Is My App a Good Fit for Cassandra?

Page 1: C*ollege Credit: Is My App a Good Fit for Cassandra?

Is My App A Good Fit For Cassandra?

Eric Lubow @elubow [email protected]

Page 2: C*ollege Credit: Is My App a Good Fit for Cassandra?

Is My App A Good Fit For Cassandra Eric Lubow @elubow

Overview

•  Planning

•  Data Stores

•  Comparisons

•  Use Cases

•  Final Thoughts

•  Questions

Page 3: C*ollege Credit: Is My App a Good Fit for Cassandra?

Where Am I

•  Planning Stages

•  MVP (Minimum Viable Product)

•  Iteration

•  “Final Decision”

Page 4: C*ollege Credit: Is My App a Good Fit for Cassandra?

What Am I Building

•  User App

•  Hobby Project

•  Learning Project

•  Big Data System

Page 5: C*ollege Credit: Is My App a Good Fit for Cassandra?

What Is Big Data

•  Depends on the user

•  Bigger Than Excel

•  Bigger Than One Server

•  Bigger Than One Rack

Page 6: C*ollege Credit: Is My App a Good Fit for Cassandra?

Big Data Truth Bomb

•  Even with the right tools, 80% of the work of building a big data system is acquiring and refining the raw data into usable data.

Page 7: C*ollege Credit: Is My App a Good Fit for Cassandra?

Planning Questions

Tech / Data / Financial / Other

•  Do I have legal requirements (HIPAA/FIPS/Sarbanes-Oxley/PII)?

•  What kind of enterprise support is available?

•  What is the community like?

•  Does the product roadmap pertain to my roadmap?

•  Are my display requirements for realtime data?

•  Do I need to aggregate data on the fly?

•  Is my data structured or unstructured?

•  Does my data lend itself to a specific design pattern?

•  What are my query patterns?

•  Is my data ingestion high volume/high velocity?

•  Am I batch loading data?

•  Am I write heavy or read heavy?

•  Are data relationships important?

•  Does my data need to be immediately available everywhere?

•  Am I cloud based?

•  Am I hardware based?

•  Am I a cloud/iron hybrid?

•  How much am I willing to spend?

•  How much am I willing to spend if something goes wrong?

•  How fault tolerant is the system?

•  What supporting tools do I need?

•  Is there support for my language?

•  Is the encryption/authentication/authorization support sufficient for my needs?

•  Are there monitoring architectures already built?

•  Are there best practices guides already

•  Will the data need to be distributed?

Page 8: C*ollege Credit: Is My App a Good Fit for Cassandra?

Tools C*

Page 9: C*ollege Credit: Is My App a Good Fit for Cassandra?

Languages

Page 10: C*ollege Credit: Is My App a Good Fit for Cassandra?

Right Tool For The Job

Page 11: C*ollege Credit: Is My App a Good Fit for Cassandra?

Cassandra (C*)

•  Large data volume ingestion at high velocity

•  Really fast writes to many locations (eventual consistency)

•  Query by column groups within rows (slicing)

•  OpsCenter

•  Data toolkit: more than a data storage layer

•  TTLs for small group aggregation
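The TTL bullet can be pictured with a small plain-Python sketch. It does not use Cassandra or any driver API; `TTLCounter` and its injectable `clock` are hypothetical names that only mimic the idea of writing each event with an expiry, so that a count over a small group covers just the recent window.

```python
import time


class TTLCounter:
    """Counts events per group; each event expires ttl_seconds after it is recorded.

    Illustration only: this imitates the effect of per-column TTLs in memory.
    """

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self.events = {}        # group -> list of expiry times

    def record(self, group):
        """Store one event for the group with its expiry time."""
        self.events.setdefault(group, []).append(self.clock() + self.ttl)

    def count(self, group):
        """Return the number of unexpired events and drop the expired ones."""
        now = self.clock()
        live = [t for t in self.events.get(group, []) if t > now]
        self.events[group] = live
        return len(live)


# Usage with a fake clock: two events, 30 seconds apart, 60 second TTL.
fake_now = [0.0]
counter = TTLCounter(60, clock=lambda: fake_now[0])
counter.record("page-586352")
fake_now[0] = 30.0
counter.record("page-586352")
fake_now[0] = 61.0
recent = counter.count("page-586352")   # only the second event is still live
```

In Cassandra itself the expiry is handled by the database when data is written with a TTL, so no client-side pruning like `count` performs here would be needed.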

Page 12: C*ollege Credit: Is My App a Good Fit for Cassandra?

Cassandra Data

RowKey: 1345161600000:b198fa61-833a-6e78-fb83-233ec50b356e
=> (column=facebook:1345162260136000, value={"like_count":17,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260136000)
=> (column=facebook:1345162260167000, value={"like_count":18,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260167000)
=> (column=facebook:1345162260261564, value={"like_count":21,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260261564)
=> (column=pageviews:1345162259307830, value={"user-agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727)","languages":"es-ve","user-id":"aede5694-3eb3-4cd0-810d-99d6bc2e0cb5","ip":"186.24.6.80"}, timestamp=1345162259307830)
=> (column=pageviews:1345162259302140, value={"user-agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)","languages":"en-US","user-id":"a85679ab-9fd7-4aeb-93ab-2b66eddcf66a","ip":"192.168.255.182"}, timestamp=1345162259302140)
=> (column=pageviews:1345162259302000, value={"referrer":"http://www.tv-links.eu/_gate_way.html?data=VfMjMzOTE2Nw==","user-agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)","languages":"en-NZ","user-id":"ba0c6320-c4ca-4cb8-b5d4-e6ef21dbdc3c","ip":"219.89.75.163"}, timestamp=1345162259302000)
=> (column=pageviews:1345162259402000, value={"referrer":"http://foo.com/pop-culture/2012/09/40-Most-Weird-Comics-Ever","user-agent":"Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11","languages":"en-US,en;q=0.8","user-id":"899f51ab-3e08-475a-9392-7eee5446edc3","ip":"24.118.178.215"}, timestamp=1345162259402000)
=> (column=twitter:1345162260246000, value={"count":17,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260246000)
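To make the "slicing" idea concrete for a row like this one, here is a minimal plain-Python sketch. It assumes the columns of a row are kept sorted by name, which is what makes a contiguous range read cheap; `slice_columns` and `slice_prefix` are hypothetical helpers, not part of any Cassandra client, and the values are trimmed copies of the slide's data.

```python
import bisect

# One wide row, modelled as (column_name, value) pairs sorted by column name.
row = sorted([
    ("facebook:1345162260136000", {"like_count": 17}),
    ("facebook:1345162260167000", {"like_count": 18}),
    ("facebook:1345162260261564", {"like_count": 21}),
    ("pageviews:1345162259302000", {"languages": "en-NZ"}),
    ("pageviews:1345162259302140", {"languages": "en-US"}),
    ("twitter:1345162260246000", {"count": 17}),
])


def slice_columns(row, start, end):
    """Return the columns whose names fall in the half-open range [start, end)."""
    names = [name for name, _ in row]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, end)
    return row[lo:hi]


def slice_prefix(row, prefix):
    """Return every column in one group, e.g. all 'facebook:' columns.

    ';' is the ASCII character right after ':', so the range
    [prefix + ':', prefix + ';') covers exactly the names with that prefix.
    """
    return slice_columns(row, prefix + ":", prefix + ";")


facebook_columns = slice_prefix(row, "facebook")
```

Because the timestamps in the column names all have the same number of digits, sorting the names as strings also sorts each group by time, so a narrower slice such as `slice_columns(row, "facebook:1345162260150000", "facebook;")` would return only the later Facebook samples.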

Page 13: C*ollege Credit: Is My App a Good Fit for Cassandra?

MongoDB

•  Fast atomic increments (Node.js is native JSON)

•  Sharding

•  Solid ODM for Rails (Mongoid)

•  Fast access for pub/sub of durable/persisted documents

•  B-Tree indexes

•  Document based via JSON

•  TTLs for ephemeral data

Page 14: C*ollege Credit: Is My App a Good Fit for Cassandra?

MongoDB Data

{
  "_id" : ObjectId("505a089275885cc53cd66520"),
  "account_id" : ObjectId("4e87f81ca782f3404200000a"),
  "day" : ISODate("2012-01-01T00:00:00Z"),
  "md5" : "54f762d1025aadd6e2687005db657dac",
  "stats" : {
    "sum" : { "fb" : 108, "fba" : 108, "fbc" : 1, "fbl" : 71, "fbr" : 326, "fbs" : 36, "gp" : 2, "gpa" : 2, "li" : 3, "lia" : 3, "p" : 1840, "pi" : 1, "pia" : 1, "pspv" : 859.384, "soca" : 173, "socr" : 493, "srchr" : 27, "srt" : 86.48748533542772, "su" : 1, "sua" : 1, "tw" : 58, "twa" : 58, "twflc" : 4025418, "twfrc" : 139758, "twp" : 50, "twpa" : 50, "twr" : 167 },
    "18" : { "sum" : { "fb" : 4, "fba" : 4, "fbl" : 2, "fbr" : 2, "fbs" : 2, "p" : 179, "pspv" : 105.1336, "soca" : 18, "socr" : 8, "srchr" : 10, "srt" : 60.337503923146954, "srtv" : 89.4550357667952, "tw" : 14, "twa" : 14, "twflc" : 107842, "twfrc" : 108111, "twg" : 8, "twp" : 7, "twpa" : 7, "twr" : 6 } },
    "19" : { "sum" : { "fb" : 63, "fba" : 63, "fbl" : 40, "fbr" : 179, "fbs" : 23, "gp" : 2, "gpa" : 2, "p" : 498, "pi" : 1, "pia" : 1, "pspv" : 278.6148999999999, "soca" : 74, "socr" : 200, "srchr" : 5, "srt" : 74.27775496525277, "srtv" : 89.71819309386892, "tw" : 8, "twa" : 8, "twflc" : 9941, "twfrc" : 4228, "twg" : 7, "twp" : 7, "twpa" : 7, "twr" : 21 } }
  }
}
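A document shaped like this one, with a daily `stats.sum` and per-hour sub-documents, pairs naturally with MongoDB's `$inc` update operator and dotted field paths. The sketch below only builds such an update document and applies it to an in-memory dict; that the slide's data was actually maintained this way is an assumption, and `build_inc_update` and `apply_inc` are hypothetical helpers.

```python
def build_inc_update(hour, deltas):
    """Build a MongoDB-style $inc update touching the daily and hourly counters.

    hour   -- hour of day used as the sub-document key (e.g. 18)
    deltas -- mapping of metric name to increment, e.g. {"fb": 1}
    """
    inc = {}
    for metric, amount in deltas.items():
        inc[f"stats.sum.{metric}"] = amount           # daily total
        inc[f"stats.{hour}.sum.{metric}"] = amount    # hourly total
    return {"$inc": inc}


def apply_inc(doc, update):
    """Tiny in-memory stand-in for $inc, for illustration only.

    Walks each dotted path, creating missing sub-documents, then adds the amount.
    """
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = doc
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = node.get(leaf, 0) + amount
    return doc


doc = {"stats": {"sum": {"fb": 108}}}
update = build_inc_update(18, {"fb": 1, "tw": 2})
apply_inc(doc, update)
```

Against a real database, the same `update` dict could be passed to a driver call such as pymongo's `update_one(filter, update, upsert=True)`, where the server applies the increments atomically per document.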

Page 15: C*ollege Credit: Is My App a Good Fit for Cassandra?

Redis

•  Supports hundreds of thousands of transactions per second

•  Great caching engine

•  Supports useful variable types like sets, sorted sets, and lists

•  Everything is guaranteed to be memory mapped (mmap)

•  Transactional and supports bulk operations

•  Centralized queueing and locking system

Page 16: C*ollege Credit: Is My App a Good Fit for Cassandra?

Cons

•  Redis

    •  Can only utilize a single core

    •  Data must be smaller than memory

    •  No clustering

•  Cassandra

    •  No B-Tree indexes

•  Mongo

    •  Non-hashed shard keys

    •  Indexes must fit in memory

    •  Forced replica ping times
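One reading of the "non-hashed shard keys" bullet is the hot-spot problem: when data is placed by key range and new keys keep increasing (timestamps, sequential IDs), every new write lands on the same shard. The sketch below compares that with hash-based placement in plain Python. The shard count, the boundaries, and both placement functions are illustrative assumptions, not MongoDB's actual balancing logic.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def range_shard(key, boundaries):
    """Range-based placement: pick the first shard whose upper bound exceeds the key."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)   # the last shard takes everything above the final bound


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Hash-based placement: spread keys by a hash of their value."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards


# 1,000 new, monotonically increasing keys, all above the existing ranges.
new_keys = range(1_000_000, 1_001_000)
boundaries = [250_000, 500_000, 750_000]

range_load = Counter(range_shard(k, boundaries) for k in new_keys)
hash_load = Counter(hashed_shard(k) for k in new_keys)
```

With range placement, `range_load` contains a single shard that received all 1,000 writes, while `hash_load` spreads them across all four. The trade-off is that hashing gives up efficient range queries on the shard key.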

Page 17: C*ollege Credit: Is My App a Good Fit for Cassandra?

Use Cases

•  Time Series

•  Counters

•  Feed Based Activity

•  Large Amounts of Data
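For the time series case, the row shown on the Cassandra Data slide appears to use a row key made of a day bucket in milliseconds plus an entity UUID, with column names made of an event type plus a microsecond timestamp. The following is a minimal sketch of building keys that way; the function names are hypothetical and the interpretation of the slide's key layout is an assumption.

```python
MS_PER_DAY = 86_400_000


def day_bucket_ms(event_us):
    """Round a microsecond epoch timestamp down to midnight UTC, in milliseconds."""
    event_ms = event_us // 1000
    return event_ms - event_ms % MS_PER_DAY


def row_key(event_us, entity_id):
    """One row per entity per day keeps any single row from growing without bound."""
    return f"{day_bucket_ms(event_us)}:{entity_id}"


def column_name(kind, event_us):
    """Prefix the timestamp with the event type so each type sorts together."""
    return f"{kind}:{event_us}"


key = row_key(1345162260136000, "b198fa61-833a-6e78-fb83-233ec50b356e")
column = column_name("facebook", 1345162260136000)
```

Feeding the first Facebook timestamp from the slide through `row_key` yields the same `1345161600000:...` key the slide shows, which is consistent with the day-bucket reading.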

Page 18: C*ollege Credit: Is My App a Good Fit for Cassandra?

The Cloud

•  Open source libraries into the API

•  Auto-scaling for magical scalability

•  Quickly test assumptions

•  Spot Instances

Page 19: C*ollege Credit: Is My App a Good Fit for Cassandra?

Support and Expertise

•  What happens when you need help?

•  How do you become experts?

•  What happens when you need more experts?

Page 20: C*ollege Credit: Is My App a Good Fit for Cassandra?

Summary

•  Have answers to the important questions

•  Know your data read/write patterns

•  Know the tools available to you

•  Know your compromises

Page 21: C*ollege Credit: Is My App a Good Fit for Cassandra?

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

[email protected]

Thank you.