C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to Cassandra by Mohit...
planet-cassandra -
Intuit Proprietary & Confidential
The Consumer Financial Platform (CFP) Mohit Anchlia Architect, Intuit
Agenda
• Background
• Problem statement
• Idea of a Platform
• Why Cassandra?
• CFP Stack
• CFP Cassandra Data Model
• Learning in Production
• Q&A
Background
• Intuit is the maker of TurboTax, Quicken, QuickBooks and many other products for SBUs.
• Many services work together to deliver an awesome product experience
Problem Statement (Service explosion)
• Service explosion over the years
– Code duplication
– Cross-cutting concerns
– Data silos (information silos)
– Operational challenges: schema design, installs
– Added overhead to test and repeat tests in production
– Slow prototyping
Idea of a Platform
• Brings information together to avoid data silos
• Quick turnaround time
• Plug and play service framework
• Don’t need IT and operations
• Highly personalized experience
• Security
• Share data between products, between users
to plug ‘n’ play
Data Platform/Tier
• Principles – Highly available, highly scalable, fast, easy to operate, software-only solution for structured and unstructured data (blobs)
• Projection – A petabyte in 2-3 years
• Support – Critical applications with a 99.99% (four nines) SLA
• But Wait … No Stress
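To make the availability target concrete, the short sketch below converts an availability percentage into a yearly downtime budget. It is only arithmetic; the function name and the choice of a 365-day year are assumptions made for illustration, not something stated on the slide.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes per year")
```

A 99.99% target leaves under an hour of downtime per year, and each extra nine divides the budget by ten.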
Traditional RDBMS?
• Challenges with availability and scalability
• Sharding works well, but introduces new challenges of its own
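One commonly cited sharding challenge is resharding. The sketch below assumes a simple "hash modulo shard count" scheme, which is an illustrative choice and not necessarily what Intuit used, and measures how many keys change shard when a fifth shard is added. The key names and helper functions are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (md5 is used only for spreading keys)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)


sample_keys = [f"user-{i}" for i in range(10_000)]  # hypothetical keys
print(f"{fraction_moved(sample_keys, 4, 5):.0%} of keys move when going from 4 to 5 shards")
```

With this naive scheme most rows have to be migrated on every resize, which is one reason application-level sharding becomes an operational burden.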
NoSQL?
• Easy?
• Core use cases – Most of the use cases don’t need transactions, and with good design consistency can be managed properly.
• Evaluated HBase, MongoDB and Cassandra.
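One way consistency is commonly managed by design in a replicated store such as Cassandra is to choose read and write replica counts that overlap. The sketch below shows the usual rule of thumb (reads see the latest acknowledged write when R + W > N). The slides do not say which consistency levels CFP used, so the numbers and function names here are illustrative assumptions.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # hypothetical replication factor
print("quorum size:", quorum(rf))
print("quorum writes + quorum reads overlap:", reads_see_latest_write(rf, quorum(rf), quorum(rf)))
print("single-replica writes + reads overlap:", reads_see_latest_write(rf, 1, 1))
```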
Why Cassandra?
• Scalable
– Easy to scale horizontally
• Availability
– Highly available, can be designed for no SPOF
– Easy to set up clusters and replication between DCs
– Fast snapshots
– Rolling upgrades
• Operations
– Easy to install and operate
– Easy to make schema changes
• Fast
– Given the right hardware, Cassandra provides low-latency response times.
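Horizontal scaling is easy in Cassandra partly because data is placed on a token ring, so adding a node takes over only a slice of the key space. The sketch below is a simplified consistent-hashing ring with virtual nodes, not Cassandra's actual partitioner. The node names, the number of virtual nodes, and the `Ring` class are assumptions for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Stable hash used to place nodes and keys on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring; each node owns several virtual-node tokens."""

    def __init__(self, nodes, vnodes: int = 100):
        self._entries = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First token clockwise from the key's position, wrapping around the ring.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._entries[idx][1]


ring_keys = [f"user-{i}" for i in range(10_000)]  # hypothetical keys
before = Ring(["n1", "n2", "n3", "n4"])
after = Ring(["n1", "n2", "n3", "n4", "n5"])
moved_keys = [k for k in ring_keys if before.owner(k) != after.owner(k)]
moved_fraction = len(moved_keys) / len(ring_keys)
print(f"{moved_fraction:.0%} of keys move when a fifth node joins")
```

Only the keys claimed by the new node move, and existing nodes never exchange data with each other.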
High Level CFP Stack
[Diagram: CFP stack in three tiers (Services Platform, Data Platform, Analytics Platform). Components shown: Mule ESB, Queue Service, Cache Service, Cassandra, RedHat Storage (DFS), Mule ESB (services), HBase, Hadoop, Search Engine, MPP, Flume.]
• MuleSoft ESB for business logic orchestration, with frameworks for additional authoring
• Cassandra-powered schemaless database wrapped in entity and relationship logic
• RHS – a distributed file system for blob storage
• Hadoop/HBase/Solr/CEP for batch processing and near-real-time analytics
CFP Active/Active Multi-Data Center
[Diagram: two data centers, DC-A and DC-B, behind a Global Load Balancer. Each data center has its own Load Balancer and the same three tiers: Services Platform (Mule), Data Platform (Cassandra, RedHat Storage (DFS)) and Analytics Platform (Hadoop). Three Replication links connect the two data centers.]
Global Load Balancer
• 30-minute session stickiness
• Provides HA
• Low latency
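Session stickiness at a global load balancer can be pictured as a small routing table with an expiry. The sketch below assumes the stickiness window is 30 minutes and that it slides on each request. The `StickyRouter` class, the round-robin choice for new sessions, and the injectable clock are all hypothetical and do not describe Intuit's load balancer.

```python
import time

STICKY_SECONDS = 30 * 60  # assumed 30-minute stickiness window


class StickyRouter:
    """Pins each session to one data center until the pin expires."""

    def __init__(self, data_centers, ttl: float = STICKY_SECONDS, clock=time.monotonic):
        self._dcs = list(data_centers)
        self._ttl = ttl
        self._clock = clock
        self._pins = {}  # session id -> (data center, expiry timestamp)
        self._next = 0   # round-robin cursor for new or expired sessions

    def route(self, session_id: str) -> str:
        now = self._clock()
        pinned = self._pins.get(session_id)
        if pinned is not None and pinned[1] > now:
            dc = pinned[0]
        else:
            dc = self._dcs[self._next % len(self._dcs)]
            self._next += 1
        # Assumption: the window slides, so every request renews the pin.
        self._pins[session_id] = (dc, now + self._ttl)
        return dc


router = StickyRouter(["DC-A", "DC-B"])
print(router.route("session-1"), router.route("session-2"), router.route("session-1"))
```

Keeping a session in one data center for the window means a user normally reads from the data center that took their writes, which helps hide cross-DC replication lag.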
CFP Schema
• Represented as a graph
– Entity
– Relationships
• Additional CF for indexes
– Inverted indexes driven by schema
[Diagram labels: Entity, User Entity, Document, Index CF]
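The entity, relationship and index layout can be sketched with in-memory dictionaries standing in for column families. Everything here is a hypothetical illustration: the field names, the `INDEXED_FIELDS` schema, and the helper functions are assumptions, and the actual CFP data model is not shown in the transcript.

```python
from collections import defaultdict

entities = {}                     # entity id -> {"type": ..., field: value}
relationships = defaultdict(set)  # (source id, relation name) -> target ids
index_cf = defaultdict(set)       # (entity type, field, value) -> entity ids

# Hypothetical schema: which fields of each entity type get an inverted index.
INDEXED_FIELDS = {"User": ["email"], "Document": ["tax_year"]}


def put_entity(entity_id: str, entity_type: str, fields: dict) -> None:
    """Store an entity and write its schema-driven inverted index entries."""
    entities[entity_id] = {"type": entity_type, **fields}
    for field in INDEXED_FIELDS.get(entity_type, []):
        if field in fields:
            index_cf[(entity_type, field, fields[field])].add(entity_id)
    # Note: this sketch does not remove stale index entries on update.


def relate(source_id: str, relation: str, target_id: str) -> None:
    """Record a directed edge between two entities."""
    relationships[(source_id, relation)].add(target_id)


def find(entity_type: str, field: str, value) -> list:
    """Look up entity ids through the inverted index."""
    return sorted(index_cf.get((entity_type, field, value), set()))


put_entity("u1", "User", {"email": "a@example.com"})
put_entity("d1", "Document", {"tax_year": 2012})
relate("u1", "owns", "d1")
print(find("User", "email", "a@example.com"), relationships[("u1", "owns")])
```

Because the index is a separate structure keyed by value, a lookup by attribute becomes a single-key read instead of a scan.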
Learning in Production
• Monitor heap usage
– High and uneven CPU usage
– Add nodes if you can
– Reduce Bloom filters
– Increase heap if you have to, don’t be scared
[Figure: Before / After]
• Monitor data per node
– Most importantly, keys per node
• Monitor disk IO
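The advice about Bloom filters and keys per node can be made concrete with the standard sizing formula for an optimally configured Bloom filter, roughly -ln(p) / (ln 2)^2 bits per key for a false-positive chance p. The sketch below uses that textbook formula, not Cassandra's internal sizing, and the key count is a made-up example. In Cassandra the corresponding per-table setting is `bloom_filter_fp_chance`.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook bits per key for an optimal Bloom filter at the given false-positive chance."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_megabytes(keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for a number of keys."""
    return keys * bloom_bits_per_key(fp_chance) / 8 / 1024 / 1024


keys_on_node = 1_000_000_000  # hypothetical keys per node
for p in (0.01, 0.1):
    print(f"fp_chance={p}: about {bloom_megabytes(keys_on_node, p):.0f} MiB")
```

Filter memory grows linearly with keys per node, and accepting a higher false-positive chance trades some extra disk reads for a smaller filter.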