C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to Cassandra by Mohit...

The Consumer Financial Platform (CFP)
Mohit Anchlia, Architect, Intuit
Intuit Proprietary & Confidential

Agenda

•  Background
•  Problem statement
•  Idea of a Platform
•  Why Cassandra?
•  CFP Stack
•  CFP Cassandra Data Model
•  Learning in Production
•  Q&A

Background

•  Intuit is the maker of TurboTax, Quicken, QuickBooks, and many other products for SBUs.

•  Many services work together to deliver an awesome product experience.

Problem Statement (Service explosion)

•  Service explosion over the years

  –  Code duplication
  –  Cross-cutting concerns
  –  Data silos (information silos)
  –  Operational challenges: schema design, installs
  –  Added overhead to test, and to repeat tests in production
  –  Slow prototyping

Idea of a Platform

•  Brings information together to avoid data silos
•  Highly personalized experience
•  Security
•  Share data between products, between users
•  Quick turnaround time
•  Plug and play service framework
•  Don’t need IT and operations to plug ‘n’ play

Data Platform/Tier

•  Principles: highly available, highly scalable, fast, easy-to-operate, software-only solution for structured and unstructured data (blobs)
•  Projection: a petabyte in 2-3 years
•  Support: critical applications with a 99.99% (four nines) SLA
•  But wait … no stress
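The availability target above translates into a concrete downtime budget. A quick sketch of the arithmetic, comparing 99.99% with 99.999% since "nines" counts are easy to mix up. The 365-day year and the function name are assumptions made for illustration:

```python
# Downtime budget implied by an availability SLA (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given SLA."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {downtime_minutes_per_year(sla):.1f} minutes/year")
```

Four nines leaves a little under an hour of downtime per year; five nines leaves about five minutes.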

Traditional RDBMS?

•  Challenges with availability and scalability

•  Sharding works well, but introduces new challenges of its own
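The slide does not name the challenges, so the following is only one commonly cited example, offered as an assumption: with simple hash-modulo placement, changing the shard count relocates most keys. `shard_for`, `fraction_moved`, and the key names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with hash-modulo placement (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
print(f"moved when going from 4 to 5 shards: {fraction_moved(keys, 4, 5):.0%}")
```

Going from 4 to 5 shards moves roughly four out of five keys, which is the kind of operational cost a resharding exercise has to absorb.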

NoSQL?

•  Easy?

•  Core use cases: most don’t need transactions, and with good design, consistency can be managed properly.

•  Evaluated HBase, MongoDB, and Cassandra.

Why Cassandra?

•  Scalable
  –  Easy to scale horizontally
•  Availability
  –  Highly available, can be designed for no SPOF
  –  Easy to set up clusters and replication between DCs
  –  Fast snapshots
  –  Rolling upgrades
•  Operations
  –  Easy to install and operate
  –  Easy to make schema changes
•  Fast
  –  Given the right hardware, Cassandra provides low-latency response times

High Level CFP Stack

[Diagram: Services Platform, Data Platform, and Analytics Platform. Components shown: Mule ESB, Queue Service, Cache Service, Cassandra, RedHat Storage (DFS), HBase, Hadoop, Search Engine, MPP, Flume.]

•  MuleSoft ESB for business logic orchestration, with frameworks for additional authoring
•  Cassandra-powered schemaless database wrapped in entity and relationship logic
•  RHS (RedHat Storage): a distributed file system for blob storage
•  Hadoop/HBase/Solr/CEP to meet batch processing and near-real-time analytics needs

CFP Active/Active Multi-Data Center

[Diagram: two data centers, DC-A and DC-B. Each contains a Services Platform (Mule), a Data Platform (Cassandra, RedHat Storage (DFS)), and an Analytics Platform (Hadoop) behind its own load balancer. Replication links connect the two data centers, and a global load balancer sits in front of both.]

•  30-minute session stickiness
•  Provides HA
•  Low latency

CFP Schema

•  Represented as a graph
  –  Entity
  –  Relationships
•  Additional CF for indexes
  –  Inverted indexes driven by schema

[Diagram: a User entity and a Document entity, with a separate Index CF.]
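The graph-style model on this slide can be sketched in memory. This is a minimal illustration, not the CFP implementation: the three dictionaries stand in for column families, and every name here (`entities`, `relationships`, `index_cf`, `indexed_fields`, `put_entity`, `relate`, `find_by`) is hypothetical.

```python
from collections import defaultdict

# Each dict stands in for a column family: row key -> columns.
entities = {}                      # entity id -> attributes
relationships = defaultdict(dict)  # source id -> {(relation, target id): ""}
index_cf = defaultdict(set)        # (type, field, value) -> entity ids

# Hypothetical schema: which fields get an inverted index, per entity type.
indexed_fields = {"User": ["name"], "Document": ["kind"]}


def put_entity(entity_id, entity_type, attrs):
    """Store an entity and update the inverted indexes the schema asks for."""
    entities[entity_id] = {"type": entity_type, **attrs}
    for field in indexed_fields.get(entity_type, []):
        if field in attrs:
            index_cf[(entity_type, field, attrs[field])].add(entity_id)


def relate(source_id, relation, target_id):
    """Record a directed edge as a column on the source entity's row."""
    relationships[source_id][(relation, target_id)] = ""


def find_by(entity_type, field, value):
    """Look up entity ids through the inverted index instead of scanning."""
    return sorted(index_cf.get((entity_type, field, value), set()))


put_entity("u1", "User", {"name": "alice"})
put_entity("d1", "Document", {"kind": "w2"})
relate("u1", "owns", "d1")

print(find_by("Document", "kind", "w2"))  # ['d1']
print(list(relationships["u1"]))          # [('owns', 'd1')]
```

The sketch never removes stale index entries when an entity is updated; a real index column family would need to handle that.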

Learning in Production

•  Monitor heap usage
  –  High and uneven CPU usage
  –  Add nodes if you can
  –  Reduce Bloom filters
  –  Increase heap if you have to, don’t be scared

[Figure: before/after comparison]

•  Monitor data per node
  –  Most importantly, keys per node
•  Monitor disk IO
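Why reducing Bloom filters eases memory pressure, and why keys per node matters, follows from the standard Bloom filter sizing formula, bits = -n·ln(p) / (ln 2)². A rough sketch: the key count and false-positive rates below are made-up inputs, and Cassandra's actual sizing may differ.

```python
import math


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> float:
    """Optimal Bloom filter size in bytes for num_keys at the given FP rate."""
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8


keys_per_node = 1_000_000_000  # hypothetical: one billion keys on a node
for fp_rate in (0.01, 0.1):    # hypothetical false-positive targets
    gib = bloom_filter_bytes(keys_per_node, fp_rate) / 2**30
    print(f"fp={fp_rate}: ~{gib:.2f} GiB")
```

Raising the false-positive rate from 1% to 10% halves the filter size, at the cost of more unnecessary disk reads. In Cassandra this trade-off is exposed per table as `bloom_filter_fp_chance`.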

The End

We are hiring. Contact @ [email protected]