Cassandra at eBay - Cassandra Summit 2013

Transcript
Page 1: Cassandra at eBay - Cassandra Summit 2013

Cassandra @ eBay

Jay Patel, Architect, Platform Systems, @pateljay3001

Page 2

eBay Marketplaces

• Thousands of servers

• Petabytes of data

• Billions of SQLs/day

• 24x7x365

• 99.98+% availability

• Turning over a TB every second

• Multiple datacenters

• Near-real-time

• Always online

• 400+ million items for sale

• $75 billion+ in goods sold on eBay per year

• Big Data

• 112 million active users

• Billions of page views/day

Page 3

eBay Site Data Infrastructure

Don’t force! One size does not fit all.

It’s a mixture of multiple SQL & NoSQL databases. We use the right database for the right problem.

Page 4

eBay Site Data Infrastructure: A heterogeneous mixture

• Thousands of nodes; > 2K sharded logical hosts; > 16K tables; > 27K indexes; > 140 billion SQLs/day; > 5 PB provisioned

• Hundreds of nodes; persistent & in-memory; > 40 billion SQLs/day

• 10+ clusters, 100+ nodes; > 250 TB provisioned (local HDD + shared SSD); > 9 billion writes/day; > 5 billion reads/day

• Hundreds of nodes; > 50 TB; > 2 billion ops/day

• Thousands of nodes; the world’s largest cluster, with 2K+ nodes

• Dozens of nodes

Page 5

How do we scale RDBMS?

Shard

– Patterns: Modulus, lookup-based, range, etc.

– Application sees only logical shard/database

Replicate

– Disaster recovery, read availability & read scalability

Big NOs

– No transactions

– No joins

– No referential integrity constraints
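As an illustration of the three sharding patterns named above (modulus, lookup-based, range), here is a minimal Python sketch. The shard count, key ranges, lookup entries, and host names are all hypothetical; the point is that the application only computes a logical shard, and a separate map resolves it to a physical database.

```python
import bisect

# Hypothetical logical-to-physical map: the application reasons only about
# logical shards; operations can remap them to different hosts later.
LOGICAL_TO_PHYSICAL = {0: "db-host-a", 1: "db-host-b", 2: "db-host-c", 3: "db-host-d"}
NUM_SHARDS = len(LOGICAL_TO_PHYSICAL)


def modulus_shard(key: int) -> int:
    """Modulus pattern: logical shard = key mod number of shards."""
    return key % NUM_SHARDS


# Lookup pattern: an explicit directory from key to logical shard (hypothetical entries).
LOOKUP_DIRECTORY = {"seller-42": 3, "seller-7": 1}


def lookup_shard(key: str, default: int = 0) -> int:
    """Lookup-based pattern: consult a directory, falling back to a default shard."""
    return LOOKUP_DIRECTORY.get(key, default)


# Range pattern: exclusive upper bounds of shards 0..2; shard 3 takes the rest.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(key: int) -> int:
    """Range pattern: find which key range the key falls into."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, key)


def physical_host(logical_shard: int) -> str:
    """Resolve a logical shard to the physical host that currently serves it."""
    return LOGICAL_TO_PHYSICAL[logical_shard]


if __name__ == "__main__":
    print(physical_host(modulus_shard(10)))
    print(physical_host(lookup_shard("seller-7")))
    print(physical_host(range_shard(2_500_000)))
```

Modulus sharding is the simplest, but changing the shard count moves most keys, which is one reason to keep a logical-to-physical map in between.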


Page 6

Why Cassandra?

Multi-datacenter (active-active)

Always Available - No SPOF

Easy to scale up & down


Write performance

Distributed counters

Hadoop support

Not replacing RDBMS, but complementing!

Some use cases don’t fit well in an RDBMS: sparse data, big data, flexible schema, real-time analytics, …

Many use cases don’t need top-tier set-ups.

Page 7

Cassandra Growth

(Chart: Cassandra growth at eBay, Aug 2011, Aug 2012 and May 2013. Left axis: billions per day for writes, async. reads and sync. site reads. Right axis: storage capacity in terabytes, up to 350.)

Doesn’t predict business

Page 8

eBay Use Cases on Cassandra

Time-series data, real-time insights & immediate actions

• Fraud detection & prevention

• Quality Click Pricing for affiliates

• Order & shipment tracking and insights

• Mobile notification logging & tracking

• Cloud CMS change history storage

• RedLaser server logs and analytics

• Server metrics collection for monitoring & alerting

• Taste graph based next-gen recommendation system

• Personalization Data Service

• Social Signals on eBay Product & Item pages

• Milo’s store-item availability inventory (evaluation phase)

Page 9

Real-time insights & actions for:

• Fraud prevention

• Reporting

• Quality Click Pricing

• More…

Page 10

System Overview

Business Event Stream: Checkout, Shipping, Payment, Refund & Recoup, …

Events: Order placed (Buy It Now/bid), Paid, Shipped, Refunded

Raw data is processed with simple in-memory aggregations and/or Complex Event Processing and/or Cassandra’s distributed counters.

Examples: labels printed per day per user, user segmentation for affiliate pricing, orders per hour, …

Multiple Cassandra clusters

Act in real-time: Fraud Prevention, Affiliate Pricing Engine (eBay Partner Network), Order tracking, Real-time reporting, …

Data is kept from several months to years.
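The counter-based aggregations above (for example, orders per hour) can be sketched as time-bucketed counters. In Cassandra this would typically be a counter column family whose row key combines the metric, the entity, and a time bucket; the in-memory Python stand-in below only models that keying scheme. The class name, key layout, and hour-sized buckets are assumptions for illustration, not eBay’s schema.

```python
from collections import Counter
from datetime import datetime, timezone


class BucketedCounters:
    """Hypothetical in-memory stand-in for a counter table keyed by
    (metric, entity, hour bucket)."""

    def __init__(self) -> None:
        self._cells: Counter = Counter()

    @staticmethod
    def hour_bucket(ts: datetime) -> str:
        """Truncate a timezone-aware timestamp to its UTC hour, e.g. "2013-06-11T14"."""
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")

    def increment(self, metric: str, entity: str, ts: datetime, delta: int = 1) -> None:
        # One event becomes one increment on one cell, similar in spirit to a
        # CQL counter update "SET c = c + 1": the client does not read first.
        self._cells[(metric, entity, self.hour_bucket(ts))] += delta

    def get(self, metric: str, entity: str, bucket: str) -> int:
        # Cells never incremented read as 0 in this stand-in.
        return self._cells[(metric, entity, bucket)]


if __name__ == "__main__":
    counters = BucketedCounters()
    t = datetime(2013, 6, 11, 14, 5, tzinfo=timezone.utc)
    counters.increment("orders", "user-123", t)
    counters.increment("orders", "user-123", t.replace(minute=50))
    counters.increment("orders", "user-123", t.replace(hour=15))
    print(counters.get("orders", "user-123", "2013-06-11T14"))
```

Bucketing by hour keeps each row bounded in size and makes “orders per hour” a single-cell read.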

Page 11

A Glimpse at the Data Model

Historic & real-time insights per user per carrier. Sudden & drastic change might be suspicious.

User bucketing based on historic & real-time buying activity.
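To make “sudden & drastic change might be suspicious” concrete, the sketch below compares a user’s real-time count for one carrier against that user’s historic daily counts and flags it when it exceeds the historic mean by several standard deviations. The z-score rule, the threshold of 3, and the `min_std` floor are illustrative assumptions; the slides do not say how eBay scores such changes.

```python
from statistics import mean, pstdev


def is_suspicious(historic_daily_counts: list[int], today_count: int,
                  z_threshold: float = 3.0, min_std: float = 1.0) -> bool:
    """Flag a sudden, drastic jump relative to a per-user, per-carrier history.

    historic_daily_counts: e.g. labels printed per day for one (user, carrier) pair.
    min_std (hypothetical) stops a perfectly flat history from making any
    small change look infinitely unusual.
    """
    if not historic_daily_counts:
        return False  # no history yet, nothing to compare against
    mu = mean(historic_daily_counts)
    sigma = max(pstdev(historic_daily_counts), min_std)
    return (today_count - mu) / sigma > z_threshold


if __name__ == "__main__":
    history = [2, 3, 2, 3, 2, 3]
    print(is_suspicious(history, 4), is_suspicious(history, 40))
```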

Page 12

A Glimpse at the Data Model

Page 13

Fraud Detection & Prevention


Shop with Confidence

Page 14

System Overview


Cassandra

Fraud Detection & Prevention System

Sign-in info

Business events (checkout, sell, …)

StaaS, Oracle

Checkout, Shipping, …, Payment, Selling

Real-time beacons data

Real-time insights

Other data

Machine-learned models

Page 15

A Glimpse at the Data Model

Collected at sign-in & stored as key-value.

Pulled periodically into StaaS for training machine-learned models.
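The flow described above (sign-in info written as key-value pairs, then pulled in batches for model training) can be sketched as follows. The `user_id:epoch_seconds` key format, the attribute names, and the checkpoint-based export are hypothetical; a real pull from Cassandra would page through the column family rather than scan an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class SignInStore:
    """Hypothetical key-value store for sign-in info: key -> attribute map."""
    rows: dict[str, dict[str, str]] = field(default_factory=dict)

    @staticmethod
    def make_key(user_id: str, epoch_seconds: int) -> str:
        # Composite key: one entry per sign-in event for a user.
        return f"{user_id}:{epoch_seconds}"

    def record(self, user_id: str, epoch_seconds: int, attrs: dict[str, str]) -> None:
        self.rows[self.make_key(user_id, epoch_seconds)] = dict(attrs)

    def export_since(self, checkpoint: int) -> tuple[list[dict[str, str]], int]:
        """Periodic pull: events newer than `checkpoint`, plus the next checkpoint."""
        batch, newest = [], checkpoint
        for key, attrs in self.rows.items():
            user_id, ts = key.rsplit(":", 1)
            if int(ts) > checkpoint:
                batch.append({"user_id": user_id, "ts": ts, **attrs})
                newest = max(newest, int(ts))
        return batch, newest


if __name__ == "__main__":
    store = SignInStore()
    store.record("u1", 100, {"ip": "10.0.0.1"})
    store.record("u1", 200, {"ip": "10.0.0.2"})
    print(store.export_since(150))
```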

Page 16

Metrics collection for monitoring & alerting


Page 17

System Overview


Transport (HTTP, …)

Scalable NIO servers based on Netty

Thousands of production machines

Cassandra

Stats for CPU, memory, disk, …

Agents on the production machines (agent, agent, agent, agent, …)

Servers, with an in-memory grid (Hazelcast) for rollups

Page 18

A Glimpse at the Data Model

Granular data points

Rolled-up metrics for various time intervals
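The split between granular data points and rolled-up metrics can be illustrated with a small rollup function that aggregates raw `(timestamp, value)` samples into fixed intervals. The interval size and the count/min/max/avg summary are assumptions chosen for the example; the slides only place the rollups in an in-memory grid (Hazelcast).

```python
from collections import defaultdict


def rollup(points: list[tuple[int, float]], interval_s: int) -> dict[int, dict[str, float]]:
    """Aggregate granular (epoch_seconds, value) samples into fixed-size intervals.

    Returns {interval_start: {"count", "min", "max", "avg"}}, sorted by interval
    start, for every interval that has at least one sample.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        # Align each sample to the start of its interval.
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
        }
        for start, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    cpu = [(0, 10.0), (20, 30.0), (40, 20.0), (60, 50.0)]  # one sample every 20 s
    print(rollup(cpu, 60))
```

Storing both tiers lets dashboards read coarse intervals cheaply while the granular points remain available for drill-down.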

Page 19

Taste graph based recommendation system


Page 20

Data Model


Taste Graph

Taste Vector

50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30TB+ of data on SSD

Page 21

System Overview


Business Event Stream

Recommendation system

Taste Graph, Taste Vector

1. Item purchased.

2a. Write purchase edge.

2b. Read other edges for this user & item.

4. Request recommendations.

5. Find other items close to the user’s coordinates.

6. Recommendations shown to user.

More: http://www.slideshare.net/planetcassandra/e-bay-nyc
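Step 5 (finding items close to the user’s coordinates) is a nearest-neighbor query in the taste-vector space. The sketch below ranks candidate items by Euclidean distance to the user’s vector with a brute-force scan; the distance metric, the tiny 2-D vectors, and the `exclude` parameter are assumptions for illustration, not details given in the talk.

```python
import heapq
import math


def closest_items(user_vec: list[float], item_vecs: dict[str, list[float]],
                  k: int = 3, exclude: frozenset[str] = frozenset()) -> list[str]:
    """Return up to k item ids whose taste vectors are nearest the user's vector.

    Brute-force scan using Euclidean distance (math.dist); items in `exclude`,
    e.g. the one the user just purchased, are skipped.
    """
    candidates = (
        (math.dist(user_vec, vec), item_id)
        for item_id, vec in item_vecs.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]


if __name__ == "__main__":
    items = {"lens": [0.9, 0.0], "camera": [0.7, 0.1], "tripod": [0.5, 0.5], "sofa": [0.0, 1.0]}
    print(closest_items([1.0, 0.0], items, k=2))
    print(closest_items([1.0, 0.0], items, k=2, exclude=frozenset({"lens"})))
```

At 50 billion+ edges a linear scan is only a teaching device; a production system would need an index or precomputed neighborhoods.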

Page 22

Real-time Personalization Data Service


User performs a search using a keyword. User gets personalized pages based on an implicit/explicit profile.

Page 23

System Overview


Personalization Data Service

CacheMesh (write-back cache)

Heavy writes

eBay site pages (personalized)

Every few mins

in-memory MySQL & XMP DB

Cassandra Oracle (scaled out)

Heavy reads

Cache miss

user profiles

Application SOA services (multiple)

Data Warehouse
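CacheMesh is labeled as a write-back cache in front of the persistent stores. As a generic illustration of that pattern (not CacheMesh’s actual design, which the slides do not describe), the sketch below serves reads from memory, falls through to a backing store on a cache miss, buffers writes as dirty entries, and persists them in one batch, as a job running every few minutes might. The `load`/`store_batch` callables are hypothetical hooks; a plain dict stands in for Cassandra in the usage example.

```python
from typing import Callable, Optional


class WriteBackCache:
    """Generic write-back cache: writes land in memory first and are flushed later."""

    def __init__(self, load: Callable[[str], Optional[dict]],
                 store_batch: Callable[[dict[str, dict]], None]) -> None:
        self._load = load                # reads one profile from the backing store
        self._store_batch = store_batch  # persists many profiles in one call
        self._entries: dict[str, dict] = {}
        self._dirty: set[str] = set()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._entries:     # cache miss: read through to the store
            value = self._load(key)
            if value is None:
                return None
            self._entries[key] = value
        return self._entries[key]

    def put(self, key: str, value: dict) -> None:
        self._entries[key] = value       # absorb heavy writes in memory
        self._dirty.add(key)

    def flush(self) -> int:
        """Persist all dirty entries in one batch; returns how many were written.

        If store_batch raises, the dirty set is kept so a later flush can retry.
        """
        batch = {k: self._entries[k] for k in self._dirty}
        if batch:
            self._store_batch(batch)
        self._dirty.clear()
        return len(batch)


if __name__ == "__main__":
    backing = {"u1": {"likes": "cameras"}}  # stand-in for the persistent store
    cache = WriteBackCache(backing.get, backing.update)
    print(cache.get("u1"))
    cache.put("u2", {"likes": "watches"})
    print("u2" in backing, cache.flush(), backing["u2"])
```

The trade-off of write-back is that writes buffered in memory can be lost if a node fails before the next flush.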

Page 24

Data Model


• Keep column names short.

• Don’t overload one CF (column family) with all the data:

– Split hot & cold data into separate CFs.

– Splitting & sharding can help compaction.

Static column families

Page 25

Served by Cassandra

Social Signals

Page 27

Multi-Datacenter Deployment


• Topology: NTS (NetworkTopologyStrategy)

• RF: 1:1, 2:2, or 3:3

• Read CL: ONE/QUORUM

• Write CL: ONE

Data is backed up periodically to protect against human or software error

User requests have no datacenter affinity

Non-sticky load balancing
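The RF and CL settings above determine how many replicas each request waits for. The sketch below applies the usual Cassandra formula for QUORUM (sum of the replication factors across datacenters, divided by two, plus one) and the overlap rule: a read is guaranteed to see the latest acknowledged write only when read replicas plus write replicas exceed the total replica count. Only ONE, QUORUM and ALL are modeled; the function names are illustrative.

```python
def quorum(replicas: int) -> int:
    """Majority of the given replica count: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def replicas_for(cl: str, rf_per_dc: list[int]) -> int:
    """How many replica acknowledgements a request waits for at a given CL."""
    total = sum(rf_per_dc)
    if cl == "ONE":
        return 1
    if cl == "QUORUM":  # counted across all datacenters
        return quorum(total)
    if cl == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {cl}")


def read_sees_latest_write(read_cl: str, write_cl: str, rf_per_dc: list[int]) -> bool:
    """Overlap rule: R + W > N guarantees some read replica saw the latest write."""
    n = sum(rf_per_dc)
    return replicas_for(read_cl, rf_per_dc) + replicas_for(write_cl, rf_per_dc) > n


if __name__ == "__main__":
    for rf in ([1, 1], [2, 2], [3, 3]):
        for read_cl in ("ONE", "QUORUM"):
            print(rf, read_cl, "read / ONE write:", read_sees_latest_write(read_cl, "ONE", rf))
```

With RF 1:1, a QUORUM read (2 of 2 replicas) always overlaps a ONE write; with 2:2 and 3:3 it does not (3 + 1 = 4 and 4 + 1 = 5 replicas), so writing at ONE buys latency and availability at the cost of that guarantee.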

Page 28

Multi-Datacenter Deployment

Topology: NTS; RF: 1:1:1 or 2:2:2

Page 29

Lessons & Best Practices

• One size does not fit all

– Use Cassandra for the right use cases.

• Choose proper Replication Factor and Consistency Level

– They alter latency, availability, durability, consistency and cost.

– Cassandra supports tunable consistency, but remember strong consistency is not free.

• Many ways to model data in Cassandra

– The best way depends on your use case and query patterns.

• De-normalize and duplicate for read performance

– But don’t de-normalize if you don’t need to.

http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
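Denormalizing for reads usually means writing the same fact into more than one table, each shaped for one query. The Python sketch below uses two dictionaries as stand-ins for two hypothetical query tables (orders by buyer, orders by seller); in Cassandra these would be separate column families written together by the application. Table names and fields are illustrative only.

```python
from collections import defaultdict


class OrderViews:
    """Two hypothetical denormalized query tables fed by a single write path."""

    def __init__(self) -> None:
        # Each "table" is partitioned by the key its query filters on.
        self.orders_by_buyer: dict[str, list[dict]] = defaultdict(list)
        self.orders_by_seller: dict[str, list[dict]] = defaultdict(list)

    def record_order(self, order_id: str, buyer: str, seller: str, item: str) -> None:
        row = {"order_id": order_id, "buyer": buyer, "seller": seller, "item": item}
        # Duplicate the row so each query is a single-partition lookup.
        self.orders_by_buyer[buyer].append(row)
        self.orders_by_seller[seller].append(dict(row))

    def for_buyer(self, buyer: str) -> list[dict]:
        return list(self.orders_by_buyer.get(buyer, []))

    def for_seller(self, seller: str) -> list[dict]:
        return list(self.orders_by_seller.get(seller, []))


if __name__ == "__main__":
    views = OrderViews()
    views.record_order("o1", "alice", "bob", "camera")
    views.record_order("o2", "carol", "bob", "lens")
    print([r["order_id"] for r in views.for_seller("bob")])
```

The cost is extra writes and storage for every duplicated row, which is why the slide adds: don’t de-normalize if you don’t need to.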


Page 30

Are you excited? Come Join Us!


Thank You @pateljay3001

#cassandra13