Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay

Bullseye P13n Platform, April 7, 2014. Charles Bracher, Bullseye Dev Manager; Ranjan Sinha, PhD, Lead Research Scientist, Bullseye

Description

We will describe the architecture of a personalization platform that captures customer profiles and behavioral data. A Cassandra cluster is used as an intermediate storage backend to replicate updates to profile records and timeline events across multiple datacenters. A caching tier serves the user data and provides a real-time execution environment where predictive models can calculate propensities, update category histograms, and so on. We delve into the metrics used to track replication performance and data freshness. We also discuss applications and features, such as user badges, that are powered by this new P13N platform.

Transcript of Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay

Page 1: Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay

Bullseye P13n Platform

April 7, 2014

Charles Bracher Bullseye Dev Manager

Ranjan Sinha, PhD Lead Research Scientist

Bullseye  

Page 2

Outline

P13n Platform

Why Cassandra?

Cassandra Setup

Cassandra Usage

Cassandra Issues and Resolutions

Hand over to Ranjan for the Data Science Perspective


Page 3


Bullseye Functional Architecture

[Diagram components: Tracking; Near Real Time Event Collection; Recent User Data, 1-5 days (Cassandra); Real Time Model Evaluation & Caching (sharded/full user state in memory); Long Term User Data (Local SSD); Offline Database/Batch Processing; Offline Analysis; Client Access]

Page 4

Why Cassandra?

Great write performance

Great replication performance

Reasonable read performance

Reasonable cost

Client controlled consistency settings


Page 5

Cassandra Setup

Cassandra Version 1.2.9

We use replication

–  Cassandra rings deployed to 3 datacenters

Cassandra clients

–  We use both the DataStax Java and C++ (beta) clients

Using CQL table specifications and commands

Not on SSDs
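With rings in three datacenters and consistency chosen by the client on each request, the number of replica responses a request waits for follows standard Cassandra quorum arithmetic. The Python sketch below only illustrates that arithmetic. The per-datacenter replication factors (3 in each of the hypothetical dc1, dc2, dc3) are assumptions, since the talk does not state them.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of a replica count: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """Replica responses a request needs at the given consistency level."""
    total_rf = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own datacenter count.
        return quorum(rf_by_dc[local_dc])
    if level == "QUORUM":
        # A majority of all replicas, in whichever datacenters they live.
        return quorum(total_rf)
    if level == "ALL":
        return total_rf
    raise ValueError(f"unsupported consistency level: {level}")


# Hypothetical topology: three datacenters with replication factor 3 each.
rf = {"dc1": 3, "dc2": 3, "dc3": 3}
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    print(level, required_acks(level, rf, local_dc="dc1"))
```

Under these assumed numbers, LOCAL_QUORUM needs 2 of the 3 local replicas. QUORUM needs 5 of all 9, so at least two responses must come from other datacenters. Lowering the replication factor shrinks both counts.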


Page 6

Cassandra Usage

Column Family Design:

– Avoid Tombstones

– Avoid Compaction

With Focus on Short Term Storage:

– Turn off automatic compaction / only manual compaction

– Use unique column key names to avoid tombstones

– Clear out old data with truncation
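One way to read the rules above: if every write uses a fresh record name, records are only ever appended and never deleted one by one. Expired data then leaves in bulk through truncation rather than through tombstones. The sketch below shows one possible naming scheme. It assumes a record-type prefix, a millisecond timestamp, and an in-process sequence number. The format is hypothetical and is not Bullseye's actual key layout.

```python
import itertools
import threading
import time
from typing import Optional


class RecordNamer:
    """Builds RECORD_NAME values that never repeat within one process.

    Hypothetical scheme: <record_type>:<epoch_ms>:<sequence>. Every write
    therefore lands in a new cell. Nothing is updated or deleted in place,
    and old data leaves only when its whole table is truncated.
    """

    def __init__(self) -> None:
        self._seq = itertools.count()
        self._lock = threading.Lock()

    def next_name(self, record_type: str, epoch_ms: Optional[int] = None) -> str:
        if epoch_ms is None:
            epoch_ms = int(time.time() * 1000)
        with self._lock:
            seq = next(self._seq)
        # Zero-padding keeps names with a 13-digit timestamp sorting by time.
        return f"{record_type}:{epoch_ms:013d}:{seq:06d}"


namer = RecordNamer()
first = namer.next_name("view", epoch_ms=1396828800000)
second = namer.next_name("view", epoch_ms=1396828800000)
print(first, second)  # same millisecond, still two distinct names
```

Uniqueness here holds only within one process. With several writers, a writer or host id would also have to be part of the name.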


Page 7

Cache Miss Flow (New Session)


CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));

– Write to the active day column family, keyed by user id.

– Truncate the oldest day column family.

– When going from one day to the next, do a manual compaction of the old day.

– On read, pull the user id's records from all column families newer than the local SSD data.
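The daily rotation can be sketched as a small planner. For a given day, it decides which DAY_N table is active, which one to compact, which one to truncate, and which ones a cache-miss read must merge. This is a hypothetical illustration with several assumptions. Days are numbered since the epoch and mapped cyclically onto `n_slots` tables, and the keyspace name `p13n` is made up. The planner only builds the maintenance commands and does not talk to Cassandra.

```python
from dataclasses import dataclass
from typing import List

KEYSPACE = "p13n"  # hypothetical keyspace name


def day_table(day: int, n_slots: int) -> str:
    """Table holding `day` (days since epoch), cycling through n_slots tables.
    Unquoted CQL identifiers are stored in lower case, so DAY_3 is day_3."""
    return f"day_{day % n_slots}"


@dataclass
class Rollover:
    active: str    # receives all writes for the new day
    compact: str   # yesterday's table, now read-only: compact it once, manually
    truncate: str  # oldest table, emptied before it is reused tomorrow


def rollover(day: int, n_slots: int) -> Rollover:
    if n_slots < 3:
        raise ValueError("need at least 3 slots: active, previous, and one to truncate")
    return Rollover(active=day_table(day, n_slots),
                    compact=day_table(day - 1, n_slots),
                    truncate=day_table(day + 1, n_slots))


def maintenance_commands(r: Rollover) -> List[str]:
    return [f"nodetool compact {KEYSPACE} {r.compact}",
            f"TRUNCATE {KEYSPACE}.{r.truncate};"]


def tables_to_read(day: int, ssd_day: int, n_slots: int) -> List[str]:
    """Tables to merge on a cache miss: retained days newer than the SSD data."""
    oldest_retained = day - (n_slots - 2)  # the slot for day + 1 was truncated
    first = max(ssd_day + 1, oldest_retained)
    return [day_table(d, n_slots) for d in range(first, day + 1)]


r = rollover(100, n_slots=5)
print(r)
print(maintenance_commands(r))
print(tables_to_read(100, ssd_day=98, n_slots=5))
```

Truncating tomorrow's slot at each rollover keeps the next table empty before it becomes active. Writes therefore never mix with data from `n_slots` days earlier. The cost is that one slot is always empty, which leaves `n_slots - 1` days readable.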

Page 8

Queuing Flow (Ongoing Activity)


CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME));

– Read from and write to the active hour's column family, keyed by the timestamp rounded to the nearest second.

– Store the column family that is one hour old to the offline DB.

– Truncate the column family that is two hours old.

– Asynchronously probe the record for the current second, as well as recent seconds, until the state is captured. Data may be read 1-3 times, or more if replication is lagging.
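The hourly queue can be sketched the same way. All of the following are hypothetical assumptions: three HOUR_N tables used in rotation (active, one hour old, two hours old), integer epoch seconds as keys, and a fixed look-back window for probing. The asynchronous probing is reduced here to computing which (table, key) pairs a reader should fetch next.

```python
import math
from typing import List, Tuple

N_HOUR_SLOTS = 3  # hypothetical: active, one hour old, two hours old


def second_key(epoch_s: float) -> int:
    """Event timestamp rounded to the nearest second (halves round up).
    It would be stored as text in the HOUR_N table's ID column."""
    return math.floor(epoch_s + 0.5)


def hour_table(key_s: int, n_slots: int = N_HOUR_SLOTS) -> str:
    """Hourly table for an already-rounded key."""
    return f"hour_{(key_s // 3600) % n_slots}"


def hourly_maintenance(now_s: int, n_slots: int = N_HOUR_SLOTS) -> Tuple[str, str]:
    """At the start of an hour: (table to copy to the offline DB, table to truncate)."""
    hour = now_s // 3600
    return f"hour_{(hour - 1) % n_slots}", f"hour_{(hour - 2) % n_slots}"


def probe_keys(now_s: float, last_captured: int,
               max_lookback: int = 5) -> List[Tuple[str, int]]:
    """(table, key) pairs to read next: each second after the last captured one,
    up to the current second, but no more than max_lookback seconds back."""
    current = second_key(now_s)
    first = max(last_captured + 1, current - max_lookback)
    return [(hour_table(k), k) for k in range(first, current + 1)]


print(second_key(3599.7), hour_table(second_key(3599.7)))  # 3600 hour_1
print(hourly_maintenance(7200))                             # ('hour_1', 'hour_0')
print(probe_keys(100.2, last_captured=97))  # [('hour_0', 98), ('hour_0', 99), ('hour_0', 100)]
```

Deriving the table from the rounded key matters near the top of the hour. A timestamp of 3599.7 rounds to 3600 and therefore belongs to the next hour's table. Because seconds that have not replicated yet are simply probed again, the same record can be read several times.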

Page 9

Cassandra Issues and Resolutions

Issues with the DataStax C++ Cassandra beta client

– open source, so we could apply fixes ourselves

Performance issues with the cache miss query

– increased heap size

– reduced replication factor

– turned off cross-colo (cross-datacenter) read repair

– deployed a datacenter-aware load-balancing policy for the C++ client
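The DataStax Java driver provides a datacenter-aware round-robin load-balancing policy (`DCAwareRoundRobinPolicy`). The Python toy below is not driver code. It only illustrates the ordering idea: rotate over local-datacenter hosts first, then fall back to a limited number of hosts per remote datacenter. The host names and the `used_hosts_per_remote_dc` parameter are illustrative; the parameter is loosely modeled on the Java driver's option.

```python
import itertools
from typing import Dict, Iterator, List


class DCAwareRoundRobin:
    """Toy datacenter-aware round robin: local hosts first, in rotating order,
    then at most `used_hosts_per_remote_dc` hosts from each remote datacenter."""

    def __init__(self, hosts_by_dc: Dict[str, List[str]], local_dc: str,
                 used_hosts_per_remote_dc: int = 0) -> None:
        self.local = list(hosts_by_dc[local_dc])
        self.remote = [h
                       for dc, hosts in sorted(hosts_by_dc.items()) if dc != local_dc
                       for h in hosts[:used_hosts_per_remote_dc]]
        self._counter = itertools.count()

    def query_plan(self) -> Iterator[str]:
        """Hosts to try, in order, for one request."""
        start = next(self._counter)
        n = len(self.local)
        for i in range(n):
            yield self.local[(start + i) % n]
        yield from self.remote


policy = DCAwareRoundRobin(
    {"dc1": ["a1", "a2", "a3"], "dc2": ["b1", "b2"], "dc3": ["c1"]},
    local_dc="dc1", used_hosts_per_remote_dc=1)
print(list(policy.query_plan()))  # ['a1', 'a2', 'a3', 'b1', 'c1']
print(list(policy.query_plan()))  # ['a2', 'a3', 'a1', 'b1', 'c1']
```

Keeping coordinators in the local datacenter, together with turning off cross-colo read repair, aims to keep the cache-miss read path free of cross-datacenter round trips.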


Page 10

Personalization Applications

Ranjan Sinha, PhD Lead Research Scientist

April 7, 2014

Disclaimer: Some of the content in this talk is based on my personal opinion and does not reflect the views of eBay.

Page 11

Outline

Why Personalize?

P13N Platform

– Introduction

– Conceptual architecture

– Modeling stages

P13N Applications

– User badges

– Search ranking

– Contextual models

– Deals


Page 12

Why Personalize?

Enable more relevant experience

Retention of existing users

New user acquisition

Reactivating churned users

Increasing activity per user

Improving conversion from visits to transactions


Page 13

P13N Platform: Introduction

Maintains activity timeline information

Enables event processing at near real-time

Enables in-session personalization

Provides environment for predictive model evaluation

Backup and restore to and from Hadoop/HBase


Page 14

P13N Platform: Conceptual Architecture


[Diagram components: Tracking Event Source; Model Executor with models m1, m2, m3, …, mn (filters and forwards events); CEP Processor; In-memory Cache + Model Evaluation; Activity Timeline + User Badges; Cassandra; Hadoop/HBase; Offline Modeling Platform; Client Access]
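The Model Executor's filter-and-forward role can be sketched as a router in front of per-user state. In this hypothetical Python version, each model (m1 … mn) is a plain function registered with a filter predicate. The in-memory cache is a dictionary of per-user timelines and badges. Persistence to Cassandra, the CEP processor, and Hadoop/HBase backup are left out.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Event:
    user_id: str
    kind: str        # hypothetical event types, e.g. "view" or "purchase"
    category: str
    ts: int


@dataclass
class UserState:
    # Bounded activity timeline; the 1000-event cap is an arbitrary choice.
    timeline: Deque[Event] = field(default_factory=lambda: deque(maxlen=1000))
    badges: Dict[str, dict] = field(default_factory=dict)


Model = Callable[[Event, UserState], None]


class ModelExecutor:
    """Appends every event to the user's timeline, then forwards it only to
    the models whose filter accepts it."""

    def __init__(self) -> None:
        self.routes: List[Tuple[Callable[[Event], bool], Model]] = []
        self.cache: Dict[str, UserState] = defaultdict(UserState)

    def register(self, accepts: Callable[[Event], bool], model: Model) -> None:
        self.routes.append((accepts, model))

    def on_event(self, event: Event) -> None:
        state = self.cache[event.user_id]
        state.timeline.append(event)
        for accepts, model in self.routes:
            if accepts(event):
                model(event, state)


def category_views(event: Event, state: UserState) -> None:
    """Example model: keep a per-user histogram of viewed categories."""
    hist = state.badges.setdefault("category_views", {})
    hist[event.category] = hist.get(event.category, 0) + 1


executor = ModelExecutor()
executor.register(lambda e: e.kind == "view", category_views)
for e in [Event("u1", "view", "puzzles", 1),
          Event("u1", "purchase", "puzzles", 2),
          Event("u1", "view", "stamps", 3)]:
    executor.on_event(e)
print(executor.cache["u1"].badges)  # {'category_views': {'puzzles': 1, 'stamps': 1}}
```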

Page 15

P13N Platform: Modeling stages

Realtime

– In-session user intent

– Contextual Models

Nearline

– Update propensity models (aka User Badges)

Offline

– Bootstrap propensity models by mining long-term behavior history
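A category histogram can illustrate the offline and nearline stages. The offline stage seeds it from long-term history. Each nearline update fades older evidence before adding what the last session contributed. Exponential decay is a technique chosen here for illustration, and the decay value is hypothetical; the talk does not say how Bullseye's propensity models are updated.

```python
from typing import Dict


def bootstrap(long_term_counts: Dict[str, float]) -> Dict[str, float]:
    """Offline stage: seed the histogram from long-term behavior history."""
    return dict(long_term_counts)


def nearline_update(hist: Dict[str, float], session_counts: Dict[str, int],
                    decay: float = 0.9) -> Dict[str, float]:
    """Nearline stage: fade older evidence by `decay`, then add the session's counts."""
    updated = {cat: weight * decay for cat, weight in hist.items()}
    for cat, count in session_counts.items():
        updated[cat] = updated.get(cat, 0.0) + count
    return updated


def propensities(hist: Dict[str, float]) -> Dict[str, float]:
    """Normalize the histogram into per-category propensities."""
    total = sum(hist.values())
    return {cat: weight / total for cat, weight in hist.items()} if total else {}


hist = bootstrap({"puzzles": 8.0, "stamps": 2.0})
hist = nearline_update(hist, {"stamps": 3}, decay=0.5)
print(propensities(hist))  # {'puzzles': 0.5, 'stamps': 0.5}
```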


Page 16

Application (1): User Badges


Name          | Description
------------- | -----------------------------------------
SaleType      | Auction vs. Buy-it-now
ItemCondition | New vs. Used
Category      | Preference of categories
Price         | Price range of purchasing activity
Deals         | Propensity to purchase deals
Social Share  | Propensity to share items in social media

Profile based on long-term behavior
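The sketch below shows how some of these badges might be derived from a purchase history. The definitions are illustrative assumptions, not Bullseye's: shares of auction, new, and deal purchases; the top categories by count; and a min/median/max price range. Social Share is omitted because it would need sharing events rather than purchases.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Purchase:
    price: float
    is_auction: bool   # False means Buy-it-now
    is_new: bool       # False means used
    is_deal: bool
    category: str


def compute_badges(purchases: List[Purchase], top_k: int = 3) -> Dict[str, object]:
    """Summarize purchase history into badge values (illustrative definitions)."""
    n = len(purchases)
    if n == 0:
        return {}
    prices = [p.price for p in purchases]
    categories = Counter(p.category for p in purchases)
    return {
        "SaleType": sum(p.is_auction for p in purchases) / n,   # share bought at auction
        "ItemCondition": sum(p.is_new for p in purchases) / n,  # share bought new
        "Category": [c for c, _ in categories.most_common(top_k)],
        "Price": (min(prices), median(prices), max(prices)),
        "Deals": sum(p.is_deal for p in purchases) / n,         # share bought as deals
    }


history = [
    Purchase(10.0, True, False, False, "puzzles"),
    Purchase(20.0, False, False, True, "puzzles"),
    Purchase(30.0, True, False, False, "stamps"),
    Purchase(40.0, True, True, False, "puzzles"),
]
print(compute_badges(history))
```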

Page 17

Application (2): Search Ranking …

Should all queries be personalized in the same manner?

– For some queries (e.g., “ebay” or “google”), everyone would like the same results

– For other queries, different people may want completely different results


Query: “big ben puzzles”

Not_P13N Rank | P13N Rank | Sold | IsNew | Title
------------- | --------- | ---- | ----- | -----
1             | 1         | No   | No    | LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500 PUZZLES EUC
2             | 3         | No   | Yes   | 1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan North Cascades National Park WA
3             | 2         | Yes  | No    | COMPLETE Fishing Village,Smalls Island MB Big Ben Puzzle 1000 Piece Puzzle Size!

User: always buys used items

Page 18

Application (3): Contextual models …


Infer categories that user is interested in within the current session

Long- and short-term behavior

– Historic behavior may provide benefits at the start of the session

– Short-term behavior may contribute gains in an extended search session

– Combination of session and historic behavior may outperform using either alone

[Diagram: timeline t of events e1, e2, e3, … from the event source, with three processing stages: Online, in-session; Nearline, after session expiry; Offline, historical]
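One simple way to combine historic and session behavior is to interpolate between the two category distributions. The session weight starts at zero and grows as the session accumulates events, so history dominates early and short-term behavior dominates in an extended session. The weight formula n / (n + k) and the constant k are hypothetical choices for illustration.

```python
from typing import Dict


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    total = sum(counts.values())
    return {c: v / total for c, v in counts.items()} if total else {}


def blend(historic: Dict[str, float], session: Dict[str, float],
          k: float = 5.0) -> Dict[str, float]:
    """Interpolate long-term and in-session category distributions.

    The session weight n / (n + k) is 0 with no session events (history only)
    and approaches 1 as the session grows. k must be positive.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    n = sum(session.values())
    w = n / (n + k)
    hist, sess = normalize(historic), normalize(session)
    return {c: (1 - w) * hist.get(c, 0.0) + w * sess.get(c, 0.0)
            for c in set(hist) | set(sess)}


historic = {"puzzles": 4, "stamps": 1}
print(blend(historic, {}))             # start of session: history only
print(blend(historic, {"stamps": 5}))  # after 5 stamp events, stamps leads
```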

Page 19

Application (4): Deals


Personalize categories

Personalize modules

Personalize tabs

Personalize items

Page 20

fin