Real-time recommendations for retail: Architecture, algorithms, and design

REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN Juliet Hougland and Jonathan Natkins


Transcript of Real-time recommendations for retail: Architecture, algorithms, and design

Page 1: Real-time recommendations for retail:  Architecture, algorithms, and design

REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN

Juliet Hougland and Jonathan Natkins

Page 2: Real-time recommendations for retail:  Architecture, algorithms, and design

Who Are We?

Jonathan Natkins
Field Engineer at WibiData
Before that, Cloudera Software Engineer
Before that, Vertica Software/Field Engineer

Juliet Hougland
Data Scientist, previously at WibiData
MS in Applied Math
BA in Math-Physics

Page 3: Real-time recommendations for retail:  Architecture, algorithms, and design

Recommendations in Retail

Personalized versus Non-Personalized

Page 6: Real-time recommendations for retail:  Architecture, algorithms, and design

Recommender Contexts

Taste History
Based on everything you know about a user
Interests over months/years

Current Taste
Based on a user’s immediate history
Interests over minutes/hours

Ephemeral
Extreme version of current taste
For example, location

Demographic*
Similar to taste history, but less subjective
Geographic region, age bracket, etc.

Page 7: Real-time recommendations for retail:  Architecture, algorithms, and design

Why Does Real-Time Matter?

Relevancy

Page 8: Real-time recommendations for retail:  Architecture, algorithms, and design

I am a Special Snowflake

Natty

Page 9: Real-time recommendations for retail:  Architecture, algorithms, and design

Requirements for a Real-Time System

General System Requirements
Handle millions of customers/users
Support collection and storage of complex data: static and event-series

Real-Time System Requirements
Quickly retrieve subsets of data for a single user
Aggregate/derive new, first-class data per user

Page 10: Real-time recommendations for retail:  Architecture, algorithms, and design

What is Kiji?

The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data

kiji.org
github.com/kijiproject

Page 12: Real-time recommendations for retail:  Architecture, algorithms, and design

Three Challenges

Developing models for use in real-time
Scoring models in real-time
Deploying models into a production environment

Page 15: Real-time recommendations for retail:  Architecture, algorithms, and design

How Can We Make Real-Time Models?

Population interests change slowly

Individual interests change quickly

Models don’t need to be retrained frequently

Application of a model should be fast

Page 16: Real-time recommendations for retail:  Architecture, algorithms, and design

A Common Workflow

Train a model over the entire dataset
Save fitted model parameters to a file or another table
Access the model parameters when generating new recommendations based on new data

This is EXPENSIVE

Page 17: Real-time recommendations for retail:  Architecture, algorithms, and design

Developing Models

KijiExpress
Scala interface for interacting with Kiji data
Uses Scalding for designing complex dataflows

Model Lifecycle
Allows analysts and data scientists to break apart a model into phases

Page 21: Real-time recommendations for retail:  Architecture, algorithms, and design

Scoring Models in Real-Time

Batch isn’t real-time

[Figure: Number of Users plotted against Number of Interactions. Annotations: a few users with many interactions; a lot of users with few interactions.]

Page 22: Real-time recommendations for retail:  Architecture, algorithms, and design

Fresheners Compute Lazily

[Diagram, built up across slides 22 to 26. Components: Client, KijiScoring Server, HBase.]

Read a column: the client requests a column
Get from HBase: the stored value is fetched
Freshness Policy: is the stored value fresh?
Yes: return to client
No: run the Scorer, return the result to the client, and write back for next time
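
The read path above can be sketched in a few lines of Python. This is a minimal illustration only, not the KijiScoring API: a plain dict stands in for HBase, and `is_fresh`, `read_column`, `max_age_s`, and the age-based policy are all hypothetical names and choices made for the example.

```python
import time

# Hypothetical stand-in for HBase: (row, column) -> (written_at, value).
# None of these names come from Kiji; they only mirror the slide's flow.


def is_fresh(written_at, now, max_age_s):
    """Assumed freshness policy: a value is fresh if it is young enough."""
    return (now - written_at) <= max_age_s


def read_column(store, row, column, scorer, max_age_s=60.0, now=None):
    """Return a fresh value for (row, column), scoring lazily if needed."""
    now = time.time() if now is None else now
    cached = store.get((row, column))            # "Get from HBase"
    if cached is not None and is_fresh(cached[0], now, max_age_s):
        return cached[1]                         # fresh: return to client
    value = scorer(row)                          # stale or missing: run the scorer
    store[(row, column)] = (now, value)          # write back for next time
    return value
```

The scorer only runs for users who actually show up, so the many users with few interactions cost nothing until they make a request.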

Page 27: Real-time recommendations for retail:  Architecture, algorithms, and design

Kiji Application Stack

Page 28: Real-time recommendations for retail:  Architecture, algorithms, and design

Deployment Challenges

Page 29: Real-time recommendations for retail:  Architecture, algorithms, and design

Kiji Model Repository

Link between application and models

Stores Freshener metadata
FreshnessPolicy, Scorer, attached column
Location of trained model

Stores Scorer code
Code repository makes model scoring code available to the application from a central location

New models can be deployed to the Model Repository and made immediately available to the application

Page 30: Real-time recommendations for retail:  Architecture, algorithms, and design

Kiji Model Repository

Page 31: Real-time recommendations for retail:  Architecture, algorithms, and design

Retail Recommendation

Page 32: Real-time recommendations for retail:  Architecture, algorithms, and design

Types of Recommenders

Recommendation Algorithms
  Collaborative Filtering Methods
    Memory-Based
    Model-Based
  Content-Based Methods

Page 33: Real-time recommendations for retail:  Architecture, algorithms, and design

Content-Based Recommenders

Example features (image labels): Orange-Nosed; Lab Assistant; Meeps a lot

Build models around entities using features that we think reflect inherent characteristics

Page 34: Real-time recommendations for retail:  Architecture, algorithms, and design

Content-Based Recommenders

safer

faster knife

Page 35: Real-time recommendations for retail:  Architecture, algorithms, and design

Pandora: Content-Based

Expertly-Characterized Music

Page 36: Real-time recommendations for retail:  Architecture, algorithms, and design

Collaborative Filtering

Represent user-item affinities as a sparse matrix
Users ≈ Rows
Items ≈ Columns

[Matrix illustration labels: Beaker (user); Banana Slicer, Pineapple Slicer (items)]
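
As a rough sketch of what "sparse" means here, the matrix can be held as a dict of rows that stores only observed affinities. The second user ("Bunsen") and all rating values below are made up for illustration; they are not from the talk.

```python
from collections import defaultdict

# Only observed (user, item) affinities are stored; everything else is
# simply absent rather than zero-filled. Values are illustrative.
ratings = defaultdict(dict)          # user (row) -> {item (column): affinity}
ratings["Beaker"]["Banana Slicer"] = 5.0
ratings["Beaker"]["Pineapple Slicer"] = 3.0
ratings["Bunsen"]["Banana Slicer"] = 4.0


def column(ratings, item):
    """Return one column of the matrix: every user's affinity for `item`."""
    return {user: row[item] for user, row in ratings.items() if item in row}


def density(ratings, n_items):
    """Fraction of the full users x items grid that is actually populated."""
    stored = sum(len(row) for row in ratings.values())
    return stored / (len(ratings) * n_items)
```

With millions of users and items, the density of a real retail matrix is typically a tiny fraction, which is why the row-oriented layout matters.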

Page 37: Real-time recommendations for retail:  Architecture, algorithms, and design

Aspirational Ratings

I put in my queue… I actually watch

Page 39: Real-time recommendations for retail:  Architecture, algorithms, and design

Collaborative Filtering: How It Works

Simple aggregate predictors
Similar Users
Similar Products

Page 40: Real-time recommendations for retail:  Architecture, algorithms, and design

Similar Entities

What do we mean by similar?
Jaccard Index: a measure of set similarity
Cosine Similarity: the cosine of the angle between two vectors
Pearson Correlation: statistical measure, similar to cosine

Naively, we could compare every entity to every other entity

…But that would not scale well with increasing numbers of entities
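
For reference, the three measures can be written in plain Python as below. This is a small sketch using the textbook definitions, with dense lists for the vectors; a production system would work on sparse rows instead.

```python
import math


def jaccard(a, b):
    """Set similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])
```

Comparing every pair this way takes on the order of n² similarity computations for n entities, which is the scaling problem the slide points at.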

Page 41: Real-time recommendations for retail:  Architecture, algorithms, and design

Building the Similarity Matrix

Page 42: Real-time recommendations for retail:  Architecture, algorithms, and design

Collaborative Filtering: Is This Useful?

Problem: Too much data!
Tracking user preferences and all their events generates huge amounts of data

Problem: Too little data!
Dimensions of user-space and item-space are usually very large
More variables make it more difficult to generate user preferences

Problem: Cold start
If you don’t know anything about a user, what should you recommend?

Problem: More ratings mean slower computations
Identifying neighborhoods of entities is expensive

Page 43: Real-time recommendations for retail:  Architecture, algorithms, and design

Collaborative Filtering: Why Is It Useful?

Because it works
Content-agnostic
All that matters is co-occurrence of events

Page 44: Real-time recommendations for retail:  Architecture, algorithms, and design

Amazon: Item-Item Collaborative Filtering

Used for personalized recommendations
Fill screen real estate with related items
Produces specific, but non-creepy recommendations

Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003

Page 45: Real-time recommendations for retail:  Architecture, algorithms, and design

Item-Item Collaborative Filtering

Beaker buys a banana slicer

Then:
Generate list of candidate items to predict ratings for
Predict ratings for candidate items
Select Top-N items
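
The three steps can be sketched as follows. This is an illustration under stated assumptions, not the speakers' implementation: the `SIMILAR` table is a hypothetical precomputed item-item similarity lookup with made-up numbers, and a candidate is scored by summing its similarity to the user's items rather than by a true rating prediction.

```python
import heapq

# Hypothetical precomputed item-item similarities (illustrative numbers only).
SIMILAR = {
    "banana slicer": {"pineapple slicer": 0.8, "apple corer": 0.6, "lab goggles": 0.1},
    "lab goggles": {"lab coat": 0.7, "banana slicer": 0.1},
}


def candidates(history, similar):
    """Step 1: items similar to something the user has, minus owned items."""
    owned = set(history)
    return {j for i in history for j in similar.get(i, {}) if j not in owned}


def predict(item, history, similar):
    """Step 2: score a candidate by its summed similarity to the user's items."""
    return sum(similar.get(i, {}).get(item, 0.0) for i in history)


def recommend(history, similar, n=2):
    """Step 3: keep the N highest-scoring candidates."""
    scored = ((predict(c, history, similar), c) for c in candidates(history, similar))
    return [item for _, item in heapq.nlargest(n, scored)]
```

Because candidates come only from the neighbors of items the user already touched, the work per request depends on the size of the user's history, not on the size of the catalog.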

Page 46: Real-time recommendations for retail:  Architecture, algorithms, and design

Accessing External Data

KeyValueStore API enables external data access when applying a model

External data might be…
Trained model parameters
Hierarchical/Taxonomic data
Geo-lookup

Store external data flexibly
Text files, sequence files, Kiji tables, etc.
Data access is decoupled from use during execution

If the data doesn’t fit in memory, put it in a table

Page 47: Real-time recommendations for retail:  Architecture, algorithms, and design

How Much Less Work Can We Do?

[Slides 47 to 50 build up the same content step by step.]

We can choose a predictor that allows us to truncate a sum

There are two ways terms in the sum of our predictor can be small
No rating: ignore unrated items
Small similarity: ignore dissimilar items
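
One predictor with this property is a similarity-weighted average of the user's own ratings. The slides do not name the predictor used, so treat the sketch below as an assumption; `predict_rating` and `min_sim` are hypothetical names chosen for the example.

```python
def predict_rating(user_ratings, sims_to_target, min_sim=0.0):
    """Similarity-weighted average of the user's ratings of neighboring items.

    user_ratings:   {item: rating} for items this user has rated
    sims_to_target: {item: similarity between that item and the target item}
    min_sim:        terms with similarity at or below this are dropped
    """
    num = den = 0.0
    for item, sim in sims_to_target.items():
        if item not in user_ratings:   # no rating: the term contributes nothing
            continue
        if sim <= min_sim:             # small similarity: the term is negligible
            continue
        num += sim * user_ratings[item]
        den += sim
    return num / den if den else None  # None when no usable term remains
```

Raising `min_sim` trades a little accuracy for fewer lookups, since only the strongest neighbors of the target item need to be stored and read.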

Page 51: Real-time recommendations for retail:  Architecture, algorithms, and design

How Much Less Work Can We Do?

If we only present a few recommendations, we don’t need to predict ratings for all items
Choose the candidate set for rating estimation wisely, or infer it from nearest neighbors

Page 52: Real-time recommendations for retail:  Architecture, algorithms, and design

Organizing Data in Item-Item CF

Page 53: Real-time recommendations for retail:  Architecture, algorithms, and design

Accessing Data During Freshening

Page 54: Real-time recommendations for retail:  Architecture, algorithms, and design

Want to Know More?

The Kiji Project
kiji.org
github.com/kijiproject

Questions about this presentation?
Twitter: @JulietHougland or @nattyice
Email: [email protected]