Retail Reference Architecture Part 3: Scalable Insight Component Providing User History,...

Retail Reference Architecture with MongoDB Antoine Girbal Principal Solutions Engineer, MongoDB Inc. @antoinegirbal

description

During this session we will cover the best practices for implementing the insight component with MongoDB. This includes efficiently ingesting and managing a large volume of user activity logs, such as clickstreams, views, likes and sales. We'll dive into how you can derive user statistics, product maps and trends using different analytics tools like the aggregation framework, map/reduce or the Hadoop connector. We will also cover operational considerations, including low-latency data ingestion and seamless aggregation queries.

Transcript of Retail Reference Architecture Part 3: Scalable Insight Component Providing User History,...

Page 1: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Retail Reference Architecture with MongoDB

Antoine Girbal, Principal Solutions Engineer, MongoDB Inc. @antoinegirbal

Page 2: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Introduction

Page 3: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Retail solution

• It is way too broad to tackle with one solution
• Data maps so well to the document model
• Needs for agility, performance and scaling
• Many (e)retailers are already using MongoDB
• Let's define the best ways and places for it!

Page 4: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

MongoDB is a great fit

• Holds complex JSON structures
• Dynamic schema for agility
• Complex querying and in-place updating
• Secondary, compound and geo indexing
• Full consistency, durability, atomic operations
• Near linear scaling via sharding
• Overall, MongoDB is a unique fit!

Page 5: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

MongoDB Strategic Advantages

• Horizontally Scalable - Sharding
• Agile, Flexible
• High Performance & Strong Consistency
• Highly Available - Replica Sets

Application

{ customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }

Page 6: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Build your data to fit your application: Relational vs. MongoDB

MongoDB:

{
  customer_id: 1,
  name: "Mark Smith",
  city: "San Francisco",
  orders: [
    {
      order_number: 13,
      store_id: 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234, Qty: 3, Unit_price: 350 },
        { SKU: 98762345, Qty: 1, Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}

Relational:

CustomerID  First Name  Last Name  City
0           John        Doe        New York
1           Mark        Smith      San Francisco
2           Jay         Black      Newark
3           Meagan      White      London
4           Edward      Danields   Boston

Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2
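The contrast above can be sketched in plain JavaScript: the rows of the two relational tables are folded into one customer document with embedded orders. This is a minimal illustration only; the camelCase row field names and the `toCustomerDocument` helper are hypothetical and not part of the original deck.

```javascript
// Relational-style rows, a subset of the two tables above.
// The camelCase field names are an assumption made for this sketch.
const customers = [
  { customerId: 0, firstName: "John", lastName: "Doe", city: "New York" },
  { customerId: 1, firstName: "Mark", lastName: "Smith", city: "San Francisco" },
];
const orders = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
  { orderNumber: 14, storeId: 200, product: "Coffee table", customerId: 1 },
];

// Fold a customer's orders into the customer document,
// which replaces the join that a relational read would need.
function toCustomerDocument(customer, allOrders) {
  return {
    customer_id: customer.customerId,
    name: `${customer.firstName} ${customer.lastName}`,
    city: customer.city,
    orders: allOrders
      .filter((o) => o.customerId === customer.customerId)
      .map((o) => ({
        order_number: o.orderNumber,
        store_id: o.storeId,
        product: o.product,
      })),
  };
}

const mark = toCustomerDocument(customers[1], orders);
console.log(JSON.stringify(mark, null, 2));
```

Reading the customer together with all orders then becomes a single document fetch instead of a join across two tables.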

Page 7: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Notions

RDBMS     MongoDB
Database  Database
Table     Collection
Row       Document
Column    Field

Page 8: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Retail Components Overview

Page 9: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Architecture Overview

Diagram (labels only):
• Customer: Channels (Amazon, Ebay…), Stores (POS, Kiosk), Mobile (Smartphone, Tablet), Website, Contact Center, API, Social (Facebook, Twitter…)
• Web Servers, Application Servers
• Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social
• Data and Service Integration: Data Warehouse, Analytics, Supply Chain Management System, Suppliers (3rd Party, In Network)

Page 10: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Commerce Functional Components

Diagram (labels only):
• Information Layer: Look & Feel, Navigation, Customization, Personalization, Branding, Promotions, Chat, Ads
• Customer's Perspective: Research (Browse, Search), Select (Shopping Cart), Purchase (Checkout), Receive (Track), Use (Feedback, Maintain), Dialog (Assist)
• Market / Offer: Guide, Offer, Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons
• Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules
• Insight: Session Capture, Activity Monitoring
• Customer, Enterprise
• Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social

Page 11: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising

Page 12: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising

Diagram (labels only): Merchandising on MongoDB covers Item, Variant, Hierarchy, Pricing, Promotions, Ratings & Reviews, Calendar, Semantic Search, Localization

Page 13: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - principles

• Single view of a product, one central catalog service
• Read volume high and sustained, 100k reads / s
• Write volume spikes up during catalog update
• Advanced indexing and querying
• Geographical distribution and low latency
• No need for a cache layer, CDN for assets

Page 14: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - requirements

Requirement: Single view of product
Example challenge: Blended description and hierarchy of product to ensure availability on all channels
MongoDB: Flexible document-oriented storage

Requirement: High sustained read volume with low latency
Example challenge: Constant querying from online users and sales associates, requiring immediate response
MongoDB: Fast indexed querying, replication allows local copy of catalog, sharding for scaling

Requirement: Spiky and real-time write volume
Example challenge: Bulk update of full catalog without impacting production, real-time touch update
MongoDB: Fast in-place updating, real-time indexing, sharding for scaling

Requirement: Advanced querying
Example challenge: Find product based on color, size, description
MongoDB: Ad-hoc querying on any field, advanced secondary and compound indexing

Page 15: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - Product Page

Diagram (labels only): Product images, General Information, List of Variants, External Information, Localized Description

Page 16: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - Item Model

> db.item.findOne()
{
  _id: "301671", // main item id
  department: "Shoes",
  category: "Shoes/Women/Pumps",
  brand: "Guess",
  thumbnail: "http://cdn…/pump.jpg",
  image: "http://cdn…/pump1.jpg", // larger version of thumbnail
  title: "Evening Platform Pumps",
  description: "Those evening platform pumps put the perfect finishing touches on your most glamourous night-on-the-town outfit",
  shortDescription: "Evening Platform Pumps",
  style: "Designer",
  type: "Platform",
  rating: 4.5, // user rating
  lastUpdated: Date("2014/04/01"), // last update time
  …
}

Page 17: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - Item Definition

• Get item by id
db.item.findOne( { _id: "301671" } )

• Get items by product ids
db.item.find( { _id: { $in: [ "301671", "301672" ] } } )

• Get items by department
db.item.find( { department: "Shoes" } )

• Get items by category prefix
db.item.find( { category: /^Shoes\/Women/ } )

• Indices
productId, department, category, lastUpdated
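The category prefix query above relies on a regular expression anchored with `^`, which lets MongoDB use the index on `category`. A minimal sketch of building such a filter from a user-supplied path follows; the helper names `escapeRegExp` and `categoryPrefixQuery` are hypothetical, and the sketch only builds the filter object without talking to a database.

```javascript
// Escape regex metacharacters so the category path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter that matches every category starting with the given path.
// The leading ^ keeps the expression a prefix match.
function categoryPrefixQuery(prefix) {
  return { category: new RegExp("^" + escapeRegExp(prefix)) };
}

const query = categoryPrefixQuery("Shoes/Women");
console.log(query.category.test("Shoes/Women/Pumps")); // matches the item shown earlier
```

The resulting object can be passed to `find()` in place of the literal `/^Shoes\/Women/` filter.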

Page 18: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Variant Model

> db.variant.findOne()
{
  _id: "730223104376", // the sku
  itemId: "301671", // references item id
  thumbnail: "http://cdn…/pump-red.jpg", // variant specific
  image: "http://cdn…/pump-red.jpg",
  size: 6.0,
  color: "Red",
  width: "B",
  heelHeight: 5.0,
  lastUpdated: Date("2014/04/01"), // last update time
  …
}

Page 19: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Variant Model

• Get variant from SKU
db.variant.find( { _id: "730223104376" } )

• Get all variants for a product, sorted by SKU
db.variant.find( { itemId: "301671" } ).sort( { _id: 1 } )

• Indices
itemId, lastUpdated

Page 20: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – per store Pricing

Per-store pricing could result in billions of documents, unless you build it in a modular way.

Price:
{
  _id: "sku730223104376_store123",
  currency: "USD",
  price: 89.95,
  lastUpdated: Date("2014/04/01"), // last update time
  …
}

_id: concatenation of item and store.
Item: can be an item id or sku.
Store: can be a store group or store id.

Indices: lastUpdated

Page 21: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – per store Pricing

• Get all prices for a given item
db.prices.find( { _id: /^p301671_/ } )

• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )

• Get minimum and maximum prices for a sku
db.prices.aggregate( [
  { $match: { _id: /^sku730223104376_/ } },
  { $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } }
] )

• Get price for a sku and store id (returns up to 4 prices)
db.prices.find(
  { _id: { $in: [ "sku730223104376_store1234",
                  "sku730223104376_sgroup0",
                  "p301671_store1234",
                  "p301671_sgroup0" ] } },
  { price: 1 } )
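Since the last query can return up to four prices, the application has to pick one. The sketch below builds the four candidate `_id` values and keeps the most specific match. The precedence order (sku before item, store before store group) is an assumption of this sketch and is not stated in the slides; `candidatePriceIds` and `resolvePrice` are hypothetical helper names.

```javascript
// Build the candidate _id values, most specific first.
// The precedence order is an assumption of this sketch.
function candidatePriceIds(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}_store${storeId}`,
    `sku${sku}_sgroup${storeGroup}`,
    `p${itemId}_store${storeId}`,
    `p${itemId}_sgroup${storeGroup}`,
  ];
}

// Given the documents returned by find({ _id: { $in: ids } }),
// return the one whose _id comes first in the candidate list, or null.
function resolvePrice(ids, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  for (const id of ids) {
    if (byId.has(id)) return byId.get(id);
  }
  return null;
}

const ids = candidatePriceIds("730223104376", "301671", "1234", "0");

// Stand-in for a query result; these prices are made up for illustration.
const found = [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
];
const best = resolvePrice(ids, found);
console.log(best);
```

Keeping the resolution in the application means a single round trip to the database, whatever level the price was defined at.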

Page 22: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Browse and Search products

Diagram (labels only): Browse by category, Special Lists, Filter by attributes, Lists hundreds of item summaries

Ideally a single query is issued to the database to obtain all items and metadata to display.

Page 23: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Browse and Search products

The previous page presents many challenges:

• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Attributes at the variant level: color, size, etc., and the variation's image should be shown
• Thousands of variants for an item, need to de-duplicate
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering

Page 24: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Browse and Search products

Diagram (labels only): One Item, Dozens of colors, Hundreds of sizes

A single item may have thousands of variants.

Page 25: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Browse and Search products

Diagram (labels only): Hierarchy, Sort parameter, Faceted Search

Images of the matching variants are displayed.

Page 26: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Traditional Architecture

Diagram (labels only):
• Relational DB: System of Records
• Full Text Search Engine (Indexing)
• Application Cache (pre-joined into objects)
• #1 obtain search results IDs
• #2 obtain objects by ID

Page 27: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Traditional Architecture

The traditional architecture issues:

• 3 different systems to maintain: RDBMS, search engine, caching layer
• Search returns a list of IDs to be looked up in the cache, which increases the latency of the response
• RDBMS schema is complex and static
• The search index is expensive to update
• Setup does not allow efficient pagination

Page 28: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - Architecture

Diagram (labels only): MongoDB Data Store holding Summaries, Items, Pricing, Promotions, Variants, Ratings & Reviews

#1 Obtain results

Page 29: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Summary Model

The summary relies on the following parameters:

• Department, e.g. "Shoes"
• An indexed attribute
  – Category path, e.g. "Shoes/Women/Pumps"
  – Price range
  – List of Item Attributes, e.g. Brand = Guess
  – List of Variant Attributes, e.g. Color = red
• A non-indexed attribute
  – List of Item Secondary Attributes, e.g. Style = Designer
  – List of Variant Secondary Attributes, e.g. heel height = 4.0
• Sorting, e.g. Price Low to High

Page 30: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Summary Model

> db.summaries.findOne()
{
  "_id": "p39",
  "title": "Evening Platform Pumps 39",
  "department": "Shoes",
  "category": "Shoes/Women/Pumps",
  "thumbnail": "http://cdn…/pump-small-39.jpg",
  "image": "http://cdn…/pump-39.jpg",
  "price": 145.99,
  "rating": 0.95,
  "attrs": [ { "brand": "Guess" }, … ],
  "sattrs": [ { "style": "Designer" }, { "type": "Platform" }, … ],
  "vars": [
    {
      "sku": "sku2441",
      "thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
      "image": "http://cdn…/pump-39.jpg.Blue",
      "attrs": [ { "size": 6.0 }, { "color": "Blue" }, … ],
      "sattrs": [ { "width": "B" }, { "heelHeight": 5.0 }, … ]
    },
    … Many more skus …
  ]
}

Page 31: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising - Summary Model

• Get summary from item id
db.summaries.find( { _id: "p301671" } )

• Get summary's specific variation from SKU
db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )

• Get summaries by department, sorted by rating
db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )

• Get summaries with a mix of parameters
db.summaries.find( { department: "Shoes",
                     "vars.attrs": { color: "Gray" },
                     category: /^Shoes\/Women/,
                     price: { $gte: 65.99, $lte: 180.99 } } )
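In an application, the "mix of parameters" filter is usually assembled from whatever the shopper selected. The following is a minimal sketch of such a builder; the function name `buildSummaryFilter` and its option names are hypothetical, and only a single variant attribute is handled to keep the example short.

```javascript
// Assemble a filter for the summary collection from optional search parameters.
// Only the department is mandatory, mirroring the index layout that starts with it.
function buildSummaryFilter({ department, categoryPrefix, minPrice, maxPrice, variantAttr }) {
  const filter = { department };
  if (categoryPrefix) {
    // Anchored, escaped prefix match on the category path.
    const escaped = categoryPrefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    filter.category = new RegExp("^" + escaped);
  }
  if (minPrice !== undefined || maxPrice !== undefined) {
    filter.price = {};
    if (minPrice !== undefined) filter.price.$gte = minPrice;
    if (maxPrice !== undefined) filter.price.$lte = maxPrice;
  }
  if (variantAttr) {
    // Matches summaries having a variant whose attrs array contains this exact pair.
    filter["vars.attrs"] = variantAttr;
  }
  return filter;
}

const filter = buildSummaryFilter({
  department: "Shoes",
  categoryPrefix: "Shoes/Women",
  minPrice: 65.99,
  maxPrice: 180.99,
  variantAttr: { color: "Gray" },
});
console.log(filter);
```

The returned object has the same shape as the literal filter in the last query above and could be passed to `find()` on the summary collection.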

Page 32: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Summary Model

• The following indices are used:
  – department + attrs + category + _id
  – department + vars.attrs + category + _id
  – department + category + _id
  – department + price + _id
  – department + rating + _id
• _id used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red and size=6), which one is looked up?
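One way to use `_id` for pagination is to remember the last document of the current page and ask for everything that sorts after it. The sketch below assumes results are sorted by `{ rating: 1, _id: 1 }`, matching the department + rating + _id index; the helper name `nextPageFilter` is hypothetical and this is only one possible approach.

```javascript
// Build the filter for the page that follows `last`,
// assuming the sort order { rating: 1, _id: 1 }.
// Note: baseFilter must not already contain its own $or clause.
function nextPageFilter(baseFilter, last) {
  return {
    ...baseFilter,
    $or: [
      // Strictly higher rating comes later in the ordering...
      { rating: { $gt: last.rating } },
      // ...and ties on rating are broken deterministically by _id.
      { rating: last.rating, _id: { $gt: last._id } },
    ],
  };
}

const lastSeen = { _id: "p39", rating: 0.95 };
const pageFilter = nextPageFilter({ department: "Shoes" }, lastSeen);
console.log(JSON.stringify(pageFilter));
```

The filter would be combined with `.sort( { rating: 1, _id: 1 } )` and `.limit(pageSize)`. Because `_id` is unique, two documents never compare as equal, so every page boundary is well defined.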

Page 33: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Facet

Facet samples:
{ "_id" : "Accessory Type=Hosiery", "count" : 14 }
{ "_id" : "Ladder Material=Steel", "count" : 2 }
{ "_id" : "Gold Karat=14k", "count" : 10138 }
{ "_id" : "Stone Color=Clear", "count" : 1648 }
{ "_id" : "Metal=White gold", "count" : 10852 }

Single operation to insert / update (upsert):
db.facet.update( { _id: "Accessory Type=Hosiery" }, { $inc: { count: 1 } }, true, false )

The facet with the lowest count is the most restrictive… It should come first in the query!
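The rule "lowest count first" can be sketched as a small sort over the requested facets. The counts below are taken from the facet samples above; the helper name `orderByRestrictiveness` and the handling of unknown facets are assumptions of this sketch.

```javascript
// Facet counts as they would be read from the facet collection
// (values copied from the samples above).
const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);

// Sort "name=value" filters so the rarest, most restrictive one comes first.
// Facets with no recorded count are treated as least restrictive (an assumption).
function orderByRestrictiveness(filters, counts) {
  const countOf = (f) => (counts.has(f) ? counts.get(f) : Number.MAX_SAFE_INTEGER);
  return [...filters].sort((a, b) => countOf(a) - countOf(b));
}

const ordered = orderByRestrictiveness(
  ["Metal=White gold", "Stone Color=Clear", "Gold Karat=14k"],
  facetCounts
);
console.log(ordered);
```

With these sample counts, "Stone Color=Clear" (1648) would be the attribute to look up in the index, and the remaining attributes would be filtered afterwards.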

Page 34: Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Merchandising – Query stats

Department  Category  Price  Primary attribute  Average time (ms)  90th (ms)  95th (ms)
1           0         0      0                  2                  3          3
1           1         0      0                  1                  2          2
1           0         1      0                  1                  2          3
1           1         1      0                  1                  2          2
1           0         0      1                  0                  1          2
1           1         0      1                  0                  1          1
1           0         1      1                  1                  2          2
1           1         1      1                  0                  1          1
1           0         0      2                  1                  3          3
1           1         0      2                  0                  2          2
1           0         1      2                  10                 20         35
1           1         1      2                  0                  1          1
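For readers who want to produce a similar table from their own measurements, the average and percentile columns can be computed as sketched below. The slides do not say which percentile method was used; this sketch assumes the nearest-rank method, and the sample timings are made up.

```javascript
// Arithmetic mean of the measured timings.
function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile: the smallest value such that at least
// p percent of the samples are less than or equal to it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up query timings in milliseconds, for illustration only.
const timingsMs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(average(timingsMs), percentile(timingsMs, 90), percentile(timingsMs, 95));
```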