Large scale recommendation in e-commerce -- qiang yan
Large Scale Recommendation in E-Commerce
Qiang Yan, Quan Yuan
Taobao Search&P13N Team
Alibaba Group
Outline
• Introduction
  – Data in Taobao
  – Recommendation in Taobao
• Approaches to recommendation
  – eTREC
  – Rank
• Lessons we learned
• Conclusion & Challenges
Largest online and mobile commerce company in the world
Data in Taobao
Item Discovery
P13n in Vertical Industry
My Taobao – Guess You Like
Recommendations in taobao.com
Shop Discovery
Recommendations in Taobao Mobile Flash Sale
Dress rec for female
P13n HQ REC
New Items REC
Powered by Recommendation
Recommendations in Taobao Mobile
Powered by Recommendation
P13n in Vertical Industry
Shop Discovery
Item Rec
Overview
• Platform: TPP, Tair, HBase, UPS, JStorm, ODPS
• Match/Retrieval: RT, CF, Content, User-Based, DT, LC
• Rank: RTP, Xlib, Olive
• Re-Rank: Diversity, Freshness, Business Goal
• Application: HomePage (PC), Vertical (PC), HomePage (mobile), Shopping Path, Vertical (mobile)
eTREC
A highly efficient distributed feature-based collaborative filtering tool
• Entities: Items, Users, Content (word, tag)
• User-Item data: ItemCF, UserCF, ContentCF
• Feature-Based CF over User-Item-Features data
  – Features: Tags, Style, Latent Class, ...
eTREC
Implementation trick 1 – Operators
• Jaccard
• Cosine
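As a rough illustration of what such similarity operators compute, here is a minimal Python sketch of Jaccard and cosine similarity over sparse vectors. The function names and the dict-based representation are assumptions made for this example; they are not eTREC's actual interface.

```python
import math

def jaccard(a: dict, b: dict) -> float:
    """Jaccard similarity over the sets of feature keys present in each vector."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse preference vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical items, each described by the users who interacted with it.
item_i = {"u1": 1.0, "u2": 1.0, "u3": 1.0}
item_j = {"u2": 1.0, "u3": 1.0, "u4": 1.0}
print(jaccard(item_i, item_j))  # 2 shared users out of 4 distinct users
print(cosine(item_i, item_j))
```

Keeping each operator a plain function of two vectors is one way a tool could let users plug in their own similarity.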
eTREC
Implementation trick 2
– Fewer Map/Reduce jobs
  • NormAndDot
  • CalSim
– Fewer emitted item-item pairs
NormAndDot Job
  input: feature_id, entity_id, preference, payload
  output: entity_id, norm(i), <j, dot(i, j)>, ...
CalSim Job
  output: entity_id, <j, sim(i, j)>, ...
[Diagram label: ItemCF in Mahout]
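The two-job idea (NormAndDot, then CalSim) can be sketched in plain Python as below. This is one plausible single-process reading of the slide, not the eTREC implementation: the payload column is dropped, cosine is used as the similarity, each (feature, entity) pair is assumed to appear once with a non-zero preference, and both function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def norm_and_dot(records):
    """Job 1: group by feature_id, then accumulate entity norms and pairwise dot products."""
    by_feature = defaultdict(list)
    for feature_id, entity_id, preference in records:  # "map": key by feature
        by_feature[feature_id].append((entity_id, preference))

    sq_norm = defaultdict(float)
    dot = defaultdict(float)
    for entries in by_feature.values():                # "reduce": one feature at a time
        for entity_id, pref in entries:
            sq_norm[entity_id] += pref * pref
        # Emit each unordered pair once (i < j) to halve the emitted pairs.
        for (i, pi), (j, pj) in combinations(sorted(entries), 2):
            dot[(i, j)] += pi * pj
    norms = {e: math.sqrt(s) for e, s in sq_norm.items()}
    return norms, dict(dot)

def cal_sim(norms, dots):
    """Job 2: turn norms and dot products into symmetric cosine similarities."""
    sims = defaultdict(dict)
    for (i, j), d in dots.items():
        s = d / (norms[i] * norms[j])
        sims[i][j] = s
        sims[j][i] = s
    return dict(sims)

# For ItemCF, the feature is a user and the entity is an item (toy data).
records = [
    ("u1", "A", 1.0), ("u1", "B", 1.0),
    ("u2", "A", 1.0), ("u2", "B", 1.0), ("u2", "C", 1.0),
    ("u3", "C", 1.0),
]
norms, dots = norm_and_dot(records)
sims = cal_sim(norms, dots)
print(sims["A"])
```

Swapping what plays the role of feature and entity gives UserCF or ContentCF from the same two steps.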
eTREC
• Features
  – Fast
    • 400M users × 200M items in less than 20 minutes
  – Easy to use
  – Scalable
    • User-defined similarity (default: cosine, Jaccard, asymmetric cosine)
    • User-defined item-item pairs
Rank: Olive
• Olive = Real-time Streaming System + Online Learning
• Why do we need online learning?
  – User
    • User interests shift
    • Mixed accounts, family accounts
  – Item
    • Millions of new items per day
    • 10M updated items (title, price, etc.)
  – Context
    • Promotions, discounts, etc.
    • Festivals: National Day, 11.11
Olive
Goal
• Respond in real time to positive/negative (P/N) feedback and improve the user experience
• More accurate recommendations
• Stable model
Model
• FTRL
• AdPredictor
Framework: Asynchronous Distributed OGD
[Diagram components: Parameter Server (Tair/HBase); Reducers, with w and Δw flows to and from the Parameter Server; Updaters; FG; IG; Data Shards; Storm/JStorm; TT/MetaQ]
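To make the w / Δw exchange concrete, here is a toy, single-process Python sketch of asynchronous online gradient descent against a parameter server. Everything in it is an assumption for illustration: the `ParameterServer` class, the `pull`/`push` names, the logistic-regression loss, and the sequential simulation of workers. It does not model the Reducer, Updater, FG, or IG stages from the diagram.

```python
import math

class ParameterServer:
    """Toy in-memory stand-in for the parameter store (hypothetical API)."""
    def __init__(self, dim):
        self.w = [0.0] * dim

    def pull(self):
        return list(self.w)      # a worker gets a snapshot, which may go stale

    def push(self, delta):
        for k, d in enumerate(delta):
            self.w[k] += d       # apply the delta without waiting for other workers

def worker_delta(w, shard, lr):
    """Logistic-regression gradient step for one mini-batch of (x, y), y in {0, 1}."""
    delta = [0.0] * len(w)
    for x, y in shard:
        z = sum(wk * xk for wk, xk in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for k, xk in enumerate(x):
            delta[k] -= lr * (p - y) * xk
    return delta

def run(shards, dim, rounds=200, lr=0.1):
    ps = ParameterServer(dim)
    for _ in range(rounds):
        # Every worker pulls first, then pushes one after another, so later
        # pushes were computed from weights that are already out of date.
        snapshots = [ps.pull() for _ in shards]
        for w, shard in zip(snapshots, shards):
            ps.push(worker_delta(w, shard, lr))
    return ps.w

# Two toy data shards; x = [bias, feature], label is 1 when the feature is positive.
shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, -2.0], 0)],
]
weights = run(shards, dim=2)
print(weights)
```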
Olive -- FTRL
• FTRL-Proximal
[Equation annotations: update with (sub-)gradient; updated model not far from previous models; L1-norm; L2-norm]
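The FTRL-Proximal update named above can be sketched per coordinate as follows. This follows the commonly published per-coordinate form for logistic regression rather than anything confirmed about Olive; the class name, the hyperparameter values, and the toy features are assumptions.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (illustrative sketch)."""
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}   # accumulated adjusted gradients per feature
        self.n = {}   # accumulated squared gradients per feature

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0                         # L1 threshold keeps the model sparse
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(i, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self.weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)           # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """x maps feature -> value, y is 1 (click) or 0 (no click)."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v                    # (sub-)gradient of the log loss
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            # The sigma * w term keeps the new weight close to the previous one.
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p

model = FTRLProximal()
for _ in range(50):
    model.update({"a": 1.0}, 1)   # feature "a" always clicked
    model.update({"b": 1.0}, 0)   # feature "b" never clicked
model.update({"a": 1.0, "c": 1.0}, 1)  # feature "c" seen only once
print(model.weight("a"), model.weight("b"), model.weight("c"))
```

In this toy run the rarely seen feature "c" keeps a weight of exactly zero, which is the sparsity effect of the L1 term.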
FTRL in Action
• Cold start
  – OWLQN-LR based FTRL pre-trained model
  – 3.9B samples, 1.7B features, pre-training in 21 minutes
  [Plot: AUC (0.5 to 0.75) by hour (0 to 12)]
• Stability
  – Residual-based cascading online training
  – |w - w0| constraint
  – Mini-batch update
  [Plot: beta (-3 to 0)]
Experiments
Samples (3.9B):
• Offline: pre-train an offline FTRL model on 14 days of pv and click data
• Online: the following 4 days
Model: FTRL
• Accuracy
  [Plot: GAUC (0.65 to 0.77) by day, 20140604 to 20140607, LR vs. FTRL]
• Stability
  [Plot: beta (-2.196 to -2.19); gender_comb_1_1 (0.15 to 0.35)]
10%+
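The slides do not define GAUC. A minimal Python sketch follows, assuming it means AUC computed per user and averaged with impression-count weights (a common reading of "group AUC"); the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None                      # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gauc(user_ids, labels, scores):
    """Impression-weighted average of per-user AUC (assumed definition)."""
    groups = defaultdict(lambda: ([], []))
    for u, l, s in zip(user_ids, labels, scores):
        groups[u][0].append(l)
        groups[u][1].append(s)
    num = den = 0.0
    for user_labels, user_scores in groups.values():
        a = auc(user_labels, user_scores)
        if a is None:
            continue                     # skip users with only clicks or only non-clicks
        num += len(user_labels) * a
        den += len(user_labels)
    return num / den if den else None

users = ["u1", "u1", "u2", "u2", "u2", "u3", "u3"]
labels = [1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.1, 0.3, 0.4, 0.2, 0.5, 0.6]
print(gauc(users, labels, scores))
```

Here u1 has AUC 1.0 over 2 impressions, u2 has AUC 0.5 over 3 impressions, and u3 is skipped because it has no clicks.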
Olive -- AdPredictor
• AdPredictor: not sparse
  – Pruning parameters
• Advantages:
  – Bayesian model (easy to add domain knowledge)
  – Models uncertainty explicitly
  – Natural exploration
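The advantages listed above can be illustrated with a small Python sketch of an AdPredictor-style Bayesian probit update over binary features, following the update equations in Graepel et al. (2010). The class and method names, the prior values, and the clipping of t are assumptions for this example, and parameter pruning is not shown.

```python
import math
import random

def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

class AdPredictorSketch:
    def __init__(self, beta=1.0, prior_mean=0.0, prior_var=1.0):
        self.beta = beta
        self.prior = (prior_mean, prior_var)   # the prior is where domain knowledge can enter
        self.weights = {}                      # feature -> (mean, variance)

    def _stats(self, features):
        params = [self.weights.get(f, self.prior) for f in features]
        total_mean = sum(m for m, _ in params)
        total_var = self.beta ** 2 + sum(v for _, v in params)
        return params, total_mean, total_var

    def predict(self, features):
        _, mean, var = self._stats(features)
        return _cdf(mean / math.sqrt(var))

    def update(self, features, y):
        """features: list of distinct active features; y: +1 for a click, -1 otherwise."""
        params, mean, var = self._stats(features)
        sd = math.sqrt(var)
        t = max(y * mean / sd, -5.0)           # clip to avoid dividing by a vanishing cdf
        v = _pdf(t) / _cdf(t)
        w = v * (v + t)
        for f, (m, s2) in zip(features, params):
            self.weights[f] = (m + y * (s2 / sd) * v,          # shift the mean toward the label
                               s2 * (1.0 - (s2 / var) * w))    # shrink the uncertainty

    def sample_score(self, features, rng):
        """Exploration by scoring with weights drawn from the posterior."""
        return sum(rng.gauss(m, math.sqrt(v))
                   for m, v in (self.weights.get(f, self.prior) for f in features))

model = AdPredictorSketch()
feats = ["gender=f", "cat=dress"]          # hypothetical binary features
before = model.predict(feats)
model.update(feats, +1)
after = model.predict(feats)
sampled = model.sample_score(feats, random.Random(0))
print(before, after)
```

Because each weight carries a variance, rarely seen features keep wide posteriors, and sampling from them is one way the "natural exploration" point could be realized.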
Lessons we learned
• Ways to improve recommender systems
  [Chart: Data 40%, Match 30%, Rank 20%, Re-rank 10%]
• Relevance vs. user experience
  [Diagram: RT, Item-CF, Content, User-CF positioned between Relevance and User Experience]
• Mobile (contextual) features are very important for ranking recommendations on mobile
Challenges
• Heterogeneous data (search, social, POI, image, etc.) for recommendation
• Multimodal inputs: images, speech, QR codes
• Context-aware and interactive recommendation
• Recommendation traffic allocation toward a better e-commerce ecosystem
Reference
• T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proc. 27th Internat. Conf. on Machine Learning, 2010.
• H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In AISTATS, 2011.
• H. B. McMahan and O. Muralidharan. On calibrated predictions for auction selection mechanisms. CoRR, abs/1211.3955, 2012.
• Jing Jiang, Jie Lu, Guangquan Zhang, Guodong Long. Scaling-Up Item-Based Collaborative Filtering Recommendation Algorithm Based on Hadoop. Proceedings of the 2011 IEEE World Congress on Services, pp. 490-497, July 4-9, 2011.
• Chu, W., L. Li, et al. (2011). "Contextual Bandits with Linear Payoff Functions." JMLR.
• Peter Auer. (2002). "Using confidence bounds for exploitation-exploration trade-offs." JMLR.
WE’RE HIRING Qiang Yan [email protected] Chang Liu [email protected]