When Machine Learning Meets the Web
![Page 1: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/1.jpg)
When Machine Learning Meets the Web
Chao Liu, Internet Services Research Center, Microsoft Research-Redmond
![Page 2: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/2.jpg)
Outline
• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions
04/19/2023
![Page 3: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/3.jpg)
Motivation & Challenges
Data on the Web
• Scale: terabyte-to-petabyte data
  ▪ Around 20TB log data per day from Bing
• Dynamics: evolving data streams
  ▪ Click data streams with evolving/emerging topics
• Applications: Non-traditional ML tasks
  ▪ Predicting clicks & ads
![Page 5: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/5.jpg)
Parallel vs. Distributed Computing
• Parallel computing: all processors have access to a shared memory, which can be used to exchange information between processors
• Distributed computing: each processor has its own private memory (distributed memory), and processors communicate over the network
  ▪ Message passing
  ▪ MapReduce
![Page 6: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/6.jpg)
MPI vs. MapReduce
• MPI is for task parallelism
  ▪ Suitable for CPU-intensive jobs
  ▪ Fine-grained communication control, powerful computation model
• MapReduce is for data parallelism
  ▪ Suitable for data-intensive jobs
  ▪ A restricted computation model
![Page 7: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/7.jpg)
Word Counting on MapReduce
[Figure: word counting on MapReduce. A Web corpus is spread over multiple machines. Each Mapper reads (docId, doc) pairs from its local docs and emits pairs such as (w1,1), (w2,1), (w3,1). Values are aggregated by key, e.g. (w1,<1,1,1>), (w2,<1,1>), (w3,<1,1,1>), and the Reducers output (w1, 3), (w2, 2), (w3, 3).]
Mapper: for each word w in a doc, emit (w, 1)
Intermediate (key,value) pairs are aggregated by word
Reducer is copied to each machine to run over the intermediate data locally to produce the result
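A minimal single-process sketch of the word-count flow above, assuming the framework's shuffle can be imitated with an in-memory dictionary. The names `mapper`, `shuffle`, `reducer` and `word_count` are illustrative only and do not belong to any MapReduce API; the toy corpus is chosen to reproduce the totals in the figure.

```python
from collections import defaultdict

def mapper(doc_id, doc):
    """Emit (word, 1) for every word occurrence in one document."""
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Sum the partial counts collected for one word."""
    return word, sum(counts)

def word_count(corpus):
    """Run map, shuffle and reduce over a {doc_id: text} corpus."""
    intermediate = [kv for doc_id, doc in corpus.items() for kv in mapper(doc_id, doc)]
    return dict(reducer(w, c) for w, c in shuffle(intermediate).items())

# Toy corpus (hypothetical): three documents, as in the figure.
corpus = {1: "w1 w2 w3", 2: "w1 w3", 3: "w1 w2 w3"}
counts = word_count(corpus)
```

On a real cluster the mappers and reducers run on different machines; only the per-key grouping contract is the same.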
![Page 8: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/8.jpg)
Machine Learning on MapReduce
A big picture: Not omnipotent but good enough
MapReduce Friendly
• Standard ML Algorithm
  ▪ Classification: Naïve Bayes, logistic regression, MART, etc.
  ▪ Clustering: k-means, NMF, co-clustering, etc.
  ▪ Modeling: EM algorithm, Gaussian mixture, Latent Dirichlet Allocation, etc.
• Customized ML Algorithm
  ▪ PageRank
  ▪ Click Models
  ▪ Behavior Targeting
MapReduce Unfriendly
• Standard ML Algorithm
  ▪ Classification: SVM
  ▪ Clustering: Spectral clustering
• Customized ML Algorithm
  ▪ Learning-to-Rank
![Page 10: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/10.jpg)
Classification: Naïve Bayes
$P(C \mid X) \propto P(C)\,P(X \mid C) = P(C)\prod_j P(X_j \mid C)$
[Figure: each Mapper reads training examples (x(i), y(i)) and emits one record (j, xj(i), y(i)) per feature j. Reducing on y(i) gives P(C); reducing on j gives P(Xj|C).]
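The counting scheme in the figure can be sketched as follows. This is an illustrative single-process version, assuming categorical features and plain relative-frequency estimates with no smoothing; the key layout and the names `mapper`, `reduce_counts` and `train` are hypothetical.

```python
from collections import Counter

def mapper(x, y):
    """Emit one class record plus one (j, x_j, y) record per feature."""
    yield ("class", y), 1
    for j, xj in enumerate(x):
        yield ("feature", j, xj, y), 1

def reduce_counts(pairs):
    """Sum the values per key, as the reducers would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def train(data):
    """Turn the reduced counts into P(C) and P(X_j | C)."""
    totals = reduce_counts(kv for x, y in data for kv in mapper(x, y))
    n = len(data)
    prior = {k[1]: v / n for k, v in totals.items() if k[0] == "class"}
    cond = {k[1:]: v / totals[("class", k[3])]
            for k, v in totals.items() if k[0] == "feature"}
    return prior, cond

# Toy training set (hypothetical): two binary features, two classes.
data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 1), "ham"), ((0, 0), "ham")]
prior, cond = train(data)
```

`cond[(j, value, c)]` holds the estimate of P(X_j = value | C = c); combinations never seen in training are simply absent.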
![Page 11: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/11.jpg)
Clustering: Nonnegative Matrix Factorization [Liu et al., WWW2010]
• Effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007, Sra & Dhillon, 2006]
  ▪ Interpretable dimensionality reduction [Lee & Seung, 1999]
  ▪ Document clustering [Shahnaz et al., 2006, Xu et al, 2006]
• Challenge: Can we scale NMF to million-by-million matrices?
$A_{m \times n} \approx W_{m \times k} H_{k \times n}, \quad A \ge 0,\ W \ge 0,\ H \ge 0$
![Page 12: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/12.jpg)
NMF Algorithm [Lee & Seung, 2000]
$A_{m \times n} \approx W_{m \times k} H_{k \times n}, \quad A \ge 0,\ W \ge 0,\ H \ge 0$
![Page 13: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/13.jpg)
Distributed NMF
Data Partition: A, W and H across machines
• A: stored as tuples $(i, j, A_{i,j})$
• W: stored by rows as $(i, w_i)$
• H: stored by columns as $(j, h_j)$
![Page 14: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/14.jpg)
Computing DNMF: The Big Picture
$X = W^{T}A, \quad Y = W^{T}WH$
$H \leftarrow H \mathbin{.*} \dfrac{W^{T}A}{W^{T}WH} = H \mathbin{.*} X \mathbin{./} Y$
![Page 15: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/15.jpg)
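[Figure: the five-stage DNMF pipeline for updating H. Map-I/Reduce-I join the entries $(i, j, A_{i,j})$ of A with the rows $(i, w_i)$ of W into $(i, j, A_{i,j}, w_i)$ and emit $(j, A_{i,j} w_i)$; Map-II/Reduce-II sum these into $(j, x_j)$; Map-III/Reduce-III sum the $(0, w_i^{T} w_i)$ into $(0, W^{T}W)$; Map-IV produces $(j, y_j)$ from $(j, h_j)$; Map-V/Reduce-V join $(j, h_j, x_j, y_j)$ and output $(j, h_j^{new})$.]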
![Page 16: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/16.jpg)
$X = W^{T}A$
[Figure: Map-I/Reduce-I join the entries $(i, j, A_{i,j})$ of A with the rows $(i, w_i)$ of W into $(i, j, A_{i,j}, w_i)$ and emit $(j, A_{i,j} w_i)$; Map-II/Reduce-II sum these into $(j, x_j)$.]
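A small in-memory sketch of this step, assuming A is held as a list of nonzero `(i, j, A_ij)` tuples and W as a dictionary from row index to row vector. The function name `compute_x` and the toy matrices are hypothetical; the two comments mark which part stands in for which pair of stages.

```python
from collections import defaultdict

def compute_x(a_entries, w_rows, k):
    """Return the columns x_j of X = W^T A as {j: k-vector}."""
    # Map-I / Reduce-I: join each nonzero A[i][j] with row w_i on key i
    # and emit (j, A[i][j] * w_i).
    scaled = ((j, [a_ij * v for v in w_rows[i]]) for i, j, a_ij in a_entries)
    # Map-II / Reduce-II: sum the scaled rows that share column index j.
    x_cols = defaultdict(lambda: [0.0] * k)
    for j, vec in scaled:
        x_cols[j] = [s + v for s, v in zip(x_cols[j], vec)]
    return dict(x_cols)

# Toy data (hypothetical): A = [[1, 2], [0, 3]], W = [[1, 0], [0.5, 2]].
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
W = {0: [1.0, 0.0], 1: [0.5, 2.0]}
X = compute_x(A, W, k=2)
```

Only the nonzero entries of A are touched, which is what makes the step practical for very sparse matrices.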
![Page 17: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/17.jpg)
$Y = W^{T}WH$
$C = W^{T}W = \sum_{i=1}^{m} w_i^{T} w_i$
[Figure: Map-III emits $(0, w_i^{T} w_i)$ for each row $(i, w_i)$ of W and Reduce-III sums them into $(0, W^{T}W)$; Map-IV reads the columns $(j, h_j)$ of H and produces $(j, y_j)$.]
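The same step as a sketch: because C is only k-by-k, it can be summed once and then applied to every column of H independently. The names `gram` and `compute_y` and the toy data are hypothetical, and the mapping of comments to stage labels is an assumption based on the figure.

```python
def gram(w_rows, k):
    """C = W^T W as a sum of outer products w_i^T w_i (Map-III / Reduce-III)."""
    c = [[0.0] * k for _ in range(k)]
    for w in w_rows.values():
        for p in range(k):
            for q in range(k):
                c[p][q] += w[p] * w[q]
    return c

def compute_y(c, h_cols):
    """y_j = C h_j for every column of H (Map-IV); C is small enough to broadcast."""
    return {j: [sum(c_pq * h_q for c_pq, h_q in zip(row, h)) for row in c]
            for j, h in h_cols.items()}

# Toy data (hypothetical): W has rows w_0, w_1; H has columns h_0, h_1.
W = {0: [1.0, 0.0], 1: [0.5, 2.0]}
H = {0: [1.0, 1.0], 1: [2.0, 0.0]}
C = gram(W, k=2)
Y = compute_y(C, H)
```

Computing $W^{T}W$ first avoids ever forming the m-by-n product $WH$.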
![Page 18: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/18.jpg)
$H \leftarrow H \mathbin{.*} X \mathbin{./} Y$
[Figure: Map-V joins $(j, h_j)$, $(j, x_j)$ and $(j, y_j)$ into $(j, h_j, x_j, y_j)$; Reduce-V outputs $(j, h_j^{new})$.]
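A sketch of the elementwise update once the three column sets are co-located by j. The `eps` guard against division by zero is an added assumption, not something shown on the slide; `update_h` and the toy values are hypothetical.

```python
def update_h(h_cols, x_cols, y_cols, eps=1e-12):
    """Apply h_j <- h_j .* x_j ./ y_j column by column (Map-V / Reduce-V)."""
    return {
        j: [h * x / (y + eps) for h, x, y in zip(h_cols[j], x_cols[j], y_cols[j])]
        for j in h_cols
    }

# Toy columns (hypothetical values), keyed by column index j.
H = {0: [1.0, 1.0], 1: [2.0, 0.0]}
X = {0: [1.0, 0.0], 1: [3.5, 6.0]}
Y = {0: [2.25, 5.0], 1: [2.5, 2.0]}
H_new = update_h(H, X, Y)
```

Entries of H that are zero stay zero under this multiplicative rule, and nonnegative inputs give nonnegative outputs.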
![Page 19: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/19.jpg)
[Figure: recap of the complete five-stage DNMF pipeline, from the entries $(i, j, A_{i,j})$ of A, the rows $(i, w_i)$ of W and the columns $(j, h_j)$ of H through Map-I to Reduce-V, ending in $(j, h_j^{new})$.]
![Page 20: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/20.jpg)
Scalability w.r.t. Matrix Size
3 hours per iteration, 20 iterations take around 20*3*0.72 ≈ 43 hours
Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values
![Page 21: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/21.jpg)
General EM on MapReduce
Map Evaluate Compute
Reduce
![Page 23: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/23.jpg)
Click Modeling: Motivation
• Clicks are good…
  ▪ Are these two clicks equally “good”?
• Non-clicks may have excuses:
  ▪ Not relevant
  ▪ Not examined
![Page 24: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/24.jpg)
Eye-tracking User Study
![Page 25: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/25.jpg)
Bayesian Browsing Model [Liu et al., KDD2009]
[Figure: a query with results URL1 to URL4. Each position i has a relevance variable Si, an examination variable Ei (examine snippet), and an observed click-through Ci.]
![Page 26: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/26.jpg)
Dependencies in BBM
[Figure: dependency graph over Si, Ei and Ci for positions 1, 2, …, i.]
$r_i$: the preceding click position before $i$
$d_i = i - r_i$
![Page 27: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/27.jpg)
Ultimate goal
Observation: conditional independence
Model Inference
![Page 28: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/28.jpg)
P(C|S) by Chain Rule
• Likelihood of search instance $C^{k}$
• From S to R:
![Page 29: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/29.jpg)
Putting Things Together
• Posterior with $C^{1:n}$
• Re-organize by $R_j$’s
  ▪ $N_j$: how many times $d_j$ was clicked
  ▪ $N_{j,r,d}$: how many times $d_j$ was not clicked when it is at position $(r + d)$ and the preceding click is on position $r$
![Page 30: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/30.jpg)
What $p(R \mid C^{1:n})$ Tells Us
• Exact inference with joint posterior in closed form
• Joint posterior factorizes, hence the $R_j$’s are mutually independent
• At most $M(M+1)/2 + 1$ numbers to fully characterize each posterior
  ▪ Count vector: $e = (e_0, e_1, e_2, \ldots, e_{M(M+1)/2})$
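The size of the count vector can be checked by enumerating the possible (r, d) pairs. The sketch below assumes M result positions, r ranging over 0..M-1 (with 0 meaning no preceding click) and d over 1..M-r. The row-by-row ordering used by `slot` is a hypothetical layout; the slides do not show the actual one.

```python
def count_vector_length(m):
    """One click counter (e_0) plus one non-click counter per (r, d) pair."""
    pairs = [(r, d) for r in range(m) for d in range(1, m - r + 1)]
    return 1 + len(pairs)

def slot(r, d, m):
    """Index of the non-click counter for position r + d with preceding click r."""
    # Slots 1.. are laid out row by row: r = 0 has m values of d, r = 1 has m - 1, ...
    offset = sum(m - rr for rr in range(r))
    return 1 + offset + (d - 1)

M = 10
length = count_vector_length(M)
all_slots = sorted(slot(r, d, M) for r in range(M) for d in range(1, M - r + 1))
```

Index 0 is reserved for clicks, so every non-click maps to exactly one index between 1 and M(M+1)/2.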
![Page 31: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/31.jpg)
An Example
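[Figure: computing the count vector for $R_4$. The click count $N_4$ and the non-click counts $N_{4,r,d}$ are tabulated over preceding click positions r = 0, 1, 2 and distances d = 3, 2, 1.]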
![Page 32: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/32.jpg)
LearnBBM on MapReduce
Map: emit((q,u), idx)
Reduce: construct the count vector
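A hedged sketch of these two steps, assuming each search instance is a query, a ranked URL list and a 0/1 click list. Index 0 marks a click; a non-click is indexed by its (r, d) pair using a hypothetical row-by-row layout, so the index values here need not match those on the slides. All function names are illustrative.

```python
from collections import defaultdict

def slot(r, d, m):
    """Hypothetical index layout for a non-click with preceding click r and distance d."""
    return 1 + sum(m - rr for rr in range(r)) + (d - 1)

def mapper(query, urls, clicks, m):
    """Emit ((query, url), idx) for every position of one search instance."""
    r = 0  # preceding click position; 0 means no click so far
    for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        if clicked:
            yield (query, url), 0
            r = pos
        else:
            yield (query, url), slot(r, pos - r, m)

def reducer(pairs, m):
    """Build one count vector per (query, url) key."""
    size = m * (m + 1) // 2 + 1
    vectors = defaultdict(lambda: [0] * size)
    for key, idx in pairs:
        vectors[key][idx] += 1
    return dict(vectors)

# Two toy search instances (hypothetical) for the same query, M = 3 positions.
M = 3
sessions = [("q", ["u1", "u2", "u3"], [1, 0, 0]),
            ("q", ["u1", "u2", "u3"], [0, 1, 0])]
emitted = [kv for q, urls, clicks in sessions for kv in mapper(q, urls, clicks, M)]
vectors = reducer(emitted, M)
```

Each session is scanned once and the reducers only increment counters, which is why learning reduces to a single pass over the click log.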
![Page 33: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/33.jpg)
Example on MapReduce
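• Map outputs
  ▪ (U1, 0) (U2, 4) (U3, 0)
  ▪ (U1, 1) (U3, 0) (U4, 7)
  ▪ (U1, 1) (U3, 0) (U4, 0)
• Reduce outputs and the resulting posteriors
  ▪ (U1, 0, 1, 1): $p(R_1) \propto R_1 (1 - R_1)^2$
  ▪ (U2, 4): $p(R_2) \propto 1 - 0.98 R_2$
  ▪ (U3, 0, 0, 0): $p(R_3) \propto R_3^3$
  ▪ (U4, 0, 7): $p(R_4) \propto R_4 (1 - R_4)$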
![Page 34: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/34.jpg)
Petabyte-Scale Experiment
• Setup: 8 weeks of data, 8 jobs
  ▪ Job k takes the first k weeks of data
• Experiment platform
  – SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB’08]
![Page 35: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/35.jpg)
Scalability of BBM
• Increasing computation load: more queries, more URLs, more impressions
• Near-constant elapsed time
[Figure panels: Computation Overload; Elapsed Time on SCOPE]
• 3 hours
• Scan 265 terabytes of data
• Full posteriors for 1.15 billion (query, url) pairs
![Page 36: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/36.jpg)
Large-scale Behavior Targeting [Ye et al., KDD2009]
• Behavior targeting
  ▪ Ad serving based on users’ historical behaviors
  ▪ Complementary to sponsored Ads and content Ads
![Page 37: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/37.jpg)
Problem Setting
• Goal
  ▪ Given ads in a certain category, locate qualified users based on users’ past behaviors
• Data
  ▪ User is identified by cookie
  ▪ Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.
• Challenges
  ▪ Scale: e.g., 9TB ad data with 500B entries in Aug'08
  ▪ Sparse: e.g., the CTR of automotive display ads is 0.05%
  ▪ Dynamic: i.e., user behavior changes over time
![Page 38: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/38.jpg)
Learning: Linear Poisson Model
• CTR = ClickCnt/ViewCnt
  ▪ A model to predict expected click count
  ▪ A model to predict expected view count
• Linear Poisson model
• MLE on w
![Page 39: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/39.jpg)
Implementation on MapReduce
• Learning
  ▪ Map: Compute
  ▪ Reduce: Update
• Prediction
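The formulas for this slide did not survive in the transcript, so the following is only a sketch under stated assumptions: counts follow y ~ Poisson(w·x) with nonnegative features x and positive weights w, and w is fitted with a standard EM-style multiplicative update for that likelihood, which may differ from the update used in the talk. All names and the toy data are hypothetical.

```python
import math

def predict(w, x):
    """Expected count under the assumed linear Poisson model: lambda = w . x."""
    return sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, data):
    """Poisson log-likelihood, dropping the constant -log(y!) term."""
    return sum(y * math.log(predict(w, x)) - predict(w, x) for x, y in data)

def map_step(w, x, y):
    """Per-user contributions: (y * x_j / lambda, x_j) for each active feature j."""
    lam = predict(w, x)
    return [(j, (y * xj / lam, xj)) for j, xj in enumerate(x) if xj > 0]

def reduce_step(w, contributions):
    """Aggregate per feature and apply w_j <- w_j * sum(y x_j / lambda) / sum(x_j)."""
    num = [0.0] * len(w)
    den = [0.0] * len(w)
    for j, (n, d) in contributions:
        num[j] += n
        den[j] += d
    return [wj * num[j] / den[j] if den[j] > 0 else wj for j, wj in enumerate(w)]

# Toy data (hypothetical): (behavior vector x, observed click count y).
data = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([1.0, 1.0], 4), ([2.0, 1.0], 5)]
w = [1.0, 1.0]
history = [log_likelihood(w, data)]
for _ in range(20):
    contributions = [c for x, y in data for c in map_step(w, x, y)]
    w = reduce_step(w, contributions)
    history.append(log_likelihood(w, data))
```

Prediction then amounts to evaluating `predict` once with the click-count weights and once with the view-count weights, and taking the ratio as the expected CTR.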
![Page 41: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/41.jpg)
Conclusions
• Challenges imposed by Web data
  ▪ Scalability of standard algorithms
  ▪ Application-driven customized algorithms
• Capability to consume huge amounts of data outweighs algorithm sophistication
  ▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite
• MapReduce: a restricted computation model
  ▪ Not omnipotent but powerful enough
  ▪ Things we want to do turn out to be things we can do
![Page 42: When Machine Learning Meets the Web](https://reader037.fdocuments.net/reader037/viewer/2022103100/56812ef0550346895d948c56/html5/thumbnails/42.jpg)
Q&A
Thank You!
04/19/2023 SEWM'10 Keynote, Chengdu, China