Introduction of Online Machine Learning Algorithms
Transcript of Introduction of Online Machine Learning Algorithms
Paper Report for SDM course in 2016
Ad Click Prediction: a View from the Trenches (Online Machine Learning)
Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈. Date: 2016/12/22
Final Presentation for SDM-2016
https://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/
Benchmark: reading a 2 GB file in R (four million rows, 200 variables):

| READ DATA | Time | Memory |
| --- | --- | --- |
| read.csv | 264.5 s | 8.73 GB |
| fread | 33.18 s | 2.98 GB |
| read.big.matrix | 205.03 s | 0.2 MB |

Fitting `lm` on the same data:

| lm | Time | Memory |
| --- | --- | --- |
| read.csv | X | X |
| fread | X | X |
| read.big.matrix | 2.72 min | 83.6 MB |
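The streaming idea behind read.big.matrix's tiny footprint can be sketched outside R as well. Below is a minimal, illustrative Python sketch (the file contents, column names, and helper function are assumptions, not from the slides): rather than loading a multi-GB CSV at once, rows are streamed and only small running statistics are kept in memory.

```python
import csv
import io

def stream_column_sums(fileobj):
    """Stream a CSV row by row, keeping only per-column running sums in memory."""
    reader = csv.reader(fileobj)
    header = next(reader)
    sums = [0.0] * len(header)
    n_rows = 0
    for row in reader:
        for i, value in enumerate(row):
            sums[i] += float(value)
        n_rows += 1
    return header, sums, n_rows

# Tiny in-memory stand-in for a multi-GB file on disk.
demo = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
header, sums, n_rows = stream_column_sums(demo)
```

Peak memory here is one row plus one sum per column, regardless of file size, which is the same design choice the memory-mapped R packages make.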
Problem: training a model on Big Data (TB, PB, ZB) runs into
• Memory
• Time/Accuracy
Solutions:
• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)
• R packages: read.table, bigmemory, ff (GB)
• Online learning algorithms
Online learning algorithms (applied to logistic regression):
• AOGD: Adaptive Online Gradient Descent (2007, IBM)
• TG: Truncated Gradient (2009, Microsoft)
• FOBOS: Forward-Backward Splitting (2009, Google)
• RDA: Regularized Dual Averaging (2010, Microsoft)
• FTRL-Proximal: Follow-the-Regularized-Leader, Proximal (2011, Google)
Big Data (TB, PB, ZB): train the model, then, as new data arrives, renew the weights.
Problem:
• Memory
• Time/Accuracy
• Sparsity (LASSO)
• SGD/OGD (NN/GBM)
Online Gradient Descent (OGD)
• A kind of algorithm used for online convex optimization.
• It can be formulated as a repeated game between a player and an adversary.
• At round t, the player chooses an action w_t from some convex subset F, and then the adversary chooses a convex loss function c_t.
• A central question is how the regret grows with the number of rounds T of the game.
Online Gradient Descent (OGD)
Zinkevich considered the following gradient descent algorithm, with step size η_t = 1/√t:

w_{t+1} = Π_F( w_t − η_t ∇c_t(w_t) )

Here, Π_F denotes the Euclidean projection onto the convex set F.
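The projected-gradient update above can be sketched in a few lines of Python. This is a toy illustration, not code from the paper: the feasible set F = [-1, 1] and the quadratic losses c_t(w) = (w - y_t)^2 are assumptions chosen to keep the example self-contained.

```python
import math

def project(w, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, w))

def ogd(targets):
    """Online gradient descent with Zinkevich's step size eta_t = 1/sqrt(t)."""
    w = 0.0
    for t, y in enumerate(targets, start=1):
        grad = 2.0 * (w - y)          # gradient of c_t(w) = (w - y)^2
        eta = 1.0 / math.sqrt(t)      # decaying step size
        w = project(w - eta * grad)   # w_{t+1} = Pi_F(w_t - eta_t * grad)
        yield w

ws = list(ogd([0.5, 0.5, 0.5]))
```

With a fixed target the iterates approach 0.5, and the projection keeps every iterate inside F even when an early large step overshoots.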
Forward-Backward Splitting (FOBOS)
(1) Loss function of logistic regression, with p = σ(W·X) and σ(z) = 1/(1 + e^{−z}):
l(W, X, y) = −y·log(p) − (1 − y)·log(1 − p)
Batch gradient descent formula: W_{t+1} = W_t − η Σ_i ∂l(W_t, X_i, y_i)/∂W_t
Online gradient descent formula: W_{t+1} = W_t − η ∂l(W_t, X_t, y_t)/∂W_t
Forward-Backward Splitting (FOBOS)
(2) FOBOS's gradient-descent formula can be split into two parts: the first part makes a fine adjustment near the result of the plain gradient step (W_{t+1/2}); the second part handles the regularization and produces sparsity:

W_{t+1/2} = W_t − η_t g_t
W_{t+1} = argmin_W { ½‖W − W_{t+1/2}‖² + η_t r(W) }

r(W) = regularization function (e.g., λ‖W‖₁)
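For r(W) = λ‖W‖₁ the backward step has a closed-form solution, coordinate-wise soft-thresholding, which is what makes FOBOS produce sparse weights. A minimal Python sketch of one FOBOS iteration on the logistic loss (the toy data, learning rate, and λ are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fobos_l1_step(w, x, y, eta, lam):
    """One FOBOS iteration: forward gradient step, then backward L1 proximal step."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    grad = [(p - y) * xi for xi in x]                    # d l / d W, logistic loss
    half = [wi - eta * gi for wi, gi in zip(w, grad)]    # forward step W_{t+1/2}
    # backward step: soft-threshold each coordinate toward zero
    return [math.copysign(max(abs(h) - eta * lam, 0.0), h) for h in half]

w = [0.0, 0.0]
for x, y in [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50:
    w = fobos_l1_step(w, x, y, eta=0.5, lam=0.1)
```

The `max(abs(h) - eta*lam, 0.0)` term is the shrinkage: any coordinate whose forward step stays within η·λ of zero is set exactly to zero.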
Forward-Backward Splitting (FOBOS)
(3) A sufficient condition for W_{t+1} to be the optimum of (2): 0 belongs to its subgradient set,
0 ∈ W_{t+1} − W_{t+1/2} + η_t ∂r(W_{t+1})
(4) Because W_{t+1/2} = W_t − η_t g_t, (3) can be rewritten as:
0 ∈ W_{t+1} − W_t + η_t g_t + η_t ∂r(W_{t+1})
(5) In other words, rearranging (4):
W_{t+1} = W_t − η_t g_t − η_t r'(W_{t+1}), for some r'(W_{t+1}) ∈ ∂r(W_{t+1})
① the pre-update state and its gradient: the forward (explicit) step
② the regularization information of the current iterate, evaluated at W_{t+1}: the backward (implicit) step
FOBOS, RDA, FTRL-Proximal
All three algorithms can be viewed as updates built from three terms:
(A) the accumulated past gradients g_{1:t} = Σ_{s=1}^{t} g_s, where g_s is a certain subgradient of the (possibly non-smooth convex) loss;
(B) the regularization functions r(W), e.g., λ‖W‖₁;
(C) a proximal term scaled by the learning rate, which guarantees that the fine adjustment does not move too far from 0 or from the already-computed iterates.
FOBOS, RDA, FTRL-Proximal
• OGD is not sparse enough; FOBOS, a gradient-descent-style method, produces better sparse features and has comparatively good accuracy.
• RDA strikes a better balance between accuracy and sparsity, and its sparsity is even stronger.
• The most critical difference among the methods is how the accumulated L1 penalty term is handled.
• FTRL-Proximal combines the accuracy of FOBOS with the sparsity of RDA.
Per-Coordinate
Three consecutive slides animate per-coordinate updates: each feature's coefficient evolves at its own pace, and the original slides annotate the features with per-coordinate update counts.
f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F
f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F
f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F
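Per-coordinate learning rates are exactly what FTRL-Proximal uses in the Ad Click Prediction paper [5]: coordinate i gets rate α/(β + √(Σ g_{s,i}²)), so frequently updated features learn more slowly. The sketch below follows Algorithm 1 of [5] for logistic regression; the hyperparameters and the toy data are illustrative assumptions.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal with L1/L2 regularization (Algorithm 1 of [5])."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # per-coordinate accumulated "adjusted" gradients
        self.n = [0.0] * dim   # per-coordinate sum of squared gradients

    def weight(self, i):
        """Closed-form per-coordinate weight; |z_i| <= l1 gives an exact zero."""
        if abs(self.z[i]) <= self.l1:
            return 0.0
        return -(self.z[i] - math.copysign(self.l1, self.z[i])) / \
               ((self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self.weight(i) * xi for i, xi in enumerate(x))
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        p = self.predict(x)
        for i, xi in enumerate(x):
            if xi == 0.0:
                continue                      # only touch active coordinates
            g = (p - y) * xi                  # logistic-loss gradient
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self.weight(i)
            self.n[i] += g * g

model = FTRLProximal(dim=2)
for x, y in [([1.0, 1.0], 1), ([1.0, 0.0], 0)] * 100:
    model.update(x, y)
```

Note the sparsity mechanism: weights are never stored, only (z, n) are; a coordinate's weight is reconstructed on demand and is exactly zero until its accumulated signal |z_i| exceeds the L1 threshold.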
Summary: Big Data (TB, PB, ZB), train the model, then, as new data arrives, renew the weights (per-coordinate).
Problem:
• Memory
• Time/Accuracy
• Sparsity (LASSO)
• SGD/OGD (NN/GBM)
Online learning algorithms applied to logistic regression: FOBOS (2009, Google), RDA (2010, Microsoft), FTRL-Proximal (2011, Google).
R package: FTRLProximal
https://www.kaggle.com/c/avazu-ctr-prediction
Prediction result (data set size: 5.87 GB)
References
[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.
[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning Using Forward Backward Splitting. Journal of Machine Learning Research, 2009.
[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.
[4] H. B. McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In AISTATS, 2011.
[5] H. Brendan McMahan, Gary Holt, D. Sculley et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.
[6] Peter Bartlett, Elad Hazan & Alexander Rakhlin. Adaptive Online Gradient Descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, June 2007.
[7] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, pages 928–936, 2003.