Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Presentation material, Takuya Makino, Saturday, March 23, 13

Transcript of Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Page 1: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Presentation material, Takuya Makino

Saturday, March 23, 13

Page 2: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Paper being introduced

• Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems (ICDM 2012)

• Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon

• It won the Best Paper award

Page 3: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Motivation

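• Matrix factorization is an effective technique for recommender systems when the matrix has missing entries

• Processing web-scale data requires a matrix factorization method that is both efficient and easy to parallelize and distribute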

Page 7: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

The matrix factorization problem

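L2 regularization

Inner product of user i's features and item j's features in a k-dimensional feature space (rank-k matrix factorization, k < m, k < n)

||・||_{F} is the Frobenius norm; its square is the sum of the squares of all entries of the matrix

Observed rating of item j by user i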
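The annotations above label the terms of objective (1), whose formula was an image on the slide and did not survive in this transcript. As a sketch, assuming the standard regularized form used in the ICDM 2012 paper, with \Omega the set of observed (i, j) pairs and \lambda the regularization weight (both symbols are assumptions here, since the transcript does not define them), it can be written as:

```latex
\min_{W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{n \times k}}
  \sum_{(i,j) \in \Omega} \left( A_{ij} - w_i^{T} h_j \right)^2
  + \lambda \left( \|W\|_F^2 + \|H\|_F^2 \right)
\tag{1}
```

Here A_{ij} is the observed rating, w_i^{T} h_j is the inner product of the user and item features, and the \lambda term is the L2 regularization.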

Page 8: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

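In other words

• Find W and H that minimize the error so that A, including its unobserved entries, can be estimated by the approximation WH^T (while shrinking the weights of features that do not help the estimate toward 0)

• It is an unconstrained problem, so W and H can be found with numerical methods such as Stochastic Gradient Descent (SGD)

• (1) is convex in W with H fixed and in H with W fixed, but not jointly convex in (W, H); the proof is skipped (see T村本)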
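To make the quantity being minimized concrete, here is a minimal Python sketch of the regularized squared error over observed entries only. The storage format (a dict from (i, j) to rating), the name `objective`, the weight `lam`, and the toy numbers are all assumptions for illustration, not something taken from the slides.

```python
def objective(A_obs, W, H, lam):
    """Regularized squared error over the observed entries only.

    A_obs: dict mapping (i, j) -> observed rating (hypothetical storage format)
    W: list of m user feature vectors, H: list of n item feature vectors,
    each of length k. lam: regularization weight.
    """
    err = 0.0
    for (i, j), a in A_obs.items():
        pred = sum(w * h for w, h in zip(W[i], H[j]))  # inner product w_i . h_j
        err += (a - pred) ** 2
    # Squared Frobenius norms: sum of squares of all entries of W and H.
    reg = sum(x * x for row in W for x in row) + sum(x * x for row in H for x in row)
    return err + lam * reg


# Toy data (hypothetical): 2 users, 2 items, k = 2, three observed ratings.
A_obs = {(0, 0): 5.0, (0, 1): 3.0, (1, 1): 1.0}
W = [[1.0, 2.0], [0.0, 1.0]]
H = [[1.0, 1.0], [1.0, 0.0]]
value = objective(A_obs, W, H, lam=0.1)
print(value)
```

Unobserved entries never enter the sum, which is why the cost of evaluating the error scales with the number of observed ratings rather than with m × n.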

Page 10: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

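Coordinate Descent

• A method that, when updating one (or more) variables, treats all other variables as constants

• What is the objective function when viewed in one variable?

• In what order should the variables be updated? Choosing this well actually reduces the amount of computation!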

Page 12: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

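What is the objective function when viewed in one variable?

(4) is the objective function when w_{it} is replaced by z

It is just (1) with the term of the inner product that involves w_{it} replaced by z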

Page 14: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Solving equation (4)

\sum_{t=1}^{k} w_{it} h_{jt}: because of this inner product, computing z* naively costs O(|Ω_i|k)

z* is obtained by setting f'(z) = 0
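The formulas on this slide were images, so the following is a reconstruction rather than a copy. It assumes the single-variable objective has the form given in the ICDM 2012 paper, with \lambda the regularization weight and \Omega_i the set of items rated by user i. Setting the derivative to zero gives the closed-form minimizer:

```latex
f(z) = \sum_{j \in \Omega_i}
  \bigl( A_{ij} - (w_i^{T} h_j - w_{it} h_{jt}) - z\, h_{jt} \bigr)^2
  + \lambda z^2
\tag{4}

f'(z) = -2 \sum_{j \in \Omega_i}
  \bigl( A_{ij} - w_i^{T} h_j + w_{it} h_{jt} - z\, h_{jt} \bigr) h_{jt}
  + 2 \lambda z = 0

z^{*} = \frac{\sum_{j \in \Omega_i}
  \bigl( A_{ij} - w_i^{T} h_j + w_{it} h_{jt} \bigr) h_{jt}}
  {\lambda + \sum_{j \in \Omega_i} h_{jt}^{2}}
```

Each term needs w_i^{T} h_j = \sum_{t=1}^{k} w_{it} h_{jt}, an O(k) inner product, for every j in \Omega_i, which is where the O(|Ω_i|k) cost comes from.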

Page 15: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

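residual matrix R

We do not want to recompute \sum_{t=1}^{k} w_{it} h_{jt} every time, so we keep R, with R_{ij} = A_{ij} - \sum_{t=1}^{k} w_{it} h_{jt} for the observed (i, j)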

Page 16: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Parameter update

\sum_{t=1}^{k} w_{it} h_{jt} is already held here (inside R)

h_{jt} can be updated in the same way

From O(|Ω_i|k) to O(|Ω_i|)
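The update with a maintained residual can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `update_w_it`, the dict-based storage of R, the weight `lam`, and the toy data are assumptions. The key point is that R_ij + w_it * h_jt stands in for the O(k) inner product, so one variable costs O(|Ω_i|).

```python
def update_w_it(i, t, W, H, R, omega_i, lam):
    """Update the single variable w_it using the maintained residual R.

    R maps each observed (i, j) to R_ij = A_ij - w_i . h_j.
    omega_i lists the items j rated by user i. lam is the L2 weight.
    """
    num = sum((R[(i, j)] + W[i][t] * H[j][t]) * H[j][t] for j in omega_i)
    den = lam + sum(H[j][t] ** 2 for j in omega_i)
    z = num / den
    delta = z - W[i][t]
    for j in omega_i:
        R[(i, j)] -= delta * H[j][t]  # keep R_ij consistent with the new w_it
    W[i][t] = z
    return z


# Toy data (hypothetical): one user, two items, k = 2.
A = {(0, 0): 5.0, (0, 1): 3.0}
W = [[1.0, 2.0]]
H = [[1.0, 1.0], [1.0, 0.0]]
R = {(i, j): a - sum(w * h for w, h in zip(W[i], H[j])) for (i, j), a in A.items()}

z_star = update_w_it(0, 0, W, H, R, omega_i=[0, 1], lam=0.1)

# Recompute the residual from scratch to confirm R was maintained correctly.
R_check = {(i, j): a - sum(w * h for w, h in zip(W[i], H[j])) for (i, j), a in A.items()}
```

The same routine with the roles of W and H swapped would update h_{jt}.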

Page 17: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Making the update efficient

• Keeping the residual matrix R reduces the computation time from O(|Ω|k) to O(|Ω|)

• This part is not the proposed method

Page 18: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

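In what order should the variables be updated?

• Item/User-wise Update

• Feature-wise Update

[Figure: two update-order diagrams, each over a matrix whose rows are indexed by i or j (1 to m or n) and whose columns are indexed by t (1 to k)]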

Page 19: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Item/User-wise Update

Page 20: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

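Feature-wise Update

Changing the viewpoint, think of A as approximated by a sum of k matrix products

The t-th feature contributes an m×n matrix: the product of an m×1 matrix and a 1×n matrix is an m×n matrix

The proposed method works on finding these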
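The change of viewpoint rests on the identity that W H^T equals the sum over t of the outer product of the t-th column of W with the t-th column of H. A small pure-Python check, with hypothetical helper names and toy matrices, is:

```python
def product_WHt(W, H):
    """Entry (i, j) of W H^T computed as the inner product of row i of W and row j of H."""
    m, n, k = len(W), len(H), len(W[0])
    return [[sum(W[i][t] * H[j][t] for t in range(k)) for j in range(n)]
            for i in range(m)]


def rank_one_sum(W, H):
    """The same matrix built as a sum of k rank-one (m x 1 times 1 x n) products."""
    m, n, k = len(W), len(H), len(W[0])
    total = [[0.0] * n for _ in range(m)]
    for t in range(k):
        w_bar = [W[i][t] for i in range(m)]  # t-th column of W, an m x 1 vector
        h_bar = [H[j][t] for j in range(n)]  # t-th column of H, an n x 1 vector
        for i in range(m):
            for j in range(n):
                total[i][j] += w_bar[i] * h_bar[j]
    return total


# Toy matrices (hypothetical): m = 3, n = 2, k = 2.
W = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
H = [[1.0, 1.0], [1.0, 0.0]]
full = product_WHt(W, H)
summed = rank_one_sum(W, H)
```

Updating one feature t therefore means replacing a single rank-one term of the approximation while the other k - 1 terms stay fixed.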

Page 21: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Subproblem for finding u, v

Letting \hat{R}_{ij} = R_{ij} + \bar{w}_{ti} \bar{h}_{tj}, (15) can be rewritten as a rank-one factorization problem for \hat{R}
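The two formulas on this slide were images. The following reconstruction assumes the form and the equation numbers of the ICDM 2012 paper, with \bar{w}_t and \bar{h}_t the t-th columns of W and H, and \lambda the regularization weight; treat the numbering as an assumption.

```latex
\min_{u \in \mathbb{R}^{m},\; v \in \mathbb{R}^{n}}
  \sum_{(i,j) \in \Omega}
  \bigl( R_{ij} + \bar{w}_{ti} \bar{h}_{tj} - u_i v_j \bigr)^2
  + \lambda \bigl( \|u\|^2 + \|v\|^2 \bigr)
\tag{15}

\hat{R}_{ij} = R_{ij} + \bar{w}_{ti} \bar{h}_{tj},
  \quad (i,j) \in \Omega

\min_{u \in \mathbb{R}^{m},\; v \in \mathbb{R}^{n}}
  \sum_{(i,j) \in \Omega}
  \bigl( \hat{R}_{ij} - u_i v_j \bigr)^2
  + \lambda \bigl( \|u\|^2 + \|v\|^2 \bigr)
\tag{16}
```

Problem (16) is problem (1) with k = 1 and A replaced by \hat{R}.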

Page 22: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

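What makes the feature-wise update attractive

\hat{R}_{ij} = R_{ij} + \bar{w}_{ti} \bar{h}_{tj} = A_{ij} - \sum_{t'=1}^{k} w_{it'} h_{jt'} + \bar{w}_{ti} \bar{h}_{tj}

Since w_{it} = \bar{w}_{ti} and h_{jt} = \bar{h}_{tj}, the terms for the current t (underlined on the slide) cancel and disappear

So \hat{R} does not have to be recomputed every time u_i and v_j are updated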
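A feature-wise update can be sketched as below: build \hat{R} once, run T inner passes over u and v that only read \hat{R}, then write the result back and refresh R once. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `feature_wise_update` and `objective`, the dict storage, `lam`, the default `T`, and the toy data are all hypothetical.

```python
def objective(A, W, H, lam):
    """Regularized squared error over observed entries (A: dict (i, j) -> rating)."""
    err = sum((a - sum(w * h for w, h in zip(W[i], H[j]))) ** 2
              for (i, j), a in A.items())
    reg = sum(x * x for row in W for x in row) + sum(x * x for row in H for x in row)
    return err + lam * reg


def feature_wise_update(t, W, H, R, lam, T=3):
    """Update column t of W and H; R maps observed (i, j) to A_ij - w_i . h_j."""
    m, n = len(W), len(H)
    u = [W[i][t] for i in range(m)]
    v = [H[j][t] for j in range(n)]
    # R_hat is built once; it does not depend on the values u and v take later.
    R_hat = {(i, j): r + u[i] * v[j] for (i, j), r in R.items()}
    rows = [[j for (a, j) in R if a == i] for i in range(m)]  # items rated by user i
    cols = [[i for (i, b) in R if b == j] for j in range(n)]  # users who rated item j
    for _ in range(T):  # T inner CCD passes on the rank-one subproblem
        for i in range(m):
            u[i] = (sum(R_hat[(i, j)] * v[j] for j in rows[i])
                    / (lam + sum(v[j] ** 2 for j in rows[i])))
        for j in range(n):
            v[j] = (sum(R_hat[(i, j)] * u[i] for i in cols[j])
                    / (lam + sum(u[i] ** 2 for i in cols[j])))
    for i in range(m):
        W[i][t] = u[i]
    for j in range(n):
        H[j][t] = v[j]
    for (i, j), r_hat in R_hat.items():
        R[(i, j)] = r_hat - u[i] * v[j]  # refresh R once per subproblem


# Toy data (hypothetical): 2 users, 2 items, k = 2, all four ratings observed.
A = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}
W = [[1.0, 2.0], [0.0, 1.0]]
H = [[1.0, 1.0], [1.0, 0.0]]
lam = 0.1
R = {(i, j): a - sum(w * h for w, h in zip(W[i], H[j])) for (i, j), a in A.items()}

before = objective(A, W, H, lam)
feature_wise_update(0, W, H, R, lam, T=3)
after = objective(A, W, H, lam)
R_check = {(i, j): a - sum(w * h for w, h in zip(W[i], H[j])) for (i, j), a in A.items()}
```

Because each inner step exactly minimizes the subproblem in one coordinate, the objective should not increase, and R ends up consistent with the updated W and H.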

Page 24: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

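Feature-wise Update

For one subproblem, the cost of computing R is O(1/T) times the cost of updating the variables over T CCD iterations

Running T CCD iterations per subproblem is O((1 + 1) / (1 + 1/T)) = O(2T / (T + 1)) times faster than running only one CCD iteration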
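The ratio is easy to tabulate. The sketch below assumes, as the slide's formula suggests, that one pass over the variables and one refresh of R each cost one unit; the function name `speedup` is hypothetical.

```python
def speedup(T):
    """Per-iteration cost ratio (1 + 1) / (1 + 1/T), simplified to 2T / (T + 1)."""
    return 2 * T / (T + 1)


# The ratio starts at 1 for T = 1 and approaches 2 as T grows.
for T in (1, 2, 5, 10, 100):
    print(T, round(speedup(T), 3))
```

Under this cost model the gain is bounded by a factor of 2, so very large T brings little additional benefit.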

Page 28: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

[Figure: a vector partitioned into p blocks]

Split it into p small vectors and update them in parallel

In (16), each u_i is independent of the other entries of u
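Because each u_i depends only on v and on row i of \hat{R}, the entries of u can be split into p blocks and updated independently. The sketch below uses Python threads purely to illustrate that independence; in CPython, threads will not speed up this arithmetic, and the strided block split, function names, and toy data are assumptions rather than the paper's scheme.

```python
from concurrent.futures import ThreadPoolExecutor


def update_block(block, R_hat, v, rows, lam):
    """Return the new u_i for every i in block; reads only v and R_hat."""
    out = {}
    for i in block:
        num = sum(R_hat[(i, j)] * v[j] for j in rows[i])
        den = lam + sum(v[j] ** 2 for j in rows[i])
        out[i] = num / den
    return out


def parallel_update_u(m, R_hat, v, rows, lam, p=2):
    """Split the indices 0..m-1 into p blocks and update each block in its own task."""
    blocks = [list(range(r, m, p)) for r in range(p)]  # hypothetical strided split
    u = [0.0] * m
    with ThreadPoolExecutor(max_workers=p) as pool:
        for part in pool.map(lambda b: update_block(b, R_hat, v, rows, lam), blocks):
            for i, value in part.items():
                u[i] = value
    return u


# Toy data (hypothetical): m = 4 users, n = 2 items.
R_hat = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 3.0, (2, 0): 1.0, (3, 0): 4.0, (3, 1): 2.0}
rows = {0: [0, 1], 1: [1], 2: [0], 3: [0, 1]}
v = [1.0, 2.0]
lam = 0.5

u_parallel = parallel_update_u(4, R_hat, v, rows, lam, p=2)
u_sequential_map = update_block(range(4), R_hat, v, rows, lam)
u_sequential = [u_sequential_map[i] for i in range(4)]
```

No task writes to anything another task reads, so no locking is needed and the parallel result matches the sequential one.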

Page 30: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

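Related work

• Alternating Least Squares (ALS)

Repeat: fix H and solve for W, then fix W and solve for H

Easy to parallelize, but computationally expensive

• Stochastic Gradient Descent (SGD)

Computationally cheap, but hard to parallelize

Convergence depends on the learning rate, and performance depends on the order in which the variables are updated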
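The learning-rate sensitivity of SGD can be seen on a single rating. The sketch below assumes the plain SGD update for the regularized squared error (with the constant factor 2 absorbed into the learning rate); the function name `sgd_step`, the values of `lr` and `lam`, and the toy data are hypothetical.

```python
def sgd_step(a_ij, w_i, h_j, lr, lam):
    """One plain SGD update of (w_i, h_j) for a single observed rating a_ij."""
    err = a_ij - sum(w * h for w, h in zip(w_i, h_j))
    # Both vectors are updated from the old values of w_i and h_j.
    new_w = [w + lr * (err * h - lam * w) for w, h in zip(w_i, h_j)]
    new_h = [h + lr * (err * w - lam * h) for w, h in zip(w_i, h_j)]
    return new_w, new_h


def abs_error(a_ij, w_i, h_j):
    """Absolute prediction error for one rating."""
    return abs(a_ij - sum(w * h for w, h in zip(w_i, h_j)))


# Toy data (hypothetical): one rating, k = 2.
a, w, h, lam = 5.0, [1.0, 2.0], [1.0, 1.0], 0.1
start_error = abs_error(a, w, h)

w_small, h_small = sgd_step(a, w, h, lr=0.01, lam=lam)
w_large, h_large = sgd_step(a, w, h, lr=1.0, lam=lam)
small_lr_error = abs_error(a, w_small, h_small)
large_lr_error = abs_error(a, w_large, h_large)
```

With the small step the error on this rating shrinks, while the large step overshoots and makes it worse, which is the kind of tuning burden the coordinate descent updates avoid by using closed-form minimizers.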

Page 34: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Conclusions

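For a matrix A with missing entries, CCD++ (Feature-wise Update) requires less computation than existing methods and is easy to parallelize in both multi-core and distributed environments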
