KDD'15 - Distributed Personalization


Aug 11, 2015. Xu Miao, Lijun Tang, Yitong Zhou, Joel Young (LinkedIn); Chun-te Chu (Microsoft); Anmol Bhasin (Groupon)

Distributed Personalization

Motivation / Distributed Learning / Personalization / Experiments

I will start with a motivating example, then divide the talk into two parts, distributed learning and personalization. I will discuss each in turn and demonstrate some experimental results.

Recommendation

I open the LinkedIn app, read this post, and find it very interesting.

Recommendation

I click Like. Or, if I feel it is irrelevant, I click Hide.

Recommendation

Now the question is: what is the next post we are going to present?

Common Solution
Apps → Tracking → ETL → DM → Delivering

This is a classic recommendation problem, and the common solution is to collect many users' feedback and send it through the tracking system and ETL to a data mining system, say, Hadoop. A smart data scientist then stares at the data and does some fancy modeling, eventually delivering a model that improves the recommendations.

Common Solution -- Cold Start
Apps (seconds) → Tracking (minutes) → ETL (hours) → DM → Delivering (days)

However, the problem with this cold-start modeling is that user interactions happen within seconds, while tracking results arrive within minutes. By the time the data is ETL'd, hours have passed; by the time the model is delivered, possibly days have gone by. Users' interests may have drifted away.

Common Solution -- Warm Start
Apps (seconds) → Tracking (minutes) → ETL (hours) → DM → Delivering (days)

To remedy this problem, a warm-start model can be built more frequently, say using the last hour's user feedback. This helps capture trending information and improves relevance a lot. However, it is still not optimal: you train the model on the last hour's active users but apply it to the active users of the next hour, and these two batches of users may not share the same interests. It is still not personalized enough.

Bring ML Closer to Users
Apps (seconds) → Tracking (minutes) → ETL (hours) → DM → Delivering (days)

So, why not push the warm-start learning closer to the users, say onto the client side, so it can react immediately to user feedback?

Distributed Online Learning
Definition:
- Agent presents an example (x, y)
- User responds with a reward r
- Agent updates the model w

This is commonly considered the setting of distributed online learning. The usual definition of online learning is the following: an intelligent agent presents an example (x, y), where x might be a post, a news item, or a job, and y is the relevance score computed by the current model w. The user responds with a reward r: +1 for Like, -1 for Hide. The agent then updates the model w immediately, and the loop continues.
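To make the loop concrete, here is a minimal sketch in Python. The linear scoring, the reward-weighted gradient step, and the get_user_reward callback are illustrative assumptions, not the system's actual update rule.

    import numpy as np

    def online_learning_loop(items, w, get_user_reward, lr=0.1):
        # items: candidate feature vectors x (posts, news, jobs)
        # get_user_reward: callback returning +1 (Like) or -1 (Hide)
        for x in items:
            y = float(w @ x)            # relevance score under current model w
            r = get_user_reward(x, y)   # user feedback as a reward
            w = w + lr * r * x          # immediate update, then the loop continues
        return w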

Distributed Online Learning
Definition:
- Agent presents an example
- User responds with a reward r
- Agent updates the model w
Challenges:
- User feedback data too few → Distributed Learning

The challenge here is that each user's feedback data is too sparse to train a reliable model individually. We need distributed learning to leverage everyone's knowledge quickly.

Distributed Online Learning
Definition:
- Agent presents an example
- User responds with a reward r
- Agent updates the models
Challenges:
- User feedback data too few → Distributed Learning
- Everyone has different preferences → Personalization

Everyone has different preferences, so we need to train personalized models. This means we will need to update many models simultaneously.

Motivation / Distributed Learning / Personalization / Experiments

Now let's look at distributed learning first and set personalization aside for the moment.

Distributed Gradient Descent
Bulk Synchronous Parallel (Hadoop & Spark): ~thousands of interactions to converge

A very popular distributed learning approach is gradient descent, but it is difficult to apply in our scenario. For example, bulk synchronous parallel requires thousands of interactions from each user to converge, which can be quite annoying.
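For reference, a minimal sketch of one bulk-synchronous round, under the assumption of a simple squared-loss model; every synchronized round consumes roughly one fresh interaction per user, which is why thousands of interactions are needed.

    import numpy as np

    def bsp_round(w, user_batches, lr=0.05):
        # Each worker computes a gradient on its user's latest feedback batch
        # (X, y); a barrier waits for all workers before the averaged step.
        grads = [X.T @ (X @ w - y) / len(y) for X, y in user_batches]
        return w - lr * np.mean(grads, axis=0)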

Distributed Gradient Descent
Stale Synchronous Parallel [Ho et al. '13]: for some users, staleness is forever


A popular method like Stale Synchronous Parallel allows bounded asynchrony and significantly reduces the number of interactions per user when many users are online simultaneously. However, user behaviors differ: some users are very active and click frequently, while some are passive and seldom click. SSP requires the fastest worker to wait for the slowest one if it gets too far ahead. In an online setting we cannot do this, since some users might stay stale forever. We need the asynchrony to be unbounded.

Learning Rate
- Blessing: it is one of the key reasons for PGDs to converge fast
- Challenge: it keeps diminishing, so data that arrives later has smaller and smaller impact
- Restart? Keep a residual constant? Hard to manage

The biggest practical difficulty is the decaying learning rate. If a user comes late into the game, his contribution to the system will be very small. And if we restart the learning periodically, all the optimality properties become irrelevant, because the restart procedure incurs big overheads.
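To see why late contributions shrink, consider the common 1/sqrt(t) schedule (an illustrative assumption; the exact schedule varies):

$$ w_{t+1} = w_t - \eta_t\, g_t, \qquad \eta_t = \frac{\eta_0}{\sqrt{t}} $$

A user whose first update lands at step t = 10^6 moves the model by about 1/1000 of what an update at t = 1 would, so latecomers are effectively muted.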

Alternating Direction Method of Multipliers (ADMMs)

That is why we looked into another popular optimization technique, ADMMs.

ADMMs -- Bulk Synchronous Parallel

ADMMs are very easy to implement in a synchronous fashion. Say these orange cross nodes are client machines: we compute the individual models and their dual variables on the client machines.

ADMMs -- Bulk Synchronous Parallel

The blue plus node represents a server machine: we merge the consensus model on the server node after receiving all users' models.
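Concretely, this is the standard consensus-ADMM split (in my notation, following the textbook formulation rather than the slides): each client i updates its local model w_i and dual variable u_i, and the server averages them into the consensus z.

$$
\begin{aligned}
w_i^{k+1} &= \arg\min_{w_i}\; f_i(w_i) + \tfrac{\rho}{2}\,\lVert w_i - z^k + u_i^k\rVert^2 \\
z^{k+1} &= \tfrac{1}{N}\sum_{i=1}^{N}\bigl(w_i^{k+1} + u_i^k\bigr) \\
u_i^{k+1} &= u_i^k + w_i^{k+1} - z^{k+1}
\end{aligned}
$$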

ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]

[Figure: timeline; master versions V1, V1′, V1″ published at t0, t1, t2]

The question is how to make this completely asynchronous. The strategy we adopt is very simple, like a version control system. For example, at t0, user 1 and user 2 pull the model V1 at the same time and start their model adaptations individually. At t1, user 1 finishes first and pushes its model back; the server merges the change and publishes V1′. At t2, user 2 finishes. Because its model also originated from V1, we can merge it with V1′ easily and publish V1″. This is simple.

ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]

[Figure: timeline; V1, V1′, V1″ at t0-t2, with user 3's pull and push at t3, t4 branching into V2]

Things get complicated when the versions branch out. For example, at t3, user 3 pulls model V1′ and pushes back at t4. Because its model comes from V1′, not from V1, we cannot merge it with V1″ directly. So we branch into V2.

ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]

Weighted Merge
[Figure: timeline; branches V1″ and V2, with weight 1 each, merge into V3]

Now how do we merge V1″ and V2 together? We use a weighted average: because both versions carry one contribution since their common ancestor V1′, they are equally weighted.
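A minimal sketch of this merge rule in Python; weighting each branch by its number of contributions since the common ancestor follows the narration, while the function shape and names are my own illustration.

    import numpy as np

    def weighted_merge(z_a, n_a, z_b, n_b):
        # z_a, z_b: consensus models on the two branches
        # n_a, n_b: contributions accumulated since their common ancestor
        z = (n_a * np.asarray(z_a) + n_b * np.asarray(z_b)) / (n_a + n_b)
        return z, n_a + n_b  # merged model carries the combined count

With n_a = n_b = 1, as in the example above, the two branches are averaged with equal weight.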

ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]

Master Versions
[Figure: timeline of published master versions]

In this way, the server side keeps publishing master versions, and the client side simply pulls the latest one and adapts. The whole system becomes asynchronous.

ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]
- Same convergence rate as Bulk Synchronous Parallel
- No learning rate
- Out-of-order sequences of mini-optimizations
- Continuous Learning

It is easy to reduce this process to a multi-block ADMM program. Since multi-block ADMM converges linearly, our convergence rate remains the same as in the synchronous case. This means the number of interactions per user does not need to be large, as long as enough users are online simultaneously. It also relieves us from managing a learning rate and supports continuous learning.

Motivation / Distributed Learning / Personalization / Experiments

Now let's focus on how to use asynchronous ADMMs to solve the personalization problem.

Personalized Models

We use a regularization term that allows each individual model to diverge a little from the consensus model; gamma controls how much personalization is allowed.

Personalized Models

We then let each client hold a copy of the consensus model and require the local copy to equal the consensus model eventually. This turns the optimization into an ADMM problem.
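In symbols (notation reconstructed from the narration, not taken from the slides; the placement of gamma here is one common convention and the paper's may differ): each user i keeps a personal model w_i tied to a local copy z_i of the consensus by the gamma-regularizer, and the consensus constraint gives the ADMM form.

$$
\min_{\{w_i\},\,\{z_i\},\, z}\;\sum_{i=1}^{N}\Bigl( f_i(w_i) + \frac{\gamma}{2}\,\lVert w_i - z_i\rVert^2 \Bigr)
\quad \text{s.t.}\quad z_i = z,\; i = 1,\dots,N.
$$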

Personalized Models
The personalization strength:
- Allows divergence of personal models from the consensus model
- Improves relevance
- Improves convergence (speed)

Personalization improves relevance and convergence at the same time. Most of the iterations ADMMs spend on convergence go toward reaching agreement among the individual models; once we allow divergence, ADMMs converge much faster.

Motivation / Distributed Learning / Personalization / Experiments

Facial Expression Recognition

The first experiment we did is recognizing human facial expressions. We first align the face landmarks and compute features, then feed the feature vector into a classifier to decide whether a person is happy or not.
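As a point of reference, here is a minimal sketch of the classification step using scikit-learn's random forest, matching the baseline named on the next slide; landmark alignment and feature extraction are abstracted into X, and the random data is a stand-in for real landmark-derived features.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # X: landmark-derived feature vectors; y: 1 = happy, 0 = not happy.
    # Random placeholders stand in for the real aligned-landmark features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))
    y = rng.integers(0, 2, size=200)

    baseline = RandomForestClassifier(n_estimators=100, random_state=0)
    baseline.fit(X, y)
    print("training accuracy:", baseline.score(X, y))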

Facial Expression Recognition

The baseline is a random forest model (a stronger model than a linear model); the personalized model does much better.

Facial Expression Recognition

Accuracy broken down by personalization strength: gamma = 50 does the best.

The vertical lines represent the variance of the TPR at a given FPR point, to account for the