Develop RecSys with HAWQ and MADlib · xunzhangthu.org/talk/Develop_RecSys_with_HAWQ.pdf · Meetup @Shanghai

HAWQ MADlib · Hong Wu, Ruilong Huo · Meetup @Shanghai, 2016/04/28


Page 1

HAWQ MADlib

Hong Wu, Ruilong Huo · @Shanghai, 2016/04/28

Page 2

Agenda

• RecSys Review

• Recommendation with HAWQ

• Demo

• Q&A

Page 3

Agenda

• RecSys Review

Page 4

What is RecSys

• Recommendation

• More than a search engine

  • explore the world without a query

• Personalization

  • discover items based on preference

• Self-evolving

Page 5

Industry Trends

Page 6

System Design

• Content-based recommendation

  • features from the item itself

• Collaborative filtering

  • user-user similarity & item-item similarity

• Model-based recommendation

  • factor model

Page 8

Agenda

• RecSys Review

• Recommendation with HAWQ

Page 9

System Overview

Model Pipeline: Import Training Data → Matrix Factorization → Compute Item-Item Similarities → Make Recommendation

Preparation: Map the original interaction data into user-item-rating tuples and import them into the HAWQ database.

Latent Factor Model: Use the lmf_igd_run function in MADlib to train user factors and item factors.

Similarity: Apply a CF strategy to compute similarity.

Recommendation: Three ways to return recommendations (see later slides).

Page 10

Preparation

• User data

• Item data

• Interaction data

Page 11

Preparation

• User data

• Item data

• Interaction data

  • {(user, item): preference weight}
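The preparation step can be sketched in a few lines of Python. This is only an illustration of turning raw interaction records into `(user, item, preference weight)` tuples; the event names and their weights in `EVENT_WEIGHTS` are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical event weights: illustrative values only, not from the talk.
EVENT_WEIGHTS = {"view": 1.0, "like": 3.0, "purchase": 5.0}


def to_rating_tuples(events):
    """Aggregate raw (user, item, event) records into (user, item, weight) tuples."""
    weights = defaultdict(float)
    for user, item, event in events:
        # Unknown event types contribute nothing to the preference weight.
        weights[(user, item)] += EVENT_WEIGHTS.get(event, 0.0)
    # Sort so the output order is deterministic before loading it into a table.
    return sorted((u, i, w) for (u, i), w in weights.items())


events = [
    (1, 10, "view"),
    (1, 10, "like"),
    (2, 10, "purchase"),
    (2, 11, "view"),
]
tuples = to_rating_tuples(events)
```

Each resulting tuple corresponds to one row of the user-item-rating table that would then be imported into the database.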

Page 12

Latent Factor Model

• Matrix factorization

  • model each user and each item as a vector of factors
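The slide does not spell out the objective, so the following is the common textbook formulation rather than necessarily the exact loss used in the talk. With user factors $U$, item factors $V$, and $\Omega$ the set of observed (user, item) pairs, a preference is predicted by a dot product and the factors are fit by minimizing squared error on the observed entries:

```latex
\hat{r}_{ui} = \mathbf{u}_u^{\top}\mathbf{v}_i,
\qquad
\min_{U,\,V} \sum_{(u,i)\in\Omega} \left( r_{ui} - \mathbf{u}_u^{\top}\mathbf{v}_i \right)^2
```

Here $\mathbf{u}_u$ is the row of $U$ for user $u$ and $\mathbf{v}_i$ is the row of $V$ for item $i$; the length of both vectors is the number of factors.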

Page 13

Latent Factor Model

Page 14

Matrix Factorization in MADlib

    SELECT madlib.lmf_igd_run(
        output,      -- result table to store model
        input,       -- input table
        row,         -- user attribute name
        col,         -- item attribute name
        value,       -- preference weight
        row_dim,     -- the number of users
        col_dim,     -- the number of items
        max_rank,    -- the number of factors
        stepsize,    -- learning rate
        iterations,  -- training rounds
        tolerance
    );
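To make the parameters above concrete, here is a minimal pure-Python sketch of low-rank matrix factorization trained by incremental (stochastic) gradient descent, which is the technique the `igd` in the function name appears to refer to. It is not MADlib's implementation: the function `train_factors`, the toy ratings, and the initialization range are assumptions, and only `max_rank`, `stepsize`, and `iterations` mirror names from the slide.

```python
import random


def train_factors(ratings, n_users, n_items, max_rank=2, stepsize=0.02,
                  iterations=500, seed=0):
    """Fit user factors U and item factors V to (user, item, rating) tuples."""
    rng = random.Random(seed)
    # Small random initialization; the range is an illustrative choice.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(max_rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(max_rank)] for _ in range(n_items)]
    for _ in range(iterations):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(max_rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on the squared error of this single rating.
                U[u][k] += stepsize * err * vk
                V[i][k] += stepsize * err * uk
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error of the factor model on the given ratings."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


# Toy data: 3 users, 2 items, zero-based ids.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 5.0)]
U0, V0 = train_factors(ratings, n_users=3, n_items=2, iterations=0)
U, V = train_factors(ratings, n_users=3, n_items=2)
before, after = rmse(ratings, U0, V0), rmse(ratings, U, V)
```

Comparing `before` and `after` shows how much the training rounds reduce the error on the observed ratings.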

Page 15

Similarity

• We have user factors U and item factors V

• Apply a collaborative filtering strategy

  • compute item-item similarities

  • less computation, large scale

  • good diversity vs original CF

Page 16

Similarity

    CREATE TABLE item_sim AS (
        -- One row per unordered item pair; distinct aliases avoid two columns named iid.
        SELECT ifacA.iid AS iid_a,
               ifacB.iid AS iid_b,
               madlib.cosine_similarity(ifacA.rating, ifacB.rating) AS sim
        FROM ifactor AS ifacA, ifactor AS ifacB
        WHERE ifacA.iid < ifacB.iid
        ORDER BY sim DESC
        LIMIT K
    );
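The same computation can be sketched outside the database. The Python below is an illustration under assumed data: `item_factors` is a hypothetical mapping from item id to factor vector, and `top_k_item_pairs` is a helper written for this example, not part of MADlib.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_item_pairs(item_factors, k):
    """Return the k most similar (item_a, item_b, sim) pairs with item_a < item_b."""
    pairs = [
        (cosine_similarity(item_factors[a], item_factors[b]), a, b)
        # combinations over sorted ids yields each unordered pair once, with a < b.
        for a, b in combinations(sorted(item_factors), 2)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return [(a, b, sim) for sim, a, b in pairs[:k]]


# Hypothetical item factors keyed by item id.
item_factors = {10: [1.0, 0.0], 11: [2.0, 0.0], 12: [0.0, 1.0]}
top = top_k_item_pairs(item_factors, k=1)
```

Items 10 and 11 point in the same direction, so they form the most similar pair even though their vector lengths differ.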

Page 17

Recommendation

• Use item similarity directly

radio FM, book, like-like

Page 18

Recommendation

• Use user factor and item factor directly

  • dot product
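A minimal sketch of this strategy, assuming the factors are already trained: score every unseen item by the dot product of the user's factor vector with the item's factor vector, then return the highest-scoring items. The names `predict` and `recommend` and the sample vectors are hypothetical.

```python
def predict(user_factor, item_factor):
    """Predicted preference: dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(user_factor, item_factor))


def recommend(user_factor, item_factors, seen, n):
    """Return the ids of the n highest-scoring items the user has not seen."""
    scored = [
        (predict(user_factor, factor), iid)
        for iid, factor in item_factors.items()
        if iid not in seen
    ]
    # Highest score first; ties are broken by the smaller item id.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [iid for _, iid in scored[:n]]


# Hypothetical trained factors.
user_factor = [1.0, 2.0]
item_factors = {10: [1.0, 0.0], 11: [0.0, 1.0], 12: [1.0, 1.0]}
recs = recommend(user_factor, item_factors, seen={12}, n=2)
```

Note that this scores every item for every user, so the cost grows with the product of the number of users and the number of items.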

Page 19

Recommendation

• Use item similarity to predict rating

  • weighted sum
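The slide names the weighted sum without giving a formula, so the sketch below uses a common item-based CF form as an assumption: the predicted rating for a target item is the similarity-weighted average of the ratings the user gave to other items. The function `predict_rating` and the sample data are hypothetical.

```python
def predict_rating(target_item, user_ratings, item_sim):
    """Similarity-weighted average of the user's ratings on other items.

    item_sim maps (smaller_id, larger_id) to a similarity score.
    Returns None when no rated item has a nonzero similarity to the target.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        if item == target_item:
            continue
        key = (min(item, target_item), max(item, target_item))
        sim = item_sim.get(key, 0.0)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical similarities and one user's known ratings.
item_sim = {(10, 11): 0.5, (10, 12): 0.75, (11, 12): 0.25}
user_ratings = {10: 4.0, 11: 2.0}
pred = predict_rating(12, user_ratings, item_sim)
```

Because item 12 is more similar to item 10 than to item 11, the prediction lands closer to the rating given to item 10.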

Page 20

Feed Stream

Playlist

Movie Guess

Page 21

Agenda

• RecSys Review

• Recommendation with HAWQ

• Demo

Page 22

Try it Out

• https://github.com/xunzhang/xz_talk/tree/master/develop_recsys_with_hawq

Page 23

Why Choose HAWQ to Build Machine Learning Applications

• Simple & Efficient

  • high-level abstraction, no transfer cost

  • focus more on the application's own logic

  • data flow thinking

• High scalability & performance

  • distributed machine learning with MADlib

  • parallel query engine with HAWQ

Page 24

Summary

• Brief intro of RecSys

• A hybrid model pipeline based on HAWQ

Page 25

Agenda

• RecSys Review

• Recommendation with HAWQ

• Demo

• Q&A

Page 26

Thanks. Questions?

Page 27

Backup Slide

Solution of CF at large scale: AB_similarity.

Storage disaster of dot product based on factor model: build a Ball tree index.