Apache Mahout Algorithms

Mahout Algorithms Mahmut Karakaya


Transcript of Apache Mahout Algorithms

Page 1: Apache Mahout Algorithms

Mahout Algorithms
Mahmut Karakaya

Page 2: Apache Mahout Algorithms

Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo

Page 3: Apache Mahout Algorithms

What mahout means
Elephant rider in Hindi

Page 4: Apache Mahout Algorithms

What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Mahout In Action
- [email protected]
- Release 0.9 (1-Feb-2014)

Page 5: Apache Mahout Algorithms

Who uses Mahout

Page 6: Apache Mahout Algorithms

Mahout in Apache Foundation

Page 7: Apache Mahout Algorithms

overstock.com saves $2m a year

Judd Bagley Saum Noursalehi

Page 8: Apache Mahout Algorithms

Others
- Weka (Machine Learning Library)
- LensKit (GroupLens)
- easyrec (REST API)
- Write it yourself :)

Page 9: Apache Mahout Algorithms

Need to know ML?

Page 10: Apache Mahout Algorithms

Need to know ML?
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData

Page 11: Apache Mahout Algorithms

Data Model (u,i,r)

Page 12: Apache Mahout Algorithms

Similarity

Page 13: Apache Mahout Algorithms

Cosine Similarity

Page 14: Apache Mahout Algorithms

Cosine Similarity
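The slide content for cosine similarity did not survive extraction, so here is a minimal sketch of the standard definition (dot product divided by the product of the vector norms). The rating vectors are hypothetical, and this is plain Python rather than Mahout's own similarity classes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical ratings given to two items by the same three users.
item_a = [5, 3, 4]
item_b = [4, 3, 5]
print(round(cosine_similarity(item_a, item_b), 2))  # 0.98
```

A value near 1 means the two items were rated in similar proportions by the same users; orthogonal vectors give 0.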

Page 15: Apache Mahout Algorithms

Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
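As a rough illustration of the `userId, itemId, rating` format, the sketch below parses comma-separated lines into a nested dictionary. The function name `load_preferences` and the sample rows are hypothetical; Mahout itself reads such files through its own data model classes.

```python
from collections import defaultdict

def load_preferences(lines):
    """Parse 'userId,itemId,rating' lines into {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split(",")
        prefs[int(user)][int(item)] = float(rating)
    return dict(prefs)

# Hypothetical sample data in the (u, i, r) format.
sample = ["1,101,5.0", "1,102,3.0", "2,101,2.0"]
prefs = load_preferences(sample)
print(prefs[1][102])  # 3.0
```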

Page 16: Apache Mahout Algorithms

Item Based - Similarity Matrix (Item-Item)
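One way to picture the item-item similarity matrix: invert the user ratings into per-item vectors and compare every pair of items. The sketch below assumes cosine similarity and treats unrated entries as zero; the function name and data are hypothetical and this is not Mahout's implementation.

```python
from itertools import combinations
import math

def item_similarity_matrix(prefs):
    """Cosine similarity for every item pair, from {user: {item: rating}}."""
    # Invert to item -> {user: rating}.
    by_item = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item.setdefault(item, {})[user] = rating

    sims = {}
    for a, b in combinations(sorted(by_item), 2):
        ra, rb = by_item[a], by_item[b]
        # Only users who rated both items contribute to the dot product.
        dot = sum(ra[u] * rb[u] for u in ra.keys() & rb.keys())
        norm_a = math.sqrt(sum(v * v for v in ra.values()))
        norm_b = math.sqrt(sum(v * v for v in rb.values()))
        sim = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
        sims[(a, b)] = sims[(b, a)] = sim  # the matrix is symmetric
    return sims

# Hypothetical ratings: user 1 rated items 10 and 20, user 2 rated item 10.
sims = item_similarity_matrix({1: {10: 1.0, 20: 1.0}, 2: {10: 1.0}})
print(round(sims[(10, 20)], 4))  # 0.7071
```

The matrix has one entry per item pair, so its size depends on the number of items rather than the number of users.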

Page 17: Apache Mahout Algorithms

Item Based - Predict
- Weighted Sum:

r^(3,1) = 2 * 0.91 + ...
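The slide's formula is truncated, so the sketch below does not try to complete it. It shows the usual weighted-sum form instead: each rating the user has given is weighted by the similarity between that item and the target item, then normalized by the sum of absolute similarities. The normalization step and all numbers are assumptions for illustration.

```python
def predict_rating(user_ratings, similarities):
    """Weighted-sum estimate of a user's rating for one target item.

    user_ratings: {item: rating} for items the user has already rated.
    similarities: {item: similarity between that item and the target item}.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item)
        if sim is None:
            continue  # no similarity known for this item
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None

# Hypothetical values: the user rated items 2 and 3; the target is item 1.
user_ratings = {2: 2.0, 3: 4.0}
sims_to_item_1 = {2: 0.5, 3: 0.5}
print(predict_rating(user_ratings, sims_to_item_1))  # 3.0
```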

Page 18: Apache Mahout Algorithms

Item Based

Page 19: Apache Mahout Algorithms

Item Based.. Why in Mahout

- Generic recommender like User Based
- User Based similarity matrix is heavier

Page 20: Apache Mahout Algorithms

Singular Value Decomposition (SVD)

Page 21: Apache Mahout Algorithms

SVDRecommender

Page 22: Apache Mahout Algorithms

Factorization

Page 23: Apache Mahout Algorithms

Factorizer
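To make the factorizer idea concrete, here is a minimal sketch that learns k-dimensional user and item factors by stochastic gradient descent with L2 regularization. It is a stand-in for illustration only, not Mahout's factorizer code; the ratings, the learning rate `lr`, and the epoch count are all assumptions.

```python
import math
import random

def predict(P, Q, u, i):
    """Dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

def factorize(ratings, num_users, num_items, k=3, lam=0.1,
              lr=0.02, epochs=300, seed=0):
    """Learn factors P (users x k) and Q (items x k) from (u, i, r) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus lam * (|p|^2 + |q|^2).
                P[u][f] += lr * (err * qi - lam * pu)
                Q[i][f] += lr * (err * pu - lam * qi)
    return P, Q

# Hypothetical (user, item, rating) triples with zero-based indices.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize(ratings, num_users=3, num_items=3)
print(round(rmse(ratings, P, Q), 3))  # training error after fitting
```

Once the factors are learned, any missing user-item rating can be estimated with `predict(P, Q, u, i)`.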

Page 24: Apache Mahout Algorithms

Singular Value Decomposition (SVD)

Page 25: Apache Mahout Algorithms

Singular Value Decomposition (SVD)

Let's say m = 10K, n = 1K, k = 10:

m * n → m * k + n * k
10M → 100K + 10K
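The storage arithmetic on this slide can be checked with a few lines. The helper name `factor_storage` is hypothetical; the values of m, n and k are the ones from the slide.

```python
def factor_storage(m, n, k):
    """Entry counts for a full m x n matrix versus its two rank-k factors."""
    full = m * n
    factored = m * k + n * k
    return full, factored

full, factored = factor_storage(m=10_000, n=1_000, k=10)
print(full, factored)  # 10000000 110000
```

With these numbers the two factor matrices together hold 110K entries instead of 10M.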

Page 26: Apache Mahout Algorithms

SVD k=3 λ=0.1 a=40 c.a=1

Page 27: Apache Mahout Algorithms

SVD k=3 λ=0.1 a=40 c.a=1

Page 28: Apache Mahout Algorithms

SVD k=3 λ=0.1 a=40 c.a=10

Page 29: Apache Mahout Algorithms

SVD.. Why in Mahout
- Won Netflix Prize
- Parallelizable by row, column

Page 30: Apache Mahout Algorithms

Map / Reduce Mapper
1.txt 2.txt
Hello Hello
Hello

Page 31: Apache Mahout Algorithms

Map / Reduce Mapper

Page 32: Apache Mahout Algorithms

Map / Reduce Mapper
Map1 Map2
Hello,1 Hello,1
Hello,1

Page 33: Apache Mahout Algorithms

Map / Reduce Reducer

Page 34: Apache Mahout Algorithms

Map / Reduce Reducer
Hello,3
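The word-count walk-through on these slides can be simulated in memory. This is a sketch of the idea only, not Hadoop code. Which file holds the second "Hello" is not recoverable from the transcript, so the file contents below are an assumption.

```python
from collections import defaultdict

def map_phase(text):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    """Sum the counts for each distinct key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Assumed contents of the two input files.
files = {"1.txt": "Hello", "2.txt": "Hello Hello"}

mapped = []
for name, text in files.items():
    mapped.extend(map_phase(text))  # one mapper per file

print(mapped)                # [('Hello', 1), ('Hello', 1), ('Hello', 1)]
print(reduce_phase(mapped))  # {'Hello': 3}
```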

Page 35: Apache Mahout Algorithms

Map / Reduce ItemBased

Page 36: Apache Mahout Algorithms

Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData

Page 37: Apache Mahout Algorithms

Map / Reduce ItemBased

Page 38: Apache Mahout Algorithms

Map / Reduce ItemBased

Page 39: Apache Mahout Algorithms

Map / Reduce ItemBased
Map 1

Page 40: Apache Mahout Algorithms

Map / Reduce ItemBased
Reduce 1

Page 41: Apache Mahout Algorithms

Map / Reduce ItemBased
Reduce 1

Page 42: Apache Mahout Algorithms

Map / Reduce ItemBased
Map 2

Page 43: Apache Mahout Algorithms

Map / Reduce ItemBased
Reduce 2

Page 44: Apache Mahout Algorithms

Map / Reduce ItemBased
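The diagrams for the Map 1 / Reduce 1 / Map 2 / Reduce 2 steps are missing from the transcript. The sketch below is therefore an assumption about what they show: a commonly described co-occurrence pipeline for boolean data, in which the first pass groups items by user and the second pass counts how often two items appear together. Function names and input rows are hypothetical, and this is not the code behind `RecommenderJob`.

```python
from collections import defaultdict
from itertools import permutations

def map1(line):
    """Map 1 (assumed): parse 'userId,itemId' into a (user, item) pair."""
    user, item = line.split(",")[:2]
    return int(user), int(item)

def reduce1(pairs):
    """Reduce 1 (assumed): collect each user's items into one set."""
    vectors = defaultdict(set)
    for user, item in pairs:
        vectors[user].add(item)
    return dict(vectors)

def map2(items):
    """Map 2 (assumed): emit ((itemA, itemB), 1) for each ordered item pair."""
    return [((a, b), 1) for a, b in permutations(sorted(items), 2)]

def reduce2(pair_counts):
    """Reduce 2 (assumed): sum the counts into a co-occurrence matrix."""
    cooccurrence = defaultdict(int)
    for pair, count in pair_counts:
        cooccurrence[pair] += count
    return dict(cooccurrence)

# Hypothetical boolean preference data (no rating column).
lines = ["1,101", "1,102", "2,101", "2,102", "2,103"]
vectors = reduce1(map1(line) for line in lines)
emitted = [kv for items in vectors.values() for kv in map2(items)]
cooccurrence = reduce2(emitted)
print(cooccurrence[(101, 102)])  # 2
```

Items 101 and 102 were chosen together by both users, so their co-occurrence count is 2, while 101 and 103 share only one user.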

Page 45: Apache Mahout Algorithms

Map / Reduce.. Why in Mahout

Page 46: Apache Mahout Algorithms

Clustering
- KMeans Clustering (SM,MR)
- Fuzzy kMeans (SM,MR)
- Canopy Clustering (SM,MR)
- Dirichlet (SM,MR)

Page 47: Apache Mahout Algorithms

Kmeans

Page 48: Apache Mahout Algorithms

Kmeans
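Since the k-means figures are not in the transcript, here is a minimal single-machine sketch of the standard algorithm: assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignment stops changing. The points, seed and iteration cap are hypothetical, and this is not Mahout's KMeans driver.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignment

# Two hypothetical, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The first three points end up sharing one label and the last three the other; which label is 0 and which is 1 depends on the random start.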

Page 49: Apache Mahout Algorithms

Clustering Evaluation

Page 50: Apache Mahout Algorithms

Clustering Intra Distance

Page 51: Apache Mahout Algorithms

Clustering Inter Distance
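The two evaluation measures can be sketched as follows, under one common definition: intra-cluster distance as the mean pairwise distance between members of a cluster (smaller is tighter), and inter-cluster distance as the distance between two cluster centroids (larger is better separated). These definitions and the sample clusters are assumptions; Mahout's evaluators may compute them differently.

```python
from itertools import combinations
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def intra_distance(points):
    """Mean pairwise Euclidean distance inside one cluster."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # a single-point cluster has no internal spread
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def inter_distance(cluster_a, cluster_b):
    """Euclidean distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]
print(intra_distance(cluster_a), inter_distance(cluster_a, cluster_b))  # 2.0 4.0
```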

Page 52: Apache Mahout Algorithms

Clustering.. Why in Mahout
- Sparsity
- ~10m of 11m users registered 1 Sony product

Page 53: Apache Mahout Algorithms

Clustering.. Why in Mahout
- Group Recommendation
- Cluster Based Recommendation

Page 54: Apache Mahout Algorithms

Create WishList Experience

- Mahout (SVD)
- Play
- Heroku
- MongoLab
- Rest

http://recommenderplaybbs.herokuapp.com/

Page 55: Apache Mahout Algorithms

Thank you