Movie2Books by Sumin Tang
Data Sources and Processing Flow
20M Reviews Genres
2800 Movies & 1000 Books with 20+ reviews & no missing attributes
Similarity Scores
Book Recommendations for Each Movie
Book cover images
Collaborative filtering using user rating scores? Unfortunately the data is too sparse...
Poor performance even after SVD
80% of movie-book pairs have no users in common
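The sparsity figure above can be checked with a simple overlap count. The following is a minimal sketch with made-up toy data, not the author's code; the dictionaries and the function name `fraction_without_common_users` are hypothetical.

```python
from itertools import product

# Hypothetical toy data: item -> set of user ids who rated it.
movie_raters = {
    "movie_1": {"u1", "u2", "u3"},
    "movie_2": {"u4"},
}
book_raters = {
    "book_1": {"u2", "u5"},
    "book_2": {"u6"},
    "book_3": {"u7", "u8"},
}

def fraction_without_common_users(movies, books):
    """Fraction of movie-book pairs whose rater sets do not overlap."""
    pairs = list(product(movies, books))
    empty = sum(1 for m, b in pairs if not (movies[m] & books[b]))
    return empty / len(pairs)

frac = fraction_without_common_users(movie_raters, book_raters)
print(f"{frac:.0%} of movie-book pairs have no common user")
```

When most pairs come out empty like this, rating-based collaborative filtering has almost no signal to work with, which motivates the content-based metrics that follow.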
Similarity Metrics for Movie-Book Pairs
Review Text: Cosine Similarity (C)
Genres: Jaccard Similarity (J)
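As a rough illustration of the two metrics, here is a minimal sketch in plain Python. The slides do not say how review text was vectorized (raw counts, TF-IDF, or something else), so the raw word counts used here are an assumption, and the sample texts and genre sets are invented.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw word-count vectors (an assumed weighting)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

# Hypothetical movie-book pair.
C = cosine_similarity("a dark space adventure", "a space adventure story")
J = jaccard_similarity({"sci-fi", "adventure"}, {"sci-fi", "adventure", "drama"})
print(C, J)
```

Both scores fall between 0 and 1 for these inputs, which makes them convenient to combine into a single score.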
Final Similarity Score
Validation
Users liked movie A
Users liked book B
Users liked both
• Based on rating scores from users who rated both movies and books
• For each movie, calculate the Jaccard index between the movie and:
  – Jrec: recommended books
  – Jbase: all the books
• Median(Jrec/Jbase) = 26: people are 26x more likely to like a Movie2Books recommendation than the random baseline
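The validation steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the toy "liked" sets and the `recommended` mapping are hypothetical, and averaging the Jaccard index over books to get Jrec and Jbase is an assumption, since the slides do not say how the per-book values were aggregated.

```python
from statistics import mean, median

def jaccard(a, b):
    """|users who liked both| / |users who liked either|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy data: item -> set of users who liked it.
liked_movie = {
    "movie_1": {"u1", "u2", "u3", "u4"},
    "movie_2": {"u5", "u6"},
}
liked_book = {
    "book_1": {"u1", "u2", "u3"},
    "book_2": {"u5", "u6", "u7"},
    "book_3": {"u4", "u8"},
    "book_4": {"u9"},
}
# Hypothetical recommendations: movie -> recommended books.
recommended = {"movie_1": ["book_1"], "movie_2": ["book_2"]}

def lift(movie):
    """Jrec / Jbase for one movie (assumes Jbase > 0)."""
    fans = liked_movie[movie]
    j_rec = mean(jaccard(fans, liked_book[b]) for b in recommended[movie])
    j_base = mean(jaccard(fans, users) for users in liked_book.values())
    return j_rec / j_base

ratios = [lift(m) for m in recommended]
print("Median(Jrec/Jbase) =", median(ratios))
```

A median ratio well above 1 indicates that the recommended books share far more fans with the movie than a randomly chosen book does.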
Out of 20 million reviews from 3.7 million users, about half of the reviews were provided by 10% of the users.
[Figure: share of reviews from the top 10% of users; panels: Books, Movies]
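A concentration statistic like "half of the reviews come from 10% of the users" can be computed as below. This is a minimal sketch on invented data; the function name `top_user_share` and the toy review list are hypothetical.

```python
import math
from collections import Counter

def top_user_share(review_user_ids, top_fraction=0.10):
    """Share of all reviews written by the most active `top_fraction` of users."""
    counts = sorted(Counter(review_user_ids).values(), reverse=True)
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)

# Hypothetical toy data: one user id per review.
# Ten users in total; one writes 9 reviews, the other nine write 1 each.
reviews = ["u0"] * 9 + [f"u{i}" for i in range(1, 10)]
share = top_user_share(reviews)
print(f"Top 10% of users wrote {share:.0%} of the reviews")
```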
Some fun stuff…
Is this a highly rated movie at Amazon?
[Figure: Ratings of the Movie, Ratings of All Movies, Re-scaled scores]
Most vs Least Reviewed Items
• Both have very skewed rating distributions, with a mode of 5
• The most reviewed items have a higher fraction of 5s: popular products are indeed more liked by people.
[Figure panels: Books, Movies]
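The comparison of most and least reviewed items comes down to two summary numbers per group: the mode of the ratings and the fraction of 5-star ratings. A minimal sketch, using invented rating lists rather than the real data:

```python
from statistics import mode

def five_star_fraction(ratings):
    """Fraction of ratings equal to 5."""
    return sum(1 for r in ratings if r == 5) / len(ratings)

# Hypothetical ratings for the two groups of items.
most_reviewed = [5, 5, 5, 5, 4, 5, 3, 5, 1, 5]
least_reviewed = [5, 4, 5, 2, 5, 3, 1, 5, 4, 3]

for name, ratings in [("most", most_reviewed), ("least", least_reviewed)]:
    print(name, "mode:", mode(ratings), "fraction of 5s:", five_star_fraction(ratings))
```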