Movie2Books by Sumin Tang


Recommending Books from Your Favorite Movie

Sumin Tang

Movie2Books.com

Data Sources and Processing Flow

• 20M reviews; genres

• 2,800 movies & 1,000 books with 20+ reviews and no missing attributes

• Similarity scores

• Book recommendations for each movie

• Book cover images

Collaborative filtering using user rating scores? Unfortunately, the data is too sparse...

Poor performance even after SVD

80% of movie-book pairs have no users in common
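The sparsity statistic above can be checked with a short sketch like this one. It assumes each movie and book is mapped to the set of user IDs who rated it; `zero_overlap_fraction` is a hypothetical helper, not part of the original project.

```python
def zero_overlap_fraction(movie_users, book_users):
    """Fraction of (movie, book) pairs that share no reviewer.

    movie_users / book_users: dict mapping item ID -> set of user IDs who rated it.
    """
    total = len(movie_users) * len(book_users)
    if total == 0:
        return 0.0
    disjoint = sum(
        1
        for m_users in movie_users.values()
        for b_users in book_users.values()
        if m_users.isdisjoint(b_users)  # no user rated both items
    )
    return disjoint / total


movie_users = {"m1": {"u1", "u2"}, "m2": {"u3"}}
book_users = {"b1": {"u1"}, "b2": {"u4"}}
print(zero_overlap_fraction(movie_users, book_users))  # 0.75
```

With 2,800 movies and 1,000 books this is 2.8 million set comparisons, which is manageable in pure Python; a sparse user-item matrix would be the faster route at larger scale.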

Similarity Metrics for Movie-Book Pairs

Review text: cosine similarity (C)

Genres: Jaccard similarity (J)

Final Similarity Score
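The two metrics above can be sketched in plain Python. This assumes review text is reduced to raw bag-of-words term counts with a simple lower-case word tokenizer; the real pipeline may use a different tokenizer or weighting (TF-IDF is a common alternative). The transcript does not give the formula that combines C and J into the final score, so the sketch stops at the two components. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(texts):
    """Bag-of-words term counts over all review texts of one item (lower-cased word tokens)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity C between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(genres_a, genres_b):
    """Jaccard similarity J = |A ∩ B| / |A ∪ B| between two genre sets."""
    a, b = set(genres_a), set(genres_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


movie = term_counts(["A dark, gripping thriller."])
book = term_counts(["Gripping thriller with a twist"])
print(round(cosine_similarity(movie, book), 3))  # 0.671
print(round(jaccard_similarity(["Thriller", "Crime"], ["Thriller", "Mystery"]), 3))  # 0.333
```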

Validation

[Figure: overlap between users who liked movie A and users who liked book B; the intersection is users who liked both]

• Based on rating scores from users who rated both movies and books

• For each movie, calculate the Jaccard index between the movie and:
  – Jrec: the recommended books
  – Jbase: all the books

• Median(Jrec / Jbase) = 26: people are 26 times as likely to like a Movie2Books recommendation as a book from the random baseline.
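The validation metric above can be sketched as follows. Two assumptions are made here: which ratings count as "liked" is not stated (the sketch takes the liked-user sets as given), and Jrec and Jbase are taken as the mean Jaccard index over the recommended books and over all books, respectively. The helper names `jaccard` and `median_lift` are hypothetical.

```python
from statistics import mean, median


def jaccard(users_a, users_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of users."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


def median_lift(liked_movie, liked_book, recommendations):
    """Median over movies of Jrec / Jbase.

    liked_movie / liked_book: dict item ID -> set of users who liked the item.
    recommendations: dict movie ID -> list of recommended book IDs.
    Jrec / Jbase = mean Jaccard index over recommended / all books (assumption).
    """
    ratios = []
    for movie, rec_books in recommendations.items():
        users = liked_movie[movie]
        j_base = mean(jaccard(users, b_users) for b_users in liked_book.values())
        if not rec_books or j_base == 0:
            continue  # ratio undefined for this movie
        j_rec = mean(jaccard(users, liked_book[b]) for b in rec_books)
        ratios.append(j_rec / j_base)
    return median(ratios) if ratios else None


liked_movie = {"m1": {"u1", "u2", "u3"}, "m2": {"u4", "u5"}}
liked_book = {"b1": {"u1", "u2"}, "b2": {"u9"}, "b3": {"u4"}, "b4": {"u8"}}
recommendations = {"m1": ["b1"], "m2": ["b3"]}
print(median_lift(liked_movie, liked_book, recommendations))  # 4.0
```

The median rather than the mean keeps a few movies with a tiny Jbase (and hence a huge ratio) from dominating the summary.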

Sumin Tang: https://www.linkedin.com/in/sumintang

Out of 20 million reviews from 3.7 million users, about half were written by the most active 10% of users.

[Figure: share of reviews written by the top 10% of users; panels: Books, Movies]
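The "top 10% of users" share above can be computed with a sketch like this, assuming one user ID per review is available. The helper `top_user_review_share` is hypothetical; the percentage is passed as an integer so the size of the top group is computed without float rounding.

```python
from collections import Counter


def top_user_review_share(user_ids, top_percent=10):
    """Share of all reviews written by the most active `top_percent`% of users.

    user_ids: iterable with one user ID per review.
    """
    per_user = Counter(user_ids)
    if not per_user:
        return 0.0
    counts = sorted(per_user.values(), reverse=True)
    # Integer arithmetic avoids float rounding when sizing the top group.
    k = max(1, len(counts) * top_percent // 100)
    return sum(counts[:k]) / sum(counts)


# One prolific user (10 reviews) plus nine users with one review each.
user_ids = ["u0"] * 10 + [f"u{i}" for i in range(1, 10)]
print(round(top_user_review_share(user_ids), 3))  # 0.526
```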

Some fun stuff…

Is this a highly rated movie on Amazon?

[Figure: rating scale from "Don’t like it" to "Really like it"; the ratings of the movie are compared with the ratings of all movies to give re-scaled scores]

Most vs Least Reviewed Items

• Both books and movies have very skewed rating distributions, with a mode of 5.

• The most reviewed items have a higher fraction of 5s: popular products are indeed better liked.

[Figure: rating distributions of the most vs. least reviewed items; panels: Books, Movies]
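A minimal sketch of the most-vs-least-reviewed comparison above follows. The slides do not say how "most" and "least" reviewed are cut off, so the `percent` parameter (default 10) is a hypothetical choice; the function names are also illustrative. It pools the ratings of each group and reports the mode and the fraction of 5-star ratings.

```python
from collections import Counter


def rating_summary(ratings):
    """Return (mode, fraction of 5-star ratings) for a non-empty list of 1-5 star ratings."""
    counts = Counter(ratings)
    mode = counts.most_common(1)[0][0]
    return mode, counts[5] / len(ratings)


def most_vs_least_reviewed(item_ratings, percent=10):
    """Summarize pooled ratings of the most- and least-reviewed `percent`% of items.

    item_ratings: dict item ID -> list of star ratings.
    Assumes enough items that the two groups do not overlap.
    """
    ranked = sorted(item_ratings, key=lambda i: len(item_ratings[i]), reverse=True)
    k = max(1, len(ranked) * percent // 100)
    most = [r for i in ranked[:k] for r in item_ratings[i]]
    least = [r for i in ranked[-k:] for r in item_ratings[i]]
    return rating_summary(most), rating_summary(least)


item_ratings = {"popular": [5, 5, 5, 4, 3], "niche": [5, 5, 1, 2]}
print(most_vs_least_reviewed(item_ratings, percent=50))  # ((5, 0.6), (5, 0.5))
```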

Most vs Least Active Users

The least active users give more bad ratings (score = 1): are they more likely to write a review when they really don’t like the product?

[Figure: rating distributions of the most vs. least active users; panels: Books, Movies]
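The pattern above can be measured with a sketch like the one below: rank users by how many reviews they wrote, then compare the share of 1-star ratings in the most and least active groups. The (user_id, rating) input layout and the `percent` cutoff are assumptions, and `one_star_share_by_activity` is a hypothetical name. Such a comparison shows the difference but cannot by itself answer why less active users rate lower.

```python
from collections import defaultdict


def one_star_share_by_activity(reviews, percent=10):
    """Fraction of 1-star ratings among the most- and least-active `percent`% of users.

    reviews: iterable of (user_id, rating) pairs.
    Activity = number of reviews per user; the cutoff is a hypothetical choice.
    """
    by_user = defaultdict(list)
    for user, rating in reviews:
        by_user[user].append(rating)
    ranked = sorted(by_user, key=lambda u: len(by_user[u]), reverse=True)
    k = max(1, len(ranked) * percent // 100)

    def one_star_share(users):
        ratings = [r for u in users for r in by_user[u]]
        return sum(r == 1 for r in ratings) / len(ratings)

    return one_star_share(ranked[:k]), one_star_share(ranked[-k:])


reviews = [("heavy", 5), ("heavy", 4), ("heavy", 5), ("heavy", 1), ("light", 1)]
print(one_star_share_by_activity(reviews, percent=50))  # (0.25, 1.0)
```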