Recommendations and Discovery at StumbleUpon


Sumanth Kolar, Director, Engineering

StumbleUpon’s Mission

Help users find content they did not expect to find

Be the best way to discover new and interesting things from across the Web.

How StumbleUpon works

1. Register
2. Tell us your interests
3. Start Stumbling and rating web pages

We use your interests and behavior to recommend new content for you!

There is an ongoing shift from search to discovery

Discovery is very different from search

Discovery at StumbleUpon vs. Search

• Serendipitous vs. intent driven
• One at a time vs. a list of articles
• Never repeats vs. always repeats
• Constantly adapting vs. fixed results
• Tailored for you vs. impersonal


StumbleUpon Overview

(Diagram: URLs from Discovery (users) and Crawled (automated) sources → Ingestion Pipeline → Sampling → Pass? → URL Index → Rec Engine.)

What are the key challenges to good recommendations?

Pillars of good recommendations

Understand who the user is and what he is interested in.

Separate good content from the bad.

Learn from your recommendations.

Explore various techniques for matching users to content.

User self-reports topics of interest

Part of the sign-up flow…

User’s Interest Graph

(Figure: a user linked to the interests Food/Cooking, Vintage Cars, and Italian Recipes.)

Continually Enhance a User’s Interest Graph

Analyze user’s StumbleUpon history to expand on interest preferences:

• Add/remove topics
• Follow/block particular domains

Leverage social network data:

• Find friends & people to follow

• Find content trending in your social circles

• Find additional interests

Mine internal StumbleUpon rating and sharing data to suggest other stumblers and topics.

Enhanced Interest Graph

(Figure: the interest graph with topics Food/Cooking, Vintage Cars, and Italian Recipes, plus the domains nasa.gov and 1x.com, Friends, and Trending.)
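One way to hold such an interest graph in code is a per-user record of the explicit topics from sign-up plus the implicit signals added later (followed or blocked domains, friends). A minimal sketch; the class, its field names, and the friend id are hypothetical, and treating follow and block as mutually exclusive is an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class InterestGraph:
    """Hypothetical per-user interest graph: explicit topics from sign-up,
    extended over time with domains and social connections."""
    topics: set[str] = field(default_factory=set)
    followed_domains: set[str] = field(default_factory=set)
    blocked_domains: set[str] = field(default_factory=set)
    friends: set[str] = field(default_factory=set)

    def add_topic(self, topic: str) -> None:
        self.topics.add(topic)

    def remove_topic(self, topic: str) -> None:
        self.topics.discard(topic)

    def follow_domain(self, domain: str) -> None:
        # Following a domain undoes any earlier block (assumption).
        self.followed_domains.add(domain)
        self.blocked_domains.discard(domain)

    def block_domain(self, domain: str) -> None:
        # Blocking a domain undoes any earlier follow (assumption).
        self.blocked_domains.add(domain)
        self.followed_domains.discard(domain)

    def allows(self, domain: str) -> bool:
        """Candidate URLs from blocked domains are filtered out."""
        return domain not in self.blocked_domains


# Explicit interests from sign-up, then implicit additions from history and social data.
graph = InterestGraph(topics={"Food/Cooking", "Vintage Cars", "Italian Recipes"})
graph.follow_domain("nasa.gov")
graph.follow_domain("1x.com")
graph.friends.add("friend-123")  # hypothetical friend id
```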

On average, hundreds of URLs are ingested into the StumbleUpon pipeline every minute.

• Sampling key goals:

1. Determine which URLs to sample and which to skip completely

2. Examine sampling results to identify good URLs

• URL features used when sampling:

• Known domain performance (ratings, time spent)
• Content-related features (number of images, number of ads, URL length, etc.)
• User features of the discoverer (spammer vs. trusted user)
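Goal 1 can be pictured as scoring each incoming URL from features like these and sampling only those above a cut-off. A minimal sketch, assuming a hand-weighted logistic score; the field names, weights, bias, and 0.5 threshold are all hypothetical, and a learned model (the slides mention a random forest) would replace the hand-set weights:

```python
import math
from dataclasses import dataclass


@dataclass
class UrlFeatures:
    domain_like_rate: float    # known domain performance: share of positive ratings
    domain_avg_seconds: float  # known domain performance: mean time spent
    num_images: int
    num_ads: int
    url_length: int
    discoverer_trusted: bool   # trusted user vs. suspected spammer


# Hypothetical hand-set weights; a real system would learn these from feedback.
WEIGHTS = {
    "domain_like_rate": 3.0,
    "domain_avg_seconds": 0.02,
    "num_images": 0.05,
    "num_ads": -0.3,
    "url_length": -0.01,
    "discoverer_trusted": 1.5,
}
BIAS = -2.0


def sample_probability(f: UrlFeatures) -> float:
    """Logistic score in (0, 1) for spending sample traffic on this URL."""
    z = BIAS
    z += WEIGHTS["domain_like_rate"] * f.domain_like_rate
    z += WEIGHTS["domain_avg_seconds"] * f.domain_avg_seconds
    z += WEIGHTS["num_images"] * f.num_images
    z += WEIGHTS["num_ads"] * f.num_ads
    z += WEIGHTS["url_length"] * f.url_length
    z += WEIGHTS["discoverer_trusted"] * (1.0 if f.discoverer_trusted else 0.0)
    return 1.0 / (1.0 + math.exp(-z))


def should_sample(f: UrlFeatures, threshold: float = 0.5) -> bool:
    """Goal 1: sample the URL, or skip it completely."""
    return sample_probability(f) >= threshold


good = UrlFeatures(0.8, 40, 5, 0, 40, True)
spammy = UrlFeatures(0.2, 5, 0, 12, 180, False)
print(should_sample(good), should_sample(spammy))  # True False
```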

(Diagram: Webpage → Sampling → Classifier based on User Feedback (Timespent, Ratings), a Random Forest → Recommend.)

Recommendations at StumbleUpon: Sampling

Rating   Time spent
Good     35 sec
Good     22 sec
Bad      15 sec
Good     45 sec
Good     14 sec
Good     28 sec
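The slides name a random forest trained on user feedback (time spent, ratings) as the classifier. As a much simpler stand-in (a plain threshold rule, not a random forest), the sketch below judges a sampled URL from feedback like that in the table, assuming hypothetical cut-offs for like share and median time spent:

```python
from statistics import median


def passes_sampling(feedback, min_like_share=0.7, min_median_seconds=10.0):
    """Goal 2: decide whether a sampled URL looks good enough to recommend widely.

    feedback: list of (rating, seconds) from the sample audience, rating "Good" or "Bad".
    Both thresholds are hypothetical.
    """
    if not feedback:
        return False
    like_share = sum(rating == "Good" for rating, _ in feedback) / len(feedback)
    median_seconds = median(seconds for _, seconds in feedback)
    return like_share >= min_like_share and median_seconds >= min_median_seconds


# The six sample ratings from the table.
sample = [("Good", 35), ("Good", 22), ("Bad", 15), ("Good", 45), ("Good", 14), ("Good", 28)]
print(passes_sampling(sample))  # like share 5/6 ≈ 0.83, median 25 s -> True
```

A learned classifier would take the place of the two fixed thresholds; the rule only shows which signals feed the decision.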

Leveraging In-Network Experts

• Users who thumb up good content and thumb down bad content
• For example:
  – Joe DiMaggio: Baseball
  – Julia Child: Food/Cooking
  – Da Vinci: Art and Architecture
• Ratings from experts are more trustworthy and earn more weight.

Recommendations at StumbleUpon: Experts
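Giving expert ratings more weight can be sketched as a weighted like share, where a rating counts extra only when the rater is an expert on the URL's topic. The multiplier of 3 and the user ids are hypothetical:

```python
def weighted_like_share(ratings, expertise, url_topic, expert_weight=3.0):
    """Share of thumbs-up for a URL, counting experts on the URL's topic more heavily.

    ratings: list of (user_id, thumbs_up)
    expertise: {user_id: set of topics the user is an expert in}
    expert_weight: hypothetical multiplier; non-experts count 1.0
    """
    total = positive = 0.0
    for user_id, thumbs_up in ratings:
        weight = expert_weight if url_topic in expertise.get(user_id, set()) else 1.0
        total += weight
        if thumbs_up:
            positive += weight
    return positive / total if total else 0.0


# Hypothetical ids: one Food/Cooking expert likes the page, two other stumblers do not.
ratings = [("julia", True), ("anon-1", False), ("anon-2", False)]
expertise = {"julia": {"Food/Cooking"}}
print(weighted_like_share(ratings, expertise, "Food/Cooking"))  # 3 / 5 = 0.6
print(weighted_like_share(ratings, expertise, "Baseball"))      # 1 / 3 ≈ 0.33
```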

Challenge: User expectations are different

“I LOVE cars!”-Anonymous Stumbler

“Me too!”-Another Stumbler

• Find users who like content similar to the content you do

• Signals can be ratings, time spent, interests, etc.

• Use the content they’ve liked

Like-Minded Users

(Figure: PLSI-based like-minded users. Three example users' interests: Neuroscience, Astronomy, Space Exploration, Comedy Movies; Astronomy, Space Exploration, Physics, Classic Movies; Vintage Cars, Action Movies, Astronomy, Robotics. Shared themes: Science, Movies.)
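The slides compute like-mindedness with PLSI over ratings, time spent, and interests. As a much simpler stand-in, Jaccard overlap of explicit interests already ranks the three example users sensibly, and the top neighbor's liked content becomes the recommendation. A sketch; the user ids and URLs are hypothetical:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of interests two users have in common: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_interests, others, liked_by, seen, k=1):
    """Recommend unseen URLs liked by the k most similar users.

    others: {user_id: set of interests}; liked_by: {user_id: [urls]}.
    """
    neighbors = sorted(others, key=lambda uid: jaccard(user_interests, others[uid]), reverse=True)
    recs = []
    for uid in neighbors[:k]:
        for url in liked_by.get(uid, []):
            if url not in seen and url not in recs:
                recs.append(url)
    return recs


# The three interest lists from the slide; user ids and URLs are hypothetical.
me = {"Neuroscience", "Astronomy", "Space Exploration", "Comedy Movies"}
others = {
    "stumbler-b": {"Astronomy", "Space Exploration", "Physics", "Classic Movies"},
    "stumbler-c": {"Vintage Cars", "Action Movies", "Astronomy", "Robotics"},
}
liked_by = {"stumbler-b": ["url-1", "url-2"], "stumbler-c": ["url-3"]}

print(jaccard(me, others["stumbler-b"]))  # 2 shared / 6 total ≈ 0.333
print(jaccard(me, others["stumbler-c"]))  # 1 shared / 7 total ≈ 0.143
print(recommend(me, others, liked_by, seen={"url-1"}))  # ['url-2']
```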

Total pairwise similarity calculations
= 50K users × 5 million users × 1K features
= 250 trillion

The Probabilistic Latent Semantic Indexing (PLSI) based similarity framework computes over 500 trillion calculations in less than an hour.

Like-Minded Users: Scaling Challenges
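The slide's arithmetic can be checked directly: 50K × 5M is 250 billion user pairs, and 1K features per pair gives 250 trillion. The sketch below repeats that, shows the per-second rate implied by finishing 500 trillion calculations within an hour, and, as an assumption about the form of the per-pair work, scores a pair as a dot product of latent-topic vectors:

```python
# Figures from the slide.
ACTIVE_USERS = 50_000
ALL_USERS = 5_000_000
FEATURES = 1_000

pairs = ACTIVE_USERS * ALL_USERS  # 250,000,000,000 user pairs (250 billion)
feature_ops = pairs * FEATURES    # 250,000,000,000,000 (250 trillion)

# Rate needed to finish the slide's 500 trillion calculations in one hour.
per_second = 500 * 10**12 / 3600  # ≈ 1.39e11 calculations per second


def topic_similarity(p: list[float], q: list[float]) -> float:
    """Dot product of two users' latent-topic weights (assumed PLSI P(z | u) vectors).

    Costs len(p) multiply-adds per pair, which is why the feature count
    multiplies the pair count.
    """
    return sum(pi * qi for pi, qi in zip(p, q))


print(topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # 0.42 + 0.06 + 0.01 ≈ 0.49
```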

Grow User’s Interest Graph: Implicit + Explicit

(Figure: interest graph combining topics Food/Cooking, Vintage Cars, and Italian Recipes; domains nasa.gov and 1x.com; Experts; Friends; Trending; Like-minded Users.)

Different methods perform differently for different users at different times

(Chart, Recommendation context: contribution of each method (Trending, Follow, Bias domains, Experts, News, Like-minded) for User 1 through User 5, in %.)
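Since different methods work for different users at different times, the source mix can be chosen per user from observed feedback. A minimal sketch, assuming an epsilon-greedy multi-armed bandit (a technique chosen here for illustration, not one the slides specify), with the chart's method names as arms and a thumbs-up as reward; the class name, epsilon, and feedback are hypothetical:

```python
import random


class SourceSelector:
    """Epsilon-greedy choice of recommendation source for one user (hypothetical design).

    Reward is 1 when the user likes the stumble, 0 otherwise.
    """

    def __init__(self, sources, epsilon=0.1, seed=None):
        self.sources = list(sources)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shown = {s: 0 for s in self.sources}
        self.liked = {s: 0 for s in self.sources}

    def like_rate(self, source):
        n = self.shown[source]
        return self.liked[source] / n if n else 0.0

    def choose(self):
        # Explore a random source with probability epsilon; otherwise exploit the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sources)
        return max(self.sources, key=self.like_rate)

    def update(self, source, liked):
        self.shown[source] += 1
        self.liked[source] += int(liked)


# Source names taken from the chart; the feedback is made up.
selector = SourceSelector(
    ["Trending", "Follow", "Bias domains", "Experts", "News", "Like-minded"],
    epsilon=0.1,
    seed=7,
)
for _ in range(5):
    selector.update("Experts", liked=True)
    selector.update("Trending", liked=False)
```

The counts here never decay, so this version adapts per user but not over time; discounting old feedback would be one way to follow a user's changing taste.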

Two Main Signals from Recommendation

Rating Time Spent

Both present numerous challenges . . .

Users rate more during their initial experience

Why is this happening?

Ratings: volume decay

Time Spent

• Ratings are sparse: fewer than 10% of recommendations have explicit ratings.
• Use time spent to decide whether a stumble was skipped.
• Time spent on videos is longer than on images.
• Solution: estimate p(Like | Timespent)
• Model based on user and content patterns

(Figure: time-spent thresholds T1 to T5 sec by content type, including images.)
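Because most stumbles carry no explicit rating, the rated ones can be used to estimate p(Like | Timespent) for the rest. A minimal sketch, assuming hypothetical bin edges in seconds and a separate table per content type (so the same 20 seconds can mean different things for an image and a video), with add-one smoothing for sparse bins; it models content type only, not the per-user patterns the slide also mentions:

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in seconds: [0,5), [5,15), [15,30), [30,60), [60, inf)
BIN_EDGES = [5, 15, 30, 60]


def time_bin(seconds: float) -> int:
    """Index of the time-spent bin that `seconds` falls into."""
    return bisect_right(BIN_EDGES, seconds)


def fit_like_given_time(rated_stumbles):
    """Estimate p(Like | time bin, content type) from the minority of rated stumbles.

    rated_stumbles: iterable of (content_type, seconds, liked).
    Add-one smoothing keeps sparse or unseen bins near 0.5.
    """
    likes = defaultdict(int)
    counts = defaultdict(int)
    for content_type, seconds, liked in rated_stumbles:
        key = (content_type, time_bin(seconds))
        counts[key] += 1
        likes[key] += int(liked)

    def p_like(content_type: str, seconds: float) -> float:
        key = (content_type, time_bin(seconds))
        return (likes.get(key, 0) + 1) / (counts.get(key, 0) + 2)

    return p_like


# Made-up training data: 20 s is long for an image but short for a video.
data = (
    [("image", 3, False)] * 4 + [("image", 20, True)] * 4
    + [("video", 20, False)] * 4 + [("video", 90, True)] * 4
)
p_like = fit_like_given_time(data)
print(p_like("image", 20))  # (4 + 1) / (4 + 2) ≈ 0.83
print(p_like("video", 20))  # (0 + 1) / (4 + 2) ≈ 0.17
```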

Challenges: Time spent on different devices

(Chart: 5th percentile time spent per stumble on the installed plugin, the Stumble Bar, and mobile/tablets.)
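One way to act on per-device differences (an assumption, not a method the slides describe) is to compute the 5th percentile of time spent for each device and use it as a device-specific skip threshold instead of one global cutoff. A minimal sketch using the nearest-rank percentile; the device keys, data, and skip rule are hypothetical:

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p5_by_device(stumbles):
    """stumbles: iterable of (device, seconds). Returns {device: 5th-percentile seconds}."""
    by_device = defaultdict(list)
    for device, seconds in stumbles:
        by_device[device].append(seconds)
    return {device: percentile_nearest_rank(vals, 5) for device, vals in by_device.items()}


def looks_skipped(device, seconds, p5):
    """Hypothetical rule: shorter than the device's own 5th percentile counts as a skip."""
    return seconds < p5[device]


# Made-up data for two devices.
stumbles = [("stumble_bar", s) for s in range(1, 21)] + [("mobile", s) for s in range(10, 30)]
p5 = p5_by_device(stumbles)
print(p5)  # {'stumble_bar': 1, 'mobile': 10}
```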

How do we know we are doing a good job?

Extensive A/B Testing

A/B tests on metrics such as session length, retention, and rating behavior.
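For a rate metric such as retention, a standard way to read an A/B test is a two-proportion z-test. A minimal sketch with the pooled normal approximation; the slides do not say which test was used, the counts are made up, and a continuous metric like session length would call for a t-test instead:

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Compare a rate (e.g. retention) between control A and variant B.

    Returns (z, two-sided p-value) using the pooled normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both arms all-success or all-failure: no measurable difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Made-up counts: users retained out of users exposed in each arm.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z ≈ 2.31, p ≈ 0.02
```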

Measurable Improvements In Rec Quality

(Chart: normalized likes vs. dislikes over time, with recent months highlighted; trend line R² = 0.737.)

+111% improvement!

Many other interesting problems…

• Dupe detection
• Anti-spam
• News
• Topic classification
• Metrics, quality analysis
• Trending
• Search
• User biases, mood
• Many more…

We are HIRING!!!
