A Scalable, High-performance Algorithm for Hybrid Job Recommendations

Toon De Pessemier, Kris Vanhecke, Luc Martens

iMinds – Ghent University, Belgium

September 2016

Introduction: Job recommendations

Not a classic recommender story → not a classic solution

Specific metadata characteristics: discipline, industry, career level, …

Detailed user profile: experience, education (university degree), employment

Limited availability in time (active_during_test)

Various user-item interactions: click, bookmark, reply, delete

Specific meaning of delete (click on “X” → load new item)

Impressions: recommendations generated by XING’s recommender → bias

Our goals

XING’s evaluation measure

Reflects typical XING use case

Scalable

Number of users and items

Dataset = subset of XING users

Incremental updates

Continuous stream of new job items

Updating models instead of recalculating

Fast score calculation

New job items → fast distribution to target users

Limited computational resources

Findings

Challenge = Prediction task

≠ Recommendation task

No influence on user behavior

Recommendations are not evaluated by the user

Important quality metrics are not evaluated

Usefulness

Risk: items already discovered by the user

Items that the user has already interacted with can be recommended

Diversity

Risk: too much of the same

Serendipity

Risk: items that are difficult to find but interesting are unfairly evaluated as “poor recommendations”

Findings

The information value of impressions is limited

Recommendations of the existing job recommender → bias towards XING’s algorithm

Less diverse

Subset of recommendations

No guarantee that the user has seen the item

Users who are not cold-start: better results if only the interactions are used

Penalty for items with a limited visibility

Low visibility → low probability of interaction

Low visibility penalty → better results

Item visibility estimated by number of interactions in training set

Findings

Influence of the user’s region

Expected: interest in jobs located in the user’s home region or in adjacent regions

Observed: many interactions with jobs located in non-adjacent or far-away regions

E.g. users from Lower Saxony → jobs in Baden-Württemberg

Many cold-start users: no interactions, no impressions (9.7%)

CB recommendation based on explicit profile

Risk: profile too general or too specific

Risk: not updated by the user

Findings

Traditional classification does not work

Positive class: click, bookmark, reply

Negative class: delete

Recommendations: items most typical for the positive class

Poor score

Reasoning: meaning of the delete action

Click on the X button in the recommendation list

A new recommendation will be loaded and displayed

Deletes are not sampled from the complete job offer but from recommendations (bias: items more similar to the user’s interests than random items)

Not necessarily a sign of disinterest of the user

Intention of the click: get a new recommendation

Content-Based Recommender

Based on feature matching

Explicit user profile

Interactions counter for each feature

Interaction weight

Updating counters

Delete=0, click=1, bookmark=10, reply=10 (no significant effect of deletes)

Positive counters (pos_{f,u}): item has feature

Negative counters (neg_{f,u}): item does not have feature

Score calculation

α = 0.5 (positive counters are more important than negative counters)

IDF = inverse document frequency: feature frequency across all jobs

N = total number of items

n_f = number of items with feature f

w_f = weight per feature type (tag, discipline, industry, …)

u = user

i = item

$$\mathrm{score}(u,i) = \frac{1}{|\{f \in i\}|} \sum_{f \in i} w_f \left( pos_{f,u} - \alpha \, neg_{f,u} \right) \log \frac{N}{n_f}$$
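The score calculation on this slide could be sketched in Python roughly as follows. This is only an illustration, not the authors' implementation: the function name `cb_score`, the dictionary-based counters, the representation of a feature as a `(type, value)` tuple, and the default type weight of 1.0 are all assumptions. The normalisation by the number of item features follows one reading of the slide's formula.

```python
import math

ALPHA = 0.5  # from the slide: negative counters count half as much


def cb_score(item_features, pos, neg, n_items, n_with_feature, type_weight):
    """Content-based score of one item for one user (illustrative sketch).

    item_features:  list of (feature_type, value) tuples describing the item
    pos, neg:       the user's positive / negative counters, keyed by feature
    n_items:        N, the total number of items
    n_with_feature: n_f, the number of items having each feature
    type_weight:    w_f, weight per feature type (assumed default: 1.0)
    """
    if not item_features:
        return 0.0
    total = 0.0
    for feature in item_features:
        w_f = type_weight.get(feature[0], 1.0)
        idf = math.log(n_items / n_with_feature[feature])
        total += w_f * (pos.get(feature, 0) - ALPHA * neg.get(feature, 0)) * idf
    # Average over the features of the item.
    return total / len(item_features)


# Hypothetical toy data, not taken from the XING dataset.
job = [("discipline", "it"), ("industry", "telecom")]
score = cb_score(
    job,
    pos={("discipline", "it"): 11},
    neg={("discipline", "it"): 2, ("industry", "telecom"): 4},
    n_items=100,
    n_with_feature={("discipline", "it"): 10, ("industry", "telecom"): 50},
    type_weight={"discipline": 1.0, "industry": 0.5},
)
```

Rare features (small n_f) get a large IDF factor, so a match on a rare feature moves the score more than a match on a feature that most jobs share.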

Content-based calculation

Profile: offline calculation

Incremental updates of counters

IDF: slightly varying over time

Periodic updates

Target items: active items

Minimum matching threshold (positive counters and item have X features in common)

Algorithm running in parallel for different users

Fast calculation of the recommendations
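The incremental counter updates and the minimum matching threshold could look roughly like the Python sketch below. The slides do not spell out how negative counters are incremented, so the rule used here is an assumption: every feature the profile already tracks but the new item lacks gets its negative counter increased. The names `update_counters` and `candidate_items` are hypothetical.

```python
# Interaction weights from the content-based recommender slide.
WEIGHTS = {"delete": 0, "click": 1, "bookmark": 10, "reply": 10}


def update_counters(pos, neg, item_features, action):
    """Update a user's counters in place for one new interaction."""
    weight = WEIGHTS[action]
    if weight == 0:
        return  # deletes have no effect on the profile
    item_features = set(item_features)
    tracked = set(pos) | set(neg)
    for feature in item_features:
        pos[feature] = pos.get(feature, 0) + weight
    # Assumption: tracked features missing from the item count as negative.
    for feature in tracked - item_features:
        neg[feature] = neg.get(feature, 0) + weight


def candidate_items(active_items, pos, min_common):
    """Keep active items sharing at least min_common features with the
    features that have a positive counter in the user profile."""
    liked = {f for f, count in pos.items() if count > 0}
    return [item_id for item_id, feats in active_items.items()
            if len(liked & set(feats)) >= min_common]


pos, neg = {}, {}
update_counters(pos, neg, {"a", "b"}, "click")
update_counters(pos, neg, {"a", "c"}, "bookmark")
update_counters(pos, neg, {"d"}, "delete")
shortlist = candidate_items({"j1": {"a", "b"}, "j2": {"d"}, "j3": {"c"}}, pos, 2)
```

Because each interaction only touches the counters of one user, the profile never has to be recomputed from scratch, and users can be processed independently in parallel.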

Collaborative filtering: KNN

Traditional KNN: distance based on interactions

Our KNN solution: distance based on interactions and metadata

2 items are similar if users have interacted with both

2 items are similar if they have metadata features in common

Feature distance: factor $\log \frac{N}{n_f}$

Fine-grained distance function → risk of ties is reduced

Method: For each candidate item:

Calculate distance to k-nearest items that the user has positively interacted with

Select items with shortest distance

$$\mathrm{score}(u,i) = \frac{1}{k} \sum_{k} \frac{Dist_{max} - Dist_{i,k}}{Dist_{max}}$$

Based on the Weka framework: BallTree implementation of the NearestNeighbourSearch package

KNN calculation

Item distances: offline calculation

Slightly varying over time

If partially computed distance > threshold → stop calculation

Score calculation: fast if distances are precomputed

Algorithm running in parallel for different users

Fast calculation of the recommendations
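The early stop on a partially computed distance can be sketched as below. The slides do not give the exact distance function, so a squared Euclidean distance over numeric vectors is assumed purely for illustration; `bounded_sq_distance` is a hypothetical name.

```python
def bounded_sq_distance(a, b, threshold):
    """Squared Euclidean distance between a and b, abandoned early.

    Returns None as soon as the running sum exceeds threshold, because the
    pair can then no longer be among the nearest neighbours.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > threshold:
            return None  # stop: the remaining terms can only add to the sum
    return total


far = bounded_sq_distance([0, 0, 0], [1, 1, 1], threshold=1.5)
near = bounded_sq_distance([0, 0, 0], [1, 1, 1], threshold=5.0)
```

The early exit is valid because every term of the sum is non-negative, so a partial sum above the threshold can never drop back below it.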

Results and fallback

CB: 286,041.10

KNN: 298,316.85

Hybrid: 344,264.37

Fallback for cold-start users:

No interactions: KNN based on interactions is not possible (26.5% of users)

No interactions → use impressions (16.8% of users)

Solution without fallback to impressions (only based on profile): 292,909.26

No interactions and no impressions (9.7% of the users): Hybrid → CB

CB cannot generate recommendations: for 1485 users

Recommend the 30 most popular items (most positive interactions)

Without fallback to most popular recommender: 344,241.51

Most popular recommender as the only solution: 73,298.13
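The fallback chain described on this slide might be organised as in the following Python sketch. The recommenders are passed in as plain callables, and the names `recommend`, `hybrid`, `cb_profile` and `most_popular` as well as the dictionary layout of `user` are hypothetical. Treating impressions as a substitute for interactions and triggering the most-popular fallback whenever the result is empty are assumptions about how the steps connect.

```python
def recommend(user, hybrid, cb_profile, most_popular, top_n=30):
    """Pick a recommender according to the data available for the user."""
    if user["interactions"]:
        recs = hybrid(user["interactions"])
    elif user["impressions"]:
        # Cold-start user with impressions: use them in place of interactions.
        recs = hybrid(user["impressions"])
    else:
        # No interactions and no impressions: content-based on the profile.
        recs = cb_profile(user["profile"])
    if not recs:
        # Last resort: the most popular items.
        recs = most_popular(top_n)
    return recs[:top_n]


# Stub recommenders for illustration only.
stub_hybrid = lambda events: ["job_like_" + e for e in events]
stub_cb = lambda profile: ["job_in_" + profile["discipline"]] if profile else []
stub_popular = lambda n: ["popular_" + str(rank) for rank in range(n)]

warm = recommend({"interactions": ["a"], "impressions": [], "profile": {}},
                 stub_hybrid, stub_cb, stub_popular)
cold = recommend({"interactions": [], "impressions": [], "profile": {}},
                 stub_hybrid, stub_cb, stub_popular)
```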

Questions?