Best application paper at IEEE Big Data: LinkedIn expertise search

Posted on 25-Jan-2017



1

Recruiting Solutions

Expertise Search @ LinkedIn

Viet Ha-Thuc, Ganesh Venkataraman, Mario Rodriguez, Shakti Sinha, Senthil Sundaram and Lin Guo

Viet Ha-Thuc

2

• 200+ countries and territories

• 2+ new members per second

3

4

Talent Solutions

Helps recruiters and companies search for the right talent with the expertise they need

5

Agenda

Introduction

Skill Reputation Scores

Personalized Learning-to-Rank

Results & Lessons

6

Introduction

Skills
– 40K+ standardized skills
– Members get endorsed on skills
– Represent professional expertise

7

Introduction

Expertise search on LinkedIn
– Queries that contain a skill and no personal name

8

Introduction

Unique challenges of LinkedIn expertise search

– Scale: 400M members x 40K standardized skills

– Sparsity of skills in profiles

– Personalization

9

Agenda

Introduction

Skill Reputation Scores

Personalized Learning-to-Rank

Results & Lessons

10

Reputation

Information a decision maker uses to make a judgment on an entity with a record (*)

(*) “Building web reputation systems”, Glass and Farmer, 2010

11

Skill Reputation Scores

Decision Maker: searcher

Record: Professional career

Skill reputation: member expertise on a skill

Judgment: Hire?

12

Estimating Skill Reputation

Signals: endorsements, profile content, browsemaps

Goal: estimate P(expert | member, skill) with a supervised learning algorithm

Members x Skills matrix (rows = members, columns = skills, ? = unknown):

    ?     .85   .45
    ?     ?     .35
    ?     .42   ?
    ?     ?     .05
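
As a rough, hypothetical sketch of this step (the paper does not publish its model or features), the snippet below turns simple per-(member, skill) signals such as endorsement counts, profile mentions and browsemap co-views into an estimate of P(expert | member, skill) with a logistic-regression classifier; unobserved pairs stay as the "?" cells of the matrix.

    # Hedged sketch: estimate P(expert | member, skill) from simple signals.
    # The feature names and training labels here are hypothetical, not from the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: (endorsement_count, skill_on_profile, browsemap_coview_count)
    X_train = np.array([[12, 1, 30], [0, 1, 2], [25, 1, 80], [1, 0, 0]])
    y_train = np.array([1, 0, 1, 0])          # 1 = judged an expert, 0 = not

    model = LogisticRegression().fit(X_train, y_train)

    # Score an observed (member, skill) pair; unobserved pairs stay as "?" cells.
    p_expert = model.predict_proba(np.array([[8, 1, 15]]))[0, 1]
    print(round(p_expert, 2))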

13

Estimating Skill Reputation

Matrix Factorization

The member x skill matrix R is factorized into a member matrix M and a skill matrix S:

    R (members x skills):      M (members x latent):      S (latent x skills):
    ?     .85   .45            0.5   1                    0.2   0.3   0.5
    ?     ?     .35            0.7   0                    0.5   0.7   0.2
    ?     .42   ?              0     0.6
    ?     ?     .05            0.1   0

Each row of M is a representation of a member in latent space; each column of S represents a skill in latent space.

14

Estimating Skill Reputation

Multiplying M and S fills in the unknown cells of the original matrix:

    R (observed):              M x S (completed):
    ?     .85   .45            .6    .85   .45
    ?     ?     .35            .14   .21   .35
    ?     .42   ?              .3    .42   .12
    .02   ?     ?              .02   .03   .05
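
Because the slide gives concrete latent factors, a small numpy check (values copied from the slides above) confirms that multiplying the member matrix M by the skill matrix S reproduces the completed matrix, including the previously unknown cells.

    # Reproduce the filled-in matrix from the slides: R_hat = M @ S.
    import numpy as np

    M = np.array([[0.5, 1.0],      # one row per member (latent dim = 2)
                  [0.7, 0.0],
                  [0.0, 0.6],
                  [0.1, 0.0]])
    S = np.array([[0.2, 0.3, 0.5], # one column per skill
                  [0.5, 0.7, 0.2]])

    R_hat = M @ S
    print(np.round(R_hat, 2))
    # [[0.6  0.85 0.45]
    #  [0.14 0.21 0.35]
    #  [0.3  0.42 0.12]
    #  [0.02 0.03 0.05]]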

15

Matrix Factorization

Matrix factorization by Alternating Least Squares (ALS) optimization

Step 1: fix the member matrix M and solve for the skill matrix S:

    S_(i+1) = argmin_S ||R - M_i · S||^2

16

Matrix Factorization

Step 2: fix the skill matrix S and solve for the member matrix M:

    M_(i+1) = argmin_M ||R - M · S_(i+1)||^2

The two steps alternate until convergence.

17

Matrix Factorization

Matrix factorization by Alternating Least Squares optimization
– Apache Mahout

Exploits skill co-occurrence patterns to infer missing skills
– Members who know “Big Data” are also likely to know “Hadoop”
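
The paper ran ALS with Apache Mahout at LinkedIn scale; the toy numpy sketch below only illustrates the two alternating updates on the small matrix above, solving each least-squares step over the observed cells and adding a small ridge term for numerical stability (that term is my addition, not from the slides).

    # Toy ALS sketch (not the Mahout implementation used in the paper).
    import numpy as np

    R = np.array([[np.nan, .85, .45],
                  [np.nan, np.nan, .35],
                  [np.nan, .42, np.nan],
                  [.02, np.nan, np.nan]])
    observed = ~np.isnan(R)
    k, lam = 2, 0.01                      # latent dimension, small ridge term
    rng = np.random.default_rng(0)
    M = rng.random((R.shape[0], k))
    S = rng.random((k, R.shape[1]))

    for _ in range(50):
        # Fix M, solve each skill column of S by regularized least squares.
        for j in range(R.shape[1]):
            rows = observed[:, j]
            A = M[rows]
            S[:, j] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ R[rows, j])
        # Fix S, solve each member row of M the same way.
        for i in range(R.shape[0]):
            cols = observed[i]
            B = S[:, cols].T
            M[i] = np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ R[i, cols])

    print(np.round(M @ S, 2))             # completed member x skill matrix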

18

Skill Reputation Feature

Project a query into the latent space: Q = s_j + s_k

Reputation(member i, query) = m_i · (s_j + s_k) = m_i · s_j + m_i · s_k

Efficiency: pre-compute and index the member-skill scores m_i · s_j
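
Continuing the toy example, the sketch below scores one member against a two-skill query both directly, as m_i · (s_j + s_k), and via the pre-computed member-skill scores m_i · s_j and m_i · s_k, showing why an index of per-(member, skill) dot products is enough at query time.

    # Reputation feature for a query containing skills j and k (toy example).
    import numpy as np

    M = np.array([[0.5, 1.0], [0.7, 0.0], [0.0, 0.6], [0.1, 0.0]])  # members
    S = np.array([[0.2, 0.3, 0.5], [0.5, 0.7, 0.2]])                # skills

    member_skill_scores = M @ S      # precomputed and indexed offline: m_i . s_j

    i, j, k = 0, 1, 2                # member 0, query = skill 1 + skill 2
    q = S[:, j] + S[:, k]            # project the query into latent space
    direct = M[i] @ q                # m_i . (s_j + s_k)
    from_index = member_skill_scores[i, j] + member_skill_scores[i, k]

    print(round(direct, 2), round(from_index, 2))   # both print 1.3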

19

Features

Reputation feature

Social Connection

Homophily
– Geo
– Industry

Textual Features

20

Agenda

Introduction

Skill Reputation Scores

Personalized Learning-to-Rank

Results & Lessons

Ranking

▪ Manual tuning vs. Learning to Rank (LTR)

▪ Why Learning to Rank?
– Hard to manually tune with a very large number of features
– Challenging to personalize
– LTR allows leveraging a large volume of click data in an automated way

21

22

Training Data: click logs with top-K randomization

– Uncertain: removed
– Bad: label = 0
– Good (click): label = 1
– Perfect (InMail sent): label = 3
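
As a hypothetical illustration of this labeling scheme (the actual log fields are not given in the deck), the snippet below maps logged impressions to graded labels: uncertain cases are dropped, skipped results get 0, clicks get 1, and InMail messages get 3.

    # Hypothetical label assignment for logged (query, result) impressions.
    def grade(impression):
        """Return a graded relevance label, or None to drop uncertain cases."""
        if impression.get("uncertain"):      # e.g. outside the randomized top-K
            return None                      # removed from training data
        if impression.get("inmail_sent"):
            return 3                         # perfect
        if impression.get("clicked"):
            return 1                         # good
        return 0                             # bad (shown but skipped)

    logs = [{"clicked": True}, {"inmail_sent": True}, {}, {"uncertain": True}]
    labels = [g for g in (grade(x) for x in logs) if g is not None]
    print(labels)                            # [1, 3, 0]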

23

Learning to Rank

Coordinate Ascent, a listwise algorithm
– Considers relevance as relative to each query
– Allows optimizing the quality metric directly

Objective function
– Normalized Discounted Cumulative Gain (NDCG@K)
– Graded relevance labels
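
A minimal sketch, assuming a linear scoring function over per-result features (the deck does not specify the feature set or the exact coordinate-ascent variant): NDCG@K over graded labels as the objective, optimized by a naive coordinate-ascent loop over the weights.

    # Minimal listwise LTR sketch: coordinate ascent directly on NDCG@K.
    import numpy as np

    def ndcg_at_k(labels, scores, k=10):
        order = np.argsort(-scores)[:k]
        gains = (2.0 ** labels[order] - 1) / np.log2(np.arange(2, len(order) + 2))
        ideal = np.sort(labels)[::-1][:k]
        ideal_gains = (2.0 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))
        return gains.sum() / ideal_gains.sum() if ideal_gains.sum() > 0 else 0.0

    def mean_ndcg(w, queries, k=10):
        return np.mean([ndcg_at_k(y, X @ w, k) for X, y in queries])

    def coordinate_ascent(queries, n_features, steps=(0.5, -0.5, 0.1, -0.1)):
        w = np.ones(n_features)
        best = mean_ndcg(w, queries)
        for _ in range(20):                  # sweeps over the coordinates
            for d in range(n_features):
                for step in steps:
                    cand = w.copy()
                    cand[d] += step
                    score = mean_ndcg(cand, queries)
                    if score > best:
                        w, best = cand, score
        return w, best

    # Toy data: 2 queries, each with a (results x features) matrix and graded labels.
    rng = np.random.default_rng(1)
    queries = [(rng.random((8, 3)), rng.integers(0, 4, 8).astype(float)) for _ in range(2)]
    w, ndcg = coordinate_ascent(queries, n_features=3)
    print(np.round(w, 2), round(ndcg, 3))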

24

Agenda

Introduction

Skill Reputation Scores

Personalized Learning-to-Rank

Results & Lessons

25

Experiments

Query tagging

Target segment: queries with a skill and no personal name

Baseline
– No skill reputation feature
– Hand-tuned ranking

Search products: Flagship and Premium

A/B tests for 4 weeks
– Novelty effect: ignore the 1st week
– Size: hundreds of thousands of searches

26

Results

Improvements over the baseline:

                 CTR@10    # Messages per Search
    Flagship     +11%      +20%
    Premium      +18%      +37%

27

Take-Aways

Going beyond text features
– Exploit structured data

Matrix factorization for large-scale reputation estimation
– 400M members x 40K skills

Personalized Learning-to-Rank is crucial

Full Paper: http://arxiv.org/pdf/1602.04572v1.pdf

28

We are hiring! Email: vhathuc@linkedin.com