Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.

Posted on 19-Aug-2015

©2013 LinkedIn Corporation. All Rights Reserved.

Hacking Data Science

Patrick Philips, Vitaly Gordon

Overview of ML pipeline

Gather data

Feature engineering

Model fitting

Evaluation
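The four stages above can be sketched end to end on a toy spam-comment task. This is a minimal illustration under stated assumptions, not LinkedIn's pipeline: the data set is hypothetical, the features are lowercase bag-of-words counts, and the model is a multinomial Naive Bayes chosen only because it fits in a few lines.

```python
import math
from collections import Counter

# 1. Gather data: a tiny, hypothetical set of (text, label) pairs.
def gather_data():
    return [
        ("buy cheap followers now", "spam"),
        ("click here for free money", "spam"),
        ("great insight on hiring", "ham"),
        ("thanks for sharing this article", "ham"),
    ]

# 2. Feature engineering: lowercase bag-of-words counts.
def featurize(text):
    return Counter(text.lower().split())

# 3. Model fitting: multinomial Naive Bayes with add-one (Laplace) smoothing.
def fit(examples):
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in examples:
        word_counts[label].update(featurize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for word, c in featurize(text).items():
            score += c * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# 4. Evaluation: accuracy on a labeled set (use held-out data in practice).
def accuracy(model, examples):
    hits = sum(predict(model, text) == label for text, label in examples)
    return hits / len(examples)

model = fit(gather_data())
print(predict(model, "free followers here"))  # spam
```

In a real pipeline the evaluation step would use data held out from fitting; the toy set is reused here only to keep the example short.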

Understanding Seniority

Companies are not standard

Titles are not enough

Things change

Learning to target better

Classifying names by gender

Let’s look at Monica again

Not so fast …

Not so fast …

Even slower …

Sometimes the answer is just under your nose
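One reading of this slide is that labeled examples for name-to-gender classification may already exist in data on hand. A minimal sketch, assuming a hypothetical list of (first name, gender) records, estimates a smoothed P(female | name) and abstains when a name is rare or ambiguous. The thresholds `min_count` and `min_confidence` are illustrative assumptions, not values from the talk.

```python
from collections import Counter, defaultdict

def build_name_table(records):
    """records: iterable of (first_name, gender) pairs, gender in {"F", "M"} (hypothetical format)."""
    table = defaultdict(Counter)
    for name, gender in records:
        table[name.strip().lower()][gender] += 1
    return table

def classify_name(table, name, min_count=5, min_confidence=0.9, prior=1.0):
    """Return "F", "M", or None (abstain) for a first name."""
    counts = table.get(name.strip().lower())  # .get does not create a new entry
    if not counts:
        return None  # unseen name
    f, m = counts["F"], counts["M"]
    if f + m < min_count:
        return None  # too little evidence
    p_female = (f + prior) / (f + m + 2 * prior)  # add-one smoothed estimate
    if p_female >= min_confidence:
        return "F"
    if p_female <= 1 - min_confidence:
        return "M"
    return None  # ambiguous name
```

Abstaining on ambiguous names trades coverage for precision, which matters when a wrong guess is worse than no guess.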

Comment Spam on Influencer content

Challenge 1: Binary tasks are too guessable

Challenge 2: Context matters

Spam Comment Annotation Task

Quality: Gold distributions and skewed datasets
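Why the gold distribution matters on skewed data: if spam is rare, a worker who never flags anything still scores well against gold questions drawn from the natural distribution. The sketch below, with hypothetical item ids and labels, compares such a lazy worker's score on natural-distribution gold with its score on a stratified (class-balanced) gold set. Stratified sampling is one possible remedy assumed here, not necessarily the authors' procedure.

```python
import random
from collections import defaultdict

def stratified_gold(items, per_class, seed=0):
    """items: list of (item_id, true_label). Sample up to per_class items of each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item_id, label in items:
        by_label[label].append((item_id, label))
    gold = []
    for group in by_label.values():
        gold.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(gold)
    return gold

def gold_accuracy(worker_answers, gold):
    """worker_answers: dict item_id -> label given by one worker."""
    return sum(worker_answers[i] == label for i, label in gold) / len(gold)

# Hypothetical skewed data: 5% spam.
items = [(i, "spam") for i in range(5)] + [(i, "ok") for i in range(5, 100)]
lazy = {i: "ok" for i, _ in items}  # a worker who never flags spam
print(gold_accuracy(lazy, items))                      # 0.95
print(gold_accuracy(lazy, stratified_gold(items, 5)))  # 0.5
```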

Using results to evaluate new features

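Model          ΔP    ΔR     ΔPRC
Baseline       n/a   n/a    n/a
Variation 1    +     -      +
Variation 2    -     +      +
Variation 3    -     ++     --
Variation 4    -     +++    ++
Variation 5    -     +++    ++
Variation 6    -     +++    ++
Variation 7    -     ++++   +++
Variation 8    -     ++++   +++
Variation 9    -     ++++   +++
Variation 10   -     ++++   +++

ΔP, ΔR, ΔPRC: change in precision, recall, and area under the precision-recall curve relative to the baseline. + and - mark an increase or a decrease; more symbols mean a larger change.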
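Crowd labels can serve as ground truth when comparing a variation against the baseline. A minimal sketch, assuming predictions and crowd labels are stored as hypothetical dicts from comment id to a spam (True) / not-spam (False) boolean, computes the ΔP and ΔR columns. ΔPRC would additionally need classifier scores and is omitted here.

```python
def precision_recall(predicted, truth):
    """predicted, truth: dicts item_id -> bool (True = spam)."""
    tp = sum(1 for i, p in predicted.items() if p and truth[i])
    fp = sum(1 for i, p in predicted.items() if p and not truth[i])
    fn = sum(1 for i, p in predicted.items() if not p and truth[i])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def deltas(baseline, variant, truth):
    """Return (ΔP, ΔR) of a variant relative to the baseline."""
    p0, r0 = precision_recall(baseline, truth)
    p1, r1 = precision_recall(variant, truth)
    return p1 - p0, r1 - r0

# Hypothetical crowd labels and predictions for four comments.
truth = {1: True, 2: True, 3: False, 4: False}
baseline = {1: True, 2: False, 3: False, 4: False}
variant = {1: True, 2: True, 3: True, 4: False}
print(deltas(baseline, variant, truth))  # precision drops, recall rises
```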

“As simple as possible, but not simpler”

LinkedIn Channels

Labels aren’t free

Suggest likely candidates for topics, then expand

Evaluate suggested article-topic pairs

Using results to evaluate new implementations of the spam classifier: improve precision without a drop in recall

18k comments labeled in 54 hrs for $180
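Taking 18k as 18,000, the figures on this slide work out to about one cent per label and roughly 333 labels per hour:

```latex
\frac{\$180}{18{,}000\ \text{comments}} = \$0.01\ \text{per comment},
\qquad
\frac{18{,}000\ \text{comments}}{54\ \text{h}} \approx 333\ \text{comments/h}
```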

Quality: Not by Gold alone
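One way to go beyond gold questions is to collect redundant judgments and measure each worker against the consensus. The sketch assumes judgments arrive as hypothetical (item_id, worker_id, label) triples, keeps only items whose majority label reaches an illustrative agreement threshold, and scores each worker by agreement with that consensus. This is a common technique, not necessarily the one used in the talk.

```python
from collections import Counter, defaultdict

def majority_labels(judgments, min_agreement=0.6):
    """judgments: list of (item_id, worker_id, label). Returns item_id -> (label, agreement)."""
    by_item = defaultdict(list)
    for item, _, label in judgments:
        by_item[item].append(label)
    consensus = {}
    for item, labels in by_item.items():
        label, votes = Counter(labels).most_common(1)[0]
        agreement = votes / len(labels)
        if agreement >= min_agreement:  # drop items the crowd disagrees on
            consensus[item] = (label, agreement)
    return consensus

def worker_agreement(judgments, consensus):
    """Fraction of each worker's answers that match the consensus label."""
    hits, totals = Counter(), Counter()
    for item, worker, label in judgments:
        if item in consensus:
            totals[worker] += 1
            hits[worker] += label == consensus[item][0]
    return {w: hits[w] / totals[w] for w in totals}

judgments = [
    ("c1", "w1", "spam"), ("c1", "w2", "spam"), ("c1", "w3", "ok"),
    ("c2", "w1", "ok"), ("c2", "w2", "ok"), ("c2", "w3", "spam"),
    ("c3", "w1", "spam"), ("c3", "w2", "ok"),
]
consensus = majority_labels(judgments)
print(worker_agreement(judgments, consensus))
```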

Using results to evaluate existing classification framework

“Help your helpers”

Search is a major portal to information

LinkedIn search is personalized

Evaluation is still possible

Search Evaluation – WTF@1
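The slide does not define WTF@1. A plausible reading, assumed here, is the fraction of queries whose top-ranked result a judge flags as egregiously wrong. The sketch generalizes that assumption to WTF@k over hypothetical per-query judgment lists.

```python
def wtf_at_k(judgments, k=1):
    """judgments: dict query_id -> list of bools, one per ranked result,
    True if a judge flagged that result as a WTF (assumed definition)."""
    if not judgments:
        return 0.0
    bad = sum(any(flags[:k]) for flags in judgments.values())
    return bad / len(judgments)

judgments = {
    "q1": [False, True],
    "q2": [True, False],
    "q3": [False, False],
    "q4": [False],
}
print(wtf_at_k(judgments))       # 0.25
print(wtf_at_k(judgments, k=2))  # 0.5
```

Lower is better; unlike graded relevance metrics, this only asks judges to spot clearly bad results, which stays feasible even when results are personalized.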

Quality: Behavioral metrics are good too!
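The slide does not say which behavioral signals were used. As one hypothetical example, time on task can expose workers who click through without reading. This sketch flags workers whose median seconds per task fall below an assumed fraction (`ratio`) of the crowd-wide median of worker medians.

```python
from statistics import median

def flag_fast_workers(task_times, ratio=0.3):
    """task_times: dict worker_id -> list of seconds spent per task (hypothetical data)."""
    worker_medians = {w: median(ts) for w, ts in task_times.items() if ts}
    crowd_median = median(worker_medians.values())
    return sorted(w for w, m in worker_medians.items() if m < ratio * crowd_median)

task_times = {
    "a": [30, 40, 35],
    "b": [25, 30, 28],
    "c": [2, 3, 2],
    "d": [50, 45, 40],
}
print(flag_fast_workers(task_times))  # ['c']
```

Medians are used instead of means so that a single long pause does not hide an otherwise rushed worker.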

“Pick a solvable problem”

Standardizing titles

Which question is easier?

1. Find a better name for the title “account executive”?

2. How similar are “account executive” and “sales executive”?
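Pairwise "how similar?" answers can still produce standardized titles once they are aggregated. A minimal sketch, assuming the crowd's judgments have already been reduced to a hypothetical list of title pairs deemed similar, merges titles with a union-find structure. One design trade-off: merging is transitive, so a chain of borderline pairs can join titles that no single worker compared directly.

```python
def cluster_titles(titles, similar_pairs):
    """Group titles into clusters, merging every pair judged similar."""
    parent = {t: t for t in titles}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for t in titles:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())

titles = ["account executive", "sales executive", "sales representative",
          "software engineer", "developer"]
pairs = [("account executive", "sales executive"),
         ("sales executive", "sales representative"),
         ("software engineer", "developer")]
print(cluster_titles(titles, pairs))
```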

Notable Experts

First attempt

Second attempt

Third attempt

What makes the best data mining expert?

Education?

Industry experience?

Number of publications?

Communication skills?

Hacking skills?

Knowledge of statistics?

Number of endorsements?

“More bad data != better data”
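A simple model illustrates why more labels do not rescue bad labelers. Assuming each of n workers labels independently and is correct with probability p, the majority vote is correct with a binomial tail probability. For p above 0.5 it grows with n; for p below 0.5 it shrinks, so adding more such workers makes the aggregate worse.

```python
from math import comb

def majority_accuracy(p, n):
    """P(strict majority of n independent workers is correct), each correct with prob. p."""
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

for p in (0.7, 0.4):
    print(p, [round(majority_accuracy(p, n), 3) for n in (1, 3, 5)])
```

Real workers are rarely independent or equally accurate, so this is only an idealized bound on what redundancy can buy.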

Summary

1. Use the data you already have

2. Keep it simple, but not too simple

3. Pick a solvable problem

4. Help your helpers

5. Sample intelligently

6. More (bad) data != better data

Questions?