Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha, LinkedIn

Transcript of Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Page 1: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha, LinkedIn

Page 2: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Agenda

● LinkedIn
● LinkedIn Search
  ○ Navigational vs Exploratory searches
  ○ Typeahead vs SERP
● Big picture and problem statement
● Instant search – Search-as-you-type
  ○ Query autocomplete
  ○ Entity-aware suggestions
  ○ Instant results
● Conclusions & Future work

Page 3: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Professional Identity

Page 4: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Professional Graph

Page 5: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Jobs

Page 6: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – And much more...

Companies

Skills

Professional Content

Page 7: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Massive Scale

Page 8: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search

Page 9: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Navigational Search

Looking for someone specific by name.

Query has a single correct result.

Page 10: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Exploratory Search

Finding people that match a given set of criteria.

Multiple results match the user’s query.

Page 11: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Search – Search-as-you-type

Satisfy navigational searches: Show instant search results.

Help frame exploratory searches: Complete the user’s query and show search suggestions.

Page 12: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Big Picture

[Diagram components: partial query; instant results; autocomplete; search suggestions; manually entered query; query tagger; full-text search; search results]

Page 13: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Big Picture

[Diagram repeated from the previous slide.]

Focus today:
● Autocomplete
● Search suggestions
● Instant results

Page 14: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Problem Statement

[Diagram repeated from the Big Picture slide.]

Focus today:
● Autocomplete
● Search suggestions
● Instant results

How can we build an instant search experience that scales to 450+ million members, and is fast, lenient, and accurate?

● Instant search = Query autocomplete + search suggestions + instant results
● Fast = Search-as-you-type latencies
● Lenient = Handle spelling errors and common variations
● Accurate = Highly relevant and personalized results

Page 15: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Tagging

PERSON

TITLE(ID=126)

COMPANY(ID=1337)

Entity types identified: Person name, job title, company, school, skills, locations.

Key part of query processing! Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking.

Sequential prediction model (CRF – Conditional Random Fields)

Training data:
● Standardized dictionaries (people names, companies, schools, titles, skills, locations)
● Query logs
● Clickthrough (CTR) data
● Crowdsourced labels
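To make the tagger's output concrete, here is a minimal Python sketch. It is not a CRF: it swaps in a greedy longest-match lookup against a standardized dictionary, which is only one of the signals the slide lists. The dictionary entries, IDs, and the names `DICTIONARY` and `tag_query` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical standardized dictionary: phrase -> (entity type, entity ID).
# Entries and IDs are invented for illustration only.
DICTIONARY: Dict[str, Tuple[str, int]] = {
    "software engineer": ("TITLE", 1),
    "data scientist": ("TITLE", 2),
    "linkedin": ("COMPANY", 3),
    "hadoop": ("SKILL", 4),
}
MAX_PHRASE_LEN = 3  # longest phrase, in tokens, that we try to match


def tag_query(query: str) -> List[Tuple[str, str]]:
    """Greedy longest-match tagging; tokens not in the dictionary get UNKNOWN."""
    tokens = query.lower().split()
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                tagged.append((phrase, DICTIONARY[phrase][0]))
                i += n
                break
        else:
            tagged.append((tokens[i], "UNKNOWN"))
            i += 1
    return tagged
```

A sequence model such as a CRF can use context to resolve cases a lookup cannot, for example a token that is both a person name and a company name.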

Page 16: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete

● Fast
● Relevant and contextual
● Resilient to spelling errors

Page 17: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Offline processing

Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
..

Entities: [linkedin] [software engineer]

Index: FST – Finite State Transducers

Compact + fast retrieval + fuzzy match (via Levenshtein automata)
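The fuzzy prefix lookup can be sketched in Python as follows. This is a brute-force stand-in, not an FST or a Levenshtein automaton: it scans every phrase and compares the typed prefix against the phrase's leading characters using plain edit distance. The names `edit_distance` and `fuzzy_complete` are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_complete(prefix: str, phrases: List[str], max_edits: int = 1) -> List[str]:
    """Return phrases with some leading substring within max_edits of prefix."""
    matches = []
    for phrase in phrases:
        # An insertion or deletion changes the prefix length by one per edit,
        # so check phrase prefixes of every length in that window.
        lo = max(0, len(prefix) - max_edits)
        hi = min(len(phrase), len(prefix) + max_edits)
        if any(edit_distance(prefix, phrase[:k]) <= max_edits
               for k in range(lo, hi + 1)):
            matches.append(phrase)
    return matches
```

An automaton-based index gives the same kind of answer without scanning every phrase, which is what makes it usable at search-as-you-type latencies.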

Page 18: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Online processing

Two step process:

1. Retrieval (Candidate generation)

User’s query: [big data e]

Candidates = C(big data e) ∪ C(data e) ∪ C(e)
= big data engineer, big data expert systems, big data entry, ...

Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
..
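The candidate generation step above can be sketched in Python under one assumption that the slide implies but does not state: when only a suffix of the query matches a logged phrase, the dropped leading words are prepended to the completion. `QUERY_LOG` holds the six phrases from the slide; `completions` and `candidates` are hypothetical names.

```python
from typing import List

QUERY_LOG = [
    "linkedin software engineer",
    "software engineer",
    "big data",
    "data scientist",
    "data engineer",
    "expert systems",
]


def completions(prefix: str) -> List[str]:
    """C(prefix): logged phrases that start with the given prefix."""
    return [phrase for phrase in QUERY_LOG if phrase.startswith(prefix)]


def candidates(query: str) -> List[str]:
    """Union of completions for every suffix of the query, longest suffix first."""
    tokens = query.lower().split()
    seen = set()
    result: List[str] = []
    for start in range(len(tokens)):
        head = " ".join(tokens[:start])   # words kept as typed
        tail = " ".join(tokens[start:])   # suffix that gets completed
        for completion in completions(tail):
            full = (head + " " + completion).strip()
            if full not in seen:
                seen.add(full)
                result.append(full)
    return result
```

With this log, `candidates("big data e")` finds nothing for the full query, then completes "data e" and "e", giving "big data engineer" and "big data expert systems". "big data entry" from the slide would need "entry" somewhere in the log.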

Page 19: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Online processing

Two step process:

2. Scoring (Ranking)

User’s query: [big data e]

Candidate completions: “big data engineer”, “big data expert”, “big data entry”

Score(“big data engineer”):

P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)·… // Bigram language model

Use entities: P([engineer] | [big data])
Fall back to words: P(engineer | data)·P(data | big)
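A minimal Python sketch of this scoring step follows. All probabilities are invented for illustration, and the fallback rule (switch to word-level bigrams whenever any entity-level estimate is missing) is an assumption about how the two models are combined. The names `chain`, `score`, and `rank` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical probability tables; a real system would estimate them from query logs.
ENTITY_UNIGRAMS: Dict[str, float] = {"big data": 0.01}
ENTITY_BIGRAMS: Dict[Tuple[str, str], float] = {("big data", "engineer"): 0.20}
WORD_UNIGRAMS: Dict[str, float] = {"big": 0.02}
WORD_BIGRAMS: Dict[Tuple[str, str], float] = {
    ("big", "data"): 0.5,
    ("data", "engineer"): 0.10,
    ("data", "entry"): 0.05,
    ("data", "expert"): 0.01,
}
FLOOR = 1e-6  # probability assigned to unseen events


def chain(units: Sequence[str], unigrams, bigrams) -> float:
    """P(u1) * P(u2|u1) * P(u3|u2) * ... under a bigram model."""
    score = unigrams.get(units[0], FLOOR)
    for prev, curr in zip(units, units[1:]):
        score *= bigrams.get((prev, curr), FLOOR)
    return score


def score(segments: List[str]) -> float:
    """Score entity segments; fall back to word bigrams if an entity pair is unseen."""
    pairs = list(zip(segments, segments[1:]))
    if segments[0] in ENTITY_UNIGRAMS and all(p in ENTITY_BIGRAMS for p in pairs):
        return chain(segments, ENTITY_UNIGRAMS, ENTITY_BIGRAMS)
    words = " ".join(segments).split()
    return chain(words, WORD_UNIGRAMS, WORD_BIGRAMS)


def rank(candidates: List[List[str]]) -> List[List[str]]:
    """Order candidate completions by descending score."""
    return sorted(candidates, key=score, reverse=True)
```

Entity-level and word-level scores are not strictly comparable, since the word-level chain multiplies more terms. A production ranker would need to normalize for that; the sketch does not.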

Page 20: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Suggestions – Autocomplete + query tagger

“linke” ⇒ “Linkedin” ⇒ COMPANY

“had” ⇒ “Hadoop” ⇒ SKILL

Page 21: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results

● Fast retrieval over 450+ million members
● Highly personalized
● Balance personalization & popularity
● Resilient to spelling variations

Page 22: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

● Prefix-based tokenization (DOCID 4):
NAME: richard
PREFIX: r, ri, ric, rich, richa, ...
NAME: branson
PREFIX: b, br, bra, bran, brans, ...

● Inverted Index (maps each token to its posting list, the list of docs that contain that token):
NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
…

● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b
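The prefix tokenization and AND-style retrieval on this slide can be sketched in Python as below. The member names and IDs in the usage note are made up, and treating every query token except the last as a complete name is an assumption drawn from the "richard b" example. `build_index`, `rewrite`, and `retrieve` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(members: Dict[int, str]) -> Dict[str, Set[int]]:
    """Index each name token under NAME:<token> and under every PREFIX:<prefix>."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, name in members.items():
        for token in name.lower().split():
            index["NAME:" + token].add(doc_id)
            for k in range(1, len(token) + 1):
                index["PREFIX:" + token[:k]].add(doc_id)
    return index


def rewrite(query: str) -> List[str]:
    """Completed tokens become NAME terms; the token being typed becomes a PREFIX term."""
    tokens = query.lower().split()
    if not tokens:
        return []
    return ["NAME:" + t for t in tokens[:-1]] + ["PREFIX:" + tokens[-1]]


def retrieve(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Intersect the posting lists of all required (+) terms."""
    terms = rewrite(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))
```

For example, with members `{1: "richard burton", 4: "richard branson", 7: "rita brown", 10: "richard smith"}`, the query "richard b" is rewritten to `NAME:richard` and `PREFIX:b` and matches documents 1 and 4. Indexing every prefix trades index size for lookup speed: a prefix query becomes a single posting-list fetch.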

Page 23: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

● Connections Index (DOCID 4):
CONN: 1, 10, 15

● Inverted Index
CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
CONN:1 => [4, ...]
CONN:10 => [4, ...]
...

● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b +CONN:1
(Everyone named richard b… and connected to User:1)

Page 24: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

Early Termination

Problem: A query like [PREFIX:ri] might retrieve too many candidate documents.

How can we retrieve the most promising documents first so that we don’t need to score all of them?

Static Rank: Order documents based on their prior (query independent) likelihood of relevance:

A combination of:
● Profile views
● Spam and security related scores
● Editorial rules (Celebrities, influencers, …)

numToScore: The number of documents to retrieve and score for any query
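Early termination can be sketched in Python as follows, assuming posting lists are stored in descending static-rank order so that the first `num_to_score` entries are the most promising ones. The function name `top_k` and the scores in the usage note are hypothetical.

```python
import heapq
from itertools import islice
from typing import Callable, Iterable, List


def top_k(posting_list: Iterable[int],
          score: Callable[[int], float],
          num_to_score: int,
          k: int) -> List[int]:
    """Score only the first num_to_score docs of a static-rank-ordered posting list."""
    promising = islice(posting_list, num_to_score)  # stop reading the list early
    return heapq.nlargest(k, promising, key=score)
```

Suppose the posting list in static-rank order is `[1, 4, 7, 2, 10]` and the query-dependent scores are `{1: 0.3, 4: 0.7, 7: 0.6, 2: 0.95, 10: 0.1}`. With `num_to_score=3` and `k=2` the result is `[4, 7]`: document 2 is never scored even though it would have ranked first. That is the trade-off `numToScore` controls, a bounded scoring cost in exchange for occasionally missing a relevant document with a low static rank.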

Page 25: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Retrieval

Balancing Popularity and Personalization

Query: richard b…

Are you looking for Richard Branson, or a colleague named Richard Burton?

(Assume searcher’s ID = 1)

Rewritten Query:

● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections.

● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of results from the searcher’s connections

Custom search operator: “Weighted OR”
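The slides do not spell out how "Weighted OR" works, so the Python sketch below is one plausible reading, not the operator's real implementation. It reserves a share of the result slots for documents that match the optional clause, and fills the rest, plus any unused reserved slots, from the other matches. The name `weighted_or` and its parameters are hypothetical.

```python
from typing import List, Set


def weighted_or(required: List[int],
                optional: Set[int],
                num_results: int,
                optional_share: float) -> List[int]:
    """Pick num_results docs from `required` (static-rank order), aiming for
    optional_share of them to also match the optional clause."""
    quota = int(num_results * optional_share)
    preferred = [doc for doc in required if doc in optional][:quota]
    others = [doc for doc in required if doc not in optional]
    picked = preferred + others[:num_results - len(preferred)]
    if len(picked) < num_results:
        # Too few non-matching docs: top up with remaining optional matches.
        taken = set(preferred)
        rest = [doc for doc in required if doc in optional and doc not in taken]
        picked += rest[:num_results - len(picked)]
    return picked
```

Here `required` stands for the documents matching `+NAME:richard +PREFIX:b` and `optional` for the posting list of `CONN:1`. The returned order is not a ranking. The selected documents would still go through the scoring stage.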

Page 26: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

weiner ⇔ wiener

catherine ⇔ kathryn

dipak ⇔ deepak

Page 27: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

Name Clusters

Offline process to cluster together similar sounding or similarly spelt names.

Two step process:

1. Coarse clustering (optimized for broad coverage)
● Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
● Combination of edit distance & double metaphone (sound)
● E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)

2. Fine-grained clustering (optimized for precision)
● Split up clusters based on more sophisticated rules
● Position and character-aware edit distance
● Query reformulation data (q1 → q2 → click)
● E.g. (jeff ≠ joff)
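The coarse clustering step can be approximated in Python as below. The sketch applies the normalizations listed on the slide, then groups names by a crude consonant skeleton. That skeleton is a stand-in for the edit distance and double metaphone combination, which the Python standard library does not provide. `coarse_key` and `coarse_clusters` are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def coarse_key(name: str) -> str:
    """Normalize a name and reduce it to its first letter plus later consonants."""
    text = unicodedata.normalize("NFKD", name.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # strip accents
    text = text.replace("ph", "f").replace("c", "k")  # common phonetic variations
    text = re.sub(r"(.)\1+", r"\1", text)             # collapse repeated chars
    return text[:1] + re.sub(r"[aeiouy]", "", text[1:])


def coarse_clusters(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names that share the same coarse key."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        clusters[coarse_key(name)].append(name)
    return dict(clusters)
```

This key reproduces the coarse examples on the slide, including the over-merge of jeff and joff, which both reduce to "jf". Splitting such pairs is the job of the fine-grained step, which the sketch does not attempt.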

Page 28: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

Indexing time:
NAME: kathryn
CLUSTER: katharine

Query time:

Potential queries:
katherine
kathryn
katharine
catharine

Rewritten queries:
?NAME:katherine ?CLUSTER:katharine
?NAME:kathryn ?CLUSTER:katharine
?NAME:katharine ?CLUSTER:katharine
?NAME:catharine ?CLUSTER:katharine

Either match original query term or match the name cluster
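A small Python sketch of how the cluster term is used on both sides: at indexing time each name is stored with its cluster ID, and at query time the typed name is rewritten into two optional terms. The cluster table below is hypothetical and would come from the offline clustering process. `index_terms`, `rewrite`, and `matches` are invented names.

```python
from typing import Dict, List

# Hypothetical output of the offline clustering: name -> cluster representative.
NAME_TO_CLUSTER: Dict[str, str] = {
    "katherine": "katharine",
    "kathryn": "katharine",
    "katharine": "katharine",
    "catharine": "katharine",
}


def index_terms(name: str) -> List[str]:
    """Terms stored for a document at indexing time."""
    terms = ["NAME:" + name]
    if name in NAME_TO_CLUSTER:
        terms.append("CLUSTER:" + NAME_TO_CLUSTER[name])
    return terms


def rewrite(query_name: str) -> List[str]:
    """Optional (?) terms issued at query time: the exact name or its cluster."""
    terms = ["?NAME:" + query_name]
    if query_name in NAME_TO_CLUSTER:
        terms.append("?CLUSTER:" + NAME_TO_CLUSTER[query_name])
    return terms


def matches(doc_name: str, query_name: str) -> bool:
    """A document matches if it contains any of the optional query terms."""
    doc_terms = set(index_terms(doc_name))
    return any(term.lstrip("?") in doc_terms for term in rewrite(query_name))
```

With this table, a member indexed as "kathryn" is found by the query "catharine" through the shared `CLUSTER:katharine` term, while a name outside every cluster still matches on its exact `NAME` term.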

Page 29: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Scoring

Learning to Rank (Machine-learned ranking)

Training data
● Click data from previous typeahead sessions
● <searcher, query, doc> ⇒ positive/negative

Clicked result treated as positive.

All other shown results treated as negative.

Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.

Features / signals
● Textual match against various fields
● Network distance, number of shared connections
● Global popularity
● Compound features
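The labeling rule above can be sketched in Python as follows. The `Example` record and the `label_session` function are hypothetical, and feature extraction and model training are left out.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    searcher: int
    query: str
    doc: int
    label: int  # 1 = clicked (positive), 0 = shown but not clicked (negative)


def label_session(searcher: int, query: str,
                  shown: List[int], clicked: int) -> List[Example]:
    """Turn one typeahead session into labeled <searcher, query, doc> examples."""
    return [Example(searcher, query, doc, int(doc == clicked)) for doc in shown]
```

For instance, a session where searcher 1 typed "richard b", saw documents 4, 12, and 15, and clicked 12 yields one positive and two negative examples.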

Page 30: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Conclusions

● Instant search experience
  ○ Directly satisfy navigational search uses in typeahead via Instant Results
  ○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions
● Combination of techniques
  ○ Query tagger for entity extraction – “Things not Strings”
  ○ FST-based query completion
  ○ Inverted index-based instant results + Early termination + Weighted OR
  ○ Name clusters for fuzzy name matching

Page 31: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Future Work

● Personalized query completions
  ○ m ⇒ machine learning
  ○ m ⇒ machinist
● Multi-entity query suggestions
  ○ Now : [linkedin] ⇒ “Find people who work at LinkedIn”
  ○ Future : [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
● Better blending
  ○ Autocomplete + query suggestions + instant results
  ○ Query features – what does the query mean?
  ○ Results features – what results come back from each system?

Page 32: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Thank You!

Page 33: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – The Economic Graph

Page 34: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – SERP (Jobs)

Page 35: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – Typeahead

Page 36: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – SERP