Socializing Search. Professionally.

Sriram Sankar (Principal Staff Engineer) and Daniel Tunkelang (Head, Query Understanding), LinkedIn Recruiting Solutions

Description

Socializing Search. Professionally. Sriram Sankar and Daniel Tunkelang. Presented at the O'Reilly Strata 2014 Conference.

LinkedIn has a unique data collection: the 277M+ members who use LinkedIn are also the most valuable entities in our corpus, which consists of people, companies, jobs, and a rich content ecosystem. Our members use LinkedIn to satisfy a diverse set of navigational and exploratory information needs, which we address by leveraging semi-structured and social content to understand their query intent and deliver a personalized search experience. As a result, we’ve built a system quite different from those used for web or enterprise search. In this talk, we will discuss how we have addressed the unique scalability, performance, and search quality challenges of delivering billions of deeply personalized searches to our members. Although many of the challenges we face are unique to LinkedIn, we hope that the ideas we share will prove useful to others thinking about entity-oriented search or working with large-scale social network data.

Transcript of Socializing Search. Professionally.

Page 1: Socializing Search. Professionally.

Recruiting Solutions

Sriram Sankar, Principal Staff Engineer
Daniel Tunkelang, Head, Query Understanding

Page 2: Socializing Search. Professionally.

Whether you’ve tried to find an Apache committer…

Page 3: Socializing Search. Professionally.

…or an Apache commander,

Page 4: Socializing Search. Professionally.

you’ve probably used LinkedIn Search.

Page 5: Socializing Search. Professionally.

Let’s talk about…

• Infrastructure
• Quality

Sriram Daniel

Page 6: Socializing Search. Professionally.

LinkedIn Search leverages the economic graph.

Page 7: Socializing Search. Professionally.

Social means that relevance is highly personalized.

Page 8: Socializing Search. Professionally.

Machine-learned ranking, socially.

Relevance models incorporate user features:

score = P(Document | Query, User)

Our model: tree with logistic regression leaves.

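[Figure: a decision tree that splits on X2 (branches X2 = 0 and X2 = 1) and on X10 < 0.1234 (Yes/No), with a logistic regression model at each leaf.]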
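A tree with logistic regression leaves can be sketched in Python as follows. The split features X2 and X10 and the 0.1234 threshold come from the slide; placing the X10 split under the X2 = 1 branch, the extra feature names, and all weights are hypothetical and only for illustration. Each leaf turns a query/document/user feature vector into an estimate of P(Document | Query, User).

```python
import math
from dataclasses import dataclass, field
from typing import Dict

Features = Dict[str, float]


@dataclass
class LogisticLeaf:
    """A logistic regression model attached to one leaf of the tree."""
    bias: float
    weights: Dict[str, float] = field(default_factory=dict)

    def predict(self, x: Features) -> float:
        # Linear combination of features, squashed into a probability.
        z = self.bias + sum(w * x.get(name, 0.0) for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class TreeWithLogisticLeaves:
    x2_is_0: LogisticLeaf
    x2_is_1_x10_low: LogisticLeaf   # X2 = 1 and X10 < 0.1234 (assumed placement)
    x2_is_1_x10_high: LogisticLeaf  # X2 = 1 and X10 >= 0.1234

    def score(self, x: Features) -> float:
        # The tree routes the example to a leaf; that leaf's model scores it.
        if x["X2"] == 0:
            return self.x2_is_0.predict(x)
        if x["X10"] < 0.1234:
            return self.x2_is_1_x10_low.predict(x)
        return self.x2_is_1_x10_high.predict(x)


# Hypothetical weights; "text_match" and "connection_degree" are made-up features.
model = TreeWithLogisticLeaves(
    x2_is_0=LogisticLeaf(bias=-1.0, weights={"text_match": 2.0}),
    x2_is_1_x10_low=LogisticLeaf(bias=0.0, weights={"connection_degree": -0.5}),
    x2_is_1_x10_high=LogisticLeaf(bias=1.0, weights={"text_match": 1.5}),
)

print(round(model.score({"X2": 1, "X10": 0.05, "connection_degree": 2.0}), 4))  # 0.2689
```

One appeal of this shape is that the tree can separate groups of examples that behave differently, while each leaf remains a simple linear model that is easy to train and inspect.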

Page 9: Socializing Search. Professionally.

LinkedIn’s focus: entity-oriented search.

[Screenshot: a company page, with callouts for Company, Name, Employees, Jobs, and Search.]

Page 10: Socializing Search. Professionally.

Query understanding can act as a relevance filter.

for i in [1..n]
    B[i] ← {}
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob, descending
    truncate B[i] to size k
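The beam-search segmentation in the pseudocode above can be sketched in Python. Here Pc is assumed to be a plain dictionary from candidate segments to probabilities, and the probabilities in the usage example are invented. As in the pseudocode, the sketch keeps the k most probable segmentations of each query prefix; each extension uses only the words after position j, so segments never overlap.

```python
from typing import Dict, List, Sequence, Tuple

# A segmentation is a tuple of segments plus its probability.
Segmentation = Tuple[Tuple[str, ...], float]


def segment_query(words: Sequence[str], pc: Dict[str, float], k: int = 3) -> List[Segmentation]:
    """Return the k most probable segmentations of the whole query.

    beams[i] holds the best segmentations of the first i words (B[i] in the pseudocode).
    """
    n = len(words)
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        # Case 1: the whole prefix w1..wi is a single segment.
        s = " ".join(words[:i])
        if pc.get(s, 0.0) > 0:
            candidates.append(((s,), pc[s]))
        # Case 2: extend a segmentation of w1..wj with the segment w(j+1)..wi.
        for j in range(1, i):
            s = " ".join(words[j:i])
            p = pc.get(s, 0.0)
            if p > 0:
                for segs, prob in beams[j]:
                    candidates.append((segs + (s,), prob * p))
        # Keep only the k most probable segmentations of this prefix.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Invented segment probabilities, for illustration only.
pc = {"warren": 0.1, "buffett": 0.2, "warren buffett": 0.5}
best = segment_query(["warren", "buffett"], pc)
print(best[0])  # (('warren buffett',), 0.5)
```

With these numbers, keeping "warren buffett" as one segment beats splitting it into two words, which is the kind of signal a query understanding layer could use to treat the query as a name.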

Page 11: Socializing Search. Professionally.

Less is more.

warren buffett

Page 12: Socializing Search. Professionally.

Coming soon: entity-driven search assist.

[Screenshot: search-assist suggestions for a partial query, including "Jobs at LinkedIn", "People currently working at LinkedIn", and "People who used to work at LinkedIn".]
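Entity-driven search assist can be pictured as mapping a recognized entity to structured queries rather than to free-text completions. In this minimal Python sketch, the company list, the prefix matching, and the structured-query fields (`type`, `company`, `current_company`, `past_company`) are all hypothetical; only the three suggestion labels come from the slide.

```python
from typing import Dict, List, Tuple

Suggestion = Tuple[str, Dict[str, str]]  # (display label, structured query)


def search_assist(prefix: str, companies: List[str]) -> List[Suggestion]:
    """Suggest structured queries for every company whose name starts with the prefix."""
    suggestions: List[Suggestion] = []
    for company in companies:
        if not company.lower().startswith(prefix.lower()):
            continue
        # Each label is backed by a structured query, not by free text.
        suggestions.append((f"Jobs at {company}", {"type": "job", "company": company}))
        suggestions.append((f"People currently working at {company}",
                            {"type": "person", "current_company": company}))
        suggestions.append((f"People who used to work at {company}",
                            {"type": "person", "past_company": company}))
    return suggestions


# "Acme" is a made-up company name.
for label, query in search_assist("link", ["LinkedIn", "Acme"]):
    print(label, query)
```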

Page 13: Socializing Search. Professionally.

Infrastructure

Lucene

Map of terms to documents: the index

Provides an API to add documents to and remove documents from the index

Provides an API to query the index

Page 14: Socializing Search. Professionally.

[Figure: two example documents, one mentioning Daniel and LinkedIn and the other mentioning Sriram and LinkedIn, shown as an inverted index (the terms Daniel, Sriram, and LinkedIn mapped to documents 1 and 2) and as a forward index.]
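The two index types on this slide can be sketched with Python dictionaries. The document numbers and shortened texts below are assumptions modeled on the slide's example; Lucene's actual on-disk structures are far more compact.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_indexes(docs: Dict[int, str]) -> Tuple[Dict[str, List[int]], Dict[int, List[str]]]:
    """Build an inverted index (term -> doc ids) and a forward index (doc id -> terms)."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    forward: Dict[int, List[str]] = {}
    for doc_id, text in docs.items():
        terms = text.lower().split()
        forward[doc_id] = terms          # forward index: what a document contains
        for term in terms:
            postings[term].add(doc_id)   # inverted index: where a term occurs
    inverted = {term: sorted(ids) for term, ids in postings.items()}
    return inverted, forward


docs = {1: "blah Daniel blah LinkedIn blah", 2: "blah Sriram blah LinkedIn blah"}
inverted, forward = build_indexes(docs)
print(inverted["linkedin"])  # [1, 2]
print(forward[2])
```

The inverted index answers "which documents contain this term?" during retrieval; the forward index answers "what does this document contain?", which is what a scorer needs once a document has been retrieved.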

Page 15: Socializing Search. Professionally.

A standard scoring capability is built into Lucene

Page 16: Socializing Search. Professionally.

Extremely easy to build a search engine

But difficult to make it sophisticated

Page 17: Socializing Search. Professionally.

The LinkedIn Search Stack

[Diagram: a request flows through the Query Rewriter, Index Retrieval, Scorer, and Sorter/Blender to produce a response; Offline Data Building and Live Data Updates supply the data these components use.]

Page 18: Socializing Search. Professionally.

Search Index Served by Lucene

Inverted index

Forward index

Static-rank-based document ordering

Page 19: Socializing Search. Professionally.

Offline Data Builds on Hadoop

Multi-stage map-reduce pipeline allows complex data processing

Produces a sharded, single-segment Lucene index with documents sorted by static rank

Produces data models for use in query rewriting
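The idea behind a sharded index sorted by static rank can be illustrated without Hadoop. This Python sketch only simulates the output: the static-rank values, the shard count, and round-robin shard assignment are assumptions, not a description of LinkedIn's pipeline. Within each shard, a document's position acts as its local doc id, so a lower id means a higher static rank.

```python
from typing import Dict, List


def build_shards(static_rank: Dict[str, float], num_shards: int) -> List[List[str]]:
    """Return, per shard, document keys ordered by descending static rank.

    A document's position in its shard's list serves as its local doc id.
    """
    ranked = sorted(static_rank, key=lambda doc: static_rank[doc], reverse=True)
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for position, doc in enumerate(ranked):
        # Round-robin assignment (an assumption) keeps each shard in static rank order.
        shards[position % num_shards].append(doc)
    return shards


ranks = {"alice": 0.9, "bob": 0.4, "carol": 0.7, "dave": 0.1}
print(build_shards(ranks, num_shards=2))  # [['alice', 'bob'], ['carol', 'dave']]
```

Doing the sort once, offline, is what allows retrieval to treat doc id order as quality order at query time.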

Page 20: Socializing Search. Professionally.

Live Data Updates

Feed-based framework to support updates to offline data builds

Lucene enhanced with a partial index update capability
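The slide does not say how Lucene was enhanced, so the following is only a conceptual Python sketch of what a partial update means: a feed event changes one field of one document, and only that field's postings are touched instead of reindexing the whole document. The class, the event, and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Inverted index keyed by (field, term), so one field can be updated on its own.
Index = Dict[Tuple[str, str], Set[int]]


class PartiallyUpdatableIndex:
    def __init__(self) -> None:
        self.postings: Index = defaultdict(set)
        # doc id -> field -> terms currently indexed for that field
        self.fields: Dict[int, Dict[str, Set[str]]] = defaultdict(dict)

    def update_field(self, doc_id: int, field: str, text: str) -> None:
        """Replace one field of a document, leaving its other fields untouched."""
        new_terms = set(text.lower().split())
        old_terms = self.fields[doc_id].get(field, set())
        for term in old_terms - new_terms:
            self.postings[(field, term)].discard(doc_id)
        for term in new_terms - old_terms:
            self.postings[(field, term)].add(doc_id)
        self.fields[doc_id][field] = new_terms

    def search(self, field: str, term: str) -> Set[int]:
        return set(self.postings.get((field, term.lower()), set()))


index = PartiallyUpdatableIndex()
index.update_field(7, "name", "Jane Doe")
index.update_field(7, "title", "Search Engineer")
# Hypothetical feed event: member 7 changes title; the name field is not reprocessed.
index.update_field(7, "title", "Query Understanding Lead")
print(index.search("title", "engineer"), index.search("title", "lead"))  # set() {7}
```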

Page 21: Socializing Search. Professionally.

Query Rewriting (and Planning)

Accepts the raw query and user metadata

Produces a Lucene retrieval query and metadata for scoring

May use data models built offline
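The rewriter's contract (raw query and user metadata in; retrieval query and scoring metadata out) can be sketched in Python. The boolean query string, the `title_synonyms` dictionary (standing in for a data model built offline), and the metadata fields are hypothetical, not LinkedIn's actual formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RewriteResult:
    retrieval_query: str  # what index retrieval will execute
    scoring_metadata: Dict[str, object] = field(default_factory=dict)  # handed to the scorer


def rewrite(raw_query: str, user: Dict[str, object],
            title_synonyms: Dict[str, List[str]]) -> RewriteResult:
    clauses = []
    for term in raw_query.lower().split():
        # Expand each term with phrases from an offline-built model, if any.
        alternatives = [term] + [f'"{syn}"' for syn in title_synonyms.get(term, [])]
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return RewriteResult(
        retrieval_query=" AND ".join(clauses),
        scoring_metadata={"user_id": user["id"], "raw_query": raw_query},
    )


synonyms = {"swe": ["software engineer"]}
result = rewrite("swe google", {"id": 42}, synonyms)
print(result.retrieval_query)  # (swe OR "software engineer") AND (google)
```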

Page 22: Socializing Search. Professionally.

Index Retrieval

Lucene query built by query rewriter is used to retrieve documents from the Lucene index

Documents are retrieved in static rank order (best document first)

Retrieval may be terminated early, since documents are retrieved in static rank order

No scoring is performed during retrieval
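Because doc ids follow static rank, retrieval can walk postings lists in doc id order and stop as soon as it has enough matches. This Python sketch of conjunctive (AND) retrieval assumes sorted postings lists and an arbitrary cutoff `limit`; as on the slide, it collects matches without scoring them. A production engine would typically skip ahead in postings rather than step one entry at a time.

```python
from typing import Dict, List


def retrieve(postings: Dict[str, List[int]], terms: List[str], limit: int) -> List[int]:
    """Return up to `limit` doc ids containing all terms, best static rank first.

    Lower doc id = higher static rank, so the first matches found are the best ones.
    """
    lists = [postings.get(t, []) for t in terms]
    if not lists or any(not lst for lst in lists):
        return []
    positions = [0] * len(lists)
    results: List[int] = []
    while len(results) < limit:  # early termination once enough matches are found
        # The candidate is the largest current head; advance every list up to it.
        candidate = max(lst[pos] for lst, pos in zip(lists, positions))
        for k, lst in enumerate(lists):
            while positions[k] < len(lst) and lst[positions[k]] < candidate:
                positions[k] += 1
            if positions[k] == len(lst):
                return results  # one list is exhausted: no more matches
        if all(lst[pos] == candidate for lst, pos in zip(lists, positions)):
            results.append(candidate)  # collected in rank order, not scored
            positions = [pos + 1 for pos in positions]
            if any(pos == len(lst) for lst, pos in zip(lists, positions)):
                return results
    return results


p = {"software": [1, 3, 4, 7, 9], "engineer": [2, 3, 7, 8, 9]}
print(retrieve(p, ["software", "engineer"], limit=2))  # [3, 7]
```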

Page 23: Socializing Search. Professionally.

Scoring

Scoring is performed after retrieval

Its input is the retrieved document (including its forward index), a description of how the retrieval query matched the document, and the scoring metadata produced by the rewriter

Costly features can be computed offline during the index building process in Hadoop, e.g., tf/idf calculations
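Scoring as a separate step can be sketched in Python as a function over the three inputs listed above. The feature names, the hand-picked weights (a stand-in for a machine-learned model), and the per-document `idf` values (standing in for costly features precomputed offline) are all hypothetical; the final sort plays the role of the Sorter/Blender stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RetrievedDoc:
    doc_id: int
    forward_index: Dict[str, List[str]]  # field -> terms, read from the forward index
    matched_terms: Set[str]              # how the retrieval query matched this document
    idf: Dict[str, float]                # hypothetical values precomputed offline


def score(doc: RetrievedDoc, scoring_metadata: Dict[str, object]) -> float:
    # Text relevance: sum of offline idf values for the query terms that matched.
    text_score = sum(doc.idf.get(t, 0.0) for t in doc.matched_terms)
    # Personalization: boost documents in the searcher's network (made-up feature).
    network = scoring_metadata.get("network", set())
    social_boost = 1.0 if doc.doc_id in network else 0.0
    # Field-aware signal from the forward index: did a matched term hit the title?
    title_hit = 0.5 if doc.matched_terms & set(doc.forward_index.get("title", [])) else 0.0
    return text_score + social_boost + title_hit


def rank(docs: List[RetrievedDoc], scoring_metadata: Dict[str, object]) -> List[int]:
    ordered = sorted(docs, key=lambda d: score(d, scoring_metadata), reverse=True)
    return [d.doc_id for d in ordered]


docs = [
    RetrievedDoc(3, {"title": ["software", "engineer"]}, {"software", "engineer"},
                 {"software": 1.2, "engineer": 0.8}),
    RetrievedDoc(7, {"title": ["manager"]}, {"software"}, {"software": 1.2}),
]
meta = {"network": {7}}
print(rank(docs, meta))  # [3, 7]
```

Keeping scoring outside retrieval means features like these can read the whole forward index and per-user metadata without slowing down the posting-list walk.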

Page 24: Socializing Search. Professionally.

Summary

Quality
• LinkedIn Search leverages the economic graph.
• Social means that relevance is highly personalized.
• Less is more: query understanding is a relevance filter.
• Moving in the direction of suggesting structured queries.

System
• Powered by Lucene, but with additional components.
• Offline data builds on Hadoop, partial index updates.
• Index uses static ranking and early termination.
• Scoring performed outside of Lucene.

Page 25: Socializing Search. Professionally.

Sriram Sankar: https://linkedin.com/in/sriramxsankar
Daniel Tunkelang: https://linkedin.com/in/dtunkelang