©2013 LinkedIn Corporation. All Rights Reserved.
LinkedIn Segmentation & Targeting Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
About Us
Hien Luu, Sid Anand
Our mission
Connect the world’s professionals to make them more productive and successful
Over 200M members and counting
LinkedIn Members (Millions)
Year     2004  2005  2006  2007  2008  2009  2010  2011  2012
Members     2     4     8    17    32    55    90   145  200+
The world’s largest professional network
Growing at more than 2 members/sec
Source: http://press.linkedin.com/about
• >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• >2.9M Company Pages
• >5.7B professional searches in 2012
• 19 languages
• >30M students and NCGs, the fastest growing demographic
The world’s largest professional network
Over 64% of members are now international
Source: http://press.linkedin.com/about
Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
Agenda
Company Overview
Big Data @ LinkedIn
The Segmentation & Targeting Problem
Solution : LinkedIn Segmentation & Targeting Platform
Q & A
Big Data @ LinkedIn
LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Off-line Data Infrastructure
[Diagram: On-line, Near-line, and Off-line tiers; labels: Oracle or Espresso, Updates, Web Serving, Data Streams, Teradata]
Big Data Story : On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
  • Highly concurrent R/W access
  • Transactional guarantees
  • Back-up & Recovery
• Supports a central LinkedIn Data Principle!
  • “All data everywhere”
• All OLTP databases need to provide a time-line consistent change stream
• For this, we developed and open-sourced Databus!
[Diagram: On-line tier; labels: Oracle or Espresso, Updates, Web Serving]
Big Data Story : On-line Data
[Diagram: Oracle or Espresso receives Updates and emits Data Change Events to Standardization, Search Index, Graph Index, and Read Replicas]
A user updates the company, title, & school on his profile. He also accepts a connection.
The write is made to an Oracle or Espresso Master and Databus replicates it:
• the profile change is applied to the Standardization service
  E.g. the many forms of IBM were canonicalized for search-friendliness
• … and to the Search Index
  Recruiters can find you immediately by new keywords
• the connection change is applied to the Graph Index service
  The user can now start receiving feed updates from his new connections
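The fan-out described above can be sketched in a few lines of Python. This is only an illustration of the pattern and not the Databus API: the `ChangeEvent` and `ChangeStream` names, the event fields, the `scn` ordering key, and the company lookup table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class ChangeEvent:
    """One committed change from the primary store (hypothetical shape)."""
    scn: int                  # assumed commit sequence number, used for ordering
    source: str               # "profile" or "connection" in this sketch
    payload: Dict[str, str]


class ChangeStream:
    """Delivers every event to every subscriber in commit order."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, handler: Callable[[ChangeEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, events: Iterable[ChangeEvent]) -> None:
        # Sorting by scn stands in for a timeline-consistent stream.
        for event in sorted(events, key=lambda e: e.scn):
            for handler in self._subscribers:
                handler(event)


# Illustrative canonicalization table, not real standardization data.
CANONICAL_COMPANY = {"i.b.m.": "IBM", "ibm corp": "IBM"}

search_index: Dict[str, Dict[str, str]] = {}
graph_index: Dict[str, Set[str]] = {}


def standardize_and_index(event: ChangeEvent) -> None:
    """Canonicalize the company name, then make the profile searchable."""
    if event.source != "profile":
        return
    profile = dict(event.payload)
    company = profile.get("company", "")
    profile["company"] = CANONICAL_COMPANY.get(company.lower(), company)
    search_index[profile["member_id"]] = profile


def update_graph(event: ChangeEvent) -> None:
    """Record an accepted connection in both directions."""
    if event.source != "connection":
        return
    a, b = event.payload["from"], event.payload["to"]
    graph_index.setdefault(a, set()).add(b)
    graph_index.setdefault(b, set()).add(a)


stream = ChangeStream()
stream.subscribe(standardize_and_index)
stream.subscribe(update_graph)
stream.publish([
    ChangeEvent(2, "connection", {"from": "m1", "to": "m2"}),
    ChangeEvent(1, "profile", {"member_id": "m1", "company": "I.B.M.", "title": "Engineer"}),
])
```

Each downstream service only subscribes to the stream, so a new consumer can be added without touching the write path.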
Big Data Story : On-line Data
Databus streams also update Hadoop!
[Diagram: Data Change Events from Oracle or Espresso flow to Standardization, Search Index, Graph Index, Read Replica, and Hadoop]
Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
  • e.g. Member Profile data (employment, education history, endorsements)
• Tracking data via web site instrumentation
  • e.g. pages viewed, email opened/sent, social gestures : posts/likes/shares
[Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Kafka, Teradata]
The Segmentation & Targeting Problem
Segmentation & Targeting
Segmentation & Targeting Attribute types
Bhaskar Ghosh
Segmentation & Targeting
Step 1 : Take some information about users
Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
Result: Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)
Segmentation & Targeting
Terminology for the example above:
• Attributes: the information about users from Step 1
• Segment Definition: the targeting criteria from Step 2
• Segment: the members who match (Members 1 & 3)
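The worked example can be reproduced with a short Python sketch. The field names (`join_date`, `responded_x1`) are made up for illustration, and the dates are assumed to be in MM/DD/YYYY form, which is the reading under which the slide's answer (Members 1 & 3) holds.

```python
from datetime import date

# Attributes: one row per member, copied from the table in Step 1.
attributes = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def segment_definition(row: dict) -> bool:
    """Targeting criteria from Step 2, expressed as a predicate over one row."""
    return (
        date(2013, 1, 1) <= row["join_date"] <= date(2013, 1, 31)
        and row["country"] == "FR"
        and not row["responded_x1"]
    )


# Segment: the member IDs whose attributes satisfy the definition.
segment = [row["member_id"] for row in attributes if segment_definition(row)]
print(segment)  # [1, 3]
```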
Segmentation & Targeting
Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting criteria
• This data resides on Hadoop or Teradata (TD)
• The business is most comfortable with SQL-like languages
Segmentation & Targeting Solution
Segmentation & Targeting
Attribute Computation Engine
Attribute Serving Engine
Segmentation & Targeting
Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability
Segmentation & Targeting
Attribute computation
[Diagram labels: ~225M, PB, TB, TB, ~240]
LinkedIn Segmentation & Targeting Platform
Attribute Portal Web Application
Attribute & Definition Metadata
LinkedIn Segmentation & Targeting Platform
Attribute & Definition Metadata
• TD Executor (REST)
• Hive Executor (REST)
• Pig Executor (REST)
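One way to picture the executor layer is a small dispatcher that routes each attribute definition to the executor for its engine. The sketch below is an assumption about the shape of such a layer, not the platform's actual design: `AttributeDefinition`, the `executor` decorator, and the returned status dictionaries are hypothetical, and the REST calls are replaced by plain functions so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AttributeDefinition:
    """Metadata describing how one attribute is computed (hypothetical)."""
    name: str
    engine: str   # "td", "hive", or "pig" in this sketch
    script: str   # the SQL-like or Pig script that computes the attribute


# Registry mapping an engine name to the function that runs its scripts.
EXECUTORS: Dict[str, Callable[[AttributeDefinition], dict]] = {}


def executor(engine: str):
    """Decorator that registers a function as the executor for one engine."""
    def register(fn):
        EXECUTORS[engine] = fn
        return fn
    return register


@executor("td")
def run_td(defn: AttributeDefinition) -> dict:
    # A real executor would submit defn.script over REST; here we only echo.
    return {"engine": "td", "attribute": defn.name, "status": "SUBMITTED"}


@executor("hive")
def run_hive(defn: AttributeDefinition) -> dict:
    return {"engine": "hive", "attribute": defn.name, "status": "SUBMITTED"}


@executor("pig")
def run_pig(defn: AttributeDefinition) -> dict:
    return {"engine": "pig", "attribute": defn.name, "status": "SUBMITTED"}


def submit(defn: AttributeDefinition) -> dict:
    """Route a definition to the executor registered for its engine."""
    try:
        run = EXECUTORS[defn.engine]
    except KeyError:
        raise ValueError(f"no executor registered for engine {defn.engine!r}")
    return run(defn)


result = submit(AttributeDefinition("job_seeker_score", "pig", "-- script body --"))
```

Keeping the engine name in the metadata means a new data source only needs one more registered executor.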
LinkedIn Segmentation & Targeting Platform
M/R Stitcher
Inputs:
• /path/dataset1
• /path/dataset2
• /path/dataset3
• /path/dataset4
Output:
• /path/lnkd_big_table
DataLoader
Attribute consolidation & availability
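Attribute consolidation can be thought of as a join on member ID across the computed datasets. The following in-memory sketch imitates the map and reduce phases of such a stitcher; it is a toy under stated assumptions, not the M/R job itself. The record contents and the `member_id` key are invented, and only the dataset paths come from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical contents for two of the input datasets.
datasets: Dict[str, List[dict]] = {
    "/path/dataset1": [
        {"member_id": 1, "country": "FR"},
        {"member_id": 2, "country": "BE"},
    ],
    "/path/dataset2": [
        {"member_id": 1, "job_seeker": True},
    ],
}


def map_phase(inputs: Dict[str, List[dict]]) -> Iterator[Tuple[int, dict]]:
    """Emit (member_id, partial attributes) for every input record."""
    for records in inputs.values():
        for record in records:
            attrs = {k: v for k, v in record.items() if k != "member_id"}
            yield record["member_id"], attrs


def reduce_phase(pairs: Iterable[Tuple[int, dict]]) -> Dict[int, dict]:
    """Merge all partial attributes that share a member ID into one wide row."""
    table: Dict[int, dict] = defaultdict(dict)
    for member_id, attrs in pairs:
        table[member_id].update(attrs)
    return dict(table)


# One wide row per member, analogous to /path/lnkd_big_table.
big_table = reduce_phase(map_phase(datasets))
```

Members missing from a dataset simply lack that attribute in their row, so each attribute can be recomputed and restitched independently.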
LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data
• Segmentation
• Propensity Model
• Ad hoc analysis
Segmentation & Targeting
Attribute Serving Engine
• Self-service
• Attribute predicate expression
• Build segments
• Build lists
Segmentation & Targeting
Serving Engine
• count, filter, sum, complex expressions
[Diagram: LinkedIn big table; labels: ~225M, ~240]
LinkedIn Segmentation & Targeting Platform
M/R Indexer
• Input: LinkedIn big table
• Output: Inverted Index (three shown)
Attribute & Definition Metadata
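To make the indexing step concrete, here is a minimal in-memory sketch of turning a wide table into an inverted index, where each (attribute, value) pair maps to the set of member IDs that have it. This illustrates the data structure only; the real indexer runs as an M/R job, and the sample rows and attribute names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

# A tiny stand-in for the big table: member_id -> attribute values.
big_table: Dict[int, Dict[str, Hashable]] = {
    1: {"country": "FR", "job_seeker": True},
    2: {"country": "BE", "job_seeker": False},
    3: {"country": "FR", "job_seeker": False},
}


def build_inverted_index(
    table: Dict[int, Dict[str, Hashable]],
) -> Dict[Tuple[str, Hashable], Set[int]]:
    """Map every (attribute, value) term to the members that carry it."""
    index: Dict[Tuple[str, Hashable], Set[int]] = defaultdict(set)
    for member_id, attrs in table.items():
        for name, value in attrs.items():
            index[(name, value)].add(member_id)
    return dict(index)


index = build_inverted_index(big_table)

# Counting a simple segment is now a set-size lookup instead of a table scan.
french_members = index[("country", "FR")]
```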
LinkedIn Segmentation & Targeting Platform
Who are the North American recruiters who don’t work for a competitor?
Who are the LinkedIn Talent Solutions prospects in Europe?
Who are the job seekers?
LinkedIn Segmentation & Targeting Platform
JSON Predicate Expression
JSON Lucene Query Parser
Inverted Index (three shown)
Segment & List
LinkedIn Segmentation & Targeting Platform
Complex tree-like attribute predicate expressions
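A tree-like predicate expression can be evaluated recursively against posting sets from an inverted index. The sketch below shows the idea for the "North American recruiters who don't work for a competitor" question. The JSON shape (`op`, `attr`, `value`, `children`, `child`), the attribute names, and the sample postings are all assumptions made for illustration; this is neither the platform's JSON schema nor Lucene's query parser.

```python
import json
from typing import Dict, Hashable, Set, Tuple

ALL_MEMBERS: Set[int] = {1, 2, 3, 4}

# Hypothetical postings: (attribute, value) -> member IDs.
INDEX: Dict[Tuple[str, Hashable], Set[int]] = {
    ("region", "NA"): {1, 2},
    ("region", "EU"): {3, 4},
    ("occupation", "recruiter"): {1, 2, 3},
    ("employer_is_competitor", True): {2},
}


def evaluate(node: dict) -> Set[int]:
    """Return the member IDs that satisfy the predicate tree rooted at node."""
    op = node["op"]
    if op == "term":
        return set(INDEX.get((node["attr"], node["value"]), set()))
    if op == "not":
        return ALL_MEMBERS - evaluate(node["child"])
    # "and" / "or" are assumed to have at least one child.
    children = [evaluate(child) for child in node["children"]]
    if op == "and":
        return set.intersection(*children)
    if op == "or":
        return set.union(*children)
    raise ValueError(f"unknown operator {op!r}")


expression = json.loads("""
{
  "op": "and",
  "children": [
    {"op": "term", "attr": "occupation", "value": "recruiter"},
    {"op": "term", "attr": "region", "value": "NA"},
    {"op": "not",
     "child": {"op": "term", "attr": "employer_is_competitor", "value": true}}
  ]
}
""")

segment = evaluate(expression)
print(segment)  # {1}
```

Because every operator reduces to set operations on postings, counts and filters over nested expressions avoid scanning the full table.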
LinkedIn Segmentation & Targeting Platform
A marketing campaign is represented by a list
Conclusion
Move at business speed and scale at LinkedIn scale
Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed
Engineering Team
Jessica Ho, Swetha Karthik, Raj Rangaswamy, Tony Tong, Ajinkya Harkare, Hien Luu, Sid Anand
Questions?
More info: data.linkedin.com