Deep Learning for Personalized Search and Recommender Systems


Transcript of Deep Learning for Personalized Search and Recommender Systems

Page 1: Deep Learning for Personalized Search and Recommender Systems

Deep Learning for Personalized Search and Recommender Systems

Ganesh Venkataraman, Airbnb
Nadia Fawaz, Saurabh Kataria, Benjamin Le, Liang Zhang, LinkedIn

1

Page 2: Deep Learning for Personalized Search and Recommender Systems

Tutorial Outline

• Part I (45 min): Deep Learning Key Concepts
• Part II (45 min): Deep Learning for Search and Recommendations at Scale
• Coffee break (30 min)
• Deep Learning Case Studies
  • Part III (45 min): Jobs You May Be Interested In (JYMBII) at LinkedIn
  • Part IV (45 min): Job Search at LinkedIn

Q&A at the end of each part

2

Page 3: Deep Learning for Personalized Search and Recommender Systems

Motivation – Why Recommender Systems?

• Recommendation systems are everywhere. Some examples of impact:
  • "Netflix values recommendations at half a billion dollars to the company" [netflix recsys]
  • "LinkedIn job matching algorithms improve performance by 50%" [San Jose Mercury News]
  • "Instagram switches to using an algorithmic feed" [Instagram blog]

3

Page 4: Deep Learning for Personalized Search and Recommender Systems

Motivation – Why Search?

PERSONALIZED SEARCH

Query = "things to do in halifax"
• Search view – this is a classic IR problem
• Recommendations view – for this query, what are the recommended results?

4

Page 5: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning? Why Now?

• Many of the fundamental algorithmic techniques have existed since the 1980s or before
• 2.5 exabytes of data produced per day – or 530,000,000 songs, or 150,000,000 iPhones

5

Page 6: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning?

Image classification, eCommerce fraud, Search, Recommendations, NLP

Deep learning is eating the world

6

Page 7: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning and Recommender Systems?

• Features
  • Semantic understanding of words/sentences possible with embeddings
  • Better classification of images (identifying cats in YouTube videos)
• Modeling
  • Can we cast matching problems into a deep (and possibly wide) net and learn a family of functions?

7

Page 8: Deep Learning for Personalized Search and Recommender Systems

Part I – Representation Learning and Deep Learning: Key Concepts

8

Page 9: Deep Learning for Personalized Search and Recommender Systems

Deep Learning and AI

http://www.deeplearningbook.org/contents/intro.html

9

Page 10: Deep Learning for Personalized Search and Recommender Systems

Part I Outline

• Shallow Models for Embedding Learning
  • Word2Vec
• Deep Architectures
  • FF, CNN, RNN
• Training Deep Neural Networks
  • SGD, Backpropagation, Learning Rate Schedule, Regularization, Pre-Training

10

Page 11: Deep Learning for Personalized Search and Recommender Systems

Learning Embeddings

11

Page 12: Deep Learning for Personalized Search and Recommender Systems

Representation learning for automated feature generation

• Natural Language Processing
  • Word embedding: word2vec, GloVe
  • Sequence modeling using RNNs and LSTMs
• Graph Inputs
  • DeepWalk
• Multiple hierarchies of features at varying granularities of semantic meaning with deep networks

12

Page 13: Deep Learning for Personalized Search and Recommender Systems

Example Application of Representation Learning – Understanding Text

• One of the keys to any content-based recommender system is understanding text
• What does "understanding" mean?
  • How similar/dissimilar are any two words?
  • What does the word represent? (Named Entity Recognition)
    • "Abraham Lincoln, the 16th President..."
    • "My cousin drives a Lincoln"

13

Page 14: Deep Learning for Personalized Search and Recommender Systems

How to represent a word?

• Vocabulary – run, jog, math
• Simple representation:
  • [1,0,0], [0,1,0], [0,0,1]
  • No representation of meaning
• Co-occurrence in a word/document matrix

14

Page 15: Deep Learning for Personalized Search and Recommender Systems

How to represent a word?

• Trouble with the co-occurrence matrix
  • Large dimension, lots of memory
• Dimensionality reduction using SVD
  • High computational time: n x m matrix => O(mn^2)
  • Adding a new word => redo everything

15

Page 16: Deep Learning for Personalized Search and Recommender Systems

Word embeddings taking context into account

• Key conjecture
  • Context matters.
  • Words that convey a certain context occur together
    • "Abraham Lincoln was the 16th President of the United States"
• Bigram model
  • P("Lincoln" | "Abraham")
• Skip-gram model
  • Consider all words within the context and ignore position
  • P(Context | Word)

16

Page 17: Deep Learning for Personalized Search and Recommender Systems

Word2vec

17

Page 18: Deep Learning for Personalized Search and Recommender Systems

Word2Vec: Skip-Gram Model

• Basic notation:
  • $w$ represents a word, $C(w)$ represents all the context around a word
  • $\theta$ represents the parameter space
  • $D$ represents all the $(w, c)$ pairs
  • $p(c \mid w; \theta)$ represents the probability of context $c$ given word $w$, parametrized by $\theta$
• The probability of all the context appearing given a word is:
  $\prod_{c \in C(w)} p(c \mid w; \theta)$
• The objective (loss) function then becomes:
  $\arg\max_{\theta} \prod_{(w,c) \in D} p(c \mid w; \theta)$

18

Page 19: Deep Learning for Personalized Search and Recommender Systems

Word2vec details

• Let $v_w$ and $v_c$ represent the current word and context vectors. Note that $v_c$ and $v_w$ are parameters we want to learn
• $p(c \mid w; \theta) = \dfrac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}$
• $C$ represents the set of all available contexts

19
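A minimal numpy sketch of the two formulas above, using a toy vocabulary and randomly initialized word/context vectors (all sizes and data here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim = 5, 8                              # toy vocabulary size and embedding dimension
word_vecs = rng.normal(size=(V, dim))      # v_w, parameters to learn
context_vecs = rng.normal(size=(V, dim))   # v_c, parameters to learn

def p_context_given_word(c, w):
    """p(c | w; theta) = exp(v_c . v_w) / sum_{c'} exp(v_{c'} . v_w)"""
    scores = context_vecs @ word_vecs[w]   # dot product with every context vector
    scores -= scores.max()                 # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[c]

# D: observed (word, context) pairs; the objective argmax_theta prod p(c|w;theta)
# is equivalent to maximizing the sum of log-probabilities over D.
D = [(0, 1), (0, 2), (3, 4)]
log_likelihood = sum(np.log(p_context_given_word(c, w)) for w, c in D)
print(log_likelihood)
```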

Page 20: Deep Learning for Personalized Search and Recommender Systems

Negative Sampling – basic intuition

$p(c \mid w; \theta) = \dfrac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}$

• Sample from the unigram distribution instead of taking all contexts into account
• Word2vec itself is a shallow model and can be used to initialize a deep model

20
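A rough numpy sketch of the negative-sampling idea, using the standard word2vec negative-sampling objective: score the observed (word, context) pair against a few negatives drawn from a unigram distribution instead of normalizing over all contexts. Vectors and the unigram distribution here are toy placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
V, dim, k = 5, 8, 2                              # vocab size, embedding dim, negatives per pair
word_vecs = rng.normal(size=(V, dim))
context_vecs = rng.normal(size=(V, dim))
unigram = np.array([0.4, 0.3, 0.1, 0.1, 0.1])    # toy unigram distribution over contexts

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(w, c):
    # Positive pair: push v_c . v_w up; sampled negatives: push v_neg . v_w down.
    pos = np.log(sigmoid(context_vecs[c] @ word_vecs[w]))
    negs = rng.choice(V, size=k, p=unigram)          # sample negatives from the unigram
    neg = np.log(sigmoid(-(context_vecs[negs] @ word_vecs[w]))).sum()
    return -(pos + neg)

print(neg_sampling_loss(w=0, c=1))
```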

Page 21: Deep Learning for Personalized Search and Recommender Systems

Deep Architectures: FF, CNN, RNN

21

Page 22: Deep Learning for Personalized Search and Recommender Systems

Neuron: Computational Unit

• Input vector: x = [x1, x2, ..., xn]
• Neuron
  • Weight vector: W
  • Bias: b
  • Activation function: f
• Output: a = f(Wᵀx + b)

[Figure: input x = (x1, x2, x3, x4) feeding a single neuron (W, b, f) that outputs a = f(Wᵀx + b)]

22

Page 23: Deep Learning for Personalized Search and Recommender Systems

Activation Functions

• Tanh: ℝ → (-1, 1)
  $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
• Sigmoid: ℝ → (0, 1)
  $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
• ReLU: ℝ → [0, +∞)
  $f(x) = \max(0, x)$

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

23
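The three activations above, plus the single-neuron output a = f(Wᵀx + b) from the previous slide, as a small numpy sketch (weights and inputs are arbitrary):

```python
import numpy as np

def tanh(x):    return np.tanh(x)                 # R -> (-1, 1)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))   # R -> (0, 1)
def relu(x):    return np.maximum(0.0, x)         # R -> [0, +inf)

x = np.array([1.0, -2.0, 0.5, 3.0])   # input vector
W = np.array([0.1, 0.4, -0.3, 0.2])   # neuron weights
b = 0.05                              # neuron bias

for f in (tanh, sigmoid, relu):
    a = f(W @ x + b)                  # a = f(W^T x + b)
    print(f.__name__, a)
```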

Page 24: Deep Learning for Personalized Search and Recommender Systems

Layer

• Layer l: n_l neurons
  • weight matrix: W = [W1, ..., W_{n_l}]
  • bias vector: b = [b1, ..., b_{n_l}]
  • activation function: f
• output vector: a = f(Wᵀx + b)

[Figure: input x = (x1, x2, x3, x4) feeding a layer of three neurons (Wi, bi, f), each producing ai = f(Wiᵀx + bi)]

24

Page 25: Deep Learning for Personalized Search and Recommender Systems

Layer: Matrix Notation

• Layer l: n_l neurons
  • weight matrix: W
  • bias vector: b
  • activation function: f
• output vector: a = f(Wᵀx + b)
• more compact notation
  • fast linear-algebra routines for quick computations in the network

[Figure: input x = (x1, x2, x3, x4) feeding a layer (W, b, f) that outputs a = f(Wᵀx + b)]

25

Page 26: Deep Learning for Personalized Search and Recommender Systems

Feed-Forward Network

• Depth: L layers
• Activation at layer l+1: a(l+1) = f(W(l)ᵀ a(l) + b(l))
• Output: prediction in supervised learning
  • goal: approximate y = F(x)

[Figure: depth L = 4 network. Input layer 1 (x1, ..., x4) → hidden layer 2 (W(1), b(1), f(1)) → a(2) → hidden layer 3 (W(2), b(2), f(2)) → a(3) → output layer 4, the prediction layer (W(3), b(3), f(3)) → a(L)]

26
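A minimal forward pass following a(l+1) = f(W(l)ᵀ a(l) + b(l)); the layer sizes, random weights, and the choice to leave the prediction layer linear are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3, 1]            # input layer, two hidden layers, prediction layer
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x, f=np.tanh):
    a = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W.T @ a + b                              # a(l+1) = f(W(l)^T a(l) + b(l))
        a = z if l == len(weights) - 1 else f(z)     # keep the prediction layer linear
    return a

print(forward(np.array([1.0, 0.5, -0.2, 2.0])))
```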

Page 27: Deep Learning for Personalized Search and Recommender Systems

Why CNN: Convolutional Neural Networks?

• Large grid-structured data
  • 1D: time series
  • 2D: image
• Convolution to extract features from the image (e.g., edges, texture)
  • Local connectivity
  • Parameter sharing
  • Equivariance to translation: small translations in the input do not affect the output

Page 28: Deep Learning for Personalized Search and Recommender Systems

Convolution example

https://docs.gimp.org/en/plug-in-convmatrix.html

[Figure: an edge-detect kernel and a sharpen kernel applied to an image]

Page 29: Deep Learning for Personalized Search and Recommender Systems

2D convolution

http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/

[Figure: 2D convolution example – a kernel matrix (W1, W2, W3, W4) sliding over an input matrix to produce the convolved feature map]

29

Page 30: Deep Learning for Personalized Search and Recommender Systems

• Fully connected
  • hidden unit connected to all input units
  • computationally expensive
    • Large image of N x N pixels and a hidden layer with K features
    • Number of parameters: ~K·N²
• Locally connected
  • hidden unit connected to some contiguous input units
  • no parameter sharing
• Convolution
  • locally connected
  • kernel: parameter sharing
    • 1D kernel vector [W1, W2]
    • 1D Toeplitz weight matrix W
  • Scaling to large inputs, images
  • Equivariance to translation

30

[Figure: weight matrices for 4 input units and 3 hidden units]

Fully connected weight matrix W:
W11 W12 W13 W14
W21 W22 W23 W24
W31 W32 W33 W34

Locally connected (no parameter sharing):
W11 W12 0   0
0   W22 W23 0
0   0   W33 W34

Convolution with kernel vector [W1, W2] (Toeplitz weight matrix):
W1 W2 0  0
0  W1 W2 0
0  0  W1 W2
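A small numpy sketch of the comparison above: a 1D convolution with kernel [W1, W2] is just multiplication by the Toeplitz weight matrix, so the 3 hidden units share 2 parameters instead of 12 (fully connected) or 6 (locally connected without sharing). Kernel values are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])           # 4 input units
w1, w2 = 0.5, -0.25                          # shared 1D kernel [W1, W2]

# Convolution written explicitly as a Toeplitz weight matrix.
W_conv = np.array([[w1, w2, 0.0, 0.0],
                   [0.0, w1, w2, 0.0],
                   [0.0, 0.0, w1, w2]])
print(W_conv @ x)

# Same result with a sliding-window ("valid") convolution.
print(np.convolve(x, [w2, w1], mode="valid"))
```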

Page 31: Deep Learning for Personalized Search and Recommender Systems

Pooling

• Summary statistics
  • Aggregate over a region
  • Reduce size
  • Less overfitting
• Translation invariance
• Max, mean

http://ufldl.stanford.edu/tutorial/supervised/Pooling/

31
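A quick numpy sketch of non-overlapping 2x2 max/mean pooling over a toy 4x4 feature map (the reshape trick is just one convenient way to do this):

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 feature map

def pool(fmap, size=2, op=np.max):
    h, w = fmap.shape
    # Split into (h//size, size, w//size, size) blocks and aggregate each block.
    blocks = fmap.reshape(h // size, size, w // size, size)
    return op(blocks, axis=(1, 3))

print(pool(fmap, op=np.max))    # max pooling: summary statistic per region
print(pool(fmap, op=np.mean))   # mean pooling
```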

Page 32: Deep Learning for Personalized Search and Recommender Systems

CNN: Convolutional Neural Network

Combination of:
• Convolutional layers
• Pooling layers
• Fully connected layers

http://colah.github.io/posts/2014-07-Conv-Nets-Modular/

32

[LeCun et al., 1998]

Page 33: Deep Learning for Personalized Search and Recommender Systems

CNN example for image recognition: ImageNet [Krizhevsky et al., 2012]

Pictures courtesy of [Krizhevsky et al., 2012], http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

33

[Figure: filters learned by the first CNN layer, split across the 1st and 2nd GPU]

Page 34: Deep Learning for Personalized Search and Recommender Systems

Why RNN: Recurrent Neural Network?

• Sequential data processing
  • ex: predict the next word in a sentence: "I was born in France. I can speak ..."
• RNN
  • Persist information through a feedback loop
    • the loop passes information from one step to the next
  • Parameter sharing across time indexes
    • output unit depends on previous output units through the same update rule

[Figure: recurrent cell taking input x_t and previous state h_{t-1}, producing h_t]

Page 35: Deep Learning for Personalized Search and Recommender Systems

Unfolded RNN

• Copies of the NN passing feedback to one another

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

35
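A bare-bones recurrent update unrolled over a short toy sequence. The specific form h_t = tanh(W_x x_t + W_h h_{t-1} + b) is one common choice (the slides do not fix a particular cell), and the same parameters are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))   # a toy input sequence x_1 ... x_T
h = np.zeros(hidden_dim)               # initial hidden state

for t in range(T):
    # Same W_x, W_h, b at every step: information persists through h.
    h = np.tanh(W_x @ xs[t] + W_h @ h + b)
    print(t, h)
```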

Page 36: Deep Learning for Personalized Search and Recommender Systems

LSTM: Long Short-Term Memory [Hochreiter et al., 1997]

• Avoid vanishing or exploding gradient
• Long-term dependencies
  • large gap between relevant information and where it is needed
  • Cell state: long-term memory
  • Can remember relevant information over a long period of time
• Cell state updates regulated by gates
  • Forget: how much info from the cell state to let through
  • Input: which cell state components to update
  • Tanh: values to add to the cell state
  • Output: select component values to output

Picture courtesy of http://colah.github.io/posts/2015-08-Understanding-LSTMs/

36

Page 37: Deep Learning for Personalized Search and Recommender Systems

Examples of RNN applications

• Speech recognition [Graves et al., 2013]
• Language modeling [Mikolov, 2012]
• Machine translation [Kalchbrenner et al., 2013] [Sutskever et al., 2014]
• Image captioning [Vinyals et al., 2014]

37

Page 38: Deep Learning for Personalized Search and Recommender Systems

Training a Deep Neural Network

38

Page 39: Deep Learning for Personalized Search and Recommender Systems

Cost Function

• m training samples (feature vector, label):
  $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$
• Per-sample cost: error between the label and the output of the prediction layer
  $J(W, b; x^{(i)}, y^{(i)}) = \lVert a^{(L)}(x^{(i)}) - y^{(i)} \rVert^2$
• Minimize the cost function over the parameters: weights W and biases b
  $J(W, b) = \underbrace{\frac{1}{m} \sum_{i=1}^{m} J(W, b; x^{(i)}, y^{(i)})}_{\text{average error}} + \underbrace{\frac{\lambda}{2} \sum_{l=1}^{L} \lVert W^{(l)} \rVert^2}_{\text{regularization}}$

39
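A self-contained numpy version of this cost, using a toy one-layer linear predictor in place of a full network; the data, weights, and lambda value are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 1))            # a one-layer "network", just for illustration
b = np.zeros(1)
forward = lambda x: W.T @ x + b        # prediction-layer output a_L(x)

X = rng.normal(size=(10, 3))           # m = 10 training samples
Y = rng.normal(size=(10, 1))
lam = 1e-3                             # regularization weight lambda

# J(W, b) = (1/m) sum_i ||a_L(x_i) - y_i||^2 + (lam/2) sum_l ||W_l||^2
avg_error = np.mean([np.sum((forward(x) - y) ** 2) for x, y in zip(X, Y)])
regularization = 0.5 * lam * np.sum(W ** 2)
print(avg_error + regularization)
```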

Page 40: Deep Learning for Personalized Search and Recommender Systems

Gradient Descent

• Random parameter initialization: symmetry breaking
• Gradient descent step: update for every parameter $W_{ij}^{(l)}$ and $b_i^{(l)}$
  $\theta = \theta - \alpha \nabla_{\theta} \mathbb{E}[J(\theta)]$
• Gradient computed by backpropagation
• High cost of backpropagation over the full training set

40

Page 41: Deep Learning for Personalized Search and Recommender Systems

Stochastic Gradient Descent (SGD)

• SGD: follow the negative gradient after
  • a single sample
    $\theta = \theta - \alpha \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})$
  • a few samples: mini-batch (e.g., 256)
• Epoch: full pass through the training set
  • Randomly shuffle data prior to each training epoch

41
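A sketch of the epoch/mini-batch loop above: shuffle once per epoch, then apply θ ← θ − α∇J on each mini-batch. The gradient here is for plain linear least squares on synthetic data, purely to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_theta + 0.01 * rng.normal(size=1000)

theta, alpha, batch_size = np.zeros(5), 0.1, 256
for epoch in range(5):
    order = rng.permutation(len(X))              # randomly shuffle each epoch
    for start in range(0, len(X), batch_size):   # mini-batches of ~256 samples
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ theta - yb) / len(idx)   # gradient of mean squared error
        theta = theta - alpha * grad             # SGD step
print(theta)
```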

Page 42: Deep Learning for Personalized Search and Recommender Systems

Backpropagation [Rumelhart et al., 1986]

Goal: compute the gradient of the cost with respect to every parameter

Recursively apply the chain rule for the derivative of a composition of functions.
Let $y = g(x)$ and $z = f(y) = f(g(x))$, then
$\dfrac{dz}{dx} = \dfrac{dz}{dy}\,\dfrac{dy}{dx} = f'(g(x))\, g'(x)$

Backpropagation steps:
1. Feedforward pass: compute all activations
2. Output error: measures each node's contribution to the output error
3. Backpropagate the error through all layers
4. Compute partial derivatives

42
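A tiny end-to-end example of the four steps for a one-hidden-layer network with tanh activation and squared error; the shapes, activation, and loss are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = np.array([0.5])

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer

# 1. Feedforward pass: compute all activations.
z1 = W1.T @ x + b1
a1 = np.tanh(z1)
a2 = W2.T @ a1 + b2                      # linear output

# 2. Output error for J = ||a2 - y||^2.
delta2 = 2.0 * (a2 - y)

# 3. Backpropagate the error through the hidden layer (chain rule).
delta1 = (W2 @ delta2) * (1.0 - np.tanh(z1) ** 2)

# 4. Partial derivatives with respect to each parameter.
dW2, db2 = np.outer(a1, delta2), delta2
dW1, db1 = np.outer(x, delta1), delta1
print(dW1, dW2)
```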

Page 43: Deep Learning for Personalized Search and Recommender Systems

Training optimization

• Learning rate schedule
  • Changing the learning rate as learning progresses
• Pre-training
  • Goal: train a simple model on a simple task before training the desired model to perform the desired task
  • Greedy supervised pre-training: pre-train for the task on a subset of layers as initialization for the final network
• Regularization to curb overfitting
  • Goal: reduce generalization error
  • Penalize parameter norm: L2, L1
  • Augment dataset: train on more data
  • Early stopping: return the parameter set at the point in time with the lowest validation error
  • Dropout [Srivastava, 2013]: train an ensemble of all subnetworks formed by removing non-output units
• Gradient clipping to avoid exploding gradients
  • norm clipping
  • elementwise clipping

43

Page 44: Deep Learning for Personalized Search and Recommender Systems

Part II – Deep Learning for Personalized Recommender Systems at Scale

44

Page 45: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

45

Page 46: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

Job Search

46

Page 47: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

47

Page 48: Deep Learning for Personalized Search and Recommender Systems

Personalized Recommender Systems

• User i with <user features, query (optional)> (e.g., industry, behavioral features, demographic features, ...) visits
• Algorithm selects item j from a set of candidates
• (i, j): response y_ij (action or not, e.g., click, like, share, apply, ...)

Which item(s) should we recommend to the user?
• The item(s) with the best expected utility
• Utility examples:
  • CTR, revenue, job apply rates, ads conversion rates, ...
  • Can be a combination of the above for trade-offs

48

Page 49: Deep Learning for Personalized Search and Recommender Systems

An Example Architecture of Personalized Recommender Systems

49

Page 50: Deep Learning for Personalized Search and Recommender Systems

An example of a Recommender System Architecture

[Diagram: Offline system – user interaction logs feed an offline modeling workflow that produces user/item derived features and models, populating the user feature store, the item store with features, and the ranking model store. Online system – a user visits; recommendation ranking scores items from the item store using the user feature store and the ranking model store, followed by additional re-ranking steps before results are returned.]

50

Page 51: Deep Learning for Personalized Search and Recommender Systems

An example of a Personalized Search System Architecture

[Diagram: Offline system – user interaction logs feed an offline modeling workflow that produces user/item derived features and models, populating the user feature store, the search index of items (with item derived features), and the ranking model store. Online system – a user issues a request; query construction (using the user feature store) drives search-based candidate selection & retrieval against the search index of items, followed by recommendation ranking (using the ranking model store) and additional re-ranking steps.]

51

Page 52: Deep Learning for Personalized Search and Recommender Systems

Key Components – Offline Modeling

• Train the model offline (e.g., on Hadoop)
• Push the model to the online ranking model store
• Pre-generate user/item derived features for online systems to consume
  • E.g., user/item embeddings from word2vec/DNNs based on the raw features

52

Page 53: Deep Learning for Personalized Search and Recommender Systems

Key Components – Candidate Selection

• Personalized search (with user query):
  • Form a query to the index based on user query annotation [Arya et al., 2016]
  • Example: Panda Express Sunnyvale → +restaurant:panda express +location:sunnyvale
• Recommender system (optional):
  • Can help dramatically reduce the number of items to score in the ranking steps [Cheng et al., 2016; Borisyuk et al., 2016]
  • Form a query based on the user features
  • Goal: fetch only the items with at least some match with the user's features
  • Example: a user with title software engineer → +title:software engineer for job recommendations

53
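A toy sketch of that query construction step, mirroring the "+field:value" examples above; the helper function name and dictionary format are made up for illustration, not part of any real retrieval API:

```python
def build_retrieval_query(user_features):
    """Turn selected user features into required retrieval clauses,
    e.g. {'title': 'software engineer'} -> '+title:software engineer'."""
    clauses = [f"+{field}:{value}" for field, value in user_features.items()]
    return " ".join(clauses)

print(build_retrieval_query({"title": "software engineer"}))
# +title:software engineer
```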

Page 54: Deep Learning for Personalized Search and Recommender Systems

Key Components – Ranking

• Recommendation ranking
  • The main ML model that ranks items retrieved by candidate selection based on the expected utility
• Additional re-ranking steps
  • Often for user experience optimization related to business rules, e.g.
    • Diversification of the ranking results
    • Recency boost
    • Impression discounting
    • ...

54

Page 55: Deep Learning for Personalized Search and Recommender Systems

Integration of Deep Learning Models into Personalized Recommender Systems at Scale

55

Page 56: Deep Learning for Personalized Search and Recommender Systems

Literature: Deep Learning for Recommendation Systems

• RBM for Collaborative Filtering [Salakhutdinov et al., 2007]
• Deep Belief Networks [Hinton et al., 2006]
• Neural Autoregressive Distribution Estimator (NADE) [Zheng, 2016]
• Neural Collaborative Filtering [He et al., 2017]
• Siamese networks for user-item matching [Huang et al., 2013]
• Deep Belief Networks with Pre-training [Hinton et al., 2006]
• Collaborative Deep Learning [Wang et al., 2015]

56

Page 57: Deep Learning for Personalized Search and Recommender Systems

[Diagram, repeated from Page 51: the example personalized search system architecture – offline modeling workflow producing user/item derived features and models; online query construction, search-based candidate selection & retrieval over the search index of items, recommendation ranking, and additional re-ranking steps.]

57

Page 58: Deep Learning for Personalized Search and Recommender Systems

Offline Modeling + User/Item Embeddings

[Diagram: user features → user embedding vector; item features → item embedding vector; the two are compared via Sim(U, I); embeddings are pushed to the user feature store and the item store/index with features.]

58

Page 59: Deep Learning for Personalized Search and Recommender Systems

Query Formulation & Candidate Selection

• Issues with using raw text: noisy or incorrect query tagging due to
  • Failure to capture semantic meaning
    • Ex. query: Apple watch -> +food:apple +product:watch or +product:apple watch?
  • Multilingual text
    • Query: 熊猫快餐 (Chinese for "Panda Express") -> +restaurant:panda express
  • Cross-domain understanding
    • People search vs. job search

59

Page 60: Deep Learning for Personalized Search and Recommender Systems

Query Formulation & Candidate Selection

• Represent the query as an embedding
• Expand the query to similar queries in a semantic space
• KNN search in dense feature space with an inverted index [Cheng et al., 2016]

[Example: query Q = "Apple Watch" and documents D = "iphone", D = "Orange Swatch", D = "ipad" placed in the semantic space]

60
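A brute-force version of the expansion idea: embed the query and pick its nearest neighbors by cosine similarity among candidate queries. Real systems do this against an inverted index or an approximate KNN structure [Cheng et al., 2016]; the embeddings below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = ["iphone", "orange swatch", "ipad", "android watch"]
candidate_vecs = rng.normal(size=(len(candidates), 16))   # placeholder embeddings
query_vec = rng.normal(size=16)                           # embedding of "Apple Watch"

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(query_vec, v) for v in candidate_vecs]
top_k = np.argsort(scores)[::-1][:2]                      # expand to the 2 nearest queries
print([candidates[i] for i in top_k])
```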

Page 61: Deep Learning for Personalized Search and Recommender Systems

Recommendation Ranking Models

• Wide and Deep models to capture all possible signals [Cheng et al., 2016]

https://arxiv.org/pdf/1606.07792.pdf

61

Page 62: Deep Learning for Personalized Search and Recommender Systems

Challenges & Open Problems for Deep Learning in Recommender Systems

• Distributed training on very large data
  • TensorFlow on Spark (https://github.com/yahoo/TensorFlowOnSpark)
  • CNTK (https://github.com/Microsoft/CNTK)
  • MXNet (http://mxnet.io/)
  • Caffe (http://caffe.berkeleyvision.org/)
  • ...
• Latency issues from online scoring
  • Pre-generation of user/item embeddings
  • Multi-layer scoring (simple models => complex)
• Batch vs. online training

62

Page 63: Deep Learning for Personalized Search and Recommender Systems

Part III – Case Study: Jobs You May Be Interested In (JYMBII)

63

Page 64: Deep Learning for Personalized Search and Recommender Systems

Outline

• Introduction
• Generating Embeddings via Word2vec
• Generating Embeddings via Deep Networks
• Tree Feature Transforms in the Deep + Wide Framework

64

Page 65: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII

65

Page 66: Deep Learning for Personalized Search and Recommender Systems

Introduction: Problem Formulation

• Rank jobs by $P(\text{User } u \text{ applies to Job } j \mid u, j)$
• Model the response given:
  • User: career history, skills, education, connections
  • Job: title, description, location, company

66

Page 67: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII Modeling – Generalization

• The model should learn general rules to predict which jobs to recommend to a member
• Learn generalizations based on similarity in title, skill, location, etc. between the profile and the job posting

[Figure: a member profile and a recommended job posting]

67

Page 68: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII Modeling – Memorization

• The model should memorize exceptions to the rules
• Learn exceptions based on frequent co-occurrence of features

[Figure: a member profile "applies to" a job posting]

68

Page 69: Deep Learning for Personalized Search and Recommender Systems

Introduction: Baseline Features

• Dense BoW similarity features for generalization
  • i.e., similarity in title text is a good predictor of response
• Sparse two-depth cross features for memorization
  • i.e., memorize that computer science students will transition to entry engineering roles

Examples:
• Vector BoW similarity feature: Sim(User Title BoW, Job Title BoW)
• Sparse cross feature: AND(user = Comp. Sci. Student, job = Software Engineer)
• Sparse cross feature: AND(user = In Silicon Valley, job = In Austin, TX)
• Sparse cross feature: AND(user = ML Engineer, job = UX Designer)

69

Page 70: Deep Learning for Personalized Search and Recommender Systems

Introduction: Issues

• BoW features don't capture semantic similarity between user/job
  • Cosine similarity between "Application Developer" and "Software Engineer" is 0
• Generating three-depth, four-depth cross features won't scale
  • i.e., memorizing that factory workers from Detroit are applying to fracking jobs in Pennsylvania
• Hand-engineered features are time consuming and will have low coverage
  • The number of permutations of three-depth, four-depth cross features grows exponentially

70
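A small sketch of the two baseline feature types and the first issue above: a dense cosine similarity between bag-of-words title vectors (which is 0 for "Application Developer" vs. "Software Engineer"), and a sparse two-depth cross feature of the form AND(user = ..., job = ...). The vocabulary and feature strings are toy examples:

```python
import numpy as np

def bow(text, vocab):
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

vocab = {w: i for i, w in enumerate(
    ["software", "engineer", "application", "developer", "student"])}
user_title, job_title = "Application Developer", "Software Engineer"

# Dense similarity feature: Sim(user title BoW, job title BoW).
u, j = bow(user_title, vocab), bow(job_title, vocab)
sim = u @ j / (np.linalg.norm(u) * np.linalg.norm(j) + 1e-12)
print("BoW similarity:", sim)   # 0.0 -- no shared tokens, motivating embeddings

# Sparse two-depth cross feature for memorization.
cross_feature = "AND(user=CompSci Student, job=Software Engineer)"
print(cross_feature)
```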

Page 71: Deep Learning for Personalized Search and Recommender Systems

Introduction: Deep + Wide for JYMBII

• BoW features don't capture semantic similarity between user/job
  • Generate embeddings to capture generalization through semantic similarity
  • Deep + Wide model for JYMBII [Cheng et al., 2016]

Feature examples:
• Semantic similarity feature: Sim(User Embedding, Job Embedding)
• Global model cross feature: AND(user = Comp. Sci. Student, job = Software Engineer)
• User model cross feature: AND(user = User 2, job = Job Latent Feature 1)
• Job model cross feature: AND(user = User Latent Feature, job = Job 1)
• Plus the sparse cross features and the vector BoW similarity feature from the baseline (see Page 69)

71

Page 72: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Training Word Vectors

• Key ideas
  • The same users (context) apply to similar jobs (target)
  • Similar users (target) will apply to the same jobs (context)
• Train word vectors via the word2vec skip-gram architecture
  • Concatenate the user's current title and the applied job's title as input
    • User Title => Applied Job Title, e.g., Application Developer => Software Engineer

72

Page 73: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Model Structure

[Diagram: tokenized titles for the user ("Application, Developer") and the job ("Software, Engineer") → word embedding lookup from pre-trained word vectors → entity embeddings via average pooling → cosine similarity between user and job embeddings → response prediction (logistic regression)]

73

Page 74: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Results and Next Steps

• Receiver Operating Characteristic – Area Under Curve (ROC AUC) for evaluation
  • Response prediction is binary classification: apply or don't apply
  • Highly skewed data: low CTR for the apply action
  • Good metric for ranking quality: focuses on the discriminatory ability of the model
• Marginal 0.87% ROC AUC gain
• How to improve the quality of the embeddings?
  • Optimize embeddings for the prediction task with supervised training
  • Leverage richer context about the user and job

74

Page 75: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Model Structure

[Diagram: for both the user and the job, sparse features (title, skill, company) → embedding layer → hidden layer → entity embedding; the user and job entity embeddings are combined via a Hadamard product (elementwise product) and fed into response prediction (logistic regression)]

75

Page 76: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Hyperparameters, Lots of Knobs!

(chosen setting in parentheses)
• Optimizer used
  • SGD w/ momentum and exponential decay vs. Adam [Kingma et al., 2015] (Adam)
• Learning rate
  • Swept over several powers of ten (10^-4)
• Embedding layer size
  • 50 to 200 (100)
• Dropout
  • 0% to 50% dropout (0% dropout)
• Sharing the parameter space for both user/job embeddings
  • Assumes the commutative property of recommendations (a + b = b + a) (no shared parameter space)
• Hidden layer sizes
  • 0 to 2 hidden layers (200 -> 200 hidden layer sizes)
• Activation function
  • ReLU vs. Tanh (ReLU)

76

Page 77: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Training Challenges

• Millions of rows of training data, impossible to store all in memory
  • Stream data incrementally, directly from files, into a fixed-size example pool
  • Add shuffling by randomly sampling from the example pool for training batches
• Extreme dimensionality of the company sparse feature
  • Reduce the dimensionality of the company feature from millions -> tens of thousands
  • Perform feature selection by frequency in the training set
• Hyperparameter tuning
  • Distribute grid search through parallel modeling in single-driver Spark jobs

77

Page 78: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Results

Model                ROC AUC
Baseline Model       0.753
Deep + Wide Model    0.790 (+4.91%***)

*** For reference, a previous major JYMBII modeling improvement with a 20% lift in ROC AUC resulted in a 30% lift in job applications

78

Page 79: Deep Learning for Personalized Search and Recommender Systems

The Current Deep + Wide Model

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (two-depth) feed into response prediction (logistic regression)]

• Generating three-depth, four-depth cross features won't scale
• Smart feature selection is required

79

Page 80: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Feature Selection via Gradient Boosted Decision Trees

• Each tree outputs a path from root to leaf, encoding a combination of feature crosses [He et al., 2014]
• GBDTs select the most useful combinations of feature crosses for memorization

[Figure: an example decision tree with yes/no splits on nodes such as Member Seniority: Vice President, Member Industry: Banking, Member Location: Silicon Valley, Member Skill: Statistics, Job Seniority: CXO, Job Title: ML Engineer]

80
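A hedged sketch of the [He et al., 2014] recipe using scikit-learn (not the tooling from the talk): fit a small GBDT, read off the leaf index each sample lands in per tree, one-hot encode those indices as cross features, and feed them to logistic regression. The data is synthetic and the hyperparameters are arbitrary:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                       # e.g. member/job features
y = (X[:, 0] * X[:, 1] > 0).astype(int)             # a "crossed" signal to memorize

gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3).fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]                     # leaf index per (sample, tree)
cross = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

lr = LogisticRegression(max_iter=1000).fit(cross, y)
print("train accuracy:", lr.score(cross, y))
```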

Page 81: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: The Full Picture

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

How do we train the NN model and the GBDT model jointly with each other?

81

Page 82: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Joint Training via Block-wise Cyclic Coordinate Descent

• Treat the NN model and the GBDT model as separate block-wise coordinates
• Implemented by:
  1. Training the NN until convergence
  2. Training the GBDT w/ fixed NN embeddings
  3. Training the regression layer weights w/ the generated cross features from the GBDT
  4. Training the NN until convergence w/ fixed cross features
  5. Cycling steps 2-4 until the global convergence criteria are met

82

Page 83: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train NN Until Convergence

• Initially no trees are in our forest

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

83

Page 84: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train GBDT w/ NN Section as Initial Margin

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

84

Page 85: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train GBDT w/ NN Section as Initial Margin (continued)

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

85

Page 86: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train Regression Layer Weights

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

86

Page 87: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train NN w/ GBDT Section as Initial Margin

[Diagram: deep embedding features (feed-forward NN) and wide sparse cross features (GBDT) feed into response prediction (logistic regression)]

87

Page 88: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Block-wise Coordinate Descent Results

Model                                   ROC AUC
Baseline Model                          0.753
Deep + Wide Model                       0.790 (+4.91%)
Deep + Wide Model w/ GBDT Iteration 1   0.792 (+5.18%)
Deep + Wide Model w/ GBDT Iteration 2   0.794 (+5.44%)
Deep + Wide Model w/ GBDT Iteration 3   0.795 (+5.57%)
Deep + Wide Model w/ GBDT Iteration 4   0.796 (+5.71%)

88

Page 89: Deep Learning for Personalized Search and Recommender Systems

JYMBII Deep + Wide: Future Directions

• Generating embeddings w/ LSTM networks
  • Leverage sequential career history data
  • Promising results in NEMO: Next Career Move Prediction with Contextual Embedding [Li et al., 2017]
• Semi-supervised training
  • Leverage title, skill, and company embeddings pre-trained on profile data
• Replace the Hadamard product as the entity embedding similarity function
  • Deep Crossing [Shan et al., 2016]
• Add even richer context
  • i.e., location, education, and network features

89

Page 90: Deep Learning for Personalized Search and Recommender Systems

Part IV – Case Study: Deep Learning Networks for Job Search

90

Page 91: Deep Learning for Personalized Search and Recommender Systems

Outline

• Introduction
• Representations via Word2vec
• Robust Representations via DSSM

91

Page 92: Deep Learning for Personalized Search and Recommender Systems

Introduction: Job Search

92

Page 93: Deep Learning for Personalized Search and Recommender Systems

Introduction: Search Architecture

[Diagram: user query → query understanding → top-K retrieval against the index (built by the indexer) → result ranking (using an offline-trained model) → results]

93

Page 94: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Segmentation and Tagging

• First divide the search query into segments
• Tag query segments based on recognized entity tags

Example query: "Oracle Java Application Developer"

Query segmentations:
• [Oracle] [Java] [Application Developer]
• [Oracle] [Java Application Developer]

Query tagging:
• COMPANY = Oracle, SKILL = Java, TITLE = Application Developer
• COMPANY = Oracle, TITLE = Java Application Developer

94

Page 95: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Expansion

• Task of adding synonyms/related entities to the query to improve recall
• Current approach: curated dictionary of common synonyms and related entities

COMPANY = Oracle OR NetSuite OR Taleo OR Sun Microsystems OR ...
SKILL = Java OR Java EE OR J2EE OR JVM OR JRE OR JDK ...
TITLE = Application Developer OR Software Engineer OR Software Developer OR Programmer ...

(green – synonyms; blue – related entities)

95

Page 96: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Retrieval and Ranking

COMPANY = Oracle OR NetSuite OR Taleo OR Sun Microsystems OR ...
SKILL = Java OR Java EE OR J2EE OR JVM OR JRE OR JDK ...
TITLE = Application Developer OR Software Engineer OR Software Developer OR Programmer ...

[Figure: the expanded clauses are matched against the title, skills, and company fields of job documents]

96

Page 97: Deep Learning for Personalized Search and Recommender Systems

Introduction: Issues – Retrieval and Ranking

• Term retrieval has limitations
  • Cross-language retrieval
    • Softwareentwickler <-> Software Developer
  • Word inflections
    • Engineering Management <-> Engineering Manager
• Query expansion via a curated dictionary of synonyms is not scalable
  • Expensive to refresh and store synonyms for all possible entities
• Heavy reliance on query tagging is not robust enough
  • Novel title, skill, and company entities will not be tagged correctly
  • Errors upstream propagate to poor retrieval and ranking

97

Page 98: Deep Learning for Personalized Search and Recommender Systems

Introduction: Solution – Deep Learning for Query and Document Representations

• Query and document representations
  • Map queries and document text to vectors in a semantic space
  • Robust handling of out-of-vocabulary words
• Term retrieval has limitations; query expansion via a curated dictionary of synonyms is not scalable
  • Map synonyms, translations, and inflections to similar vectors in the semantic space
  • Term retrieval on cluster id, or KNN-based retrieval
• Heavy reliance on query tagging is not robust enough
  • Complement structured query representations with semantic representations

98

Page 99: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Leverage JYMBII Work

• Key ideas
  • Similar users (context) apply to the same job (target)
  • The same user (target) will apply to similar jobs (context)
• Train word vectors via the word2vec skip-gram architecture
  • Concatenate the user's current title and the applied job's title as input
    • User Title => Applied Job Title, e.g., Application Developer => Software Engineer

99

Page 100: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Word2vec in Ranking

[Diagram: tokenized text for the query ("Application, Developer") and the job ("Software, Engineer") → word embedding lookup from pre-trained word vectors → entity embeddings via average pooling → cosine similarity between query and job embeddings → learning-to-rank model (NDCG loss)]

100

Page 101: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Ranking Model Results

Model                               NDCG@5           CTR@5 (%)
Baseline Model                      0.582            +0.0%
Baseline Model + Word2Vec Feature   0.595 (+2.2%)    +1.6%

(NDCG@5 = Normalized Discounted Cumulative Gain at 5)

101

Page 102: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Optimize Embeddings for the Job Search Use Case

• Leverage apply and click feedback to guide the learning of embeddings
  • Fine-tune embeddings for the task using supervised feedback
• Handle out-of-vocabulary words and scale to the query vocabulary size
  • Compared to JYMBII, the query vocabulary is much larger and less well-formed
    • Misspellings
    • Word inflections
    • Free-text search
  • Need to make representations more robust for these free-text queries

102

Page 103: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Deep Structured Semantic Model [Huang et al., 2013]

[Diagram: the query ("Application Developer"), the applied job as a positive ("Software Engineer"), and a randomly sampled job as a negative ("Hairdresser") each go through tri-letter hashing of the raw text (#Ap, App, ppl, ...; #So, Sof, oft, ...; #Ha, Hai, air, ...), then hidden layers 1–3; cosine similarity is computed between the query and each job, followed by softmax w/ cross-entropy loss]

103
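A numpy sketch of that training signal: encode the query, the applied (positive) job, and randomly sampled negatives, score each pair by cosine similarity, and apply softmax with cross-entropy against the positive. The encoder below is a random projection of hashed word counts, a stand-in for the real hidden layers (tri-letter hashing is sketched after the next slide):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 1000, 64
W = rng.normal(size=(dim, hidden)) / np.sqrt(dim)   # stand-in for the stacked hidden layers

def encode(text):
    v = np.zeros(dim)
    for tok in text.lower().split():      # hashed bag of tokens, just for illustration
        v[hash(tok) % dim] += 1.0
    return np.tanh(v @ W)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = "application developer"
docs = ["software engineer",              # applied job (positive)
        "hairdresser", "cashier"]         # randomly sampled negatives
scores = np.array([cosine(encode(query), encode(d)) for d in docs])
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over positive + negatives
loss = -np.log(probs[0])                        # cross-entropy against the positive
print(probs, loss)
```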

Page 104: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Tri-letter Hashing

• Tri-letter hashing example
  • Engineer -> #en, eng, ngi, gin, ine, nee, eer, er#
• Benefits of tri-letter hashing
  • More compact bag of tri-letters vs. bag-of-words representation
    • 700K word vocabulary -> 75K tri-letters
  • Can generalize to out-of-vocabulary words
  • Tri-letter hashing is robust to minor misspellings and inflections of words
    • Engneer -> #en, eng, ngn, gne, nee, eer, er#

104
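The hashing step itself is easy to write down; this sketch reproduces the "Engineer" example above and shows why the misspelling still shares most of its trigrams:

```python
def tri_letter_hash(word):
    """'engineer' -> ['#en', 'eng', 'ngi', 'gin', 'ine', 'nee', 'eer', 'er#']"""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

correct = set(tri_letter_hash("engineer"))
typo = set(tri_letter_hash("engneer"))
print(sorted(correct & typo))                     # most trigrams survive the misspelling
print(len(correct & typo) / len(correct | typo))  # overlap ratio
```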

Page 105: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Training Details

• Parameter sharing helps
  • Better and faster convergence
  • Model size is reduced
• Regularization
  • L2 performs better than dropout
• Toolkit comparisons (CNTK vs. TensorFlow)
  • CNTK: faster convergence and better model quality
  • TensorFlow: easy to implement and better community support; comparable model quality

[Figure: training performance with/without parameter sharing]

105

Page 106: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Lessons in a Production Environment

• Bottlenecks in the production environment
  • Latency due to extra computation
  • Latency due to GC activity
  • Fat JARs in the JVM environment
• Practical lessons
  • Avoid the JVM heap while serving the model
  • Cache the most-accessed entities' embeddings

[Figure: latency chart with +100%, +70%, +40% markers]

106

Page 107: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Qualitative Results

[Table: for each head query (Software Engineer, Data Mining, LinkedIn, Softwareentwickler), the most similar head queries in the DSSM embedding space, e.g., Engineer Software, Software Engineers, Software Engineering, Data Miner, Machine Learning Engineer, Google, Microsoft Research, Software, Software Engineer]

For qualitative results, only top head queries are taken to analyze similarity to each other

107

Page 108: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Metric Results

Model                               NDCG@5           CTR@5 Lift (%)
Baseline Model                      0.582            +0.0%
Baseline Model + Word2Vec Feature   0.595 (+2.2%)    +1.6%
Baseline Model + DSSM Feature       0.602 (+3.4%)    +3.2%

(NDCG@5 = Normalized Discounted Cumulative Gain at 5)

108

Page 109: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Future Directions

• Leverage current query understanding in the DSSM model
  • Query-tag entity information for richer context embeddings
  • Query segmentation structure can be incorporated into the network design
• Deep Crossing for the similarity layer [Shan et al., 2016]
• Convolutional DSSM [Shen et al., 2014]

109

Page 110: Deep Learning for Personalized Search and Recommender Systems

Conclusion

• Recommender systems and personalized search are very similar problems
• Deep learning is here to stay and can have a significant impact on both
  • Understanding and constructing queries
  • Ranking
• Deep learning and more traditional techniques are *not* mutually exclusive (hint: Deep + Wide)

110

Page 111: Deep Learning for Personalized Search and Recommender Systems

References

• [Rumelhart et al., 1986] Learning representations by back-propagating errors, Nature 1986
• [Hochreiter et al., 1997] Long short-term memory, Neural Computation 1997
• [LeCun et al., 1998] Gradient-based learning applied to document recognition, Proceedings of the IEEE 1998
• [Krizhevsky et al., 2012] ImageNet classification with deep convolutional neural networks, NIPS 2012
• [Graves et al., 2013] Speech recognition with deep recurrent neural networks, ICASSP 2013
• [Mikolov, 2012] Statistical language models based on neural networks, PhD Thesis, Brno University of Technology, 2012
• [Kalchbrenner et al., 2013] Recurrent continuous translation models, EMNLP 2013
• [Srivastava, 2013] Improving neural networks with dropout, PhD Thesis, University of Toronto, 2013
• [Sutskever et al., 2014] Sequence to sequence learning with neural networks, NIPS 2014
• [Vinyals et al., 2014] Show and tell: a neural image caption generator, arXiv 2014
• [Zaremba et al., 2015] Recurrent Neural Network Regularization, ICLR 2015

111

Page 112: Deep Learning for Personalized Search and Recommender Systems

References (continued)

• [Arya et al., 2016] Personalized Federated Search at LinkedIn, CIKM 2016
• [Cheng et al., 2016] Wide & Deep Learning for Recommender Systems, DLRS 2016
• [He et al., 2014] Practical Lessons from Predicting Clicks on Ads at Facebook, ADKDD 2014
• [Kingma et al., 2015] Adam: A Method for Stochastic Optimization, ICLR 2015
• [Huang et al., 2013] Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013
• [Li et al., 2017] NEMO: Next Career Move Prediction with Contextual Embedding, WWW 2017
• [Shan et al., 2016] Deep Crossing: Web-scale modeling without manually crafted combinatorial features, KDD 2016
• [Zhang et al., 2016] GLMix: Generalized Linear Mixed Models for Large-Scale Response Prediction, KDD 2016
• [Salakhutdinov et al., 2007] Restricted Boltzmann Machines for Collaborative Filtering, ICML 2007
• [Zheng, 2016] http://tech.hulu.com/blog/2016/08/01/cfnade.html
• [Hinton et al., 2006] A fast learning algorithm for deep belief nets, Neural Computation 2006
• [Wang et al., 2015] Collaborative Deep Learning for Recommender Systems, KDD 2015
• [He et al., 2017] Neural Collaborative Filtering, WWW 2017
• [Borisyuk et al., 2016] CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents, KDD 2016

112

Page 113: Deep Learning for Personalized Search and Recommender Systems

References (continued)

• [netflix recsys] http://nordic.businessinsider.com/netflix-recommendation-engine-worth-1-billion-per-year-2016-6/
• [San Jose Mercury News] http://www.mercurynews.com/2017/01/06/at-linkedin-artificial-intelligence-is-like-oxygen/
• [Instagram blog] http://blog.instagram.com/post/145322772067/160602-news

113