Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1...

6
1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen June 12, 2008 Outline • About me • Expert search & recommendaGon • Recommending scienGfic arGcles About me • EducaGon 1997‐2001 Master’s degree in InformaGon Management & Technology 2002‐2004 Master’s degree in ComputaGonal LinguisGcs & ArGficial Intelligence • Employment 2005‐now PhD student in the A Propos project about pro‐acGve document recommendaGon • Teaching 2006‐now various guest lectures about search engines and IR 2007 InformaGon Search, Retrieval, and RecommendaGon 2008 InformaGon Search, Retrieval, and RecommendaGon About me Tilburg University Outline • About me • Expert search & recommendaGon – DefiniGon & history – Tasks & approaches – A look at evaluaGon & test collecGons – ExperGse seeking – A university‐wide expert search engine • Recommending scienGfic arGcles

Transcript of Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1...

Page 1: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

1

RecommendingExpertsandScien3ficAr3cles

ToineBogers

Researchtalk@RSLIS,Copenhagen

June12,2008

Outline

•  Aboutme

•  Expertsearch&recommendaGon

•  RecommendingscienGficarGcles

Aboutme

•  EducaGon–  1997‐2001Master’sdegreeinInformaGonManagement&Technology

–  2002‐2004Master’sdegreeinComputaGonalLinguisGcs&ArGficialIntelligence

•  Employment–  2005‐nowPhDstudentintheAProposprojectaboutpro‐acGvedocument

recommendaGon

•  Teaching–  2006‐nowvariousguestlecturesaboutsearchenginesandIR–  2007InformaGonSearch,Retrieval,andRecommendaGon

–  2008InformaGonSearch,Retrieval,andRecommendaGon

Aboutme

TilburgUniversity Outline

•  Aboutme

•  Expertsearch&recommendaGon–  DefiniGon&history–  Tasks&approaches–  AlookatevaluaGon&testcollecGons–  ExperGseseeking–  Auniversity‐wideexpertsearchengine

•  RecommendingscienGficarGcles

Page 2: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

2

Whatisit?

•  Basically:searchingforexpertsinsteadofdocuments

Historyofexpertsearch

•  In80’sand90’s–  Implementedaslarge‐scaledatabasescontainingemployeeskills

–  Problems•  Putstheworkloadonemployees

•  ‘Unnatural’approach•  Easilyout‐of‐date

•  TREC2005EnterpriseTrackintroducedtheExpertSearchTask–  Large‐scaleevaluaGoneffortofexpertfinding

•  2005&2006:W3CcollecGon•  2007&2008:CSIROcollecGon

–  HugeboostinresearchintoautomaGcapproaches

–  Usuallyco‐occurrenceofpeopleandtopicsisseenasevidenceofexperGse

Evidenceofexper3se

•  Content‐basedevidence–  Documents

–  E‐mails–  Homepages

•  Evidencefromsocialnetworks–  OrganizaGonalstructure–  E‐mailnetworks–  BibliographicinformaGon

•  AcGvity‐basedevidence–  ProjectGme

–  Searchhistory–  PublicaGonhistory

Tasksandapproaches

•  Differenttasks–  Expertfinding

•  Findtheexpertsonaspecifictopic–  Expertprofiling

•  Findoutwhatoneexpertknowsabout differenttopics

–  Recommendingsimilarexperts•  Findexpertswhosharethesameprofiles

Evalua3on

•  MajorityofworkisevaluatedusingTRECcollecGons–  W3CcollecGon

•  5.7GBand331,037documents(Webpages,mailinglists,projectpages)

•  Topicsaregroupnames•  Relevancejudgments

–  2005:groupmembersareexperts

–  2006:TRECparGcipantsjudgeexperGsethemselves

–  CSIROcollecGon•  4.2GBand370,715documents(similardiversityasW3C)

• WorktaskscreatedbyactualCSIROsciencecommunicators–  Goalistocreateanoverviewpageonacertaintopic

•  Relevancejudgmentsdonebysciencecommunicatorsin2007and2008

UvTExpertCollec3on

•  ProblemswithTRECcollecGons–  ExperGseisneverself‐assessed–  OnlyonespecifictypeoforganizaGon–  OnlyinEnglish

•  WethereforecreatedtheUvTExpertCollecGon–  Crawlofamedium‐sizedDutchuniversity

–  BasedonWebwijs(“Webwise”),ouronlineexpertprofilingdatabase•  1168experts•  1400self‐assessedexperGsetopics•  Bilingual(DutchandEnglish)

–  DocumentsincludepublicaGons,coursepages,researchdescripGons,andhomepages

–  InformaGonaboutorganizaGonalstructureandtopichierarchy

–  SeeSIGIR‘07paperformoreinformaGon

Page 3: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

3

Exper3seseeking

•  AllexpertfindingworksofarhasbeenfromanIRperspecGve–  WhatismissingisanISperspecGve:exper9seseeking

•  Whatwedidtoremedythis–  Focusedonthetaskofrecommendingsimilarexperts

•  Scenariosketch:“Themediawishestocommunicatewiththetopexpert,butheisunavailableforawhile.Whowouldyourecommendtotaketheirplace?”

–  Got6ofouruniversity’scommunicaGonadvisorstoparGcipateinourstudy

–  Two‐foldpurposeofourquesGonnaire•  InvesGgateexperGseseekingbehavior•  GetrealisGcrelevancejudgmentsforthe‘similarexperts’‐task

–  Hadtojudge10recommendedexpertsfor10familiar‘focus’experts

–  SeeSIGIR‘08workshoppaperformoreinformaGon

Exper3seseeking

•  InvesGgateexperGseseekingbehavior–  Inspiredby2007IP&MpaperbyWoudstraandVandenHooff

•  IdenGfied11importantfactorsforsourceselecGon(topicofknowledge,familiarity,reliability,availability,perspec9ve,up‐to‐dateness,approachability,cogni9veeffort,contacts,physicalproximity,saves9me)

–  AskedparGcipantstodescribe•  TypicalrequestsforexperGse•  Reasonsforpickingandnotpickingspecificexperts•  Howimportanteachfactorwasfortheirdecisions

•  Somefindings–  Topicofknowledgewasmostimportantinrecommendingsomeone

–  Familiaritywiththeexpertwasalsoimportant–  NewfactorsweidenGfied

•  OrganizaGonalstructure(professorsandprojectleadersarepreferred)• Mediaexperience(“oneofthemisnotsuitablefortalkingtothemedia”)

Exper3seseeking

•  GetrealisGcrelevancejudgments–  Used44uniquefocusexpertsdividedoverthe6PRadvisors(10each)–  First,parGcipantswereaskedfortheirownsuggesGons–  Generated10recommendedexpertsforeachusingsystempooling

–  ParGcipantsthenrankedthesesuggestedexpertsona10‐pointscale•  Integratedthefactorsintoexpertfindingmodels

–  EvaluatedusingMRRandNDCG@10

•  Somefindings–  Bestbaselineapproachcombinedtermsfromdocumentswiththe

self‐assessedexperGseareas

–  Integratedthefollowingfactorsintoretrievalmodels:organiza9onalstructure,mediaexperience,reliability,up‐to‐dateness,qualityofcontacts

–  Significantimprovementsusingreliability,up‐to‐dateness,andorganiza9onalstructure

Auniversity‐wideexpertsearchengine

•  WorkinprogressbyMaster’sstudentRuudLiebregts–  DesigningandevaluaGngauniversity‐wideexpertsearchengine

•  Design–  DatasourcesincludepublicaGons,theses,coursedescripGons,research

descripGons,self‐assessedexperGseareas

–  Allowsforfilteringonlanguageandfaculty–  ShowscollaboraGonnetworksforpapersandthesissupervision

•  EvaluaGon–  System‐basedevaluaGon

•  240testtopics–  120Dutchand120English–  120basedonthesissupervisorsand120basedonpaperauthors

•  Goldstandardjudgmentsfromuser‐basedevaluaGon(seenextslide)

Auniversity‐wideexpertsearchengine

•  EvaluaGon(cont’d)–  User‐basedevaluaGon

•  30employeesaskedto–  DescribeoneoftheirexperGseareas–  Listandrank5possibleexperts–  Formulateaquerybasedonthetopic

–  Judgethetop10searchengineresults•  ±30UvTstudentswillberandomlyassigned

–  3outof5possibleexpertfindingworktasks–  3outof5possiblesupervisorfindingworktasks–  Comparingthenewsearchenginevs.everythingelsetheuniversityhastooffer

–  Usingthebaseline3Gmesandthenewsearchengine3Gmes

•  Possibly–  ±30peopleexternaltoTilburgUniversity–  Twoclassesof±50Dutchhighschoolseniors

Auniversity‐wideexpertsearchengine

Page 4: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

4

Auniversity‐wideexpertsearchengine Outline

•  Aboutme

•  Expertsearch&recommendaGon

•  RecommendingscienGficarGcles–  Whatisit?

–  Approaches–  Socialbookmarking

–  RecommendingusingCiteULike

–  Atthelibraryschool–  Futurework

Whatisit?

•  FormaldefiniGon–  ArecommendersystemtriestoidenGfysetsofitemsthatarelikelytobeof

interesttoacertainusergivensomeinformaGonfromthatuser’sprofile.

•  MorecasualdefiniGon

FEEL VIOLATED IN THEIR PRIVACY “Customerswhoboughtthisproduct

onenboughtabagofpotatochipswithit.”

BEER

Approachestorecommenda3on

•  Somepopularapproaches–  Mostpopularitem(non‐personalizedrecommendaGon)

–  DemographicrecommendaGon(usesuserfeatures)–  Content‐basedrecommendaGon(usingIRmodels)

•  Useitemfeaturestofinditemssimilartopastitems

•  Goodcontentmatchbetweenitems,butnoqualitycontrol

–  Collabora9vefiltering(miningusagepaoerns)•  Usesuser‐itempreferences(e.g.explicitraGngsdata,purchasedata)

•  Goodforareaswherecontentanalysisishard(e.g.movies,music)

•  Twotypes–  User‐basedfiltering–  Item‐basedfiltering

•  User‐basedfiltering(findsimilarusers)

•  Item‐basedfiltering(findsimilaritems)

Approachestorecommenda3on

5 8 4 10 7 8

10 1 6

2 3 10 4 9 9

7 3 9 10

1 5

2 9 5 10

U1

users

items

I1 I2 I3 I4 I5 I6 I7 I8 I9 I10

U2

U3

U4

U5

Ux ?

Socialbookmarking

•  Wayofstoring,organizing,andmanagingbookmarksofWebpages,scienGficarGcles,books,etc.–  Userscanaddbookmarks–  Canbemadepublicorkeptprivate

–  Onenallowuserstotag/describetheiritem

–  Lotsofsocialbookmarkingservicesavailable

Page 5: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

5

Researchsofar

•  TherearegoldenopportuniGeshere!–  Tonsoffree,usefuldata

•  Largeamountsofcontentdescribedusingtagsandothermetadata

•  UsersrevealinformaGonaboutthemselvesbyaddingandtaggingitems•  Treasuretroveofuser‐itempreferences

–  Canbeusedtopredictnewitems

•  However,researchsGllinitsinfancy–  MostlyexploratoryandtheoreGcal

–  SomescaoeredaoemptsatimprovingIRusingtags–  RecommendaGonforsocialbookmarking

• MostlytagrecommendaGon(easytoevaluate)

•  Andofcoursethere’sStumbleUpon

Mainfocus

•  Mymainfocus–  RecommendinginteresGngbookmarksbasedonuserprofilesfromsocial

bookmarkingwebsites

–  Experimentwithdifferent•  Algorithms

•  ContextualrepresentaGons•  Aspects(temporal,growthcurves,spam,duplicates)

•  CombinaGonsofapproaches(datafusion)

–  EvaluaGon•  System‐basedevaluaGon

•  User‐basedevaluaGon–  Preferablyfortwodifferentareas

•  ScienGficarGcles(CiteULike,Bibsonomy)• Webpages(Delicious)

CiteULike

•  SocialbookmarkingforscienGficpapers–  Accordingtotheirwebsite,itis“afreeservicetohelpyoutostore,

organise,andsharethescholarlypapersyouarereading”

–  Somefeatures•  ArGclemetadata

•  Tagging•  Groups•  Comments

•  ReadingprioriGes•  BatchimporGng

CiteULike

•  CreaGngacollecGon–  Dailydatabasedumpsavailable

•  Containuser‐item‐tagtripleswithGmestamps

•  ButnoneoftheaddiGonalinformaGonavailableonthewebsite

–  UsedtheNovember2,2007dumpasastarGngpoint

–  Crawledtherestofthewebsite•  ArGcleandusermetadata

•  GroupinformaGon

•  ReadingprioriGes

•  SomestaGsGcs–  803,521items(metadataavailablefor67%)

–  25,375users(29%spamprofiles)

–  232,937tags

Experimentalsetup&evalua3on

•  System‐basedevaluaGon–  Weknowwhatpapersauserlikedfromhisprofile

•  Howwellcanwepredictwhatwealreadyknow?•  Userprofileswehaveareuser‐itempairs

–  Formalsetup•  Takeout10itemsfromeachuserprofile

•  Trainonremainingprofile,predictmissingitems

•  Userswith≥20itemsandarGclesaddedatleasttwice

•  10‐foldcross‐validaGontopreventoverfitng

–  EvaluaGon•  Ifwerecommendthemissingitems,that’sgood!

• MAP,MRR,Precision@10,usercoverage

–  Wecanusethissamesetupforallexperiments

Atthelibraryschool

•  FirstexperimentsusingcollaboraGvefiltering–  BestmodelhasaMAPof0.2478andsimilarP@10

–  User‐basedfilteringperformedbest•  OpGmalnumberofneighborswas5

–  Usercoverageishighat99.6%•  Forhowmanyuserscanwepredictsomething?

•  SomeuserstooneworeclecGc

–  Difficulttaskbecauseofhighsparsity(99.98%)• MAPof1.0notnecessarilyachievable(orrealisGc)

–  Performanceokay,butroomforimprovement

Page 6: Outline Recommending Experts andtoinebogers.com/content/slides/200806-recommending... · 1 Recommending Experts and Scien3fic Ar3cles Toine Bogers Research talk @ RSLIS, Copenhagen

6

Atthelibraryschool

•  WhatcontextdowehaveinCiteULike?

(3) session context

signs

(1) intra-object

structures

(2) inter-object structures

(7) historic contexts

(6) techno-economic and societal contexts

(4) individual

(4-5) social, systemic,

conceptual, emotional contexts

(5) collective

IngwersenandJärvelin(2006)

Atthelibraryschool

•  WhatcontextdowehaveinCiteULike?

(1) Intra‐objectstructuresProperGesofthedocumentsthemselves,such

asarGclemetadataandtheabstract(available33%oftheGme)

(2) Inter‐objectstructuresRelaGonsbetweendocuments,suchasthose

availablethroughauthorshipinformaGon,assignedtags,and

inclusionbyusers.

(4‐5)Social,systemic,conceptual,andemo3onalcontextsThefolksonomy

canrepresentsocial,conceptual,andemoGalcontext.The

informaGonaboutthegroupsandtheusagepaoernsareallsocial contextfortherecommender.

(7) HistoricalcontextsAcGvitylogsallowfor,forinstance,temporalanalysis.

Atthelibraryschool

•  Birthofarecommender–  WhenisasocialbookmarkingwebsitebigenoughforrecommendaGons?

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

2008200720062005

MA

P

date

model 1 (item-based w/ cosine)model 2 (item-based w/ cond. prob.)

model 3 (user-based w/ cosine)

Atthelibraryschool

•  Node‐duplicaGonbyCiteULikeuponentry•  Manyduplicates

–  EarlyesGmatesofaround10%(onmanuallyannotatedtestset)• MismatchesonGtle,year,authors,etc.

–  With20%ofthosearGcleshavingover20duplicates

–  Mostduplicatesoccuronlyonce•  But:someduplicatesareverypopular(40occurrencesvs.31)

•  Whateffectdoesthishaveonperformance?–  Canwerejoindisconnectedbutsimilarusers?

Collective dynamics of "small-world" networks Collective dynamics of 'small world' networksCollective dynamics of 'small-world' networks Collective dynamics of 'small-world' networksCollective dynamics of 'small-world' networks Collective dynamics of 'small-world' networksCollective dynamics of 'small-world' networks Collective dynamics of 'small-world' networks.Collective dynamics of 'small-world' networks. Collective dynamics of 'small-world' networks.Collective dynamics of 'small-world' networks. Collective dynamics of ``small-world'' networksCollective dynamics of ``small-world'' networks Collective dynamics of `small-world' networksCollective dynamics of `small-world' networks Collective dynamics of `small-world' networksCollective dynamics of `small-world' networks Collective dynamics of `small-world' networks.Collective dynamics of small-world networks Collective dynamics of small-world networksCollective dynamics of small-world networks Collective dynamics of small-world networksCollective dynamics of small-world networks. Collective dynamics of small-world networks.Collective dynamics ofsmall-worldnetworks. Collective dynamics ofsmall-worldnetworks.

Futurework

•  Experimentwithdifferent–  Algorithms

–  ContextualrepresentaGons–  Aspects(temporal,growthcurves,spam,duplicates)

–  CombinaGonsofapproaches(datafusion)

–  Datasets•  User‐basedevaluaGon

–  PickoneortwotasksinpaperrecommendaGondefinedbyMcNee(2006)• Maintainawareness

•  Exploreresearchinterest•  Findmorelikethis

–  Evaluatealgorithmsusingusers•  Basedontheiractualprofile•  BysimulaGngoneoftheserecommendertasks

Ques3ons?Comments?Sugges3ons?