Meta Structure: Computing Relevance in Large Heterogeneous … · Nikos Mamoulis, Xiang Li, “Meta...
Meta Structure: Computing Relevance in Large Heterogeneous Information Networks
Zhipeng Huang
http://i.cs.hku.hk/~zphuang/
Introduction
• Computing relevance on networks (e.g., social networks, co-author networks) supports many applications:
  – similarity search
  – recommendation
• Many measures have been studied:
  – Jaccard coefficient, common neighbors, shortest path
  – PageRank, Personalized PageRank, SimRank, etc.
Heterogeneous Information Network
• HIN: Directed graph with multiple node types and edge types.
[Figure: example HIN. Object types: author (a1-a3), paper (p1,1 to p3,2, labeled KDD'07, KDD'15, AAAI'15, VLDB'15, ICDM'12, VLDB'06), venue (v1-v4; KDD, VLDB, AAAI, ICDM), topic (t1-t4; "mining", "efficient", "privacy", "social"). Edge types: write, publish, mention.]
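A minimal sketch of how such a network could be held in memory, assuming edges run author -write-> paper, paper -publish-> venue, and paper -mention-> topic. The `HIN` class, its method names, and the small fragment below are hypothetical and are not read off the figure.

```python
from collections import defaultdict


class HIN:
    """A tiny heterogeneous information network: typed nodes, typed directed edges."""

    def __init__(self):
        self.node_type = {}                  # node id -> object type
        self.out_edges = defaultdict(set)    # (source, edge type) -> set of targets

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, etype, dst):
        # Both endpoints must be declared first so that every node has a type.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("declare both nodes before adding an edge")
        self.out_edges[(src, etype)].add(dst)

    def neighbors(self, node, etype):
        return sorted(self.out_edges.get((node, etype), ()))


# Hypothetical fragment; the wiring is illustrative only.
g = HIN()
for node, ntype in [("a1", "author"), ("p1", "paper"),
                    ("KDD", "venue"), ("mining", "topic")]:
    g.add_node(node, ntype)
g.add_edge("a1", "write", "p1")
g.add_edge("p1", "publish", "KDD")
g.add_edge("p1", "mention", "mining")

print(g.neighbors("a1", "write"))    # ['p1']
```

Keying the adjacency by (source, edge type) keeps each edge type separately addressable, which is what a type-constrained traversal needs.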
Meta Path-Based Relevance Measures
• Meta Path: a sequence of node and edge types.
• Measures: PathCount [1], PathSim [1] and PCRW [2]
• Source: Automatically generate meta paths (WWW'15 [3])
• Limitation: Fails to discover common nodes.
  – Example: A researcher wants to search for authors who have published papers in the same venue and on the same topic as his own papers.
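The three measures can be sketched on a toy graph, assuming the usual definitions from [1] and [2], which the slide does not restate: PathCount counts path instances, PCRW is the probability that a random walk following the meta path reaches the target, and PathSim normalizes PathCount by the two self-counts. The data and the meta path author-paper-venue-paper-author are hypothetical.

```python
from collections import Counter

# Hypothetical toy data, not the slide's figure.
writes = {"a1": ["p1", "p2"], "a2": ["p3"], "a3": ["p4"]}
published_in = {"p1": ["KDD"], "p2": ["VLDB"], "p3": ["KDD"], "p4": ["VLDB"]}


def invert(rel):
    """Reverse a relation so it can be traversed backwards."""
    inv = {}
    for src, dsts in rel.items():
        for dst in dsts:
            inv.setdefault(dst, []).append(src)
    return inv


# Meta path: author -write-> paper -publish-> venue <-publish- paper <-write- author
APVPA = [writes, published_in, invert(published_in), invert(writes)]


def walk(start, meta_path, weighted):
    """Propagate mass from `start` along the meta path.

    weighted=False counts path instances (PathCount);
    weighted=True splits the mass evenly at every step (PCRW).
    """
    mass = {start: 1.0}
    for rel in meta_path:
        nxt = Counter()
        for node, m in mass.items():
            nbrs = rel.get(node, [])
            for nb in nbrs:
                nxt[nb] += m / len(nbrs) if weighted else m
        mass = nxt
    return mass


def path_count(x, y, mp):
    return walk(x, mp, weighted=False).get(y, 0.0)


def pcrw(x, y, mp):
    return walk(x, mp, weighted=True).get(y, 0.0)


def path_sim(x, y, mp):
    # Defined for symmetric meta paths such as APVPA.
    denom = path_count(x, x, mp) + path_count(y, y, mp)
    return 2 * path_count(x, y, mp) / denom if denom else 0.0


print(path_count("a1", "a2", APVPA))   # 1.0
print(pcrw("a1", "a2", APVPA))         # 0.25
print(path_sim("a1", "a2", APVPA))
```

On this data PCRW is asymmetric: `pcrw("a2", "a1", APVPA)` is 0.5, because a2 has a single paper to walk through while a1 has two.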
Linear Combination
• R(a1,a2) = R(a1,a2|P1) + R(a1,a2|P2) = 1 + 1 = 2
• R(a2,a3) = R(a2,a3|P1) + R(a2,a3|P2) = 1 + 1 = 2
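The arithmetic above can be reproduced with a few lines. The per-path scores are the ones stated on the slide; the unit weights and the `combine` helper are assumptions made for this sketch.

```python
# Per-meta-path relevance scores as stated on the slide.
per_path = {
    ("a1", "a2"): {"P1": 1, "P2": 1},
    ("a2", "a3"): {"P1": 1, "P2": 1},
}

# Assumed unit weights, matching the plain sum shown above.
weights = {"P1": 1.0, "P2": 1.0}


def combine(scores, weights):
    """Weighted sum of the relevance scores obtained on each meta path."""
    return sum(weights[path] * score for path, score in scores.items())


combined = {pair: combine(scores, weights) for pair, scores in per_path.items()}
print(combined)   # {('a1', 'a2'): 2.0, ('a2', 'a3'): 2.0}
```

Both pairs end up with the same score. Because each meta path is scored independently, no choice of weights can separate two pairs whose per-path scores are identical.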
[Figure: example HIN (authors a1-a3, their papers, venues, and topics; edge types write, publish, mention).]
Meta Structure
• A powerful extension of the meta path: a directed acyclic graph (DAG).
• More powerful.
  – Contains more information than a meta path and can express richer semantics.
• Challenges:
  – How to define measures based on a meta structure?
  – Greater complexity leads to higher computational cost.
  – How to derive a meta structure? (Not yet well studied)
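A minimal sketch of a meta structure as a DAG over node types, split into levels with the standard-library `graphlib` module. The structure used here (author, paper, then venue and topic in parallel, paper, author) is an assumption based on the researcher example; the node names are hypothetical.

```python
from graphlib import TopologicalSorter

# Assumed meta structure, written as node -> set of predecessors.
preds = {
    "P_src": {"A_src"},
    "V": {"P_src"},
    "T": {"P_src"},
    "P_dst": {"V", "T"},
    "A_dst": {"P_dst"},
}

sorter = TopologicalSorter(preds)
sorter.prepare()            # raises graphlib.CycleError if the graph is not a DAG

layers = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())   # all nodes whose predecessors are done
    layers.append(ready)
    sorter.done(*ready)

print(layers)
# [['A_src'], ['P_src'], ['T', 'V'], ['P_dst'], ['A_dst']]
```

The third level holds two node types at once, which is what a single meta path cannot express.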
Relevance Measures
• StructCount (SC): extension of PathCount
• Structure Constrained Random Walk (listed as SCSE in the results tables)
• Biased Structure Constrained Random Walk (BSCSE), a combination of the previous two measures.
$\mathrm{StructCount}(x_0, y_0 \mid S) = \mathrm{GraphIns}(x_0, y_0 \mid S)$
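A minimal sketch of StructCount, reading GraphIns(x0, y0 | S) as the number of instances of S that start at x0 and end at y0. The meta structure (author, paper, venue and topic together, paper, author), the toy data, and the one-venue, one-topic-per-paper simplification are all assumptions. A sum of two meta-path counts is included for contrast.

```python
from itertools import product

# Hypothetical toy data, not the slide's figure.
writes = {"a1": ["p1", "p2"], "a2": ["p3"], "a3": ["p4", "p5"]}
venue_of = {"p1": "KDD", "p2": "VLDB", "p3": "KDD", "p4": "KDD", "p5": "VLDB"}
topic_of = {"p1": "mining", "p2": "privacy", "p3": "mining",
            "p4": "privacy", "p5": "mining"}


def struct_count(x, y):
    """Count paper pairs (p by x, q by y) that share venue AND topic."""
    return sum(
        1
        for p, q in product(writes.get(x, []), writes.get(y, []))
        if venue_of[p] == venue_of[q] and topic_of[p] == topic_of[q]
    )


def linear_combination(x, y):
    """Sum of two meta-path counts: shared venue plus shared topic."""
    pairs = list(product(writes.get(x, []), writes.get(y, [])))
    same_venue = sum(venue_of[p] == venue_of[q] for p, q in pairs)
    same_topic = sum(topic_of[p] == topic_of[q] for p, q in pairs)
    return same_venue + same_topic


for pair in [("a1", "a2"), ("a1", "a3")]:
    print(pair, struct_count(*pair), linear_combination(*pair))
# ('a1', 'a2') 1 2
# ('a1', 'a3') 0 4
```

In this toy data a3 shares venues and topics with a1 only through different paper pairs, so the summed path counts rank a3 above a2 while the structure count does the opposite.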
Recursive Tree
• To calculate the BSCSE relevance of a2 and a1:
[Figure: example HIN (authors a1-a3, their papers, venues, and topics; edge types write, publish, mention).]
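The recursion behind such a tree can be sketched as follows. This is an assumption-laden illustration: the slides only say that BSCSE combines the two other measures, so the bias exponent `alpha` (0 behaving like StructCount, 1 like an even-split random walk), the level-by-level `expand` function, the meta structure, and the toy data are all hypothetical.

```python
# Hypothetical toy data; each paper has exactly one venue and one topic.
writes = {"a1": ["p1", "p2"], "a2": ["p3"], "a3": ["p4"]}
venue_of = {"p1": "KDD", "p2": "VLDB", "p3": "KDD", "p4": "VLDB"}
topic_of = {"p1": "mining", "p2": "privacy", "p3": "mining", "p4": "privacy"}
author_of = {p: a for a, papers in writes.items() for p in papers}

LAST_LEVEL = 4   # levels: author, paper, (venue, topic), paper, author


def expand(inst, level):
    """Instances of the next level that are consistent with `inst`."""
    if level == 0:                       # (author,) -> (paper,)
        return [(p,) for p in writes.get(inst[0], [])]
    if level == 1:                       # (paper,) -> (venue, topic)
        return [(venue_of[inst[0]], topic_of[inst[0]])]
    if level == 2:                       # (venue, topic) -> (paper,)
        return [(p,) for p in venue_of
                if (venue_of[p], topic_of[p]) == inst]
    if level == 3:                       # (paper,) -> (author,)
        return [(author_of[inst[0]],)]
    return []


def bscse(inst, level, target, alpha):
    """Recursive relevance; each tree node divides by (fan-out ** alpha)."""
    if level == LAST_LEVEL:
        return 1.0 if inst == (target,) else 0.0
    children = expand(inst, level)
    if not children:
        return 0.0
    total = sum(bscse(c, level + 1, target, alpha) for c in children)
    return total / (len(children) ** alpha)


for alpha in (0.0, 0.5, 1.0):
    print(alpha, bscse(("a1",), 0, "a2", alpha))
```

With `alpha = 0.0` the result is 1.0, the number of matching instances; with `alpha = 1.0` it is 0.25, the probability of reaching a2 when each expansion is chosen uniformly.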
i-LTable
• Index the probability distribution starting from the i-th level of a meta structure.
[Figure: example HIN (authors a1-a3, their papers, venues, and topics; edge types write, publish, mention).]
Experiment: Entity Resolution
• To find duplicated entities in YAGO
  – e.g., Barack_Obama and Presidency_Of_Barack_Obama
• Metric: AUC
           P1                              P2
Measure    PathCount  PCRW    PathSim      PathCount  PCRW    PathSim
AUC        0.1324     0.0120  0.0097       0.0003     0.0014  0.0002

           Linear Combination (optimal)    Meta Structure S
Measure    PathCount  PCRW    PathSim      SC         SCSE    BSCSE*
AUC        0.2898     0.2606  0.2920       0.5556     0.5640  0.5640
Relevance Ranking
• We label the relevance of venues in DBLP_4_Area.
• 0 for not relevant, 1 for relevant and 2 for strongly relevant.
• Consider both the scope and the level of the venues (e.g., SIGMOD and VLDB are both labeled 2).
• Metric: Normalized Discounted Cumulative Gain (NDCG)
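A minimal sketch of how NDCG can be computed from such 0/1/2 labels. It assumes the linear-gain form (label divided by log2 of rank + 1); the exponential-gain form (2^label - 1) is also common, and the slides do not say which was used. The example ranking is hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of labels listed in ranked order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_labels):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: labels of the returned venues, best-ranked first.
print(ndcg([2, 1, 0]))   # 1.0, already in ideal order
print(ndcg([2, 0, 1]))   # below 1.0: a relevant venue is ranked after an irrelevant one
```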
Relevance Ranking
           P1                              P2
Measure    PathCount  PCRW    PathSim      PathCount  PCRW    PathSim
nDCG       0.9004     0.9047  0.9083       0.8224     0.8901  0.8834

           Linear Combination (optimal)    Meta Structure S
Measure    PathCount  PCRW    PathSim      SC         SCSE    BSCSE*
nDCG       0.9004     0.9100  0.9083       0.9056     0.9104  0.9130
Reference
• [1] Sun, Yizhou, et al. "PathSim: Meta path-based top-k similarity search in heterogeneous information networks." VLDB'11 (2011).
• [2] Lao, Ni, and William W. Cohen. "Relational retrieval using a combination of path-constrained random walks." Machine Learning 81.1 (2010): 53-67.
• [3] Meng, Changping, et al. "Discovering meta-paths in large heterogeneous information networks." WWW'15.
• [4] Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. "Meta Structure: Computing Relevance in Large Heterogeneous Information Networks." SIGKDD'16.