Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
![Page 1: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/1.jpg)
Dedalo: looking for Clusters’ Explanations in a Labyrinth of Linked Data
Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
Knowledge Media Institute, The Open University
May 28, 2014
![Page 2: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/2.jpg)
The Knowledge Discovery process
• Explaining patterns requires background knowledge.
• Background knowledge is attributed to the experts.
• Background knowledge comes from different domains.
• Experts might not be aware of some background knowledge.
![Page 3: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/3.jpg)
Explaining clusters: an example
Authors clustered according to the papers they wrote together.
How to explain those clusters?
![Page 4: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/4.jpg)
Explaining clusters – the easy solution
Use an expert
“each cluster represents a research group in KMi”
Can one trust those experts?
![Page 7: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/7.jpg)
Explaining clusters – the nice solution
Use Inductive Logic Programming (ILP)
E+ (positive examples): attendsESWC(M.dAquin). attendsESWC(V.Lopez).
E− (negative examples): attendsESWC(E.Motta).
B (knowledge about E = E+ ∪ E−): submitted(M.dAquin). submitted(V.Lopez). submitted(E.Motta). accepted(V.Lopez). accepted(M.dAquin).
Learn a complete (B ∪ H ⊨ E+) and consistent (B ∪ H ⊭ E−) explanation for the relation attendsESWC(X):
attendsESWC(X) <- submitted(X) ∧ accepted(X)
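The completeness and consistency conditions can be checked mechanically on this toy example. The sketch below is only an illustration in Python, not an ILP learner: it assumes E+ = {M.dAquin, V.Lopez} and E− = {E.Motta} (the split that the learned rule implies), and the helper names `hypothesis`, `is_complete` and `is_consistent` are hypothetical.

```python
# Background knowledge B, encoded as sets of constants (toy data from the slide).
submitted = {"M.dAquin", "V.Lopez", "E.Motta"}
accepted = {"V.Lopez", "M.dAquin"}

# Assumed split of the examples: E+ attends ESWC, E- does not.
positives = {"M.dAquin", "V.Lopez"}
negatives = {"E.Motta"}


def hypothesis(x):
    """Candidate rule H: attendsESWC(X) <- submitted(X) AND accepted(X)."""
    return x in submitted and x in accepted


def is_complete(h, pos):
    """H is complete if it covers every positive example."""
    return all(h(x) for x in pos)


def is_consistent(h, neg):
    """H is consistent if it covers no negative example."""
    return not any(h(x) for x in neg)


print(is_complete(hypothesis, positives), is_consistent(hypothesis, negatives))
```

A weaker rule such as attendsESWC(X) <- submitted(X) would still be complete but no longer consistent, because it also covers E.Motta.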
![Page 8: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/8.jpg)
Explaining clusters – still the nice solution
E+ (positive examples): inMyCluster(M.dAquin). inMyCluster(V.Lopez). inMyCluster(M.Sabou).
E− (negative examples): inMyCluster(M.Fernandez). inMyCluster(H.Saif). inMyCluster(C.Pedrinaci). inMyCluster(J.Domingue).
B: knowledge about E = E+ ∪ E−
B?
inMyCluster(X) <- ?
![Page 9: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/9.jpg)
Explaining clusters – the cool solution
Integrate ILP with Linked Data
![Page 10: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/10.jpg)
Explaining clusters – the cool solution
E+ (positive examples): inMyCluster(M.dAquin). inMyCluster(V.Lopez). inMyCluster(M.Sabou).
E− (negative examples): inMyCluster(M.Fernandez). inMyCluster(H.Saif). inMyCluster(C.Pedrinaci). inMyCluster(J.Domingue).
B (knowledge about E = E+ ∪ E−): topic(M.dAquin, SemanticWeb). topic(M.Sabou, SemanticWeb). topic(V.Lopez, SemanticWeb). topic(H.Saif, SocialWeb). topic(C.Pedrinaci, SemanticWebServices). topic(J.Domingue, SemanticWebServices). topic(M.Fernandez, SocialWeb).
inMyCluster(X) <- topic(X, SemanticWeb)
Is this enough?
![Page 11: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/11.jpg)
Producing Linked Data Explanations
People working on similar topics, in the same place, or on the same project are likely to write papers together.
![Page 12: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/12.jpg)
Producing Linked Data Explanations
People working under the same person, or with the same partner, are likely to write papers together.
![Page 13: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/13.jpg)
Producing Linked Data Explanations
People working under people interested in the same thing write papers together.
![Page 14: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/14.jpg)
Integrating ILP and Linked Data
Add to B each Linked Data explanation hi = 〈pk〉.〈vk〉*, where:
• pk (path): a chain of RDF properties, pk = {prop0 → prop1 → … → propn}
• vk (value): a final instance
• roots(hi): elements ∈ Ci having hi in common, e.g. roots(hi) = {ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}
* spread across different datasets
Example: hi = 〈ou:project→ou:ledBy→foaf:topic〉.〈edu:SemanticWeb〉 (path pk, then value vk)
Building each hi:
– how?
– which chains of properties?
– where to find the good ones?
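To make the 〈path〉.〈value〉 notation concrete, here is a minimal Python sketch that follows a chain of properties over a small in-memory triple list and groups the starting entities by the value they reach. Everything in it is an assumption for illustration: the triples are invented toy data (only the prefixes and the example path come from the slide), `follow` and `hypotheses` are hypothetical helper names, and roots are collected over all items rather than only those in Ci. It is not the traversal Dedalo actually performs.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); not real Linked Data.
TRIPLES = [
    ("ou:M.dAquin", "ou:project", "ou:ProjA"),
    ("ou:V.Lopez", "ou:project", "ou:ProjA"),
    ("ou:M.Sabou", "ou:project", "ou:ProjB"),
    ("ou:H.Saif", "ou:project", "ou:ProjC"),
    ("ou:ProjA", "ou:ledBy", "ou:Lead1"),
    ("ou:ProjB", "ou:ledBy", "ou:Lead1"),
    ("ou:ProjC", "ou:ledBy", "ou:Lead2"),
    ("ou:Lead1", "foaf:topic", "edu:SemanticWeb"),
    ("ou:Lead2", "foaf:topic", "edu:SocialWeb"),
]


def follow(entity, path):
    """Return the set of values reached from `entity` by following `path`."""
    frontier = {entity}
    for prop in path:
        frontier = {o for s, p, o in TRIPLES if p == prop and s in frontier}
    return frontier


def hypotheses(entities, path):
    """Map each final value v_k to its roots: the entities that reach it via `path`."""
    roots = defaultdict(set)
    for entity in entities:
        for value in follow(entity, path):
            roots[value].add(entity)
    return dict(roots)


R = ["ou:M.dAquin", "ou:V.Lopez", "ou:M.Sabou", "ou:H.Saif"]
path = ("ou:project", "ou:ledBy", "foaf:topic")
result = hypotheses(R, path)
for value, roots in sorted(result.items()):
    print(" -> ".join(path), value, sorted(roots))
```

With this toy data, the three authors of the example cluster all reach edu:SemanticWeb through the same path, which is what makes that hypothesis a candidate explanation.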
![Page 15: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/15.jpg)
Dedalo – An iterative Linked Data traversal
Scoring hypotheses
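WRacc¹ score of a hypothesis hi:

$$
\mathrm{WRacc}(h_i) = \frac{|\mathrm{roots}(h_i)|}{|R|} \left( \frac{|\mathrm{roots}(h_i) \cap C_i|}{|\mathrm{roots}(h_i)|} - \frac{|C_i|}{|R|} \right)
$$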
1 Geng et al. (2006). Interestingness measures for data mining: A survey.
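The WRacc score on this slide can be computed directly from three sets. The sketch below is one straightforward Python reading of the formula; the function name `wracc`, its parameter names, and the seven-author toy data are assumptions for illustration.

```python
def wracc(roots, cluster, all_items):
    """Weighted relative accuracy of a hypothesis.

    roots:     items sharing the hypothesis, roots(h_i)
    cluster:   items of the cluster being explained, C_i
    all_items: every clustered item, R
    """
    roots, cluster, all_items = set(roots), set(cluster), set(all_items)
    if not roots:
        return 0.0  # a hypothesis with no roots explains nothing
    coverage = len(roots) / len(all_items)         # |roots(h_i)| / |R|
    precision = len(roots & cluster) / len(roots)  # |roots(h_i) ∩ C_i| / |roots(h_i)|
    prior = len(cluster) / len(all_items)          # |C_i| / |R|
    return coverage * (precision - prior)


# Toy data: seven authors, three of them in the cluster to explain.
R = {"M.dAquin", "V.Lopez", "M.Sabou", "M.Fernandez", "H.Saif", "C.Pedrinaci", "J.Domingue"}
C_i = {"M.dAquin", "V.Lopez", "M.Sabou"}

perfect = wracc(C_i, C_i, R)  # hypothesis shared by exactly the cluster
trivial = wracc(R, C_i, R)    # hypothesis shared by everyone
print(perfect, trivial)
```

A hypothesis shared by exactly the cluster members scores (3/7)(1 − 3/7) = 12/49 here, while one shared by every item scores 0, since its precision is no better than the cluster's prior.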
![Page 16: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/16.jpg)
Dedalo – An iterative Linked Data traversal
How to define the interestingness of a path pk?
How to reach the best hi in the shortest time?
![Page 17: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/17.jpg)
Dedalo – Comparing Heuristics
• We chose to compare different strategies.
• We want to find the path pk leading to the best hi in the shortest time.
• We want to save time and computational complexity.
Path Length: length of pk in number of properties composing it
Path Frequency: frequency of the paths in the graph
Adapted PMI: joint and individual distribution of pk and Ci
Adapted TF–IDF: how important is pk (term) in Ci (doc)
Delta: |vals(pk)| ≈ |C|
Entropy²: distribution of |vals(pk)|
Conditional Entropy: distribution of |vals(pk)| w.r.t. Ci
2 Shannon, C. (1948). A Mathematical Theory of Communication.
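The slides do not give the exact formulas behind the Entropy and Conditional Entropy heuristics, so the following is only a sketch of the textbook Shannon quantities they refer to, applied to the values reached by a path. The function names, the encoding of cluster membership as a boolean label, and the toy values are all assumptions; how Dedalo turns these numbers into a path ranking is not stated here.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """H(V | L) for (value, label) pairs, e.g. label = 'item is in C_i'."""
    by_label = defaultdict(list)
    for value, label in pairs:
        by_label[label].append(value)
    n = len(pairs)
    # Weighted average of the per-label entropies.
    return sum(len(vs) / n * entropy(vs) for vs in by_label.values())


# Toy values reached by one path; True marks items inside the cluster.
pairs = [
    ("edu:SemanticWeb", True),
    ("edu:SemanticWeb", True),
    ("edu:SocialWeb", False),
    ("edu:SemanticWebServices", False),
]
h_all = entropy([value for value, _ in pairs])
h_cond = conditional_entropy(pairs)
print(h_all, h_cond)
```

In this toy case the cluster members all reach the same value, so knowing the cluster label removes part of the uncertainty about the value and the conditional entropy is lower than the plain entropy.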
![Page 18: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/18.jpg)
Dedalo’s Heuristics
Ci={ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}
Path Frequency top(pk)=〈foaf:topic〉
![Page 19: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/19.jpg)
Dedalo’s Heuristics
Ci={ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}
Adapted TF–IDF top(pk)=〈ou:exMember〉
![Page 20: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/20.jpg)
Dedalo’s Heuristics
Ci={ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}
Entropy top(pk)=〈ou:project→ou:ledBy→foaf:topic〉
![Page 21: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/21.jpg)
Experiments – KMi co-authorship
• Authors clustered according to their co-authorships.
• Network Partitioning clustering, |R|=92, |C|= 6
[Figure: WRacc (y-axis) against Cycles (x-axis), two panels: "Semantic Web authors" and "Learning Analytics authors"; one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI.]
Best hi found, with cluster size |Ci| and WRacc score:
|Ci| = 22, WRacc = 0.128: 〈org:hasMembership→ox:hasPrincipalInvestigator→org:hasMembership〉.〈ou:SmartProducts〉
|Ci| = 23, WRacc = 0.127: 〈org:hasMembership→ox:hasPrincipalInvestigator→org:hasMembership〉.〈ou:SocialLearn〉
![Page 22: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/22.jpg)
Experiments – KMi Publications
• Papers clustered according to their keywords.
• XK-Means clustering, |R|=865, |C|= 6
[Figure: WRacc (y-axis) against Cycles (x-axis), two panels: "Learning Analytics papers" and "Semantic Web papers"; one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI.]
Best hi found, with cluster size |Ci| and WRacc score:
|Ci| = 601, WRacc = 0.042: 〈dc:creator→ntag:isRelatedTo〉.〈ou:LearningAnalytics〉
|Ci| = 220, WRacc = 0.073: 〈dc:creator→ntag:isRelatedTo〉.〈ou:SemanticWeb〉
![Page 23: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/23.jpg)
Experiments – Huddersfield’s dataset
• Books clustered according to the students’ Faculties.
• K-Means clustering, |R|=6969, |C|= 14
[Figure: WRacc (y-axis) against Cycles (x-axis), two panels: "Music students' borrowings" and "Theatre students' borrowings"; one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI.]
Best hi found, with cluster size |Ci| and WRacc score:
|Ci| = 335, WRacc = 0.005: 〈dc:subject→skos:broader〉.〈lcsh:PhysicalScience〉
|Ci| = 919, WRacc = 0.013: 〈dc:creator→bl:hasCreated→dc:subject〉.〈bl:EnglishDrama〉
![Page 24: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/24.jpg)
Experiments – Comparing heuristics
Heuristics speed comparison in seconds.
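| Heuristic | KMiA1 | KMiA2 | KMiP1 | KMiP2 | Hud1 | Hud2 |
|---|---|---|---|---|---|---|
| Len | 1.64 | 4.15 | 8.95 | 9.01 | 69.13 | 135.5 |
| Freq | 2.57 | 4.35 | 7.5 | 9.29 | 180 | 180 |
| PMI | 2.05 | 3.88 | 11.28 | 18.42 | 180 | 180 |
| TF–IDF | 1.69 | 3.18 | 10.61 | 17.19 | 180 | 180 |
| Delta | 2.02 | 3.92 | 180 | 180 | 180 | 180 |
| Entropy | 4.19 | 3.27 | 7.1 | 7.3 | 41.15 | 105.09 |
| Conditional Entropy | 2.64 | 3.89 | 7.48 | 7.55 | 70.91 | 40.89 |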
– Len, Freq: fast but inaccurate baselines
– Entropy, Conditional Entropy: outperforming measures, reducing redundancy (following wrong paths) and time efforts
– PMI, TF–IDF, Delta: they might work on less homogeneous clusters
![Page 25: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/25.jpg)
Conclusions
• Linked Data – automatically explaining clusters
• Dedalo – traversing Linked Data to reveal explanations
• Entropy – driving the search in the Linked Data cloud
Beyond Dedalo.
Dedalo works as long as the domain is limited.
New use cases require extending it.
![Page 26: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/26.jpg)
Future work: the OU students enrolment dataset
• Add sameAs linking
• Use of literals
• Aggregation of atomic rules
• Explore new hypotheses evaluation measures
![Page 27: Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data](https://reader034.fdocuments.net/reader034/viewer/2022051515/55941fdd1a28ab4a128b462d/html5/thumbnails/27.jpg)
Thanks for your attention!³
[email protected]@open.ac.uk
Questions?
Better to ask the robot than the experts.
3 Special thanks to the KMi (happy) faces.