RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF GRAPH VISUALIZATION BY INTERPRETING LINKED DATA AS KNOWLEDGE
Rathachai CHAWUTHAI & Prof. Hideaki TAKEDA
National Institute of Informatics, and SOKENDAI
RDF4U
JIST2015 Yichang, China 11-13 Nov 2015
AGENDA
• Motivation
• Methods
• Graph Simplification
• Triple Ranking
• Property Selection
• Outcome
• Future Plan
THE ROLE OF SEMANTIC WEB IN KNOWLEDGE MANAGEMENT
[Figure: tiers of a Semantic Web application: Data tier; Service tier; Application/Presentation/Visualisation tier. Labelled technologies: SPARQL, Jena, etc.]
At the Visualisation Tier,
• RDF data are transformed into charts, geographic maps, etc. and then served to users.
It's cool, but
• Users are far from the RDF data, so they do not understand the power of the Semantic Web and do not realise how they could contribute RDF data.
For this reason,
• It would be good if users could read RDF data directly, using a node-link diagram or a concept-map diagram.
READING FROM A QUERY GRAPH
Querying the 2-hop neighbourhood (or more hops) of a given URI gives wider information on the topic.
[Figure: example query graph around "Caffe Mocha". Nodes: Caffe Mocha, Coffee, Espresso, Chocolate, Sugar, Milk, sweet, sugarcane, cow, white, cocoa, black, bitter, 430 mg/L. Edge labels: type, taste, made from, produces, color, contains, a shot of, topped by, has layer of, caffeine contain.]
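A minimal sketch of collecting the 2-hop neighbourhood of a topic, assuming the graph is already available in memory as (subject, predicate, object) tuples. The function name `n_hop_neighbourhood` and the sample triples are hypothetical; the triples only reuse labels from the figure, and their exact wiring is an assumption.

```python
def n_hop_neighbourhood(triples, start, hops=2):
    """Collect every triple within `hops` links of `start`, in both directions."""
    frontier = {start}
    visited = {start}
    collected = set()
    for _ in range(hops):
        next_frontier = set()
        for s, p, o in triples:
            # A triple belongs to this hop if either end touches the frontier.
            if s in frontier or o in frontier:
                collected.add((s, p, o))
                for node in (s, o):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return collected


# Hypothetical triples built from the labels of the coffee example.
TRIPLES = [
    ("Caffe Mocha", "type", "Coffee"),
    ("Caffe Mocha", "contains", "Espresso"),
    ("Caffe Mocha", "contains", "Chocolate"),
    ("Caffe Mocha", "topped by", "Milk"),
    ("Espresso", "color", "black"),
    ("Chocolate", "contains", "Sugar"),
    ("cow", "produces", "Milk"),
    ("Sugar", "made from", "sugarcane"),
]

two_hop = n_hop_neighbourhood(TRIPLES, "Caffe Mocha", hops=2)
three_hop = n_hop_neighbourhood(TRIPLES, "Caffe Mocha", hops=3)
print(len(two_hop), len(three_hop))
```

Each extra hop widens the information on the topic, which is also why the resulting graph quickly becomes hard to read.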
PROBLEMS
1) A Query Graph is TOO Complicated to Read.
http://lod.ac/species/Bubo
http://dbpedia.org/resource/Tokyo
PROBLEMS
2) Lack of Reading Flow in RDF Data
All triples are equal, so Background Content and Main Point are NOT distinguished in an RDF graph.
GOAL
We prefer
✦ A Simply Readable Graph
✦ A Graph with a Good Reading Flow
[Figure: a topic node surrounded by Common Information and Topic-Specific Information.]
DEMO
bit.ly/youtube_rdf4u
bit.ly/sohu_rdf4u
Full URLs:
https://www.youtube.com/watch?v=z3roA9-Cp8g
http://my.tv.sohu.com/us/271745761/81854223.shtml
OVERALL
[Figure: RDF4U pipeline. An Original Query Graph passes through Graph Simplification, Triple Ranking, and Property Selection to become a Human-Readable Graph. The User selects simplification rules, chooses a proper rank, and displays or hides properties.]
GRAPH SIMPLIFICATION
• Some well-prepared RDF repositories run reasoning over their ontologies in order to support a SPARQL service.
• One impact is that the inferred triples create giant components in a graph.
• A closer look at the data indicates that the following situations are commonly found in complex RDF graphs:
  • equivalent or same-as instances (owl:sameAs),
  • transitive properties (e.g. skos:broaderTransitive), and
  • hierarchical classification (rdf:type and rdfs:subClassOf).
• Thus, this method aims to remove some redundant triples by using the mechanism of Semantic Web rules.
[Figure: a type hierarchy (x rdf:type C1, C1 rdfs:subClassOf C2) and a chain of links (x P y, y P z).]
GRAPH SIMPLIFICATION
1. To merge same-as nodes
Given s1 p1 o1 . s2 p2 o2 . s1 owl:sameAs s2 . and fD(s1) > fD(s2), the node s2 is merged into s1, so the graph shows s1 p1 o1 . and s1 p2 o2 .
2. To remove transitive links
Given x P y . y P z . x P z . and P rdf:type owl:TransitiveProperty ., the link x P z is removed.
3. To remove inferred type hierarchies
Given x rdf:type C1 . x rdf:type C2 . C1 rdfs:subClassOf C2 ., the link x rdf:type C2 is removed.
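The three simplification rules can be sketched over plain (subject, predicate, object) tuples as follows. This is only an illustration under stated assumptions, not the RDF4U implementation: the function names, the `ex:` identifiers, the frequency table `F_D`, and the tie-break by name in rule 1 are all hypothetical, and rule 2 assumes the transitive links form no cycles.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def merge_same_as(triples, f_d):
    """Rule 1: redirect each same-as node to its more frequent equivalent."""
    target = {}
    for s, p, o in triples:
        if p == SAME_AS and s != o:
            # Keep the node with the larger dataset frequency fD;
            # ties are broken by name (an assumption of this sketch).
            keep = max(s, o, key=lambda n: (f_d.get(n, 0), n))
            target[o if keep == s else s] = keep

    def canonical(node):
        seen = set()
        while node in target and node not in seen:
            seen.add(node)
            node = target[node]
        return node

    return {(canonical(s), p, canonical(o)) for s, p, o in triples if p != SAME_AS}


def remove_transitive_links(triples, transitive_props):
    """Rule 2: drop x P z when x P y and y P z exist for a transitive P."""
    result = set(triples)
    for prop in transitive_props:
        successors = defaultdict(set)
        for s, p, o in triples:
            if p == prop:
                successors[s].add(o)
        for x, targets in list(successors.items()):
            for z in targets:
                if any(z in successors.get(y, ()) for y in targets if y not in (x, z)):
                    result.discard((x, prop, z))
    return result


def remove_inferred_types(triples):
    """Rule 3: drop x rdf:type C2 when x rdf:type C1 and C1 rdfs:subClassOf C2."""
    subclass = {(s, o) for s, p, o in triples if p == SUBCLASS_OF}
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    result = set(triples)
    for x, classes in types.items():
        for c2 in classes:
            if any((c1, c2) in subclass for c1 in classes if c1 != c2):
                result.discard((x, RDF_TYPE, c2))
    return result


def simplify(triples, f_d, transitive_props):
    merged = merge_same_as(triples, f_d)
    return remove_inferred_types(remove_transitive_links(merged, transitive_props))


# Hypothetical example data.
GRAPH = [
    ("ex:Bubo", "owl:sameAs", "ex:Bubo_copy"),
    ("ex:Bubo_copy", "ex:commonName", "eagle owls"),
    ("ex:Bubo", "ex:parent", "ex:Strigidae"),
    ("ex:Strigidae", "ex:parent", "ex:Strigiformes"),
    ("ex:Bubo", "ex:parent", "ex:Strigiformes"),
    ("ex:Bubo", "rdf:type", "ex:Genus"),
    ("ex:Bubo", "rdf:type", "ex:TaxonRank"),
    ("ex:Genus", "rdfs:subClassOf", "ex:TaxonRank"),
]
F_D = {"ex:Bubo": 10, "ex:Bubo_copy": 2}
SIMPLIFIED = simplify(GRAPH, F_D, {"ex:parent"})
print(len(GRAPH), "->", len(SIMPLIFIED))
```

The eight input triples shrink to five: the same-as copy is folded into `ex:Bubo`, the shortcut link to `ex:Strigiformes` disappears, and only the most specific type remains.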
GRAPH SIMPLIFICATION
Example Result
[Figure: Original Query Graph and Simplified Graph for http://lod.ac/species/Bubo, before and after Graph Simplification. Node labels: Bubo, eagle owls, Strigidae, Strigiformes, owls, Aves, birds, Neognathae, Coelurosauria, Genus, Family, Order, Superorder, Class, Taxon Name, Common Name, Scientific Name. Edge labels: hasSynonym, hasParentTaxon, hasTaxonRank, type.]
TRIPLE RANKING
Since users have different background knowledge of a specific topic, beginners may be interested in reading common information before getting to topic-specific information, while experts may prefer to read only topic-specific information.
• Concept Level (resources or properties)
  • General Concepts are terms that are commonly known, such as “name”, “address”, and “class”; they are found throughout a corpus.
  • Key Concepts are important terms that are frequent in the query result but infrequent in the whole dataset.
• Information Level (triples)
  • Common Information explains background knowledge that helps readers understand the main content (many general concepts).
  • Topic-Specific Information contains specific terms that are highly relevant to the article (many key concepts).
TRIPLE RANKING
1. Get an RDF graph.
2. Identify general concepts and key concepts.
3. Common Information: most of the nodes and links are general concepts.
4. Topic-Specific Information: most of the nodes and links are key concepts.
TRIPLE RANKING
Weight of a URI (Concept Level)
\[ w(uri) = \frac{f_Q(uri)}{\log\bigl(f_D(uri) + 1\bigr)} \]
where f_Q(uri) is the number of occurrences of the URI in the query result, and the denominator puts the number of its occurrences in the whole dataset, f_D(uri), on a logarithmic scale.
High: key concept. Low: general concept.
Visualization-Weight of a Triple (Information Level)
\[ vw(\langle s,p,o \rangle) = \frac{\alpha \cdot w(s) + \beta \cdot w(p) + \gamma \cdot w(o)}{\alpha + \beta + \gamma} \]
The coefficients α, β, and γ are 1.0 by default, which reduces the formula to (w(s) + w(p) + w(o)) / 3, but they can be adjusted for specific purposes.
High: topic-specific. Low: common info.
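The two weights can be sketched in a few lines of Python. The function names are hypothetical, the base-10 logarithm is an assumption (it reproduces the 16.87 listed for dbpedia:Hydrogen in this deck), and treating a timed-out dataset count as weight 0.0 is also an assumption. The counts are the ones listed in the Hydrogen example.

```python
import math


def uri_weight(f_q, f_d):
    """w(uri) = fQ / log(fD + 1); base 10 is an assumption of this sketch."""
    if f_d is None or f_d <= 0:
        # Assumption: a count that timed out contributes no weight.
        return 0.0
    return f_q / math.log10(f_d + 1)


def triple_weight(w_s, w_p, w_o, alpha=1.0, beta=1.0, gamma=1.0):
    """vw(<s,p,o>): weighted mean of the three URI weights."""
    return (alpha * w_s + beta * w_p + gamma * w_o) / (alpha + beta + gamma)


# (fQ, fD) pairs from the Hydrogen example; None marks a timed-out count.
COUNTS = {
    "dp:Hydrogen": (53, 1386),
    "dct:subject": (12, 30232709),
    "dp:Category:Chemical_elements": (14, 10880),
    "rdf:type": (38, None),
    "owl:Thing": (2, 9761514),
}
WEIGHTS = {uri: uri_weight(f_q, f_d) for uri, (f_q, f_d) in COUNTS.items()}


def score(s, p, o):
    return triple_weight(WEIGHTS[s], WEIGHTS[p], WEIGHTS[o])


common = score("dp:Hydrogen", "rdf:type", "owl:Thing")
specific = score("dp:Hydrogen", "dct:subject", "dp:Category:Chemical_elements")
print(round(WEIGHTS["dp:Hydrogen"], 2), round(specific, 2))  # 16.87 7.31
```

The `rdf:type owl:Thing` triple scores lower than the `dct:subject` triple, so it is read as more common and would be shown to beginners first.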
TRIPLE RANKING
Concept Level
Query Topic: dbpedia:Hydrogen
fQ: count in a Query graph. fD: count in a whole Dataset.
Resources
URI  fQ  fD  fQ/log(fD)
http://dbpedia.org/resource/Hydrogen  53  1,386  16.87
http://dbpedia.org/resource/Category:Chemical_elements  14  10,880  3.47
http://dbpedia.org/resource/Hydrogen_economy  13  6,489  3.41
http://dbpedia.org/resource/Category:Diatomic_nonmetals  12  103  5.96
http://dbpedia.org/resource/Category:Airship_technology  8  166  3.60
http://www.w3.org/2004/02/skos/core#Concept  8  9,707,808  1.14
http://www.w3.org/2002/07/owl#Thing  2  9,761,514  0.29
http://www.hydrogen.energy.gov/  1  1  0.00
Properties
URI  fQ  fD  fQ/log(fD)
http://www.w3.org/2002/07/owl#sameAs  72  timeout  0.00
http://www.w3.org/1999/02/22-rdf-syntax-ns#type  38  timeout  0.00
http://www.w3.org/2000/01/rdf-schema#subClassOf  24  timeout  0.00
http://www.w3.org/2002/07/owl#equivalentClass  22  timeout  0.00
http://purl.org/dc/terms/subject  12  30,232,709  1.60
http://www.w3.org/2004/02/skos/core#broader  12  2,485,421  1.88
http://xmlns.com/foaf/0.1/isPrimaryTopicOf  3  34,557,438  0.40
http://purl.org/dc/elements/1.1/rights  2  3,102,660  0.31
(raw: 1,291,986)
(raw: 15,195,702)
TRIPLE RANKING
Information Level
For example: http://dbpedia.org/resource/Hydrogen
Subject Predicate Object vw
dp:Hydrogen rdf:type owl:Thing 5.62
dp:Hydrogen rdf:type skos:Concept 6.01
dp:Hydrogen dct:subject dp:Chemical_elements 7.31
dp:Hydrogen dct:subject dp:Airship_technology 7.35
dp:Hydrogen rdf:type dp:Diatomic_nonmetals 7.48
The rows run from Common (low vw, top) to Topic-Specific (high vw, bottom).
TRIPLE RANKING
In case of sub-property (also sub-class)
ltk:higherTaxon rdfs:subPropertyOf skos:broader
ltk:mergedInto rdfs:subPropertyOf skos:broader
Raw Data: a ltk:higherTaxon x . a ltk:mergedInto y .
Inferred Data: a skos:broader x . a skos:broader y .
The raw data is more specific than the inferred data.
PROTOTYPE
http://rc.lodac.nii.ac.jp/rdf4u/
bit.ly/rdf4u
Thanks to
Client: D3js, Bootstrap, jQuery
Server: SimpleRDF, SPARQL for PHP
Features
• To simplify a graph by removing some inferred triples.
• To give ranking scores to triples based on common and topic-specific information.
• To filter a graph by selecting preferred properties.
• To control an interactive graph diagram.
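The filtering features can be illustrated with a small sketch that hides selected properties and keeps only triples at or above a chosen rank. The function `filter_graph` and the idea of a single `min_vw` threshold are assumptions made for illustration; the prototype's actual controls may work differently. The triples and vw scores are the ones listed in the Hydrogen example of this deck.

```python
def filter_graph(weighted_triples, hidden_properties=frozenset(), min_vw=0.0):
    """Keep triples whose property is shown and whose vw reaches the threshold."""
    return [
        (s, p, o, vw)
        for s, p, o, vw in weighted_triples
        if p not in hidden_properties and vw >= min_vw
    ]


# (subject, predicate, object, vw) rows from the Hydrogen example.
HYDROGEN = [
    ("dp:Hydrogen", "rdf:type", "owl:Thing", 5.62),
    ("dp:Hydrogen", "rdf:type", "skos:Concept", 6.01),
    ("dp:Hydrogen", "dct:subject", "dp:Chemical_elements", 7.31),
    ("dp:Hydrogen", "dct:subject", "dp:Airship_technology", 7.35),
    ("dp:Hydrogen", "rdf:type", "dp:Diatomic_nonmetals", 7.48),
]

# Property selection: hide every rdf:type link.
without_types = filter_graph(HYDROGEN, hidden_properties={"rdf:type"})

# Rank selection: an expert view that keeps only the topic-specific end.
expert_view = filter_graph(HYDROGEN, min_vw=7.0)

for s, p, o, vw in expert_view:
    print(s, p, o, vw)
```

A beginner view would simply lower `min_vw` so that the common triples stay visible.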
DISCUSSION
Usefulness
• A diagram is sparser and easier for humans to read.
• Beginners can read common information first.
• Experts can read topic-specific information.
Uniqueness
Some graph visualisation works (Motif, Gephi, RDF Gravity, Fenfire, and IsaViz)
• do not use the power of the Semantic Web to sparsify a graph, and
• do not mention providing different data for different user levels.
Novelty
• TF-IDF is adapted for ordering triples from the common to the topic-specific level of information.
• The degree of commonness versus specificity is calculated by evaluating the nature of the dataset with the algorithm.
Prospect
• The triple ranking can be extended by applying various algorithms in order to satisfy diverse characteristics of the data in other domains such as Biodiversity Informatics.
• Mashup tools should consider this idea.
FUTURE PLAN
• To do critical evaluation
  • Survey
  • Number of edges cut
• To find the precise border between common information and topic-specific information
• To find a better way to count the number of URIs (the count always times out)
• To remove noisy triples
• To improve the triple ranking algorithm for other domains
http://rc.lodac.nii.ac.jp/rdf4u
Thank you very much