Finding and Ranking Knowledge on the Semantic Web
UMBC, an Honors University in Maryland
Finding and Ranking Knowledge on the Semantic Web
Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari
University of Maryland, Baltimore County
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions
Google has made us smarter
But what about our agents?
[Figure labels: tell, register]
A Google for knowledge on the Semantic Web is needed by people and software agents
Swoogle Architecture
[Figure: Swoogle architecture diagram]
Components: SWD discovery, metadata creation, data analysis, interface
Elements: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service
Swoogle 2: 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals (4/05)
Swoogle 3: 700K SWDs, 135M triples, 7.7K SWOs (11/05)
Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time”, “before” and “after” are basic concepts for a “Time” ontology.
Demo1
Digest “Time” Ontology (document view)
Demo2(a)
Digest “Time” Ontology (term view)
Demo2(b)
[Screenshot: term list including TimeZone, before, intAfter]
Find Term “Person”
Demo3
Not capitalized! URIref is case sensitive!
Digest Term “Person”
Demo4
167 different properties
562 different properties
Demo5(a) Swoogle Today
Demo5(b) Swoogle Statistics
[Chart labels: FOAF, Trustix, W3C, Stanford]
Swoogle’s Triple Store lets you shop
And check out your triples into any of several reasoners
Summary
Swoogle (Mar 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web interface
Swoogle2 (Sep 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart
Swoogle3 (July 2005)
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals
The Semantic Web Onion
[Figure: nested layers, outermost to innermost]
• Universal RDF Graph: the “Semantic Web” (about 10M documents)
• RDF Document: physically hosts knowledge (about 100 triples per SWD on average)
• Class-instance: triples modifying the same subject
• Molecule: finest lossless set of triples
• Triple: atomic knowledge block
• Resource, Literal
Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
Semantic Web Navigation Model
[Figure: navigation paths, numbered 1 to 7, among the Web, SWDs, SWOs, SWTs, and RDF graph resources and literals. Link labels: uses, populates, defines, officialOnto, isDefinedBy, owl:imports…, rdfs:seeAlso, rdfs:isDefinedBy, isUsedBy, isPopulatedBy, rdfs:subClassOf, sameNamespace, sameLocalname, Extends, class-property bond. Entry points: Term Search, Document Search.]
Navigating the HTML web is simple; there’s just one kind of link. The SW has more kinds of links and hence more navigation paths.
Semantic Web Navigation Model
[Figure: the same navigation diagram as on the previous slide]
Relations in 1 and 3 and parts of 4 require a global view to discover.
Rank has its privilege
• Google introduced a new approach to ranking query results using a simple “popularity” metric.
  – It was a big improvement!
• Swoogle ranks its query results also
  – When searching for an ontology, class or property, wouldn’t one want to see the most used ones first?
• Ranking SW content requires different algorithms for different kinds of SW objects
  – For SWDs, SWTs, individuals, “assertions”, molecules, etc.
Google’s PageRank
• A page’s rank is a function of how many links point to it and the rank of the pages hosting those links.
• The “random surfer” model provides the intuition:
  (1) Jump to a random page
  (2) Select and follow a random link on the page and repeat until ‘bored’
  (3) If bored, go to (1)
• Pages are ranked by the relative frequency with which they are visited.
[Figure: flowchart. Jump to a random page → Follow a random link → bored? (no: follow another link; yes: jump to a random page)]
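The random surfer loop can be simulated directly. The following is a minimal sketch, not Google's implementation: the four-page web, the `bored` probability of 0.15, and the function name `random_surfer` are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, bored=0.15, seed=0):
    """Estimate how often a random surfer visits each page.

    links maps each page to the list of pages it links to.
    bored is the assumed probability of giving up and jumping elsewhere.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = Counter()
    page = rng.choice(pages)              # (1) jump to a random page
    for _ in range(steps):
        visits[page] += 1
        out_links = links[page]
        if not out_links or rng.random() < bored:
            page = rng.choice(pages)      # (3) bored or dead end: jump again
        else:
            page = rng.choice(out_links)  # (2) follow a random link
    return {p: visits[p] / steps for p in pages}


# Hypothetical toy web: every page links to "hub".
toy_web = {
    "a": ["hub"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "hub": ["a", "b", "c"],
}

ranks = random_surfer(toy_web)
for page, freq in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {freq:.3f}")
```

The page with the most incoming links from well-visited pages ends up with the highest visit frequency, which is the intuition behind the rank.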
Ranking Semantic Web Documents
• Target: a pure SW dataset
  – Nodes: a collection of online SWDs (330K SWDs, 1.5% are labeled as ontologies)
  – Links: in addition to hyperlinks, term-level relations are generalized into TM, EX, IM.
• Rational surfer model (extension of weighted PageRank)
  – Semantic content (term-level relations) is encoded into links
  – The rank of a node spreads iteratively via links
  – The weight/capacity of a link varies according to link semantics
  – Weight propagates to imported ontologies
• Evaluation
  – Method: compare OntoRank with PageRank at promoting ontologies, using the same pure SW dataset
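One way to picture a weighted PageRank in which link capacity depends on link type is the sketch below. It is not Swoogle's code: the numeric weights for TM, EX and IM, the document names, and the damping factor are assumptions chosen only to make the example run.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power iteration over weighted links.

    links has the form {source: {target: weight}}.
    Returns {node: rank}; the ranks sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                # Each link carries rank in proportion to its weight.
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# Assumed weights per link type; the slides do not give the actual values.
LINK_WEIGHT = {"TM": 1.0, "EX": 2.0, "IM": 3.0}

# Hypothetical typed links between two data documents and two ontologies.
typed_links = [
    ("doc1", "TM", "ontoA"),
    ("doc1", "TM", "ontoB"),
    ("doc2", "IM", "ontoA"),
    ("doc2", "TM", "ontoB"),
    ("ontoA", "EX", "ontoB"),
]

links = {}
for source, kind, target in typed_links:
    links.setdefault(source, {})
    links[source][target] = links[source].get(target, 0.0) + LINK_WEIGHT[kind]

ranks = weighted_pagerank(links)
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

Giving heavier weights to some link types steers more of a document's rank toward the ontologies it depends on most strongly.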
An Example
[Figure: link graph over four SWDs, with edges labeled EX and TM (three times)]
Documents: http://www.cs.umbc.edu/~finin/foaf.rdf, http://xmlns.com/wordnet/1.6/, http://xmlns.com/foaf/1.0/, http://www.w3.org/2000/01/rdf-schema
Node scores (wPR → OntoRank): 0.2 → 0.2; 100 → 100; 3 → 103; 300 → 403
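The scores on this slide are consistent with one reading in which a node's OntoRank is its own wPR plus the wPR of every node whose weight propagates to it (3 + 100 = 103, and 300 + 3 + 100 = 403). The sketch below illustrates that reading only. The node names `n1` to `n4` and the chain of propagating links are hypothetical, since the figure's actual edges cannot be recovered from this transcript.

```python
def onto_rank(wpr, propagates_to):
    """Add each node's wPR to every node it can reach via propagating links."""

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in propagates_to.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    scores = dict(wpr)
    for source in wpr:
        for target in reachable(source):
            scores[target] += wpr[source]
    return scores


# wPR values taken from the slide; node names are placeholders.
wpr = {"n1": 0.2, "n2": 100, "n3": 3, "n4": 300}

# Assumed propagation chain: n2 passes weight to n3, and n3 to n4.
propagates_to = {"n2": ["n3"], "n3": ["n4"]}

scores = onto_rank(wpr, propagates_to)
print(scores)
```

Under these assumptions the node with no propagating links keeps its wPR unchanged, while nodes further down the chain accumulate the weight of everything upstream.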
Ontology Dictionary
• Motivation
  – One ontology does not always provide all the needed vocabulary
  – Many scenarios require assembling terms from multiple ontologies
• DIY ontology engineering
  1. Search for an appropriate class C
  2. Search for popular properties used for modifying instances of C
  3. Go back to step 1 if more classes are needed
Ranking Semantic Web Terms
• Pr(Term|Doc) can be measured by the normalized value of the product of the term’s
  – Popularity: how many SWDs use the term
  – Frequency: how many times the term is used in the SWD
• SWDs are accessed non-uniformly, as modeled by OntoRank
• TermRank estimates a term’s importance as ∑ Pr(Term|Doc) * OntoRank(Doc)
• Evaluation
  – Compare TermRank with term popularity for the 10 highest-rated terms and give an analytical evaluation.
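The TermRank sum can be sketched as follows. This is one plausible reading of the slide, not Swoogle's implementation: it assumes Pr(Term|Doc) is normalized over the terms of each document, and the documents, term counts and OntoRank values are invented for illustration.

```python
from collections import defaultdict


def term_rank(term_counts, doc_rank):
    """TermRank(t) = sum over documents of Pr(t|doc) * OntoRank(doc).

    term_counts has the form {doc: {term: times used in doc}}.
    doc_rank has the form {doc: OntoRank of doc}.
    """
    # Popularity: the number of documents that use each term.
    popularity = defaultdict(int)
    for counts in term_counts.values():
        for term in counts:
            popularity[term] += 1

    scores = defaultdict(float)
    for doc, counts in term_counts.items():
        # Unnormalized weight: popularity times in-document frequency.
        weights = {t: popularity[t] * f for t, f in counts.items()}
        total = sum(weights.values())
        if total == 0:
            continue
        for term, weight in weights.items():
            # Assumed normalization: Pr(t|doc) sums to 1 within each document.
            scores[term] += (weight / total) * doc_rank[doc]
    return dict(scores)


# Hypothetical usage data and OntoRank values.
toy_counts = {
    "d1": {"foaf:Person": 2, "foaf:name": 2},
    "d2": {"foaf:Person": 1, "dc:title": 1},
}
toy_doc_rank = {"d1": 10.0, "d2": 2.0}

term_scores = term_rank(toy_counts, toy_doc_rank)
for term, score in sorted(term_scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

With this normalization each document distributes exactly its own OntoRank among its terms, so terms used in highly ranked documents score higher than equally popular terms in obscure ones.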
Class-Property Bonds
Class Definition
• rdfs:subClassOf – foaf:Agent
• rdfs:label – “Person”
Class-Property Bond (introduced by instances)
• foaf:name
• dc:title
Class-Property Bond (introduced by ontology)
• foaf:mbox
• foaf:name
[Figure: RDF graph around foaf:Person spanning SWD1, SWD2 and SWD3. Labels: rdf:type owl:Class, rdf:type, rdfs:comment “a human being”, rdfs:subClassOf, foaf:Agent, rdfs:domain (twice), foaf:mbox, foaf:name, “Tim Finin”, dc:title “Tim’s FOAF File”]
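The two kinds of bond can be derived from triples roughly as sketched below. The triples are a simplified, hypothetical stand-in for the figure: property domains are attached directly to foaf:Person, and the blank node `_:tim` is invented. The helper names are not from Swoogle.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def bonds_from_ontology(triples, cls):
    """Properties that an ontology declares with rdfs:domain cls."""
    return {s for s, p, o in triples if p == RDFS_DOMAIN and o == cls}


def bonds_from_instances(triples, cls):
    """Properties actually used to describe instances of cls."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return {p for s, p, o in triples if s in instances and p != RDF_TYPE}


# Hypothetical triples as (subject, predicate, object).
triples = [
    ("foaf:Person", "rdf:type", "owl:Class"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
    ("_:tim", "rdf:type", "foaf:Person"),
    ("_:tim", "foaf:name", "Tim Finin"),
    ("_:tim", "dc:title", "Tim's FOAF File"),
]

ontology_bonds = bonds_from_ontology(triples, "foaf:Person")
instance_bonds = bonds_from_instances(triples, "foaf:Person")
print("introduced by ontology:", sorted(ontology_bonds))
print("introduced by instances:", sorted(instance_bonds))
```

Counting how often each instance-level bond occurs across many documents would give the "popular properties" mentioned in the DIY ontology engineering steps.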
Applications and use cases
• Supporting Semantic Web developers, e.g.,
  – Ontology designers
  – Vocabulary discovery
  – Who’s using my ontologies or data?
  – Etc.
• Searching specialized collections, e.g.,
  – Proofs in Inference Web
  – Text Meaning Representations of news stories in SemNews
• Supporting SW tools, e.g.,
  – Discovering mappings between ontologies
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle’s crawling:
System/date  Terms     Documents  Individuals  Triples   Bytes
Swoogle2     1.5x10^5  3.5x10^5   7x10^6       5x10^7    7x10^9
Swoogle3     2x10^5    7x10^5     1.5x10^7     7.5x10^7  1x10^10
2005         2.5x10^5  5x10^6     5x10^7       5x10^8    5x10^10
2008         5x10^5    5x10^7     5x10^8       5x10^9    5x10^11
We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
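A little arithmetic on the rows labeled 2005 and 2008 shows what the estimate implies. This is only a back-of-the-envelope sketch; the helper name `annual_growth` is an assumption, and the figures are copied from the table.

```python
# Values copied from the 2005 and 2008 rows of the estimate table.
rows = {
    "2005": {"documents": 5e6, "triples": 5e8, "bytes": 5e10},
    "2008": {"documents": 5e7, "triples": 5e9, "bytes": 5e11},
}


def annual_growth(start, end, years):
    """Constant yearly growth factor that turns start into end."""
    return (end / start) ** (1.0 / years)


triples_per_doc = rows["2005"]["triples"] / rows["2005"]["documents"]
bytes_per_triple = rows["2005"]["bytes"] / rows["2005"]["triples"]
growth = annual_growth(rows["2005"]["triples"], rows["2008"]["triples"], 3)

print(f"triples per document: {triples_per_doc:.0f}")
print(f"bytes per triple: {bytes_per_triple:.0f}")
print(f"implied yearly growth factor: {growth:.2f}")
```

The tenfold increase over three years corresponds to the data roughly doubling each year, with about 100 triples per document throughout.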
How much reasoning?
• SwoogleN (N<=3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C
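As an illustration of the type hierarchy case, the sketch below shows how following rdfs:subClassOf transitively changes the answer to "find all instances of a class". It is not a description of what Swoogle does; the class `ex:Student` and the individuals are hypothetical.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(target, types, subclass_of, reason=True):
    """Individuals typed as target, optionally including its subclasses."""
    found = set()
    for individual, cls in types.items():
        if cls == target:
            found.add(individual)
        elif reason and target in superclasses(cls, subclass_of):
            found.add(individual)
    return found


# Hypothetical class hierarchy and asserted types.
subclass_of = {
    "foaf:Person": ["foaf:Agent"],
    "ex:Student": ["foaf:Person"],
}
types = {
    "_:tim": "foaf:Person",
    "_:amy": "ex:Student",
    "_:bot": "foaf:Agent",
}

without_reasoning = instances_of("foaf:Agent", types, subclass_of, reason=False)
with_reasoning = instances_of("foaf:Agent", types, subclass_of)
print(sorted(without_reasoning))
print(sorted(with_reasoning))
```

Even this small amount of inference requires walking the hierarchy for every candidate, which hints at why reasoning over a whole crawl is expensive.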
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
For more information
http://ebiquity.umbc.edu/
Annotated in OWL