Finding and Ranking Knowledge on the Semantic Web


Page 1: Finding and Ranking Knowledge on the Semantic Web

UMBC, an Honors University in Maryland

Finding and Ranking Knowledge on the Semantic Web

Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari

University of Maryland, Baltimore County

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

Page 2: Finding and Ranking Knowledge on the Semantic Web

This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions

Page 3: Finding and Ranking Knowledge on the Semantic Web

Google has made us smarter

Page 4: Finding and Ranking Knowledge on the Semantic Web

But what about our agents?

[Figure labels: tell, register]

A Google for knowledge on the Semantic Web is needed by people and software agents

Page 7: Finding and Ranking Knowledge on the Semantic Web

Swoogle Architecture

[Figure: Swoogle architecture diagram. Stages: SWD discovery, metadata creation, data analysis, interface. Components: the Web, candidate URLs, Web crawler, SWD reader, SWD cache, SWD metadata, IR analyzer, SWD analyzer, Web server, Web service, agent service.]

Swoogle 2: 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals (4/05)

Swoogle 3: 700K SWDs, 135M triples, 7.7K SWOs (11/05)

Page 8: Finding and Ranking Knowledge on the Semantic Web

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.

Demo1

Page 9: Finding and Ranking Knowledge on the Semantic Web

Digest “Time” Ontology (document view)

Demo2(a)

Page 10: Finding and Ranking Knowledge on the Semantic Web

Digest “Time” Ontology (term view)

Demo2(b)

[Screenshot: term list including TimeZone, before and intAfter]

Page 11: Finding and Ranking Knowledge on the Semantic Web

Find Term “Person”

Demo3

Not capitalized! URIrefs are case sensitive.
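
A minimal sketch of why capitalization matters in a term search: URIrefs are compared character by character, so two URIrefs that differ only in case name different terms. The lowercase variant below is a hypothetical example, not a term taken from the demo.

```python
# FOAF namespace; "person" (lowercase) is a hypothetical variant for illustration.
FOAF = "http://xmlns.com/foaf/0.1/"
person_class = FOAF + "Person"
person_lower = FOAF + "person"


def same_term(a: str, b: str) -> bool:
    """Two URIrefs identify the same term only if they match exactly."""
    return a == b
```

A search front end that lowercases the query before lookup would therefore conflate distinct terms.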

Page 12: Finding and Ranking Knowledge on the Semantic Web

Digest Term “Person”

Demo4

167 different properties

562 different properties

Page 13: Finding and Ranking Knowledge on the Semantic Web

Demo5(a) Swoogle Today

Page 14: Finding and Ranking Knowledge on the Semantic Web

Demo5(b) Swoogle Statistics

[Chart labels: FOAF, Trustix, W3C, Stanford]

Page 15: Finding and Ranking Knowledge on the Semantic Web

Swoogle’s Triple Store lets you shop

And check out your triples into any of several reasoners

Page 16: Finding and Ranking Knowledge on the Semantic Web

Summary

Swoogle (Mar 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web interface

Swoogle2 (Sep 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart

Swoogle3 (July 2005)
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals

Page 18: Finding and Ranking Knowledge on the Semantic Web

The Semantic Web Onion

[Figure: the layers of the Semantic Web Onion]
• Universal RDF graph: the “Semantic Web” (about 10M documents)
• RDF document: physically hosts knowledge (about 100 triples per SWD on average)
• Class-instance: triples modifying the same subject
• Molecule: finest lossless set of triples
• Triple: atomic knowledge block
• Resource and literal

Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
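
As a rough illustration of two of these layers, the sketch below treats a triple as a (subject, predicate, object) tuple and builds the class-instance layer by collecting the triples that modify the same subject. The sample triples and the `ex:` identifiers are made up for the example.

```python
from collections import defaultdict

# Illustrative triples; the ex: names are hypothetical.
triples = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:doc", "dc:title", "Tim's FOAF File"),
]


def group_by_subject(triples):
    """Collect the triples that modify the same subject."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        groups[subject].append((predicate, obj))
    return dict(groups)


instances = group_by_subject(triples)
```

Molecules need more care than this grouping, since they depend on how blank nodes tie triples together; that step is not sketched here.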

Page 19: Finding and Ranking Knowledge on the Semantic Web

Semantic Web Navigation Model

[Figure: navigation paths among the Web, SWDs, SWOs, SWTs, and RDF graph resources and literals, entered through Term Search and Document Search. Link types, numbered 1 to 7 in the diagram, include: uses, populates, defines, officialOnto, isDefinedBy, owl:imports, rdfs:seeAlso, rdfs:isDefinedBy, isUsedBy, isPopulatedBy, rdfs:subClassOf, sameNamespace, sameLocalname, extends, class-property bond.]

Navigating the HTML web is simple; there’s just one kind of link. The SW has more kinds of links and hence more navigation paths.

Page 20: Finding and Ranking Knowledge on the Semantic Web

Semantic Web Navigation Model

[Figure: the same navigation model diagram as on the previous slide, with link types numbered 1 to 7.]

Relations in 1 and 3 and parts of 4 require a global view to discover

Page 22: Finding and Ranking Knowledge on the Semantic Web

Rank has its privilege
• Google introduced a new approach to ranking query results using a simple “popularity” metric.
  – It was a big improvement!
• Swoogle ranks its query results also
  – When searching for an ontology, class or property, wouldn’t one want to see the most used ones first?
• Ranking SW content requires different algorithms for different kinds of SW objects
  – For SWDs, SWTs, individuals, “assertions”, molecules, etc.

Page 23: Finding and Ranking Knowledge on the Semantic Web

Google’s PageRank
• A page’s rank is a function of how many links point to it and the rank of the pages hosting those links.
• The “random surfer” model provides the intuition:
  (1) Jump to a random page
  (2) Select and follow a random link on the page and repeat until ‘bored’
  (3) If bored, go to (1)
• Pages are ranked by the relative frequency with which they are visited.

[Flowchart: jump to a random page → follow a random link → bored? If no, follow another random link; if yes, jump to a random page.]
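
The random surfer can be sketched as a short power iteration. This is a generic textbook formulation, not code from Swoogle or Google; the `damping` value of 0.85 (the chance the surfer is not yet bored) and the function name are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # A bored surfer jumps to a page chosen uniformly at random.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # A page without links: the surfer can only jump.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` highest, because both other pages link to it, and `b` lowest, because nothing links to it.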

Page 24: Finding and Ranking Knowledge on the Semantic Web

Ranking Semantic Web Documents
• Target: a pure SW dataset
  – Nodes: a collection of online SWDs (330K SWDs, 1.5% are labeled as ontologies)
  – Links: in addition to hyperlinks, term-level relations are generalized into TM, EX, IM.
• Rational surfer model (extension of weighted PageRank)
  – Semantic content (term-level relations) is encoded into links
  – The rank of a node is iteratively spread via links
  – The weight/capacity of a link varies according to link semantics
  – Weight is propagated to imported ontologies
• Evaluation
  – Method: compare OntoRank with PageRank for promoting ontologies, using the same pure SW dataset
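
A minimal sketch of the weighted step of such a surfer, assuming the only change from plain PageRank is that a node's rank is split among its out-links in proportion to a per-type weight. The weights in `LINK_WEIGHTS` are invented for illustration; the slides do not give the values Swoogle uses.

```python
# Hypothetical weights per link type (not taken from the slides).
LINK_WEIGHTS = {"TM": 1.0, "EX": 2.0, "IM": 3.0}


def weighted_rank(links, damping=0.85, iterations=50):
    """links is a list of (source, target, link_type) tuples."""
    nodes = sorted({s for s, _, _ in links} | {t for _, t, _ in links})
    n = len(nodes)
    # Total outgoing weight per node, used to normalize each link's share.
    out_total = {v: 0.0 for v in nodes}
    for source, _, kind in links:
        out_total[source] += LINK_WEIGHTS[kind]
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for source, target, kind in links:
            new[target] += damping * rank[source] * LINK_WEIGHTS[kind] / out_total[source]
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0.0)
        for v in nodes:
            new[v] += damping * dangling / n
        rank = new
    return rank
```

With these weights, a document that imports one ontology and merely uses a term from another passes three times as much rank to the imported one.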

Page 25: Finding and Ranking Knowledge on the Semantic Web

An Example

[Figure: four documents connected by three TM links and one EX link]
• http://www.cs.umbc.edu/~finin/foaf.rdf
• http://xmlns.com/wordnet/1.6/
• http://xmlns.com/foaf/1.0/
• http://www.w3.org/2000/01/rdf-schema

Node labels, in extraction order:
• wPR = 0.2, 100, 3, 300
• OntoRank = 0.2, 100, 103, 403
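
One reading that reproduces these numbers, offered as a sketch under loud assumptions: each document's OntoRank is its own wPR plus the wPR of every ontology that reaches it through links, each counted once, while the instance document foaf.rdf passes nothing on. The extracted slide does not preserve which wPR value belongs to which document, so the assignment below is assumed, as are the link directions.

```python
# wPR values from the slide; the value-to-document assignment is an assumption.
wpr = {"foaf.rdf": 0.2, "foaf": 100.0, "wordnet": 3.0, "rdfs": 300.0}
ontologies = {"foaf", "wordnet", "rdfs"}
# (source, target): the source document references the target (assumed directions).
links = [
    ("foaf.rdf", "foaf"),
    ("foaf", "wordnet"),
    ("foaf", "rdfs"),
    ("wordnet", "rdfs"),
]


def onto_rank(doc):
    """wPR of doc plus the wPR of every ontology that reaches it, counted once."""
    seen, stack = set(), [doc]
    while stack:
        current = stack.pop()
        for source, target in links:
            if target == current and source in ontologies and source not in seen:
                seen.add(source)
                stack.append(source)
    return wpr[doc] + sum(wpr[s] for s in seen)
```

Under these assumptions the RDF Schema document collects 300 + 100 + 3 = 403, and the ontology with wPR 3 collects 3 + 100 = 103, matching the figure's labels.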

Page 26: Finding and Ranking Knowledge on the Semantic Web

Ontology Dictionary
• Motivation
  – One ontology does not always provide all needed vocabulary
  – There could be many scenarios that require assembling terms from multiple ontologies
• DIY ontology engineering
  1. Search for an appropriate class C
  2. Search for popular properties used for modifying instances of class C
  3. Go back to step 1 if more classes are needed

Page 27: Finding and Ranking Knowledge on the Semantic Web

Ranking Semantic Web Terms
• Pr(Term|Doc) can be measured by the normalized value of the product of the term’s
  – Popularity: how many SWDs use the term
  – Frequency: how many times the term is used in the SWD
• SWDs are accessed non-uniformly, as estimated by OntoRank
• TermRank estimates a term’s importance as ∑ Pr(Term|Doc) * OntoRank(Doc), summed over all documents Doc
• Evaluation
  – Compare TermRank with the term’s popularity for the top 10 highest-rated terms, together with an analytical evaluation.
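
The TermRank sum can be sketched as follows. The slide does not say how the popularity-frequency product is normalized; this sketch assumes it is normalized within each document so that Pr(Term|Doc) sums to 1 over the terms of that document. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def term_rank(usage, onto_rank):
    """usage[doc][term] = times the term is used in doc; onto_rank[doc] = rank of doc."""
    # Popularity: number of documents that use each term.
    popularity = defaultdict(int)
    for counts in usage.values():
        for term in counts:
            popularity[term] += 1

    scores = defaultdict(float)
    for doc, counts in usage.items():
        # Unnormalized weight: popularity times frequency in this document.
        weights = {t: popularity[t] * c for t, c in counts.items()}
        total = sum(weights.values())
        if total == 0:
            continue  # the document uses no terms
        for term, weight in weights.items():
            # Pr(Term|Doc) * OntoRank(Doc), accumulated over documents.
            scores[term] += (weight / total) * onto_rank[doc]
    return dict(scores)
```

A term used in many highly ranked documents thus outranks a term used equally often in obscure ones.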

Page 28: Finding and Ranking Knowledge on the Semantic Web

Class-Property Bonds

Class Definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”

Class-Property Bond (introduced by instances)
• foaf:name
• dc:title

Class-Property Bond (introduced by ontology)
• foaf:mbox
• foaf:name

[Figure: an RDF graph around foaf:Person spanning three documents (SWD1, SWD2, SWD3). Node and edge labels: rdf:type, owl:Class, rdfs:comment “a human being”, rdfs:subClassOf, foaf:Agent, rdfs:domain (twice), foaf:mbox, foaf:name “Tim Finin”, dc:title “Tim’s FOAF File”.]
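
A minimal sketch of how the two kinds of bond could be collected from triples, assuming an ontology introduces a bond through rdfs:domain and instance data introduces one whenever a typed individual uses a property. The function name is hypothetical, and no inference over rdfs:subClassOf is attempted.

```python
def class_property_bonds(triples):
    """Return (bonds introduced by ontology, bonds introduced by instances)."""
    # Record the classes each subject is declared to be an instance of.
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)

    by_ontology, by_instances = set(), set()
    for subject, predicate, obj in triples:
        if predicate == "rdfs:domain":
            # (property, rdfs:domain, class) bonds the class to the property.
            by_ontology.add((obj, subject))
        elif predicate != "rdf:type":
            # A typed individual using a property bonds its classes to it.
            for cls in types.get(subject, ()):
                by_instances.add((cls, predicate))
    return by_ontology, by_instances
```

Fed triples resembling the figure (two rdfs:domain statements for foaf:Person, plus an individual of type foaf:Person with a foaf:name and a dc:title), it returns the two bond lists shown on the slide.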

Page 30: Finding and Ranking Knowledge on the Semantic Web

Applications and use cases
• Supporting Semantic Web developers, e.g.,
  – Ontology designers
  – Vocabulary discovery
  – Who’s using my ontologies or data?
  – Etc.
• Searching specialized collections, e.g.,
  – Proofs in Inference Web
  – Text Meaning Representations of news stories in SemNews
• Supporting SW tools, e.g.,
  – Discovering mappings between ontologies

Page 32: Finding and Ranking Knowledge on the Semantic Web

Will it Scale? How?

Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling

System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5     1.5x10^7     7.5x10^7   1x10^10
2005          2.5x10^5   5x10^6     5x10^7       5x10^8     5x10^10
2008          5x10^5     5x10^7     5x10^8       5x10^9     5x10^11

We think Swoogle’s centralized approach can be made to work for the next few years if not longer.

Page 33: Finding and Ranking Knowledge on the Semantic Web

How much reasoning?
• SwoogleN (N<=3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C

Page 34: Finding and Ranking Knowledge on the Semantic Web

Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices

Page 35: Finding and Ranking Knowledge on the Semantic Web

For more information

http://ebiquity.umbc.edu/

Annotated in OWL