NJVR: The NanJing Vocabulary Repository
Gong Cheng, Min Liu, Yuzhong Qu
Nanjing University
Motivation
• Ontology-related research topics: summarization, ranking, matching
• These topics call for a large and representative collection of real-world vocabularies
State of the art
• Top-down efforts
  – Size: small (hundreds)
  – Access: directly (via browsing)
• Bottom-up efforts
  – Size: large (thousands)
  – Access: indirectly (via searching)
• Our goal: large size with direct access
Contribution
• NJVR: A large and freely accessible vocabulary repository
  – Source: An index of 4.1 B RDF triples distributed in 15.9 M RDF documents crawled from 5.8K pay-level domains (PLDs)
  – Constitution:
    • RDF descriptions of 2,996 dereferenceable vocabularies crawled from 261 PLDs
    • Document-level statistical data on their instantiations (e.g. term frequency)
  – Accessibility: Publicly downloadable
Construction of NJVR
1. Crawling
2. Vocabulary identification
3. Vocabulary instantiation
Crawling (2007 to May 2011)
1. Initialization (of the URI pool)
  – Other freely accessible repositories, e.g. pingthesemanticweb.com
  – LOD cloud
  – Search results, e.g. Swoogle, Google
2. URI dereference and document parsing
  – java.net package
  – Jena
3. Pool expansion
  – URIs in parsed documents
  – Submissions from the users of Falcons
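The three crawling steps above can be sketched as a simple pool-driven loop. This is a minimal illustration in Python, not the authors' crawler (which the slides say was built on the java.net package and Jena). The names `crawl`, `fetch_links`, and `max_docs` are hypothetical, and the dereference-and-parse step is passed in as a function so the sketch runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_docs: int = 1000) -> List[str]:
    """Breadth-first crawl over a URI pool.

    seeds       -- step 1: initial URIs (hypothetical stand-in for the seed sources)
    fetch_links -- step 2: dereferences a URI, parses the document, and
                   returns the URIs found in it (assumed helper, not a real API)
    max_docs    -- illustrative safety limit, not from the slides
    """
    pool = deque()
    seen = set()
    for uri in seeds:
        if uri not in seen:
            seen.add(uri)
            pool.append(uri)

    visited = []
    while pool and len(visited) < max_docs:
        uri = pool.popleft()
        try:
            links = list(fetch_links(uri))
        except Exception:
            # Dereferencing or parsing failed; skip this URI.
            continue
        visited.append(uri)
        # Step 3: expand the pool with URIs not seen before.
        for link in links:
            if link not in seen:
                seen.add(link)
                pool.append(link)
    return visited
```

A toy call such as `crawl(["a"], lambda u: {"a": ["b"], "b": []}[u])` visits `a` and then `b`. In a real setting `fetch_links` would wrap an HTTP client and an RDF parser.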
Vocabulary identification
• Bottom-up strategy
  1. Term: a URI that is identified as a class/property in its dereferenced document
  2. Vocabulary: terms in a common namespace are grouped into one vocabulary
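The grouping step can be illustrated with a short Python sketch. The slides do not say how a term's namespace is determined, so the rule used here (cut after the last `#`, otherwise after the last `/`) is an assumption based on a common convention. The function names `namespace_of` and `group_by_namespace` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def namespace_of(uri: str) -> str:
    """Return the namespace part of a term URI.

    Assumption: the namespace ends at the last '#', or at the last '/'
    when the URI has no '#'.
    """
    cut = uri.rfind("#")
    if cut == -1:
        cut = uri.rfind("/")
    return uri[:cut + 1]


def group_by_namespace(terms: Iterable[str]) -> Dict[str, Set[str]]:
    """Group term URIs into vocabularies keyed by their common namespace."""
    vocabularies = defaultdict(set)
    for term in terms:
        vocabularies[namespace_of(term)].add(term)
    return dict(vocabularies)
```

For example, `http://xmlns.com/foaf/0.1/Person` and `http://xmlns.com/foaf/0.1/name` would fall into one vocabulary keyed by `http://xmlns.com/foaf/0.1/`.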
Results
• 455,718 terms
  – 396,023 classes, 59,868 properties (many are in the YAGO namespace)
• 2,996 vocabularies
  – From 261 PLDs (many are from w3.org)
• Instantiation found for
  – 115,707 classes (29.2%), e.g. foaf:Person
  – 25,963 properties (43.4%), e.g. dc:creator
  – 1,874 vocabularies (62.6%)
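Document-level instantiation statistics of the kind reported above could be gathered roughly as follows. This Python sketch assumes a definition the slides do not spell out: a class counts as instantiated when it appears as the object of an `rdf:type` triple, and a property when it appears as a predicate. The names `term_frequencies` and `instantiated_share` and the input layout (document URI mapped to a list of triples) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def term_frequencies(documents: Dict[str, List[Triple]]) -> Dict[str, Counter]:
    """Count, per document, how often each class or property is instantiated."""
    stats = {}
    for doc_uri, triples in documents.items():
        counts = Counter()
        for _subject, predicate, obj in triples:
            if predicate == RDF_TYPE:
                counts[obj] += 1        # class instantiation
            else:
                counts[predicate] += 1  # property instantiation
        stats[doc_uri] = counts
    return stats


def instantiated_share(known_terms: Iterable[str],
                       stats: Dict[str, Counter]) -> float:
    """Fraction of known terms that are instantiated in at least one document."""
    known = set(known_terms)
    used = set()
    for counts in stats.values():
        used.update(counts)
    return len(used & known) / len(known)
```

In this sketch `rdf:type` triples count toward the class only, not toward `rdf:type` as a property; that choice is part of the assumption and could be changed.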
Applications of NJVR
• Vocabulary ranking
• Vocabulary matching
• …
NJVR for vocabulary ranking
• Using NJVR as a test case for vocabulary ranking
Future work
• Removal of low-quality vocabularies from NJVR
• Comparative analysis of NJVR and other repositories
• …
Just use it!
ws.nju.edu.cn/njvr
ws.nju.edu.cn/falcons