![Page 1: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/1.jpg)
Populating Ontologies for the Semantic Web
Alexiei Dingli
![Page 2: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/2.jpg)
What’s the problem?
![Page 3: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/3.jpg)
Towards a solution … (1)
Ask intelligent agents to do the job for us!!
But they don’t understand the WWW!!!
![Page 4: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/4.jpg)
Towards a solution … (2)
But there is another way to achieve this: supply the missing semantic information.
“For the Web to reach its full potential, it must evolve into a Semantic Web, providing a universally accessible platform that allows data to be shared and processed by automated tools as well as by people.”
(W3C Web Guru)
Creating the Semantic Web!!
![Page 5: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/5.jpg)
Towards a solution … (3)
Why do many believe this solution will fail?
It requires lots of time and effort
It needs lots of people willing to do it
Not everyone can do it
![Page 6: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/6.jpg)
Our approaches
• Active learning to reduce annotation burden
  • Supervised learning
  • Adaptive IE
  • The Melita methodology
• Automatic annotation of large repositories
  • Largely unsupervised
  • Armadillo
![Page 7: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/7.jpg)
Adaptive IE
What is AIE?
• Performs the tasks of traditional IE
• Exploits the power of Machine Learning in order to adapt to
  • complex domains having large amounts of domain-dependent data
  • different sub-language features
  • different text genres
• Treats the usability and accessibility of the system as important
![Page 8: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/8.jpg)
Amilcare
• Tool for adaptive IE from Web-related texts
• Specifically designed for document annotation
• Based on the (LP)² algorithm
  • Covering algorithm based on Lazy NLP
  • Trains with a limited number of examples
  • Effective on different text types
    • free texts
    • semi-structured texts
    • structured texts
• Uses Gate and Annie for preprocessing
![Page 9: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/9.jpg)
CMU: detailed results

| Slot | (LP)² | BWI | HMM | SRV | Rapier | Whisk |
|---|---|---|---|---|---|---|
| speaker | 77.6 | 67.7 | 76.6 | 56.3 | 53.0 | 18.3 |
| location | 75.0 | 76.7 | 78.6 | 72.3 | 72.7 | 66.4 |
| stime | 99.0 | 99.6 | 98.5 | 98.5 | 93.4 | 92.6 |
| etime | 95.5 | 93.9 | 62.1 | 77.9 | 96.2 | 86.0 |
| All Slots | 86.0 | 83.9 | 82.0 | 77.1 | 77.3 | 64.9 |

1. Best overall accuracy
2. Best result on speaker field
3. No results below 75%
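The three claims on this slide can be checked directly against the table. The snippet below is a small sketch that copies the scores into a dictionary; the key `LP2` stands for the (LP)2 column, and the helper names `best_system` and `lowest_score` are introduced here only for illustration.

```python
# Scores (%) per slot and system, copied from the CMU table on this slide.
# "LP2" is used as an ASCII key for the (LP)2 column.
RESULTS = {
    "speaker":   {"LP2": 77.6, "BWI": 67.7, "HMM": 76.6, "SRV": 56.3, "Rapier": 53.0, "Whisk": 18.3},
    "location":  {"LP2": 75.0, "BWI": 76.7, "HMM": 78.6, "SRV": 72.3, "Rapier": 72.7, "Whisk": 66.4},
    "stime":     {"LP2": 99.0, "BWI": 99.6, "HMM": 98.5, "SRV": 98.5, "Rapier": 93.4, "Whisk": 92.6},
    "etime":     {"LP2": 95.5, "BWI": 93.9, "HMM": 62.1, "SRV": 77.9, "Rapier": 96.2, "Whisk": 86.0},
    "All Slots": {"LP2": 86.0, "BWI": 83.9, "HMM": 82.0, "SRV": 77.1, "Rapier": 77.3, "Whisk": 64.9},
}


def best_system(slot):
    """Return the system with the highest score for the given slot."""
    scores = RESULTS[slot]
    return max(scores, key=scores.get)


def lowest_score(system):
    """Return the lowest score the given system obtains on any row."""
    return min(scores[system] for scores in RESULTS.values())


if __name__ == "__main__":
    print("Best overall:", best_system("All Slots"))
    print("Best on speaker:", best_system("speaker"))
    print("Lowest LP2 score:", lowest_score("LP2"))
```

Note that the third claim holds only for the (LP)2 column: other systems do drop below 75% on individual slots.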
![Page 10: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/10.jpg)
Gate
General Architecture for Text Engineering (http://www.gate.ac.uk)
• Provides a software infrastructure for researchers and developers working in NLP
• Contains
  • Tokeniser
  • Gazetteers
  • Sentence Splitter
  • POS Tagger
  • Semantic Tagger (ANNIE)
  • Orthographic Coreference
  • Pronominal Coreference
  • Multi-lingual support
  • Protégé
  • WEKA
  • many more exist and can be added
![Page 11: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/11.jpg)
Annotation
Current practice of annotation for knowledge identification and extraction
• is time consuming
• needs annotation by experts
• is complex
Reduce the burden of text annotation for Knowledge Management
![Page 12: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/12.jpg)
Different Annotation Systems
SGML, TeX, Xanadu, CoNote, ComMentor, JotBot, Third Voice, Annotate.net, The Annotation Engine, Visual Text, Alembic, Annotea, CritLink, The Gate Annotation Tool, iMarkup, MnM, S-CREAM, Yawas
![Page 13: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/13.jpg)
Melita
• Tool for assisted automatic annotation
• Uses an Adaptive IE engine to learn how to annotate (no rule writing is needed to adapt the system)
• User: annotates document samples
• IE system:
  • Trains while the user annotates
  • Generalizes over seen cases
  • Provides preliminary annotation for new documents
  • Performs smart ordering of documents
• Advantages
  • Annotates trivial or previously seen cases
  • Focuses slow/expensive user activity on unseen cases
  • User mainly validates extracted information
    • Simpler and less error prone
    • Speeds up corpus annotation
  • The system learns how to improve its capabilities
![Page 14: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/14.jpg)
Methodology: Melita Bootstrap Phase
Bare Text
Amilcare Learns in background
User Annotates
![Page 15: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/15.jpg)
Methodology: Melita Checking Phase
Bare Text
Learning in background from missing tags, mistakes
User Annotates
Amilcare Annotates
![Page 16: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/16.jpg)
Methodology: Melita Support Phase
Bare Text
Corrections used to retrain
Amilcare Annotates
User Corrects
![Page 17: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/17.jpg)
Intrusivity
An evolving system is difficult to control
Goal:
• Avoiding unwelcome/unreliable suggestions
• Adapting proactivity to the user’s needs
Method:
• Allow users to tune proactivity
• Monitor user reactions to suggestions
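One way to picture the two methods on this slide is a confidence threshold that the user can set and that also drifts with the user's reactions. The sketch below is only an illustration under stated assumptions: the class `ProactivityController`, the `confidence` score in [0, 1], the step size and the target acceptance rate are all hypothetical and are not taken from Melita.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str
    tag: str
    confidence: float  # assumed: a score in [0, 1] reported by the IE engine


class ProactivityController:
    """Hypothetical controller: shows only suggestions above a tunable threshold."""

    def __init__(self, threshold=0.5, step=0.05, target_acceptance=0.8):
        self.threshold = threshold          # user-tunable proactivity level
        self.step = step                    # how fast the threshold adapts
        self.target_acceptance = target_acceptance
        self.history = []                   # True = accepted, False = rejected

    def filter(self, suggestions):
        """Keep only suggestions confident enough to show to the user."""
        return [s for s in suggestions if s.confidence >= self.threshold]

    def record(self, accepted):
        """Store the user's reaction to one displayed suggestion."""
        self.history.append(bool(accepted))

    def adapt(self):
        """Raise the threshold after many rejections, lower it otherwise."""
        if not self.history:
            return
        rate = sum(self.history) / len(self.history)
        if rate < self.target_acceptance:
            self.threshold = min(1.0, self.threshold + self.step)
        else:
            self.threshold = max(0.0, self.threshold - self.step)
        self.history.clear()
```

A user who rejects most suggestions therefore sees fewer of them over time, while a user who accepts them sees more.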
![Page 18: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/18.jpg)
Smart ordering of Documents
Bare Text
Tries to annotate all the documents and selects the document with partial annotations
Learns annotations
User Annotates
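The selection step can be sketched as a ranking over the unannotated documents. The code below is a toy illustration, not Melita's actual strategy: it assumes the learner is just a set of known fillers per tag, defines "partial annotation" as finding some but not all tags in a document, and uses hypothetical names (`coverage`, `pick_next`).

```python
def coverage(doc, known_fillers):
    """Fraction of tags for which at least one known filler occurs in doc."""
    if not known_fillers:
        return 0.0
    found = sum(
        1 for fillers in known_fillers.values()
        if any(filler in doc for filler in fillers)
    )
    return found / len(known_fillers)


def pick_next(docs, known_fillers):
    """Prefer a partially annotated document as the next one for the user."""
    def priority(doc):
        c = coverage(doc, known_fillers)
        is_partial = 0.0 < c < 1.0
        # Partial documents sort first; ties are broken by closeness to 50%.
        return (0 if is_partial else 1, abs(c - 0.5))
    return min(docs, key=priority)


if __name__ == "__main__":
    known = {"speaker": {"Dr Smith"}, "location": {"Room 101"}}
    docs = [
        "Dr Smith talks in Room 101",      # fully covered: little left to learn
        "Dr Smith talks in the main hall", # partially covered
        "Lunch is at noon",                # nothing recognised
    ]
    print(pick_next(docs, known))
```

The intuition is that a partially annotated document contains both familiar cases, which the system can pre-annotate, and unseen cases, which are the ones worth the user's time.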
![Page 19: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/19.jpg)
Methodology: Melita
Ontology defining concepts
Control Panel
Document Panel
![Page 20: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/20.jpg)
Results
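| Tag | Texts needed for training | Prec | Rec |
|---|---|---|---|
| stime | 20 | 84 | 63 |
| etime | 20 | 96 | 72 |
| location | 30 | 82 | 61 |
| speaker | 100 | 75 | 70 |

[Figure: Location. x-axis: training examples (0 to 150); y-axis: 0 to 100; series: original order, selected order; marked values: 30 and 60]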
![Page 21: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/21.jpg)
Future Work
• Research better ways of annotating concepts in documents
• Optimise document ordering to maximise the discovery of new tags
• Allow users to edit the rules
• Learn to discover relationships!!
• Not only suggest but also correct user annotations!!
![Page 22: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/22.jpg)
Annotation for the Semantic Web
Semantic Web requires document annotation
Current approaches
• Manual (e.g. Ontomat) or semi-automatic (MnM, S-CREAM, Melita)
BUT: manual/semi-automatic annotation of
• Large, diverse repositories
• Containing different and sparse information
is unfeasible
• E.g. a Web site (So: 1,600 pages)
![Page 23: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/23.jpg)
Redundancy
Information on the Web (or in large repositories) is redundant
Information is repeated in different superficial formats
• Databases/ontologies
• Structured pages (e.g. produced by databases)
• Largely structured pages (bibliography pages)
• Unstructured pages (free texts)
![Page 24: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/24.jpg)
Our Proposal
• Largely unsupervised annotation of documents
  • Based on Adaptive Information Extraction
  • Bootstrapped using redundancy of information
• Method
  • Use the structured information (easier to extract) to bootstrap learning on less structured sources (more difficult to extract)
![Page 25: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/25.jpg)
Example: Extracting Bibliographies
Mines web sites to extract bibliographies from personal pages
Tasks:
• Finding people’s names
• Finding home pages
• Finding personal bibliography pages
• Extracting bibliographic references
Sources:
• NE Recognition (Gate’s Annie)
• Citeseer/Unitrier (largely incomplete bibliographies)
• Google
• Homepagesearch
![Page 26: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/26.jpg)
AKT Reference Ontology
• Developed by the AKT partners
• Represents the knowledge used in the CS AKTive Portal testbed
• Consists of several sub-ontologies
• Available in several flavours …
  • DAML+OIL
  • OWL
• Has 9,000,000 RDF triples!!
• Available at
  • Ontology: http://www.aktors.org/publications/ontology/
  • RDF triples: http://triplestore.aktors.org/
![Page 27: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/27.jpg)
Mining Web sites (1)
• Mines the site looking for people’s names
• Uses
  • Generic patterns (NER)
  • Citeseer for likely bigrams
• Looks for structured lists of names
• Annotates known names
• Trains on annotations to discover the HTML structure of the page
• Recovers all names and hyperlinks
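The last three steps (annotate known names, learn the HTML structure, recover all names and hyperlinks) can be illustrated with a very small wrapper learner. This is a sketch under simplifying assumptions, not Armadillo's or Amilcare's algorithm: the "structure" learned here is just the path of enclosing tags, the page is made up, and the names `PathCollector` and `extract_names` are hypothetical.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Records each text node with its tag path and enclosing link target."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.href = None  # href of the <a> element we are inside, if any
        self.nodes = []   # (tag path, text, href) for every text node

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed children).
            while self.stack.pop() != tag:
                pass
        if tag == "a":
            self.href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text, self.href))


def extract_names(html, known_names):
    """Learn tag paths from known names, then return all text on those paths."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    learned = {path for path, text, _ in collector.nodes if text in known_names}
    return [(text, href) for path, text, href in collector.nodes if path in learned]


# A made-up staff page used only for illustration.
PAGE = """<html><body><h1>Staff</h1><ul>
<li><a href="/~ada">Ada Lovelace</a></li>
<li><a href="/~alan">Alan Turing</a></li>
<li><a href="/~grace">Grace Hopper</a></li>
</ul><p>Contact the <a href="/office">Department Office</a></p></body></html>"""

if __name__ == "__main__":
    # One name known from an external source is enough to recover the list.
    for name, link in extract_names(PAGE, {"Alan Turing"}):
        print(name, link)
```

A single seed name found in a structured list is enough to recover the other entries and their links, which is where the redundancy of information pays off.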
![Page 28: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/28.jpg)
Experimental Results (1): People
Discovering who works in the department using Information Integration
• Total present in site: 129
• Using generic patterns + online repositories: 48 correct, 3 wrong
• Precision: 48 / 51 = 94%
• Recall: 48 / 129 = 37%
• F-measure: 51%
• Errors: A. Schriffin, Eugenio Moggi, Peter Gray
![Page 29: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/29.jpg)
Experimental Results (2): People
Using Information Extraction
• Total present in site: 129
• 96 correct, 9 wrong
• Precision: 96 / 105 = 91%
• Recall: 96 / 129 = 74%
• F-measure: 87%
• Errors: Speech and Hearing, European Network, Department Of, Position Paper, The Network, To System
![Page 30: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/30.jpg)
Mining Web sites (2)
• Annotates known papers
• Trains on annotations to discover the HTML structure
• Recovers co-authoring information
![Page 31: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/31.jpg)
Experimental Results (1): Papers
Discovering publications in the department using Information Integration
• Total present in site: 320
• Using generic patterns + online repositories: 151 correct, 1 wrong
• Precision: 151 / 152 = 99%
• Recall: 151 / 320 = 47%
• F-measure: 64%
• Errors: garbage in database!!

@misc{ computer-mining,
  author = "Department Of Computer",
  title = "Mining Web Sites Using Adaptive Information Extraction Alexiei Dingli and Fabio Ciravegna and David Guthrie and Yorick Wilks",
  url = "citeseer.nj.nec.com/582939.html" }
![Page 32: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/32.jpg)
Experimental Results (2): Papers
Using Information Extraction
• Total present in site: 320
• 214 correct, 3 wrong
• Precision: 214 / 217 = 99%
• Recall: 214 / 320 = 67%
• F-measure: 80%
• Errors
  • Wrong boundaries in detection of paper names!
  • Names of workshops mistaken as paper names!
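The figures for papers can be reproduced from the raw counts. The sketch below assumes that the F-measure on these slides is the balanced F1 (the harmonic mean of precision and recall); the slides do not say which weighting was used, and the function name is introduced here for illustration.

```python
def precision_recall_f1(correct, wrong, total_present):
    """Compute precision, recall and balanced F1 from extraction counts."""
    extracted = correct + wrong            # everything the system returned
    precision = correct / extracted
    recall = correct / total_present
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Counts for papers, taken from the two "Papers" result slides.
    runs = {
        "information integration": (151, 1, 320),
        "information extraction": (214, 3, 320),
    }
    for label, (correct, wrong, total) in runs.items():
        p, r, f = precision_recall_f1(correct, wrong, total)
        print(f"{label}: P={p:.0%} R={r:.0%} F1={f:.0%}")
```

The jump in recall between the two runs, with precision almost unchanged, is what drives the higher F-measure of the Information Extraction step.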
![Page 33: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/33.jpg)
User Role
Providing …
• A URL
• List of services
  • Already wrapped (e.g. Google is in default library)
  • Train wrappers using examples
• Examples of fillers (e.g. project names)
In case …
• Correcting intermediate results
• Reactivating Armadillo when paused
![Page 34: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/34.jpg)
Armadillo
• Library of known services (e.g. Google, Citeseer)
• Tools for training learners for other structured sources
• Tools for bootstrapping learning
  • From un/structured sources
  • No user annotation
  • Multi-strategy acquisition of information using redundancy
• User-driven revision of results
  • With re-learning after user correction
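Using redundancy to combine several acquisition strategies can be pictured as counting how many independent sources support each candidate. The rule below (accept a candidate backed by at least two sources, leave the rest for user revision) is an assumption made for illustration; the slides do not describe how Armadillo actually weighs evidence, and the source names are hypothetical.

```python
from collections import defaultdict


def integrate(candidates_by_source, min_sources=2):
    """Split candidates into accepted and pending by number of supporting sources."""
    support = defaultdict(set)
    for source, candidates in candidates_by_source.items():
        for candidate in candidates:
            support[candidate].add(source)
    accepted = sorted(c for c, s in support.items() if len(s) >= min_sources)
    pending = sorted(c for c, s in support.items() if len(s) < min_sources)
    return accepted, pending


if __name__ == "__main__":
    # Hypothetical output of three strategies run over the same site.
    found = {
        "generic_patterns": {"Fabio Ciravegna", "Department Of Computer"},
        "citeseer": {"Fabio Ciravegna", "Yorick Wilks"},
        "page_wrapper": {"Yorick Wilks"},
    }
    accepted, pending = integrate(found)
    print("accepted:", accepted)
    print("left for user revision:", pending)
```

A spurious candidate produced by a single strategy is then held back for the user to correct.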
![Page 35: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/35.jpg)
Rationale
Armadillo learns how to extract information
• From large repositories
• By integrating information from diverse and distributed resources
Use:
• Ontology population
• Information highlighting
• Document enrichment
• Enhancing user experience
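For ontology population, extracted facts end up as RDF triples. The sketch below serialises one paper and its authors as N-Triples lines. The namespace `http://example.org/` and the property names (`schema#title`, `schema#name`, `schema#hasAuthor`) are placeholders chosen for this example and are not terms of the AKT Reference Ontology.

```python
import re

BASE = "http://example.org/"  # placeholder namespace, not the AKT ontology


def slug(text):
    """Turn a name or title into a URL-friendly identifier."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def paper_to_triples(title, authors):
    """Return N-Triples lines describing a paper and its authors."""
    paper = f"<{BASE}paper/{slug(title)}>"
    lines = [f"{paper} <{BASE}schema#title> {literal(title)} ."]
    for name in authors:
        person = f"<{BASE}person/{slug(name)}>"
        lines.append(f"{person} <{BASE}schema#name> {literal(name)} .")
        lines.append(f"{paper} <{BASE}schema#hasAuthor> {person} .")
    return lines


if __name__ == "__main__":
    triples = paper_to_triples(
        "Mining Web Sites Using Adaptive Information Extraction",
        ["Alexiei Dingli", "Fabio Ciravegna", "David Guthrie", "Yorick Wilks"],
    )
    print("\n".join(triples))
```

In a real deployment the subjects and properties would come from the target ontology, so that the new triples link up with existing data.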
![Page 36: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/36.jpg)
Data Navigation (1)
![Page 37: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/37.jpg)
Data Navigation (2)
![Page 38: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/38.jpg)
Data Navigation (3)
![Page 39: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/39.jpg)
What’s so new about Armadillo?
In other systems …
• User-defined examples are used
• Generic patterns are used that work independently of the site
In our system …
• We also make use of generic patterns & some user-defined examples
• We learn page-specific patterns
• And we integrate information from different sources
![Page 40: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/40.jpg)
IE for SW: The Vision
Automatic annotation services
• For a specific ontology
• Constantly re-indexing/re-annotating documents
• Semantic search engine
Effects:
• No annotation in the document
  • As today’s indexes are not stored in the documents
• No legacy with the past
  • Annotation with the latest version of the ontology
  • Multiple annotations for a single document
• Simplifies maintenance
  • Page changed but not re-annotated
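Keeping annotations outside the document, the way a search engine keeps its index, can be sketched as a small store keyed by URL and ontology version. The class and method names below are hypothetical, and the content hash is just one possible way to detect the "page changed but not re-annotated" case.

```python
import hashlib
from dataclasses import dataclass


def digest(text):
    """Fingerprint of the page content at annotation time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class AnnotationRecord:
    content_hash: str
    annotations: list  # e.g. (tag, filler) pairs


class AnnotationIndex:
    """Hypothetical stand-off store: annotations live here, not in the page."""

    def __init__(self):
        self._records = {}  # (url, ontology_version) -> AnnotationRecord

    def annotate(self, url, text, ontology_version, annotations):
        """Store (or replace) the annotations for one page and ontology version."""
        self._records[(url, ontology_version)] = AnnotationRecord(
            digest(text), list(annotations)
        )

    def lookup(self, url, ontology_version):
        """Return stored annotations, or an empty list if there are none."""
        record = self._records.get((url, ontology_version))
        return list(record.annotations) if record else []

    def is_stale(self, url, text, ontology_version):
        """True if the page was never annotated or has changed since."""
        record = self._records.get((url, ontology_version))
        return record is None or record.content_hash != digest(text)
```

Because the key includes the ontology version, the same page can carry several independent sets of annotations, and a re-annotation service only needs to revisit pages for which `is_stale` returns true.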
![Page 41: Populating Ontologies for the Semantic Web Alexiei Dingli.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea15503460f94ba455a/html5/thumbnails/41.jpg)
Links
• Melita: http://nlp.shef.ac.uk/melita/
• Armadillo: http://nlp.shef.ac.uk/armadillo/
• Amilcare: http://nlp.shef.ac.uk/amilcare/
• Gate: http://www.gate.ac.uk
• AKT Reference Ontology: http://www.aktors.org/publications/ontology/
• AKT 3Store: http://triplestore.aktors.org/
• More than 40 semantic web technologies: http://www.aktors.org/technologies/
  • Most of them can be freely downloaded
  • They range from IE tools, semantic portals, annotation tools, semantic web services, dialogue systems, etc.