University of Rome Tor Vergata
VocBench
A Web Application for
Collaborative Development of Multilingual Thesauri
Armando Stellato+, Sachit Rajbhandari*, Andrea Turbati+, Manuel Fiorelli+
Caterina Caracciolo*, Tiziano Lorenzetti+, Johannes Keizer*, Maria Teresa Pazienza+
+ART Group, Dept of Enterprise Engineering, University of Rome Tor Vergata, Via del Politecnico 1, 00133 Rome, Italy
*Food and Agricultural Organization of the United Nations (FAO), Viale delle Terme di Caracalla, 00153 Rome, Italy
Contacts: {surname}@info.uniroma2.it {name.surname}@fao.org
Portoroz, Slovenia, 31st May - 4th June 2015
Why was it built?
AGROVOC (big agriculture vocabulary developed by FAO)
– >32 000 concepts in up to 22 languages
– A global group of terminologists.
– No tool to support their work
– No existing tool that met all of FAO’s needs
03/06/2015 12th ESWC, Portoroz, Slovenia 2
V1.0 – 2010
• Google Web Toolkit (for the Web Application)
• Lucene (for label indexing & free-text search)
• Protégé API
– DB backend
• (later) OWLART API
• MySQL
• Custom model for thesaurus representation
[Figure: V1.0 architecture stack: GWT / Presentation; Business logic; Protégé 3.4, OWLART API; MySQL]
V1.x Problems
• Could not support other triple stores (Glued to Protégé API)
• Custom representation model
• No support for emerging standards, e.g. SKOS
• I/O
– No import
– Complicated export
• No support for alignments
– AGROVOC aligned to a dozen other vocabularies
• No SPARQL interface
Towards VB2.0…
Many of the VB1.x limitations derived from the absence of a true RDF backend
• not just a connection to an RDF triple store
• but a proper abstraction layer providing high-level functionalities for ontology/thesaurus management
Driving lines for VB2.0
• A completely rebuilt backing framework for the service and data layers, based on an already existing open source project:
Semantic Turkey [1]
– Based on OSGi (Open Services Gateway initiative)
– Open connectivity to the most notable RDF middleware and triple store technologies (Sesame2, OWLIM, AllegroGraph, Jena (not maintained))
– Native support for SKOS and SKOS-XL over RDF (no more conversions from internal legacy models), in addition to OWL
• major reworking
– all changes under-the-hood, and leaving user experience almost unchanged.
– New features added in the following versions
[1] Maria Teresa Pazienza, Noemi Scarpato, Armando Stellato and Andrea Turbati. Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management. Semantic Web Journal, vol. 3, no. 3, 2012. http://semanticturkey.uniroma2.it/
Objectives and Requisites for a V2
R1. Multilingualism
R2. Controlled Collaboration
R3. Data Interoperability and Consistency
R4. Software Interoperability/Extensibility
R5. Scalability
R6. Under-the-hood data access/modification
R7. Ease-of-use for both users and system administrators
…and here we are
R1. Multilingualism
multilingual editing
…and visualization
…and multilingual UI (currently: English, Spanish, Dutch, Thai)
R2. Controlled Collaboration (1/3)
Role-based Access Control
R2. Controlled Collaboration (2/3)
Formal Editorial Workflow
• Following the full life-cycle of concepts/terms, from proposal to deprecation
• Supported by Role-based Access Control
An example of a typical workflow (role <action> → resulting status):
– GUEST <concept-create> → Proposed by guest
– VALIDATOR <validates> → Validated
– PUBLISHER <publishes> → Published
– TERM EDITOR <concept-edit> → Revised
– ADMINISTRATOR <validates> → Published
– ONTOLOGY EDITOR <concept-delete> → Proposed deprecated
– PUBLISHER <validates> → Deprecated
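The workflow example above can be read as a small role-guarded state machine. The following is a minimal Python sketch of that reading, not VocBench's actual implementation: the roles, actions and statuses come from the slide, while the function names and the status each action is allowed to start from are assumptions made for illustration.

```python
# Hypothetical transition table: (role, action, current status) -> new status.
# Roles, actions and statuses are taken from the slide; the permitted
# source status of each step is an assumption.
TRANSITIONS = {
    ("GUEST", "concept-create", None): "Proposed by guest",
    ("VALIDATOR", "validates", "Proposed by guest"): "Validated",
    ("PUBLISHER", "publishes", "Validated"): "Published",
    ("TERM EDITOR", "concept-edit", "Published"): "Revised",
    ("ADMINISTRATOR", "validates", "Revised"): "Published",
    ("ONTOLOGY EDITOR", "concept-delete", "Published"): "Proposed deprecated",
    ("PUBLISHER", "validates", "Proposed deprecated"): "Deprecated",
}


def apply_action(role, action, status):
    """Return the new status, or raise PermissionError if the step is not allowed."""
    try:
        return TRANSITIONS[(role, action, status)]
    except KeyError:
        raise PermissionError(
            f"{role} may not perform {action} on a concept in status {status!r}"
        ) from None


def run_example():
    """Replay the full life cycle shown on the slide, from proposal to deprecation."""
    status = None
    steps = [
        ("GUEST", "concept-create"),
        ("VALIDATOR", "validates"),
        ("PUBLISHER", "publishes"),
        ("TERM EDITOR", "concept-edit"),
        ("ADMINISTRATOR", "validates"),
        ("ONTOLOGY EDITOR", "concept-delete"),
        ("PUBLISHER", "validates"),
    ]
    for role, action in steps:
        status = apply_action(role, action, status)
    return status
```

Keeping the table as data rather than branching code means a new role or status only requires a new entry, which fits the role-based access control the slides describe.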
R2. Controlled Collaboration (3/3)
Recent Changes
• Available through a dedicated module
• or as RSS feeds
Includes both:
– User changes
– Content changes
R3. Data Interoperability and Consistency (1/3)
Formats
• Import/Export in all popular RDF serialization formats
• The concrete availability of the various formats depends, however, on the connected triple store/RDF middleware
Models
• VocBench adopts a SKOS-XL + reified skos:definitions model
• Import of SKOS core data
– Refactoring SKOS → SKOS-XL and skos:definition reification
• Export
– SKOS-XL:
• “All contents” or
• Filtered export based on broader concept/schemes
– SKOS: options for removing/keeping reified labels and definitions
Vocabularies
• Possibility to owl:import any existing vocabulary, from the web or from local files.
• Availability of a caching mirror for previously imported vocabularies
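To make the SKOS to SKOS-XL refactoring on import more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not VocBench's converter: triples are modelled as tuples, literals as (text, language) pairs, and the function name `to_skosxl` and the `label_base` URI prefix are hypothetical. It covers labels only; definition reification would follow the same pattern.

```python
from itertools import count

SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# SKOS core label properties and their SKOS-XL counterparts.
LABEL_MAP = {
    SKOS + "prefLabel": SKOSXL + "prefLabel",
    SKOS + "altLabel": SKOSXL + "altLabel",
    SKOS + "hiddenLabel": SKOSXL + "hiddenLabel",
}


def to_skosxl(triples, label_base="http://example.org/xl_"):
    """Replace plain SKOS label triples with reified skosxl:Label resources.

    Every other triple is copied unchanged. `label_base` is a made-up
    prefix used to mint URIs for the new label resources.
    """
    ids = count(1)
    out = []
    for s, p, o in triples:
        if p in LABEL_MAP:
            label = f"{label_base}{next(ids)}"
            out.append((s, LABEL_MAP[p], label))            # concept -> label resource
            out.append((label, RDF_TYPE, SKOSXL + "Label"))  # type the new resource
            out.append((label, SKOSXL + "literalForm", o))   # keep the literal
        else:
            out.append((s, p, o))
    return out
```

Once each label is a resource of its own, metadata such as status or editorial notes can be attached to it, which is what makes the SKOS-XL model attractive for editorial workflows.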
R3. Data Interoperability and Consistency (2/3)
Integrity / Consistency
• VB features a complex multi-scheme management of thesauri
• Actions creating potential breaks in the structure (e.g. breaking reachability of a concept) are forbidden
• To deal with imported data, Integrity Constraint Validation checks have been included in the platform
– Currently, only dangling concepts have been dealt with
– More to come, already available as services from ST
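As an illustration of what a dangling-concept check might look like, the sketch below assumes one plausible definition: a concept belonging to a scheme is dangling if it is neither a top concept of that scheme nor has a broader concept in the same scheme. The definition, the function name and the dictionary-based data model are assumptions, not the Semantic Turkey implementation.

```python
def dangling_concepts(in_scheme, top_concepts, broader, scheme):
    """Return the concepts of `scheme` that are cut off from its hierarchy.

    in_scheme:    dict concept -> set of schemes (skos:inScheme)
    top_concepts: dict scheme -> set of concepts (skos:hasTopConcept)
    broader:      dict concept -> set of broader concepts (skos:broader)
    """
    members = {c for c, schemes in in_scheme.items() if scheme in schemes}
    tops = top_concepts.get(scheme, set())
    dangling = set()
    for concept in members:
        if concept in tops:
            continue  # top concepts are reachable by definition
        # Only a broader concept that is itself in the scheme counts.
        if not any(b in members for b in broader.get(concept, ())):
            dangling.add(concept)
    return dangling
```

Such a check is mostly useful after an import, since the slides state that interactive edits which would break reachability are forbidden in the first place.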
R3. Data Interoperability and Consistency (3/3)
Alignment
R4. Software
Interoperability/Extensibility
Triple Store Agnostic
• OWLART API provides:
– a very thin layer over existing middleware (e.g. Sesame, Jena)
– High-level “vocabulary layer” for OWL, SKOS, SKOS-XL
• What triple stores do we currently support and which connectors are actively maintained?
– Sesame2 (standard internal triple stores, both in-memory and native)
– GraphDB/OWLIM (through Sesame remote connection, and an optional parameter expressly dedicated to cover the different management of graphs wrt Sesame)
– Other partners have experimented with other triple stores
https://art-uniroma2.atlassian.net/wiki/display/ST/Accessing+Various+Triplestores
– Past experiments with Allegrograph and Jena Middleware
• For GraphDB/OWLIM SE, we exploit its free-text indexing capabilities
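The idea of a thin raw-access layer with a vocabulary layer on top can be sketched as follows. This is a Python illustration of the layering only; OWLART itself is a Java API, and every class and method name here (`TripleStore`, `InMemoryStore`, `SkosLayer`) is hypothetical.

```python
from abc import ABC, abstractmethod


class TripleStore(ABC):
    """Raw triple access; one subclass per vendor connector (hypothetical)."""

    @abstractmethod
    def add(self, s, p, o):
        """Store one triple."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Return the triples matching the pattern; None is a wildcard."""


class InMemoryStore(TripleStore):
    """Toy connector standing in for a real triple store."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        return [
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


class SkosLayer:
    """High-level vocabulary layer written only against TripleStore."""

    BROADER = "skos:broader"

    def __init__(self, store):
        self.store = store

    def add_broader(self, narrower, broader):
        self.store.add(narrower, self.BROADER, broader)

    def narrower_of(self, concept):
        return sorted(s for s, _, _ in self.store.match(p=self.BROADER, o=concept))
```

Because `SkosLayer` depends only on the abstract interface, switching the backing store means supplying another `TripleStore` subclass, with no change to the vocabulary-level code.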
[Figure: layered stack. GWT/Presentation; Business logic; Semantic Turkey; OWLART API (high-level data access, raw triple access); vendor data access layer; vendor triple store]
R5. Scalability
Performance
• Information is provided to the frontend incrementally wherever possible (e.g., each level of the concept hierarchy is loaded as nodes are expanded).
• Interfaces revert to limited content and search-filtering for potentially exploding results
Maintenance
• ST offers a meaningful core set of RDF services…
• …however many functionalities (especially in UI) require the composition of several calls.
• Solution: combo of:
– per-service ad-hoc solutions (heavyweight single services realizing specific functionalities)
– general development facilities for the injection of additional information into common API calls (e.g. the rendering of RDF resources is available as an extension point, with different implementations being dynamically injectable into the SPARQL queries of several services).
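The incremental delivery of the concept hierarchy mentioned under Performance can be illustrated with a small lazy-loading sketch. The class name `LazyTree` and the `fetch_children` callback are assumptions; in a real deployment the callback would be a service call returning the narrower concepts of a node.

```python
class LazyTree:
    """Fetches each level of the hierarchy only when a node is expanded."""

    def __init__(self, fetch_children):
        self._fetch = fetch_children  # callable: node -> iterable of child nodes
        self._cache = {}
        self.calls = 0                # number of backend requests issued so far

    def expand(self, node):
        """Return the children of `node`, querying the backend at most once."""
        if node not in self._cache:
            self.calls += 1
            self._cache[node] = list(self._fetch(node))
        return self._cache[node]
```

With this approach the cost of opening a tree view is proportional to the nodes a user actually expands, not to the size of the thesaurus.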
R6. Under-the-hood data access/modification
Embedded SPARQL Editor
Syntax highlight…
…completion…
…and validation
R7. Ease-of-use for both users
and system administrators
Continuous check-on-start life cycle
• VB technically never recognizes itself as installed/deployed
• At each startup it checks that the complete set of prerequisites for a correct start is satisfied.
• Whenever a new VB version is installed, if new features have been introduced, mandatory configuration options added, or the database requires update batches, the system will identify these needs and react accordingly, interacting with the user when necessary
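A check-on-start life cycle of this kind can be sketched as a list of prerequisite checks that is re-evaluated at every startup, with no "installed" flag stored anywhere. The state keys, the check descriptions and the `REQUIRED_DB_VERSION` value below are invented for illustration and do not reflect VocBench's real configuration.

```python
REQUIRED_DB_VERSION = 3  # assumed value, for illustration only

# Each check is (action to take if it fails, predicate over the current state).
CHECKS = [
    ("configure mandatory option 'triple_store_url'",
     lambda state: bool(state.get("triple_store_url"))),
    ("run database update batches",
     lambda state: state.get("db_version", 0) >= REQUIRED_DB_VERSION),
]


def startup_checks(state, checks=CHECKS):
    """Run every prerequisite check and return the list of pending actions.

    An empty list means the application can start normally; otherwise the
    caller would prompt the user or apply the updates.
    """
    return [action for action, is_satisfied in checks if not is_satisfied(state)]
```

A fresh installation and an upgrade then follow the same code path: both are simply states in which some checks are not yet satisfied.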
Statistics Module
Three-layered extensible architecture
• Presentation Layer
– GWT (Google Web Toolkit) Vocbench User Interface (Mozilla apps in the original framework)
• Services Layer
– Enables communication between the client (Vocbench UI) and the ontology persistence layer.
– HTTP-based services accessed through the Ajax paradigm
– OSGi Extensible Servicing System
• Persistence Layer
– Access to ontological knowledge.
– Based on a dedicated ontology API, which can be implemented through use of different technologies.
Vocbench 2.0 (and ST) Architecture
[Figure: VocBench 2.0 architecture. Front end: Google Web Toolkit (GWT), GWT Incubator, Graph Visualization. Back end: Service Wrapper Layer, Gilead, Hibernate Layer, Semantic Turkey / OWLART API, Web services. Storage: Administrative Database (MySQL), Triple Store Middleware. VB “desktop version”: Semantic Turkey for Firefox]
Related Works (1/2)
• PoolParty: http://www.poolparty.biz/ [18]. Web-based Editor for Thesauri using Linked Data
– Support for SKOS (optional add-on for SKOS-XL)
– Use: Commercial license (Evaluated thanks to a free evaluation account for PoolParty Advanced Server version 4.5.1 (rev 5429) )
– Version Tracking is supported, as the system performs access control to some extent.
– An add-on further enables an approval workflow based on the existing role based access control mechanism.
– Editing history is shown both at project level and at entity level.
– Alignment: lookup over LOD, or different projects can be linked together
– Publishes a SPARQL endpoint, dereferenceable URIs, and a wiki with limited editing capabilities.
– Quality criteria: can be enforced interactively (i.e., illegal operations are blocked), or violations are simply recorded in a quality report.
– Backed by Sesame middleware
– Incorrect multiple scheme support (violates non-entailment of scheme containment along concept hierarchies, section 4.6.4 of the SKOS Reference [1])
• TemaTres: http://www.vocabularyserver.com/ Web-based Editor for Controlled Vocabularies
– term-based meta-model, no native support for SKOS
– Use: Free and open-source
– Due to the term-based nature of the model, the export to SKOS is often confusing (e.g. two synonym terms exported as two different concepts)
– Monolingual (though alignments between vocabularies are possible)
– No multiple scheme support (each thesaurus is a scheme)
– Rigid access control mechanism based on user roles (administrator, editor, guest).
– Workflow management: term transition from candidate status to either accepted or rejected. “Accepted” cannot be reverted, even after modifications
– Data quality: metrics and a flexible reporting generator.
– Connectivity: an API and a few plugins (e.g. for publication over different platforms, such as WordPress) are available
Related Works (2/2)
• TopBraid EVN: Web-based Editor for Business Vocabularies http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/
– Support for SKOS, OWL Ontologies and Content tag sets
– Use: Commercial license (we did not carry out an extensive evaluation, as we did not receive the evaluation license we requested)
• SKOSEd: https://code.google.com/p/skoseditor plugin for Protégé 4.x for editing SKOS thesauri
– Support for SKOS
– Use: free to use and open-source. We evaluated version 1.0-alpha (build04) on Protégé 4.1, as 2.0-alpha has a bug related to scheme management
– Desktop tool (no web application)
– Ontology editing: SKOSEd allows interweaving SKOS and OWL constructs (defect: same form for skos:Concept and skos:ConceptScheme)
– Incorrect concept scheme management (same as PoolParty)
– Being an extension of Protégé 4.x, SKOSEd may not be used in conjunction with the collaboration framework developed for Protégé 3.x
• Web Protégé: http://webprotege.stanford.edu [16] Collaborative Web-based Ontology Editor
– No support for SKOS/SKOS-XL (supports OWL/OBO editing)
– Use: local installation or service via public portal. Free to use in both cases
– Collaboration: based on the collaboration plugin for Protégé 3 [17], providing:
• Change tracking
• Inline discussions and notifications.
• Access control mechanism for user groups, based on configurable policies enforced at various granularities.
– Completely configurable user interface
– Available API
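Functional Comparison
• VocBench
– License: GNU GPL v3 (web application), Mozilla Public License (MPL) (Semantic Turkey)
– Free to use: Yes
– Deployment: Web application
– Data Models: SKOS-XL, SKOS through offline scaling tool
– Import/Export: SKOS(-XL), versatile spreadsheet import (through ST Firefox UI)
– Scheme Management: Yes
– Custom Relations: Creation, Import, use
– Reasoner: Depends on triple store
– Data quality: Metrics
– Extendibility / Interoperability: API, shared backend, pluggable
– ACL: Yes
– Workflow Management: Yes
– Collaboration, Content Validation: Change feed, validation
– RDF Middleware: OWL ART API (connectors to others: Sesame2 bundled)
– RDF Backend: provided by Sesame2, or other connectors
– SPARQL Querying: Yes
– Semantic Integration: assisted (browse & search) linking of resources from other projects / manual linking of LOD resources. Extensions for RDF lifting from unstructured content
• PoolParty
– License: Proprietary
– Free to use: No
– Deployment: Web application
– Data Models: SKOS, SKOS-XL add-on
– Import/Export: SKOS(-XL), static spreadsheet import
– Scheme Management: Only top concepts
– Custom Relations: Creation, Import, use
– Reasoner: Depends on triple store
– Data quality: Metrics, validation rules
– Extendibility / Interoperability: REST API
– ACL: Yes
– Workflow Management: Yes (add-on)
– Collaboration, Content Validation: History, versioning, validation
– RDF Middleware: Sesame SAIL API
– RDF Backend: provided by Sesame2
– SPARQL Querying: Yes
– Semantic Integration: Linking, Text Mining & Entity Extraction, Search function
• WebProtégé
– License: Mozilla Public License (MPL)
– Free to use: Yes
– Deployment: Web application
– Data Models: OWL 2, OBO
– Import/Export: OWL
– Scheme Management: Not applicable
– Custom Relations: Creation, Import, use
– Reasoner: No, external reasoning possible
– Data quality: Metrics
– Extendibility / Interoperability: API, shared backend, pluggable
– ACL: Yes
– Workflow Management: No
– Collaboration, Content Validation: Discussion, watching, changes feed
– RDF Middleware: OWL API
– RDF Backend: provided by Protégé 3
– SPARQL Querying: No
– Semantic Integration: linking to BioPortal
• TemaTres
– License: GNU General Public License version 2.0 (GPLv2)
– Free to use: Yes
– Deployment: Web application
– Data Models: Term-based thesaurus organization
– Import/Export: MADS, SKOS-Core, Zthes, others. Import from: SKOS-Core, tabulated or tagged text file
– Scheme Management: One scheme per vocabulary
– Custom Relations: Creation, use
– Reasoner: No
– Data quality: Metrics, Reports
– Extendibility / Interoperability: API
– ACL: Yes; limited
– Workflow Management: Yes; limited
– Collaboration, Content Validation: Limited validation
– RDF Middleware: No RDF middleware, SKOS RDF/XML available only as an export
– RDF Backend: Relational database (MySQL by default)
– SPARQL Querying: Not native, no realtime, can export data to a SPARQL endpoint through ARC2 (RDF library for PHP)
– Semantic Integration: Linking between vocabularies, Entity Extraction (via add-on)
• SKOSEd
– License: GNU Lesser GPL
– Free to use: Yes
– Deployment: Desktop application
– Data Models: SKOS
– Import/Export: SKOS
– Scheme Management: Only top concepts
– Custom Relations: Creation, Import, use
– Reasoner: Depends on available plugins
– Data quality: KB consistency
– Extendibility / Interoperability: Pluggable
– ACL: No
– Workflow Management: No
– Collaboration, Content Validation: No
– RDF Middleware: OWL API (used by Protégé 4)
– RDF Backend: provided by Protégé 4 (OWL API)
– SPARQL Querying: Yes (inherited from Protégé 4)
– Semantic Integration: N/A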
Example 9 (non-entailment)
<A> skos:narrower <B> .
<A> skos:inScheme <MyScheme> .
does not entail
<B> skos:inScheme <MyScheme> .
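The practical consequence of Example 9 is that a tool must not propagate scheme membership down the hierarchy. A minimal sketch, assuming a dictionary-based data model and a hypothetical helper name, of listing the narrower concepts of a node within a scheme using only asserted skos:inScheme statements:

```python
def narrower_in_scheme(concept, scheme, narrower, in_scheme):
    """Narrower concepts of `concept` that are explicitly asserted to be in `scheme`.

    narrower:  dict concept -> set of narrower concepts (skos:narrower)
    in_scheme: dict concept -> set of schemes (skos:inScheme)
    Membership is never inherited from the parent concept.
    """
    return sorted(
        child for child in narrower.get(concept, ())
        if scheme in in_scheme.get(child, ())
    )
```

With the data of Example 9, <B> is not listed under <A> in <MyScheme> until `<B> skos:inScheme <MyScheme>` is stated explicitly.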
User Community and Evaluation (1/2)
Here is a list of relevant organizations* adopting, or close to adopting, VB2.0:
• Food and Agriculture Organization (FAO) > AGROVOC, Biotechnology, Land and Water, FAO Topics
• EU Documentation Office > EUROVOC
• Italian Senate > TESEO
• European Environment Agency (EEA) > GEMET
• Harvard University > Unified Astronomy Thesaurus (UAT)
• EC Parliament Library
• Agence Nationale de la Recherche > Infrastructure nationale AnaEE France
• CABI
• United Nations Convention to Combat Desertification (UNCCD)
• Scottish Government > Gov metadata
• Columbia University > IEDA Thesaurus
* http://aims.fao.org/tools/vocbench/partners
2 VB community mailing lists:
- VB user forum
- VB developer forum
User Community and Evaluation (2/2)
USE Values
                 Usefulness   Ease of use   Ease of learning   Satisfaction
Global           5,34         4,49          5,11               4,93
Experienced      5,58         4,66          5,18               5,02
Inexperienced    4,97         4,19          5,00               4,79

Feature Evaluation
                                   easy to use   effective   interesting
History                            5,38          5,50        6,33
SPARQL Querying                    4,00          5,40        6,29
Publication Workflow Management    5,50          5,63        6,22
Collaborative Management           5,75          5,88        6,11
Scheme Management                  4,83          5,17        5,57
Role-based Access Control          5,33          5,22        5,40
Reasoning                          4,29          4,43        5,38
Triple Store Connectivity          3,67          4,50        5,00
Online Questionnaire: http://vocbench.uniroma2.it/purl/VocBench-User-Questionnaire_2014-10.zip
USE* questionnaire: http://hcibib.org/perlman/question.cgi?form=USE
values ranging from 1 to 7
collected 11 anonymous responses
Lund, A.M. (2001) Measuring Usability with the USE Questionnaire. STC Usability SIG Newsletter, 8:2.
Why should I "buy" it?
Collaborative Management
– Validation & Publication Workflow (propose, validate, publish, revise, deprecate…)
– Fine-grained user management
• both users and functionalities may be associated in groups
• Functionalities (or groups of them) may be assigned to different users (or groups of them)
– Full editing history (not only concepts, but most of the actions can be subject to validation too)
– RSS Feeds
– Fine-grained metadata and editorial notes: SKOS-XL and reified definitions allow for timestamped status and rich editorial notes
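The fine-grained user management described above, where functionalities or groups of functionalities are assigned to users or groups of users, can be sketched as a simple permission lookup. The function name, the data layout and the sample names are assumptions for illustration, not VocBench's actual access-control model.

```python
def allowed(user, functionality, user_groups, func_groups, grants):
    """True if the user may use the functionality.

    user_groups: dict group name -> set of users
    func_groups: dict group name -> set of functionalities
    grants:      set of (user or user group, functionality or functionality group)
    A grant matches directly or through group membership on either side.
    """
    subjects = {user} | {g for g, members in user_groups.items() if user in members}
    targets = {functionality} | {
        g for g, members in func_groups.items() if functionality in members
    }
    return any((s, t) in grants for s in subjects for t in targets)
```

For example, granting the hypothetical pair ("editors", "concept-ops") lets every member of the editors group use every functionality in the concept-ops group with a single assignment.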
Multilinguality
– Strong support for multi-lingual thesauri management
– Application itself is also multilingual (currently supports English, Dutch and Spanish; more languages coming)
Native RDF support
– Support for different triple stores
– Possibility to SPARQL query/update through a dedicated interface with syntax completion/highlighting
– SKOS-XL management
• If preferred, SKOS-core export through available conversion tools
Large scale thesauri management
– Scalability limited only by the underlying triple store
Extensibility
– OSGi connectable services
Advanced skos:ConceptScheme Management
– SKOS allows for non-trivial management of multiple conceptual schemes, which is fully supported by VB
And, last but not least: Free and Open Source! (http://vocbench.uniroma2.it)
Future works
• A more dynamic framework for content validation
– Trade-off between extensibility/flexibility and the strongly controlled approach
• ICV: more checks (available from the ST engine)
• Overcome extensibility limitations of GWT
• More interaction with Linked Data
– Improved Alignment
– Generation of VoID / LIME descriptions
Contacts
VocBench site: http://vocbench.uniroma2.it/
VocBench pages@FAO: http://aims.fao.org/tools/vocbench-2/
You can also follow VB by registering to:
• AIMS Community Site: http://aims.fao.org/ (you can select the topics you are interested in)
• VocBench Mailing Lists:
– User: http://groups.google.com/group/vocbench-user
– Developer: http://groups.google.com/group/vocbench-developer
• Semantic Turkey Mailing Lists:
– User: http://groups.google.com/group/semanticturkey-user
– Developer: http://groups.google.com/group/semanticturkey-developer
TIME FOR QUESTIONS :-)
…oh, and there’s a demo tomorrow!