Principles for knowledge engineering on the Web
Guus Schreiber
VU University Amsterdam
Computer Science, Web & Media
Overview of this talk
• Semantic Web: the digital heritage case
• Knowledge-engineering principles
• Challenges for Web KE
My journey: knowledge engineering
• design patterns for problem solving
• methodology for knowledge systems
• models of domain knowledge
• ontology engineering
My journey: access to digital heritage
My journey: Web standards
• Web metadata: RDF
• OWL Web Ontology Language
• SKOS model for publishing vocabularies on the Web
SEMANTIC WEB: THE DIGITAL-HERITAGE CASE
The Web: resources and links
[Figure: two resources, each identified by a URL, connected by a Web link]
The Semantic Web: typed resources and links
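[Figure: two typed resources connected by a typed Web link. The painting “Woman with hat” (SFMOMA) is linked through the Dublin Core property “creator” to the ULAN entry for Henri Matisse]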
Vocabulary interoperability: SKOS
Vocabulary representations
• SKOS has been a major success
• Easy to understand and create
• The LCSH publication set an important example
The myth of a unified vocabulary
• In large virtual collections there are always multiple vocabularies
  – In multiple languages
• Every vocabulary has its own perspective
  – You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
  – “Vocabulary alignment”
• It is surprising what you can do with just a few links
Example use of vocabulary alignment
“Tokugawa”
SVCN period Edo
SVCN is local in-house ethnology thesaurus
AAT style/period Edo (Japanese period) Tokugawa
AAT is Getty’s Art & Architecture Thesaurus
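One way to read the example above: a search for “Tokugawa” matches a label of the AAT concept for the Edo period, and a single alignment link then leads to the SVCN concept that the in-house records actually use. The sketch below illustrates that reading under stated assumptions. The identifiers (`svcn:Edo`, `aat:Edo`), the object records, and all function names are hypothetical and are not taken from the actual vocabularies or from the system shown in the talk.

```python
# Hypothetical, hand-made data for illustration only.
# Labels per concept, keyed by made-up identifiers.
LABELS = {
    "svcn:Edo": ["Edo"],
    "aat:Edo": ["Edo (Japanese period)", "Tokugawa"],
}

# A single alignment link between the two vocabularies (assumed exact match).
ALIGNMENTS = [("svcn:Edo", "aat:Edo")]

# Objects annotated with concepts from the in-house thesaurus (invented).
ANNOTATIONS = {
    "object-1": ["svcn:Edo"],
    "object-2": ["svcn:Meiji"],
}


def concepts_for(term):
    """Return concepts that have a label containing the term (case-insensitive)."""
    term = term.lower()
    return {
        concept
        for concept, labels in LABELS.items()
        if any(term in label.lower() for label in labels)
    }


def expand(concepts):
    """Add concepts reachable over one alignment link, in either direction."""
    expanded = set(concepts)
    for a, b in ALIGNMENTS:
        if a in concepts:
            expanded.add(b)
        if b in concepts:
            expanded.add(a)
    return expanded


def search(term):
    """Find objects annotated with a matching or aligned concept."""
    wanted = expand(concepts_for(term))
    return sorted(obj for obj, cs in ANNOTATIONS.items() if wanted & set(cs))


print(search("Tokugawa"))  # ['object-1']
```

Without the alignment link the query finds nothing, because “Tokugawa” is not a label in the in-house thesaurus.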
Enriching metadata with concepts
Learning vocabulary alignments
• Example: learning relations between art styles and artists through NLP of art historic texts
  – “Who are Impressionist painters?”
Semantic search: result clustering based on retrieval path
Research issues
• Information retrieval as graph search
  – more semantics => more paths
  – finding optimal graph patterns
• Vocabulary alignment
• Information extraction
  – recognizing people, locations, …
  – identity resolution
• Multi-lingual resources
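The idea of retrieval as graph search, with results clustered by the path that led to them, can be sketched as a breadth-first search over typed links. This is a minimal illustration, not the algorithm used in the system described here. The triples, the property names, and the function `cluster_by_path` are all invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical typed links (subject, property, object); all names are invented.
TRIPLES = [
    ("painting-1", "creator", "Matisse"),
    ("painting-2", "creator", "Derain"),
    ("Matisse", "style", "Fauvism"),
    ("Derain", "style", "Fauvism"),
    ("painting-3", "subject", "Fauvism"),
]
ARTWORKS = {"painting-1", "painting-2", "painting-3"}


def cluster_by_path(start, max_depth=3):
    """Breadth-first search from a concept; group artworks by property path."""
    # Index links in both directions so the search can follow them backwards.
    neighbours = defaultdict(list)
    for s, p, o in TRIPLES:
        neighbours[s].append((p, o))
        neighbours[o].append(("^" + p, s))  # "^" marks an inverse traversal

    clusters = defaultdict(list)
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node in ARTWORKS:
            clusters[path].append(node)
            continue  # do not search beyond an artwork
        if len(path) == max_depth:
            continue
        for prop, nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (prop,)))
    return dict(clusters)


for path, works in cluster_by_path("Fauvism").items():
    print(" / ".join(path), "->", works)
```

Each cluster corresponds to one kind of connection: works whose subject is the concept, and works whose creator is associated with it. The sketch records only the first (shortest) path to each artwork. Adding more typed links multiplies the possible paths, which is why choosing useful graph patterns becomes a research issue.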
Personalized Rijksmuseum
• Interactive user modeling
• Recommendations of artworks and art topics
Mobile museum tour
KNOWLEDGE ENGINEERING PRINCIPLES
Lessons I learned
Principle 1: Be modest!
• Ontology engineers should refrain from developing their own idiosyncratic ontologies
• Instead, they should make the available rich vocabularies, thesauri and databases available in an interoperable (web) format
• Initially, only add the originally intended semantics
Principle 2: Think large!
"Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing."
Doug Lenat
Principle 3: Develop and use patterns!
• Don’t try to be (too) creative
• Ontology engineering should not be an art but a discipline
• Patterns play a key role in methodology for ontology engineering
• See for example patterns developed by the W3C Semantic Web Best Practices group
http://www.w3.org/2001/sw/BestPractices/
• SKOS can also be considered a pattern
Principle 4: Don’t recreate, but enrich and align
• Techniques:
  – Learning ontology relations/mappings
  – Semantic analysis, e.g. OntoClean
  – Processing of scope notes in thesauri
Principle 5: Beware of ontological over-commitment!
Principle 6: Writing in an ontology language doesn’t make it an ontology!
• An ontology is a vehicle for sharing
• Papers about your own idiosyncratic “university ontology” should be rejected at conferences
• The quality of an ontology does not depend on the number of, for example, OWL constructs used
Principle 7: Required level of formal semantics depends on the domain!
• In our semantic search we use three OWL constructs:
  – owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty
• But cultural heritage is very different from medicine and bioinformatics
  – Don’t over-generalize on requirements for e.g. OWL
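To give a feel for how little machinery these three constructs need, the sketch below computes what each one entails over plain sets of pairs. It is an illustration under stated assumptions, not the search engine’s implementation. The slides do not say which properties are declared transitive or symmetric, so the properties implied by the data (a “located in” relation, a “related to” relation) and every identifier are hypothetical.

```python
def transitive_closure(pairs):
    """Entailments of an owl:TransitiveProperty: (a, b) and (b, c) give (a, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def symmetric_closure(pairs):
    """Entailments of an owl:SymmetricProperty: (a, b) gives (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def same_as_groups(pairs):
    """Map each resource to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The alphabetically smallest identifier becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)
    return {x: find(x) for x in list(parent)}


# Hypothetical statements; identifiers are invented for illustration.
LOCATED_IN = {("ex:Montmartre", "ex:Paris"), ("ex:Paris", "ex:France")}
RELATED_TO = {("ex:StyleA", "ex:StyleB")}
SAME_AS = [("ulan:Matisse", "ex:HenriMatisse"),
           ("ex:HenriMatisse", "dbpedia:Henri_Matisse")]

print(("ex:Montmartre", "ex:France") in transitive_closure(LOCATED_IN))  # True
print(("ex:StyleB", "ex:StyleA") in symmetric_closure(RELATED_TO))       # True
print(same_as_groups(SAME_AS)["ulan:Matisse"])  # dbpedia:Henri_Matisse
```

A domain such as medicine may need far more expressive constructs than this, which is the point of the principle.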
CHALLENGES FOR WEB KE
Challenge: Linked Open Data
Availability of government data: http://data.gov.uk
The fight for “standard” semantics: Schema.org
Challenge: vocabulary alignment methodology
• Multitude of alignment techniques available
  – Direct syntactic match
  – Lexical manipulation
  – Structured, …
• Precision & recall vary
• Large evaluation initiative
  – OAEI http://oaei.ontologymatching.org/
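The trade-off between techniques can be made concrete with a toy comparison of a direct syntactic match and a simple lexical normalisation, scored against a reference alignment. This is a hedged sketch and not an OAEI procedure. The labels, the identifiers, the reference alignment, and the normalisation rules are assumptions made up for the example.

```python
import re


def exact_match(a, b):
    """Direct syntactic match: labels must be identical strings."""
    return a == b


def normalise(label):
    """Simple lexical manipulation: drop qualifiers, case and punctuation."""
    label = re.sub(r"\(.*?\)", "", label)  # remove parenthesised qualifiers
    label = re.sub(r"[^a-z0-9 ]", " ", label.lower())
    return " ".join(label.split())


def lexical_match(a, b):
    return normalise(a) == normalise(b)


def align(source, target, matcher):
    """Return all (source id, target id) pairs whose labels match."""
    return {
        (s, t)
        for s, s_label in source.items()
        for t, t_label in target.items()
        if matcher(s_label, t_label)
    }


def precision_recall(found, reference):
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical vocabularies and a hand-made reference alignment.
SOURCE = {"s1": "Edo", "s2": "Art Nouveau", "s3": "Baroque"}
TARGET = {
    "t1": "Edo (Japanese period)",
    "t2": "art nouveau",
    "t3": "Baroque",
    "t4": "Baroque (music)",
}
REFERENCE = {("s1", "t1"), ("s2", "t2"), ("s3", "t3")}

for name, matcher in [("exact", exact_match), ("lexical", lexical_match)]:
    p, r = precision_recall(align(SOURCE, TARGET, matcher), REFERENCE)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the exact matcher is precise but misses most links, while the lexical matcher finds every reference link at the cost of one false positive (the music sense of “Baroque”).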
Limitations of categorical thinking
• The set theory on which ontology languages are built is inadequate for modelling how people think about categories (Lakoff)
  – Category boundaries are not hard: cf. art styles
  – People think of prototypes; some examples are very prototypical, others less
• We also need to make meta-distinctions explicit
  – organizing class: “furniture”
  – base-level class: “chair”
  – domain-specific: “Windsor chair”
Challenge: new types of search exploiting semantics
Relation search: Picasso, Matisse & Braque
Challenge: combining professional annotations with public “tags”
Challenge: data trust issues
• How can a museum trust annotations of outsiders?
• Need to adapt techniques from closed world to open world
• Ongoing case studies examine reputation assessment, use of probability theories, …
Challenge: event-centred approach => people like narratives
Extracting piracy events from piracy reports & Web sources
Visualising piracy events
Large-scale experimentation!
TOWARDS WEB SCIENCE
We need to study the Web as a phenomenon
• Web dynamics
• Collective intelligence
• Privacy, trust and security
• Linked open data
• Universal access
Web for Social Development
Acknowledgements
• Long list of people
• Projects: MIA, MultimediaN E-Culture, CHOICE, MunCH, CHIP, Agora, PrestoPrime, NoTube, EuropeanaConnect, Poseidon