Drupal and Apache Stanbol
SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES
About me
Gabriel Dragomir
• Drupal developer, trainer and consultant
• Founding member of Drupal Romania Association
The Semantic Web
• Tim Berners-Lee:
“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
What’s the hype?
• Most organizations need to organize, analyze and relate huge amounts of unstructured, scattered textual data
• Examples:
• keyword extraction from content: annotate abstracts
• text categorization: organize big volumes of text based on a thesaurus
• media monitoring of tags: occurrences of a specific keyword on social media channels
Linked data
• Project started in 2007
• Aimed at building the Web of Data by:
• identifying open access data sets
• converting them into RDF vocabularies
• publishing them as open access data sets
Linked data ecosystem
• Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
• Provides a conceptual map of the vocabularies
• Various providers: libraries, governmental actors, NGOs
Linked data ecosystem
• Where to find other data sets?
• http://www.w3.org/2001/sw/wiki/SKOS/Datasets
• Swoogle: http://swoogle.umbc.edu/
• PoolParty: http://vocabulary.semantic-web.at
Linked data at work!
Semantic annotation
• Creates specific metadata that enable new ways to retrieve and aggregate information
• Annotations are made against a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)
• For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies
• The annotations build semantic
Semantic annotation
• Most common uses:
• Named Entity Linking: limited to recognizing entities of type person, organization or place (e.g. OpenCalais)
• Entityhub Linking: annotation based on vocabularies with no limitations of entity types. Requires more natural language processing prior to annotation.
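To make the idea of vocabulary-based annotation concrete, here is a minimal Python sketch. It is a toy label matcher, not how Stanbol links entities: the vocabulary, the example.org URIs and the `annotate` function are all hypothetical, and a real linker would add the language processing mentioned above.

```python
import re

# Hypothetical mini-vocabulary: label -> concept URI (URIs are illustrative only).
VOCABULARY = {
    "linked data": "http://example.org/vocab/LinkedData",
    "drupal": "http://example.org/vocab/Drupal",
    "semantic web": "http://example.org/vocab/SemanticWeb",
}


def annotate(text, vocabulary):
    """Return (start, end, matched text, concept URI) for every label occurrence."""
    annotations = []
    for label, uri in vocabulary.items():
        # Whole-word, case-insensitive match of the vocabulary label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), uri))
    # Sort by position so annotations follow reading order.
    return sorted(annotations)


sample = "Drupal can publish Linked Data for the Semantic Web."
for start, end, surface, uri in annotate(sample, VOCABULARY):
    print(start, end, surface, uri)
```

Because the vocabulary is just a mapping from labels to URIs, any entity type can be annotated, which is the point of the second bullet above.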
Apache Stanbol on the fly
• Here comes Apache Stanbol
• A new approach:
• modular semantic analysis of documents
• processing components can be built for virtually any language
• flexible workflows via semantic annotation chains
• any vocabulary (Linked Data, custom) can be used
Service oriented architecture
• Stanbol is designed to offer service oriented integration
• RESTful web services API returning RDF or JSON/JSON-LD
• Each component exposes an endpoint independently
• OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling
• Remote component management
Implementation
• OSGi layer: Apache Felix and Apache Sling
• Build environment: Apache Maven
• RDF framework: Apache Clerezza
• Triple store and reasoning engine: Apache Jena
• Indexing and semantic search: Apache Solr
• Content analysis/metadata extraction: Apache Tika
• Natural language processing: Apache OpenNLP
Architecture
Components
• Semantic layer:
• Enhancer, EntityHub, ContentHub
• Enhancement engines: internal, 3rd party
• User interfaces
• Knowledge integration (rule sets, reasoners)
• Storage integration
Content enhancement
• Examples:
• retrieve additional metadata for a piece of content
• identify the language of a text
• extract entities (persons, places, organizations)
• create annotations to external sources
• use 3rd party services for named entity recognition
Drupal meets Stanbol
• Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation
• Taxonomy system allows for complex annotation
• Fieldable taxonomy terms allow for storage of complex semantic data
User scenarios
• Semantic indexing via Stanbol (SOLR yard)
• Content enrichment with semantically related information (documents, factual data, images etc.)
• Tag as you type: dynamic annotation of text in editors
How it works
• POST request sends content via REST API
• content is processed by an enhancement chain
• Returns JSON-LD, RDF/XML, RDF/JSON, etc.
• JSON-LD (JavaScript Object Notation for Linked Data): a human-readable and simple linked data transport format
• for best results an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation
• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
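The request flow described above can be sketched in Python with only the standard library. This is a sketch under assumptions, not a reference client: the `/enhancer` and `/enhancer/chain/<name>` paths follow the Stanbol Enhancer documentation as I understand it, while the local base URL, the chain name `default`, the `application/json` Accept value and both function names are assumptions to adjust for a given installation.

```python
import urllib.request


def build_enhancer_request(base_url, text, chain=None, accept="application/json"):
    """Build (but do not send) a POST request for the Stanbol Enhancer."""
    url = base_url.rstrip("/") + "/enhancer"
    if chain:
        # Assumed path layout for addressing a named enhancement chain.
        url += "/chain/" + chain
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # the plain-text content to analyze
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # the serialization we would like back
        },
        method="POST",
    )


def enhance(base_url, text, chain=None, accept="application/json"):
    """Send the content to a running Stanbol instance and return the raw response body."""
    request = build_enhancer_request(base_url, text, chain, accept)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Building the request needs no server; calling enhance() does.
request = build_enhancer_request(
    "http://localhost:8080",  # assumed local Stanbol launcher address
    "Drupal is a content management system.",
    chain="default",
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before pointing the code at a live server. Changing `accept` to `application/rdf+xml` should request RDF/XML instead, assuming the server supports it.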
Drupal integration
Source: blog.iks-project.eu
Drupal distribution: IKS CE
• IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)
• Components:
• Search API Stanbol
• VIE.js - semantic annotation UI
• https://drupal.org/project/iksce
• http://drupal.org/project/vie
• http://drupal.org/project/search_api_stanbol
• https://github.com/fago/stanbol-for-drupal
Search API Stanbol
• enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub.
• data sent as RDF
• data can be mashed up with data from other sources (Managed Sites, Remote Sites)
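As an illustration of "data sent as RDF", the following Python sketch serializes a node-like record as N-Triples. It does not reproduce what the Search API Stanbol module emits: the example.org subject URI, the choice of Dublin Core properties and the `entity_to_ntriples` helper are assumptions made for the example.

```python
def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def entity_to_ntriples(subject_uri, properties):
    """Serialize (predicate, object, kind) tuples; kind is "uri" or "literal"."""
    lines = []
    for predicate_uri, obj, kind in properties:
        rendered = "<" + obj + ">" if kind == "uri" else literal(obj)
        lines.append("<%s> <%s> %s ." % (subject_uri, predicate_uri, rendered))
    return "\n".join(lines)


# Hypothetical Drupal node described with Dublin Core terms.
node_properties = [
    ("http://purl.org/dc/terms/title", "Drupal and Apache Stanbol", "literal"),
    ("http://purl.org/dc/terms/subject", "http://dbpedia.org/resource/Drupal", "uri"),
]
print(entity_to_ntriples("http://example.org/node/1", node_properties))
```

Pointing the subject property at a DBpedia URI rather than a plain string is what lets the entity be combined with data from other sources.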
VIE.js
• “Vienna IKS Editables”
• JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
Monolithic vs Decoupled Content Management Systems
source: Henri Bergius - http://bergie.iki.fi
Demo setup
• we store Drupal entities in a SOLR index
• annotations are to be made based on:
• DBpedia - bundled with Apache Stanbol
• a custom vocabulary of terms related to semantic web - Social Semantic Web Thesaurus
• SemWeb is imported as a SOLR index into Apache Stanbol
Custom vocabularies
• PoolParty Semantic Web
• 224 concepts related to semantic web
• Author: Andreas Blumauer
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
Demo
• index Drupal entities in Apache Stanbol
• retrieve annotated entities via REST API
• annotate entities using DBpedia and SemWeb indexes
• edit Drupal entities and annotate on the fly
• retrieve linked data tag recommendations
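For the retrieval step of the demo, a lookup URL can be built as in the Python sketch below. The `/entityhub/site/<site>/entity?id=<uri>` layout is my reading of the Stanbol Entityhub REST interface and should be checked against the running instance; the base URL, the site name `dbpedia` and the helper name are assumptions.

```python
from urllib.parse import urlencode


def entity_lookup_url(base_url, site, entity_uri):
    """Build the URL for fetching one entity from an Entityhub site (assumed layout)."""
    query = urlencode({"id": entity_uri})  # percent-encodes the entity URI
    return "%s/entityhub/site/%s/entity?%s" % (base_url.rstrip("/"), site, query)


url = entity_lookup_url(
    "http://localhost:8080",  # assumed local Stanbol address
    "dbpedia",  # assumed name of the bundled DBpedia site
    "http://dbpedia.org/resource/Drupal",
)
print(url)
```

The resulting URL can be requested with an `Accept` header such as `application/json` to ask for a JSON serialization of the entity.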
Questions?
Thank you!