
Semantic Web 0 (0) 1
IOS Press

metaphactory: A Platform for Knowledge Graph Management

Peter Haase, Daniel Herzig, Artem Kozlov, Andriy Nikolov* and Johannes Trame

metaphacts GmbH, Walldorf, Germany
E-mails: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract. In this system paper we describe metaphactory, a platform for building knowledge graph management applications. The metaphactory platform aims at supporting different categories of knowledge graph users within the organization by realizing relevant services for knowledge graph data management tasks, providing a rich and customizable user interface, and enabling rapid building of use case-specific applications. The paper discusses how the platform architecture design built on open standards enables its reusability in various application domains and use cases as well as facilitates integration of the knowledge graph with other parts of the organizational data and software infrastructure. We highlight the capabilities of the platform by describing its usage in four different knowledge graph application domains and share the lessons learnt from the practical experience of building knowledge graph applications in the enterprise context.

Keywords: Knowledge graph, Data management, End-user platform, Industrial application

1. Introduction

In recent years, knowledge graph technologies have established a solid position in the enterprise world, serving as a central element of the organizational data management infrastructure. Knowledge graphs are becoming both the repository for organization-wide master data (ontological schema and static reference knowledge) and the integration hub for various legacy data sources, e.g., relational databases or data streams. However, tool support for managing knowledge graphs and building custom applications on top of them is still limited. To support organizations in managing and making use of knowledge graphs, we created the metaphactory platform, which covers the whole lifecycle of knowledge graph applications: from data extraction & integration, storage, and querying to visualization and data authoring.

*Corresponding author. E-mail: [email protected].

In order to exploit the capabilities of knowledge graph technologies to the maximal extent, an organization requires many supporting functionalities beyond merely being able to store data as a graph and query it. Within a large organization, these functionalities have to support different user groups: from lay users who merely want to explore the data in a convenient way, to expert users who manage and modify the knowledge graphs, and internal developers who create targeted applications customized for specific end user groups. Based on these diverse needs, such functionalities include:

– Data management: Common management tasks can be time-consuming and expensive without supporting tools. Even assistance with such basic functionalities as SPARQL querying with a graphical UI, auto-suggestions, and a query catalog can already go a long way in uncovering the added value of knowledge graphs. Moreover, support for knowledge graph authoring, covering both schema ontologies and the instance data, as well as integration of data from other sources of different types, are features needed in almost any enterprise use case.

– End-user oriented interaction: The user has to be able to interact with the knowledge graph in an intuitive and user-friendly way. This includes, for example, the need for navigation, exploration and visualization of knowledge graphs, rich semantic search with visual query construction and faceting, and a simple yet powerful interface for data authoring and editing.

– Rapid application development: A generic supporting toolkit must enable creating use case-specific user applications with minimal effort.

We developed the metaphactory platform to realize these functionalities while primarily aiming at large enterprises and organizations. The challenges of the use case scenarios in such organizations necessitate certain design principles which a generic knowledge graph management platform has to follow:

– Reusability: The platform can be used in a great variety of contexts, domains, and use cases to serve multiple categories of users. All aspects and functionalities of a generic platform must be customizable to enable it to be reused in any context. Whenever possible, the design must avoid any inherent assumptions about the data and user needs. The use of generic open standards such as RDF1, SPARQL2, SHACL3, and others where available helps to achieve this.

– Compatibility: In most cases, the knowledge graph constitutes only a part of the organizational data infrastructure and must be used in combination with other elements: non-RDF data sources, data processing APIs, external data analytics tools. A knowledge graph management platform must be compatible with and support integration of other elements of the infrastructure, both at the input level (combining different kinds of data for transparent access) and at the output level (enabling knowledge graph data to be consumed by external tools). Open interfaces should be used whenever possible to simplify such integration.

– Extensibility and customization: A platform which is intended to be reused in different domains and use cases must provide an expert user with capabilities to build custom targeted applications for her needs with only minimal support from the platform vendor.

1 https://www.w3.org/RDF/
2 https://www.w3.org/TR/sparql11-query/
3 https://www.w3.org/TR/shacl/

In this paper we present the key concepts of the metaphactory platform, showing how these design principles help to address the knowledge graph management challenges, and share our experiences and lessons learnt in applying the platform in diverse real-world use cases. The rest of the article is structured as follows. Section 2 provides a high-level overview of the architecture of the platform and its main building blocks. Further sections explain the functionalities of each architecture layer in detail. Section 3 describes how the platform manages heterogeneous data sources and provides a uniform querying mechanism. Section 4 outlines the set of services realized by the platform and exposed for consumption. Section 5 outlines the design decisions behind the customizable UI of metaphactory and discusses the different supported interaction paradigms. In section 6 we describe four application scenarios of the platform and discuss the experiences and lessons learnt. Finally, section 7 concludes the paper and outlines directions for future work.

2. Architecture

Figure 1 shows a high-level overview of the architecture of the metaphactory platform, developed to provide a variety of functionalities aiming at different target user groups. One such group includes the end users within the organization. These users are not interested in the internals of the knowledge graph data structure or the technical complexities of data access and integration. The expert users and data architects who have to author and modify the knowledge graphs and incorporate legacy data are primarily interested in convenient and efficient tools that make data management operations easier. Finally, application developers within the organization require that it is easy to incorporate the knowledge graph infrastructure into the overall software infrastructure of the organization and to build new targeted end-user applications exploiting the knowledge graph with minimal effort. The architecture was designed to reconcile the diverse requirements of these groups while maintaining the design constraints outlined above: enabling reusability in different domains, compatibility with other data sources and applications, and supporting extensibility and customization.

Fig. 1. metaphactory system architecture.

The platform operates on top of a graph database storing the knowledge graph. The communication is performed by the data access infrastructure of the platform using standard SPARQL 1.1 queries, which makes the system independent of a specific database vendor. Depending on the customer needs, this allows the platform to connect to various triple stores or even non-RDF sources virtually integrated and exposed via a SPARQL interface: e.g., a relational database integrated using R2RML4 mappings or a custom keyword search index. To be able to interact with multiple data sources using virtual data integration, the platform contains Ephedra [1], a SPARQL federation engine which realizes SPARQL 1.1 query federation over both SPARQL endpoints and custom compute services.

4 https://www.w3.org/TR/r2rml/

On top of the data access infrastructure, the platform services layer implemented at the platform backend realizes a range of generic functionalities for interaction with knowledge graphs. The services extend the capabilities of the standard SPARQL access normally provided by triple stores and offer specific capabilities to serve the needs of each target user group: they simplify the communication between the web-based end-user UI and the knowledge graph, perform knowledge graph management operations over the ontological schema and data required by expert users and knowledge graph maintainers, and enable easy interaction with other tools, which is needed by in-house development teams. These services are exposed as REST APIs to be consumed by the client-side user interface as well as by third-party tools, e.g., Tableau5 or KNIME6 for data analytics. One critical functionality implemented at this level is user access control: the platform allows configuring fine-grained access at the level of the specific APIs available to each user role.

The web-component based user interface is built using a customizable templating mechanism. Each URI resource in the knowledge graph is visualized using an HTML page. While one can design a separate HTML page for each resource, in the majority of cases this is not required: it is possible to specify template pages that can be applied to any resource of the same type.

5 http://www.tableau.com
6 http://www.knime.com


When visualizing a data resource, the platform tries to select an appropriate template among the available ones. This template then gets rendered using the actual URI of the resource. The metaphactory client-side UI provides a plethora of customizable UI components for interacting with semantic data: from table-based and graph-based visualizations to structured search environments and data authoring controls based on knowledge patterns. Each one represents an HTML5 component that can be directly inserted and configured within HTML code.

In this way the user interface can be easily customized for the specific application use case at hand. The use of the standard HTML5 format for storing client-side UI views enables an expert user to create and edit her own interface. Use case-specific configuration parameters and UI templates can be packaged as a separate app and added to the default platform installation to realize a custom domain-specific knowledge graph management application.

3. Data Access Infrastructure

The metaphactory data access infrastructure is built on top of the RDF repository that serves as the storage for the managed knowledge graph data. The SPARQL 1.1 query language supported by most available RDF databases serves as a common communication protocol. The platform reuses the popular RDF4J framework7 to realize access to data repositories. In this way, (a) the platform can be installed in any environment that already includes a triple store chosen by the customer organization, and (b) any data storage solution can be selected based only on the use case requirements, without the platform introducing additional constraints of its own. In particular, metaphactory officially supports most of the well-known triple store solutions available on the market, e.g., Blazegraph8, Stardog9, Amazon Neptune10, GraphDB11, Virtuoso12, and others.

7 http://rdf4j.org/
8 https://www.blazegraph.com/
9 https://www.stardog.com/
10 https://aws.amazon.com/neptune/
11 https://ontotext.com/products/graphdb/
12 https://virtuoso.openlinksw.com/

While the platform requires the existence of one main default RDF repository containing domain data, it allows working with multiple repositories as well: the platform's repository manager enables configuring and managing connections to many data repositories in a declarative way. In particular, this enables managing domain data separately from system data such as saved SPARQL queries or SHACL data quality reports. The different data sources maintained by the repository manager are described in RDF using the RDF4J repository configuration ontology. These repositories can include not only native RDF triple stores but also other data sources accessible via SPARQL: most importantly, relational databases virtually integrated using R2RML mappings and exposed via SPARQL with an ontology-based data access engine, either a separate one like Ontop [2] or one integrated with a triple store like Stardog.
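As an illustration, the following minimal sketch shows what such a declarative repository description based on the RDF4J configuration ontology may look like for a remote SPARQL endpoint; the repository ID and the endpoint URL are placeholders, and the exact configuration vocabulary accepted by the platform may differ in detail.

@prefix rep:    <http://www.openrdf.org/config/repository#> .
@prefix sparql: <http://www.openrdf.org/config/repository/sparql#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical repository description: a remote SPARQL endpoint registered
# under the ID "wikidata" so that it can be referenced by the platform.
[] a rep:Repository ;
   rep:repositoryID "wikidata" ;
   rdfs:label "Wikidata query service" ;
   rep:repositoryImpl [
      rep:repositoryType "openrdf:SPARQLRepository" ;
      sparql:query-endpoint <https://query.wikidata.org/sparql>
   ] .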

Many use case scenarios involve hybrid information needs that require combining information from multiple data sources. These needs are characterized by dimensions such as:

– Variety of data sources: The data to be integrated is often stored in several physical repositories. Such repositories can include both RDF triple stores and datasets that are only virtually presented as RDF: e.g., relational databases exposed using R2RML mappings.

– Variety of data modalities: RDF graph data often needs to be combined with other data modalities: e.g., textual, temporal, or geospatial data. To be able to integrate those, SPARQL queries need to support special extensions for full-text, spatial, and other corresponding types of search.

– Variety of data processing techniques: Relevant data is often not stored directly in some repository, but has to be computed by dedicated domain-specific services: e.g., graph analytics (finding the shortest path or interconnected graph cliques), statistical analysis and machine learning (applying a machine learning classifier, finding similar entities using a vector space model), etc.

An application scenario can require dealing with several of these aspects simultaneously. While federated query processing appears to be a natural way for on-the-fly integration of diverse data sources, research efforts have mainly concentrated on achieving transparent query federation over native RDF datasets rather than on hybrid query challenges. Existing approaches either utilize meta-level information about federation members in order to build an optimal query plan (e.g., DARQ [3], SPLENDID [4], ANAPSID [5], or HiBISCuS [6]) or use special runtime execution techniques targeting remote service queries (e.g., FedX [7]). Among the approaches targeting hybrid query processing, SCRY [8] and QuetzalRDF [9] deal with calling data processing services using SPARQL queries. QuetzalRDF defines custom functions and table functions (generalized aggregation operations) and invokes them from a SPARQL query, but does not follow the SPARQL 1.1 syntax. SCRY conforms to SPARQL 1.1 using special GRAPH targets to wrap service invocations, although it cannot distinguish between multiple input/output parameters.

Given the limitations of existing solutions, we implemented Ephedra, a SPARQL federation engine for hybrid queries [1], to address these challenges. We adopt the SPARQL 1.1 federation mechanism using the SERVICE keyword, but broaden its usage to enable custom services to be integrated as data sources and optimize such hybrid SPARQL queries to be executed efficiently.

Ephedra defines a common implementation interface, in which interactions with external services are encapsulated in an RDF4J SAIL module13. In this way, a custom compute service can be registered in the repository manager as yet another SPARQL repository and referenced inside SERVICE clauses in SPARQL queries. SPARQL graph patterns specified inside such SERVICE clauses are parsed to extract input parameters for a service call as well as the variables to bind the results returned by the service. The Ephedra SPARQL query execution strategy sends the sub-clauses of a query to the corresponding data sources, gathers partial results, combines them using union and join operations, and produces result sets. In this way, processing hybrid queries is transparent and performed in the same way as for ordinary SPARQL queries.

In order to express custom service requests as part of SPARQL queries, hybrid services integrated through Ephedra are declaratively described using the Ephedra service descriptor ontology. This ontology extends the well-known SPIN14 ontology to define the accepted graph patterns, input arguments, and output results. For example, a wrapper for a custom service that retrieves similar entities based on proximity in the word2vec vector embedding space looks like the following:

:Word2VecSimilarityService a eph:Service ;
  rdfs:label "A wrapper for the word2vec similarity service." ;
  eph:hasSPARQLPattern (
    [ sp:subject :_entity ;
      sp:predicate mph:hasSimilar ;
      sp:object :_similar ]
  ) ;
  spin:constraint [
    a spl:Argument ;
    rdfs:comment "URI of the search entity" ;
    spl:predicate :_entity ;
    spl:valueType xsd:anyURI ] ;
  spin:column [
    a spin:Column ;
    rdfs:comment "URI of a similar entity" ;
    spl:predicate :_similar ;
    spl:valueType xsd:anyURI ] .

13 http://docs.rdf4j.org/sail/
14 http://spinrdf.org/

Based on this descriptor, Ephedra is able to interpret and process the following query, which answers the question "Which painters are similar to Rembrandt?" and returns service outputs among the query results:

SELECT ?artist ?label WHERE {
  SERVICE mpfed:wikidataWord2Vec {
    wd:Q5598 mph:hasSimilar ?artist .
  }
  ?artist wdt:P106 wd:Q1028181 .  # painter
  ?artist rdfs:label ?label .
}

With this approach, services with standardized interfaces (e.g., REST APIs) can be included in the Ephedra federation in a fully declarative way, without the need to implement specific service wrappers.

Processing hybrid queries including custom service calls requires modifying the query processing procedures, because a hybrid query does not follow the standard SPARQL semantics. Processing a SERVICE clause realizing a custom service call requires all input parameters to be bound, which means that the join operands cannot be arbitrarily re-ordered. For this reason, Ephedra implements dedicated static query optimization strategies, which produce an optimal and executable query plan.
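To illustrate this constraint, consider the following hypothetical query sketch, reusing the service and vocabulary from the example above (wdt:P31 / wd:Q5 stands for "instance of human" in Wikidata): the SERVICE pattern can only be evaluated once ?entity has been bound by the preceding triple pattern, so the optimizer must keep the service call after that pattern instead of re-ordering the join freely.

SELECT ?entity ?similar WHERE {
  ?entity wdt:P31 wd:Q5 .              # bind candidate entities first
  SERVICE mpfed:wikidataWord2Vec {
    ?entity mph:hasSimilar ?similar .  # requires ?entity as a bound input
  }
}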

4. Platform Services

The platform backend realizes a range of services that implement additional functionalities on top of the standard interfaces of triple stores. These services streamline the specific types of interactions with the knowledge graph that are required by different target user groups. They include (a) convenience services that realize commonly required tasks that are usually not provided by triple stores directly and (b) connector services that make data from the knowledge graph consumable by applications.


Convenience services implement commonly required routines that are either too cumbersome to be realized by sending SPARQL queries directly or require additional tools (e.g., a separate keyword search index). These services are implemented in a generic way, avoiding dependencies on a particular triple store and/or ontology. A simple example of a convenience service is a generic label service retrieving a human-readable label for a given resource. The service can be configured to deal with various modelling patterns (e.g., taking into account not rdfs:label, but skos:prefLabel or even property paths) and language preferences. Such services provide the functionalities utilized by the end-user interface components.
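As a rough sketch of what such a label lookup amounts to (the actual service configuration format is not shown here, and the property preference order is only an assumed example), a query issued for a given resource could look as follows:

# Hypothetical label lookup: prefer skos:prefLabel, fall back to rdfs:label,
# and restrict results to English or untagged literals.
SELECT ?label WHERE {
  wd:Q5598 (skos:prefLabel|rdfs:label) ?label .
  FILTER (lang(?label) = "en" || lang(?label) = "")
}
LIMIT 1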

Another set of generic convenience services realizes the knowledge graph management tasks required by expert users and their tools:

– Linked Data Platform (LDP) service for resource-based access, creation, update, and deletion of linked data according to the W3C LDP specification15.

– Data quality service for performing data validation using SHACL constraints.

– Query catalog service for saving and managing reusable SPARQL query templates (expressed using the SPIN ontology).

– Keyword search service based on the GraphScope16 data search engine integrated into the platform.

15 https://www.w3.org/TR/ldp/
16 http://metaphacts.com/graphscope

Manual modifications of a knowledge graph, as well as bringing in transformed legacy data sources, create the need to verify and maintain the quality of the knowledge graph data: its adherence to the schema ontologies and domain constraints. To this end, the metaphactory platform utilizes the SHACL standard for defining and verifying data constraints. SHACL provides an ontology for expressing data constraints in the form of shapes: RDF structures describing sets of restrictions that the data in the main knowledge graph must conform to. SHACL allows defining complex combinations of constraints, more expressive than those supported by OWL. The metaphactory platform supports both SHACL constraints defined manually by the user and constraints automatically generated from the OWL ontology. Bootstrapping rules process the RDFS and OWL restrictions and generate corresponding SHACL shapes. The platform includes a SHACL execution engine that transforms SHACL shapes into SPARQL queries, executes them against the knowledge graph, and generates a data quality report.
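For illustration, the following minimal sketch shows the kind of shape such bootstrapping could produce from an OWL restriction stating that every protein must be encoded by at least one gene; the ex: class and property names are placeholders rather than terms from a concrete customer ontology.

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ontology/> .

# Hypothetical shape: every ex:Protein needs at least one ex:encodedBy link
# pointing to an ex:Gene instance.
ex:ProteinShape a sh:NodeShape ;
  sh:targetClass ex:Protein ;
  sh:property [
    sh:path ex:encodedBy ;
    sh:class ex:Gene ;
    sh:minCount 1
  ] .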

Connector services aim at pre-processing and exposing knowledge graph data in a way that makes them easily consumable by external applications. These services primarily address the needs of application developers utilizing the knowledge graph information in combination with other elements of the infrastructure within an organization. Examples of such services are the Tableau connector service, which exposes the data for analysis in the popular Tableau application, and the Alexa skill service, which generates an Alexa17 skill description to enable voice interaction with the knowledge graph. To ease the integration of the platform into an organizational infrastructure, the Query-as-a-Service (QaaS) functionality is implemented. It allows exposing pre-saved parameterized SPIN query templates as custom REST APIs that are easy to call by internally developed tools within a customer organization. Internal developer teams usually prefer working with a REST API over having to compose and send SPARQL queries directly.
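As a purely hypothetical sketch of the QaaS idea (the query, the endpoint path, and the parameter name below are invented for illustration and do not reflect the platform's actual API layout), a saved parameterized query could be exposed so that a simple key-value REST call replaces direct SPARQL access:

# Saved parameterized query template (parameter: ?disease)
SELECT ?protein ?label WHERE {
  ?protein ex:associatedWithDisease ?disease ;
           rdfs:label ?label .
}
# Hypothetically exposed as a REST endpoint, e.g.
#   GET /rest/qaas/proteins-by-disease?disease=<disease IRI>
# where the "disease" request parameter is bound to the ?disease variable.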

To reconcile the need to support diverse user groups with keeping the platform adaptable to various use cases and easy to integrate with other tools and applications in the organization, the service layer was implemented based on open standards whenever possible. While SPARQL 1.1 is used as the common query language, other standards are used for realizing more specific functionalities: for example, SHACL to specify and process constraints over the data, LDP to enable management operations over RDF using HTTP requests, and REST as a communication interface for external applications.

5. User Interface

The user interface design of the metaphactory platform aims at satisfying the requirements outlined in section 1: enabling reusability, compatibility, and customization. For this reason, the core platform interface is implemented in a generic way, providing tools for custom-building all specific views for the use case at hand. In addition to that, the platform envisages external applications connecting to it, which gives the possibility to use alternative user interfaces according to the user needs. One example of such an alternative UI is Amazon Alexa.

17 https://developer.amazon.com/alexa

5.1. Customizable UI: templates and custom components

The web-based UI of the metaphactory platform is resource-centric: the platform renders an HTML view for a provided URI of a resource from the knowledge graph (Fig. 2). Resource views can be specified at different levels of abstraction: it is possible to define a dedicated HTML view for one specific resource as well as to provide templates that will be applied to all resources of a specific kind. The criterion for choosing a template for a resource is configurable: by default, the template is selected according to the rdf:type property value, but this can be changed by providing a special template selection SPARQL query. This allows adapting the platform setup depending on the high-level structure of the knowledge graph.

Resource views and templates are defined as HTML pages. These can be created and edited directly in the platform using the provided HTML editor. Besides standard HTML tags, a view can contain special UI components: configurable custom client-side UI elements which can be referenced in the template source code as special HTML5 tags. A custom UI component receives its configuration parameters as tag attributes. A generic component that needs to interact with the knowledge graph (e.g., to visualize data) receives the data selection SPARQL query as part of its configuration and interacts with the platform backend service layer APIs to obtain the required data.
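A minimal sketch of such a template fragment is shown below; the component tag name, the attribute, and the current-resource placeholder are illustrative assumptions rather than the platform's exact syntax.

<!-- Hypothetical template fragment: a table component configured with a
     data selection SPARQL query; "??" stands for the current resource. -->
<h2>Similar entities</h2>
<semantic-table query='
  SELECT ?entity ?label WHERE {
    ?? mph:hasSimilar ?entity .
    ?entity rdfs:label ?label .
  }'>
</semantic-table>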

Moreover, the web components provided by metaphactory are not only customizable individually, but also composable. A component can define an environment that includes other nested components able to communicate via pre-defined interfaces. Such environments can define common configuration parameters that are applied to all included components as well as define a layout for these components. An example of such an environment is a structured search component that is able to combine different input controls for generating a search request (e.g., keyword search or form-based search) with various options for visualizing and exploring search results. With this approach, complex user interaction functionalities can be specified in a fully declarative way by composing pre-existing atomic elements, without the need to build new components for each new use case application.

In this way, the user interface addresses the design requirements supporting reusability and custom application building. View templates and configurable custom components provided by the platform enable parts of the UI to be reused for different resources, datasets, and applications. Building a custom end-user application for a specific use case involves defining and packaging a set of required UI templates and configurations and does not require modifications of the platform itself. An application can be delivered as a simple plugin that can be installed automatically on top of the standard platform release.

The platform provides a wide range of custom components for various user needs, primarily focusing on the functionalities most often required by end users: search, visualization, and authoring. The main protocol of interaction with RDF stores is the SPARQL query language: an expressive formalism which, however, is not convenient for non-expert users. For this reason, search and visual exploration capabilities that capture information needs in a user-friendly way and generate information retrieval queries based on them are crucial for an end-user tool working with knowledge graphs. This direction has long been recognized as important in the research community. The structure of semantic datasets has been exploited to generate facets for search refinement [10] and linked data browsing [11], where facets are selected based on the data distribution within the dataset. The need to express information needs using complex SPARQL queries led to the development of query builder tools providing visual assistance to the user. For example, the ExConQuer [12] framework provides an interface for constructing SPARQL queries where the user expresses her information need by selecting concepts and properties from the ontology and building an abstract query structure which is afterwards translated into a SPARQL query. OptiqueVQS [13] provides visual query formulation allowing the user to select relevant concepts and properties, express value restrictions, and visualize the resulting query pattern as a graph structure.

Fig. 2. Example resource view for a protein in the Wikidata knowledge graph.

In its structured search functionality, the metaphactory platform further develops these directions, but puts emphasis on the capability to configure search and adapt it to different use cases with minimal effort. In particular, search over the knowledge graph is realized as an environment which allows specifying different ways for generating the search query, refining it, and visualizing the search results. A search request can be generated in several ways: e.g., by using a simple keyword search text field, by entering parameters in a pre-defined form, or by constructing the search criteria iteratively, selecting relevant properties and constraints. All these approaches for expressing user information needs generate the search request in the same form: as a SPARQL query. The query can be further refined using facets that can be configured declaratively as part of the search environment. Finally, the search results returned by the produced query can be visualized by any appropriate UI component depending on the form of the data and customer preferences: e.g., as a table or as a chart.

Built-in visualization components in the metaphactory platform provide the capabilities to show subsets of a knowledge graph in an informative way, facilitate exploration of the graph, and summarize the information. Depending on the structure of the data and the target view, the developer can select alternative visualization strategies, e.g., a table that condenses graph data into a more traditional relational format or a graph that emphasizes inter-relations between entities. For the latter, the metaphactory platform is integrated with the powerful Ontodia tool [14] for building custom RDF graph diagrams (Fig. 3). Ontodia not only allows visualizing parts of an RDF graph, including concepts, entities, relations, and properties, but also enables the user to further explore the dataset and extend the view in an interactive way. To summarize aggregated data and show the user the analysis results, various charts are available. Common special types of data, such as geospatial and temporal data, can be visualized using appropriate views: e.g., a map or a timeline.

To support creation and editing of knowledge graphs, the platform provides an end-user friendly, customizable, and extensible authoring UI based on the composite component environment and knowledge graph patterns. In essence, a knowledge graph pattern is a structure combining a SPARQL query pattern with additional metadata that is used for creation and validation of the user input. This concept helps to hide the complexity of the underlying data model from the end user, but at the same time allows expert users to precisely capture information needs and adjust the authoring UI to these needs by using various components for data input, ranging from simple text inputs to hierarchical suggestion components and nested forms (Fig. 4). The application of knowledge graph patterns is not limited to data authoring: the same pattern can be re-used in structured search for query generation and in visualization components for data retrieval. This re-usability helps to ensure consistency of the UI across the whole application.
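The following minimal sketch illustrates the idea of such a pattern; the vocabulary, property names, and the ex: namespace are assumed for illustration and do not reproduce the platform's actual pattern definition format. A SPARQL graph pattern for reading values is paired with metadata and an insert pattern used when authoring.

@prefix ex: <http://example.org/app/> .

# Hypothetical knowledge graph pattern for the relation "encoded by",
# reusable by an authoring form field, a search clause, and a view.
ex:encodedByPattern a ex:KnowledgeGraphPattern ;
  rdfs:label "encoded by" ;
  ex:domain ex:Protein ;
  ex:range ex:Gene ;
  ex:selectPattern "SELECT ?value WHERE { $subject ex:encodedBy ?value }" ;
  ex:insertPattern "INSERT { $subject ex:encodedBy $value } WHERE {}" .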


Fig. 3. Graph visualization with Ontodia.

Fig. 4. Authoring forms using knowledge graph patterns.

5.2. Expressive keyword search querying with GraphScope

GraphScope is a data search engine for knowledge graphs that allows the user to access RDF data in a simple way by entering keyword queries. These keyword queries are interpreted by GraphScope and the results matching the information need are shown to the user. GraphScope tries to answer the user's query by finding the most suitable interpretation of each keyword with respect to the concepts, properties, and instances of the knowledge graph. To this end, GraphScope relies on a set of specialized indices that contain various meta-information about the distribution of entities and associated keywords in the data repository. Based on these indices, GraphScope identifies the most appropriate matches for the keywords in the user query and generates a corresponding SPARQL query to retrieve the results. After the system has returned an initial result set for the user's keyword query, GraphScope provides possibilities to further refine the query and explore the initial result set. The user can select an appropriate interpretation for keywords to refine the meaning of the query as well as further expand the results by exploring the relations of the entities in the result set. These refinements, which the user performs in the UI, are automatically transformed into modifications of the SPARQL query.

GraphScope, originally developed as a standalone tool, was integrated with the metaphactory platform to serve as an additional query generation approach in the generic architecture. The integration is realized both at the backend and frontend levels. GraphScope and metaphactory reuse the same connection to the backend triple store to access data via SPARQL. At the frontend level, the GraphScope user interface for defining a keyword search query and refining the search results is wrapped as a metaphactory UI HTML5 component that can be added to a template page.

5.3. Voice interface using Amazon Alexa

Separating the service layer backend and the client-side UI makes it easy to integrate the platform with external tools at the UI level: an alternative UI can be deployed on top of the platform and interact with the knowledge graph by invoking the relevant platform services. An example of such an alternative user interface is voice interaction using Alexa, which can be used alongside the more traditional paradigms described above [15]. The Amazon Alexa framework allows the developer to define a specific service (called a Skill) to process user requests and generate verbalized answers. The Alexa framework includes two parts: an Alexa skill which provides an abstraction over complex voice processing and generation services, and an Amazon Lambda service implementing the application logic. An Alexa skill processes the voice messages from the user and transforms them into service requests with provided parameters. A skill can define a number of request types called intents, which can in turn be mapped to several utterances (natural language phrases). An utterance can be parameterized using slots (request parameters): for example, when Alexa receives a user's question "Alexa, ask Wikidata what is the capital of Poland", it extracts the skill name (Wikidata) and identifies the intent corresponding to the question "what is the capital of ...?" and the corresponding Lambda service. Then, the Lambda service is invoked with the extracted parameters: the ID of the intent (e.g., "capitalcity") and the request parameter ("Poland"). The Lambda service is then responsible for finding the right answer and returning it in a verbalized form. The Alexa service then merely pronounces the returned verbalized answer.

Our Lambda function finds an answer to the query in the knowledge graph by mapping an intent to a SPARQL query pattern. For example, to answer our example question over the Wikidata knowledge graph, we need to find the value of the property P36 "capital" for the entity Q36 "Poland".

To handle such a direct factual question, our Lambda function uses a SPARQL query pattern of the following form (here using the Blazegraph full-text search):

SELECT ?answer WHERE {
  ?uri rdfs:label ?label .
  ?label bssearch:search "${entity}" .
  ?label bssearch:minRelevance "0.5" .
  ?label bssearch:matchAllTerms "true" .
  ?uri ${property} ?answer .
} LIMIT 1
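For the example question above, assuming the intent is mapped to the Wikidata property P36 ("capital") and the slot value is "Poland", the instantiated query would look roughly as follows (a sketch; prefix declarations are omitted as in the pattern above):

SELECT ?answer WHERE {
  ?uri rdfs:label ?label .
  ?label bssearch:search "Poland" .
  ?label bssearch:minRelevance "0.5" .
  ?label bssearch:matchAllTerms "true" .
  ?uri wdt:P36 ?answer .   # P36 = capital
} LIMIT 1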

We use an automated procedure to bootstrap the Alexa skill definition and generate descriptions of intents as well as example entities and verbalization templates.

Our Alexa skill is available in the Amazon Skill store under the name "metaphacts" in German and English18.

6. Experiences and Lessons Learnt

The metaphactory platform is used in production in a variety of use cases involving knowledge graph management in different application domains. In the following we consider four diverse practical use cases from the open knowledge graphs, cultural heritage, engineering & manufacturing, and life sciences domains, highlighting different aspects of knowledge graph management.

6.1. Open Knowledge Graphs (Wikidata)

Wikidata19 is a free and open knowledge graph containing general-purpose data across domains. It is used as a reference data storage for other Wikimedia projects (in particular, Wikipedia). Due to its comprehensive nature and popularity, there exists a large volume of mappings between Wikidata entities and instances of other repositories, including domain-specific ones (e.g., UniProt20 for proteins, GeoNames21 for geographical locations, etc.).

18 https://www.amazon.com/metaphacts/dp/B0745KLCFX/ for US English.
19 https://www.wikidata.org
20 https://www.uniprot.org/
21 http://www.geonames.org
22 https://wikidata.metaphacts.com/

To exploit these data, we have set up a public showcase system for the community22, easing access to the information provided by the Wikidata query service. The system provides different search interfaces as entry points into Wikidata's knowledge base and visualizes search results based on a comprehensive HTML5 based templating approach. Internally, the public metaphactory Wikidata system is used both as an integration hub to facilitate integration of data from multiple sources for the needs of industrial use cases and as a demo system to highlight various platform features. Among others, the public Wikidata metaphactory system includes the following functionalities:

– Structured search over Wikidata (configured for general-purpose person-organisation data as well as for the life sciences domain)

– Integration with the Wikidata free-text search service

– Integration with life science-specific repositories to show additional views

– Semantic similarity search based on a word2vec vector space embedding model [16]

– Geospatial search using a combination of GeoSPARQL23 queries, external services like OpenStreetMap24, and map-based visualizations.

Using the links to external repositories, the Wikidata metaphactory system is used to integrate external data in other use cases: e.g., to bring in geospatial data in the cultural heritage use case or additional protein tissue data from neXtProt25 in the life science scenario.

6.2. Cultural Heritage

Knowledge graph technologies have become prominent in the cultural heritage domain, where the CIDOC-CRM ontology26 became a popular standard for exposing cultural heritage information as linked data. The metaphactory platform is utilized in the context of the ResearchSpace project27 to manage the British Museum knowledge graph and help researchers (a) explore meta-data about museum artifacts: historical context, associations with geographical locations, creators, discoverers and past owners, etc., and (b) use these meta-data in collaborative work by creating annotations, narratives involving semantic references, and argumentations exploiting knowledge graph data as evidence [17].

23 http://www.opengeospatial.org/standards/geosparql
24 https://www.openstreetmap.org
25 https://www.nextprot.org/
26 http://www.cidoc-crm.org/
27 http://www.researchspace.org/

A crucial piece of functionality is the structured search, where the user can define complex multi-criteria information needs iteratively by selecting appropriate clauses in the user interface (Fig. 5). A query request, which can look like "give me all bronze artifacts created in Egypt between 2500BC and 2000BC", then gets translated into SPARQL and answered using the knowledge graph data. Given the complexity and very high expressivity of the CIDOC-CRM ontology, using it directly at the level of the user interface would make the system too complicated for the end user. For this reason, we defined a set of special fundamental concepts (FCs) and relations (FRs) [18], which abstract over the physical classes and properties of the ontology while being intuitively understandable to the user (e.g., Thing, Actor, Event and the relations between them). With the structured search interface, the user can specify the criteria for data selection, interactively refine the initial selection results, explore the returned result set with faceted search and different result visualization views, and finally save the defined search for future reuse.

Knowledge graph data are used to support research collaborations by enabling the argumentation process and making assertions about graph entities, following the direction outlined in [19]. User assertions have explicitly stored provenance information and can compose complex exchanges of arguments, allowing to trace the whole reasoning process back to its origins and base evidence. Finally, the data selected from the knowledge graph can be used as references in semantic narratives that combine user-authored text, references to knowledge graph entities, and different data visualizations supported by the platform: from images associated with entities to charts summarizing selected data subsets.

A demo installation of the ResearchSpace platform over the British Museum knowledge graph collection is available on the Web28.

28 https://demo.researchspace.org
29 http://db.sphaera.mpiwg-berlin.mpg.de

Fig. 5. Structured semantic search over British Museum artifacts.

While the British Museum data collection was the first target use case in the cultural heritage domain, the resulting ResearchSpace platform extension was re-applied to other use cases in this area. Sphaera CorpusTracer [20], a practical application developed in collaboration with the Max Planck Institute, manages a knowledge graph addressing science history information: e.g., tracing the surviving printed publications of medieval astronomy textbooks in the early modern period29.

6.3. Engineering & Manufacturing Industry

Large-scale organizations involved in the engineering and manufacturing industry use knowledge graph representations to model master data: e.g., information about products and their typology, separate assembly parts, projects and their topics, etc. An inherent trait of these use cases is that the knowledge graph constitutes only a part of a complex infrastructure involving heterogeneous data sources and large-scale networks of atomic devices able to communicate data. This raises a number of challenges for data management and utilization: in particular, the need to integrate static knowledge with runtime data and to apply potentially complex analysis procedures.

These challenges are present in full in the use case scenarios in which metaphactory is applied at Siemens [21]. For example, a typical Siemens gas turbine has about 2,000 sensors, and a diagnostic task to detect whether a single purging operation is over requires applying many hundreds of signal processing rules. To be able to process relevant information in a timely way and achieve intelligent diagnostics, knowledge graph data must be integrated with runtime information provided by relevant APIs on demand. In this context, knowledge graph information acts as the integration backbone for the heterogeneous data infrastructure. To provide an integrated view of this hybrid infrastructure, the integration capabilities of the metaphactory platform play a crucial role.

The Ephedra federation architecture enables combining the “native” knowledge graph data with the data produced by the API services available in the Siemens infrastructure, such as MindSphere30, as well as with analysis results produced by machine learning algorithms, in a transparent way at the level of SPARQL queries.

30 https://siemens.mindsphere.io/
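A hypothetical sketch of such a hybrid query is shown below; the service alias, class, and property names are invented for illustration, since the actual Siemens ontologies and service configurations are not described here.

# Join static turbine knowledge from the graph with runtime readings
# fetched on demand from a (hypothetical) sensor API federation member.
SELECT ?turbine ?sensor ?value WHERE {
  ?turbine a ex:GasTurbine ;
           ex:hasSensor ?sensor .
  SERVICE mpfed:sensorApi {
    ?sensor ex:latestReading ?value .
  }
}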

For example, one of the use cases involves using the knowledge graph to integrate and represent knowledge about a building in order to create a full virtual copy (Digital Twin) of the real building, including both the static information describing the building and dynamic sensor data reflecting its current state. In another case, a trained machine learning model is exposed as a REST API that takes as input a list of features and returns the classification results of a text categorization problem. The use of Ephedra allows wrapping these diverse APIs as SPARQL federation members and processing federated queries that join the raw data with the predictions generated by machine learning. In this way, the hybrid data integration capabilities of the platform enable several practical end-user scenarios within the Siemens infrastructure, using a common approach to build several distinct knowledge graph applications targeting different user groups.

6.4. Life Sciences

The life sciences and pharmacy industry has been among the early adopters of semantic web / knowledge graph technologies. Currently there exists a wide range of reusable life-science reference datasets already published in RDF and available in public linked data repositories (e.g., neXtProt [22], ChEMBL31 [23], and others). In order to be exploited in an industrial environment, these datasets must be integrated with the internal data sources collected within the organization.

One such scenario involved utilizing the metaphactory platform within the pharmaceutical company Sanofi S.A.32 The project focused on building a target dashboard to support internal decision making processes using integrated data from over 20 different data repositories, including both company-internal and public data. The metaphactory platform served as a one-stop portal for consolidated access to target-related information such as function, expression, genetics, interactions/pathways, etc.

31 https://www.ebi.ac.uk/chembl/
32 http://www.sanofi.com

To achieve transparent querying of integrated data, the relevant data sources were brought together using a combination of ETL processes and virtual integration. Data contained in relational databases were transformed using R2RML mappings. A high-level ontology was designed to serve as a global schema abstracting from the set of local ontologies of each data source. To support the end user in finding and interpreting the data, the target dashboard user interface was built using the metaphactory UI templating functionality and the generic platform services. The dashboard includes complementary search mechanisms, which can be used depending on the user task at hand: structured search that realizes ontology-based graphical query building, and keyword search. Search results are visualized using template pages according to the relevant types of entities. In addition, the data exposed via the platform API services is further exploited by third-party tools like KNIME to perform advanced data analysis.

Given the diversity of data to be integrated and the likelihood of errors, ensuring quality control becomes a matter of critical importance. To this end, the platform setup included the use of SHACL to verify the quality of the data. Using a combination of SHACL constraints automatically generated from the ontology and manually curated domain rules helps to ensure that the added value of data integration is not compromised by reduced data quality.

6.5. Lessons Learnt

The metaphactory platform has been deployed in various knowledge graph management use cases targeted at different industries, organization types, and user groups. This experience provided us with valuable hints regarding the management and exploitation of knowledge graphs within enterprise organizations as well as the challenges which have to be addressed by a general-purpose knowledge graph management platform. The use cases demonstrate how important it is for a knowledge graph application to adapt to the existing data infrastructure, aiming to complement it with missing functionalities rather than replace it. The knowledge graph platform must support rapid development and deployment of a use case application that demonstrates the added value of knowledge graph technologies. Two examples of such functionalities are transparent data integration and semantic search.

Data integration needs of real-world customers often go beyond RDF data and relational/tabular data sources. While static data can be integrated by means of a dedicated ETL process and physical materialization of data as RDF, using dynamic data in combination with the knowledge graph requires an extension of the common data access mechanisms. The metaphactory approach to this problem includes support of hybrid information needs by means of extended SPARQL 1.1 federation capabilities. In this case, dynamic data exposed by means of a service can be accessed on request and combined with the knowledge graph information at the level of query results.


The use of open standards is crucially important in realizing interactions with third-party tools at all levels. For example, using SPARQL 1.1 as a generic interface for communicating with data repositories enables the platform to interact with any triple store selected by the customer without imposing any restrictions on the infrastructure. For the same reason, it is also beneficial if a knowledge graph management platform can provide its output using interfaces preferred by the organization's developer teams: for instance, even if some data can be retrieved by sending a SPARQL query to the endpoint, it is often preferable if the same data can be provided via a REST API call with a set of key-value parameters.

One generic observation regarding the requirements towards end-user supporting tools in the industrial setting concerns the way reusability is achieved. In academic research the focus is often on providing a maximally generic approach that is able to adapt to the specific task at hand by using intelligent heuristics: for example, by automatically generating the most suitable set of facets for semantic search result exploration or by automatic selection of relevant data sources to achieve transparent query federation. In the industrial setting, the possibility to adjust the configuration of a module manually in a very fine-grained way is usually more important, while automated heuristics can play only an auxiliary role. This is mainly due to the need to deal with "long tail" edge cases in the knowledge graph structure, for which a generic technique would fail: improving the coverage of a fully generic approach for such cases is usually more expensive than manual configuration. This often creates an obstacle for applying open source tools developed in the research community to industry use cases.

7. Conclusions and Outlook

The metaphactory platform has been developed with the aim to support interaction with knowledge graphs and utilize the added value of knowledge graph data within organizational environments. The trial version of the platform can be accessed for free after registration33.

Given the experience gathered in multiple use cases, we consider several directions particularly important for improving the current version of the platform.

33 http://www.metaphacts.com/trial; at the time of writing, version 2.3.2 is available.

– Intelligent data authoring. Whenever knowledge graph data has to be inserted manually by the user rather than imported from pre-existing sources, this often constitutes a cumbersome and time-consuming process. It is important to assist the user in knowledge graph population tasks whenever possible, providing intelligent auto-suggestions as well as validating and refining user input. To this end, the use of novel machine learning techniques for knowledge graph population, in particular embedding-based methods, appears especially promising.

– Integration with machine learning technologies. Knowledge graphs provide added value by serving as input for machine learning techniques that produce new information, and can themselves be extended using machine learning. The metaphactory platform already includes interfaces that can be utilized for interaction with machine learning tools. Providing configurable general-purpose extensions enabling the application of machine learning tools with minimal effort would further benefit data scientists.

– Advanced data integration tasks. This involves, in particular, platform-level support for managing mappings for the integration of non-RDF structured (e.g., JSON, XML) and unstructured (e.g., text) sources, as well as common integration procedures such as instance matching.

– Development of “blueprints” for common use cases. Given the commonalities between different use cases, it is desirable to develop common “blueprint” templates for applications. Such application templates would greatly reduce the costs of building applications targeted at new use cases.

References

[1] A. Nikolov, P. Haase, J. Trame and A. Kozlov, Ephedra: Efficiently Combining RDF Data and Services Using SPARQL Federation, in: 8th International Conference on Knowledge Engineering and Semantic Web (KESW 2017), Szczecin, Poland, 2017, pp. 246–262.

[2] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro and G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic Web 8(3) (2017), 471–487.

[3] B. Quilitz and U. Leser, Querying Distributed RDF Data Sources with SPARQL, in: 7th International Semantic Web Conference (ISWC 2008), Vol. 5021, Springer, Karlsruhe, Germany, 2008, pp. 524–538.


[4] O. Görlitz and S. Staab, SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions, in: COLD 2011 Workshop, 10th International Semantic Web Conference (ISWC 2011), Bonn, Germany, 2011.

[5] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo and E. Ruckhaus, ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints, in: 10th International Semantic Web Conference (ISWC 2011), Bonn, Germany, 2011.

[6] M. Saleem and A.-C. Ngonga Ngomo, HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation, in: 11th Extended Semantic Web Conference (ESWC 2014), Heraklion, Greece, 2014.

[7] A. Schwarte, P. Haase, K. Hose, R. Schenkel and M. Schmidt, FedX: Optimization Techniques for Federated Query Processing on Linked Data, in: 10th International Semantic Web Conference (ISWC 2011), Bonn, Germany, pp. 601–616.

[8] B. Stringer, A. Meroño-Peñuela, S. Abeln, F. van Harmelen and J. Heringa, SCRY: Extending SPARQL with Custom Data Processing Methods for the Life Sciences, in: 9th International Conference on Semantic Web Applications and Tools for Life Sciences (SWAT4LS), Amsterdam, The Netherlands, 2016.

[9] J. Dolby, A. Fokoue, M. Rodriguez-Muro, K. Srinivas and W. Sun, Extending SPARQL for Data Analytic Tasks, in: 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, 2016, pp. 437–452.

[10] M. Tvarozek and M. Bieliková, Generating Exploratory Search Interfaces for the Semantic Web, in: Human-Computer Interaction - Second IFIP TC 13 Symposium (HCIS 2010), Brisbane, Australia, 2010, pp. 175–186.

[11] T. Franz, J. Koch, R.Q. Dividino and S. Staab, LENA-TR: Browsing Linked Open Data Along Knowledge-Aspects, in: Linked Data Meets Artificial Intelligence, 2010 AAAI Spring Symposium, Stanford, CA, USA, 2010.

[12] J. Attard, F. Orlandi and S. Auer, ExConQuer: Lowering barriers to RDF and Linked Data re-use, Semantic Web 9(2) (2018), 241–255.

[13] A. Soylu, M. Giese, E. Jiménez-Ruiz, G. Vega-Gorgojo and I. Horrocks, Experiencing OptiqueVQS: a multi-paradigm and ontology-based visual query system for end users, Universal Access in the Information Society 15(1) (2016), 129–152.

[14] D. Mouromtsev, D. Pavlov, Y. Emelyanov, A. Morozov, D. Razdyakonov and M. Galkin, The Simple Web-based Tool for Visualization and Sharing of Semantic Data and Ontologies, in: 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, USA, 2015.

[15] P. Haase, A. Nikolov, J. Trame, A. Kozlov and D.M. Herzig, Alexa, Ask Wikidata! Voice Interaction with Knowledge Graphs using Amazon Alexa, in: 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, 2017.

[16] A. Nikolov, P. Haase, D.M. Herzig, J. Trame and A. Kozlov, Combining RDF Graph Data and Embedding Models for an Augmented Knowledge Graph, in: BigNet@WWW'18 Workshop, The Web Conference 2018 (TheWebConf 2018), Lyon, France, 2018, pp. 977–980.

[17] D. Oldman and D. Tanase, Reshaping the Knowledge Graph by connecting researchers, data and practices in ResearchSpace, in: 17th International Semantic Web Conference (ISWC 2018), Monterey, CA, USA, 2018.

[18] K. Tzompanaki and M. Doerr, Fundamental Categories and Relationships for querying CIDOC-CRM based repositories, Technical Report TR-429, ICS-FORTH, 2012.

[19] M. Doerr, A. Kritsotaki and K. Boutsika, Factual argumentation - a core model for assertions making, JOCCH 3(3) (2011), 8:1–8:34.

[20] F. Kräutli and M. Valleriani, CorpusTracer: A CIDOC database for tracing knowledge networks, DSH 33(2) (2018), 336–346.

[21] T. Hubauer, P. Haase and D.M. Herzig, Use Cases of the Industrial Knowledge Graph at Siemens, in: 17th International Semantic Web Conference (ISWC 2018), Monterey, CA, USA, 2018.

[22] L. Lane et al., neXtProt: a knowledge platform for human proteins, Nucleic Acids Research 40(Database issue) (2012), 76–83.

[23] E.L. Willighagen et al., The ChEMBL database as linked opendata, J. Cheminformatics 5 (2013), 23.