ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a...

Copyright Antidot™

Linked Enterprise Data

LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT

ISWC 2012 – BOSTON

FABRICE LACROIX


Antidot – who we are

French-based Software Vendor Since 1999 | Paris, Lyon, Aix-en-Provence

Information access | Data management

Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.


Clients

Publishing

Healthcare

Enterprises

E-commerce

http://www.batiactu.com/

http://www.canalplus.fr/

http://www.lepoint.fr/

http://www.lemoniteur.fr/

http://www.nrj.fr/

http://www.quechoisir.org/

http://www.tf1.fr/

http://www.recherchesante.fr/

http://www.afpa.fr/

http://www.ca-centrest.fr/

http://www.apce.com/

http://www.annoncesjaunes.fr/

http://www.rhonealpes.fr/

http://www.ameli.fr/

http://www.inserm.fr/

http://www.securite-sociale.fr/

http://www.nissan.fr/

http://www.cultura.com/

http://www.damart.fr/

http://fr.decathlon.com/

http://www.king-jouet.com/

http://www.welcomeoffice.com/

http://www.bhv.fr/

http://www.discounteo.com/

http://www.feuvert.fr/

http://www.camaieu.fr/

http://www.milongamusic.com/


Unstructured documents

files, ECM, collaborative spaces
intranet, extranet, Web sites
e-mails, instant messaging


Structured data

CRM, ERP, directory
knowledge bases
business applications (production, support)


IS are bloated

1 practice => 1 need => 1 application => 1 silo
The information system is driven by the process
Data are numerous, varied and scattered


Solutions or workarounds?

BI | MDM

SOA | Search


Solutions and workarounds

Enterprise Search brings little value to users
Document oriented
Does not solve real business problems

Google-like | Verity-like



What we want

LDAP

CRM

Production

ERP

ECM

Files

Support


Changing the paradigm

Switching from an application view to a data-centric way of thinking.


Bring out the implicit

Build the Giant Enterprise Graph


LED

Linked Enterprise Data: the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure


What works for the Web…

Federating silos on the Web

http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
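Here is a minimal Python sketch of the principle on this slide. It makes three assumptions: triples are plain `(subject, predicate, object)` tuples, neither silo uses blank nodes, and both silos already use the same URI for the same person. All `example.org` URIs are hypothetical. Under those assumptions, merging RDF graphs is simply a set union, and the shared URI links the silos.

```python
# Minimal sketch: an RDF graph modelled as a set of (subject, predicate, object) tuples.
# All URIs below are hypothetical examples.
EX = "http://example.org/"

crm = {
    (EX + "company/42", EX + "name", "ACME"),
    (EX + "company/42", EX + "accountManager", EX + "person/jdoe"),
}
directory = {
    (EX + "person/jdoe", EX + "name", "John Doe"),
    (EX + "person/jdoe", EX + "memberOf", EX + "bu/sales"),
}

# Without blank nodes, merging two RDF graphs is the union of their triple sets;
# identical URIs in both silos become the join points.
merged = crm | directory

def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Follow a path that crosses both silos: company -> account manager -> business unit.
manager = next(iter(objects(merged, EX + "company/42", EX + "accountManager")))
print(objects(merged, manager, EX + "memberOf"))
```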


…can’t always be used in corporate IS
Legacy apps can’t be "SPARQL’ed"
80% of the data is un- or semi-structured and doesn’t fit in the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
We don’t want RDF per se, but valuable information
External data is available in XML/JSON through Web Services
Staff are trained for RDB, XML, Web apps
No-risk and stability strategy: SemWeb technology is considered new and immature


The RDF/storage approach

Setting up a global RDF repository does not work either
IT departments are afraid of the "RDF everywhere" activists


Semantic Web technology is still the right solution in a corporate environment
BUT it is not a goal in itself
JUST use it as a means


Just do it

Think of it as a stream paradigm:
build new objects using existing data
without interfering with the existing infrastructure
with SemWeb somewhere under the hood


Enterprise Graph HowTo

Construct the graph: generate triples from data, create triples from documents

Leverage the graph: enrich, infer

Browse the graph: select resources, build objects

Trash the graph


How: extract & normalize

Harvest and normalize as in an ETL: fetch, clean, transform…
Normalize records (names, IDs) to prepare the linking step

For databases: db2triples, an RDB2RDF implementation by Antidot (open source, W3C validated)
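The two steps above can be sketched in Python. The normalization rules (accent and case folding for names, alphanumeric-only IDs) and the base IRI are assumptions chosen for illustration. `row_to_triples` follows the URI patterns of the W3C Direct Mapping, but only for literal columns: it skips foreign-key references and uses untyped string values. It is not the db2triples API.

```python
import unicodedata
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def normalize_name(name):
    """Fold accents, case and whitespace so that 'Hélène  DUPONT' matches 'helene dupont'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def normalize_id(raw_id):
    """Keep only alphanumerics, upper-cased: ' cu-0042 ' -> 'CU0042' (assumed rule)."""
    return "".join(c for c in raw_id if c.isalnum()).upper()

def row_to_triples(table, pk_columns, row):
    """Map one row to triples using Direct Mapping style IRIs (literal columns only)."""
    table_iri = BASE + quote(table, safe="")
    # Row IRI: <base><table>/<pk1>=<v1>;<pk2>=<v2>...
    subject = table_iri + "/" + ";".join(
        quote(col, safe="") + "=" + quote(str(row[col]), safe="") for col in pk_columns
    )
    triples = [(subject, RDF_TYPE, table_iri)]
    for column, value in row.items():
        if value is None:  # SQL NULLs produce no triple
            continue
        # Literal property IRI: <base><table>#<column>
        triples.append((subject, table_iri + "#" + quote(column, safe=""), str(value)))
    return triples

row = {"id": normalize_id(" cu-0042 "), "name": normalize_name("Hélène  DUPONT"), "phone": None}
for triple in row_to_triples("customer", ["id"], row):
    print(triple)
```

Normalizing before mapping makes the same customer produce the same row IRI in every source, and that shared IRI is what the linking step relies on.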


How: semantize

Don’t transform everything into RDF: cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts

interesting == needed for linking or inferring
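Here is a minimal sketch of the cherry-picking idea, assuming records arrive as Python dicts. Only the fields listed in a hypothetical `LINK_FIELDS` map (those needed for linking or inferring) become triples. Everything else stays out of the graph.

```python
EX = "http://example.org/"  # hypothetical namespace

# Hypothetical selection: field name -> predicate, only for fields used in linking/inference.
LINK_FIELDS = {
    "customer_id": EX + "customerId",
    "account_manager": EX + "accountManager",
    "product_ref": EX + "concernsProduct",
}

def semantize(record, record_uri, field_map=LINK_FIELDS):
    """Return triples for the selected, non-empty fields; other fields are ignored."""
    return [
        (record_uri, predicate, record[field])
        for field, predicate in field_map.items()
        if record.get(field) not in (None, "")
    ]

ticket = {
    "customer_id": "CU0042",
    "account_manager": "jdoe",
    "product_ref": "P-17",
    "body": "Long free-text description...",  # not needed for linking: left out
    "attachments": ["log.txt"],                # left out as well
}
triples = semantize(ticket, EX + "ticket/981")
```

When a later use case needs another field, you add one entry to the map and leave the rest of the pipeline unchanged.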


How: semantize

Triples generation
Be smart: avoid upfront ontology design, use small vocabularies
Be pragmatic: transform XML tags and field names into predicates
Be agile: only insert what you need. And when you need more, add more.

Semantic Web fuels the modeling, linking and information building process
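The "be pragmatic" rule can be sketched with Python's standard `xml.etree.ElementTree`. Each leaf element becomes a triple whose predicate is the tag name appended to a small, hypothetical vocabulary namespace. The sketch assumes tags without XML namespaces and ignores attributes.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/vocab/"  # hypothetical small vocabulary namespace

def xml_to_triples(xml_text, subject):
    """Turn each non-empty leaf element into (subject, VOCAB + tag, text)."""
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():  # document order, root included
        if len(element) == 0 and element.text and element.text.strip():
            triples.append((subject, VOCAB + element.tag, element.text.strip()))
    return triples

order = """<order>
  <customer>CU0042</customer>
  <lines>
    <product>P-17</product>
    <product>P-23</product>
  </lines>
</order>"""

triples = xml_to_triples(order, "http://example.org/order/7")
```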


How: semantize

Unstructured documents
Extract metadata and transform it into RDF as needed
➡ Ex: author => dc:creator
Use text mining to extract named entities: people, organizations, products…
➡ generate those entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products
➡ create triples like <doc_URI> quotes <entity_URI>
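Here is a minimal sketch of both steps. It assumes the entity lists have already been exported from the directory, CRM and ERP into a hypothetical `GAZETTEER` dict. Plain dictionary matching with word boundaries stands in for a real text-mining engine. The Dublin Core predicates are real terms; the other URIs are made up.

```python
import re

EX = "http://example.org/"  # hypothetical namespace
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Hypothetical mapping from document metadata keys to RDF predicates.
METADATA_MAP = {"author": DC_CREATOR, "title": DC_TITLE}

# Hypothetical gazetteer built from the data sources: surface form -> entity URI.
GAZETTEER = {
    "John Doe": EX + "person/jdoe",      # from the directory
    "ACME": EX + "company/42",           # from the CRM
    "Widget 3000": EX + "product/P-17",  # from the ERP
}

def metadata_triples(doc_uri, metadata):
    """Map known metadata keys to predicates; unknown keys are dropped."""
    return [(doc_uri, METADATA_MAP[k], v) for k, v in metadata.items() if k in METADATA_MAP]

def quote_triples(doc_uri, text):
    """Emit <doc> quotes <entity> for every gazetteer entry found as a whole phrase."""
    return {
        (doc_uri, EX + "quotes", entity_uri)
        for surface, entity_uri in GAZETTEER.items()
        if re.search(r"\b" + re.escape(surface) + r"\b", text)
    }

doc = EX + "doc/report-2012"
meta = metadata_triples(doc, {"author": "John Doe", "pages": 12})
quotes = quote_triples(doc, "ACME ordered 40 Widget 3000 units via John Doe.")
```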


How: semantize

Unstructured documents
Compare documents using various dedicated algorithms
➡ is the same
➡ is included
➡ is similar
➡ is related
Generate new triples
➡ create triples like <docA> is_sub_version_of <docB>
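The slide does not detail the comparison algorithms. This sketch shows one possible approach: word 3-gram shingles and Jaccard similarity. The thresholds and predicate names are purely illustrative assumptions. When all of one document's shingles appear in another, the sketch reports it as included, and a rule for `is_sub_version_of` could build on that evidence.

```python
EX = "http://example.org/"  # hypothetical namespace

def shingles(text, k=3):
    """Set of k-word sequences; texts shorter than k words yield one shorter shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def compare(uri_a, text_a, uri_b, text_b, similar=0.5, related=0.1):
    """Return one triple describing how document A relates to document B, or None."""
    a, b = shingles(text_a), shingles(text_b)
    jaccard = len(a & b) / len(a | b)  # never divides by zero: shingle sets are non-empty
    if jaccard == 1.0:
        relation = "isSameDocumentAs"
    elif a <= b:  # every shingle of A also appears in B
        relation = "isIncludedIn"
    elif jaccard >= similar:
        relation = "isSimilarTo"
    elif jaccard >= related:
        relation = "isRelatedTo"
    else:
        return None
    return (uri_a, EX + relation, uri_b)

draft = "the quarterly report covers sales in france and germany"
final = draft + " and adds a forecast"
print(compare(EX + "docA", draft, EX + "docB", final))
```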


How: enrich

Enrich the graph
run specific algorithms to generate more links and triples (classifiers, topic detection, …)
insert external data gathered from the LOD cloud or other external datasets or APIs
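This sketch shows two kinds of enrichment on hypothetical data. The first is a keyword-based topic detector, a deliberately simple stand-in for a real classifier. The second adds an `owl:sameAs` link to an external resource that shares an identifier with a local one. The topic keywords, the identifier predicate and the external URI are all assumptions.

```python
import re

EX = "http://example.org/"  # hypothetical namespace
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical topic model: a handful of keywords per topic.
TOPICS = {
    "billing": {"invoice", "payment", "refund"},
    "logistics": {"delivery", "shipping", "warehouse"},
}

def detect_topics(doc_uri, text, min_hits=2):
    """Tag the document with every topic matching at least min_hits keywords."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        (doc_uri, EX + "hasTopic", EX + "topic/" + topic)
        for topic, keywords in TOPICS.items()
        if len(words & keywords) >= min_hits
    }

def link_external(graph, id_predicate, external_ids):
    """Link local resources to external ones that share the same identifier value."""
    return {
        (s, OWL_SAME_AS, external_ids[o])
        for s, p, o in graph
        if p == id_predicate and o in external_ids
    }

graph = {(EX + "customer/42", EX + "customerId", "CU0042")}
graph |= detect_topics(EX + "ticket/981", "Refund requested: the invoice was charged twice.")
# Hypothetical external dataset, already fetched and indexed by customer ID.
graph |= link_external(graph, EX + "customerId", {"CU0042": "http://example.org/lod/acme"})
```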


How: infer

Create new knowledge: add rules according to your needs

IF a coworker is quoted in documents

AND this coworker belongs to a business unit

THEN the business unit is bound to the documents
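The rule above can be written as a function over the triple set. Naive forward chaining then applies it until nothing new is produced. This is a sketch: the predicate names are hypothetical, and the equivalent SPARQL `CONSTRUCT` query in the comment assumes an `ex:` prefix bound to `http://example.org/`.

```python
EX = "http://example.org/"  # hypothetical namespace
QUOTES = EX + "quotes"
MEMBER_OF = EX + "memberOf"
BOUND_TO = EX + "boundToDocument"  # hypothetical: "the business unit is bound to the document"

# Same rule as a SPARQL CONSTRUCT query, for a triplestore-based setup:
#   CONSTRUCT { ?bu ex:boundToDocument ?doc }
#   WHERE { ?doc ex:quotes ?coworker . ?coworker ex:memberOf ?bu }
def business_unit_rule(graph):
    """IF ?doc quotes ?coworker AND ?coworker memberOf ?bu THEN ?bu boundToDocument ?doc."""
    units = {}
    for s, p, o in graph:
        if p == MEMBER_OF:
            units.setdefault(s, set()).add(o)
    return {
        (bu, BOUND_TO, doc)
        for doc, p, coworker in graph
        if p == QUOTES
        for bu in units.get(coworker, ())
    }

def infer(graph, rules):
    """Naive forward chaining: apply every rule until no new triple is produced."""
    graph = set(graph)
    while True:
        new = set().union(*(rule(graph) for rule in rules)) - graph
        if not new:
            return graph
        graph |= new

facts = {
    (EX + "doc/1", QUOTES, EX + "person/jdoe"),
    (EX + "person/jdoe", MEMBER_OF, EX + "bu/sales"),
    (EX + "doc/2", QUOTES, EX + "person/unknown"),  # no business unit known: nothing inferred
}
closure = infer(facts, [business_unit_rule])
```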


How: build

Build
select resources corresponding to object seeds (using SPARQL queries)
for each seed, follow links smartly in order to create basic objects
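The slide does not say how links are followed "smartly". This sketch assumes one simple policy: expand linked resources up to a fixed depth and never revisit a resource already on the current path. Seed selection is a Python stand-in for the SPARQL query shown in its docstring, and the data is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace

def select_seeds(graph, rdf_class):
    """Stand-in for the SPARQL query SELECT ?s WHERE { ?s a <rdf_class> }."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == rdf_class)

def build_object(graph, uri, depth=2, seen=frozenset()):
    """Gather uri's properties, expanding linked resources up to `depth` hops, skipping cycles."""
    subjects = {s for s, _, _ in graph}
    seen = seen | {uri}
    obj = {"uri": uri}
    for s, p, o in sorted(graph):
        if s != uri or p == RDF_TYPE:
            continue
        key = p.rsplit("/", 1)[-1]  # local name used as the object's field name
        if o in subjects and depth > 0 and o not in seen:
            value = build_object(graph, o, depth - 1, seen)
        else:
            value = o
        obj.setdefault(key, []).append(value)
    return obj

g = {
    (EX + "customer/42", RDF_TYPE, EX + "Customer"),
    (EX + "customer/42", EX + "name", "ACME"),
    (EX + "customer/42", EX + "accountManager", EX + "person/jdoe"),
    (EX + "person/jdoe", EX + "name", "John Doe"),
    (EX + "person/jdoe", EX + "memberOf", EX + "bu/sales"),
    (EX + "bu/sales", EX + "label", "Sales"),
}
seeds = select_seeds(g, EX + "Customer")
obj = build_object(g, seeds[0])
```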


How: build

Finalize
decorate the new knowledge objects with data set apart (not loaded in the triplestore)
now we have rich, user-actionable objects
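Decoration can be sketched as a recursive merge of side data into the objects built from the graph. The `SIDE_DATA` store and its fields are hypothetical. In practice it could be any lookup keyed by resource URI whose data was never loaded into the triplestore.

```python
# Hypothetical side store: data kept out of the triplestore, keyed by resource URI.
SIDE_DATA = {
    "http://example.org/customer/42": {"logo": "acme.png", "open_tickets": 3},
    "http://example.org/person/jdoe": {"photo": "jdoe.jpg"},
}

def finalize(obj, side_data=SIDE_DATA):
    """Return a decorated copy of a built object; nested objects are decorated recursively."""
    decorated = {
        key: [finalize(v, side_data) if isinstance(v, dict) else v for v in value]
        if isinstance(value, list) else value
        for key, value in obj.items()
    }
    decorated.update(side_data.get(obj["uri"], {}))
    return decorated

customer = {
    "uri": "http://example.org/customer/42",
    "name": ["ACME"],
    "accountManager": [{"uri": "http://example.org/person/jdoe", "name": ["John Doe"]}],
}
card = finalize(customer)
```

Returning a copy leaves the graph-derived object untouched, so you can rebuild and redecorate it whenever the side data changes.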


How: expose

Make the new information available to users and to the entire IS

Harvest | Normalize | Semantize | Annotate | Classify | Enrich

Indexing: AFS search engine

RDF triplestore (Linked Data)

Relational DB
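Exposing the same results to several targets mostly means serializing them in each target's expected format. This sketch writes triples as N-Triples lines for a triplestore and writes objects as JSON documents for a search index. It assumes every value starting with `http://` or `https://` is an IRI and every other value is a plain string literal. It does not use any AFS API.

```python
import json

def nt_term(value):
    """Serialize an object term: assumed IRIs as <...>, everything else as an escaped literal."""
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def to_ntriples(triples):
    """One N-Triples statement per line, sorted for stable output."""
    return "".join(f"<{s}> <{p}> {nt_term(o)} .\n" for s, p, o in sorted(triples))

def to_index_document(obj):
    """Serialize a built object as a JSON document for a search index (field names assumed)."""
    return json.dumps(obj, ensure_ascii=False, sort_keys=True)

print(to_ntriples({("http://example.org/customer/42", "http://example.org/name", "ACME")}), end="")
print(to_index_document({"uri": "http://example.org/customer/42", "name": ["ACME"]}))
```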


Conclusion

It works!
The triples we create and the inference rules we add are dictated by the goal / application
➡ usage and value oriented
We benefit from the lazy, flexible, dynamic modeling of RDF/RDFS/OWL
➡ we are agile
What matters is the graph. But the graph is not the triplestore
➡ storage independent


There’s an app for that

Antidot Information Factory: a software solution designed specifically to
leverage structured and unstructured data
enable large-scale processing of existing data
automate publishing of enriched or newly created information

Harvest | Normalize | Semantize | Enrich | Build | Expose


The Giant Enterprise Graph

Now we have a path to let SemWeb enter the enterprise


THANKS FOR YOUR ATTENTION
QUESTIONS?

Discuss | Understand | Learn | Exchange
