ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a...
Copyright Antidot™ 1
Linked Enterprise Data
LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT
ISWC 2012 – BOSTON
FABRICE LACROIX – [email protected]
Antidot – who we are
French-based software vendor since 1999 | Paris, Lyon, Aix-en-Provence
Information access | Data management
Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.
Clients
Publishing, Healthcare, Enterprises, E-commerce
Unstructured documents
files, ECM, collaborative spaces
intranet, extranet, Web sites
e-mails, instant messaging
Structured data
CRM, ERP, directory
knowledge bases
business applications (production, support)
Information systems (IS) are bloated
1 practice => 1 need => 1 application => 1 silo
The information system is driven by the process
Data are numerous, varied and scattered
Solutions or workarounds?
BI, MDM, SOA, Search
Solutions and workarounds
Enterprise Search brings little value to users:
document oriented
does not solve real business problems
Google-like, Verity-like
What we want
[Diagram: the enterprise silos LDAP, CRM, Production, ERP, ECM, Files, Support]
Changing the paradigm
Switching from an application view to a data-centric way of thinking.
Bring out the implicit
Build the Giant Enterprise Graph
LED: Linked Enterprise Data
The application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure
What works for the Web…
Federating silos on the Web
http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
…can’t always be used in corporate IS
Legacy apps can’t be "SPARQL’ed"
80% of the data is un- or semi-structured and doesn’t fit the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
Users don’t want RDF per se, but valuable information
External data is available in XML/JSON through Web services
Staff are trained for RDB, XML and Web apps
"No risk" and stability strategy: SemWeb technology is considered new and immature
The RDF/storage approach
Setting up a global RDF repository does not work either
IT departments are afraid of the "RDF everywhere" activists
Semantic Web technology
is still the right solution in a corporate environment
BUT it is not an aim:
JUST use it as a means
Just do it
Think of it as a stream paradigm:
build new objects using existing data
without interfering with the existing infrastructure
with SemWeb somewhere under the hood
Enterprise Graph HowTo
Construct the graph: generate triples from data, create triples from documents
Leverage the graph: enrich, infer
Browse the graph: select resources, build objects
Trash the graph
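The four steps above can be read as a short-lived pipeline. The following is a minimal sketch under our own assumptions: triples are plain Python tuples in an in-memory set, and the stage functions (`construct`, `enrich`, `build_objects`, `run_pipeline`) and the `ex:` names are hypothetical placeholders, not part of any Antidot product. What it shows is that the graph is only a working structure: the built objects are returned, the graph is not.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], set[Triple]]

def construct(sources: Iterable[Iterable[Triple]]) -> set[Triple]:
    """Construct the graph: union of the triples generated from each source."""
    graph: set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph

def enrich(graph: set[Triple], rules: list[Rule]) -> None:
    """Leverage the graph: add (in place) whatever triples each rule derives."""
    for rule in rules:
        graph |= rule(graph)

def build_objects(graph: set[Triple], seed_type: str) -> dict[str, dict[str, list[str]]]:
    """Browse the graph: one object per resource of seed_type, with its outgoing properties."""
    seeds = {s for (s, p, o) in graph if p == "rdf:type" and o == seed_type}
    objects: dict[str, dict[str, list[str]]] = {}
    for seed in seeds:
        props: dict[str, list[str]] = {}
        for (s, p, o) in graph:
            if s == seed:
                props.setdefault(p, []).append(o)
        objects[seed] = props
    return objects

def run_pipeline(sources, rules: list[Rule], seed_type: str):
    graph = construct(sources)
    enrich(graph, rules)
    objects = build_objects(graph, seed_type)
    del graph  # trash the graph: drop our only reference, keep the objects
    return objects
```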
How: extract & normalize
Harvest and normalize as in an ETL:
fetch, clean, transform…
normalize records (names, IDs) to prepare the linking step
For databases: db2triples, an RDB2RDF implementation by Antidot (open source, W3C validated)
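As an illustration of the normalize step followed by turning a table row into triples, here is a minimal sketch loosely following the W3C Direct Mapping of relational data to RDF (row IRI built from the table name and primary key, one predicate per column, a type triple per row, no triple for NULL). It is not db2triples and does not reproduce its API. The base IRI `http://example.com/`, the `employee` table and the normalization rules are assumptions for illustration, and every value is emitted as a plain literal rather than a typed one.

```python
import re
import unicodedata
from urllib.parse import quote

BASE = "http://example.com/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def normalize_name(name: str) -> str:
    """Drop accents, collapse whitespace, lowercase: 'Hélène  DUPONT' -> 'helene dupont'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", without_accents).strip().lower()

def normalize_id(raw: str) -> str:
    """Keep alphanumerics only and uppercase: ' emp-00042' -> 'EMP00042'."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()

def literal(value) -> str:
    """Plain N-Triples literal with backslashes and double quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'

def direct_map_row(table: str, pk: str, row: dict) -> list[tuple[str, str, str]]:
    """One type triple plus one triple per non-NULL column."""
    t = quote(table, safe="")
    subject = f"<{BASE}{t}/{quote(pk, safe='')}={quote(str(row[pk]), safe='')}>"
    triples = [(subject, f"<{RDF_TYPE}>", f"<{BASE}{t}>")]
    for column, value in row.items():
        if value is None:
            continue  # NULL columns produce no triple
        triples.append((subject, f"<{BASE}{t}#{quote(column, safe='')}>", literal(value)))
    return triples

# Hypothetical raw record from a source system, normalized before mapping
raw = {"id": " emp-00042", "name": "Hélène  DUPONT", "dept": None}
row = {"id": normalize_id(raw["id"]), "name": normalize_name(raw["name"]), "dept": raw["dept"]}
for s, p, o in direct_map_row("employee", "id", row):
    print(s, p, o, ".")
```

Normalizing before mapping is what lets the same person, spelled differently in two silos, end up with matching values for the later linking step.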
How: semantize
Don’t transform everything into RDF:
cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts
interesting == needed for linking or inferring
[Pipeline diagram: Semantize step]
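To make the cherry-picking concrete, here is a small sketch: only the fields needed for linking or inferring become triples, and the rest of the record is kept aside, outside the graph. The CRM record, the `LINK_FIELDS` selection and the `ex:` predicate names are all hypothetical.

```python
# Hypothetical selection: only these fields are used for linking or inference
LINK_FIELDS = {"account_id": "ex:accountId", "owner_email": "ex:ownedBy", "country": "ex:country"}

def semantize(subject: str, record: dict, fields: dict[str, str]):
    """Split a record into triples for the selected non-empty fields and a remainder kept outside the graph."""
    triples = []
    remainder = {}
    for key, value in record.items():
        if key in fields and value not in (None, ""):
            triples.append((subject, fields[key], str(value)))
        else:
            remainder[key] = value  # not needed for linking: stays out of the triplestore
    return triples, remainder

record = {"account_id": "A-17", "owner_email": "alice@example.com", "country": "FR",
          "notes": "free-text comment", "last_invoice_pdf": "/archive/a17.pdf"}
triples, kept_aside = semantize("ex:account/A-17", record, LINK_FIELDS)
```

The remainder does not need to be lost. It can be reattached to the final objects later without ever entering the graph.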
How: semantize
Triples generation
Be smart: avoid upfront ontology design, use small vocabularies
Be pragmatic: transform XML tags and field names into predicates
Be agile: only insert what you need, and when you need more, add more
Semantic Web fuels the modeling, linking and information-building process
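The "be pragmatic" and "be agile" advice can be sketched with Python's standard `xml.etree.ElementTree`. Each leaf element's tag becomes a predicate in a small ad hoc namespace, so no ontology has to be designed up front, and an optional `keep` set limits insertion to what is needed now. The `ex:` prefix, the `<ticket>` record layout and the use of the `id` attribute as the subject are assumptions of this example.

```python
import xml.etree.ElementTree as ET

def xml_to_triples(xml_text: str, prefix: str = "ex:", keep=None) -> list[tuple[str, str, str]]:
    """Turn <record id="..."><tag>value</tag>...</record> into (subject, prefix + tag, value) triples.

    If `keep` is given, only those tags are converted."""
    root = ET.fromstring(xml_text)
    subject = f"{prefix}{root.tag}/{root.get('id')}"
    triples = []
    for element in root.iter():
        if element is root or len(element) > 0:
            continue  # skip the record element itself and any non-leaf element
        if keep is not None and element.tag not in keep:
            continue
        text = (element.text or "").strip()
        if text:
            triples.append((subject, prefix + element.tag, text))
    return triples

doc = """<ticket id="T-301">
  <customer>ACME</customer>
  <product>Router X2</product>
  <status>open</status>
</ticket>"""

print(xml_to_triples(doc, keep={"customer", "product"}))
```

When the `status` field turns out to be needed later, calling `xml_to_triples(doc)` without `keep` (or with a larger set) simply adds its triple. Nothing already in the graph has to change.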
How: semantize
Unstructured documents
Extract metadata and transform it into RDF as needed
➡ e.g. author => dc:creator
Use text mining to extract named entities: people, organizations, products…
➡ generate the entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products
➡ create triples like <doc_URI> quotes <entity_URI>
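Here is a minimal sketch of the two ideas on this slide. It maps an `author` metadata field to Dublin Core `dc:creator`, and it uses a dictionary (gazetteer) lookup as a stand-in for a real text-mining named-entity extractor, with the entity lists drawn from the directory and the CRM. The entity URIs, the `ex:quotes` predicate and the sample data are hypothetical. A production extractor would also handle inflections, ambiguity and overlapping names.

```python
import re

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
QUOTES = "ex:quotes"  # hypothetical predicate for "document quotes entity"

# Entity lists generated from the data sources (hypothetical sample content)
directory = {"Alice Martin": "ex:employee/42"}  # from the LDAP directory
crm = {"ACME Corp": "ex:company/17"}            # from the CRM

def build_gazetteer(*sources: dict) -> list[tuple[re.Pattern, str]]:
    """Compile one whole-word, case-insensitive pattern per known entity label."""
    gazetteer = []
    for source in sources:
        for label, uri in source.items():
            pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
            gazetteer.append((pattern, uri))
    return gazetteer

def semantize_document(doc_uri: str, metadata: dict, text: str, gazetteer) -> set[tuple[str, str, str]]:
    triples = set()
    if "author" in metadata:
        triples.add((doc_uri, DC_CREATOR, metadata["author"]))  # author => dc:creator
    for pattern, entity_uri in gazetteer:
        if pattern.search(text):
            triples.add((doc_uri, QUOTES, entity_uri))  # doc quotes entity
    return triples

gaz = build_gazetteer(directory, crm)
triples = semantize_document("ex:doc/minutes-2012-10", {"author": "Bob Durand"},
                             "Alice Martin presented the renewal offer to ACME Corp.", gaz)
```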
How: semantize
Unstructured documents
Compare documents using various dedicated algorithms:
➡ is the same
➡ is included
➡ is similar
➡ is related
This generates new triples
➡ create triples like <docA> is_sub_version_of <docB>
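The slides do not name the comparison algorithms. As one plausible stand-in, this sketch compares sets of word shingles (overlapping k-word windows) and derives "same", "included" and "similar" from set equality, containment and Jaccard similarity. The shingle size, the 0.5 threshold and the predicate names `ex:isSameAs`, `ex:isIncludedIn` and `ex:isSimilarTo` are assumptions. "Is related" is not covered here.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (overlapping word windows) of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def compare(uri_a: str, text_a: str, uri_b: str, text_b: str, k: int = 3, similar_threshold: float = 0.5):
    """Return at most one triple relating document A to document B, or None."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a or not b:
        return None
    shared = len(a & b)
    if a == b:
        return (uri_a, "ex:isSameAs", uri_b)
    if shared == len(a):  # every shingle of A also occurs in B
        return (uri_a, "ex:isIncludedIn", uri_b)
    jaccard = shared / len(a | b)
    if jaccard >= similar_threshold:
        return (uri_a, "ex:isSimilarTo", uri_b)
    return None

base = "the quarterly report covers sales in france and germany"
draft = "the quarterly report covers sales in france"
variant = "the quarterly report covers sales in france and spain"
```

For example, `draft` is included in `base`, while `variant` shares 6 of 8 distinct shingles with it (Jaccard 0.75) and is classified as similar.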
How: enrich
Enrich the graph:
run specific algorithms to generate more links and triples (classifiers, topic detection, …)
insert external data gathered from the LOD (Linked Open Data) cloud or other external datasets or APIs
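As a deliberately simple stand-in for the classifiers and topic detection mentioned above, this sketch assigns topics by keyword overlap and writes them back into the graph as new triples. The topic vocabulary, the keyword lists, the `min_hits` threshold and the `ex:hasTopic` predicate are hypothetical. A real deployment would use a trained classifier. External data would be merged the same way, as extra triples.

```python
import re

TOPIC_KEYWORDS = {  # hypothetical topic vocabulary
    "ex:topic/networking": {"router", "firewall", "vpn", "switch"},
    "ex:topic/billing": {"invoice", "payment", "refund", "price"},
}

def detect_topics(text: str, min_hits: int = 2) -> list[str]:
    """Topics whose keyword set shares at least min_hits words with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, keywords in TOPIC_KEYWORDS.items()
                  if len(words & keywords) >= min_hits)

def enrich_graph(graph: set, texts: dict[str, str]) -> set:
    """Add (doc, ex:hasTopic, topic) triples to graph in place and return the new ones."""
    new = {(doc, "ex:hasTopic", topic) for doc, text in texts.items() for topic in detect_topics(text)}
    graph |= new
    return new
```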
How: infer
Create new knowledge: add rules according to your needs
IF a coworker is quoted in documents
AND this coworker belongs to a business unit
THEN the business unit is bound to the documents
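The rule above amounts to a join. Each "quotes" triple is matched with the "belongs to" triples of the quoted coworker, and a "bound to" triple is added for every match. This sketch applies it with naive forward chaining over an in-memory set of triples, repeating until no new triple appears. The predicate names `ex:quotes`, `ex:belongsTo` and `ex:boundTo` are hypothetical. In a triplestore, the same rule could instead be written as a SPARQL CONSTRUCT or INSERT query.

```python
Triple = tuple[str, str, str]

def bu_rule(graph: set[Triple]) -> set[Triple]:
    """IF doc quotes coworker AND coworker belongsTo unit THEN unit boundTo doc."""
    unit_of: dict[str, set[str]] = {}
    for s, p, o in graph:
        if p == "ex:belongsTo":
            unit_of.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in graph:
        if p == "ex:quotes":
            for unit in unit_of.get(o, ()):
                inferred.add((unit, "ex:boundTo", s))
    return inferred

def infer(graph: set[Triple], rules) -> set[Triple]:
    """Apply every rule until no new triple appears (naive forward chaining, in place)."""
    while True:
        new = set().union(*(rule(graph) for rule in rules)) - graph
        if not new:
            return graph
        graph |= new
```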
How: build
Build
select the resources corresponding to object seeds (using SPARQL queries)
for each seed, follow links smartly in order to create basic objects
[Pipeline diagram: Build step]
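Here is a sketch of the build step without a triplestore. Seeds are selected with a pattern match that plays the role of the SPARQL query (roughly `SELECT ?s WHERE { ?s rdf:type ex:Customer }`), and links are then followed breadth-first from each seed. "Following links smartly" is modeled here as an allow-list of predicates to traverse plus a depth limit. Both choices, like the `ex:` names, are assumptions.

```python
from collections import deque

Triple = tuple[str, str, str]

def select_seeds(graph: set[Triple], rdf_type: str) -> list[str]:
    """Rough equivalent of SELECT ?s WHERE { ?s rdf:type <rdf_type> }."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == rdf_type)

def build_object(graph: set[Triple], seed: str, follow: set[str], max_depth: int = 2) -> dict:
    """Collect the properties of the seed and of the resources reached through `follow` predicates."""
    by_subject: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((p, o))
    obj: dict[str, dict[str, list[str]]] = {}
    queue = deque([(seed, 0)])
    seen = {seed}
    while queue:
        node, depth = queue.popleft()
        props = obj.setdefault(node, {})
        for p, o in sorted(by_subject.get(node, [])):
            props.setdefault(p, []).append(o)
            if p in follow and depth < max_depth and o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return obj

graph = {
    ("ex:c/17", "rdf:type", "ex:Customer"), ("ex:c/17", "ex:name", "ACME Corp"),
    ("ex:c/17", "ex:hasContract", "ex:k/5"), ("ex:k/5", "ex:endDate", "2013-06-30"),
    ("ex:k/5", "ex:product", "ex:p/X2"), ("ex:p/X2", "ex:label", "Router X2"),
}
```

With `max_depth=1`, the customer object includes its contract but stops before the product. With the default depth of 2, it reaches the product label too.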
How: build
Finalize
decorate the new knowledge objects with data set apart (not loaded into the triplestore)
now we have rich, user-actionable objects
[Pipeline diagram: Build and Finalize steps]
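Finalizing can be as simple as a merge. The object built from the graph is decorated with fields that were deliberately kept out of the triplestore, such as file paths or large text. In this sketch, `side_store` is a hypothetical stand-in for that external storage, keyed by resource URI, and the `details` key is our own naming choice.

```python
import copy

def finalize(obj: dict, side_store: dict[str, dict]) -> dict:
    """Return a decorated copy: each resource gets its set-apart data under a 'details' key."""
    result = copy.deepcopy(obj)  # leave the built object untouched
    for uri, props in result.items():
        extra = side_store.get(uri)
        if extra:
            props["details"] = dict(extra)
    return result
```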
How: expose
Make the new information available to users and to the entire IS
[Diagram: processing steps (Harvest, Normalize, Annotate, Classify, Semantize, Enrich) feeding three outputs: indexation in the AFS search engine, an RDF triplestore (Linked Data), and a relational DB]
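To illustrate publishing the same objects to several targets, this sketch serializes one object in three ways using only the standard library: a JSON document of the kind a search engine could index, N-Triples lines for a triplestore, and rows in an in-memory SQLite table. The AFS search engine's actual ingestion format is not described in the slides. The JSON shape, the table layout and the IRIs are therefore all assumptions, and values are written as plain literals for simplicity.

```python
import json
import sqlite3

def to_json(uri: str, props: dict[str, list[str]]) -> str:
    """Flat JSON document keyed by the local name of each predicate (hypothetical index format)."""
    fields = {pred.rsplit("#", 1)[-1]: values for pred, values in props.items()}
    return json.dumps({"id": uri, **fields}, sort_keys=True)

def to_ntriples(uri: str, props: dict[str, list[str]]) -> list[str]:
    """N-Triples lines, one per value, with quotes and backslashes escaped."""
    lines = []
    for pred, values in sorted(props.items()):
        for v in values:
            escaped = v.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{uri}> <{pred}> "{escaped}" .')
    return lines

def to_sql(conn: sqlite3.Connection, uri: str, props: dict[str, list[str]]) -> None:
    """Store the object as (subject, property, value) rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS objects (subject TEXT, property TEXT, value TEXT)")
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)",
                     [(uri, p, v) for p, vs in props.items() for v in vs])
    conn.commit()
```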
Conclusion
It works!
The triples we create and the inference rules we add are dictated by the goal / application
➡ usage and value oriented
We benefit from the lazy, flexible, dynamic modeling of RDF/RDFS/OWL
➡ we are agile
What matters is the graph. But the graph is not the triplestore
➡ storage independent
There’s an app for that
Antidot Information Factory: a software solution designed specifically to
leverage structured and unstructured data
enable large-scale processing of existing data
automate publishing of enriched or newly created information
Harvest, Normalize, Semantize, Enrich, Build, Expose
The Giant Enterprise Graph
Now we have a path to let SemWeb enter the enterprise
THANKS FOR YOUR ATTENTION
QUESTIONS?
Discuss, Understand, Learn, Exchange