Andreas Blumauer, CEO & Managing Partner
Semantic Web Company / PoolParty Semantic Suite
Taxonomy Boot Camp 2017, Washington, DC
Leveraging Taxonomy Management With Machine Learning
INTRODUCTION
[Figure: knowledge graph introducing the speaker and Semantic Web Company. Node labels: Semantic Web Company, Andreas Blumauer, Vienna, Enterprise Knowledge Graphs, Taxonomies, Ontologies, Text Mining. Facts shown: founded 2004, current version 6.0, serves >200 customers. Edge labels: founder & CEO of, developer and vendor of, active at, based on, located, part of, manages, standard for, enriches, editor of, is about, graduates, used for.]
Agenda
▸ Cognitive Computing: Semantic Technologies & Machine Learning
▸ Terms, Concepts, Shadow Concepts
▸ Corpus Analysis & (Shadow) Concept Extraction with PoolParty
▸ A comparison with LSA and Word2Vec
▸ Use Cases
  ▹ Document Annotation & Indexing
  ▹ Text Classification (incl. Benchmarks)
  ▹ Recommender Systems (incl. Use Case)
Cognitive Computing
Combining Semantic Technologies With Machine Learning
A key assumption of this talk
People do not search for documents only, they seek facts about things and smaller chunks of information.
Machines shall help to create links across data silos to give answers to questions.
Converging A.I. Technologies
A quick question at the beginning
Will Artificial Intelligence make Subject Matter Experts obsolete?
Imagine you want to build an application that helps to identify patient-treatment pairings.
Which would you prefer?
Applications based solely on machine learning, applications based only on doctors' knowledge, or a combination of both?
How Semantic Computing and Machine Learning complement each other
[Figure, built up over three slides: first Structured Data, Machine Learning, and Cognitive Applications; then Unstructured Data is added; finally Knowledge Graphs are added.]
Towards a Digital Twin
Proposal for a Cognitive Computing Platform Architecture
Unstructured Data
Structured Data
Knowledge Graphs
Machine Learning
Semantic Layer
IoT & Cognitive Applications
Terms, Concepts, Shadow Concepts
How to make sense of text and data
Terms and co-occurrence models
Document corpus:
- Websites
- PDF, Word, …
- Abstracts from DBpedia
- RSS feeds
[Figure: network of co-occurring terms, labeled Term 1 to Term 10]
- Relevant terms and phrases
- Relevancy of terms
- Co-occurrence between terms
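A co-occurrence model of this kind can be sketched in a few lines. The following is only a minimal illustration, not PoolParty's implementation: the three-sentence corpus, the stopword list, and the function names are all assumptions made for the example, and co-occurrence is counted at document level.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical mini corpus; a real corpus would hold websites, PDFs, abstracts, feeds.
CORPUS = [
    "The retina is examined with an ophthalmoscope.",
    "An ophthalmoscope is diagnostic equipment for the retina.",
    "Diagnostic equipment needs regular calibration.",
]

# Hypothetical stopword list, just large enough for the sample corpus.
STOPWORDS = {"the", "is", "an", "a", "with", "for", "needs"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_counts(docs):
    """Count in how many documents each unordered term pair appears together."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs


counts = cooccurrence_counts(CORPUS)
```

Pairs are stored as alphabetically sorted tuples, so `counts[("ophthalmoscope", "retina")]` gives the number of documents in which both terms occur.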
‘Things’ but not Strings: Using a ‘Semantic Knowledge Graph’
http://www.my.com/taxonomy/62346723
prefLabel
Retina
image
http://www.my.com/images/90546089
http://www.my.com/taxonomy/97345854
prefLabel
Funduscope
altLabel
Ophthalmoscope
http://www.mycom.com/taxonomy/4543567
prefLabel
Diagnostic Equipment
has broader
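The 'things, not strings' idea can be illustrated with a small lookup structure. This is a sketch in plain Python rather than a real SKOS/RDF store; the URIs and labels are copied from the slide, while the assumption that the "has broader" edge runs from Funduscope to Diagnostic Equipment, and all function names, are mine.

```python
# Concept URIs and labels are taken from the slide. The broader link from
# Funduscope to Diagnostic Equipment is an assumption about the figure.
CONCEPTS = {
    "http://www.my.com/taxonomy/62346723": {
        "prefLabel": "Retina",
        "altLabels": [],
        "broader": None,
    },
    "http://www.my.com/taxonomy/97345854": {
        "prefLabel": "Funduscope",
        "altLabels": ["Ophthalmoscope"],
        "broader": "http://www.mycom.com/taxonomy/4543567",
    },
    "http://www.mycom.com/taxonomy/4543567": {
        "prefLabel": "Diagnostic Equipment",
        "altLabels": [],
        "broader": None,
    },
}


def build_label_index(concepts):
    """Map every preferred and alternative label (lowercased) to its concept URI."""
    index = {}
    for uri, data in concepts.items():
        for label in [data["prefLabel"], *data["altLabels"]]:
            index[label.lower()] = uri
    return index


def resolve(label, index):
    """Return the concept URI for a label, or None if the label is unknown."""
    return index.get(label.lower())


def broader_label(uri, concepts):
    """Return the preferred label of the broader concept, if there is one."""
    parent = concepts[uri]["broader"]
    return concepts[parent]["prefLabel"] if parent else None


INDEX = build_label_index(CONCEPTS)
```

Both strings, "Funduscope" and "Ophthalmoscope", resolve to the same URI, so an application that works on URIs treats them as one thing.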
Shadow Concepts
Use co-occurrences between concepts and terms to extract ‘shadow concepts’
This site is a 15th-century Inca site located 2,430 metres above sea level. It is located in Cusco, Peru.
It is situated on a mountain ridge above the Sacred Valley through which the Urubamba River flows. Most archaeologists believe that it was built as an estate for the Inca emperor Pachacuti. Often mistakenly referred to as the "Lost City of the Incas", it is the most familiar icon of Inca civilization. The Incas built the estate around 1450, but abandoned it a century later at the time of the Spanish Conquest.
Inca site
Machu Picchu
Cusco
Inca empire
Inca emperor
Peru
Spanish Conquest
Sacred Valley
Chankas
Lost City
Pachacuti
Example: in addition to the explicitly used concepts and terms, Machu Picchu is extracted from the article as a shadow concept, although the text never names it. As a prerequisite, a representative text corpus has to be provided and analyzed first.
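One way to picture shadow concept extraction is as scoring concepts by the associated terms found in a text. The sketch below is an assumption-laden illustration and not PoolParty's algorithm: the association weights stand in for what a prior corpus analysis might produce, and the threshold, the second concept, and the function name are hypothetical.

```python
# Hypothetical association weights between concepts and terms, standing in for
# the statistics a corpus analysis would provide.
ASSOCIATIONS = {
    "Machu Picchu": {
        "inca site": 0.9,
        "sacred valley": 0.8,
        "pachacuti": 0.8,
        "lost city": 0.7,
        "cusco": 0.6,
    },
    "Lake Titicaca": {"peru": 0.5, "bolivia": 0.9, "reed boats": 0.8},
}

# Shortened version of the article text shown on the slide.
TEXT = (
    "This site is a 15th-century Inca site located in Cusco, Peru. "
    "It is situated above the Sacred Valley. It was built as an estate for "
    "the Inca emperor Pachacuti. It is often referred to as the Lost City of the Incas."
)


def shadow_concepts(text, associations, threshold=2.0):
    """Return concepts that are not named in the text but whose associated
    terms occur in it with a combined weight of at least `threshold`."""
    lowered = text.lower()
    results = {}
    for concept, terms in associations.items():
        if concept.lower() in lowered:
            continue  # explicitly mentioned, so not a shadow concept
        score = sum(weight for term, weight in terms.items() if term in lowered)
        if score >= threshold:
            results[concept] = round(score, 2)
    return results


found = shadow_concepts(TEXT, ASSOCIATIONS)
```

With these made-up weights, only "Machu Picchu" passes the threshold; "Lake Titicaca" shares just the term "peru" with the text and is dropped.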
Corpus Analysis
Use PoolParty for Deep Text Analysis
Bionics
How do we learn from a lot of text?
[Figure: the same input, "The stove is on. The stove is hot!", understood at different levels]
- No model: "Bla bla bla bla. Bla bla bla bla"
- Statistical model/co-occurrences → is related: "Bla stove bla bla. Bla bla bla hot"
- Taxonomical model → is-a abstractions: "Switched on devices are dangerous devices."
- Ontological model → reasoning: "Switched on devices are dangerous, only if the operating temperature is above 100 degrees and the automatic shutdown mechanism is broken."
Graphs + Machine Learning
PoolParty as a supervised learning system
[Figure: PoolParty components and their users. Roles: Content Manager, Integrator, Taxonomist/Ontologist. Components: Thesaurus Server, Extractor, PowerTagging, Index, Corpus Learning/Semantic Analysis, CMS. Edge labels: uses API, is user of, is basis of, annotates, enriches, extends, analyzes, proposes extensions.]
Knowledge graphs as a result of human-machine cooperation
Manually created parts of graph
Supervised learning
Automatically created parts of graph (corpus analysis, RDF transformation, machine learning, …)
PoolParty Corpus Analysis
How taxonomists can extend taxonomies with some help from machine learning algorithms
Candidate concepts derived from sample documents can be easily integrated into the taxonomy. A list of possible candidate concepts is shown per document, or as a list of the most relevant candidates per corpus.
The context of a given taxonomy concept can be visualised with a few mouse clicks. Terms, concepts and shadow concepts can be highlighted per document.
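How candidate concepts might be surfaced from sample documents can be sketched with a simple frequency heuristic. The slides do not describe PoolParty's actual extraction method, so everything here is an assumption for illustration: the two sample documents, the taxonomy labels, the stopword list, and the choice of word bigrams ranked by document frequency.

```python
from collections import Counter
import re

# Hypothetical existing taxonomy labels (lowercased) and sample documents.
TAXONOMY_LABELS = {"solar energy", "wind energy"}
DOCS = [
    "Solar energy and heat pumps reduce emissions. Heat pumps are efficient.",
    "Wind energy complements heat pumps in district heating.",
]
STOPWORDS = {"and", "are", "in"}


def bigrams(text):
    """Return the set of adjacent word pairs that contain no stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }


def candidate_concepts(docs, known_labels, top_n=3):
    """Rank bigrams that are not yet taxonomy labels by document frequency."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(bigrams(doc) - known_labels)
    return doc_freq.most_common(top_n)


top_candidate = candidate_concepts(DOCS, TAXONOMY_LABELS, top_n=1)
```

In this toy corpus "heat pumps" is the only phrase that occurs in both documents and is missing from the taxonomy, so it is proposed first; a taxonomist would still decide whether and where to add it.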
Network-based Knowledge Graph Assessment
Thesaurus Harmonizer
▸ Find missing relationships between concepts which are of high semantic relevance
▸ Point out structural flaws in existing thesauri
▸ Identify corpora that only reflect a fraction of a thesaurus
  ▹ Or, vice versa: identify thesauri that are far too big for their domain applications, and possibly missing details
Use Cases
Benefit from Semantic Knowledge Graphs and Machine Learning
PoolParty Extractor
Extract concepts from text even if not used explicitly
Some domains use text that doesn’t always call a spade a spade. With ‘shadow concept extraction’ those ‘masked’ concepts still can be surfaced.
Since these technologies would have become conventional technologies that are made into products and introduced into market at the time of their introduction, it would be difficult to differentiate them as innovative environmental and energy technologies from other global warming prevention technologies that have already been put to practical use in the industrial, commercial, residential, and energy conversion sectors.- The Innovative Global Warming Prevention Technology Working Group under the Research and Development Subcommittee- Council assessed that innovative global warming prevention technologies would bring about a reduction effect of 7.49 million t-CO2 case of average emissions factor for all power sources of carbon dioxide in 2010. In view of the difficulty in putting innovative carbon dioxide sequestration technology into practical use by 2010, the Working Group reassigned it as an issue of global warming prevention technology to be tackled by 2030. The Central Environment Council, however, has not had the opportunity to examine the contents of these technologies in detail. (Promotion of climate change prevention activities by every social actor)- The Programme encourages every social actor to take actions to prevent global warming. The actions include measures undertaken by the public sector.
Climate Change
PoolParty Semantic Classifier
Text Classification based on Machine Learning and Semantic Knowledge Models
PoolParty Semantic Classifier combines machine learning algorithms (SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
Benchmarking the PoolParty Semantic Classifier
Improvement of 5.2% compared to traditional (term-based) SVM
Features used                                    Classifier   F1 (5 folds)   Variance
Terms                                            LinearSVC    0.83175        0.0008
Concepts from REEGLE + Shadow Concepts           LinearSVC    0.84451        0.0011
Concepts from REEGLE                             LinearSVC    0.84647        0.0009
Terms + Concepts from REEGLE + Shadow Concepts   LinearSVC    0.87474        0.0009
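The headline figure of 5.2% can be reproduced from the table if it is read as the relative F1 gain over the term-based baseline. That reading is an assumption, since the slide does not state how the percentage was derived; the sketch below simply recomputes it from the reported scores.

```python
# F1 scores copied from the benchmark table (LinearSVC, 5 folds).
F1_SCORES = {
    "Terms": 0.83175,
    "Concepts from REEGLE + Shadow Concepts": 0.84451,
    "Concepts from REEGLE": 0.84647,
    "Terms + Concepts from REEGLE + Shadow Concepts": 0.87474,
}


def relative_improvement(score, baseline):
    """Relative gain of `score` over `baseline`, as a fraction of the baseline."""
    return (score - baseline) / baseline


BASELINE = F1_SCORES["Terms"]

# Relative improvement over the term-based baseline, in percent, one decimal.
IMPROVEMENTS = {
    name: round(100 * relative_improvement(score, BASELINE), 1)
    for name, score in F1_SCORES.items()
}
```

Under this reading the combined feature set comes out at 5.2%, while the concept-only feature sets gain roughly 1.5% and 1.8% over plain terms.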
Reegle thesaurus
A comprehensive SKOS taxonomy for the clean energy sector (http://data.reeep.org/thesaurus/guide)
● 3,420 concepts
● 7,280 labels (English version)
● 9,183 relations (broader/narrower + related)
Document training set
1,800 documents in 7 classes: Renewable Energy, District Heating Systems, Cogeneration, Energy Efficiency, Energy (general), Climate Protection, Rural Electrification
Sample Calculation
Based on an improvement of 5.2%
[Figure: Inbound Documents, PoolParty Semantic Classifier, Experienced Agent]
● 100,000 documents (emails, tickets, etc.) per month
● 5 euros extra cost per document when misrouted
● Cost savings per year:
  ○ 1,200,000 x €5.0 x 0.052 = €312,000 per annum
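The sample calculation can be written out as a small function. It follows the slide's simplification that a 5.2% improvement means 5.2% of all documents are no longer misrouted, which is an assumption of the example rather than a measured result; the function and parameter names are illustrative.

```python
def annual_savings(docs_per_month, cost_per_misrouted_doc, improvement):
    """Yearly savings if `improvement` (a fraction of all documents) is no longer misrouted."""
    docs_per_year = docs_per_month * 12
    return docs_per_year * cost_per_misrouted_doc * improvement


# Figures from the slide: 100,000 documents per month, 5 euros per misrouted
# document, and an improvement of 5.2%.
SAVINGS = annual_savings(100_000, 5.0, 0.052)
```

With the slide's inputs this gives 1,200,000 documents per year and savings of about €312,000; halving the improvement halves the savings, so the estimate is only as good as that one assumption.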
Use Shadow Concepts to improve Recommender Systems
Mini Countryman
And it’s probably more of a crossover than ever, with the design to match. Being a Mini, the Countryman is clearly meant to be the driver’s car among small crossovers. The suspension is sophisticated, and there are lots of chassis options (a stiffer sports setup, variable damping, the electronically controlled ALL4 all-wheel-drive).
But it’s also the crossover for people who’ve bags of cash to blow on personalisation and luxury.
There’s been a lot of effort on ramping up the cabin quality, but then the outgoing Countryman was a sad let-down in that department.
On the outside, plastic wheel-arch extensions, with eyebrow creases in the metalwork above, as well as roof bars and sill protectors all add to the visual crossover-ness. This remains the only Mini with angular rather than oval headlamps, and there’s a load of visual posturing going on in the lower face.
There are eight versions at launch, and they’re exactly what you’d expect. It’s Cooper or Cooper S, each fuelled by petrol or diesel, each of them with front drive or ALL4. Oh and an eight-speed auto, too, if you count that as a separate choice. The Cooper petrol is a three-cylinder, the rest fours.
You get extra kit as standard versus the old car, including navigation, Bluetooth, emergency call and park sensors. Upgrades include a bigger touch-screen nav with high-definition traffic, various posher seats, a HUD, and driver aids. Oh and a cushion thingy that folds out from the boot so you can sit on the rear bumper without getting your clothes mucky.
In June 2017 a Cooper E will launch, which has the Cooper three-cylinder petrol driving the front wheels, and an electric motor for the rears, with a capacity to do a claimed 25 miles of gentle all-electric running. So it has the performance of a Cooper S ALL4 with the tax-busting advantages of a plug-in hybrid. And you wouldn’t use any fuel if you commuted a short distance.
The platform is BMW’s contemporary transverse-engined hardware, in the bigger of its two sizes. That means it shares a lot with the BMW X1. The 4WD system is more sophisticated than the previous Countryman’s. The proportion of drive to the rear is computed by a controller that takes into account parameters including grip, steering angle and throttle position, as well as whether you’ve got the sports mode and sports traction systems selected.
Use a Knowledge Graph + Co-occurrences for precise Content Recommendation
[Figure: graph linking episodes through shared concepts and terms. Labels: Raving, De-Void, Scott, attack, Stilinski, friend, shame, O’Brien, woman, married, girl, attractive, love. Callout: "Similar episodes!"]
Example: Find similar episodes
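Finding similar items from shared concepts and terms can be sketched with a plain set-overlap measure. Jaccard similarity is used here as a simple stand-in and is not claimed to be what the presented system does; the assignment of labels to episodes is invented for the example, and "Hypothetical Episode" is not from the slide.

```python
# Hypothetical annotations: which concepts and terms were extracted per episode.
EPISODES = {
    "Raving": {"Scott", "Stilinski", "attack", "friend", "love"},
    "De-Void": {"Scott", "Stilinski", "attack", "shame", "friend"},
    "Hypothetical Episode": {"woman", "married", "girl", "attractive"},
}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(name, items):
    """Rank all other items by Jaccard similarity to the named item, best first."""
    target = items[name]
    scored = [
        (other, round(jaccard(target, tags), 3))
        for other, tags in items.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


RANKING = most_similar("Raving", EPISODES)
```

With these made-up annotations, "De-Void" shares four of six distinct labels with "Raving" and ranks first. A knowledge graph could refine this by also counting labels that are related through broader or related links rather than identical.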
Rules-based Recommender Systems
Example: Wine-to-Cheese Harmonizer
Live Demo
[Figure: knowledge graph for wine and cheese pairing. Wine: Weingut Weinrieder Grüner Veltliner Alte Reben. Cheese: Nagelkaas. Characteristic labels: Dry, Medium-bodied, High acidity, Nutmeg, Full-bodied, Warm finish, Tobacco, Cumin, Clove, Hard cheese, Higher fat. One node is marked "?". Edge labels: is characterized by, matches, does not match.]
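A rules-based recommender of this kind can be sketched as rules over characteristics rather than over individual products. The slide does not show the actual rules, so the two rules below, the assignment of characteristics to the wine, and the second wine profile in the usage note are all hypothetical.

```python
# Characteristic labels come from the slide; which characteristics belong to
# the wine is an assumption about the figure.
WINES = {
    "Weingut Weinrieder Grüner Veltliner Alte Reben": {
        "Dry",
        "Medium-bodied",
        "High acidity",
    },
}
CHEESES = {
    "Nagelkaas": {"Cumin", "Clove", "Hard cheese", "Higher fat"},
}

# Hypothetical rules: (wine characteristic, cheese characteristic, verdict).
RULES = [
    ("High acidity", "Higher fat", "matches"),
    ("Tobacco", "Clove", "does not match"),
]


def pairing(wine_traits, cheese_traits, rules):
    """Apply all rules whose characteristics are present; a negative rule wins."""
    verdicts = [
        verdict
        for wine_trait, cheese_trait, verdict in rules
        if wine_trait in wine_traits and cheese_trait in cheese_traits
    ]
    if "does not match" in verdicts:
        return "does not match"
    if "matches" in verdicts:
        return "matches"
    return "unknown"


RESULT = pairing(
    WINES["Weingut Weinrieder Grüner Veltliner Alte Reben"],
    CHEESES["Nagelkaas"],
    RULES,
)
```

Because the rules refer to characteristics, a wine added later with the profile `{"Full-bodied", "Tobacco"}` would be rejected for Nagelkaas without writing a product-specific rule.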
Why ‘The Knot’ uses Machine Learning and Semantic Models
▹ XO Group has run ‘The Knot’ since 1996
▹ NYSE: XOXO (S&P 600 Component)
▹ 1.5 million active members
▹ The Knot has helped marry 25 million couples
▹ Partnering with 300,000 wedding vendors
▹ Millions of vendor reviews
Thank you for your interest!
Andreas Blumauer, CEO, Semantic Web Company
▸ Mail [email protected]
▸ Company https://www.semantic-web.com
▸ LinkedIn https://www.linkedin.com/in/andreasblumauer
▸ Twitter https://twitter.com/semwebcompany
▸ Blog https://www.linkedin.com/today/author/andreasblumauer
© Semantic Web Company - http://www.semantic-web.com and http://www.poolparty.biz/