When and Why to use a Classifier? - Protégé · Reasons to classify (1) • Managing Compositional...


When and Why to use a Classifier?

Alan Rector
with acknowledgement to
Jeremy Rogers, Pieter Zanstra, & the GALEN Consortium
Nick Drummond, Matthew Horridge, Hai Wang in CO-ODE/HyOntUSE
Information Management Group, Dept of Computer Science, U Manchester
Holger Knublauch, Ray Fergerson, … and the Protégé-Owl Team

[email protected]  co-ode-[email protected]
www.co-ode.org
protege.stanford.org
www.opengalen.org

OpenGALEN

Reasons to classify (1)
• Managing Compositional ontologies / Terminologies
  – “Conceptual Lego”
    • Managing combinatorial explosions - the exploding bicycle
  – Empowering users
    • “Just in time” ontologies
      – Give the users the Lego set with limited connectors
  – Organising polyhierarchies / Modularizing ontologies
    • “Normalising ontologies”
    • Multiaxial indexing of resources
  – Providing multiple views
    • Reorganising the ontology by new abstractions
  – Constraining ontologies & schemas
    • Enforcing constraints
    • Imposing policies
      – For clinical Statements with SNOMED entries in request mode → context is request

Reasons to classify (2)
• ‘Matching’ instances against classes
  – Resource/service discovery
  – Self-describing storage
    • ‘Archetypes’ & templates
• Providing a skeleton for default reasoning & Prototypes (but not to do the reasoning itself)
  – Molluscs typically have shells
    • Cephalopods are kinds of Molluscs but typically do not have shells
      – Nautiloids are kinds of Cephalopods but typically do have shells
        » Nautilus ancestors are kinds of Nautiloids but do (did) not have shells
  – Biology is full of exceptions

Classification is about Classes
• Classification works for
  – Organising & constraining classes / schemas
  – Identifying the classes to which an instance definitely belongs
    • Or those to which it cannot belong
• Classification is open world
  – Negation as unsatisfiability
    • ‘not’ == ‘impossible’ (“unsatisfiable”)
  – Databases, logic programming, PAL, queries etc. are closed world
    • Negation as failure
      – ‘not’ == cannot be found
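
The two readings of ‘not’ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drug names `DrugA` to `DrugC`, the `licensedFor` property and the `PROVEN_IMPOSSIBLE` set are hypothetical placeholders, and a real classifier would derive unsatisfiability from axioms rather than look it up in a set.

```python
# Toy knowledge base; every name here is a hypothetical placeholder.
KNOWN_TRUE = {("DrugA", "licensedFor", "Asthma")}
# Stand-in for what a classifier could prove unsatisfiable from axioms.
PROVEN_IMPOSSIBLE = {("DrugB", "licensedFor", "Asthma")}


def closed_world_not(fact):
    """Negation as failure: 'not' means the fact cannot be found."""
    return fact not in KNOWN_TRUE


def open_world_not(fact):
    """Negation as unsatisfiability: 'not' means the fact is impossible."""
    return fact in PROVEN_IMPOSSIBLE


def open_world_status(fact):
    """Open world answers have three outcomes, not two."""
    if fact in KNOWN_TRUE:
        return "true"
    if fact in PROVEN_IMPOSSIBLE:
        return "false"
    return "unknown"


unstated = ("DrugC", "licensedFor", "Asthma")
print(closed_world_not(unstated))   # True: absent, so assumed false
print(open_world_not(unstated))     # False: absent, but not impossible
print(open_world_status(unstated))  # unknown
```

The same unstated fact is "false" to a database-style query and merely "unknown" to an open world reasoner, which is why closed world questions are better sent elsewhere.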

Reasons not to Classify
• To query large numbers of instances
  – Open world (“A-Box”) reasoning does not work over large numbers of instances
• If the question is closed world
    • E.g. “Drugs licensed for treatment of asthma”
• If the query requires non-DL reasoning
    • E.g. numerical, optimisation, probabilistic, …
  – Would like to have a more powerful hybrid reasoner
• For Metadata and Higher Order Information
  – Classifiers are strictly first order
    • A few things can be ‘kluged’
• If there are complex defaults and exceptions
  – “Prototypical Knowledge”
    • E.g. “Molluscs typically have shells”
  – NB Simple exceptions can be handled, but require care

Use instead
• To query large numbers of instances OR if the query is closed world
  – Queries / constraints over databases
  – Instance stores / triple stores / …
  – Rules
    • DL-programming
    • JESS, Algernon, Prolog, …
  – Belief revision / non-monotonic reasoning
• If query requires Non DL Reasoning
  – Hybrid reasoners or ?SWRL?
    • No good examples at the moment
• For defaults and Exceptions & Prototypical Knowledge
  – Traditional frame systems
    • More expressive default structure than Protégé
      – Exceptions for classes as well as instances
        » Over-riding rather than narrowing
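
Frame-style over-riding can be illustrated with the mollusc example from the earlier slide. The sketch below is a minimal Python illustration, not the behaviour of any particular frame system: the hierarchy plays the part of the skeleton a classifier would supply, the default lookup is a separate step, and `Snail` and `Squid` are hypothetical classes added only to exercise it.

```python
# Class hierarchy (the "skeleton"); Snail and Squid are hypothetical additions.
PARENT = {
    "Cephalopod": "Mollusc",
    "Nautiloid": "Cephalopod",
    "NautilusAncestor": "Nautiloid",
    "Snail": "Mollusc",
    "Squid": "Cephalopod",
}

# Defaults stated at class level; a more specific class over-rides its ancestors.
TYPICALLY_HAS_SHELL = {
    "Mollusc": True,
    "Cephalopod": False,
    "Nautiloid": True,
    "NautilusAncestor": False,
}


def typically_has_shell(cls):
    """Return the default from the nearest class that states one, or None."""
    while cls is not None:
        if cls in TYPICALLY_HAS_SHELL:
            return TYPICALLY_HAS_SHELL[cls]
        cls = PARENT.get(cls)
    return None


for name in ("Snail", "Squid", "Nautiloid", "NautilusAncestor"):
    print(name, typically_has_shell(name))
```

The nearest stated default wins, so each exception replaces the inherited value instead of narrowing it, which is exactly what a classifier's monotonic semantics does not allow.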

Classification to build Ontologies: Conceptual Lego
[Figure labels: hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, Lung, expression]

Logic-based Ontologies: Conceptual Lego

“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”

“Hand which is anatomically normal”

Linking taxonomies: Conceptual Lego
Normalisation
[Figure labels: Genes, Species, Protein, Function, Disease]

CFTRGene in humans
Protein coded by (CFTRgene & in humans)
Membrane transport mediated by (Protein coded by (CFTRgene in humans))
Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTR gene & in humans))))

Conceptual Lego and Normalisation: Practical Example

Take a Few Simple Concepts & Properties

Combine them in Descriptions which can be simple…

Sickle cell disease is a disease caused by some sickling haemoglobin

or which can be as complex as you like

Cystic fibrosis is caused by some non-normal ion transport that is the function of a protein coded for by a CFTR gene

Add some definitions
“Diseases linked to CFTR Genes”

We have built a simple tree
easy to maintain

Let the classifier organise it

If you want more abstractions, just add new definitions (re-use existing data)
“Diseases linked to abnormal proteins”

And let the classifier work again

And again – For a view based on species
“Diseases linked to genes described in the mouse”

And let classifier check consistency (My first try wasn’t)

Normalising (untangling) Ontologies
[Figure labels: Structure, Function, Part-whole]

Untangling and Enrichment
Using a classifier to make life easier

Substance
- Protein
- - ProteinHormone
- - - Insulin
- Steroid
- - SteroidHormone
- - - Cortisol
- Hormone
- - ProteinHormone
- - - Insulin
- - SteroidHormone
- - - Cortisol
- Catalyst
- - Enzyme
- - - ATPase

- PhysiologicRole
- - HormoneRole
- - CatalystRole
- Substance
- - Protein
- - - Insulin
- - - ATPase
- - Steroid
- - - Cortisol

Hormone ≡ Substance & playsRole someValuesFrom HormoneRole
ProteinHormone ≡ Protein & playsRole someValuesFrom HormoneRole
SteroidHormone ≡ Steroid & playsRole someValuesFrom HormoneRole
Catalyst ≡ Substance & playsRole someValuesFrom CatalystRole
Enzyme ≡ Protein & playsRole someValuesFrom CatalystRole
Insulin → playsRole someValuesFrom HormoneRole
Cortisol → playsRole someValuesFrom HormoneRole
ATPase → playsRole someValuesFrom CatalystRole

Substance
- Protein
- - ProteinHormone
- - - Insulin
- - Enzyme
- - - ATPase
- Steroid
- - SteroidHormone^
- - - Cortisol
- Hormone
- - ProteinHormone^
- - - Insulin^
- - SteroidHormone^
- - - Cortisol^
- Catalyst
- - Enzyme^
- - - ATPase^
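
How a classifier turns the simple asserted tree plus the definitions into the richer inferred hierarchy can be sketched in Python. This is a deliberately tiny structural check, assuming every definition has the single shape "genus & playsRole some role" used on this slide; it is not a description logic reasoner, and the function names are illustrative only.

```python
# Asserted (normalised) skeleton: each primitive has one parent.
PARENT = {
    "Protein": "Substance", "Steroid": "Substance",
    "Insulin": "Protein", "ATPase": "Protein", "Cortisol": "Steroid",
    "HormoneRole": "PhysiologicRole", "CatalystRole": "PhysiologicRole",
}
# Necessary conditions on primitives: playsRole some <role>.
PLAYS_ROLE = {"Insulin": "HormoneRole", "Cortisol": "HormoneRole",
              "ATPase": "CatalystRole"}
# Defined classes: name == genus & playsRole some role.
DEFINED = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
    "Enzyme": ("Protein", "CatalystRole"),
}


def kinds(cls):
    """cls plus everything it is asserted, or defined, to be a kind of."""
    found = set()
    while cls is not None:
        found.add(cls)
        cls = DEFINED[cls][0] if cls in DEFINED else PARENT.get(cls)
    return found


def roles(cls):
    """Every role cls is known to play, including more general roles."""
    found = set()
    for kind in kinds(cls):
        if kind in PLAYS_ROLE:
            found |= kinds(PLAYS_ROLE[kind])
        if kind in DEFINED:
            found |= kinds(DEFINED[kind][1])
    return found


def inferred_subsumers(cls):
    """Defined classes whose genus and role condition cls satisfies."""
    return {name for name, (genus, role) in DEFINED.items()
            if name != cls and genus in kinds(cls) and role in roles(cls)}


for name in ("Insulin", "Cortisol", "ATPase", "ProteinHormone", "Enzyme"):
    print(name, sorted(inferred_subsumers(name)))
```

Nobody asserted that Insulin is a Hormone; it follows from the definitions, which is why the author only has to maintain the simple trees.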

Normalisation & Quality Assurance
• Humans recognise errors of commission easily
  – Miss errors of omission
• Classifiers convert errors of omission to errors of commission
  – Inadequate definitions create “orphans”
    • BodyPart
        Parts of Heart
            Ventricle
            …
        CardiacSeptum
• Classifiers flag errors of commission
  – Over definition leads to inconsistency (unsatisfiability)
    • “Pneumonia located in the brain”
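
The “Pneumonia located in the brain” case can be sketched as a small satisfiability check in Python. The axioms are assumptions made for illustration (that Pneumonia is located only in Lung, and that Lung and Brain are disjoint), `LeftLung` is a hypothetical class, and a real classifier would reach the same verdict by general reasoning rather than by this hand-written test.

```python
# Hypothetical anatomy fragment and axioms, for illustration only.
PARENT = {"Lung": "Organ", "Brain": "Organ", "LeftLung": "Lung"}
DISJOINT = {frozenset({"Lung", "Brain"})}
# Assumed necessary condition: Pneumonia is located only in Lung.
LOCATED_ONLY_IN = {"Pneumonia": "Lung"}


def ancestors(cls):
    """cls together with all of its asserted superclasses."""
    found = set()
    while cls is not None:
        found.add(cls)
        cls = PARENT.get(cls)
    return found


def disjoint(a, b):
    """True if some superclass of a is declared disjoint with one of b."""
    return any(frozenset({x, y}) in DISJOINT
               for x in ancestors(a) for y in ancestors(b))


def satisfiable(disease, location):
    """Can '<disease> located in <location>' have any instances at all?"""
    allowed = LOCATED_ONLY_IN.get(disease)
    return allowed is None or not disjoint(allowed, location)


print(satisfiable("Pneumonia", "LeftLung"))  # True
print(satisfiable("Pneumonia", "Brain"))     # False: over-defined
```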

Enforcing constraints & policies

• A class with both necessary & sufficient and additional necessary conditions acts as a rule

• The Unit testing Framework supports checking that rules are enforced

A Probe class to check a constraint
All Tests Passed

Skeleton for Defaults & Exceptions
[Figure labels: beta blocker; asthma; use of beta blocker in asthma; serious contraindication; cardioselective beta blocker; use of cardioselective beta blocker in asthma; mild contraindication]

Experience: Normalised ontologies lead to clean default inheritance

When to Classify
• … but isn’t having a classifier an intolerable overhead for the applications?
  – It depends on the life cycle you choose
• Life cycles
  – Pre-coordination
  – Just in time coordination
  – Post Coordination

Pre-coordination
If the terms can be enumerated in advance
[Figure labels: Authors; “Asserted form” (“Sources”; “high level language”, “intermediate representations”); Classifier (“Compiler”); Classified form (“binary”); Application; Users]

Overwhelmingly the dominant pattern today.
“binary” can be OWL Lite, RDF(S), XML Schema, OBO, …

Commit Results to a Pre-Coordinated Ontology
Assert (“Commit”) changes inferred by classifier

Post coordination
When there are combinatorially many potential terms
• “Lazy classification” on demand

Big on the outside: small kernel on the inside
Avoid the exploding bicycle
[Figure labels: kernel model; API; Externally available resource]

Post Coordination
[Figure labels: Authors; “Asserted form” (“Sources” & intermediate representations); Asserted Ontology Store; Terminology Services; Classifier (“Compiler”); Application; Users]

But requires a classifier available at application time

A Compromise: Just in Time Coordination
[Figure labels: Authors; “Asserted form” (“Sources” & intermediate representations); Asserted Ontology Store; Pre-coordinated Cache; Terminology Services; Classifier (“Compiler”); Application; Users]

Only the occasional new notion requires classification - need not be real time
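
The just-in-time life cycle amounts to a cache in front of the classifier. The following Python sketch shows the idea under stated assumptions: `JustInTimeTerminology`, `toy_classifier` and the “Fracture of …” expressions are hypothetical, and the stub stands in for a real call to a classifier or terminology service.

```python
class JustInTimeTerminology:
    """Serve results from a pre-coordinated cache; classify only on a miss."""

    def __init__(self, classify, precoordinated):
        self._classify = classify            # slow call to a classifier
        self._cache = dict(precoordinated)   # results computed in advance
        self.classifier_calls = 0

    def parents(self, expression):
        if expression not in self._cache:
            self.classifier_calls += 1
            self._cache[expression] = self._classify(expression)
        return self._cache[expression]


def toy_classifier(expression):
    """Hypothetical stand-in: every 'X of Y' expression is filed under 'X'."""
    return frozenset({expression.split(" of ")[0]})


service = JustInTimeTerminology(
    toy_classifier,
    {"Fracture of Femur": frozenset({"Fracture"})},
)
service.parents("Fracture of Femur")   # cache hit, no classification
service.parents("Fracture of Hand")    # new notion, classified once
service.parents("Fracture of Hand")    # now served from the cache
print(service.classifier_calls)        # 1
```

Because a miss only adds to the cache, the classification could equally be queued and run later, which is the sense in which it need not be real time.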

Summary
• Why Classify
  – Managing Compositional ontologies / Terminologies
    • “Conceptual Lego”
    • Empowering users - just in time classification
    • Providing views & deferring decisions on abstractions
    • Quality assurance
  – Constraining concepts & schemas
  – Providing a skeleton for default reasoning & Prototypes
• When to classify
  – Pre-coordination
  – Just-in-time coordination
  – Post-coordination

Summary: When to Classify?
Applications do not need a classifier to benefit from classification
• Pre-coordination
  – If concepts/terms can be predicted
  – When classifier is not available at run time
  – When we must fit with legacy applications
• Post-coordination
  – When a few concepts are needed from a large potential set
  – When a classifier is available
    • and time cost is acceptable
  – When applications can be built or adapted to take advantage

Skeleton for Fractal tailoring forms for clinical trials
[Figure labels: Hypertension; Idiopathic Hypertension; In our company’s studies; In Phase 2 studies; Idiopathic Hypertension in our co’s Phase 2 study]

More on Views
• A problem in the Digital Anatomist
  – To an anatomist, the Pericardium and the Heart are separate organs
  – To a clinician they are part of the same organ
    • A disease of the pericardium counts as a kind of heart disease

Represent context and views by variant properties
[Figure labels: Organ; Heart; Pericardium; OrganPart; CardiacValve; Disease of (Heart or part-of-heart); Disease of Pericardium; is_part_of; is_structurally_part_of; is_clinically_part_of]
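
Variant properties can be sketched as a small property hierarchy in Python. The property names come from the slide, but which part is asserted with which property is an assumption made for illustration, as are the names `entails` and `heart_or_parts`.

```python
# Both variants are subproperties of the general is_part_of.
SUBPROPERTY_OF = {
    "is_structurally_part_of": "is_part_of",
    "is_clinically_part_of": "is_part_of",
}
# Assumed assertions: each part is linked by the most specific property.
PART_FACTS = [
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),
]


def entails(prop, wanted):
    """True if asserting prop also asserts wanted via the hierarchy."""
    while prop is not None:
        if prop == wanted:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def heart_or_parts(view_property):
    """Sites that count as 'Heart or part of heart' under a chosen view."""
    return {"Heart"} | {part for part, prop, whole in PART_FACTS
                        if whole == "Heart" and entails(prop, view_property)}


clinical_view = heart_or_parts("is_part_of")
structural_view = heart_or_parts("is_structurally_part_of")
print("Pericardium" in clinical_view)    # True: counts as heart disease
print("Pericardium" in structural_view)  # False: a separate organ
```

Choosing the property in the class definition selects the view, so both the clinician's and the anatomist's answers can be held in one ontology.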

Protégé-OWL alternative views

Disorder of “Clinical heart”: “Disorder of heart or any part of the heart” (including clinical and functional parts)

Disorder of “FMA heart”: “Disorder of heart or any structural part of the heart”