Topic: Identifying the Data Schema behind SNOMED CT

Jon Patrick, Centre for Health Informatics Research & Development, University of Sydney
Ming Zhang, Donna Truran
National Centre for Classification in Health



Outline

- Project description
- Research methodology
- Experiments and results
- Conclusion
- Limitation
- Recommendation for future work


Project Description

- Project background
  - SNOMED CT: the core content is stored in simple tables
- Project objective
  - To discover the conceptual model of SNOMED CT by reverse engineering


Research methodology

- Data preparation
  - Transfer the SNOMED CT core content tables into an RDBMS, that is, load the text files into MySQL
- Ontology structure investigation
  - Database querying: explicit characteristics
  - Programming: implicit characteristics
- Data modelling
  - Analyse the different characteristics and features so as to generate the conceptual data model
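The data preparation step can be sketched as follows. This is a minimal illustration rather than the authors' loader: it uses Python's built-in `sqlite3` in place of MySQL so that it runs without a server, and the column names and the header line are assumptions, since the slides show only raw rows.

```python
import csv
import io
import sqlite3

# Assumed column names; the slides show only raw rows, not a header.
CONCEPT_COLUMNS = [
    "concept_id", "status", "fully_specified_name",
    "ctv3_id", "snomed_id", "is_primitive",
]

# Stand-in for the distributed text file: an assumed header line followed by
# the concept row quoted in the slides, tab-delimited.
SAMPLE_TEXT = (
    "\t".join(CONCEPT_COLUMNS) + "\n"
    + "16953009\t0\telbow joint structure\tXa1q8\tT15430\t1\n"
)


def load_concepts(conn, text_file):
    """Create a concepts table and insert every data row of a tab-delimited file."""
    conn.execute(
        "CREATE TABLE concepts ("
        "concept_id INTEGER, status INTEGER, fully_specified_name TEXT, "
        "ctv3_id TEXT, snomed_id TEXT, is_primitive INTEGER)"
    )
    reader = csv.reader(text_file, delimiter="\t")
    next(reader)  # skip the header line
    conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?, ?, ?)", reader)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite instead of a MySQL server
load_concepts(conn, io.StringIO(SAMPLE_TEXT))
rows = conn.execute(
    "SELECT concept_id, fully_specified_name FROM concepts"
).fetchall()
print(rows)
```

Once the tables are in a database, the explicit characteristics can be read off with ordinary queries, while the implicit ones need a program that walks the relationships.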


Experiment and Result

- Explicit characteristics of the ontology
  - Original data overview
  - Fully defined and primitive
  - Relationship types
  - Hierarchy structure
  - Multiple inheritance
  - Full structure
- Implicit characteristics of the ontology
  - Classification principles
  - Relationship patterns


Original Data model

3 data tables:

- Concepts: one clinical idea is recorded as a concept
    16953009  0  elbow joint structure  Xa1q8  T15430  1
- Descriptions: one clinical idea can have more than one description in this table
    28696014  0  16953009  elbow joint  0  2  en
- Relationships: each row represents a relationship between two concepts
    711822028  16953009  272741003  182353008  1  2  0
    (272741003: laterality; 182353008: side)


Fully defined and primitive concepts

- Primitive concepts
  - A concept is primitive if its defining characteristics are insufficient to define it; that is, it has more content than is indicated by its attributes and relationships, e.g. clinical finding
- Fully defined concepts
  - A concept is fully defined if its defining characteristics are sufficient
- "Sufficient" and "insufficient" are determined by SNOMED experts
- Currently 41244 (11%) concepts are fully defined
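A count like this can be read from the last field of each concept row. The sketch below assumes that a flag of 0 marks a fully defined concept and 1 a primitive one; the four flags are hypothetical. The final lines check that the reported 11% is consistent with the node counts given on the hierarchy slide (80895 middle nodes plus 285283 leaf nodes).

```python
def fully_defined_share(is_primitive_flags):
    """Return (count, fraction) of concepts whose flag marks them as fully defined.

    Assumes a flag of 0 means fully defined and 1 means primitive.
    """
    flags = list(is_primitive_flags)
    fully_defined = sum(1 for flag in flags if flag == 0)
    return fully_defined, fully_defined / len(flags)


# Hypothetical flags for four concepts: three primitive, one fully defined.
count, share = fully_defined_share([1, 1, 0, 1])
print(count, share)

# Consistency check on the reported figure, using counts from the slides.
reported_percent = round(41244 / (80895 + 285283) * 100)
print(reported_percent)
```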


Relationship types

- Relationships between two concepts
    711822028  16953009  272741003  182353008  1  2  0
    (272741003: laterality; 182353008: side)
- "laterality" is a "relationship type"
- According to the statistics there are 1.4 million relationship records
- There are 62 relationship types currently used to represent the relationships between two concepts
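Figures like these come from grouping the relationship records by their type column. A minimal sketch, assuming each record is reduced to its first four fields; only the first row is taken from the slides, the other two are hypothetical.

```python
from collections import Counter

IS_A = 116680003  # SNOMED CT identifier of the "Is a" relationship type

# (relationship_id, source_concept_id, relationship_type_id, target_concept_id)
relationships = [
    (711822028, 16953009, 272741003, 182353008),  # row from the slides: laterality
    (1, 16953009, IS_A, 2),                       # hypothetical row
    (3, 4, IS_A, 2),                              # hypothetical row
]

# Number of records per relationship type, and number of distinct types.
records_per_type = Counter(type_id for _, _, type_id, _ in relationships)
number_of_types = len(records_per_type)
print(number_of_types, records_per_type.most_common())
```

In a database the same result is a `GROUP BY` on the relationship type column.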


Relationship types

Time aspect                 Access instrument             Laterality             Revision status
WAS A                       Has specimen                  Interprets             Procedure context
Indirect device             After                         MAY BE A               Associated with
Measurement method          Has focus                     Has active ingredient  Due to
Specimen source identity    Approach                      Causative agent        Specimen source topography
Scale type                  REPLACED BY                   SAME AS                Associated procedure
Specimen source morphology  Using                         Access                 Has intent
Property                    Has dose form                 Procedure site         Associated finding
Recipient category          Direct device                 Part of                Direct morphology
Procedure morphology        Finding context               Priority               Has definitional manifestation
Specimen substance          Procedure site - Direct       Method                 Occurrence
Pathological process        Has interpretation            Associated morphology  Component
Procedure device            Direct substance              Episodicity            Onset
Indirect morphology         Procedure site - Indirect     Severity               Is a
Specimen procedure          Temporal context              Course
MOVED TO                    Subject relationship context  Finding site


Hierarchy structure

- Among the relationship types, "IS_A" represents the hierarchical relationship
- 485,335 records in the relationships table store the hierarchical information of SNOMED CT
- The main hierarchical features
  - Root level (no parents): one root, "SNOMED CT CONCEPT"
  - Middle node level (have parents and children): 80895 (22%) concepts
    - 25687 nodes have only 1 child
  - Leaf node level (no children): 285283 (78%) concepts
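The three levels follow directly from the IS_A records: a root never appears as a child, a leaf never appears as a parent, and everything else is a middle node. A minimal sketch; the concept names come from the slides, but the edges are a simplified illustration and not actual SNOMED CT content.

```python
from collections import defaultdict


def classify_nodes(is_a_edges):
    """Split concepts into root, middle and leaf nodes from (child, parent) pairs."""
    parents_of = defaultdict(set)
    children_of = defaultdict(set)
    for child, parent in is_a_edges:
        parents_of[child].add(parent)
        children_of[parent].add(child)
    nodes = set(parents_of) | set(children_of)
    roots = {n for n in nodes if n not in parents_of}      # no parents
    leaves = {n for n in nodes if n not in children_of}    # no children
    return roots, nodes - roots - leaves, leaves


# Illustrative (child, parent) pairs only.
edges = [
    ("Clinical finding", "SNOMED CT CONCEPT"),
    ("Disease", "Clinical finding"),
    ("Infective pneumonia", "Disease"),
    ("Bacterial infectious disease", "Disease"),
    ("Bacterial pneumonia", "Infective pneumonia"),
    ("Bacterial pneumonia", "Bacterial infectious disease"),
]
roots, middle, leaves = classify_nodes(edges)
print(roots, sorted(middle), leaves)
```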


Multiple inheritance

One concept in SNOMED CT may have many children and many parents.

[Bar chart: number of concepts by number of children]

Number of children  Number of concepts
1                   25687
2                   15213
3                   9678
4                   6621
5                   4768
6                   3311
7                   2530
8                   1901
9                   1545
10                  1626
>10                 8016
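A distribution of this kind can be computed by counting how many IS_A records point at each parent and then bucketing the counts, with everything above a cap collected in one bar. A minimal sketch on hypothetical edges; the function name and the `cap` parameter are illustrative.

```python
from collections import Counter


def children_histogram(is_a_edges, cap=10):
    """Map 'number of children' buckets to the number of concepts in each bucket."""
    # Count distinct children per parent; duplicate edges are ignored.
    child_counts = Counter(parent for _child, parent in set(is_a_edges))
    histogram = Counter()
    for n in child_counts.values():
        histogram[str(n) if n <= cap else f">{cap}"] += 1
    return dict(histogram)


# Hypothetical (child, parent) pairs: A has 1 child, B has 2, C has 12.
edges = (
    [("a1", "A")]
    + [("b1", "B"), ("b2", "B")]
    + [(f"c{i}", "C") for i in range(12)]
)
histogram = children_histogram(edges)
print(histogram)
```

Swapping the roles of child and parent in the first `Counter` gives the corresponding distribution of the number of parents.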


Multiple inheritance

Distribution of the number of parents

[Bar chart: number of concepts by number of parents]

Number of parents   Number of concepts
1                   282775
2                   59910
3                   15711
4                   4979
5                   1804
>5                  999


Hierarchy structure - example

[Figure: example hierarchy, with labels for the root, middle nodes, a leaf, and a concept with multiple parents]


Full structure

[Figure: full structure of the concept "Bacterial pneumonia". Concept labels: Infective pneumonia, Bacterial infectious disease, Disease, Sudden onset, Courses, Episodicities, Bacteria, Structure of interstitial tissue of lung. Relationship labels: Causative agent, Finding site, Onset, Course, Episodicity]


Experiment and Result

- Explicit characteristics of the ontology
  - Original data overview
  - Fully defined and primitive
  - Relationship types
  - Hierarchy structure
  - Multiple inheritance
  - Full structure
- Implicit characteristics of the ontology
  - Classification principle
  - Relationship patterns


Classification principle

- Top-level categories: the 18 direct children of the root
- Each concept belongs to only one top-level category
- So all concepts in SNOMED CT can be divided into 18 groups
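Because this grouping is implicit, it has to be computed by following IS_A links upward until a direct child of the root is reached. The sketch below returns every top-level category reachable from a concept, so the claim that each concept belongs to exactly one category can be checked by testing that the result has a single element. The function name and the small hierarchy are illustrative assumptions, not actual SNOMED CT content.

```python
def top_level_categories(concept, parents_of, root):
    """Return the direct children of `root` that are `concept` or one of its ancestors."""
    found = set()
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited through another parent
        seen.add(node)
        node_parents = parents_of.get(node, ())
        if root in node_parents:
            found.add(node)
        stack.extend(p for p in node_parents if p != root)
    return found


ROOT = "SNOMED CT CONCEPT"
# Illustrative child -> parents map only.
parents_of = {
    "Clinical finding": [ROOT],
    "Body structure": [ROOT],
    "Disease": ["Clinical finding"],
    "Infective pneumonia": ["Disease"],
    "Bacterial infectious disease": ["Disease"],
    "Bacterial pneumonia": ["Infective pneumonia", "Bacterial infectious disease"],
}
categories = top_level_categories("Bacterial pneumonia", parents_of, ROOT)
print(categories)
```

The `seen` set matters here because multiple inheritance means the same ancestor is often reached along several paths.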


Implicit

Top-level category                       Number of concepts
Physical force                           200
Specimen                                 1044
Staging and scales                       1108
Linkage concept                          1129
Events                                   1642
Environments and geographical locations  1666
Physical object                          4355
Social context                           5188
Context-dependent categories             6836
Observable entity                        7568
Qualifier value                          8266
Pharmaceutical / biologic product        19639
Substance                                23022
Organism                                 26134
Body structure                           31760
Procedure                                52741
Special concept                          62014
Clinical finding                         111866


Relationship patterns

Relationship table

Relationship ID   Source concept     Relationship type   Target concept
582896029         254837009          363698007           76752008           ....
                  Breast cancer      Finding site        Breast structure
                  Clinical finding                       Body structure

Pattern: Clinical finding, Finding site, Body structure

A pattern is the specific relationship type between any two top-level categories.


Relationship patterns

- Pattern: {C1, type, C2}
  - C1 is one of the 18 top-level categories
  - type is one of the 62 relationship types
  - C2 is one of the 18 top-level categories
  - There are 18 x 62 x 18 = 20088 possible patterns
- Each record in the 1.4 million relationship records matches one pattern
- To avoid ambiguity, the scope of this study covers only "active" concepts
- The results show that only 78 patterns have instances in the relationship table
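Pattern extraction amounts to replacing the source and target concept of every relationship record with its top-level category and collecting the distinct triples. A minimal sketch; the breast cancer row follows the slides, while the two bacterial pneumonia rows and the category assignments are assumptions made for illustration.

```python
def relationship_patterns(relationships, category_of):
    """Return the distinct (C1, type, C2) triples observed in the relationship records."""
    return {
        (category_of[source], rel_type, category_of[target])
        for source, rel_type, target in relationships
    }


# (source concept, relationship type, target concept)
relationships = [
    ("Breast cancer", "Finding site", "Breast structure"),
    ("Bacterial pneumonia", "Finding site", "Structure of interstitial tissue of lung"),
    ("Bacterial pneumonia", "Causative agent", "Bacteria"),
]
# Assumed top-level category of each concept.
category_of = {
    "Breast cancer": "Clinical finding",
    "Bacterial pneumonia": "Clinical finding",
    "Breast structure": "Body structure",
    "Structure of interstitial tissue of lung": "Body structure",
    "Bacteria": "Organism",
}

patterns = relationship_patterns(relationships, category_of)
possible_patterns = 18 * 62 * 18  # categories x relationship types x categories
print(sorted(patterns), possible_patterns)
```

Here the two "Finding site" records collapse into a single pattern, which is how 1.4 million records can reduce to a few dozen patterns.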


Data modelling based on patterns

For example: to find the relationships between "clinical finding" and other top-level categories.

Clinical finding (finding)  Causative agent (attribute)                 Pharmaceutical / biologic product (product)
Clinical finding (finding)  Course (attribute)                          Qualifier value (qualifier value)
Clinical finding (finding)  Due to (attribute)                          Clinical finding (finding)
Clinical finding (finding)  Episodicity (attribute)                     Qualifier value (qualifier value)
Clinical finding (finding)  Finding site (attribute)                    Body structure (body structure)
.....
Clinical finding (finding)  Has definitional manifestation (attribute)  Clinical finding (finding)


Conceptual Data Model

[Figure: "Conceptual Model for SNOMED CT", Jon Patrick & Ming Zhang, School of Information Technologies, University of Sydney, 01/02/2006, marked "Confidential -- Draft"]

Category labels in the diagram: SNOMED CT CONCEPT (with an ISA label), Procedure, Environments and geographical locations, Specimen, Social context, Physical object, Physical force, Pharmaceutical / biologic product, Observable entity, Substance, Organism, Body structure, Context-dependent categories, Linkage concept, Clinical finding, Qualifier value, Event, Special concept, Staging and scales.

Relationship labels in the diagram: Causative agent, Finding site, After, Due to, Has definitional manifestation, Associated with, Interprets, Episodicity, Onset, Has interpretation, Pathological process, Occurrence, Course, Severity, Direct morphology, Procedure morphology, Component, Procedure site, Procedure site - Direct, Procedure site - Indirect, Indirect morphology, Has focus, Using, Procedure device, Indirect device, Direct device, Access instrument, Part of, Laterality, Has intent, Revision status, Scale type, Approach, Access, Priority, Method, Property, Time aspect, Measurement method, Recipient category, Associated finding, Associated procedure, Procedure context, Temporal context, Finding context, Direct substance, Specimen source topography, Specimen source morphology, Specimen source identity, Has specimen, Specimen procedure, Associated morphology, Subject relationship context, Has dose form, Has active ingredient, Specimen substance.


Future Work

- Design a method of defining real-world constraints over the relationships, e.g. as it stands, suicide can have a slow onset.
- Develop storage and maintenance procedures for managing the data, e.g. there is no constraint over the data model as it exists at the moment.
- Design a terminology server to deliver SCT to vendors.
- Work with vendors to define a transport mechanism so that vendors can install SCT.
- Create Internet access to SCT content for ad hoc users.
- Start working on systems that demonstrate the value of SCT for clinical and administrative work.