Using ontologies to make sense of unstructured medical data
description
Transcript of Using ontologies to make sense of unstructured medical data
![Page 2: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/2.jpg)
NCBO: Key activities
• We create and maintain a library of biomedical ontologies.
• We build tools and Web services to enable the use of ontologies and their derivatives.
• We collaborate with scientific communities that develop and use ontologies.
![Page 3: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/3.jpg)
http:
//re
st.b
ioon
tolo
gy.o
rghtt
p://
rest
.bio
onto
logy
.org
Ontology ServicesOntology Services
• Download• Traverse• Search• Comment
• Download• Traverse• Search• Comment
WidgetsWidgets• Tree-view• Auto-complete• Graph-view
• Tree-view• Auto-complete• Graph-view
AnnotationAnnotation
Data AccessData Access
Mapping ServicesMapping Services
• Create• Download• Upload
• Create• Download• Upload
ViewsViews
Term recognitionTerm recognition
Fetch “data” annotated with a given term
Fetch “data” annotated with a given term
http://bioportal.bioontology.orghttp://bioportal.bioontology.org
![Page 4: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/4.jpg)
Annotation service
Process textual metadata to automatically tag text with as many ontology terms as possible.
90 million calls, ~700 GB of data90 million calls, ~700 GB of data
![Page 5: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/5.jpg)
Resource index
Pubmed AbstractsPubmed Abstracts
Adverse Events (AERS)Adverse Events (AERS)
GEOGEO
::
Clinical TrialsClinical Trials
Drug BankDrug Bank
Won 1st prize at the 2010 Semantic Web Challenge @ ISWC
Won 1st prize at the 2010 Semantic Web Challenge @ ISWC
![Page 6: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/6.jpg)
Creating Lexicons
Term – 1:::Term – n
Sentence in Clinical Note – 1:::Sentence in Clinical Note – m
tf df NN JJ … VP …
ID Term-1 150,879 90,000 0.90 0.05 … 0.03 …
ID :
ID Term-n
Syntactic typesSyntactic typesFrequencyFrequency
Frequency counter
![Page 7: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/7.jpg)
Annotation AnalyticsAnnotation Analytics
Analyzing tagged data for hypothesis generation in bioinformatics
![Page 8: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/8.jpg)
Genome
Generic GO based analysis routineGeneric GO based analysis routine
Reference set
Study Set• Get annotations for each
gene in a set
• Count the occurrence of each annotation term in the study set
• Count the occurrence of that term in some reference set (whole genome?)
• P-value for how surprising their overlap is.
![Page 9: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/9.jpg)
Annotation Analytics Landscape
SNOMED-CT
Gene Ontology
Gene Sets
NCIT
ICD-9
Human Disease
Cell Type
MeSH
Drugs, Chemicals
Grant Sets
Paper Sets
Patient Sets
Drug Sets
: ??
Health Indicator Warehouse datasets
![Page 10: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/10.jpg)
Open questions
1. Can we use something other than the GO?
2. Lack of annotations—even today, roughly 20% of genes lack any GO annotation.
3. Annotation bias—annotation with certain ontology terms is not independent of each other.
4. Lack of a systematic mechanism to define a level of abstraction.
![Page 11: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/11.jpg)
Profiling a set of Aging genes
Disease Ontology
~ 30% of genome
261 Age-related genes
Genome
![Page 12: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/12.jpg)
Using ontologies other than GO
ERCC6 nucleoplasmPARP1 protein N-terminus bindingERCC6 nucleoplasmPARP1 protein N-terminus binding
ERCC6 <disease term?> PARP1 <disease term?>ERCC6 <disease term?> PARP1 <disease term?>
![Page 13: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/13.jpg)
ERCC6 GO:0005654 PMID:16107709ERCC6 GO:0008094 PMID:16107709PARP1 GO:0047485 PMID:16107709ERCC6 GO:0005730 PMID:16107709PARP1 GO:0003950 PMID:16107709
http://www.geneontology.org/GO.downloads.annotations.shtmlhttp://www.geneontology.org/GO.downloads.annotations.shtml
Enrichment Analysis with the DO
www.ncbi.nlm.nih.gov/pubmed/16107709www.ncbi.nlm.nih.gov/pubmed/16107709
NCBO Annotator:http://bioportal.bioontology.orgNCBO Annotator:http://bioportal.bioontology.org
{ERCC6, PARP1} PMID:16107709{ERCC6, PARP1} PMID:16107709
{ERCC6, PARP1} {Cockayne syndrome, DNA damage}{ERCC6, PARP1} {Cockayne syndrome, DNA damage}
![Page 14: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/14.jpg)
Annotation Analytics on EMR dataAnnotation Analytics on EMR data
Analysis of tagged data from electronic health records
![Page 15: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/15.jpg)
Profiling patient setsProfiling patient sets
86k patient Reports
ICD9 789.00 (Abdominal pain, unspecified site)
Patient records processed from U. Pittsburgh NLP Repository with IRB approval.
![Page 16: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/16.jpg)
Annotation (Clinical Text)
![Page 17: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/17.jpg)
Term – 1:::Term – nSyntactic typesSyntactic types
FrequencyFrequency
Term recognition tool NCBO Annotator
Term recognition tool NCBO Annotator
NegEx Patterns
NegEx Patterns
NegEx Rules – Negation detection
NegEx Rules – Negation detection
P1 ICD9 ICD9 ICD9 ICD9 ICD9 ICD9
P1 T1, T2, no T4
… T5, T4, T3
… T4, T3, T1
T8, T9, T4
… T6, T8, T10
T1, T2, no T4
P2
P2
P3
P3
:
:
Pn
PnTerms form a temporal series of tags
Coh
ort
of
Inte
rest
DiseasesDiseases
ProceduresProcedures
DrugsDrugs
BioPortal – knowledge graph
Creating clean lexicons
Annotation Workflow
Furt
her A
naly
sis
Text clinical note
Terms Recognized
Negation detection
Generation of tagged dataGeneration of tagged data
![Page 18: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/18.jpg)
ROR of 2.058, CI of [1.804, 2.349]The X2 statistic has p-value < 10-7
ROR=1.524, CI=[0.872, 2.666] X2 p-value = 0.06816.
Detecting the Vioxx Risk SignalDetecting the Vioxx Risk Signal
Vioxx Patients (1,560)
RA Patients (14,079)
MI Patients (1,827)
VioxxMI (339)
p-value < 1.3x10-24
![Page 19: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/19.jpg)
Detecting Adverse Events
![Page 20: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/20.jpg)
Detecting Adverse Events
Linear Space Features Logarithmic Space Features
Drug frequency Drug frequency
Disease frequency Disease frequency
Observed drug-first fraction Observed co-mention count
Drug-first fraction z-score (fixed drug)
Co-mention count z-score (fixed drug)
Drug-first fraction z-score (fixed disease)
Co-mention count z-score (fixed disease)
![Page 21: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/21.jpg)
Detecting Adverse Events
![Page 22: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/22.jpg)
Detecting Off-label useDetecting Off-label use
![Page 23: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/23.jpg)
Annotation Analytics Landscape
SNOMED-CT
Gene Ontology
Gene Sets
NCIT
ICD-9
Human Disease
Cell Type
MeSH
Drugs, Chemicals
Grant Sets
Paper Sets
AgingAging
Patient Sets
Drug Sets
:
EMRsEMRs
What questions
can we ask?
What questions
can we ask?
Health Indicator Warehouse datasets
![Page 24: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/24.jpg)
Associations and outcomes
Gene Disease Drug Device Procedure Environment
Gene
Disease
Drug
Device
Procedure
Environment
Side effects
Off-label Indications
Enrichment
What questions
can we ask?
What questions
can we ask?
![Page 25: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/25.jpg)
Acknowledgements
• Paea LePendu• Yi Liu• Srinivasan Iyer• Steve Racunas• Anna Bauer-Mehren• Clement Jonquet• Rong Xu
• Mark Musen• NIH – NCBO funding• Mayo Team
• Hongfang Liu• Stephen Wu
• Sylvia Holland• Alex Skrenchuk
![Page 26: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/26.jpg)
Mining Annotations of Grants, Publications
Grants from 1972 to 2007 30 funding agencies
Publications from MedlineOnly “Journal articles”
![Page 27: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/27.jpg)
Sponsorship and AllocationSponsorship and Allocation
![Page 28: Using ontologies to make sense of unstructured medical data](https://reader035.fdocuments.net/reader035/viewer/2022081603/56814644550346895db34fe3/html5/thumbnails/28.jpg)
Who funds whatWho funds what