Semantic phenotyping for disease diagnosis and discovery

CBIIT SPEAKER SERIES 1.21.15 SEMANTIC PHENOTYPING FOR DISEASE DIAGNOSIS AND DISCOVERY @monarchinit www.monarchinitiative.org Matchmaker Exchange Melissa Haendel @ontowonka


Page 1: Semantic phenotyping for disease diagnosis and discovery

CBIIT SPEAKER SERIES

1.21.15

SEMANTIC PHENOTYPING FOR DISEASE DIAGNOSIS AND DISCOVERY

@monarchinit

www.monarchinitiative.org

Matchmaker Exchange

Melissa Haendel

@ontowonka

Page 2: Semantic phenotyping for disease diagnosis and discovery

TODAY’S TALK

The computable phenotypic profile

Exome analysis for disease diagnosis

Crossing the species divide

What is GOOD phenotyping?

Chronological considerations

Page 3: Semantic phenotyping for disease diagnosis and discovery

http://anthro.palomar.edu/abnormal/abnormal_4.htm

http://www.pyroenergen.com/articles07/downs-syndrome.htm

http://www.theguardian.com/commentisfree/2009/oct/27/downs-syndrome-increase-terminations

YOU ALL KNOW THIS PRESENTATION

Page 4: Semantic phenotyping for disease diagnosis and discovery

BUT A COMPUTER DOES NOT

“Phenotypic Profile”

Page 5: Semantic phenotyping for disease diagnosis and discovery

Often free text or checkboxes

Dysmorphic features:

• df

• dysmorphic

• dysmorphic faces

• dysmorphic features

Congenital malformation/anomaly:

• congenital anomaly

• congenital malformation

• congenital anamoly

• congenital anomly

• congential anomaly

• congentital anomaly

• cong. m.

• cong. Mal

• cong. malfor

• congenital malform

• congenital m.

• multiple congenital anomalies

• multiple congenital abormalities

• multiple congenital abnormalities

Examples of lists:

* dd. cong. malfor. behav. pro.

* dd. mental retardation

* df< delayed puberty

* df&lt

* dd df mr

* mental retar.short stature

CLINICAL PHENOTYPING

Page 6: Semantic phenotyping for disease diagnosis and discovery

6% OF THE GENERAL POPULATION SUFFERS FROM A RARE DISORDER

6% of patients contacting the NIH Office of Rare Disorders do not have a diagnosis

Page 7: Semantic phenotyping for disease diagnosis and discovery

THE YET-TO-BE DIAGNOSED PATIENT

Known disorders not recognized during prior evaluations?

Atypical presentation of known disorders?

Combinations of several disorders?

Novel, unreported disorder?

Page 8: Semantic phenotyping for disease diagnosis and discovery

THE CHALLENGE: INTERPRETATION OF DISEASE CANDIDATES

Diagram: phenotypes (P1, P2, P3 …), genotype (G1, G2, G3, G4 …) and environments (E1, E2, E3, E4 …), together with pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics and metabolomics, go into a black box ("What’s in the box?") that outputs prioritized candidates (C1, C2, C3, C4 …) for functional validation.

How are candidates identified? How do they compare?

Page 9: Semantic phenotyping for disease diagnosis and discovery

MATCHING PATIENTS TO DISEASES

Patient vs. Disease X

Differential diagnosis with similar but non-matching phenotypes is difficult

Flat back of head vs. abnormal skull morphology; hypotonia vs. decreased muscle mass

Page 10: Semantic phenotyping for disease diagnosis and discovery

SEARCHING FOR PHENOTYPES USING TEXT ALONE IS INSUFFICIENT

OMIM Query # Records

“large bone” 785

“enlarged bone” 156

“big bone” 16

“huge bones” 4

“massive bones” 28

“hyperplastic bones” 12

“hyperplastic bone” 40

“bone hyperplasia” 134

“increased bone growth” 612
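A small sketch of why literal string queries under-count. The records, the synonym table and the concept identifier below are hypothetical stand-ins, not OMIM content or real HPO IDs. Matching on a normalized concept rather than on the exact wording retrieves every record that describes the same phenotype.

```python
# Hypothetical free-text records (not taken from OMIM).
RECORDS = {
    "rec1": "Patients show large bone and short stature.",
    "rec2": "Marked bone hyperplasia was reported.",
    "rec3": "Radiographs revealed enlarged bone in the femur.",
    "rec4": "Hyperplastic bones were noted.",
}

# Hypothetical synonym table: several surface strings map to one concept.
# "CONCEPT:increased_bone_size" is a placeholder, not a real HPO ID.
SYNONYMS = {
    "large bone": "CONCEPT:increased_bone_size",
    "enlarged bone": "CONCEPT:increased_bone_size",
    "bone hyperplasia": "CONCEPT:increased_bone_size",
    "hyperplastic bones": "CONCEPT:increased_bone_size",
}


def text_search(query: str) -> set[str]:
    """Case-insensitive substring match on the exact query wording."""
    q = query.lower()
    return {rid for rid, text in RECORDS.items() if q in text.lower()}


def annotate(text: str) -> set[str]:
    """Tag a record with every concept whose synonym occurs in it."""
    t = text.lower()
    return {cid for syn, cid in SYNONYMS.items() if syn in t}


def concept_search(query: str) -> set[str]:
    """Normalize the query to a concept, then match records by concept."""
    target = SYNONYMS.get(query.lower())
    if target is None:
        return set()
    return {rid for rid, text in RECORDS.items() if target in annotate(text)}


print(sorted(text_search("large bone")))     # ['rec1']
print(sorted(concept_search("large bone")))  # ['rec1', 'rec2', 'rec3', 'rec4']
```

Note that "enlarged bone" does not contain the substring "large bone", which is exactly the kind of miss the OMIM counts above reflect.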

Page 11: Semantic phenotyping for disease diagnosis and discovery

Phenotypes.

Page 12: Semantic phenotyping for disease diagnosis and discovery

THINK GRAPHICALLY

Each node is a different phenotype, classified by anatomical system

Page 13: Semantic phenotyping for disease diagnosis and discovery

DISEASE X IS A COLLECTION OF NODES

Each disease is associated with different phenotype nodes in the graph

Disease X
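One way to make "each phenotype is a node" concrete: store the is_a links as a parent map and expand a disease's annotations to every node they imply. The terms and links below are a hypothetical toy hierarchy in the spirit of HPO, not its actual structure.

```python
# Toy is_a hierarchy: term -> list of direct parents (hypothetical, HPO-like).
IS_A = {
    "phenotypic abnormality": [],
    "abnormality of the head": ["phenotypic abnormality"],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
    "decreased muscle mass": ["abnormality of the musculature"],
}


def ancestors(term: str) -> set[str]:
    """Return the term plus every superclass reachable through is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def profile_nodes(profile: set[str]) -> set[str]:
    """All graph nodes implied by a set of annotations (union of ancestors)."""
    nodes: set[str] = set()
    for term in profile:
        nodes |= ancestors(term)
    return nodes


disease_x = {"abnormal skull morphology", "decreased muscle mass"}
print(sorted(profile_nodes(disease_x)))
```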

Page 14: Semantic phenotyping for disease diagnosis and discovery

EACH DISEASE IS ANNOTATED WITH A PHENOTYPIC PROFILE

Down’s Syndrome (Chromosome 21 Trisomy)

Failure to thrive, umbilical hernia, broad hands, abnormal ears, flat head

Page 15: Semantic phenotyping for disease diagnosis and discovery

PHENOTYPE “BLAST”: WHICH PHENOTYPIC PROFILE IS GRAPHICALLY MOST SIMILAR?

Disease X

Patient

Disease Y

Page 16: Semantic phenotyping for disease diagnosis and discovery

FINDING THE PHENOTYPE GRAPH IN COMMON

Disease X

Patient

Disease Y
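A minimal "phenotype BLAST" sketch, assuming a hypothetical toy hierarchy: each profile is expanded to its ancestor closure, and profiles are ranked by the Jaccard overlap of those node sets (the shared subgraph divided by the combined subgraph). Jaccard is only one of several graph-overlap measures; the talk does not specify which one its tools use.

```python
# Toy is_a hierarchy (hypothetical, HPO-like): term -> direct parents.
IS_A = {
    "phenotypic abnormality": [],
    "abnormality of the head": ["phenotypic abnormality"],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
    "decreased muscle mass": ["abnormality of the musculature"],
}


def closure(profile: set[str]) -> set[str]:
    """Annotations plus all their superclasses: the profile's subgraph."""
    seen, stack = set(profile), list(profile)
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the shared subgraph over the size of the combined subgraph."""
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)


patient = {"flat back of head", "hypotonia"}
diseases = {
    "Disease X": {"abnormal skull morphology", "decreased muscle mass"},
    "Disease Y": {"decreased muscle mass"},
}

ranking = sorted(
    ((name, jaccard(patient, prof)) for name, prof in diseases.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")  # Disease X: 0.571, then Disease Y: 0.286
```

The patient shares no exact term with Disease X, yet Disease X ranks first because their subgraphs overlap.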

Page 17: Semantic phenotyping for disease diagnosis and discovery

THE HUMAN PHENOTYPE ONTOLOGY

Used to annotate:

• Patients

• Disorders/Diseases

• Genotypes

• Genes

• Sequence variants

In human

Reduced pancreatic beta cells

Abnormality of pancreatic islet cells

Abnormality of endocrine pancreas physiology

Pancreatic islet cell adenoma

Insulinoma

Multiple pancreatic beta-cell adenomas

Abnormality of exocrine pancreas physiology

Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.

Page 18: Semantic phenotyping for disease diagnosis and discovery

WHY DO WE NEED THE HUMAN PHENOTYPE ONTOLOGY?

Winnenburg and Bodenreider, ISMB PhenoDay, 2014

How does HPO relate to other clinical vocabularies?

Page 19: Semantic phenotyping for disease diagnosis and discovery

EXOME ANALYSIS

Recessive, de novo filters

Remove off-target variants, common variants, and variants not in known disease-causing genes

http://compbio.charite.de/PhenIX/

Target panel of 2741 known Mendelian disease genes

Compare phenotype profiles using data from: HGMD, ClinVar, OMIM, Orphanet

Zemojtel et al. Sci Transl Med 3 September 2014: Vol. 6, Issue 252, p.252ra123
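The filtering steps on this slide can be sketched as follows. This is a hypothetical simplification, not PhenIX's code: the `Variant` fields, the 1% allele-frequency cut-off, the gene names and the phenotype scores are all assumptions, and compound heterozygosity is approximated as "two or more rare heterozygous variants in the gene" without checking phase.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool      # inside the captured target regions
    allele_freq: float   # population allele frequency
    zygosity: str        # "het" or "hom"
    de_novo: bool        # absent from both parents


def filter_variants(variants, panel_genes, max_af=0.01):
    """Drop off-target, common, and out-of-panel variants."""
    return [
        v for v in variants
        if v.on_target and v.allele_freq <= max_af and v.gene in panel_genes
    ]


def fits_mode(gene_variants) -> bool:
    """Keep genes consistent with a de novo or recessive model."""
    if any(v.de_novo for v in gene_variants):
        return True
    hom = any(v.zygosity == "hom" for v in gene_variants)
    het_count = sum(v.zygosity == "het" for v in gene_variants)
    return hom or het_count >= 2  # homozygous or putative compound het


def prioritize(variants, panel_genes, phenotype_score):
    """Filter, apply inheritance models, then rank genes by phenotype score."""
    by_gene = defaultdict(list)
    for v in filter_variants(variants, panel_genes):
        by_gene[v.gene].append(v)
    candidates = [g for g, vs in by_gene.items() if fits_mode(vs)]
    return sorted(candidates, key=lambda g: phenotype_score.get(g, 0.0), reverse=True)


variants = [
    Variant("GENE_A", True, 0.0001, "het", True),    # rare de novo
    Variant("GENE_B", True, 0.0002, "het", False),   # first rare het
    Variant("GENE_B", True, 0.0003, "het", False),   # second rare het
    Variant("GENE_C", True, 0.2000, "hom", False),   # common: removed
    Variant("GENE_D", False, 0.0001, "hom", False),  # off-target: removed
    Variant("GENE_E", True, 0.0001, "het", False),   # single inherited het
    Variant("GENE_F", True, 0.0001, "hom", False),   # not in the panel
]
panel = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
scores = {"GENE_A": 0.4, "GENE_B": 0.9}  # hypothetical phenotype-match scores
print(prioritize(variants, panel, scores))  # ['GENE_B', 'GENE_A']
```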

Page 20: Semantic phenotyping for disease diagnosis and discovery

CONTROL PATIENTS WITH KNOWN MUTATIONS

Inheritance: genes (average rank)

AD: ACVR1, ATL1, BRCA1, BRCA2, CHD7 (4), CLCN7, COL1A1, COL2A1, EXT1, FGFR2 (2), FGFR3, GDF5, KCNQ1, MLH1 (2), MLL2/KMT2D, MSH2, MSH6, MYBPC3, NF1 (6), P63, PTCH1, PTH1R (2), PTPN11 (2), SCN1A, SOS1, TRPS1, TSC1, WNT10A (average rank 1.7)

AR: ATM, ATP6V0A2, CLCN1 (2), LRP5, PYCR1, SLC39A4 (average rank 5)

X: EFNB1, MECP2 (2), DMD, PHF6 (average rank 1.8)

52 patients with diagnosed rare diseases (numbers in parentheses give the number of patients with mutations in that gene)

Page 21: Semantic phenotyping for disease diagnosis and discovery

PHENIX HELPED DIAGNOSE 11/40 PATIENTS

global developmental delay (HP:0001263)

delayed speech and language development (HP:0000750)

motor delay (HP:0001270)

proportionate short stature (HP:0003508)

microcephaly (HP:0000252)

feeding difficulties (HP:0011968)

congenital megaloureter (HP:0008676)

cone-shaped epiphysis of the phalanges of the hand (HP:0010230)

sacral dimple (HP:0000960)

hyperpigmented/hypopigmented macules (HP:0007441)

hypertelorism (HP:0000316)

abnormality of the midface (HP:0000309)

flat nose (HP:0000457)

thick lower lip vermilion (HP:0000179)

thick upper lip vermilion (HP:0000215)

full cheeks (HP:0000293)

short neck (HP:0000470)

Page 22: Semantic phenotyping for disease diagnosis and discovery

WHAT ABOUT THE PATIENTS WE CAN’T SOLVE?

HOW DO WE UNDERSTAND RARE DISEASE ETIOLOGY AND DISCOVER TREATMENTS?

Page 23: Semantic phenotyping for disease diagnosis and discovery

GENOTYPE / PHENOTYPE

Mouse B6.Cg-Alms1foz/foz/J: increased weight, adipose tissue volume, glucose homeostasis altered

Human ALMS1 (NM_015120.4) [c.10775delC] + [-]: obesity, diabetes mellitus, insulin resistance

Zebrafish kcnj11c14/c14; insrt143/+ (AB): increased food intake, hyperglycemia, insulin resistance

MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS

???

Page 24: Semantic phenotyping for disease diagnosis and discovery

HOW MUCH PHENOTYPE DATA?

Human genes have poor phenotype coverage

GWAS + ClinVar + OMIM

Page 25: Semantic phenotyping for disease diagnosis and discovery

HOW MUCH PHENOTYPE DATA?

Human genes have poor phenotype coverage

What else can we leverage?

GWAS + ClinVar + OMIM

Page 26: Semantic phenotyping for disease diagnosis and discovery

HOW MUCH PHENOTYPE DATA?

Human genes have poor phenotype coverage

What else can we leverage? …animal models

Orthology via PANTHER v9

Page 27: Semantic phenotyping for disease diagnosis and discovery

WHY WE NEED ALL THE MODELS

Combined, human and model phenotypes can be linked to >75% of human genes.

Orthology via PANTHER v9
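The coverage gain comes from a set union over orthologs. A minimal sketch, assuming toy gene identifiers and ortholog tables; each human gene maps to a set of orthologs because real orthology (PANTHER included) can be one-to-many.

```python
def phenotype_coverage(human_genes, human_annotated, orthologs, model_annotated):
    """Fraction of human genes with phenotype data, directly or via an ortholog.

    orthologs: species -> {human gene -> set of ortholog IDs in that species}
    model_annotated: species -> set of that species' genes with phenotype data
    """
    covered = set(human_annotated) & set(human_genes)
    for species, mapping in orthologs.items():
        annotated = model_annotated.get(species, set())
        for human_gene, model_genes in mapping.items():
            # A human gene counts if any of its orthologs has phenotype data.
            if human_gene in human_genes and model_genes & annotated:
                covered.add(human_gene)
    return len(covered) / len(human_genes)


# Hypothetical toy data.
human_genes = {"G1", "G2", "G3", "G4"}
human_annotated = {"G1"}
orthologs = {
    "mouse": {"G2": {"m2"}, "G3": {"m3"}},
    "zebrafish": {"G3": {"z3a", "z3b"}, "G4": {"z4"}},
}
model_annotated = {"mouse": {"m2"}, "zebrafish": {"z3b"}}

print(phenotype_coverage(human_genes, human_annotated, {}, {}))                      # 0.25
print(phenotype_coverage(human_genes, human_annotated, orthologs, model_annotated))  # 0.75
```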

Page 28: Semantic phenotyping for disease diagnosis and discovery

PHENOTYPIC DIVERSITY ACROSS SPECIES

=> May need different models to recapitulate different aspects of the disease

Page 29: Semantic phenotyping for disease diagnosis and discovery

PROBLEM: CLINICAL AND MODEL PHENOTYPES ARE DESCRIBED DIFFERENTLY

Page 30: Semantic phenotyping for disease diagnosis and discovery

Figure: the lung as represented in four separate vocabularies: Mammalian Phenotype (e.g. abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology, abnormal respiratory system morphology), Mouse Anatomy, FMA and human developmental anatomy. Anatomy terms shown include lung, lung alveolus, alveolar sac, pulmonary acinus, lower respiratory tract, respiratory system, organ system, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity, lung bud, respiratory primordium and pharyngeal region.

PROBLEM: EACH ORGANISM USES DIFFERENT VOCABULARIES

Relations: develops_from, part_of, is_a (SubClassOf), surrounded_by

Page 31: Semantic phenotyping for disease diagnosis and discovery

SOLUTION: BRIDGING SEMANTICS

Mungall et al. (2012). Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

Figure: species-specific classes (FMA:lung, MA:lung, MA:lung alveolus, FMA:pulmonary alveolus, EHDAA:lung bud) linked to shared classes (anatomical structure, organ, organ part, respiration organ, lung, alveolus, alveolus of lung, pulmonary acinus, alveolar sac, lung bud, lung primordium, respiratory primordium, swim bladder, foregut, endoderm, endoderm of foregut) and to GO: respiratory gaseous exchange via capable_of. Taxon constraints (only_in_taxon) reference NCBITaxon:Mammalia and NCBITaxon:Actinopterygii.

Relations: is_a (taxon equivalent), is_a (SubClassOf), part_of, develops_from, capable_of, only_in_taxon

Köhler et al. (2014) F1000Research 2:30

Haendel et al. (2014) JBMS 5:21 doi:10.1186/2041-1480-5-21
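The bridging idea can be sketched in a few lines: species-specific classes are declared equivalent to a shared, taxon-neutral class, and comparisons happen in the shared hierarchy. The mappings use labels rather than real MA/FMA identifiers, and the hierarchy is a simplified, hypothetical one loosely based on the terms in the figure. Following is_a and part_of links together, as done here, is a common simplification, not a description of the cited work's reasoning.

```python
# Species-specific class -> shared bridging class (labels, not real IDs).
EQUIVALENT = {
    "MA:lung": "lung",
    "FMA:lung": "lung",
    "MA:lung alveolus": "alveolus of lung",
    "FMA:pulmonary alveolus": "alveolus of lung",
}

# Shared hierarchy: class -> parents via is_a or part_of (hypothetical, simplified).
PARENTS = {
    "anatomical structure": [],
    "organ": ["anatomical structure"],
    "organ part": ["anatomical structure"],
    "respiration organ": ["organ"],
    "lung": ["respiration organ"],
    "alveolus": ["organ part"],
    "alveolus of lung": ["alveolus", "lung"],  # is_a alveolus, part_of lung
}


def bridged_closure(species_class: str) -> set[str]:
    """Map a species-specific class into the shared hierarchy and expand it."""
    start = EQUIVALENT[species_class]
    seen, stack = {start}, [start]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_anatomy(a: str, b: str) -> set[str]:
    """Shared classes between two species-specific classes."""
    return bridged_closure(a) & bridged_closure(b)


print("alveolus of lung" in shared_anatomy("MA:lung alveolus", "FMA:pulmonary alveolus"))  # True
print(sorted(shared_anatomy("MA:lung alveolus", "FMA:lung")))
```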

Page 32: Semantic phenotyping for disease diagnosis and discovery

=> Web application for model phenotyping and G2P validation

PROBLEM: EACH SPECIES MAKES DIFFERENT G2P ASSOCIATIONS

Page 33: Semantic phenotyping for disease diagnosis and discovery

INTEGRATED GENOTYPE-2-PHENOTYPE DATA IN MONARCH

Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS;

Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources

Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse

Species   Data source   Genes   Genotypes   Variants   Phenotype annotations   Diseases

Diseases

mouse MGI 13,433 59,087 34,895 271,621

fish ZFIN 7,612 25,588 17,244 81,406

fly Flybase 27,951 91,096 108,348 267,900

worm Wormbase 23,379 15,796 10,944 543,874

human HPOA 112,602 7,401

human OMIM 2,970 4,437 3,651

human ClinVar 19,694 111,294 252,838 4,056

human KEGG 2,509 3,927 1,159

human ORPHANET 3,113 5,690 3,064

human CTD 7,414 23,320 4,912

Page 34: Semantic phenotyping for disease diagnosis and discovery

EXOMISER: DIAGNOSING UDP_930 USING A PHENOTYPICALLY SIMILAR MOUSE

Patient UDP_930 (29 phenotypes), e.g.: chronic acidosis, neonatal hypoglycemia, osteopenia, short stature

Mouse Sms tm1a(EUCOMM)Wtsi: decreased circulating potassium level, decreased circulating glucose level, decreased bone mineral density, decreased body length

Shared (common ancestor) terms: abnormal ion homeostasis, decreased circulating glucose level, decreased bone mineral density, short stature

Robinson et al. (2013). Genome Res, doi:10.1101/gr.160325.113

Page 35: Semantic phenotyping for disease diagnosis and discovery

EXOMISER: COMBINING PHENOTYPIC SIMILARITY WITH OTHER DATA

Patient UDP_2146 (56 phenotypes) compared with Brachmann-de Lange syndrome (NIPBL)

Genes shown: NIPBL, MAU2, MED8, MED21, MED23, MED26, CCNC, CDK8, plus an unresolved "?"

Phenotypes shown: recurrent otitis media, spasticity, esotropia, cerebral palsy, conductive hearing impairment, limitation of joint mobility, strabismus, hypertonia, contractures of the joints of the lower limbs, hypertonicity

More general terms shown: abnormality of the middle ear, abnormal joint mobility, strabismus, abnormality of central motor function

Page 36: Semantic phenotyping for disease diagnosis and discovery

UDP CASES ANALYZED WITH EXOMISER

=> Using genotype, phenotype, PPI, and inheritance together provides the best prioritization
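A hypothetical sketch of how these evidence types might be combined for one gene; this is not Exomiser's scoring scheme. Genes failing the inheritance model are dropped, a gene's phenotype score is the better of its own match and a discounted match borrowed from protein-interaction neighbours, and the result is averaged with a variant score. The 0.8 discount, the equal weights, and all gene names and scores are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    variant_score: float    # e.g. pathogenicity x rarity, in [0, 1]
    fits_inheritance: bool  # passes the recessive / de novo model


def phenotype_score(gene, direct, ppi, discount=0.8):
    """Own phenotype match, or a discounted match from an interacting gene."""
    own = direct.get(gene, 0.0)
    borrowed = max((direct.get(n, 0.0) for n in ppi.get(gene, ())), default=0.0)
    return max(own, discount * borrowed)


def rank_genes(candidates, direct, ppi):
    """Hard inheritance filter, then equal-weight variant + phenotype score."""
    scored = {}
    for gene, ev in candidates.items():
        if not ev.fits_inheritance:
            continue
        scored[gene] = 0.5 * ev.variant_score + 0.5 * phenotype_score(gene, direct, ppi)
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


candidates = {
    "GENE_A": Evidence(0.95, True),   # no direct phenotype data
    "GENE_B": Evidence(0.60, True),
    "GENE_C": Evidence(0.99, False),  # fails the inheritance model
}
direct = {"GENE_B": 0.70, "KNOWN_1": 0.90}  # hypothetical phenotype-match scores
ppi = {"GENE_A": {"KNOWN_1"}}               # hypothetical interaction

for gene, score in rank_genes(candidates, direct, ppi):
    print(f"{gene}: {score:.3f}")
```

GENE_A has no phenotype data of its own but ranks first because it interacts with a gene whose phenotypes match well, which is the role PPI plays in the slide's combination.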

Page 37: Semantic phenotyping for disease diagnosis and discovery

ANALYSIS OF UNSOLVED UDP CASES

4 families now have a diagnosis, including one novel disease-gene association: York platelet syndrome and STIM1

Strong candidates identified for 19 families, now undergoing functional validation through mouse and zebrafish modeling

Several hundred UDP cases are now being analyzed using Exomiser and cross-species phenotype data

Page 38: Semantic phenotyping for disease diagnosis and discovery

HOW DOES THE CLINICIAN KNOW THEY’VE PROVIDED ENOUGH PHENOTYPING?

How many annotations…?

How many different categories?

How many within each?
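These questions can be answered mechanically for a given profile: count annotations per top-level category (breadth) and measure how deep each annotation sits in the hierarchy (specificity). A sketch over a hypothetical toy hierarchy; the talk gives no thresholds for "enough", and none are assumed here.

```python
from collections import Counter
from functools import lru_cache

ROOT = "phenotypic abnormality"
# Toy is_a hierarchy (hypothetical): term -> direct parents.
IS_A = {
    ROOT: [],
    "abnormality of the head": [ROOT],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormality of the musculature": [ROOT],
    "hypotonia": ["abnormality of the musculature"],
}


def ancestors(term: str) -> set[str]:
    """The term plus every superclass reachable through is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def top_level_categories(term: str) -> set[str]:
    """Direct children of the root that subsume the term."""
    return {a for a in ancestors(term) if ROOT in IS_A[a]}


@lru_cache(maxsize=None)
def depth(term: str) -> int:
    """Length of the longest is_a path from the term up to the root."""
    parents = IS_A[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def summarize(profile):
    """Annotations per category (breadth) and mean depth (specificity)."""
    per_category = Counter()
    for term in profile:
        per_category.update(top_level_categories(term))
    mean_depth = sum(depth(t) for t in profile) / len(profile)
    return per_category, mean_depth


counts, mean_depth = summarize({"flat back of head", "abnormal skull morphology", "hypotonia"})
print(dict(counts), round(mean_depth, 2))
```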

Page 39: Semantic phenotyping for disease diagnosis and discovery

Image credit: Viljoen and Beighton, J Med Genet. 1992

Schwartz-Jampel Syndrome, Type I

Caused by mutation in HSPG2, which encodes a proteoglycan

~100 phenotype annotations

Page 40: Semantic phenotyping for disease diagnosis and discovery

EVALUATION METHOD

Create a variety of “derived” diseases

More general (depth)

Remove subset(s) (breadth)

Introduce noise

Assess the change in similarity between the derived disease and its parent.

Ask questions:

Is the derived disease considered similar to original?

…or more similar to a different disease?

Is it distinguishable beyond random?

Are there any specific factors that influence similarity?
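A minimal sketch of one evaluation round, assuming a hypothetical toy hierarchy and disease set, Jaccard overlap of ancestor closures as the similarity measure, and hand-picked derivations (annotations removed for breadth, unrelated annotations added as noise). The rank of the original disease answers "is the derived disease still most similar to its parent?"

```python
# Toy is_a hierarchy (hypothetical): term -> direct parents.
IS_A = {
    "phenotypic abnormality": [],
    "abnormality of the head": ["phenotypic abnormality"],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormal ear morphology": ["abnormality of the head"],
    "low-set ears": ["abnormal ear morphology"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
    "decreased muscle mass": ["abnormality of the musculature"],
    "abnormality of the skin": ["phenotypic abnormality"],
    "eczema": ["abnormality of the skin"],
}

DISEASES = {
    "Disease X": {"flat back of head", "low-set ears", "hypotonia"},
    "Disease Y": {"decreased muscle mass", "eczema"},
}


def closure(profile):
    """Annotations plus all their superclasses."""
    seen, stack = set(profile), list(profile)
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a, b):
    """Jaccard overlap of the two ancestor closures."""
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)


def derive(profile, remove=frozenset(), add=frozenset()):
    """Remove a subset (breadth) and/or add noise annotations."""
    return (set(profile) - set(remove)) | set(add)


def rank_of(original, derived):
    """1 + number of other diseases strictly more similar to the derived profile."""
    target = similarity(derived, DISEASES[original])
    return 1 + sum(
        similarity(derived, prof) > target
        for name, prof in DISEASES.items()
        if name != original
    )


derived = derive(DISEASES["Disease X"], remove={"low-set ears"}, add={"eczema"})
print(rank_of("Disease X", derived))  # 1
```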

Page 41: Semantic phenotyping for disease diagnosis and discovery

FINDING THE PHENOTYPE GRAPH IN COMMON

The most specific phenotypic profile in common
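"Most specific in common" is usually quantified with information content (IC): a term shared by few diseases is more informative than one shared by many. The sketch below uses Resnik similarity (the IC of the most informative common ancestor) and a best-match average over profiles, with IC estimated from a hypothetical four-disease corpus. It illustrates the idea and is not necessarily the measure used in the work presented.

```python
import math

# Toy is_a hierarchy (hypothetical): term -> direct parents.
IS_A = {
    "phenotypic abnormality": [],
    "abnormality of the head": ["phenotypic abnormality"],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormal ear morphology": ["abnormality of the head"],
    "low-set ears": ["abnormal ear morphology"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
    "decreased muscle mass": ["abnormality of the musculature"],
}

# Hypothetical annotation corpus used only to estimate term frequencies.
CORPUS = {
    "D1": {"flat back of head", "hypotonia"},
    "D2": {"abnormal skull morphology"},
    "D3": {"low-set ears"},
    "D4": {"decreased muscle mass"},
}


def ancestors(term):
    """The term plus every superclass reachable through is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(corpus):
    """IC(t) = -ln p(t); p(t) = fraction of diseases annotated to t or a descendant."""
    counts = {}
    for profile in corpus.values():
        for t in set().union(*(ancestors(a) for a in profile)):
            counts[t] = counts.get(t, 0) + 1
    n = len(corpus)
    return {t: -math.log(c / n) for t, c in counts.items()}


IC = information_content(CORPUS)


def resnik(a, b):
    """IC of the most informative ancestor shared by terms a and b."""
    return max(IC.get(t, 0.0) for t in ancestors(a) & ancestors(b))


def best_match_average(p, q):
    """For each term in p, take its best Resnik match in q, then average."""
    return sum(max(resnik(a, b) for b in q) for a in p) / len(p)


print(round(resnik("flat back of head", "abnormal skull morphology"), 3))  # 0.693
print(round(resnik("flat back of head", "low-set ears"), 3))               # 0.288
```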

Page 42: Semantic phenotyping for disease diagnosis and discovery

METHOD: DERIVE BY CATEGORY REMOVAL

Remove annotations that are subclasses of a single high-level node

Repeat for each 1° subclass
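A sketch of category removal over a hypothetical toy hierarchy, reading "high-level node" and "1° subclass" as a direct child of the root (one reasonable interpretation; the slide does not define it further):

```python
ROOT = "phenotypic abnormality"
# Toy is_a hierarchy (hypothetical): term -> direct parents.
IS_A = {
    ROOT: [],
    "abnormality of the head": [ROOT],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormal ear morphology": ["abnormality of the head"],
    "low-set ears": ["abnormal ear morphology"],
    "abnormality of the musculature": [ROOT],
    "hypotonia": ["abnormality of the musculature"],
}


def ancestors(term):
    """The term plus every superclass reachable through is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def derive_by_category_removal(profile):
    """One derived profile per top-level category, with that category removed."""
    categories = [t for t, parents in IS_A.items() if ROOT in parents]
    return {cat: {t for t in profile if cat not in ancestors(t)} for cat in categories}


profile = {"flat back of head", "low-set ears", "hypotonia"}
for cat, derived in derive_by_category_removal(profile).items():
    print(cat, "->", sorted(derived))
```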

Page 43: Semantic phenotyping for disease diagnosis and discovery

Example: Schwartz-Jampel Syndrome, Type I, to test the influence of a single phenotypic category

Page 44: Semantic phenotyping for disease diagnosis and discovery

Example: Schwartz-Jampel Syndrome derivations to test the influence of a single phenotypic category

Page 45: Semantic phenotyping for disease diagnosis and discovery

Example: Schwartz-Jampel Syndrome derivations

Page 46: Semantic phenotyping for disease diagnosis and discovery

SEMANTIC SIMILARITY ALGORITHMS ARE ROBUST IN THE FACE OF MISSING INFORMATION

On average, 92% of derived diseases are most similar to the original disease

Severity of impact follows the proportion of the phenotypic profile removed

Plot axes: similarity of derived disease to original; derived disease profile rank

Page 47: Semantic phenotyping for disease diagnosis and discovery

METHOD: DERIVE BY LIFTING

Iteratively map each class to its direct superclass(es)

Keep only leaf nodes
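The lifting procedure can be sketched as below, assuming a hypothetical toy hierarchy. "Leaf nodes" is read as the most specific terms in the lifted set, so any term that is a superclass of another term in the set is dropped; a term with no parent (the root) stays where it is.

```python
# Toy is_a hierarchy (hypothetical): term -> direct parents.
IS_A = {
    "phenotypic abnormality": [],
    "abnormality of the head": ["phenotypic abnormality"],
    "abnormal skull morphology": ["abnormality of the head"],
    "flat back of head": ["abnormal skull morphology"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
}


def ancestors(term):
    """The term plus every superclass reachable through is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def keep_leaves(terms):
    """Drop any term that is a strict superclass of another term in the set."""
    return {t for t in terms if not any(t in ancestors(u) - {u} for u in terms)}


def lift(profile, steps=1):
    """Replace each class by its direct superclass(es), `steps` times."""
    current = set(profile)
    for _ in range(steps):
        lifted = set()
        for t in current:
            lifted.update(IS_A[t] or [t])  # the root has no parent; keep it
        current = keep_leaves(lifted)
    return current


profile = {"flat back of head", "hypotonia"}
print(sorted(lift(profile, 1)))  # ['abnormal skull morphology', 'abnormality of the musculature']
print(sorted(lift(profile, 2)))  # ['abnormality of the head']
```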

Page 48: Semantic phenotyping for disease diagnosis and discovery

SEMANTIC SIMILARITY ALGORITHMS ARE SENSITIVE TO SPECIFICITY OF INFORMATION

Severity of impact increases as phenotypes become more general

Plot axes: similarity of derived disease to original; derived disease profile rank

Page 49: Semantic phenotyping for disease diagnosis and discovery

ANNOTATION SUFFICIENCY SCORE

http://www.phenotips.org

http://www.monarchinitiative.org

Page 50: Semantic phenotyping for disease diagnosis and discovery

ANNOTATION SUFFICIENCY SCORE

Page 51: Semantic phenotyping for disease diagnosis and discovery

CONSIDERING TIME

Page 52: Semantic phenotyping for disease diagnosis and discovery

PATIENT 1

Lower back pain

Motor weakness

Unpleasant muscle twitching

40 yrs old

Page 53: Semantic phenotyping for disease diagnosis and discovery

PATIENT 2

Unpleasant muscle twitching

65 yrs old

Stumbling

Leg weakness

Page 54: Semantic phenotyping for disease diagnosis and discovery

PATIENT 1

Diagnosis: Degenerative disc disease with L3 nerve root radiculopathy causing muscle weakness.

More recent onset of benign fasciculation syndrome, a non-progressive disease.

Page 55: Semantic phenotyping for disease diagnosis and discovery

PATIENT 2

Diagnosis: Amyotrophic lateral sclerosis

(Lou Gehrig disease)

Page 56: Semantic phenotyping for disease diagnosis and discovery

ADDING CHRONOLOGY TO THE ALGORITHM

Page 57: Semantic phenotyping for disease diagnosis and discovery

ADDING EXPOSURE TO THE ALGORITHM

Patient 1

Disease/condition

Drug/chemical

Page 58: Semantic phenotyping for disease diagnosis and discovery

ADDING NEGATION TO THE ALGORITHM

Patient

Disease X

Page 59: Semantic phenotyping for disease diagnosis and discovery

CONCLUSIONS

Phenotypic data can be represented using ontologies to support improved comparisons within and across species.

For known disease-gene associations, comparison to human phenotype data is effective for variant prioritization.

For unknown disease-gene associations, expanding phenotypic coverage with model organism data greatly improves variant prioritization.

Phenotypic breadth is recommended to buffer missing information; very specific phenotypes are also necessary to ensure high-quality matches.

Page 60: Semantic phenotyping for disease diagnosis and discovery

FUTURE WORK

Add variables to the semantic similarity algorithm, e.g. negation, environment, chronology

Validate existing animal models for recapitulation of disease

Further characterize organism-specific phenotypes

Add many more non-model organisms to the analysis

Page 61: Semantic phenotyping for disease diagnosis and discovery

ACKNOWLEDGMENTS

NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, Joie Davis, Neal Boerkoel, Cyndi Tifft, Bill Gahl

OHSU: Nicole Vasilevsky, Matt Brush, Bryan Laraway, Shahim Essaid, Kent Shefchek

Garvan: Tudor Groza

Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall

UCSD: Jeff Grethe, Chris Condit, Anita Bandrowski, Maryann Martone

U of Pitt: Chuck Boromeo, Vincent Agresti, Becky Boes, Harry Hochheiser

Sanger: Anika Oellrich, Jules Jacobsen, Damian Smedley

Toronto: Marta Girdea, Sergiu Dumitriu, Heather Trang, Bailey Gallinger, Orion Buske, Mike Brudno

JAX: Cynthia Smith

Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson

Funding:

NIH Office of the Director: 1R24OD011883

NIH-UDP: HHSN268201300036C, HHSN268201400093P