Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based...

A Context-Driven Subgraph Model for Literature-Based Discovery. Ph.D. Dissertation Defense, Delroy Cameron, August 18, 2014. Ph.D. Committee: Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NIH), Varun Bhagwan (Yahoo! Labs). "All truths are easy to understand once they are discovered; the point is to discover them." (Galileo Galilei, 1564–1642)

description

Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, which influenced innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ... While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ... This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for a priori heuristics. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, advances the state of the art in LBD research.

Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)

Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)

D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)

D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010

Transcript of Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based...

Page 1: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY

PH.D. DISSERTATION DEFENSE
DELROY CAMERON
AUGUST 18, 2014

PH.D. COMMITTEE
AMIT P. SHETH (ADVISOR)
KRISHNAPRASAD THIRUNARAYAN
MICHAEL RAYMER
RAMAKANTH KAVULURU (UKY)
THOMAS C. RINDFLESCH (NIH)
VARUN BHAGWAN (YAHOO! LABS)

All truths are easy to understand once they are discovered; the point is to discover them. (Galileo Galilei, 1564–1642)

Page 2: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Historical Perspectives

Walter Sutton (1877–1916)

Theodor Boveri (1862–1915)

Gregor Johann Mendel (1822–1884)

Mendelian Laws of Inheritance (1866)

Boveri-Sutton Chromosome Theory (1903)

Page 3: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Science of Making Discoveries

Discovery

Information Processing System

×What is promising?

Page 4: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Thesis Statement

An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.

Page 5: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Motivation

[Figure: Vioxx (rofecoxib) timeline. Starting point: Rofecoxib TREATS Osteoarthritis (1999, Merck & Co.). Events shown: Increased risk of Heart Attack; Confirmed by Clinical Trial; Vioxx Withdrawn. Year markers: 2002, 2004, 2005, 2007, 2011, 2013. Settlements shown: $254.3 million, $4.85 billion, $950 million, $23 million.]

Page 6: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Motivation

Literature-Based Discovery (LBD)

Page 7: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery (LBD)

Models: ABC Model (A – B – C); AnC Model (A – B1, B2, ..., Bi – C); Context-Driven Subgraph Model

Source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson

[Figure: timeline of LBD systems from 1986 to 2013 (year markers: 1986, 1996, 1999, 2001, 2003, 2004, 2005, 2006, 2010, 2011, 2013), grouped as Keyword-based, Concept-based, Relations-based, Graph-based and Hybrid.]

Systems and techniques named on the timeline:
ARROWSMITH v1: Term Frequency
IRIDESCENT: Term Co-occurrence
DAD: MetaMap, UMLS
Litlinker: MeSH, UMLS, Rules, Level of Support
Manjal: UMLS, MeSH, Topic Profiles, TF-IDF
Rajolink: MeSH, Rarity
BITOLA: UMLS, MeSH, Assoc. Rules, Confidence
ACS (2004): MeSH, Hebbian Learning
CoPub: Keywords, Mutual Information
SemBT: Semantic Predications, Level of Support
BioSbKDS: UMLS Relations, MeSH
Discovery Browsing: Degree Centrality
Cooperative Reciprocity
Manual
ARROWSMITH v2: 8 Features (2007)
Semantic MEDLINE: Summarization, Discovery Browsing
Epiphanet: Predications-based Semantic Indexing
Discovery Patterns: predicate paths among A, B and C using CAUSES, INHIBITS, DISRUPTS, PRODUCES, STIMULATES, ISA, TREATS

Contribution #1: Context-Driven Subgraph Model for LBD

Literature-based discovery refers to the use of papers and other academic publications (the “literature”) to find new relationships between existing knowledge (the “discovery”).

Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery

Page 8: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Application: Raynaud Syndrome – Fish Oil

[Figure: subgraph linking Dietary Fish Oils to Raynaud Syndrome. Nodes: Dietary Fish Oils, Prostaglandin I3, Prostaglandin, Epoprostenol, Platelet Aggregation, Raynaud Syndrome. Edge labels: ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS, CAUSES.]

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.

Keyword/Concept-based: Dietary Fish Oils – Platelet Aggregation – Raynaud Syndrome

Relations-based: Dietary Fish Oils DISRUPTS Platelet Aggregation CAUSES Raynaud Syndrome

Subgraph-based

Inferred predicates
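The relations-based view on this slide can be illustrated with a small sketch that looks for A → B → C chains in a list of semantic predications. The two predications are the ones shown on the slide; the tuple layout and the function name `two_hop_paths` are assumptions made for illustration and are not taken from the dissertation or from Obvio.

```python
from collections import defaultdict

# Predications read off the slide; the (subject, predicate, object)
# tuple layout is an assumption made for this sketch.
PREDICATIONS = [
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
]


def two_hop_paths(predications, source, target):
    """Return every source -> intermediate -> target chain as a flat tuple."""
    outgoing = defaultdict(list)
    for subj, pred, obj in predications:
        outgoing[subj].append((pred, obj))

    paths = []
    for pred1, mid in outgoing.get(source, []):
        for pred2, end in outgoing.get(mid, []):
            if end == target:
                paths.append((source, pred1, mid, pred2, target))
    return paths


paths = two_hop_paths(PREDICATIONS, "Dietary Fish Oils", "Raynaud Syndrome")
for path in paths:
    print(" ".join(path))
```

A subgraph-based approach would go further than this and group many such paths by context; the sketch only covers the single-intermediate case.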

Page 9: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Comparison

Scenario: Raynaud Syndrome – Dietary Fish Oils

Systems compared: Cameron [19], Srinivasan [88, 89], Weeber [101, 102], Gordon [36, 37, 38], Hristovski [40], Ramakrishnan [72]*

Intermediate: marks in the first five columns; Ramakrishnan [72]*
Blood Viscosity: × × × × ×; ?
Platelet Aggregation: × × × × ×; ?
Vascular Reactivity: × × × × (four of the five columns); ?

Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil

Page 10: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

[Figure: manually constructed subgraphs for Raynaud Syndrome – Dietary Fish Oils. Nodes: Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Fatty Acid, Essential Fatty Acid, Triglyceride, Lipid, Blood Viscosity, Cellular Activity, Blood Physiology, Tissue Function, Vasoconstriction. Edge labels: DISRUPTS, ISA, CAUSES, CONVERTS_TO, TREATS, STIMULATES, INHIBITS, AFFECTS.]

Problem: How to automate this?

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.

Page 11: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 12: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

PREDICATIONS GRAPH

Page 13: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Subgraph Model

Predications Graph (G)

Candidate Graph (RG)

Subgraphs (SG)

R(s,t): R(s,t)(c1), R(s,t)(c2), ..., R(s,t)(ck)

No two contexts are the same

What is context?

Page 14: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 15: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

• Path Relatedness
• Semantic Predication Context

Context Distribution Assumption: The context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it.

Semantic Underpinnings

Relational Semantic Summary

Textual Semantic Summary

Concept-Level Semantic Summary

Interchangeability Assumption: The concept-level and relational semantic summary of a MEDLINE article are interchangeable.
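The Context Distribution Assumption can be read as a counting procedure: collect the MeSH descriptors of every article that contains a predication and tally them. The sketch below illustrates that reading on toy data. The article identifiers, descriptor assignments, and the name `predication_context` are all hypothetical, and raw counts are only one possible weighting.

```python
from collections import Counter

# Hypothetical toy data: article id -> MeSH descriptors assigned to it.
ARTICLE_MESH = {
    "a1": ["Platelet Aggregation", "Fish Oils"],
    "a2": ["Platelet Aggregation", "Epoprostenol"],
    "a3": ["Raynaud Disease"],
}

# Hypothetical toy data: predication -> ids of the articles that contain it.
PREDICATION_ARTICLES = {
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"): ["a1", "a2"],
}


def predication_context(predication):
    """Tally the MeSH descriptors of all articles containing the predication."""
    counts = Counter()
    for article_id in PREDICATION_ARTICLES.get(predication, []):
        counts.update(ARTICLE_MESH[article_id])
    return counts


ctx = predication_context(("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"))
print(ctx.most_common())
```

Under this reading, the context of a path could be obtained by combining the counters of its predications, which is where the path-level context vectors on the later slides would come from.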

Page 16: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Linguistic Underpinnings

Linguistic items with similar distributions have similar meanings

“You shall know a word by the company it keeps”

– J. R. Firth 1957

Semantic Predications with shared contexts in their distributions are related

Distributional Semantics

Context-sensitive nature of meaning

Page 17: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

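Page 18: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Automatic Subgraph Creation

[Figure: two paths, pi and pj, whose predications are annotated with MeSH descriptors (m1, m2, m5, m7, m8, m9). The descriptors are collected into per-path context vectors, shown as (m1, m7, m2, m8) and (m1, m5, m9, m8), and located in the MeSH Hierarchy.]

Semantic Relatedness of MeSH Context Vectors

Contribution #2: Context of a path as a vector of MeSH Descriptors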

Page 19: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Path Relatedness

Objective #1: Maximize weights of In-Context Descriptors

Objective #2: Minimize weights of Out-Of-Context Descriptors

[Figure: weighted context vectors C(pi), over descriptors m1–m5, and C(pj), over descriptors m1, m2 and m6–m13, aligned over the combined descriptor set m1–m13. Legend: p – path, t – semantic predication.]

Page 20: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Path Relatedness: Shared Context

[Figure: binary forms of the context vectors C(pi) and C(pj). Descriptors labelled on the figure: Platelet aggregation, Platelet activation, Epoprostenol, Platelet adhesiveness, Prostaglandins (positions m3, m4, m5, m9–m13).]

G-Tree: platelet aggregation, platelet adhesiveness, platelet activation; hemostasis; Blood physiological process; Blood physiological phenomena; Circulatory and respiratory physiological phenomena

D-Tree: Epoprostenol; Prostaglandins I; Prostaglandins; Eicosanoids; Arachidonic Acids; Fatty Acids, Unsaturated; Fatty Acids; Lipids

Contribution #3: Structured Background Knowledge for computing shared context of paths
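One way to picture how a hierarchy can extend shared context beyond exact descriptor matches is sketched below: two descriptors count as shared when the Dice similarity of their ancestor sets reaches a threshold. The child-to-parent fragment is hypothetical (it only loosely follows labels visible on this slide), the 0.5 threshold is an arbitrary assumption, and the function names are illustrative. The later summary slides name Dice Similarity over the MeSH Hierarchy with a manual threshold, but the exact formulation used in the dissertation is not given in this transcript.

```python
# Hypothetical child -> parent fragment, loosely following labels on the slide.
PARENT = {
    "Platelet Aggregation": "Hemostasis",
    "Platelet Adhesiveness": "Hemostasis",
    "Hemostasis": "Blood Physiological Processes",
    "Epoprostenol": "Prostaglandins I",
    "Prostaglandins I": "Prostaglandins",
}


def ancestors(term):
    """Return the term together with all of its ancestors in PARENT."""
    chain = {term}
    while term in PARENT:
        term = PARENT[term]
        chain.add(term)
    return chain


def dice(a, b):
    """Dice similarity of the ancestor sets of two descriptors."""
    sa, sb = ancestors(a), ancestors(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def shared_context(ctx_i, ctx_j, threshold=0.5):
    """Descriptors of ctx_i that are similar enough to some descriptor of ctx_j."""
    return {a for a in ctx_i if any(dice(a, b) >= threshold for b in ctx_j)}


shared = shared_context(
    {"Platelet Aggregation", "Epoprostenol"},
    {"Platelet Adhesiveness"},
)
print(shared)
```

In this toy fragment, Platelet Aggregation and Platelet Adhesiveness share two of their three ancestors, so they match even though the descriptor strings differ, while Epoprostenol lies in a different branch and does not.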

Page 21: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Path Relatedness Score

*Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006

Page 22: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Hierarchical Agglomerative Clustering

[Figure: paths between A and C are grouped over iterations 1 to n, alternating Bucket Population and Bucket Merging, subject to the Path Relatedness Threshold.]

1. Bucket Population

2. Bucket Merging

3. Subgraph Ranking
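The three steps above can be sketched as a small threshold-driven clustering routine. This is an illustration under stated assumptions, not the dissertation's algorithm: paths are modelled as sets of descriptor ids, relatedness is a plain overlap count, buckets are merged by average link, and ranking by bucket size stands in for the intra-cluster and rarity measures named on later slides. All function names are hypothetical.

```python
def populate_buckets(paths, relatedness, threshold):
    """Step 1: put each path in the first bucket that holds a related path."""
    buckets = []
    for path in paths:
        for bucket in buckets:
            if any(relatedness(path, other) >= threshold for other in bucket):
                bucket.append(path)
                break
        else:
            buckets.append([path])
    return buckets


def average_link(b1, b2, relatedness):
    """Mean pairwise relatedness between two buckets (an assumed linkage)."""
    total = sum(relatedness(p, q) for p in b1 for q in b2)
    return total / (len(b1) * len(b2))


def merge_buckets(buckets, relatedness, threshold):
    """Step 2: merge the closest pair of buckets until none reaches the threshold."""
    buckets = [list(b) for b in buckets]
    while len(buckets) > 1:
        score, i, j = max(
            (average_link(buckets[i], buckets[j], relatedness), i, j)
            for i in range(len(buckets))
            for j in range(i + 1, len(buckets))
        )
        if score < threshold:
            break
        buckets[i].extend(buckets.pop(j))
    return buckets


def rank_buckets(buckets):
    """Step 3 (stand-in): larger buckets first."""
    return sorted(buckets, key=len, reverse=True)


def overlap(a, b):
    """Toy relatedness: number of descriptors two paths have in common."""
    return len(a & b)


p1 = frozenset({"m1", "m2"})
p2 = frozenset({"m2", "m3"})
p3 = frozenset({"m8", "m9"})

buckets = populate_buckets([p1, p2, p3], overlap, threshold=1)
ranked = rank_buckets(merge_buckets(buckets, overlap, threshold=1))
print(ranked)
```

Each resulting bucket corresponds to one candidate subgraph; here p1 and p2 share a descriptor and end up together, while p3 stays a singleton.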

Page 23: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Summary of Metrics

• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Threshold: ??

• MeSH Semantic Similarity
– Model: MeSH Hierarchy
– Metrics: Dice Similarity
– Threshold: Manual

Page 24: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Automatic Threshold Selection

RS-DFO Experiment

Manual Threshold = 3.0

Gaussian Distribution

[Plot: x-axis Path Relatedness Score; y-axis Number of Path Pairs]

Page 25: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Automatic Threshold Selection

Gaussian Function

[Plot: x-axis Path Relatedness Score; y-axis Expected Value]

Page 26: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Automatic Threshold Selection

• Gaussian Distribution

Diagram courtesy of Wikipedia*

Points of Inflection

Page 27: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Threshold Comparisons

Path Relatedness Score

Scenario             2 Std Dev.   Manual   3 Std Dev.   Max
RS-DFO               2.68         3.0      3.04         3.38
Testosterone-Sleep   3.35         3.5      3.8262       6.22
DEHP-Sepsis          3.94         4.0      4.53         4.84

Table 2: Path Relatedness Threshold Comparisons
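A threshold placed a fixed number of standard deviations above the mean can be computed directly from the observed scores. The sketch below assumes the scores are roughly Gaussian and uses the population standard deviation; the function name, the toy scores, and the choice of population rather than sample deviation are assumptions, not details from the slides.

```python
from statistics import mean, pstdev


def gaussian_threshold(scores, k=2.0):
    """Return mean + k standard deviations of the path relatedness scores."""
    return mean(scores) + k * pstdev(scores)


# Toy path relatedness scores (hypothetical values).
scores = [1.0, 2.0, 3.0, 4.0, 5.0]

two_sigma = gaussian_threshold(scores, k=2.0)
three_sigma = gaussian_threshold(scores, k=3.0)
print(two_sigma, three_sigma)
```

Raising `k` from 2 to 3 moves the cut-off further into the tail, so fewer path pairs qualify as related and the resulting subgraphs become smaller.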

Page 28: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Bucket Merging

Ba

Bb

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482

Straggly Clusters Compact Clusters

Broad Clusters

Page 29: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Subgraph Ranking

Intra-Cluster Rank

Page 30: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Singleton Ranking

Association Rarity

Page 31: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Summary of Metrics

• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Manual Threshold for Semantic Similarity, Dice Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian

• Bucket Relatedness
– Model: Set of Paths
– Metric: Inter-Cluster Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian

• Subgraph Ranking
– Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)

Page 32: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Algorithm

Time Complexity: Θ(N² log N)

Page 33: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 34: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Raynaud Syndrome – Dietary Fish Oil

Inferred predicates

Path Relatedness Threshold = 3σ

Page 35: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 1: Raynaud Syndrome – Dietary Fish Oil

Details: Cut-off date: Nov. 1985. By: D. R. Swanson (Article)

Intermediate: Association. Status
Blood Viscosity: Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome. Status: ZR-15
Platelet Aggregation: Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome. Status: S1
Vasoconstriction: Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome.

Legend: ZR – zero rarity singleton; S – Subgraph; Not Found

Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation
Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/

Page 36: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 2: Magnesium – Migraine

Details: Cut-off date: Apr. 1987. By: D. R. Swanson (Article)

Intermediate: Association. Status
Calcium Channel Blockers: Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine. Status: S22
Epilepsy: Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine. Status: S9
Hypoxia: Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine.
Inflammation: Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine. Status: ZR-3
Platelet Activity: Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine. Status: S1
Prostaglandins: Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine. Status: S4
Stress/Type A Personality: Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine.
Serotonin: Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine. Status: S1
Cortical Depression: Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine.
Substance P: Magnesium INHIBITS Substance P; Substance P CAUSES Migraine.
Vascular Mechanisms: Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine. Status: S9

Page 37: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 3: Somatomedin C – Arginine

Details: Cut-off date: Apr. 1989. By: D. R. Swanson (Article)

Intermediate: Association. Status
Growth Hormone: Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1). Status: S5
Body Weight (body mass): Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth. Status: S7
Malnutrition: Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition. Status: S7
Wound Healing (NK activity): Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing.

Page 38: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 4: Indomethacin – Alzheimer’s Disease

Details: Cut-off date: Jul. 1995. By: Swanson/Smalheiser (Article)

Intermediate: Association. Status
Acetylcholine: Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers. Status: S4
Lipid Peroxidation: Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers. Status: S2
M2-Muscarinic: Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers.
Membrane Fluidity: Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers.
Lymphocytes: Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers. Status: S14
Thyrotropin: Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers. Status: ZR-20
T-lymphocytes (T-Cells): Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers. Status: S3

Page 39: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 5: Estrogen – Alzheimer’s Disease

Details: Cut-off date: Jul. 1995. By: Swanson/Smalheiser (Article)

Intermediate: Association. Status
Antioxidant Activity: Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers. Status: S4
Apolipoprotein E (ApoE): Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers. Status: S3
Calbindin D28k: Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers. Status: S4
Cathepsin D: Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers.
Cytochrome C Oxidase Subunit III: Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers.
Glutamate: Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers.
Receptor Polymorphism: Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers.

Page 40: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 6: Calcium Independent PLA2 – Schizophrenia

Details: Cut-off date: 1997. By: Swanson/Smalheiser (Article)

Intermediate: Association. Status
Oxidative Stress: Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia. Status: ZR-2
Selenium: Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia. Status: ZR-2
Vitamin E: Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia. Status: ZR-2

Page 41: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 7: Chlorpromazine – Cardiac Hypertrophy

Details: Cut-off date: 01/01/2002. By: J. D. Wren (Article)

Intermediate: Association. Status
Calcineurin: Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy. Status: S5
Isoproterenol: Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly. Status: S12

Page 42: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 8: Testosterone – Sleep

Details: Cut-off date: 01/01/2012. By: Miller/Rindflesch (Article)

Intermediate: Association. Status
Cortisol/Hydrocortisone: Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep. Status: S7

Page 43: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis

Details: Cut-off date: 01/01/2013. By: Cairelli/Rindflesch (Article)

Intermediate: Association. Status
PParGamma: DEHP STIMULATES PParGamma; PParGamma INHIBITS Sepsis.

Page 44: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Statistical Evaluation

Association Rarity Interestingness

Page 45: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Statistical Evaluation

Experiment                 # Unique Associations   Total MEDLINE Frequency   Rarity r(E)   Interestingness I(E)
Raynaud-Fish Oil           10                      0                         0.00          1.00
Magnesium-Migraine         48                      27                        0.56          0.64
SomaC-Arginine             18                      306                       17.00         0.06
Indomethacin-Alzheimers    21                      9                         0.43          0.70
Estrogen-Alzheimers        42                      36                        0.86          0.54
PLA2-Schizophrenia         10                      0                         0.00          1.00
CPZ-Cardiac Hypertrophy    21                      2                         0.10          0.91
Testosterone-Sleep         61                      654                       10.72         0.09
Average                    29                      129                       3.71          0.62

Table 3: Rarity and Interestingness score of the subgraphs in the rediscoveries
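The transcript does not preserve the formulas for r(E) and I(E), but the tabulated values are consistent with rarity being the total MEDLINE frequency divided by the number of unique associations, and interestingness being 1 / (1 + rarity). The sketch below recomputes three rows under that inferred reading; the formulas and function names are assumptions derived from the numbers, not definitions quoted from the dissertation.

```python
def rarity(total_frequency, unique_associations):
    """Assumed reading of r(E): average MEDLINE mentions per association."""
    return total_frequency / unique_associations


def interestingness(r):
    """Assumed reading of I(E): decreases from 1 as rarity grows."""
    return 1.0 / (1.0 + r)


# (unique associations, total MEDLINE frequency) copied from Table 3.
ROWS = {
    "Magnesium-Migraine": (48, 27),
    "SomaC-Arginine": (18, 306),
    "Testosterone-Sleep": (61, 654),
}

for name, (unique, freq) in ROWS.items():
    r = rarity(freq, unique)
    print(f"{name}: r(E)={r:.2f}, I(E)={interestingness(r):.2f}")
```

Under this reading, an experiment whose associations never co-occur in MEDLINE has rarity 0 and interestingness 1, which matches the Raynaud-Fish Oil and PLA2-Schizophrenia rows.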

Page 46: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 47: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Predications-based Knowledge Exploration

Corpus

Predications Graph

Definitional Knowledge (UMLS + MeSH)

Provenance

Knowledge Abstraction

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.

Contribution #4: Combining Assertional and Definitional Knowledge for Knowledge Exploration

Page 48: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Levels of Contexts

[Figure: levels of context illustrated on paths between A and C. Labels shown: Predication Context, Path Context, Shared Context, Subgraph Context, Dimensions. Intermediates shown: B, B1, B2, B3, Bi. Predicates shown: CAUSES, DISRUPTS, PRODUCES, INHIBITS, STIMULATES, ISA, TREATS.]

Page 49: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 50: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Dissertation Contributions

1. Context-Driven Subgraph Model – Knowledge Rediscovery & Decomposition

2. Predication/Path Context – Vector of MeSH Descriptors

3. Shared Context – Background Knowledge (MeSH Hierarchy)

4. Semantic Predications-based Text Exploration – Obvio Web Application

Page 51: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Innovation

Feature columns: Automatic, Relational, Evidence-based, Thematic. The × marks are listed per row without column alignment.

System/Technique            Technique Type     Feature marks   #Discoveries   #Rediscoveries
IRIDESCENT [108]            Keyword                            1              0
ARROWSMITH [84]             Keyword/Concept                    5              0
DAD [101,102]               Concept                            0              2
BITOLA [46]                 Concept                            0              1
Litlinker [110]             Concept                            0              2
Manjal [87,88]              Concept            ×               0              5
SemBT [40,41,42]            Relations          × ×             0              1
BioSbKDS [47]               Relations          × ×             0              1
Wilkowski [107]             Graph              × ×             0              0
Ramakrishnan [72]           Graph              × ×             0              1*
Zhang [114]                 Graph              × × ×           0              0
Obvio [19, 21]              Graph              × × × ×         0              8
ARROWSMITH v2 [86,98]       Hybrid             ×               0              6*
Semantic MEDLINE [18,63]    Hybrid             × ×             2              0

Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery

Table 4: Comparison of capabilities and accomplishments of LBD techniques

Page 52: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work

Page 53: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Limitations

1. Manual Threshold – MeSH Semantic Similarity

2. Path Relatedness Threshold – Only Approximate Gaussian

3. Definition of Context

Page 54: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Levels of Semantic Representation

Keywords

Concepts

MeSH Descriptors

Semantic Predications

Ensemble of Features

Relationships

Semantic Predication: A PREDICATE B

Page 55: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Limitations

1. Manual Threshold – MeSH Semantic Similarity

2. Path Relatedness Threshold – Only Approximate Gaussian

3. Definition of Context

4. MEDLINE Querying – Deep integration of Assertional/Definitional

5. Contradiction Detection

6. Statistical Evaluation

7. Scalability of Clustering Algorithm

8. Subgraph Labeling

Page 56: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Take Away

• Future of Information Processing
– Rich Knowledge Representations
o Implicit, Formal, Powerful semantics
– Application to Literature-Based Discovery

Page 57: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Conclusion

• Context-Driven Subgraph Model
– Manually create Complex Associations
– Automatic Subgraph Creation
o Novel definitions for Context and Shared Context
o Multiple Thematic Dimensions
– Predications-based Knowledge Exploration
o Predicates
o Highlighted MEDLINE sentences
– Knowledge Rediscovery
o 8 out of 9 existing scientific discoveries

Page 58: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Publications

1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation)

2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs. (submitted to the Journal of Web Semantics)

3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.

4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013.

5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13) 130(1–3): 241–244, 2013.

6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012.

7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012.

8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.

9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–240, 2010.

10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010.

11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010.

12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.

Page 59: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Research Expertise

Literature-Based Discovery

Text Mining

Question Answering

[1]

Information Retrieval

[2]

[3]

[6]

[4]

[8]

[10]

[5]

[7]

Page 60: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Parting Words

“...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.”

– H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay).

H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999

Page 61: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Acknowledgements

• Olivier Bodenreider
• Marcelo Fiszman
• Mike Cairelli
• Swapna Abhyankar
• Drashti Dave
• Dongwook Shin

• Special Thanks
o Pavan
o Shreyansh
o Swapnil
o Nishita

• PREDOSE Team
o Nishita
o Gaurish
o Alan
o Revathy

Page 62: Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Ph.D. Committee Members

Amit P. Sheth (Advisor)

T.K. Prasad Michael Raymer

Ramakanth Kavuluru Thomas C. Rindflesch Varun Bhagwan