Discovering Disease Associations using a Biomedical Semantic Web: Integration and Ranking
Abstract
One of the principal goals of biomedical research is to elucidate the complex network of gene interactions underlying common human diseases. Although approaches based on integrative genomics have been shown to be successful in understanding the underlying pathways and biological processes in normal and disease states, most current biomedical knowledge is spread across different databases in different formats. Semantic Web principles, standards and technologies provide an ideal platform to integrate such heterogeneous information and bring forth implicit relations hitherto embedded in these large integrated biomedical and genomic datasets. Semantic Web query languages such as SPARQL can be used effectively to mine the biological entities underlying complex diseases through richer and more complex queries on this integrated data. However, the results are frequently large and unmanageable. Thus, there is a great need for techniques that rank resources on the Semantic Web, so that query results can be retrieved in ranked order and information overload is avoided. Such ranking can be used to prioritize the novel disease–gene, disease–pathway or disease–process relationships that are discovered. We implemented an existing Semantic Web-based knowledge mining technique which not only discovers the underlying genes, processes and pathways of diseases but also determines the importance of the resources, so that the results of a search can be ranked while the semantic associations are determined.
Data Integration: RDF Model
Ranga Chandra Gudivada1,2, Xiaoyan A. Qu1,2, Anil G. Jegga2,3,4, Eric K. Neumann5, Bruce J. Aronow1,2,3,4
Departments of Biomedical Engineering1 and Pediatrics2, University of Cincinnati; Center for Computational Medicine3 and Division of Biomedical Informatics4, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; and Teranode Corporation5, Seattle, WA 98104
Case Study: Prioritizing Modifier Genes, Pathways and Biological Processes for CARDIOMYOPATHY, DILATED
Computational Problem
Data integration: biological feature complexity is deep, heterogeneous, and extensive.
Data complexity poses a formidable challenge to efforts to integrate, formally model, and simulate the behavior of biological systems.
Likelihood Ranking requires mining and prioritization of entities and events that function in the context of biological networks
Biological Problem
The disease genes discovered to date are likely the easy ones. Discovering the genetic basis of the remaining Mendelian and complex gene-by-gene-by-environment disorders will be challenging and will require consideration of many more features and causal relationships.
No gene operates in a vacuum: any gene, protein or pathway interaction can lead to modifier-gene effects.
Identifying modifier genes, i.e. the gene networks underlying diseases (pathways, biological processes and functions), is challenging.
Benefits of Semantic Web
Semantic Web standards such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) facilitate semantic integration of heterogeneous multi-source data.
SPARQL, a Semantic Web query language capable of querying higher-order relationships in multi-dimensional data, can be used to mine Bio-RDF graphs.
Prioritization of biological entities on the Semantic Web can be accomplished by extending [2] and applying existing graph algorithms, such as the Kleinberg algorithm [1].
Figure: RDF model of the integrated data.
Nodes with an rdfs:label: Disease CUI (Disease Name), Anatomy CUI (Anatomy Name), Pathway Id (Pathway Description), Biol. Process GO ID (Biol. Process Description), Mol. Function GO ID (Mol. Function Description), Cell. Component GO ID (Cell. Component Description), Mouse Phenotype ID (Mouse Phenotype Description).
Other node: Gene Symbol.
Properties: inBiologicalProcess, inMolecularFunction, occursInPathway, hasAssociatedGene, hasAssociatedAnatomy, hasAssociatedDisease, hasMousePhenoType.
Ranking on Semantic Web
BIND
REACTOME
Nature Pathway Interaction database
Kleinberg Algorithm [1]
Authoritative node (high authority score): when a node is pointed to by good hubs, its authority score increases.
Hub node (high hub score): when a node points to many authoritative nodes, its hub score increases.
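The mutual reinforcement between hubs and authorities can be illustrated with a minimal Python sketch. It assumes a plain directed graph given as (source, target) pairs; the function name `hits`, the fixed iteration count and the toy node names are illustrative choices, not part of the poster.

```python
import math


def hits(edges, iterations=50):
    """Hub/authority iteration in the style of Kleinberg's algorithm.

    edges: iterable of (source, target) pairs of a directed graph.
    Returns (hub, authority) dicts, each scaled to unit Euclidean length.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: add up the hub scores of the nodes pointing here.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: add up the authority scores of the nodes pointed to.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        # Rescale so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical toy graph: three hubs pointing at two candidate authorities.
toy_edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
toy_hub, toy_auth = hits(toy_edges)
```

In this toy graph `a1` is pointed to by all three hubs and therefore ends up with a higher authority score than `a2`, while `h1`, which points to both authorities, ends up with the highest hub score.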
Extending the Kleinberg Algorithm [2] for the Semantic Web
Non-symmetric property: gene associatedPathway Pathway
Subjectivity weight > objectivity weight
A single gene participating in multiple biological pathways is considered more sensitive to perturbation than a single pathway having a large number of nodes (different weights for non-symmetric properties).
Corollary, symmetric property: geneA interacts geneB
Subjectivity weight = objectivity weight
GeneA interacting with various genes has equal significance as GeneB interacting with various genes (equal weights for symmetric properties).
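One way to read the weighting scheme above is as a hub/authority iteration over RDF triples in which each property scales the two update steps differently. The sketch below is only that reading, not the exact formulation of [2]: the function names, the `(subjectivity, objectivity)` tuple layout, the weight values and the toy triples are all assumptions made for illustration.

```python
import math


def _normalise(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: value / norm for node, value in scores.items()}


def weighted_hits(triples, weights, iterations=50):
    """Property-weighted hub/authority iteration over RDF-like triples.

    triples: iterable of (subject, property, object).
    weights: {property: (subjectivity_weight, objectivity_weight)}
             (hypothetical layout chosen for this sketch).
    Returns (subject_scores, object_scores).
    """
    triples = list(triples)
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    subj_score = dict.fromkeys(nodes, 1.0)
    obj_score = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Object (authority-like) update, scaled by the objectivity weight.
        new_obj = dict.fromkeys(nodes, 0.0)
        for s, p, o in triples:
            new_obj[o] += weights[p][1] * subj_score[s]
        # Subject (hub-like) update, scaled by the subjectivity weight.
        new_subj = dict.fromkeys(nodes, 0.0)
        for s, p, o in triples:
            new_subj[s] += weights[p][0] * new_obj[o]
        subj_score = _normalise(new_subj)
        obj_score = _normalise(new_obj)
    return subj_score, obj_score


# Hypothetical weights: subjectivity > objectivity for the non-symmetric
# property, equal weights for the symmetric one.
toy_weights = {"associatedPathway": (1.0, 0.5), "interacts": (1.0, 1.0)}

# Hypothetical triples; the symmetric "interacts" is stored in both directions.
toy_triples = [
    ("geneA", "associatedPathway", "pathway1"),
    ("geneA", "associatedPathway", "pathway2"),
    ("geneA", "associatedPathway", "pathway3"),
    ("geneB", "associatedPathway", "pathway1"),
    ("geneA", "interacts", "geneB"),
    ("geneB", "interacts", "geneA"),
]
toy_subj, toy_obj = weighted_hits(toy_triples, toy_weights)
```

With these assumed weights, `geneA`, which participates in three pathways, receives a higher subject score than `geneB`, while two genes linked only by the symmetric `interacts` property would score identically.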
CARDIOMYOPATHY, DILATED, X-LINKED
Step 1: Primary genes (1): DMD
Pathways (1): h_agrPathway, Agrin in Postsynaptic Differentiation
Biological Processes (4): GO_0006936 muscle contraction; GO_0007016 cytoskeletal anchoring; GO_0043043 peptide biosynthesis; GO_0007517 muscle development
Interacting Partners / Modifier Genes (16)
Step 2: Primary genes + Interacting Partners (1+16)
Pathways (28)
Biological Processes (27)
Query Result with Prioritization
Pathways (28)
Rank  Pathway ID           Description                                                            Score
1     h_agrPathway         Agrin in Postsynaptic Differentiation                                  1.134984242
2     h_hsp27Pathway       Stress Induction of HSP Regulation                                     0.139887918
3     h_actinYPathway      Y branching of actin filaments                                         0.093908976
3     h_no1Pathway         Actions of Nitric Oxide in the Heart                                   0.093908976
3     h_nfatPathway        NFAT and Hypertrophy of the heart (Transcription in the broken heart)  0.093908976
3     h_metPathway         Signaling of Hepatocyte Growth Factor Receptor                         0.093908976
3     h_salmonellaPathway  How does salmonella hijack a cell                                      0.093908976
3     h_mCalpainPathway    mCalpain and friends in Cell motility                                  0.093908976
3     h_PDZsPathway        Synaptic Proteins at the Synaptic Junction                             0.093908976
3     h_rabPathway         Rab GTPases Mark Targets In The Endocytotic Machinery                  0.093908976
Biological Processes (27)
Rank  GO ID       Description                        Score
1     GO_0006936  muscle contraction                 1.5385859
2     GO_0007517  muscle development                 0.3562762
3     GO_0007165  signal transduction                0.1139403
4     GO_0048741  skeletal muscle fiber development  0.1102909
4     GO_0030240  muscle thin filament assembly      0.1102909
4     GO_0043043  peptide biosynthesis               0.1027902
4     GO_0007016  cytoskeletal anchoring             0.1027902
Figure: integrated data sources.
Disease: OMIM, Mammalian Phenotype, others
Gene / Protein Annotations: Entrez Gene, SwissProt, Gene Ontology, others
Pathways: BIOCARTA, KEGG, BIOCYC
Molecular Interactions
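SPARQL Query
PREFIX CCHMC: <http://www.cchmc.com/test.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Select every pathway that is the object of at least one statement.
SELECT DISTINCT ?pathway
WHERE {
  ?pathway rdf:type CCHMC:Pathway .    # ?pathway is an instance of the Pathway class
  ?resource ?PROPERTY ?pathway .       # some resource links to it through any property
}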
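To make the two triple patterns of the query concrete, the following Python sketch mimics them over an in-memory list of triples. It is not a SPARQL engine: the function name, the prefixed strings used as identifiers, the `CCHMC:Gene` class and the toy data are hypothetical, and the sorted output is only for reproducibility, since the query itself imposes no order.

```python
RDF_TYPE = "rdf:type"


def pathways_linked_to_resources(triples, pathway_class="CCHMC:Pathway"):
    """Return pathways that are the object of at least one triple.

    Mirrors the two patterns of the query:
      ?pathway  rdf:type   CCHMC:Pathway .
      ?resource ?PROPERTY  ?pathway .
    """
    triples = list(triples)
    # Pattern 1: everything typed as a pathway.
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == pathway_class}
    # Pattern 2: keep those pathways that appear as the object of any triple.
    linked = {o for _, _, o in triples if o in typed}
    # A set gives the DISTINCT behaviour; sorting is for stable output only.
    return sorted(linked)


# Hypothetical toy data: pathway2 is typed but nothing points to it.
toy_graph = [
    ("pathway1", RDF_TYPE, "CCHMC:Pathway"),
    ("pathway2", RDF_TYPE, "CCHMC:Pathway"),
    ("geneA", RDF_TYPE, "CCHMC:Gene"),
    ("geneA", "associatedPathway", "pathway1"),
    ("geneB", "associatedPathway", "pathway1"),
]
linked_pathways = pathways_linked_to_resources(toy_graph)
```

Here only `pathway1` is returned: `pathway2` satisfies the type pattern but no resource links to it, so the second pattern fails for it.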
[1] Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5 (Sep. 1999).
[2] Bamba, B. and Mukherjea, S. 2004. Utilizing Resource Importance for Ranking Semantic Web Query Results. SWDB 2004: 185-198.
Conclusion
We have shown that related yet heterogeneous information can be integrated using RDF and OWL, and that this approach can support mechanistic analyses of diseases. Specifically, we have uncovered additional genes and pathways that could play a role in the onset and treatment of cardiomyopathy. We intend to extend our analyses to additional modalities such as anatomy, cell type, and symptoms/phenotypes.