What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data
Matthew H. Brush, Chris Mungall, Nicole Washington, and Melissa Haendel
Oregon Health and Science University, Lawrence Berkeley Labs
International Conference on Biomedical Ontology, July 8, 2013
Genotype-to-Phenotype Research
GENOTYPE: B6.Cg-Alms1foz/fox/J
PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered
GENOTYPE: ALMS1(NM_015120.4)[c.10775delC] + [-]
PHENOTYPE: obesity, diabetes mellitus, insulin resistance
GENOTYPE: kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE: increased food intake, hyperglycemia, insulin resistance
G2P research seeks a mechanistic understanding of how genetic variation is linked to organismal biology and disease
Integrating G2P Data
The Monarch Initiative
The Monarch Initiative aims to bring G2P and related data together under a common semantic framework to support integrated exploration and analysis.
Integration Challenges
I. Reconciling G2P data annotated to different ‘levels’ of a genotype
II. Integrating ‘non-genomic’ forms of variation
III. Creating semantic links to biological data
Technical Challenges: Terminological, syntactic, and organizational variation in data is common
Knowledge-Based Challenges: Reflect inherent complexity in the way G2P data is generated and what it represents
Decomposition of a Genotype
genotype = genomic variation complement + genomic background
Example partonomy:
genotype: apchu745/+; fgf8ati282/ti282(AB)
  has_part genomic variation complement: apchu745/+; fgf8ati282/ti282
    has_part variant single locus complement: apchu745/+
      has_part variant locus (allele): apchu745
        has_part sequence alteration: hu745
  has_part genomic background: (AB)
Genotype – an information entity that specifies an entire genome sequence in terms of its variation from some reference genome
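The definition above can be illustrated with a minimal sketch that derives a sequence from a reference plus a list of variations. The two sequence strings are the ones shown in the slide figures; the function name `apply_variation` and the `(position, reference base, variant base)` tuple layout are assumptions made for this illustration and are not part of GENO.

```python
def apply_variation(reference, substitutions):
    """Return the sequence obtained by applying single-base substitutions.

    Each substitution is (zero-based position, expected reference base,
    variant base). This tuple layout is a hypothetical convention.
    """
    sequence = list(reference)
    for position, ref_base, alt_base in substitutions:
        if sequence[position] != ref_base:
            raise ValueError(f"reference mismatch at position {position}")
        sequence[position] = alt_base
    return "".join(sequence)


# Reference and variant sequences as they appear in the slide figures.
reference = "AACGTAGCGACGCTCGCTACGGGCGTATC"
variation = [(6, "G", "C")]  # the only position where the two sequences differ

variant_sequence = apply_variation(reference, variation)
print(variant_sequence)
```

Only the reference and the list of differences need to be stored; the full variant sequence is recoverable from them, which is the sense in which a genotype "specifies" a genome.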
I. Reconciling Levels of G2P Association
Phenotype-Genome association
genotype: apchu745/+; fgf8ati282/ti282(AB) (allele: apchu745; genes: apc, fgf8a)
phenotypes: increased cell proliferation, disrupted digestive tract development, gut deformation
Phenotype-Allele association
allele: APC (NM_000038.5) c.937_938delGA (gene: apc)
phenotypes: intestinal polyps, abnormal retinal pigmentation, sebaceous cysts
(PHENOTYPE PROPAGATION)
Property chains exploit the transitive genotype partonomy to infer phenotype associations
Atomic Relations
[variant] is_variant_part_of genotype
genotype has_phenotype phenotype
Composed Relation
is_variant_part_of o has_phenotype --> is_variant_with_phenotype
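The composition above can be sketched as a join over two sets of pairs. This is only an illustration of what the property chain entails, not how an OWL reasoner is implemented; the helper `compose` is a hypothetical name, and the identifiers are the flattened labels from the slides.

```python
from collections import defaultdict


def compose(first, second):
    """Compose two binary relations given as sets of (subject, object) pairs."""
    objects_by_subject = defaultdict(set)
    for subject, obj in second:
        objects_by_subject[subject].add(obj)
    return {
        (subject, obj)
        for subject, middle in first
        for obj in objects_by_subject.get(middle, ())
    }


genotype = "apchu745/+; fgf8ati282/ti282(AB)"

# Atomic relations
is_variant_part_of = {
    ("apchu745", genotype),
    ("fgf8ati282", genotype),
}
has_phenotype = {
    (genotype, "gut deformation"),
}

# Composed relation: is_variant_part_of o has_phenotype
is_variant_with_phenotype = compose(is_variant_part_of, has_phenotype)
for variant, phenotype in sorted(is_variant_with_phenotype):
    print(variant, "is_variant_with_phenotype", phenotype)
```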
Implementation of Phenotype Propagation
Example of Phenotype Propagation
1. Monarch ingests phenotypes annotated to a genotype
2. Genotype is parsed to create instances down partonomy
3. Phenotype propagation infers associations between phenotypes and each level in the partonomy
Partonomy instances (linked by has_variant_part):
Genotype: apchu745/+; fgf8ati282/ti282(AB)
GVC: apchu745/+; fgf8ati282/ti282
VSLCs: apchu745/+, fgf8ati282/ti282
Alleles: apchu745, fgf8ati282
Seq. Alts: hu745, ti282
Genes: apc, fgf8a
has_phenotype (asserted on the genotype): cell proliferation, digestive tract development, gut deformation
is_variant_with_phenotype (inferred): links each level below the genotype to these phenotypes
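The three steps can be sketched end to end as follows. Several things here are assumptions for illustration only: the input syntax (a caret marking the superscripted alteration, which the flattened slide text has lost), the `Node` class, the function names, and the choice to attach each gene beneath its allele. Monarch's actual parser is not described in the slides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    level: str
    label: str


def parse_genotype(text):
    """Step 2: split a genotype string into partonomy nodes and edges.

    Assumes a hypothetical plain-text syntax such as
    'apc^hu745/+; fgf8a^ti282/ti282 (AB)'.
    """
    match = re.fullmatch(r"(.+?)\s*\(([^()]+)\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized genotype: {text!r}")
    # The genomic background is not a variant part, so it gets no edge here.
    gvc_text, _background = match.groups()
    genotype = Node("genotype", text.strip())
    gvc = Node("GVC", gvc_text)
    edges = [(genotype, gvc)]  # (parent, child) has_variant_part pairs
    for vslc_text in (part.strip() for part in gvc_text.split(";")):
        vslc = Node("VSLC", vslc_text)
        gene, zygosity = vslc_text.split("^", 1)
        alteration = zygosity.split("/", 1)[0]
        allele = Node("allele", f"{gene}^{alteration}")
        edges += [
            (gvc, vslc),
            (vslc, allele),
            (allele, Node("sequence alteration", alteration)),
            (allele, Node("gene", gene)),  # assumption: gene sits under allele
        ]
    return genotype, edges


def propagate(genotype, edges, phenotypes):
    """Step 3: associate the genotype's phenotypes with every level below it."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    inferred = {}
    stack = list(children.get(genotype, []))
    while stack:
        node = stack.pop()
        if node not in inferred:
            inferred[node] = set(phenotypes)
            stack.extend(children.get(node, []))
    return inferred


# Step 1: phenotypes annotated to the genotype, as in the slide example.
phenotypes = {"cell proliferation", "digestive tract development", "gut deformation"}
genotype, edges = parse_genotype("apc^hu745/+; fgf8a^ti282/ti282 (AB)")
inferred = propagate(genotype, edges, phenotypes)
for node in sorted(inferred, key=lambda n: (n.level, n.label)):
    print(node.level, node.label)
```

The sketch propagates every phenotype to every level without weighting, which corresponds to the unrefined inference the slides describe; improving specificity is listed as future work.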
II. Integrating Non-Genomic Variation
‘Extrinsic genotypes’ describe sequences subject to transient variations in expression at the time of an experiment
Representing extrinsic variation data in terms of the targeted genes facilitates integration with ‘intrinsic’ G2P data
Example: Morpholino-mediated gene knockdown
III. Semantic Links to Related Data
GENO in the OBO Foundry
• GENO modeled according to OBO Foundry principles, under conceptual frameworks of the BFO, IAO, and SO
• Collaborators in SO refactoring to enhance genetic variation representation, and ensure integration of Monarch data with SO-annotated genomes
Summary and Future Directions
GENO in the Monarch Data Integration Pipeline
1. Raw data ingested into Monarch RDB
2. Views generated that contain “GENO-enhanced” data (standardized syntax, unpacked genotypes, links to external data)
3. D2RQ maps relational data to GENO and generates RDF
4. GENO-supported reasoning adds inferred G2P associations (e.g. phenotype propagation)
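As a rough picture of what step 3 produces, the sketch below turns rows of a relational view into N-Triples lines. It is a plain-Python stand-in and does not use D2RQ or its mapping language; the column names, the `example.org` namespace, and the function names are all hypothetical, and real GENO IRIs would differ.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; not a real GENO or Monarch IRI.
BASE = "http://example.org/geno-sketch/"


def iri(kind, identifier):
    """Build an N-Triples IRI term from a record kind and an identifier."""
    return f"<{BASE}{kind}/{quote(identifier, safe='')}>"


def row_to_triples(row):
    """Map one row of a (hypothetical) view to subject-predicate-object triples."""
    genotype = iri("genotype", row["genotype_id"])
    yield (
        iri("variant", row["variant_id"]),
        iri("relation", "is_variant_part_of"),
        genotype,
    )
    yield (
        genotype,
        iri("relation", "has_phenotype"),
        iri("phenotype", row["phenotype_id"]),
    )


def to_ntriples(rows):
    """Serialize the de-duplicated triples, one N-Triples statement per line."""
    triples = sorted({triple for row in rows for triple in row_to_triples(row)})
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


rows = [
    {"genotype_id": "g1", "variant_id": "v1", "phenotype_id": "p1"},
    {"genotype_id": "g1", "variant_id": "v2", "phenotype_id": "p1"},
]
ntriples = to_ntriples(rows)
print(ntriples)
```

Once the data is in triple form, the reasoning in step 4 can add the inferred associations as further triples over the same identifiers.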
Future Directions
1. Modeling of transgenes, human variation, and related data types
2. Develop property chains and algorithms to improve specificity and weighting of inferred G2P associations
3. Separate application features to provide a community model for public release and integration with SO
Acknowledgements
Monarch Initiative / NIF
OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman
LBNL: Chris Mungall, Suzi Lewis, Nicole Washington
UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta, Trish Whetzel
University of Pittsburgh: Harry Hochheiser, Chuck Borromeo
Sequence Ontology
University of Utah: Karen Eilbeck
University of Colorado: Mike Bada
Funding: NIH # 1R24OD011883-01
We are under construction
OHSU Ontology Development Group: www.ohsu.edu/library/ontology
GENO ontology: purl.obolibrary.org/obo/geno.owl