Post on 12-Jan-2016
Sharing Microarray Experiment Knowledge
Chips to Hits Oct. 28, 2002
Chris Stoeckert, Ph.D.
Dept. of Genetics & Center for Bioinformatics
University of Pennsylvania
Nature, October 3, 2002
http://plasmodb.org/
David Roos, Jessie Kissinger, Bindu Gajria, Martin Fraunholz, Jules Milgram, Phil Labo, Amit Bahl, Dave Pearson, Dinesh Gupta, Hagai Ginsburg
Jonathan Crabtree, Jonathan Schug, Brian Brunk, Greg Grant, Trish Whetzel, Matt Mailman, Li Li
Desirable Microarray Queries
• Return all experiments using developmental stage X.
  – Sort by platform type
  – Which are untreated? Treated?
    • Treated by what?
• How comparable are these?
• What can these experiments tell me?
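The queries listed above can be sketched in a few lines once each experiment carries structured annotation. The following is a minimal illustration only: the `Experiment` record, its field names, and the three example experiments are hypothetical and are not taken from RAD or any MGED schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Experiment:
    """Hypothetical annotated experiment record (field names are assumptions)."""
    name: str
    developmental_stage: str
    platform_type: str
    treatment: Optional[str] = None  # None means the sample was untreated


def experiments_at_stage(experiments, stage):
    """Return all experiments using the given stage, sorted by platform type."""
    hits = [e for e in experiments if e.developmental_stage == stage]
    return sorted(hits, key=lambda e: (e.platform_type, e.name))


def split_by_treatment(experiments):
    """Partition experiments into (untreated, treated) lists."""
    untreated = [e for e in experiments if e.treatment is None]
    treated = [e for e in experiments if e.treatment is not None]
    return untreated, treated


# Made-up example data, for illustration only.
EXPERIMENTS = [
    Experiment("exp-A", "Stage 28", "spotted cDNA", "fenofibrate"),
    Experiment("exp-B", "Stage 28", "oligonucleotide"),
    Experiment("exp-C", "Stage 12", "spotted cDNA"),
]

hits = experiments_at_stage(EXPERIMENTS, "Stage 28")
untreated, treated = split_by_treatment(hits)
```

The point of the sketch is that "Treated by what?" becomes a simple field lookup (`e.treatment`) only when the annotation uses agreed field names and controlled terms.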
Microarray Information to be Shared
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
The Computational View of Microarray Information
Need an ontology to unambiguously represent this information.
What is an Ontology?
• In philosophy, an ontology is a systematic account of Existence.
• In AI, an ontology is a systematic account of what can be represented.
• The knowledge of a domain is represented in a declarative formalism.
  – Classes, relations, functions, or other objects are defined with human-readable text describing what the names mean, and formal axioms that constrain the interpretation.
• A common ontology defines the vocabulary with which queries and assertions are exchanged.
Excerpted and adapted from: http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
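As a rough illustration of the definition above, a class can pair a human-readable definition with a constraint on what values are acceptable. This is a toy sketch, not DAML+OIL and not the MGED Ontology itself: the `OntologyClass` type, the definition strings, and the allowed values for `Sex` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OntologyClass:
    """Toy ontology class: a name, a definition, a parent, and a value constraint."""
    name: str
    definition: str                            # human-readable meaning of the name
    parent: Optional["OntologyClass"] = None   # is-a relation
    allowed_values: frozenset = frozenset()    # empty set means unconstrained

    def is_a(self, other):
        """True if this class is `other` or a descendant of it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def accepts(self, value):
        """Stand-in for a formal axiom: restrict values to a controlled list."""
        return not self.allowed_values or value in self.allowed_values


# Class names follow the slides; definitions and allowed values are illustrative.
biosource_property = OntologyClass(
    "BiosourceProperty", "A property of the biological source material.")
sex = OntologyClass(
    "Sex", "The sex of the organism.", parent=biosource_property,
    allowed_values=frozenset({"female", "male"}))
age = OntologyClass(
    "Age", "Time elapsed since a defined starting event.",
    parent=biosource_property)
```

Two parties that share these definitions can exchange an assertion such as `Sex = female` and both reject `Sex = Stage 28`, which is the sense in which a common ontology defines the vocabulary for queries and assertions.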
An Experimental Ontology
• An ontology for microarray experiments
  – Not an ontology of life but of experiments
  – Parts are applicable to describing experiments in general
• Our approach to interfacing with other ontologies is “experimental”
  – Not mapping terms from related ontologies
  – Provide a framework to hang other ontologies off of
    • Know where to find different types of annotation
    • How to interpret that annotation
http://www.mged.org
Relationship of MGED Efforts
[Diagram relating MAGE, MIAME, DB, External Ontologies/CVs, and the MGED Ontology to two groups: software and database developers, and investigators annotating experiments]
The MGED Ontology Home Page
http://www.cbil.upenn.edu/Ontology
The MGED Ontology Home Page
http://mged.sourceforge.net/ontologies/
The MGED Ontology Provides a Listing of Resources for Many Species
The MGED Ontology Organizes the Resources According to Concepts
The MGED Ontology is Structured in DAML+OIL using OILed 3.4
MGED Ontology: BiomaterialDescription: BiosourceProperty: Age
MGED Ontology: BiosourceOntologyEntry: DiseaseState
MGED Ontology             Instances                            External References
BioMaterialDescription
BiosourceProperty
Organism                  Mus musculus musculus id: 39442      NCBI Taxonomy
Age                       7 weeks after birth
DevelopmentStage          Stage 28                             Mouse Anatomical Dictionary
Sex                       Female
StrainOrLine              C57BL/6N                             International Committee on Standardized Genetic Nomenclature for Mice
BiosourceProvider         Charles River, Japan
OrganismPart              Liver                                Mouse Anatomical Dictionary
BioMaterialManipulation
EnvironmentalHistory
CultureCondition
Temperature               22 ± 2 °C
Humidity                  55 ± 5%
Light                     12 hours light/dark cycle
PathogenTests             Specified pathogen free conditions
Water                     ad libitum
Nutrients                 MF, Oriental Yeast, Tokyo, Japan
Treatment
CompoundBasedTreatment
(Compound)                Fenofibrate, CAS 49562-28-9          ChemIDplus
(Treatment_application)   in vivo, oral gavage
(Measurement)             100 mg/kg body weight
An example of microarray sample annotation using the MGED ontology
Susanna A. Sansone, Helen Parkinson, Philippe Rocca-Serra, Chris Stoeckert and Alvis Brazma
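The sample annotation example can be viewed as a list of (ontology class, value, external reference) records. The sketch below encodes a subset of the values from the slide; the `Annotation` type and helper functions are hypothetical, and the pairing of each external reference with a class is inferred from the order in which the slide lists them.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """One annotated property of a sample (hypothetical record layout)."""
    ontology_class: str
    value: str
    external_reference: Optional[str] = None  # source of the controlled term


# Values copied from the slide; reference pairing is an inference.
SAMPLE = [
    Annotation("Organism", "Mus musculus musculus id: 39442", "NCBI Taxonomy"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("DevelopmentStage", "Stage 28", "Mouse Anatomical Dictionary"),
    Annotation("Sex", "Female"),
    Annotation("StrainOrLine", "C57BL/6N",
               "International Committee on Standardized Genetic "
               "Nomenclature for Mice"),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Compound", "Fenofibrate, CAS 49562-28-9", "ChemIDplus"),
]


def lookup(annotations, ontology_class):
    """Return the value recorded for a class, or None if it was not annotated."""
    for a in annotations:
        if a.ontology_class == ontology_class:
            return a.value
    return None


def values_by_reference(annotations):
    """Group values by the external resource that defines them."""
    grouped = {}
    for a in annotations:
        if a.external_reference is not None:
            grouped.setdefault(a.external_reference, []).append(a.value)
    return grouped
```

Keeping the external reference next to each value is what lets a reader know where to find a term and how to interpret it, without the ontology having to copy the contents of those resources.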
The MGED Ontology in Action: MIAMExpress
Journals are Adopting the MGED Standards
Use of Minimal Information About Microarray Experiment (MIAME)
The MGED Ontology in Action: RAD
Generating Forms from the MGED Ontology
[Diagram labels: OntologyEntry; ExternalDatabases; PHP/SQL; WWW; RAD Forms; MGED Ontology; Anatomy, DevelopmentalStage, Disease, Lineage, PATOAttribute, Phenotype, Taxon; SRES; RAD3]
Using the MGED standards in RAD
• RAD: RNA Abundance Database
  – Stoeckert et al. (2000) Bioinformatics
• RAD 3.0
  – MIAME compliant and MAGE supportive
  – Building importers, exporters for MAGE
• Incorporates MGED ontology
  – Uses OntologyEntry to point to internal tables and external resources
• Expand processing and analysis information storage
  – Driven by experience and new approaches
[Figure: UML class diagram of RAD tables. Classes shown: ElementAnnotation, Analysis, AnalysisImplementationParam, AnalysisInput, AnalysisImplementation, AnalysisInvocationParam, AnalysisInvocation, AnalysisOutput, CompositeElementAnnotation, ArrayAnnotation, CompositeElementImp, ElementResultImp, CompositeElementResultImp, QuantificationParam, RelatedQuantification, Study, StudyDesignDescription, StudyAssay, StudyDesignAssay, StudyFactorValue, AssayLabeledExtract, BioMaterialImp, LabelMethod, ProtocolParam, MAGEDocumentation, MAGE_ML, AcquisitionParam, Assay, Channel, Quantification, Acquisition, RelatedAcquisition, ProcessImplementationParam, ProcessIO, ProcessInvocation, ProcessInvocationParam, Array, BioMaterialMeasurement, Protocol, Treatment, StudyDesign, BioMaterialCharacteristic, ProcessImplementation, ElementImp, Control, ProcessResult, StudyFactor, OntologyEntry. Associations are drawn with multiplicities 1 to 0..* or 0..1 to 0..*.]
RAD schema uses MAGE/MIAME
MAGE: Experiment, Array, BioMaterial, BioAssay, BioAssayData, Protocol, Descr., HigherLevelAnalysis
MIAME: Experimental Design, Array design, Samples, Hybridization, Measure, Normalization
RAD is now part of GUS-3.0
GUS has 5 namespaces compartmentalizing different types of information.
Namespace   Domain                    Features
Core        Data Provenance           Workflows
SRes        Shared resources          Ontologies
DoTS        Sequence and annotation   Central dogma
RAD         Gene expression           MIAME/MAGE
TESS        Gene regulation           Grammars
Data Integration
• SRes: Ontologies
  – GO, Species, Tissue, Dev. Stage
  – Query fragment: "acute myeloid leukemia"
• Core: Data Provenance
  – Ownership, Protection, Algorithms, Similarity, Versioning, Workflow
  – Query fragment: "with sequence similarity to c-fos"
• DoTS
  – GenomicSequence: Genes, gene models; STSs, repeats, etc.; Cross-species analysis
  – TranscribedSequence: Characterize transcripts; RH mapping; Library analysis; Cross-species analysis; DOTS
  – ProteinSequence: Domains; Function; Structure; Cross-species analysis
  – Query fragment: "Transcription factors"
• RAD: TranscriptExpression
  – Arrays, SAGE, Conditions
  – Query fragment: "up-regulated in"
• TESS: Gene Regulation
  – Binding Sites, Patterns, Grammars
  – Query fragment: "and common promoter motifs"
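Read together, the query fragments on this slide appear to form one question that draws on every namespace (roughly: transcription factors, with sequence similarity to c-fos, up-regulated in acute myeloid leukemia, and their common promoter motifs). The sketch below shows how such a question could reduce to set intersections once each namespace answers its own part. Every identifier and membership here is a hypothetical placeholder, not data from GUS.

```python
# Hypothetical per-namespace answers; gene and motif names are placeholders.
SRES_DISEASE_TERMS = {"acute myeloid leukemia"}               # controlled terms
DOTS_TRANSCRIPTION_FACTORS = {"gene-1", "gene-2", "gene-3", "gene-4"}
CORE_SIMILAR_TO_CFOS = {"gene-2", "gene-3", "gene-5"}         # similarity results
RAD_UPREGULATED = {"acute myeloid leukemia": {"gene-2", "gene-3", "gene-4"}}
TESS_PROMOTER_MOTIFS = {
    "gene-2": {"motif-A", "motif-B"},
    "gene-3": {"motif-B", "motif-C"},
}


def integrated_query(condition):
    """Combine the namespace answers for one disease condition.

    Returns (genes, motifs): the genes satisfying every criterion and the
    promoter motifs shared by all of those genes.
    """
    if condition not in SRES_DISEASE_TERMS:
        raise ValueError(f"not a controlled term: {condition!r}")
    genes = (
        DOTS_TRANSCRIPTION_FACTORS
        & CORE_SIMILAR_TO_CFOS
        & RAD_UPREGULATED.get(condition, set())
    )
    motif_sets = [TESS_PROMOTER_MOTIFS.get(g, set()) for g in sorted(genes)]
    motifs = set.intersection(*motif_sets) if motif_sets else set()
    return genes, motifs


genes, motifs = integrated_query("acute myeloid leukemia")
```

The controlled-term check at the top is the part the ontology contributes: the expression data and the query agree on how the disease is named.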
GUS Supports Multiple Projects
AllGenes, PlasmoDB, EPConDB
Core, SRES, TESS, RAD, DoTS
Oracle RDBMS
Object Layer for Data Loading
Java Servlets
Other sites, other projects, e.g. GeneDB
Available at http://www.gusdb.org
Summary
• The MGED ontology is being developed within the microarray community to provide consistent terminology for experiments.
  – Make it easier and more accurate to annotate a microarray experiment.
  – Use structured fields and controlled terms to query databases.
• This community effort has resulted in a list of multiple resources for many species and a machine-readable document of microarray concepts, definitions, and values.
  – The MGED Ontology is a work in progress but can be used now to build forms for databases
• RAD has incorporated the MGED ontology for forms
  – Can export data from RAD into MAGE
  – RAD as part of GUS provides integration of gene expression, annotation, and sequence.
Acknowledgements
• MGED Ontology
  – Helen Parkinson (EBI)
  – Trish Whetzel
  – The MGED Ontology Working Group
  – MAGE working group
• RAD/GUS
  – Brian Brunk
  – Jonathan Crabtree
  – Steve Fischer
  – Yongchang Gan
  – Greg Grant
  – Hongxian He
  – Li Li
  – Junmin Liu
  – Matt Mailman
  – Elisabetta Manduchi
  – Joan Mazzarelli
  – Shannon McWeeney (OHSU)
  – Debbie Pinney
  – Angel Pizarro
  – Jonathan Schug
  – Trish Whetzel
www.mged.org www.cbil.upenn.edu
http://www.ebi.ac.uk/SOFG