dbgg – database for genetical genomics update
Groningen Bioinformatics Center
Morris Swertz ([email protected])
Braunschweig CASIMIR meeting
July 2, 2008
Objective
› Share genotype/phenotype data and tools
Complicated experiments
[Figure: genetical genomics workflow diagram. Materials and results: genome, strains, individuals, markers, probes, microarrays, genotypes, expressions, normalized expressions, QTL profiles, network. Processes: inbreed, genotype, hybridize, preprocess, map, correlate. Scale annotations on the nodes range from 10 up to the millions. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]
Barriers to sharing data
[Figure: the genetical genomics workflow diagram (strains, individuals, markers, probes, genotypes, expressions, QTL profiles, network), labeled with Collaborator 1, Collaborator 2 and Collaborator 3.]
Incomplete data!
Incompatible data!
Barriers to sharing data
[Figure: three copies of the genetical genomics workflow diagram, labeled Investigation 1, Investigation 2 and Investigation 3.]
Incomplete and/or incompatible data!
Barriers to sharing software tools
[Figure: three copies of the genetical genomics workflow diagram.]
Barriers to sharing software tools
[Figure: three result nodes, each labeled "10,000 QTL profiles".]
Hard to find and reuse tools
Use a standard tool?
[Figure: the genetical genomics workflow diagram (strains, individuals, markers, probes, genotypes, expressions, QTL profiles, network).]
More biotechnologies, more protocols
[Figure: a second workflow diagram, for metabolite profiling. Materials and results: genome, strains, individuals, SNP arrays, genotypes, LC/MS mass peaks, aligned peaks, QTL profiles, network. Processes: inbreed, genotype, preprocess, map, correlate. An example mass spectrum is shown (m/z axis from 100 to 1000, intensity in %, labels "arab 220903" and "Koornneef0007"). Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]
Yes, if it could be easily adapted! (and they can’t)
Objectives
› Share genotype/phenotype data and tools:
1. Interoperable software
   • Simple flat file exchange format
   • Database server
   • R/web-service interfaces
   • A procedure to extend the software
2. Build on extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
› Next steps
The software
› Share genotype/phenotype data and tools:
1. Interoperable software
   • Simple flat file exchange format
   • Database server
   • R/web-service interfaces
   • A procedure to extend the software
2. Build on extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
› Next steps
Software: flat file exchange format
› Raw and processed data in matrix form
              BxD1      BxD2      BxD3      BxD4      BxD5      BxD6      BxD7
1415670_at    0,293493  0,687197  0,137687  0,5992    0,691055  0,644053  0,938754
1415671_at    0,124305  0,261548  0,771756  0,022287  0,374063  0,711998  0,526277
1415672_at    0,592037  0,334535  0,173969  0,516279  0,21625   0,970534  0,192734
1415673_at    0,555223  0,992222  0,17998   0,79899   0,505028  0,776323  0,736155
1415674_a_at  0,585366  0,61328   0,448061  0,977578  0,746478  0,937131  0,782904
1415675_at    0,938431  0,272201  0,477756  0,374765  0,840321  0,187776  0,54069
1415676_a_at  0,700227  0,971044  0,486389  0,236767  0,717116  0,714643  0,443447
1415677_at    0,716683  0,380579  0,592676  0,224927  0,304563  0,285177  0,687426
1415678_at    0,086303  0,069413  0,601634  0,289336  0,197956  0,820493  0,072161
1415679_at    0,669657  0,578992  0,373976  0,581597  0,561598  0,051069  0,070144
1415680_at    0,277747  0,716174  0,73642   0,428784  0,614857  0,763586  0,704263
1415681_at    0,208313  0,279458  0,063052  0,077388  0,577486  0,087832  0,063826
1415682_at    0,94562   0,077064  0,735568  0,081915  0,109705  0,278815  0,35094
1415683_at    0,308529  0,008908  0,793956  0,304491  0,613119  0,055048  0,698222
E.g. microarray data. As shown, rows = Affymetrix probes and columns = individuals.
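A matrix file like the one above can be read with a few lines of code. The following is a minimal sketch, not the dbGG import tool: it assumes a tab-delimited file with an empty corner cell in the header row, and it treats the decimal commas seen on the slide as part of the format. The function name `read_matrix` is hypothetical; the actual format is described on the dbGG website.

```python
import csv
import io


def read_matrix(text, delimiter="\t"):
    """Parse a labelled matrix: one header row, then one row per trait.

    Assumptions (not confirmed by the slides): tab-delimited text,
    an optional empty corner cell, and decimal commas in the values.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    columns = rows[0]
    if columns and columns[0] == "":
        columns = columns[1:]  # drop the empty corner cell
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {name!r}: expected {len(columns)} values")
        # Normalise decimal commas (0,293493) before converting to float.
        matrix[name] = {
            col: float(val.replace(",", "."))
            for col, val in zip(columns, values)
        }
    return columns, matrix


example = (
    "\tBxD1\tBxD2\tBxD3\n"
    "1415670_at\t0,293493\t0,687197\t0,137687\n"
    "1415671_at\t0,124305\t0,261548\t0,771756\n"
)
columns, matrix = read_matrix(example)
print(columns)
print(matrix["1415671_at"]["BxD2"])
```

Keeping the row and column labels next to each value, rather than relying on position, makes it harder to mix up probes and individuals when files from different collaborators are combined.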
Software: flat file exchange format
› Annotation info in tabular form
name          gene      chr  bplocal    bpglobal
1415670_at    Copg      6    87875328   924571670
1415671_at    Atp6v0d1  8    108413750  1234908613
1415672_at    Golga7    8    24706942   1151201805
1415673_at    Psph      5    130080298  820157654
1415674_a_at  Trappc4   9    44155401   1298908898
1415675_at    Dpm2      2    32395013   227443878
1415676_a_at  Psmb5     14   53568499   1912894020
1415677_at    Dhrs1     14   54693657   1914019178
1415678_at    Ppm1a     12   73712802   1701465838
1415679_at    Psenen    7    30270655   1017168695
1415680_at    Anapc1    2    128304204  323353069
E.g. probe annotation data. Rows = probes, columns = attributes of each probe.
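An annotation table of this shape maps naturally onto one record per probe. The sketch below is an illustration only: it assumes tab-delimited text with the header names shown on the slide (`name`, `gene`, `chr`, `bplocal`, `bpglobal`), and the function name `read_annotations` is hypothetical.

```python
import csv
import io


def read_annotations(text, delimiter="\t"):
    """Read a probe annotation table into a dict keyed by probe name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    probes = {}
    for record in reader:
        probes[record["name"]] = {
            "gene": record["gene"],
            # Kept as text, assuming chromosomes may also be named X or Y.
            "chr": record["chr"],
            "bplocal": int(record["bplocal"]),
            "bpglobal": int(record["bpglobal"]),
        }
    return probes


example = (
    "name\tgene\tchr\tbplocal\tbpglobal\n"
    "1415670_at\tCopg\t6\t87875328\t924571670\n"
    "1415675_at\tDpm2\t2\t32395013\t227443878\n"
)
probes = read_annotations(example)
print(probes["1415675_at"]["gene"])
```

Because the probe name is the key, these records can be joined directly to the rows of a data matrix that uses the same probe names.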
Software: exchange an experiment
Described on http://gbic.biol.rug.nl/dbgg
[Figure: annotations and raw and processed data files, the dbGG import tool, the dbGG database and the dbGG export tool.]
Software: web user interface
http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do
Software: interface to R
source("http://localhost:8080/molgenis4gg/R")

# download data
use.experiment(name="metanetwork")  # set default
traits <- get.metabolitedata(name="mytraits")
genotypes <- get.markerdata(name="mygenotypes")

# calculate mQTLs
library("MetaNetwork")
qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

# upload results for others to use
add.mqtldata(qtls, name="myqtls")

Example trait data:
            RIL1  RIL3  RIL4  …
LCavg.1537  NA    942   2402  …
LCavg.1594  NA    4     10    …
LCavg.1610  NA    55    62    …
…           …     …     …     …

[Figure: MetaNetwork protocol. Inputs: markers, traits, genotypes; mass/charge peaks.
A. Map QTLs (qtlMapTwoPart)
B. Simulation/FDR (qtlThreshold/qtlFDR)
C. QTL summary (qtlSummary)
D. Zero-order correlation
E. 2nd-order correlation (qtlCorrSecondOrder)
F. Permutation (qtlCorrThreshold)
G. Create network (createCytoFiles)
H. Peak multiplicity (findPeakMultiplicity)
Intermediate results: QTL profiles {qtlProfiles}, QTL threshold {qtlThres}, QTL summary {qtlSumm}, significant QTLs, peak multiplicity {peakMultiplicity}, correlation matrices {corrZeroOrder} and {corrSecondOrder}, correlation threshold {corrThres}. Output to inspect: network files [network.sif, network.eda].]
MetaNetwork protocol: Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.
Software: interface to Taverna
Add dbGG interface
Use data in dbGG
This enables automatic processing (see also CASIMIR ‘use case 1’)
Smedley, Swertz, Wolstencroft et al., submitted.
Use BioMART and MOLGENIS to access data and Taverna to automate the workflows
[Figure: your dbGG alongside web services (ws) for SNPs (strain, SNP, alleles), pathways and gene symbols.]
Smedley, Swertz, Wolstencroft et al., submitted.
Software: extension procedure (using MOLGENIS)
Domain specific language ("little language"):
<!-- entity organization -->
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
  <field name="Medium" type="xref" xref_field="Medium.name"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"
+ Reusable assets and generator/interpreter
dbGG v1: for microarrays
dbGG v2: for mass spectrometry
Software: extension procedure
<entity name="Metabolite" extends="Trait">
  <field name="Formula" nillable="true" description="The chemical formula of a metabolite." />
  <field name="Mass" type="decimal" nillable="true" description="The mass of this metabolite" />
  <field name="Structure" type="text" nillable="true" description="The chemical structure of this metabolite." />
</entity>
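To show what a generator has to work with, the entity definition above can be read with a standard XML parser. This is a minimal sketch and not the MOLGENIS generator itself; treating a field without a `type` attribute as a string is an assumption made here for illustration.

```python
import xml.etree.ElementTree as ET

# The Metabolite entity from the slide, extending Trait.
MODEL = """
<entity name="Metabolite" extends="Trait">
  <field name="Formula" nillable="true" description="The chemical formula of a metabolite." />
  <field name="Mass" type="decimal" nillable="true" description="The mass of this metabolite" />
  <field name="Structure" type="text" nillable="true" description="The chemical structure of this metabolite." />
</entity>
"""

entity = ET.fromstring(MODEL)

# Assumption: a field without an explicit type is treated as "string".
fields = [(f.get("name"), f.get("type", "string")) for f in entity.findall("field")]

print(entity.get("name"), "extends", entity.get("extends"))
for name, field_type in fields:
    print(f"  {name}: {field_type}")
```

A generator would walk the same structure to emit database tables, user interface forms and R or web-service accessors for the new entity.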
Website: demos and downloads
http://gbic.biol.rug.nl/dbgg
Outline
› To share genotype/phenotype data and tools:
1. Interoperable software
   • Flat file exchange format
   • Database server
   • R/web-service interfaces
   • A procedure to extend the software
2. Build on extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
› Next steps
Data
› Simple and close to current practice:
Genotype data
            BxD1  BxD2  BxD3  BxD4  BxD5  BxD6  BxD7
rs13475697  1     1     0     1     0     1     0
rs13475698  1     0     0     0     0     0     1
rs13475699  0     0     0     1     0     1     1
rs13475700  1     1     1     1     0     1     0
rs13475701  1     0     1     0     0     1     1
rs2228909   1     1     0     1     0     0     0
rs2228910   0     0     1     1     0     0     0
rs3022775   0     0     0     1     1     0     1
rs3024102   1     0     1     0     0     0     0
rs3024103   1     0     0     1     0     0     0
rs3024104   0     1     0     0     0     0     0
rs3024105   0     0     1     0     0     0     1
rs30462182  1     0     0     0     0     0     0
rs30522279  0     1     0     0     1     0     0
Traits: MARKERS (rows)
Subjects: STRAINS (columns)
DATA ELEMENTS: one value per TRAIT x SUBJECT pair
Data
› Simple and close to current practice:
Genotype data
Expression data
              BxD1      BxD2      BxD3      BxD4      BxD5      BxD6      BxD7
1415670_at    0,293493  0,687197  0,137687  0,5992    0,691055  0,644053  0,938754
1415671_at    0,124305  0,261548  0,771756  0,022287  0,374063  0,711998  0,526277
1415672_at    0,592037  0,334535  0,173969  0,516279  0,21625   0,970534  0,192734
1415673_at    0,555223  0,992222  0,17998   0,79899   0,505028  0,776323  0,736155
1415674_a_at  0,585366  0,61328   0,448061  0,977578  0,746478  0,937131  0,782904
1415675_at    0,938431  0,272201  0,477756  0,374765  0,840321  0,187776  0,54069
1415676_a_at  0,700227  0,971044  0,486389  0,236767  0,717116  0,714643  0,443447
1415677_at    0,716683  0,380579  0,592676  0,224927  0,304563  0,285177  0,687426
1415678_at    0,086303  0,069413  0,601634  0,289336  0,197956  0,820493  0,072161
1415679_at    0,669657  0,578992  0,373976  0,581597  0,561598  0,051069  0,070144
1415680_at    0,277747  0,716174  0,73642   0,428784  0,614857  0,763586  0,704263
1415681_at    0,208313  0,279458  0,063052  0,077388  0,577486  0,087832  0,063826
1415682_at    0,94562   0,077064  0,735568  0,081915  0,109705  0,278815  0,35094
1415683_at    0,308529  0,008908  0,793956  0,304491  0,613119  0,055048  0,698222
Traits: PROBES (rows)
Subjects: INDIVIDUALS (columns)
DATA ELEMENTS: one value per TRAIT x SUBJECT pair
Data
› Simple and close to current practice:
• Genotype data
• Expression data
• Classic phenotype data
• Metabolite abundance data
• Protein abundance data
• And so on…
All of these are TRAIT x SUBJECT data.
Data with any Dimension Type
DATA ELEMENT: a value for a TRAIT x SUBJECT pair
TRAIT:
• Probe
• Marker
• Mass peak
• …
SUBJECT:
• Individual
• Strain
• Sample
• …
Data
› Simple and close to current practice:
What about QTL data?
              rs13475697  rs13475698  rs13475699  rs13475700  rs13475701  rs2228909   rs2228910
1415670_at    0,981848    0,293227    0,034092    0,360978    0,298958    0,466545    0,370369
1415671_at    0,464346    0,817348    0,990231    0,204923    0,353808    0,668164    0,449354
1415672_at    0,243834    0,900083    0,69971     0,217804    0,471408    0,701617    0,026609
1415673_at    0,712543    0,001536    0,209082    0,196611    0,191452    0,91619     0,535659
1415674_a_at  0,159777    0,101577    0,678902    0,233476    0,251812    0,349968    0,567171
1415675_at    0,777691    0,371057    0,670919    0,410665    0,742277    0,142381    0,540945
1415676_a_at  0,320175    0,358505    0,207274    0,952688    0,615915    0,07167     0,225823
1415677_at    0,840063    0,281845    0,773908    0,396397    0,482995    0,56668     0,19946
1415678_at    0,880974    0,471662    0,906012    0,711181    0,622078    0,575441    0,868816
1415679_at    0,164846    0,957785    0,794479    0,207902    0,091649    0,727786    0,796058
1415680_at    0,56679     0,823206    0,321578    0,513087    0,593739    0,272818    0,620817
1415681_at    0,215698    0,384919    0,691254    0,550108    0,603988    0,110792    0,380126
1415682_at    0,45273     0,36089     0,733234    0,911573    0,549316    0,086473    0,639625
1415683_at    0,526019    0,740045    0,955297    0,797566    0,149079    0,370645    0,57789
Traits: PROBES (rows)
Traits: MARKERS (columns)
DATA: one value per TRAIT x TRAIT pair
Data
› Simple and close to current practice:
What about QTL data? Probe association data? Interaction network data?
[Table: the QTL matrix from the previous slide, with Traits: PROBES as rows and Traits: MARKERS as columns.]
These are TRAIT x TRAIT or SUBJECT x SUBJECT data.
Data with any Dimension Type
› Minimal data model
[Diagram: a DATA ELEMENT refers to its rows and columns, each a DIMENSION ELEMENT; TRAIT and SUBJECT are the two kinds of DIMENSION ELEMENT.]
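The minimal model can be sketched in a few classes. This is an illustration under assumptions, not the dbGG schema: the class and attribute names below are chosen to mirror the labels on the slide, and the example values are taken from the expression and QTL matrices shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    """Something that is measured: a probe, marker, mass peak, ..."""


@dataclass(frozen=True)
class Subject(DimensionElement):
    """Something measured on: an individual, strain, sample, ..."""


@dataclass(frozen=True)
class DataElement:
    row: DimensionElement
    column: DimensionElement
    value: float


def kind(element: DataElement) -> str:
    """Describe a data element by the types of its two dimensions."""
    return f"{type(element.row).__name__} x {type(element.column).__name__}"


probe = Trait("1415670_at")
marker = Trait("rs13475697")
strain = Subject("BxD1")

# Expression data pairs a trait with a subject ...
expression = DataElement(row=probe, column=strain, value=0.293493)
# ... while QTL data pairs two traits, using the same DataElement class.
qtl = DataElement(row=probe, column=marker, value=0.981848)

print(kind(expression))
print(kind(qtl))
```

Because rows and columns are both plain dimension elements, no new data structure is needed when a matrix pairs traits with traits or subjects with subjects.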
The data model
› To share genotype/phenotype data and tools:
1. Extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
Annotations
› Simple and close to current practice
Probe annotations
name          gene      chr  bplocal    bpglobal
1415670_at    Copg      6    87875328   924571670
1415671_at    Atp6v0d1  8    108413750  1234908613
1415672_at    Golga7    8    24706942   1151201805
1415673_at    Psph      5    130080298  820157654
1415674_a_at  Trappc4   9    44155401   1298908898
1415675_at    Dpm2      2    32395013   227443878
1415676_a_at  Psmb5     14   53568499   1912894020
1415677_at    Dhrs1     14   54693657   1914019178
1415678_at    Ppm1a     12   73712802   1701465838
1415679_at    Psenen    7    30270655   1017168695
1415680_at    Anapc1    2    128304204  323353069
PROBE is a variant of TRAIT having:
- Name
- Gene
- Chromosome
- Locus
Annotation extends Trait or Subject
[Diagram: a DATA ELEMENT refers to a row and a column, each a DIMENSION ELEMENT; TRAIT and SUBJECT are the two kinds of DIMENSION ELEMENT.]
Extensions of TRAIT:
• PROBE: Name, Gene, Chromosome, Locus
• MARKER: Name, Allele, Chromosome, Locus
• MASSPEAK: Name, MZ, RetentionTime
• And so on…
Extensions of SUBJECT:
• STRAIN: Name, Type (CSS, RIL, ..), Parent Strains
• INDIVIDUAL: Name, Strain, Mother, Father, Sex
• SAMPLE: Name, Individual, Tissue
• And so on…
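One way to picture these extensions is as subclasses that add annotation fields while remaining usable wherever a trait or subject is expected. The sketch below makes several assumptions: the class layout is illustrative rather than the generated dbGG code, `locus` is filled with the `bplocal` value from the probe annotation table, and the individual's name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionElement:
    name: str


@dataclass
class Trait(DimensionElement):
    """Base class for anything that is measured."""


@dataclass
class Subject(DimensionElement):
    """Base class for anything that is measured on."""


@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[int] = None  # assumption: base-pair position on the chromosome


@dataclass
class Individual(Subject):
    strain: Optional[str] = None
    mother: Optional[str] = None
    father: Optional[str] = None
    sex: Optional[str] = None


def describe(element: DimensionElement) -> str:
    # Relies only on the shared base class, so it works for any extension.
    role = "trait" if isinstance(element, Trait) else "subject"
    return f"{element.name} ({type(element).__name__}, a {role})"


probe = Probe("1415670_at", gene="Copg", chromosome="6", locus=87875328)
mouse = Individual("mouse_1", strain="BxD1")  # hypothetical individual name

print(describe(probe))
print(describe(mouse))
```

Code written against `Trait` and `Subject` keeps working when a new extension such as a mass peak is added later.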
Annotation simple in practice
[Diagram: three DATA ELEMENT boxes with dimensions MARKER and STRAIN, PROBE and MARKER, and MARKER and INDIVIDUAL, labeled genotype data, QTL data and expression data.]
Extensions are automatic “under the hood”: PROBE is a TRAIT, which is a DIMENSION ELEMENT.
Data and annotations
DATA ELEMENTS (rows = PROBES):
              BxD1      BxD2      BxD3      BxD4      BxD5      BxD6      BxD7
1415670_at    0,293493  0,687197  0,137687  0,5992    0,691055  0,644053  0,938754
1415671_at    0,124305  0,261548  0,771756  0,022287  0,374063  0,711998  0,526277
1415672_at    0,592037  0,334535  0,173969  0,516279  0,21625   0,970534  0,192734
1415673_at    0,555223  0,992222  0,17998   0,79899   0,505028  0,776323  0,736155
1415674_a_at  0,585366  0,61328   0,448061  0,977578  0,746478  0,937131  0,782904
1415675_at    0,938431  0,272201  0,477756  0,374765  0,840321  0,187776  0,54069
1415676_a_at  0,700227  0,971044  0,486389  0,236767  0,717116  0,714643  0,443447
1415677_at    0,716683  0,380579  0,592676  0,224927  0,304563  0,285177  0,687426
1415678_at    0,086303  0,069413  0,601634  0,289336  0,197956  0,820493  0,072161
1415679_at    0,669657  0,578992  0,373976  0,581597  0,561598  0,051069  0,070144
1415680_at    0,277747  0,716174  0,73642   0,428784  0,614857  0,763586  0,704263
1415681_at    0,208313  0,279458  0,063052  0,077388  0,577486  0,087832  0,063826
1415682_at    0,94562   0,077064  0,735568  0,081915  0,109705  0,278815  0,35094
1415683_at    0,308529  0,008908  0,793956  0,304491  0,613119  0,055048  0,698222
Annotations of the PROBES:
name          gene      chr  bplocal    bpglobal
1415670_at    Copg      6    87875328   924571670
1415671_at    Atp6v0d1  8    108413750  1234908613
1415672_at    Golga7    8    24706942   1151201805
1415673_at    Psph      5    130080298  820157654
1415674_a_at  Trappc4   9    44155401   1298908898
1415675_at    Dpm2      2    32395013   227443878
1415676_a_at  Psmb5     14   53568499   1912894020
1415677_at    Dhrs1     14   54693657   1914019178
1415678_at    Ppm1a     12   73712802   1701465838
1415679_at    Psenen    7    30270655   1017168695
1415680_at    Anapc1    2    128304204  323353069
The data model
› To share genotype/phenotype data and tools:
1. Extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
Investigation workflow in the lab
[Diagram: genotype data, expression data and QTL data, each a DATA box of DATA ELEMENTs (dimensions MARKER and STRAIN, PROBE and MARKER, MARKER and INDIVIDUAL), with question marks on the links between them.]
Investigation building on FuGE
[Diagram: genotype data, expression data and QTL data, each a DATA box linked through a protocol application to a Protocol, Software and Equipment. Labels on the diagram: QTL mapping, Affy array, SNP array; mapping protocol, Illumina protocol, Affy M430 protocol; R software, BeadStudio, Bioconductor normalization; Illumina, Affy M430 platform.]
FuGE: Jones et al., Nature Biotech 25, 1127-1133
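The idea of recording which protocol produced which data set can be sketched as follows. This is not the FuGE object model: the classes are a simplified stand-in, and the pairing of protocols with data sets (Illumina for genotypes, Affy M430 for expression, a mapping protocol run in R for QTLs) is one reading of the labels on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protocol:
    name: str
    software: str = ""
    equipment: str = ""


@dataclass
class ProtocolApplication:
    protocol: Protocol
    inputs: List[str]  # names of the data sets consumed
    output: str        # name of the data set produced


def provenance(data_name, applications):
    """List the protocols behind a data set, starting from its last step."""
    by_output = {app.output: app for app in applications}
    trail, pending = [], [data_name]
    while pending:
        app = by_output.get(pending.pop())
        if app is None:
            continue  # no recorded protocol application for this data set
        trail.append(app.protocol.name)
        pending.extend(app.inputs)
    return trail


# Assumed pairing of protocols, software and equipment with data sets.
applications = [
    ProtocolApplication(
        Protocol("Illumina protocol", software="BeadStudio", equipment="Illumina"),
        inputs=[], output="Genotype data"),
    ProtocolApplication(
        Protocol("Affy M430 protocol", software="Bioconductor", equipment="Affy M430"),
        inputs=[], output="Expression data"),
    ProtocolApplication(
        Protocol("Mapping protocol", software="R"),
        inputs=["Genotype data", "Expression data"], output="QTL data"),
]

print(provenance("QTL data", applications))
```

With such links stored next to the data, a collaborator receiving QTL profiles can see which lab and analysis steps they depend on.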
Summary of data model
[Diagram with entities: INVESTIGATION; DATA; DATA ELEMENT (row, column); DIMENSION ELEMENT, with TRAIT and SUBJECT and their extensions PROBE, MARKER, STRAIN, INDIVIDUAL, …; PROTOCOL APPLICATION; PROTOCOL, with Equipment and Software.]
The data model
› To share genotype/phenotype data and tools:
1. Extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
References for integration
› Ontology references and database references
Incompatible naming:
• INVESTIGATION 1: GENE, Name = Mip1alpha
• INVESTIGATION 2: GENE, Name = Mip1a
Compatible identifiers:
• DATABASE REFERENCE: Id = ENSMU0S98, Db = ENSEMBL
• DATABASE REFERENCE: Id = ENSMUS098, Db = ENSEMBL
• DATABASE REFERENCE: Id = ENSMUS98, Db = ENSEMBL
• DATABASE REFERENCE: Id = 1419561_AT, Db = AFFY 430
• Hyperlink…
Map mouse on human ontologies:
• ONTOLOGYENTRY: Id = 0005615, Term = ABC, Ontology = GO
• ONTOLOGYENTRY: Id = MP:0005385, Term = cardiovascular, Ontology = MP
FuGE: Jones et al., Nature Biotech 25, 1127-1133
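The point about incompatible names and compatible identifiers can be illustrated by grouping annotations on their database reference rather than on their name. This is a sketch under assumptions: the identifiers `EXAMPLE_ID_1` and `EXAMPLE_ID_2` are placeholders rather than real Ensembl identifiers, and the third record is invented to show a non-matching case.

```python
from collections import defaultdict

# (investigation, gene name, database, identifier)
# Identifiers are placeholders; the last record is a made-up contrast case.
records = [
    ("Investigation 1", "Mip1alpha", "ENSEMBL", "EXAMPLE_ID_1"),
    ("Investigation 2", "Mip1a", "ENSEMBL", "EXAMPLE_ID_1"),
    ("Investigation 2", "Copg", "ENSEMBL", "EXAMPLE_ID_2"),
]


def group_by_reference(records):
    """Group gene annotations by their (database, identifier) reference."""
    groups = defaultdict(list)
    for investigation, name, database, identifier in records:
        groups[(database, identifier)].append((investigation, name))
    return dict(groups)


groups = group_by_reference(records)

# References used in more than one place link differently named genes.
shared = {ref: members for ref, members in groups.items() if len(members) > 1}
print(shared)
```

Here the genes named Mip1alpha and Mip1a end up in the same group because they carry the same database reference, even though a comparison by name would treat them as different.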
Summary of data model
[Diagram with entities: INVESTIGATION; DATA; DATA ELEMENT (row, column); DIMENSION ELEMENT, with TRAIT and SUBJECT and their extensions PROBE, MARKER, STRAIN, INDIVIDUAL; PROTOCOL APPLICATION; PROTOCOL, with Equipment and Software; DATABASE REFERENCE; ONTOLOGYENTRY; Hyperlink…]
Extensible to more experiments…
What is on the todo
Todo
› Publication: submitted
› Building a catalog of tools on top of dbGG
  • Experiments: in Braunschweig and Groningen
    • Illumina, Affy, Metabolites
  • Tool ‘plug-ins’
    • QTL graphs, import of annotations etc.
› Exploit interoperability
  • E.g. integrate mouse & human with ontologies
  • Load annotations from other dbGG/BioMARTs
  • Build on and extend R/Taverna interaction
Summary and questions
› Share genotype/phenotype data and tools:
1. Interoperable software
   • Simple flat file exchange format
   • Database server
   • R/web-service interfaces
   • A procedure to extend the software
2. Build on extensible data model
   • Data
   • Annotations
   • Investigations
   • Integration references
› Next steps
Thank you
[email protected]
Morris A. Swertz
Bruno M. Tesson
Richard A. Scheltema
Gonzalo Vera
Rudi Alberts
Damian Smedley
Katy Wolstencroft
Andrew R. Jones
Klaus Schughart
John M. Hancock
Helen E. Parkinson
Engbert O. de Brock
Carole Goble
Paul Schofield
Ritsert C. Jansen
the GEN2PHEN consortium
the CASIMIR consortium
Appendix: Procedure to (re)generate a MOLGENIS
MOLGENIS for data
Describe in little language
[Diagram, shown on three consecutive slides: a data model with entities Experiment (ID : autoid, Name : varchar), Assay (ID : autoid, Name : varchar), Data (ID : autoid, Value : object), Trait (ID : autoid, Name : varchar) and Subject (ID : autoid, Name : varchar). Relation labels: Column 1, Row 1, Assay 1, and Experiment 1 (three times). Example labels on the first slide: individuals, expressions, probes.]
Case GG: Generate and evaluate
http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase
Describe in little language
http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase