ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012
Toward interoperable bioscience data
Susanna-Assunta Sansone, PhD
Principal Investigator, Team Leader, University of Oxford e-Research Centre, Oxford, UK
@isatools @biosharing
ISMB 2012, Long Beach, California, USA, July 15-17
ISMB hashtag: #PP44
Highlights Track: Databases and Ontologies
§ ISA Commons, a grass-roots collaborative that works to facilitate collection, curation and sharing of experiments in an increasingly diverse set of life science domains, using a common, structured representation of the experiments that
• transcends individual biological and technological domains,
• follows the appropriate community norms and standards, many listed in the BioSharing catalogue, and
• is implemented by several curation, storage and data sharing tools
What is this presentation about?
www.biosharing.org
www.isacommons.org
TOWARDS INTEROPERABLE BIOSCIENCE DATA
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
Feb 2012 www.isacommons.org
doi:10.1038/ng.1054
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
From reusable data to reproducible research
To make the datasets comprehensible, interoperable and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results.
Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
§ Capture all salient features of the experimental workflow
§ Make annotation explicit and discoverable
§ Structure the descriptions for consistency, tracking
  § independent variables
  § dependent variables
  using
  § cross-references and resolvable identifiers
Structured description of datasets
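The points above can be illustrated with a minimal sketch, which is not part of the ISA tools. It assumes a small in-memory model in which every variable is a label paired with an identifier, and independent and dependent variables are kept apart. The class names, the `example.org` URIs and the sample values are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A label paired with a resolvable identifier (here a placeholder URI)."""
    label: str
    identifier: str


@dataclass
class Observation:
    """One sample: what was varied, what was measured, and where the data live."""
    sample_name: str
    independent: dict = field(default_factory=dict)  # Term -> value set by the experimenter
    dependent: dict = field(default_factory=dict)    # Term -> value that was measured
    data_file: str = ""


def group_by(observations, variable):
    """Group sample names by the value of one independent variable."""
    groups = {}
    for obs in observations:
        groups.setdefault(obs.independent[variable], []).append(obs.sample_name)
    return groups


# Placeholder terms: example.org URIs stand in for real ontology identifiers.
DOSE = Term("dose", "http://example.org/term/dose")
ALT = Term("alanine aminotransferase level", "http://example.org/term/alt-level")

# Invented sample data, for illustration only.
observations = [
    Observation("rat-01", {DOSE: "low"}, {ALT: 41.0}, "rat-01.csv"),
    Observation("rat-02", {DOSE: "high"}, {ALT: 97.5}, "rat-02.csv"),
    Observation("rat-03", {DOSE: "high"}, {ALT: 88.0}, "rat-03.csv"),
]

print(group_by(observations, DOSE))
```

Because the variables are explicit and keyed by identifier rather than by free text, a query such as "which samples received the high dose" needs no guessing about column names.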
§ We must strike a balance between
• depth and breadth of information; and
• sufficient information required to reuse the data
Not too much, not too little, just ‘right’
Example of experiments by InnoMed PredTox, an FP6 public-private consortium
sample characteristic(s)
experimental design
experimental variable(s)
technology(ies)
measurement(s)
protocol(s)
data file(s)
......
report the same core, essential information
use the same word and refer to the same ‘thing’
allow data to flow from one system to another
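A minimal sketch of the first two goals, assuming a report is a plain dictionary: one function checks that the core items listed above are present, another maps local wording to an agreed word. The synonym table, the function names and the example report are hypothetical, not taken from any checklist or tool.

```python
# Core items a report is expected to cover (taken from the list above).
CORE_ITEMS = (
    "sample characteristics",
    "experimental design",
    "experimental variables",
    "technology",
    "measurements",
    "protocols",
    "data files",
)

# Hypothetical synonym table: local wording -> agreed wording.
PREFERRED = {
    "rat": "Rattus norvegicus",
    "norway rat": "Rattus norvegicus",
    "microarray": "DNA microarray",
}


def missing_items(report):
    """Return the core items that are absent or empty in a report."""
    return [item for item in CORE_ITEMS if not report.get(item)]


def harmonise(value):
    """Map a local word to the agreed word; leave unknown words unchanged."""
    return PREFERRED.get(value.strip().lower(), value)


# Invented example report with several core items left out.
report = {
    "sample characteristics": {"organism": "rat"},
    "experimental design": "dose response",
    "technology": "microarray",
    "data files": ["liver-01.cel"],
}

print(missing_items(report))
print(harmonise(report["technology"]))
```

Two systems that agree on the same core items and the same preferred words can exchange such reports without a bespoke mapping for each pair.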
Challenges: different communities, different norms and standards, lack of coordination, fragmentation and uneven coverage…
A ‘general mobilization’ to develop standards, e.g.:
Growing number of reporting standards
+ 130 (source: MIBBI, EQUATOR): MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
+ 150 (estimated): MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
+ 303 (source: BioPortal): AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Each one focuses on a particular biological or technological domain
A catalogue to map the landscape of standards: over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
Example of multi-assay study: how many ‘standards’ are applicable to this?
user community
Metadata tracking framework, designed to support
• the use of several standards (checklists, terminologies)
• conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
(Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG)
(Rocca-Serra et al., 2010)
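As a rough illustration of why a tabular, tab-delimited representation is easy to convert to other formats, here is a minimal sketch using only the Python standard library. It is not the ISA software's own code. The column headers are modelled on ISA-Tab naming conventions (for example `Factor Value[dose]`) and should be checked against the specification; the table contents and function names are invented for illustration.

```python
import csv
import io
import re

# Invented study table; headers are assumed to follow ISA-Tab style naming.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[dose]\n"
    "rat-01\tRattus norvegicus\tsample collection\tliver-01\tlow\n"
    "rat-02\tRattus norvegicus\tsample collection\tliver-02\thigh\n"
)

# Matches headers shaped like "Kind[name]".
BRACKETED = re.compile(r"^(?P<kind>[^\[]+)\[(?P<name>[^\]]+)\]$")


def read_table(text):
    """Parse a tab-delimited table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def columns_of_kind(rows, kind):
    """Return {name: [values]} for every header shaped like 'kind[name]'."""
    if not rows:
        return {}
    result = {}
    for header in rows[0]:
        match = BRACKETED.match(header)
        if match and match.group("kind").strip() == kind:
            result[match.group("name")] = [row[header] for row in rows]
    return result


rows = read_table(STUDY_TABLE)
print(columns_of_kind(rows, "Factor Value"))
print(columns_of_kind(rows, "Characteristics"))
```

Once the columns are typed in this way, a converter only has to decide where each kind of column belongs in the target format.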
a collaborative effort of international research/service groups: University of Oxford, EBI, Harvard School of Public Health, NERC Environmental Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for Bioinformatics, Leibniz Institute of Plant Biochemistry and more….
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level
empowering researchers to use standards
To mint DOIs
Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012
(in press)
Ontology Search and Tagging in Google Spreadsheets
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular signatures
We aim to achieve a common representation of experimental content that transcends individual bioscience domains
Nanotechnology Informatics Working Group
Some of the internal projects: Some of the public groups/resources:
Stem Cell Commons
Implementations at Harvard
ISA
Importance of a local community
data sharing in ISA-Tab
Implementation at the EBI
Data papers
Nanotechnology Informatics Working Group
Extensions
@isatools @biosharing isacommons.org biosharing.org
Development timeline, 2007 to 2012
Community involvement and uptake; core developments:
• Strawman ISA-Tab spec
• 1st, 2nd and 3rd ISA-Tab workshops
• Final ISA-Tab spec
• Database instance at EBI
• ISA software v1
• 1st public instance: Harvard Stem Cell Discovery Engine
• RDF format starts
• Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
• User workshops/visits start
• Growing number of systems starts to adopt ISA framework
• Other tools implement ISA-Tab
• Links to analysis tools starts
Publications:
• Workshop reports
• ‘Omics data sharing (Science)
• ISA-Tab and ISA software suite (Bioinformatics)
• Stem Cell Discovery Engine (NAR)
• ISA Commons (Nature Genetics)