An exemplar for data integration in the biomedical domain...
![Page 1: An exemplar for data integration in the biomedical domain ...gehlenborg.com/wp-content/uploads/AMIA2013.pdf · Disparate Stem Cell Resources. Disparate Stem Cell Resources • Inconsistent](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/1.jpg)
An exemplar for data integration in the biomedical domain driven by the ISA framework
Shannan Ho Sui
AMIA, March 19, 2013
http://stemcellcommons.org
![Page 2](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/2.jpg)
This is a story about collaboration...
![Page 3](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/3.jpg)
ISA
![Page 5](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/5.jpg)
Disparate Stem Cell Resources
• Inconsistent data formats, experimental descriptions and results
![Page 7](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/7.jpg)
The Stem Cell Commons
• A shared data and analytical resource
• Bioinformatics support for research at the HSCI
• A community
Data repository
Analysis system
Support/consults
![Page 8](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/8.jpg)
Susanna-Assunta Sansone
isacommons.org
user community
![Page 9](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/9.jpg)
General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats used by public repositories, e.g.
MAGE-Tab
SRA-xml
SOFT
Pride-xml
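As a rough illustration of what a tab-delimited ISA-Tab table looks like to a program, the sketch below parses a tiny study table and pulls out its annotation columns. The table contents, sample names and helper functions are hypothetical and invented for this example; this is not the ISA tools' own API, only standard-library Python under the assumption that annotations sit in `Characteristics[...]` columns.

```python
import csv
import io

# Hypothetical, minimal study table in the tab-delimited ISA-Tab style.
# The rows are invented for illustration and are not data from the talk.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[cell type]\tSample Name\n"
    "donor1\tHomo sapiens\thematopoietic stem cell\tsample1\n"
    "donor2\tMus musculus\tembryonic stem cell\tsample2\n"
)


def read_study_table(text):
    """Parse a tab-delimited study table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def characteristics(row):
    """Return {name: value} for columns of the form Characteristics[name]."""
    prefix, suffix = "Characteristics[", "]"
    return {
        key[len(prefix):-len(suffix)]: value
        for key, value in row.items()
        if key.startswith(prefix) and key.endswith(suffix)
    }


rows = read_study_table(STUDY_TABLE)
for row in rows:
    print(row["Sample Name"], characteristics(row))
```

Because the annotations live in named columns of a plain table, the same rows could in principle be re-serialized into another repository's metadata format, which is the kind of conversion the slide refers to.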
![Page 10](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/10.jpg)
Rationale for developing ISA
Capture all salient features of the experimental workflow
Make annotation explicit and discoverable
Support data provenance tracking
Use community standards
Susanna-Assunta Sansone
isacommons.org
![Page 11](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/11.jpg)
ISA
Manual merging process
53 studies, 1098 assays
87 studies, 1179 assays
Curator
148 studies, 2356 assays
![Page 12](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/12.jpg)
ISA
Conversion driven by ISA-Tab
53 studies, 1098 assays
87 studies, 1179 assays
ISA-Tab
148 studies, 2356 assays
![Page 13](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/13.jpg)
![Page 14](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/14.jpg)
Data uploads and annotation
![Page 15](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/15.jpg)
Current Data Statistics
![Page 16](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/16.jpg)
Filtering data using metadata as search facets
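Faceted filtering over annotated samples can be sketched in a few lines. The sample records, field names and function names below are hypothetical and chosen only for illustration; they do not describe how the Stem Cell Commons repository implements its search.

```python
from collections import Counter

# Hypothetical annotated samples; fields and values are invented for illustration.
SAMPLES = [
    {"name": "sample1", "organism": "Homo sapiens", "technology": "microarray"},
    {"name": "sample2", "organism": "Mus musculus", "technology": "RNA-Seq"},
    {"name": "sample3", "organism": "Homo sapiens", "technology": "RNA-Seq"},
]


def facet_counts(samples, field):
    """Count how many samples carry each value of one metadata field."""
    return Counter(sample[field] for sample in samples)


def filter_by_facets(samples, **selected):
    """Keep only samples whose metadata match every selected facet value."""
    return [
        sample
        for sample in samples
        if all(sample.get(field) == value for field, value in selected.items())
    ]


human = filter_by_facets(SAMPLES, organism="Homo sapiens")
print(facet_counts(SAMPLES, "organism"))
print([sample["name"] for sample in human])
```

The counts are what a facet sidebar would display next to each value, and each selected facet narrows the result list further.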
![Page 17](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/17.jpg)
Experiment description
![Page 18](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/18.jpg)
Experimental protocols and data downloads
![Page 19](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/19.jpg)
ISA-Tab metadata downloads and export
![Page 20](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/20.jpg)
Linking data to the Galaxy workflow engine
![Page 21](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/21.jpg)
Refinery: An analysis and visualization framework
In development
![Page 22](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/22.jpg)
Viewing and selecting samples in list view
![Page 23](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/23.jpg)
Viewing and selecting samples in matrix view
![Page 24](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/24.jpg)
Initiating workflows
![Page 25](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/25.jpg)
Monitoring progress
![Page 26](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/26.jpg)
Integration with the IGV genome browser
![Page 27](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/27.jpg)
Challenges
• Changing research culture(s) to recognize the value of data sharing
• Manually curating the data for consistency and completeness
• Managing large volumes of data
• Standardizing workflows
• Ensuring interoperability when integrating multiple systems and tools
• Technical complexity of software development effort
![Page 28](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/28.jpg)
Refinery
Psalm Haseley
Nils Gehlenborg
Richard Park
Ilya Sytchev
Peter Park
Shannan Ho Sui
![Page 29](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/29.jpg)
ISA Commons
Philippe Rocca-Serra
Eamonn Maguire
Susanna Sansone
Oxford e-Research Centre
A growing community that uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of datasets.
![Page 30](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/30.jpg)
WikiPathways
![Page 31](https://reader034.fdocuments.net/reader034/viewer/2022042123/5e9e58ce0c941c03f2299256/html5/thumbnails/31.jpg)
Meet the Team
Center for Stem Cell Bioinformatics
Winston Hide, Program Leader
Shannan Ho Sui, Analytics
Oliver Hofmann, Core services
Ilya Sytchev, Bioinformatics Developer
John Hutchinson, HSCI Analyst
Sudeshna Das, Repository
Stéphane Corlosquet, Bioinformatics Engineer
Emily Merrill, Bioinformatics Analyst
Collaborators
• Nils Gehlenborg
• Richard Park
• Psalm Haseley
• Peter Park
• Eamonn Maguire
• Philippe Rocca-Serra
• Susanna Sansone