2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives, challenges and outlook
Public archiving of bio-imaging data – perspectives, challenges and outlook
Ardan Patwardhan
Outline
• Introduction
• EMDB and EMPIAR status
• Resources for EMDB and EMPIAR
• On-going projects, initiatives and plans
Introduction
Molecular and Cellular Structure
• Maintain and manage archives
  • PDB for atomic coordinate models
  • EMDB for 3DEM reconstructions
  • EMPIAR for 3DEM raw data
• Develop and maintain web-services – searching, visualisation and validation
• Facilitate community-wide initiatives
• Key themes – integration with other bioinformatics resources, integration across imaging scales, and validation
Structural data archives
Archive | Type of data | Founded | Organization | Funding | # people | # entries | Size (average per entry)
PDB | Atomic coordinate models structures | 1971 | wwPDB (EBI, RCSB, PDBj, BMRB) | Core + grants | 60-80 | 124286 | 1 TB (8 MB)
EMDB | 3DEM volume structures | 2002 | EBI (+ RCSB, PDBj) | Core + grants | <10 | 4276 | 340 GB (80 MB)
EMPIAR | Raw image data for EMDB structures | 2014 | EBI | Grant | <5 | 61 | 40 TB (660 GB)
Stats until 9th Nov 2016
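The bracketed sizes in the table appear to be averages per entry (total size divided by number of entries). A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and using only the figures quoted in the table:

```python
# Rough check of the average entry size per archive.
# Assumption: decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
# Values are (total size in bytes, number of entries) from the table.
ARCHIVES = {
    "PDB": (1e12, 124286),
    "EMDB": (340e9, 4276),
    "EMPIAR": (40e12, 61),
}


def average_entry_size(total_bytes, entries):
    """Return the mean size per entry in bytes."""
    return total_bytes / entries


for name, (total, n) in ARCHIVES.items():
    mb = average_entry_size(total, n) / 1e6
    print(f"{name}: about {mb:,.0f} MB per entry")
```

The results land close to the rounded values on the slide; small differences are expected because the totals themselves are rounded.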
What goes where...
• Final single-particle and sub-tomogram average maps must go to EMDB (tomograms strongly recommended)
• Fitted models must go to PDB
• Deposition of raw image data to EMPIAR is encouraged
EMDB: final map
EMPIAR: raw image data
PDB: fitted model
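The deposition rules above can be written down as a simple lookup. This is only an illustrative sketch: the data-type labels and the `where_to_deposit` helper are hypothetical names, not an official vocabulary or tool.

```python
# Hypothetical lookup of the deposition rules listed above.
# Keys are illustrative labels chosen for this sketch.
DEPOSITION_RULES = {
    "single-particle map": ("EMDB", "required"),
    "sub-tomogram average map": ("EMDB", "required"),
    "tomogram": ("EMDB", "strongly recommended"),
    "fitted model": ("PDB", "required"),
    "raw image data": ("EMPIAR", "encouraged"),
}


def where_to_deposit(data_type):
    """Return (archive, status) for a data type; raise KeyError if unknown."""
    return DEPOSITION_RULES[data_type.strip().lower()]


for kind in DEPOSITION_RULES:
    archive, status = where_to_deposit(kind)
    print(f"{kind}: {archive} ({status})")
```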
Benefits of public archiving
• Reuse of data
  • starting models
  • compare structures of different functional states
  • different emphasis may lead to new discoveries
• Validation, methods development, testing, training
• Safe storage of data
• Integration of data with other public archives
• A resource for data mining
• Enables a birds-eye perspective of the field
What does archiving involve?
• Working with the community, partners and journals to achieve a consensus on practices, policies and procedures
• Adapting to changing needs of data and meta-data collection
  • new sample preparation methods
  • new validation methods
• Providing means to deposit data, e.g., web-based deposition systems
• Curating data – automated + manual, remediation
  • maximize structured annotation, minimize free-text
• Developing added value resources for searching, validating and visualizing data
Viability
• Community support
• Value – uploads versus downloads
• Data transfer technologies – Aspera, Globus
• Data storage – file systems, object stores
• Data fidelity – quality measures and validation
• Annotation – structured versus unstructured
• Centralised versus distributed
EMPIAR
• Electron microscopy pilot (or public?) image archive
• Started in 2014
• Raw 2D image datasets related to EMDB
• Usage: validation, development, testing, teaching and…
• Safe storage of your data!
• Was source for data in EM Map Validation Challenge
• Multi-frame micrographs, averaged micrographs, particle-stacks, tilt series
• Uses Aspera, Globus, ftp, http for data transfers
Websites
• emdb-empiar.org – EMDB website
• empiar.org – EMPIAR website
• pdbe.org – PDBe website
• wwpdb.org – Coordinating organization for PDB archive
• emdatabank.org – EMDataBank NIH project website
• https://www.facebook.com/proteindatabank
• https://twitter.com/pdbeurope
EMDB and EMPIAR status
EMDB trends – released entries
Stats until 2 Nov 2016
EMPIAR metrics
• Number of entries: 61 (40 TB; average size ~ 650 GB)
• 7 TB+ sets; one 10 TB+ dataset
• Transfer speed: uploads 1-2 TB/24h (Europe, US, Australia)
• “empiar” cited 20+ times in full-text open-access papers
• Nature Methods publication (Iudin et al., 2016)
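To put the quoted transfer speed in perspective, here is a rough estimate of upload times at 1-2 TB per 24 hours. It assumes a sustained rate and decimal units (1 TB = 10^6 MB); real transfers will vary with network conditions.

```python
# Back-of-the-envelope upload estimates at the quoted 1-2 TB/24h.
# Assumptions: the rate is sustained; 1 TB = 1e6 MB (decimal units).
SECONDS_PER_DAY = 24 * 60 * 60


def upload_days(dataset_tb, rate_tb_per_day):
    """Days needed to upload a dataset at a sustained daily rate."""
    return dataset_tb / rate_tb_per_day


def rate_mb_per_s(rate_tb_per_day):
    """Convert a rate in TB per day to MB per second."""
    return rate_tb_per_day * 1e6 / SECONDS_PER_DAY


# An average-sized entry (~0.65 TB) and a 10 TB dataset, slow and fast rates.
for size_tb in (0.65, 10.0):
    slow = upload_days(size_tb, 1.0)
    fast = upload_days(size_tb, 2.0)
    print(f"{size_tb} TB: between {fast:.2f} and {slow:.2f} days")

print(f"1 TB/24h is about {rate_mb_per_s(1.0):.1f} MB/s")
```

On these assumptions an average-sized entry uploads in well under a day, while a 10 TB dataset takes roughly a week.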
[Figure: four charts covering 2014 to 2016: Aspera uploads/month (users), Aspera uploads/month (TB), Total downloads (users), Total downloads (data)]
Resources for EMDB and EMPIAR
Searching EMDB - quick links + latest entries
emdb-empiar.org
EMStats – journal stats
Volume slicer
• Available for all EMDB entries
• Published in J Struct Biol (Salavert Torres et al., 2016)
emdb-empiar.org/emd-2363/3dslice
EMPIAR website
empiar.org
EMPIAR entry pages
empiar.org/empiar-10030
EMPIAR API
empiar.org/api/entry/empiar-10004
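The entry-page and API addresses shown on the slides follow a simple pattern, which can be sketched as below. The `https` scheme, the helper names and the assumption that the API endpoint returns JSON are not stated in the slides; treat them as illustrative.

```python
import json
import re
from urllib.request import urlopen

# Assumption: the slides give host and path only; https is assumed here.
BASE = "https://empiar.org"


def normalise_accession(accession):
    """Accept 'EMPIAR-10004', 'empiar-10004' or 10004; return 'empiar-10004'."""
    match = re.fullmatch(r"(?:empiar-)?(\d+)", str(accession).strip().lower())
    if match is None:
        raise ValueError(f"not an EMPIAR accession: {accession!r}")
    return f"empiar-{match.group(1)}"


def entry_page_url(accession):
    """URL of the entry page, following the empiar.org/empiar-10030 pattern."""
    return f"{BASE}/{normalise_accession(accession)}"


def api_entry_url(accession):
    """URL of the API entry, following the empiar.org/api/entry/... pattern."""
    return f"{BASE}/api/entry/{normalise_accession(accession)}"


def fetch_entry(accession, timeout=30):
    """Fetch entry metadata over the network; assumes a JSON response."""
    with urlopen(api_entry_url(accession), timeout=timeout) as response:
        return json.load(response)


print(entry_page_url("EMPIAR-10030"))
print(api_entry_url(10004))
```

`fetch_entry` is not called here because it needs network access, and the structure of the returned metadata is not described in the slides.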
On-going projects, initiatives and plans
Volume browser
• Integrated visualisation of structural data
• Spanning scales from cells to molecules
Expert workshop on “3D segmentations and transformations - building bridges between cellular and molecular structural biology”
Madingley Hall, 6-7 Dec 2015
Co-funded by
File format and translators
• EMDB Segmentation File Format (EMDB-SFF)
  • adds structured biological annotation
  • handles transforms between tomograms and subtomograms
• Python scripts to read Segger, IMOD and Amira and convert to EMDB-SFF
• Working on displaying segmentations in Omero
• Public open source distribution through CCP-EM
Future directions
• Archiving for related imaging modalities including
  • 3D scanning electron microscopy
  • correlative light and electron microscopy
  • soft X-ray tomography
• Data harvesting pipelines
• Validation
  • Deposition support for new kinds of validation data
  • Validation servers, e.g., for visual analysis, map versus model FSC
  • Data-mining EMDB to develop new validation metrics
• Fast archive-wide sub-structure volumetric (or shape-based) searches
Acknowledgements
• Gerard Kleywegt
• EM group
  • Sanja Abbott
  • Andrii Iudin
  • Paul Korir
  • Carlos Lugo
  • Eduardo Sanz Garcia
  • Jose Salavert Torres (UPV)
  • Ingvar Lagerstedt (EL)
  • Maya Holmdahl (UU)
  • Vladislav Lysenkov (MAMK)
• Birkbeck
  • Maya Topf
  • Agnel Praveen Joseph
  • Helen Saibil
• Baylor – Wah Chiu
• RCSB – Cathy Lawson
• Francis Crick
  • Lucy Collinson
  • Raffaella Carzaniga
• STFC
  • Martyn Winn
  • Tom Burnley
• Dundee
  • Jason Swedlow
  • Josh Moore
• CNB Madrid
  • Jose Maria Carazo
  • Pablo Conesa
  • Jose Miguel de la Rosa Trevin
  • Joan Segura Mora
• And many more!