2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives, challenges and outlook

Public archiving of bio-imaging data – perspectives, challenges and outlook
Ardan Patwardhan



Outline
• Introduction
• EMDB and EMPIAR status
• Resources for EMDB and EMPIAR
• On-going projects, initiatives and plans


Introduction


Molecular and Cellular Structure

• Maintain and manage archives
  • PDB for atomic coordinate models
  • EMDB for 3DEM reconstructions
  • EMPIAR for 3DEM raw data
• Develop and maintain web services for searching, visualisation and validation
• Facilitate community-wide initiatives
• Key themes: integration with other bioinformatics resources and across imaging scales; validation


Structural data archives

| Archive | Type of data | Founded | Organization | Funding | # people | # entries | Size (average entry size) |
|---|---|---|---|---|---|---|---|
| PDB | Atomic coordinate models | 1971 | wwPDB (EBI, RCSB, PDBj, BMRB) | Core + grants | 60-80 | 124,286 | 1 TB (8 MB) |
| EMDB | 3DEM volume structures | 2002 | EBI (+ RCSB, PDBj) | Core + grants | <10 | 4,276 | 340 GB (80 MB) |
| EMPIAR | Raw image data for EMDB structures | 2014 | EBI | Grant | <5 | 61 | 40 TB (660 GB) |

Stats until 9 Nov 2016
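The bracketed figure in each Size entry appears to be the average size per entry, i.e. total archive size divided by the number of entries. A quick check of that reading, assuming decimal units (1 TB = 10^12 bytes); the helper names are illustrative only:

```python
# Totals (bytes, decimal units assumed) and entry counts from the table, stats until 9 Nov 2016.
ARCHIVES = {
    "PDB": (1e12, 124286),
    "EMDB": (340e9, 4276),
    "EMPIAR": (40e12, 61),
}


def average_entry_size(total_bytes, n_entries):
    """Mean size of one entry in bytes."""
    return total_bytes / n_entries


def human_readable(n_bytes):
    """Format a byte count with decimal units, rounded to a whole number."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.0f} {unit}"
        n_bytes /= 1000


for name, (total, count) in ARCHIVES.items():
    print(f"{name}: {human_readable(average_entry_size(total, count))} per entry")
```

This reproduces the 8 MB and 80 MB figures and gives about 656 GB for EMPIAR, which the slides round to 660 GB or ~650 GB.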


What goes where...
• Final single-particle and subtomogram-average maps must go to EMDB (deposition of tomograms is strongly recommended)
• Fitted models must go to PDB
• Deposition of raw image data to EMPIAR is encouraged

[Diagram: EMDB (final map), EMPIAR (raw image data), PDB (fitted model)]
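The deposition rules on this slide amount to a small lookup from data type to archive and obligation level. The sketch below writes them down that way purely as an illustration; the data-type keys, the `Requirement` levels and `where_to_deposit` are hypothetical names, not part of any EMDB, PDB or EMPIAR tool.

```python
from enum import Enum


class Requirement(Enum):
    """Hypothetical obligation levels matching the wording on the slide."""
    MANDATORY = "must be deposited"
    STRONGLY_RECOMMENDED = "strongly recommended"
    ENCOURAGED = "encouraged"


# Hypothetical encoding of the slide's rules: data type -> (archive, requirement).
DEPOSITION_POLICY = {
    "single-particle map": ("EMDB", Requirement.MANDATORY),
    "subtomogram average map": ("EMDB", Requirement.MANDATORY),
    "tomogram": ("EMDB", Requirement.STRONGLY_RECOMMENDED),
    "fitted model": ("PDB", Requirement.MANDATORY),
    "raw image data": ("EMPIAR", Requirement.ENCOURAGED),
}


def where_to_deposit(data_type):
    """Return (archive, requirement) for a data type, or raise ValueError if unknown."""
    try:
        return DEPOSITION_POLICY[data_type]
    except KeyError:
        raise ValueError(f"no deposition rule for {data_type!r}") from None
```

For example, `where_to_deposit("tomogram")` returns `("EMDB", Requirement.STRONGLY_RECOMMENDED)`.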


Benefits of public archiving
• Reuse of data
  • starting models
  • comparing structures of different functional states
  • a different emphasis may lead to new discoveries
• Validation, methods development, testing, training
• Safe storage of data
• Integration of data with other public archives
• A resource for data mining
• Enables a bird's-eye perspective of the field


What does archiving involve?
• Working with the community, partners and journals to achieve consensus on practices, policies and procedures
• Adapting to the changing needs of data and metadata collection
  • new sample preparation methods
  • new validation methods
• Providing the means to deposit data, e.g., web-based deposition systems
• Curating data: automated and manual, including remediation
  • maximize structured annotation, minimize free text
• Developing added-value resources for searching, validating and visualizing data


Viability
• Community support
• Value: uploads versus downloads
• Data transfer technologies: Aspera, Globus
• Data storage: file systems, object stores
• Data fidelity: quality measures and validation
• Annotation: structured versus unstructured
• Centralised versus distributed


EMPIAR
• Electron Microscopy Pilot (or Public?) Image Archive
• Started in 2014
• Raw 2D image datasets related to EMDB entries
• Usage: validation, development, testing, teaching and…
  • safe storage of your data!
  • source of the data for the EM Map Validation Challenge
• Multi-frame micrographs, averaged micrographs, particle stacks, tilt series
• Uses Aspera, Globus, FTP and HTTP for data transfers


Websites
• emdb-empiar.org: EMDB website
• empiar.org: EMPIAR website
• pdbe.org: PDBe website
• wwpdb.org: coordinating organization for the PDB archive
• emdatabank.org: EMDataBank NIH project website
• https://www.facebook.com/proteindatabank
• https://twitter.com/pdbeurope


EMDB and EMPIAR status


EMDB trends – released entries

Stats until 2 Nov 2016


EMPIAR metrics
• Number of entries: 61 (40 TB; average size ~650 GB)
• 7 TB+ datasets; one 10 TB+ dataset
• Transfer speed: uploads of 1-2 TB per 24 h (Europe, US, Australia)
• “empiar” cited 20+ times in full-text open-access papers
• Nature Methods publication (Iudin et al., 2016)

[Charts, 2014-2016: Aspera uploads per month (users); Aspera uploads per month (TB); total downloads (users); total downloads (data)]
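The quoted upload rate of 1-2 TB per 24 h translates directly into a sustained throughput and an expected upload time per dataset. A minimal sketch of that arithmetic, assuming decimal units and a constant rate; the function names are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_mb_per_s(tb_per_day):
    """Convert a daily transfer volume (decimal TB) into a sustained rate in MB/s."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


def upload_hours(dataset_gb, tb_per_day):
    """Hours needed to upload dataset_gb (decimal GB) at a constant tb_per_day."""
    return dataset_gb / (tb_per_day * 1000) * 24


for rate in (1, 2):
    print(f"{rate} TB/day ~ {throughput_mb_per_s(rate):.1f} MB/s; "
          f"650 GB entry: {upload_hours(650, rate):.1f} h; "
          f"10 TB dataset: {upload_hours(10_000, rate) / 24:.0f} days")
```

At these rates an average ~650 GB entry takes roughly 8-16 hours to upload, while a 10 TB+ dataset needs about 5-10 days.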


Resources for EMDB and EMPIAR


Searching EMDB - quick links + latest entries

emdb-empiar.org


EMStats – journal stats


Volume slicer
• Available for all EMDB entries
• Published in J Struct Biol (Salavert Torres et al., 2016)

emdb-empiar.org/emd-2363/3dslice


EMPIAR website

empiar.org


EMPIAR entry pages

empiar.org/empiar-10030


EMPIAR API

empiar.org/api/entry/empiar-10004
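The slide gives only the URL pattern for an entry. A minimal standard-library sketch of calling it follows; the HTTPS scheme, the accepted accession spellings and the helper names are assumptions, the endpoint may have moved since 2016, and because the response schema is not described here the JSON is returned as a plain dict without assuming any particular keys.

```python
import json
import re
import urllib.request

# Assumption: base URL taken from the slide's pattern, served over HTTPS.
API_BASE = "https://empiar.org/api/entry/"


def entry_url(accession):
    """Normalise '10004', 'EMPIAR-10004' or 'empiar-10004' into an API URL."""
    match = re.fullmatch(r"(?i)(?:empiar-)?(\d+)", accession.strip())
    if match is None:
        raise ValueError(f"not an EMPIAR accession: {accession!r}")
    return f"{API_BASE}empiar-{match.group(1)}"


def fetch_entry(accession, timeout=30):
    """Fetch and decode the JSON metadata for one entry (needs network access)."""
    with urllib.request.urlopen(entry_url(accession), timeout=timeout) as response:
        return json.load(response)
```

Usage: `metadata = fetch_entry("EMPIAR-10004")`, then inspect `metadata` to see which fields the service actually returns.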


On-going projects, initiatives and plans


Volume browser
• Integrated visualisation of structural data
• Spanning scales from cells to molecules


Expert workshop on “3D segmentations and transformations - building bridges between cellular and molecular structural biology”

Madingley Hall, 6-7 Dec 2015

Co-funded by


File format and translators
• EMDB Segmentation File Format (EMDB-SFF)
  • adds structured biological annotation
  • handles transforms between tomograms and subtomograms
• Python scripts to read Segger, IMOD and Amira segmentations and convert them to EMDB-SFF
• Working on displaying segmentations in OMERO
• Public open-source distribution through CCP-EM
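To illustrate what "handles transforms between tomograms and subtomograms" involves, the sketch below attaches an annotation and a 3×4 affine matrix (rotation/scale in the first three columns, translation in the last) to a segment and maps subtomogram coordinates into the tomogram frame. The `Segment` class, its fields and the matrix convention are hypothetical; this is not the EMDB-SFF schema.

```python
from dataclasses import dataclass, field

IDENTITY_3X4 = ((1.0, 0.0, 0.0, 0.0),
                (0.0, 1.0, 0.0, 0.0),
                (0.0, 0.0, 1.0, 0.0))


@dataclass
class Segment:
    """Hypothetical minimal segment: label, structured annotation, 3x4 transform."""
    label: str
    annotation: dict = field(default_factory=dict)
    transform: tuple = IDENTITY_3X4  # subtomogram -> tomogram coordinates


def apply_transform(matrix, point):
    """Map (x, y, z) through a 3x4 affine matrix: linear part plus translation."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def to_tomogram_frame(segment, points):
    """Transform a list of subtomogram-frame points into the tomogram frame."""
    return [apply_transform(segment.transform, p) for p in points]
```

For instance, a 90° rotation about z followed by a shift of 10 along x maps `(1.0, 0.0, 0.0)` to `(10.0, 1.0, 0.0)`. Storing one matrix per subtomogram avoids duplicating the segmentation itself for each copy placed in the tomogram.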


Future directions
• Archiving for related imaging modalities, including
  • 3D scanning electron microscopy
  • correlative light and electron microscopy
  • soft X-ray tomography
• Data-harvesting pipelines
• Validation
  • deposition support for new kinds of validation data
  • validation servers, e.g., for visual analysis and map-versus-model FSC
  • data-mining EMDB to develop new validation metrics
• Fast archive-wide substructure volumetric (or shape-based) searches
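Map-versus-model validation typically compares the experimental map with a map computed from the fitted model using the Fourier shell correlation, FSC(r) = Re[Σ F1·conj(F2)] / sqrt(Σ|F1|² · Σ|F2|²), with the sums taken over all Fourier voxels in the shell of radius r. The NumPy sketch below shows only that FSC step, assuming both maps are already on the same cubic grid (simulating a map from the model is out of scope here). It uses unmasked 1-voxel-wide shells; shell index s corresponds to a spatial frequency of s / (N × pixel size).

```python
import numpy as np


def fourier_shell_correlation(map1, map2, n_shells=None):
    """FSC between two same-shaped 3D maps, one value per 1-voxel-wide Fourier shell."""
    if map1.shape != map2.shape:
        raise ValueError("maps must have the same shape")
    # Centre the zero frequency so shell radius is the distance from the array centre.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    grids = np.meshgrid(*[np.arange(n) - n // 2 for n in map1.shape], indexing="ij")
    radius = np.sqrt(sum(g.astype(float) ** 2 for g in grids))
    shell = np.round(radius).astype(int)
    if n_shells is None:
        n_shells = min(map1.shape) // 2  # stop at the Nyquist frequency
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        mask = shell == s
        numerator = np.sum(f1[mask] * np.conj(f2[mask]))
        denominator = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        fsc[s] = numerator.real / denominator if denominator > 0 else 0.0
    return fsc
```

Identical maps give FSC = 1 in every shell; for real data the curve falls with increasing frequency, and the shell where it drops below a chosen threshold gives a resolution estimate.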


Acknowledgements
• Gerard Kleywegt
• EM group
  • Sanja Abbott
  • Andrii Iudin
  • Paul Korir
  • Carlos Lugo
  • Eduardo Sanz Garcia
  • Jose Salavert Torres (UPV)
  • Ingvar Lagerstedt (EL)
  • Maya Holmdahl (UU)
  • Vladislav Lysenkov (MAMK)
• Birkbeck: Maya Topf, Agnel Praveen Joseph, Helen Saibil
• Baylor: Wah Chiu
• RCSB: Cathy Lawson
• Francis Crick: Lucy Collinson, Raffaella Carzaniga
• STFC: Martyn Winn, Tom Burnley
• Dundee: Jason Swedlow, Josh Moore
• CNB Madrid: Jose Maria Carazo, Pablo Conesa, Jose Miguel de la Rosa Trevin, Joan Segura Mora
• And many more!