AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sciences

BioSharing.org – mapping the landscape of standards in the life sciences. Peter McQuilton, PhD (@drosophilic), BioSharing content lead. On behalf of the BioSharing team (@biosharing)

Transcript of AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sciences

BioSharing.org – mapping the landscape of standards in the life sciences

Peter McQuilton, PhD (@drosophilic), BioSharing content lead

On behalf of the BioSharing team (@biosharing)

Outline

• Standards, databases and policies in the life sciences
• BioSharing – an informative and educational resource
  • What it is
  • How to use it
  • How it can help you

A growth in data, a growth in databases

Number of databases in the NAR database issue, up to 2015 (from @AlexBateman1)

Credit: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ (2014)

Better data = better science

The FAIR Principles

But in all fairness, not all data is FAIR!

(Spreadsheet screenshot: columns A to E, row numbers in the first column)

 1              Group1  Group2
 2   Day 0
 3   Sodium     139     142
 4   Potassium  3.3     4.8
 5   Chloride   100     108
 6   BUN        18      18
 7   Creatine   1.2     1.2
 8   Uric acid  5.5*    6.2*
 9   Day 7
10   Sodium     140     146
11   Potassium  3.4     5.1
12   Chloride   97      108

S1Sh.cuo

Credit to: Iain Hrynaszkiewicz

Sharing starts with good metadata…

S1Sh.cuo

• Meaningless column titles
• Special characters can cause text mining errors
• No units
• Unhelpful document name
• Undefined abbreviation
• Formatting for information that should be in metadata

Credit to: Iain Hrynaszkiewicz

…which this isn’t…

     A          B    C        D        E      F
 1   Parameter  Day  Control  Treated  Units  P
 2   Sodium     0    139      142      mEq/l  0.82
 3   Sodium     7    140      146      mEq/l  0.70
 4   Sodium     14   140      158      mEq/l  0.03
 5   Sodium     21   143      160      mEq/l  0.02
 6   Potassium  0    3.3      4.8      mEq/l  0.06
 7   Potassium  7    3.4      5.1      mEq/l  0.07
 8   Potassium  14   3.7      4.7      mEq/l  0.10
 9   Potassium  21   3.1      3.6      mEq/l  0.52
10   Chloride   0    100      108      mEq/l  0.56
11   Chloride   7    97       108      mEq/l  0.68
12   Chloride   14   101      106      mEq/l  0.79

Table_S1_Shanghai_blood.xls

Credit to: Iain Hrynaszkiewicz

…This is much clearer!
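A layout with one header row and one measurement per row is easy to exchange as plain CSV. The following is a minimal Python sketch, not part of the original slides: it reuses the column names and the first three sodium rows of the clearer table, while the function names `to_csv` and `missing_units` are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# Column names and values copied from the clearer table (first three rows only).
FIELDS = ["Parameter", "Day", "Control", "Treated", "Units", "P"]
ROWS = [
    ("Sodium", 0, 139, 142, "mEq/l", 0.82),
    ("Sodium", 7, 140, 146, "mEq/l", 0.70),
    ("Sodium", 14, 140, 158, "mEq/l", 0.03),
]


def to_csv(fields, rows):
    """Serialise the rows as CSV text with a single header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


def missing_units(fields, rows):
    """Return the rows whose Units cell is empty (a hypothetical quality check)."""
    units_index = fields.index("Units")
    return [row for row in rows if not str(row[units_index]).strip()]


csv_text = to_csv(FIELDS, ROWS)
print(csv_text)
print("Rows without units:", missing_units(FIELDS, ROWS))
```

Because every value sits in a named column with explicit units, a check such as `missing_units` becomes a one-line lookup instead of a manual inspection of formatting.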

Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared…

From natural language to structured data

• Age value
• Unit
• Strain name
• Subject of the experiment
• Type of diet and experimental condition
• Anatomy part

Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared …

Type of protocol – cell preparation

Type of protocol – sample treatment

Type of protocol – liver preparation

From natural language to structured data
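As an illustration of what the structured form of that sentence could look like, here is a minimal Python sketch. The pairing of each label with a fragment of the sentence is an assumption made for this example, and the key names and the helper `missing_fragments` are hypothetical; the slides only show the labels, not a concrete data model.

```python
import json

# Free-text description as shown on the slide.
SENTENCE = (
    "Seven week old C57BL/6N mice were treated with low-fat diet. "
    "Liver was dissected out, hepatocytes prepared"
)

# Assumed mapping of the slide's labels onto the sentence (illustrative only).
annotation = {
    "age": {"value": 7, "unit": "week"},
    "strain_name": "C57BL/6N",
    "subject": "mice",
    "diet_and_condition": "low-fat diet",
    "anatomy_part": "Liver",
    "protocols": ["sample treatment", "liver preparation", "cell preparation"],
}


def missing_fragments(sentence, fragments):
    """Return the annotated text fragments that do not occur in the sentence."""
    return [fragment for fragment in fragments if fragment not in sentence]


text_fields = ["strain_name", "subject", "diet_and_condition", "anatomy_part"]
not_found = missing_fragments(SENTENCE, [annotation[key] for key in text_fields])

print(json.dumps(annotation, indent=2))
print("Fragments not found in the sentence:", not_found)
```

Once the description is held as named fields rather than prose, it can be searched, validated and compared across datasets.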

• Data/content standards:
  • Structure, enrich and report the description of the datasets and the experimental context under which they were produced
  • Facilitate the discovery, sharing, understanding and reuse of datasets

Data has to be structured for sharing – we need standards

de jure de facto

grass-roots groups

standard organizations

Nanotechnology Working Group

Community mobilisation to develop content standards

Formats Terminologies Guidelines

Guidelines = Minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. ARRIVE guidelines

Terminologies = Controlled vocabularies, taxonomies, thesauri, ontologies etc.
o Use the same word and refer to the same ‘thing’
o e.g. Gene Ontology

Models/Formats = Conceptual model, conceptual schema, exchange formats
o Allow data to flow from one system to another
o e.g. FASTA

Enablers: to better describe, share and query data
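To make the "data flows from one system to another" point concrete, here is a minimal Python sketch of reading FASTA, the exchange format named above. It relies only on the general FASTA convention (a header line starting with `>` followed by one or more sequence lines); the function name `parse_fasta` and the example sequences are invented for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its full sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Invented example input, not real sequence data.
EXAMPLE = """>seq1 hypothetical example
ACGTAC
GTTA
>seq2
TTGACA
"""

records = parse_fasta(EXAMPLE)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```

Any tool that writes this layout can hand data to any tool that reads it, which is the role a shared format plays.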

Formats Terminologies Guidelines

193   85   346

Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …

Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …

Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …

There are over 600 standards in the life sciences

Data policies (30)

Databases (763)

Data/metadata standards (652)

A complex and evolving landscape

• Is there a database, implementing standards, where I can deposit my metagenomics dataset?
• My funder’s data sharing policy recommends the use of established standards, but which ones are widely endorsed and applicable to my toxicological and clinical data?
• Am I using the most up-to-date version of this terminology to annotate cell-based assays?
• I understand this format has been deprecated; what has it been replaced by, and who is leading the work?
• Are there databases implementing this exchange format, whose development we have funded?
• What are the mature standards and standards-compliant databases we should recommend to our authors?

Helping people make the right decision

Introducing BioSharing

Mapping the landscape of ‘standards’ in the life, environmental and biomedical sciences

1,400 records and growing

What is BioSharing?

A web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community.

What is BioSharing?

• Launched in 2011, as an evolution of the MIBBI portal (2008-2011)
• Manually curated
• Community driven
• Growing user base and visibility

also operates as a WG in   Run at   is also an   Resource that

The BioSharing community

How do we describe standards?

Criteria for evaluating standards

Linking standards, databases and policies

Model/format formalizing reporting guideline -->

<-- Reporting guideline used by model/format

Cross-linking standards to standards and databases
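One way to picture this cross-linking is as a set of typed relations between records, stored once and readable in both directions. The sketch below is a minimal Python illustration under stated assumptions: the record names, the relation labels other than "used by", and the helper `build_index` are all hypothetical and do not describe BioSharing's actual data model.

```python
from collections import defaultdict

# Hypothetical records and relation labels, for illustration only.
LINKS = [
    ("Example format", "formalizes", "Example reporting guideline"),
    ("Example database", "implements", "Example format"),
    ("Example database", "implements", "Example reporting guideline"),
]

# Assumed wording for the reverse direction of each relation.
INVERSE = {"formalizes": "used by", "implements": "implemented by"}


def build_index(links):
    """Index every link in both directions: record -> relation -> sorted targets."""
    index = defaultdict(lambda: defaultdict(list))
    for source, relation, target in links:
        index[source][relation].append(target)
        index[target][INVERSE[relation]].append(source)
    for relations in index.values():
        for targets in relations.values():
            targets.sort()
    return index


index = build_index(LINKS)
print(index["Example reporting guideline"]["used by"])
print(index["Example format"]["implemented by"])
```

Storing the inverse alongside the forward relation means a guideline's page can list the formats that use it without a separate lookup.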

Indicators of lifecycle status

• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded

Manually curated, approved by the community
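The four status indicators lend themselves to a small enumeration that tools can filter on. This is a minimal Python sketch, assuming invented record names and a hypothetical `superseded_by` field; only the four status labels come from the slide.

```python
from enum import Enum


class Status(Enum):
    """Lifecycle status labels as listed on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


# Hypothetical records; names, statuses and the superseded_by field are invented.
RECORDS = [
    {"name": "Format A", "status": Status.READY, "superseded_by": None},
    {"name": "Format B", "status": Status.DEPRECATED, "superseded_by": "Format A"},
    {"name": "Guideline C", "status": Status.IN_DEVELOPMENT, "superseded_by": None},
    {"name": "Terminology D", "status": Status.UNCERTAIN, "superseded_by": None},
]


def recommendable(records):
    """Names of records that are ready for use, implementation, or recommendation."""
    return [r["name"] for r in records if r["status"] is Status.READY]


def replacement_for(records, name):
    """Return what a deprecated record was superseded by, or None otherwise."""
    for record in records:
        if record["name"] == name and record["status"] is Status.DEPRECATED:
            return record["superseded_by"]
    return None


print(recommendable(RECORDS))
print(replacement_for(RECORDS, "Format B"))
```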

An informative and educational resource

Simple and advanced searches, ask our wizard or view journal recommendations

Search, filter, and refine using our faceted search
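Faceted search amounts to narrowing a record set by selected field values while showing how many records remain under each value. The following minimal Python sketch shows the idea only; the records, the facet names `type` and `domain`, and the helpers `refine` and `facet_counts` are hypothetical and are not the portal's implementation.

```python
from collections import Counter

# Hypothetical records with two facets, invented for illustration.
RECORDS = [
    {"name": "Record 1", "type": "database", "domain": "metagenomics"},
    {"name": "Record 2", "type": "format", "domain": "metagenomics"},
    {"name": "Record 3", "type": "terminology", "domain": "toxicology"},
    {"name": "Record 4", "type": "database", "domain": "toxicology"},
]


def refine(records, **facets):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in facets.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(record[facet] for record in records)


databases = refine(RECORDS, type="database")
print([record["name"] for record in databases])
print(facet_counts(databases, "domain"))
print([record["name"] for record in refine(RECORDS, type="database", domain="metagenomics")])
```

Recomputing the counts after each refinement is what lets a user see which further filters would still return results.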

Collections group together one or more types of resource by domain, project or organization.

Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.

Standards and databases recommended by journal data policies

The wizard:

• Guides users through the data
• Will grow in functionality and complexity, based on user feedback
• Powered by curated descriptions of each standard and database, and their relations

BioSharing – what we do

Inform – what’s out there, which databases use which standards. Map the landscape.

Educate – what databases are recommended by your funder, or journal of choice, which standards should you be using, which standards and databases should you recommend? Explore the landscape.

Acknowledgements

Eamonn Maguire, DPhil, Software Engineer (contractor)

Philippe Rocca-Serra, PhD, Senior Research Lecturer

Alejandra Gonzalez-Beltran, PhD, Research Lecturer

Milo Thurston, DPhil, Research SW Engineer

Massimiliano Izzo, PhD, Research SW Engineer

Peter McQuilton, PhD, Knowledge Engineer

Allyson Lister, PhD, Knowledge Engineer

David Johnson, PhD, Research SW Engineer

Susanna-Assunta Sansone, PhD, Centre’s Associate Director, Principal Investigator and Springer Nature’s Consultant for Scientific Data