Materials Registry Working Group Meeting
Chandler Becker and Ray Plante
Sharief Youssef, Alden Dima, Zachary Trautt, Kimberly Tryka, Robert Hanisch, Jim Warren, Mary Brady
National Institute of Standards and Technology
Laura Bartolo, Northwestern Univ.
RDA Plenary, 15 Sept 2016
Current WG members (9/13/16)
• Brian Matthews – Science and Technology Facilities Council
• Chandler Becker – National Institute of Standards and Technology
• Clare Paul – Air Force Research Laboratory
• Deborah Mies – Granta Design, Ltd.
• Haiqing Yin – Beijing Univ. of Science and Tech.
• James Warren – National Institute of Standards and Technology
• Kathleen Fontaine – Rochester Polytechnic Institute (RDA)
• Laura Bartolo – Northwestern Univ.
• Raphael Ritz – Max Planck Society, Garching
• Raymond Plante – National Institute of Standards and Technology
• Robert Hanisch – National Institute of Standards and Technology
• Sharief Youssef – National Institute of Standards and Technology
• Tobias Weigel – German Climate Computing Center (DKRZ)
• Vasily Bunakov – Science and Technology Facilities Council
• Zachary Trautt – National Institute of Standards and Technology
Say a few words about your project.
Any new people who haven’t joined yet?
Overview
• Motivation for WG and related efforts
• Summary of timeline from case statement
• Proposed activities and topics to address in WG
• NIST pilot effort
• Thoughts on how to proceed, followed by discussion
Motivation for the working group
• Many materials resources exist (datasets, websites, repositories, registries, etc.), and the number is growing.
• How can we link them in a way that makes it easier to find and share relevant information and data?
Start by creating catalogs of resources
Hosted in many different locations with diverse content
Then connect them
Via data- and information-sharing protocols
What is a registry?
• Registry is a catalog containing descriptions of resources* that are useful for (materials science) data-driven research
* Mainly datasets, databases, and data services
* Can also be portals, software, organizations, …
• A starting point for discovering useful data and tools
– By making the metadata descriptions searchable
– Can direct users to the websites that host the data
Connected catalogs
Turn into…
Building a Registry Federation
• What does federation mean?
– Comprised of a network of registries; there is no single Registry
  Any registry can collect a globally-comprehensive collection of resource descriptions and make it searchable
– Resource metadata exchange
  There is a common mechanism(s) for sharing descriptions of available data resources
– Allow local metadata curation
  Any organization can run a registry of their own data resources and share it with the world
• Why federate?
– Distribute metadata curation
  Allow experts who provide/operate data resources to manage how they are described, update descriptions as they evolve
– No single point of failure (including funding failure)
– Allow innovation in providing search capabilities
• How do we federate?
– Common metadata exchange mechanism
  We propose starting with OAI-PMH
– Common metadata schema
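To make the proposed exchange mechanism concrete, here is a minimal sketch of the harvesting side of OAI-PMH: building a ListRecords request and reading record identifiers and the resumption token from a response. The base URL, the identifiers, and the sample response are invented for illustration, and no network call is made. A real registry would also handle errors, deleted records, and the metadata payload itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # When resuming a partial list, resumptionToken is the only
        # argument sent alongside the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response from a hypothetical registry (assumption, not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:registry.example.org:res-1</identifier>
      <datestamp>2016-09-01</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:registry.example.org:res-2</identifier>
      <datestamp>2016-09-10</datestamp>
    </header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://registry.example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until the response carries no token.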
A Registry Federation
[Figure: registry federation diagram. Labels: Full Searchable Registry (e.g. operated by NIST; e.g. operated by Materials Data Facility); Local Publishing Registry; Data Centers; Small Data Providers; Portal, Database, Dataset; Materials Project, NOMAD, Data Repository, MDF data collections; manual entry; harvest (pull).]
Key features:
- Networked (robustness)
- Exchange locally curated metadata
- Support search and discovery
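The key features above can be sketched as a toy in-memory model. It is an assumption-laden illustration, not the NIST implementation: the class names, record identifiers, and descriptions are all hypothetical. Local publishing registries hold locally curated records, and any number of full searchable registries pull from them, so no single registry is required.

```python
class PublishingRegistry:
    """A local registry exposing its own curated records for harvesting."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # identifier -> (datestamp, description)

    def publish(self, identifier, datestamp, description):
        self.records[identifier] = (datestamp, description)

    def export_from(self, since=""):
        # ISO 8601 dates compare correctly as plain strings; the bound is
        # inclusive, so a record may be pulled again, which is harmless
        # because the harvester keys records by identifier.
        return {i: rec for i, rec in self.records.items() if rec[0] >= since}


class SearchableRegistry:
    """A full registry that harvests (pulls) from publishing registries."""

    def __init__(self):
        self.records = {}
        self.last_seen = {}  # source name -> latest datestamp harvested

    def harvest(self, source):
        pulled = source.export_from(self.last_seen.get(source.name, ""))
        self.records.update(pulled)
        if pulled:
            self.last_seen[source.name] = max(stamp for stamp, _ in pulled.values())
        return sorted(pulled)


# Hypothetical local registries and records (illustration only).
local_a = PublishingRegistry("local-a")
local_a.publish("res-1", "2016-09-01", "Example alloy database")
local_b = PublishingRegistry("local-b")
local_b.publish("res-2", "2016-09-10", "Example micrograph collection")

# Two independent searchable registries harvest the same sources.
full_1, full_2 = SearchableRegistry(), SearchableRegistry()
for full in (full_1, full_2):
    for source in (local_a, local_b):
        full.harvest(source)
```

Because each searchable registry harvests independently, one can go offline or fall behind without affecting the other.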
Want This…
Common protocols
Mappings between content and approaches of different projects
Not This…
Lots of incompatible resources and catalogs
Confusion, frustration, data loss, missed opportunity
Words, words, words
• For this to work, we need words that describe the resources being registered
• Some terms are generic (based on Dublin Core (dublincore.org)):
– Organization
– Contact information
– Access methods and locations
• But others have to be domain- (i.e., materials-) specific
• Not the complete metadata required to fully document the data in the resource
• Want to be user-friendly, which currently means selecting from a relatively limited list of high-level terms and using searchable free text
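The combination of a short controlled list and searchable free text could look like the sketch below. The type list is taken from the resource kinds named earlier in the talk. The record fields ("title", "type", "description") and the records themselves are assumptions made for illustration, not the registry's actual schema.

```python
# High-level resource types mentioned in the talk, used as a controlled list.
RESOURCE_TYPES = {
    "dataset", "database", "service", "software", "portal", "organization",
}


def search(records, resource_type=None, text=""):
    """Filter by a controlled type term, then match every free-text word."""
    if resource_type is not None and resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    words = text.lower().split()
    hits = []
    for rec in records:
        if resource_type is not None and rec["type"] != resource_type:
            continue
        haystack = " ".join([rec["title"], rec["description"]]).lower()
        # Simple substring match; a real registry would use a text index.
        if all(word in haystack for word in words):
            hits.append(rec["title"])
    return hits


# Hypothetical records (illustration only).
RECORDS = [
    {"title": "Example GaAs property database", "type": "database",
     "description": "Dielectric and optical properties for GaAs"},
    {"title": "Example turbine blade models", "type": "dataset",
     "description": "Finite element models of turbine blades"},
    {"title": "Example zirconia sintering data", "type": "dataset",
     "description": "Sintering temperatures for zirconia powders"},
]

gaas_hits = search(RECORDS, text="dielectric GaAs")
dataset_hits = search(RECORDS, resource_type="dataset", text="zirconia")
```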
Resource Concept Model
• A Resource is
– A thing we want to describe and discover
– An identified, described, and discoverable component of the distributed data environment
– Different types of resources (some can be of multiple types simultaneously)
– Model implies some common metadata, each subtype can add additional metadata
What kinds of Resources do we want to share and discover?
Categories of Resource Metadata
• Identity -- how we recognize it
• Role -- what type of Resource is it
• Publication -- who is responsible
• Content -- what it is about
• Access -- how to get at it
• Applicability -- how it applies to different domains
– Can have multiple entries, each containing metadata specific to a different domain
– Include a section for Materials Science metadata
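One way to picture these six categories is as a record type with one field per category and a repeatable, domain-tagged applicability section. This is only a sketch: the class names, field contents, and example values are hypothetical and do not reproduce the working group's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Applicability:
    """Metadata specific to one domain, e.g. materials science."""
    domain: str
    metadata: dict


@dataclass
class Resource:
    identity: dict     # how we recognize it
    role: list         # what type(s) of Resource it is; may be several
    publication: dict  # who is responsible
    content: dict      # what it is about
    access: dict       # how to get at it
    applicability: list = field(default_factory=list)  # one entry per domain

    def domain_metadata(self, domain):
        """Return the metadata entries recorded for a given domain."""
        return [a.metadata for a in self.applicability if a.domain == domain]


# Hypothetical example record (all values invented for illustration).
example = Resource(
    identity={"title": "Example alloy database"},
    role=["database", "service"],
    publication={"publisher": "Example University"},
    content={"description": "Mechanical properties of example alloys"},
    access={"url": "https://data.example.org/alloys"},
    applicability=[
        Applicability("materials science", {"material_type": "metal"}),
    ],
)
```

Keeping domain metadata in its own section means the common fields stay the same for every resource, while each domain adds what it needs.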
Metadata Exchange: Formats
• Format: How to encode metadata
• Common encoding mechanisms currently in use:
– XML (as defined by XML Schema)
  • recommended for WG deliverable
– JSON (as defined by JSON Schema)
– JSON-LD
• Work at NIST: interoperability between these formats
– Best practices for defining format schemas
– Provides technical mechanism for supporting extensibility
– Enable well-defined mechanisms to convert between formats
– https://github.com/usnistgov/mgi-resmd
• Collaboration on schema welcome
– General Resource metadata, formatting (via Schema)
– Materials Science-specific metadata
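As a rough illustration of converting between encodings, the sketch below turns a small XML description into JSON using only the Python standard library. The element names and values are invented and do not follow the mgi-resmd schemas. The conversion ignores XML attributes and namespaces, which a well-defined mapping would have to address.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(el):
    """Convert an element to a string (leaf) or a dict of its children."""
    children = list(el)
    if not children:
        return (el.text or "").strip()
    out = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in out:
            # Repeated child elements become a JSON array.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2, sort_keys=True)


# Hypothetical record (element names are assumptions, not a real schema).
SAMPLE_XML = """<resource>
  <title>Example alloy database</title>
  <type>database</type>
  <subject>alloys</subject>
  <subject>mechanical properties</subject>
  <access><url>https://data.example.org/alloys</url></access>
</resource>"""

converted = json.loads(xml_to_json(SAMPLE_XML))
```

A single occurrence of an element becomes a string while repeated occurrences become an array. That kind of ambiguity is one reason a schema-driven mapping is preferable to ad hoc conversion.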
Technical Collaboration
• “Entry-level” involvement
– Describe your resources at one of the community registries
• Contribute to metadata schema development
• Operate a registry for your organization
– Can run an instance of the NIST MRR application
– Good if you have a larger number of records to share
– Can connect to your local metadata infrastructure
• Create your own registry application
– Support exchange format
– Support OAI-PMH; help set profile
– Prototype alternate exchange mechanisms
“Do I have to give you my data?”
• NO.
• The data can be hosted somewhere and an entry added to the NMRR (or another instance) to point to where the data is and how to access it.
• Companies, universities, other agencies, professional societies, etc., are all welcome to participate, maintaining control over how their data is stored and accessed.
• Great value to smaller projects and targeted collections
• We are ready to start testing metadata exchange via OAI-PMH
Intent for NIST registry instances
• Work with others to improve data sharing and discovery through a federated system
• Possible “registry of registries” to facilitate access across multiple registries and institutions
• Eventually primarily have records for NIST-specific resources (projects, data, software, etc.)
• Host focused registry instances for particular applications in which NIST works or has an interest
Working group overview
• Case statement submitted Jan. 2016
• Proposed timeline of 12-18 months for a pilot materials resource registry system
• Approved July 2016 – thus dates are now shifted back six months from the original proposal
Full timeline
• Month 1 (Jul ’16)
– recruit domain specialists to participate in WG
• Month 2 (Aug/Sep ’16)
– initiate discussions about conducting a survey of existing materials science data providers
– develop 20 typical data discovery queries to inform metadata discussions
• Month 3 (Sep/Oct ’16)
– hold meeting to draft 1st version of metadata extensions to Dublin Core
• Months 4-8 (Oct ’16 – Feb ’17)
– disseminate draft to the materials science community, both within and external to RDA, and solicit feedback
• Month 8 (Feb ’17)
– hold second two-day meeting to refine metadata extensions and establish implementation pilot program
– E.g., NMRR, MDF, others TBD within WG
• Months 9-12 (Mar – Jun ’17)
– implement pilot federated registry and recruit testers/evaluators
– evaluate granularity issues
– write best practices guidelines document
• Months 13-15 (Jul – Sep ’17)
– fine tune metadata definitions and document metadata development process: what worked well, what didn’t
– expand content of pilot registry
• Months 16-18 (Oct – Dec ’17)
– Prepare final document for delivery to RDA
Deliverables
• Two main deliverables for WG:
1. Report containing materials metadata extensions to Dublin Core
2. Pilot with connected registries to demonstrate harvesting
• Plus smaller items along the way (meetings, drafts, etc.)
Where are we?
• WG has been created with an initial roster of members
• At this meeting, we are identifying known efforts and discussing materials science queries
• Need to determine mechanism and date of next meeting. Telecon? Part of an existing meeting (e.g., CHiMaD)?
• Need to plan meeting for approx. March 2017
Identification of existing efforts
• Registries and projects with data sharing enabled
– E.g., nanoHUB, Materials Data Facility, NoMaD, NIMS, Citrine, +?
• Ontologies, vocabularies, etc.
– Collect items on WG wiki page for this effort?
– XML-based schema repository under development
Previous wordplay work
• Some schemas, vocabularies, and ontologies
– MatML, ThermoML, Plinius ontology, Ashino ontology, MatOnto, PREMLP, ONTORULE (steels), SLACKS, MatOWL, matvocab
– Nice review article:
  • X. Zhang, C. Zhao, and X. Wang, Computers in Industry, 73 (2015) 8-22.
• Cover various areas but not everything
• Some are being developed (at all levels), others are dormant
• Others are proprietary or haven’t been publicly released
Example effort: NIST pilot
NIST Materials Resource Registry
• General materials science resources
– ~70 resources at the moment; working to migrate others from the MGI code catalog
• Intended to interact with other registries that are more focused and/or housed at other institutions
• OAI-PMH protocol enabled, built on the Materials Data Curation System platform
– Code on GitHub
– But don’t require others to use the same software!
Browse Registered Resources
Links to registered resources and more information
Different types of resources, including:
• Organizations
• Collections
• Services
• Software
Change which fields are displayed
Search for resources
All metadata text is searchable
Moving toward resources connected by metadata harvesting protocols such as OAI-PMH
For example:
- Materials Data Facility
- Instances hosted by universities or professional societies
- Other implementations that use OAI-PMH but different code
Get More Information
Detailed information about resources, including
• who created them
• who maintains them
• what they contain
• how to access them
Plus links to the resources themselves
Experimental & computational
Add a Resource
Built on the Materials Data Curation System software, but with a specialized schema and interface
Seed NMRR metadata fields
Version 1. These will change based on WG efforts!
Working group activities
WG items for discussion
• Common location for our work on RDA WG website
• Identification of existing projects and resources. What other efforts are represented here?
• Identification of vocabs/ontologies/etc.
• Identification of technical issues
• Planning for follow-up meetings and activities
• Identification of volunteers and interested people
What do you want to share or be able to find? What data sharing efforts are underway?
Sample queries
• Al 6065 mechanical properties
• Environmental degradation data for PE in humidity
• Finite element models of turbine blades
• Optical micrographs of gamma phases in Ni3Al
• Compound formation energies for B2-NiAl
• Sintering temperatures for zirconia powders
• Dielectric properties for GaAs
• Calphad models of InGaAs and related materials
• Data for x alloy processed with y method and analyzed with z equipment
• …
Additional Queries?
RDA websites
• Interest group
– https://rd-alliance.org/groups/rdacodata-materials-data-infrastructure-interoperability-ig.html
• Working group
– https://rd-alliance.org/groups/working-group-international-materials-resource-registries.html
• Case statement
– https://rd-alliance.org/group/international-materials-resource-registries-wg/case-statement/case-statement-rda-working-group