Preserving scientific data on our physical universe: a new strategy for archiving the nation's...
Preserving Scientific Data on Our Physical Universe
A New Strategy for Archiving the Nation's Scientific Information Resources
Steering Committee for the Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government
Commission on Physical Sciences, Mathematics, and Applications
National Research Council
NATIONAL ACADEMY PRESS, Washington, D.C., 1995
title: Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources
author:
publisher: National Academies Press
isbn10 | asin: 030905186X
print isbn13: 9780309051866
ebook isbn13: 9780585022888
language: English
subject: Communication in science--Government policy--United States, Science--United States--Data processing, Technology--United States--Data processing, Information storage and retrieval systems--Science.
publication date: 1995
lcc: Q224.3.U6N37 1995eb
ddc: 353.00819
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
This report has been reviewed by a group other than the authors according to procedures approved by a Report Review Committee consisting of members of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine.
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce Alberts is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Robert M. White is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.
The National Research Council was established by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce Alberts and Dr. Robert M. White are chairman and vice chairman, respectively, of the National Research Council.
Support for this project was provided by the National Archives and Records Administration (under Contract No. NAMA-S-92-0019), the National Oceanic and Atmospheric Administration (under Contract No. 50-DGNE-3-00105), and the National Aeronautics and Space Administration (under Contract No. S-54040-Z). The views expressed in this report are those of the authors and do not necessarily reflect the views of the sponsoring agencies or subagencies.
Library of Congress Catalog Card Number 94-68991
International Standard Book Number 0-309-05186-X
Additional copies of this report are available from:
National Academy Press
2101 Constitution Ave., NW
Box 285
Washington, DC 20055
800-624-6242
202-334-3313 (in the Washington Metropolitan Area)
B-499
Copyright 1995 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Steering Committee for the Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government
JEFF DOZIER, University of California, Santa Barbara, Chair
SHELTON ALEXANDER, Pennsylvania State University
MARJORIE COURAIN, Consultant (deceased, January 14, 1994)
JOHN A. DUTTON, Pennsylvania State University
WILLIAM EMERY, University of Colorado
BRUCE GRITTON, Monterey Bay Aquarium Research Institute
ROY JENNE, National Center for Atmospheric Research
WILLIAM KURTH, University of Iowa
DAVID LIDE, Consultant, Gaithersburg, Maryland
B.K. RICHARD, TRW
JOAN WARNOW-BLEWETT, American Institute of Physics
National Research Council Staff
Paul F. Uhlir, Associate Executive Director, Commission on Physical Sciences, Mathematics, and Applications
Mark David Handel, Program Officer, Board on Atmospheric Sciences and Climate
Alice Killian, Research Associate, Commission on Geosciences, Environment, and Resources
James E. Mallory, Staff Officer, Computer Science and Telecommunications Board
Scott T. Weidman, Senior Program Officer, Board on Chemical Sciences and Technology
Julie M. Esanu, Research Assistant, Commission on Physical Sciences, Mathematics, and Applications
David J. Baskin, Project Assistant, Commission on Physical Sciences, Mathematics, and Applications
Commission on Physical Sciences, Mathematics, and Applications
RICHARD N. ZARE, Stanford University, Chair
RICHARD S. NICHOLSON, American Association for the Advancement of Science, Vice Chair
STEPHEN L. ADLER, Institute for Advanced Study
SYLVIA T. CEYER, Massachusetts Institute of Technology
SUSAN L. GRAHAM, University of California at Berkeley
ROBERT J. HERMANN, United Technologies Corporation
RHONDA J. HUGHES, Bryn Mawr College
SHIRLEY A. JACKSON, Department of Physics
KENNETH I. KELLERMANN, National Radio Astronomy Observatory
HANS MARK, University of Texas at Austin
THOMAS A. PRINCE, California Institute of Technology
JEROME SACKS, National Institute of Statistical Sciences
L.E. SCRIVEN, University of Minnesota
A. RICHARD SEEBASS III, University of Colorado
LEON T. SILVER, California Institute of Technology
CHARLES P. SLICHTER, University of Illinois at Urbana-Champaign
ALVIN W. TRIVELPIECE, Oak Ridge National Laboratory
SHMUEL WINOGRAD, IBM T.J. Watson Research Center
CHARLES A. ZRAKET, MITRE Corporation (retired)
NORMAN METZGER, Executive Director
PAUL F. UHLIR, Associate Executive Director
Preface
In January 1992 the National Archives and Records Administration (NARA) sponsored a three-day planning meeting at the National Research Council (NRC) to review the issues related to the long-term retention of the federal government's scientific and technical data in the physical sciences. The planning meeting was organized by the NRC's Commission on Physical Sciences, Mathematics, and Applications and provided the basis for this study, which was initiated in the fall of 1992 at the request of NARA. The National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA) subsequently provided additional support.
The study's steering committee, in consultation with the sponsors, developed the following charge to guide the writing of this report:
Describe the status and plans for the government's archiving of observational and experimental data in the physical sciences.
Identify the principal scientific, technical, information management, and institutional issues regarding the permanent archiving of such data.
Assess the commonalities and differences among the case studies provided by the panels organized under this study (see below) in order to determine the extent to which common long-term retention policies and appraisal guidelines can be applied to disciplines that collect observational and experimental data in the physical sciences.
Establish a set of goals, principles, and priorities, as well as generic retention criteria and appraisal guidelines that NARA can incorporate into its mission, program, and budget planning.
Suggest mechanisms and processes for NARA and NOAA to use in implementing a program of data appraisal, retention, and preservation, and later in evaluating the effectiveness of the program.
Provide a summary of findings, conclusions, and recommendations.
The steering committee formed five panels in space sciences, atmospheric sciences, ocean sciences, geosciences, and physics, chemistry, and materials sciences to provide their views on the key data retention issues from different disciplinary perspectives in the physical sciences. These panels each met twice and produced a set of working papers, which are published separately in Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government: Working Papers (National Academy Press, Washington, D.C., 1995). The work of the panels was invaluable to the steering committee in framing the issues, informing its conclusions and recommendations, and in producing its final report.
There are several aspects regarding the scope and focus of this report that should be mentioned. The committee devoted most of its attention to data stored on electronic media, rather than on paper or on other media. Almost all data are now acquired, stored, and distributed electronically. Thus, the preponderance of data archiving problems and their solutions must be considered in this context. Nevertheless, much of the advice offered here is equally relevant to data in other formats.
The principal focus of this report is on the long-term retention of data in the physical sciences. Much of the discussion, however, includes near-term data management issues, because effective archiving begins when the plans for acquiring a data set are made and extends throughout the life cycle of the data. Although the focus is exclusively on data in the physical sciences, the committee believes that the distinctions it has drawn between the experimental and the observational data, as well as the data management principles it has provided, are broadly applicable to most data in the other natural sciences. In addition, the strategic approach adopted by the committee necessarily involves all federal agencies that acquire and manage physical science data, and not simply the three agencies that sponsored this study.
Finally, it is necessary to point out that the committee was unable to achieve consensus on one major recommendation of the study, namely, the proposal to establish the National Scientific Information Resource (NSIR) Federation. Appendix B contains the minority opinion of the dissenting committee member, Roy Jenne. The rest of the committee members, who strongly support the NSIR Federation recommendation, are disappointed by this lack of unanimity and consider many of the assertions in the minority opinion to be based on an erroneous interpretation of what the report actually states or recommends. We leave that to the reader to judge. Nevertheless, we believe that the minority opinion can perhaps serve a useful purpose by drawing greater attention to these issues and by broadening the discussion of them among the sponsors of the study, the other science agencies, and the research community.
In conclusion, the committee hopes that its advice will help bring about the changes necessary to effectively preserve the valuable scientific data on our physical universe.
Jeff Dozier, Steering Committee Chair
Paul F. Uhlir, Study Director
Acknowledgments
The steering committee is very grateful to the many individuals who played a significant role in the completion of this study, including the members of the five ad hoc panels that provided conclusions and recommendations on data archiving from the different physical science disciplines; the individuals who briefed the steering committee and panels; and members of the National Research Council (NRC) staff who worked on various aspects of this study. The steering committee also extends its thanks to Trudy Peterson and Kenneth Thibodeau of the National Archives and Records Administration (NARA), William Turnbull and Helen Wood of the National Oceanic and Atmospheric Administration (NOAA), and Joseph King of the National Aeronautics and Space Administration (NASA), from the study's sponsoring agencies.
Gerd Rosenblatt, of Lawrence Berkeley Laboratory, chaired the Physics, Chemistry, and Materials Sciences Data Panel. The members were R. Stephen Berry, University of Chicago; Edward Galvin, The Aerospace Corporation; J.G. Kaufman, The Aluminum Association; Kirby Kemper, Florida State University; David R. Lide, Jr., consultant; and Edgar Westrum, Jr., University of Michigan. The steering committee gratefully acknowledges the detailed briefings and information provided to this panel by Donald Alderson, Department of Defense Nuclear Information Analysis Center; Frank Biggs, Sandia National Laboratories; Robert Billingsley, Defense Technical Information Center; Mark Conrad, NARA; Suzanne Leech, Bionetics, Inc.; Victoria McLane, Brookhaven National Laboratory; and Patricia Schuette, Battelle Pacific Northwest Laboratory.
The Space Sciences Data Panel was chaired by Christopher Russell of the University of California at Los Angeles. The panel members were Guiseppina Fabbiano, Harvard-Smithsonian Center for Astrophysics; Sarah Kadec, consultant; William Kurth, University of Iowa; Steven Lee, University of Colorado; and R. Stephen Saunders, Jet Propulsion Laboratory. The steering committee extends its thanks for the assistance of the following individuals, who provided briefings and other information to the Space Sciences Data Panel: Joe Allen, National Geophysical Data Center; Steven Blair, Los Alamos National Laboratory; Joseph Bredekamp, NASA; Dean Bundy, Naval Research Laboratory; David de Young, National Optical Astronomy Observatories; Robert Frederick, Air Force Space Forecast Center; Joseph King, National Space Science Data Center; Knox Long, Space Science Telescope Institute; Guenther Riegler, NASA Astrophysics Division; Thomas Smith and Jud Stailey, Air Force Environmental Technical Applications Center; Earl Tech, Los Alamos National Laboratory; Raymond Walker, University of California at Los Angeles; and James Willet, NASA Space Physics Division.
Werner Baum, of Florida State University, was the chair of the Atmospheric Sciences Data Panel. The members were Marjorie Courain, consultant (deceased, January 14, 1994); William Haggard, Climatological Consulting Corporation; Roy Jenne, National Center for Atmospheric Research; Kelly Redmond, Desert Research Institute; and Thomas Vonder Haar, Colorado State University. The steering committee gratefully acknowledges the diverse and substantial inputs provided by the following individuals to the Atmospheric Sciences Data Panel: Larry Baume, NARA; Thomas Boden, Carbon Dioxide Information and Analysis Center; Dean Bundy, Naval Research Laboratory; Donald Collins, NASA; Richard Davis, National Climatic Data Center; P.C. Hariharan, Johns Hopkins University; and Gerald Stokes, Pacific Northwest Laboratories.
The Ocean Sciences Data Panel was chaired by Bruce Gritton, Monterey Bay Aquarium Research Institute. The members were Richard Dugdale, University of Southern California; Thomas Duncan, University of California at Berkeley; Robert Evans, Rosenstiel School of Marine and Atmospheric Science; Terrence Joyce, Woods Hole Oceanographic Institution; and Victor Zlotnicki, Jet Propulsion Laboratory. The steering committee extends its thanks for the briefings and other information provided to the Ocean Sciences Data Panel by Larry Baume, NARA; Donald Collins and Susan Digby, Jet Propulsion Laboratory; Ronald Fauquet, NOAA; Ted Tsui, Naval Research Laboratory; and R.S. Winokur, Office of Naval Research.
The Geosciences Data Panel was chaired by Theodore Albert, a private consultant. The members were Shelton Alexander, Pennsylvania State University; Sara Graves, University of Alabama in Huntsville; David Landgrebe, Purdue University; and Soroosh Sorooshian, University of Arizona. The steering committee gratefully acknowledges the information provided at the meetings of the Geosciences Data Panel by the following individuals: Roger Barry, National Snow and Ice Data Center; Daniel Cavanaugh, U.S. Geological Survey; Donald Collins, Jet Propulsion Laboratory; Katrin Douglass, Southern California Earthquake Center Data Center; William Draegar, U.S. Geological Survey; John Dwyer, NARA; Claire Henson, National Snow and Ice Data Center; Herb Meyers, National Geophysical Data Center; Ron Weaver, National Snow and Ice Data Center; and Thomas Yorke, U.S. Geological Survey.
Finally, the steering committee is grateful to the staff of the National Research Council: Paul F. Uhlir, associate executive director of the Commission on Physical Sciences, Mathematics, and Applications, who served as study director; Mark David Handel and Theresa Fisher (Board on Atmospheric Sciences and Climate), Alice Killian (Commission on Geosciences, Environment, and Resources), James E. Mallory (Computer Science and Telecommunications Board), and Scott T. Weidman and Taña Spencer (Board on Chemical Sciences and Technology), who provided staff support for the five panels; Julie M. Esanu, for the program assistance provided to the steering committee and panels and for the preparation of the final manuscript; David Baskin, for his work on preparing the final manuscript; Liz Panos, for coordinating the report review; and Roseanne Price, who edited the final manuscript.
Contents
SUMMARY
1 INTRODUCTION
Imperatives for Preserving Data on Our Physical Universe
A New Future for Scientific Data
2 THE CHALLENGE: PRESERVATION AND USE OF SCIENTIFIC DATA
Experimental Laboratory Data
Observational Data in the Physical Sciences
Summary of Major Issues
3 RETENTION CRITERIA AND THE APPRAISAL PROCESS
Retention Criteria
Other Elements of the Appraisal Process
Recommendations
4 THE OPPORTUNITIES: THE RELATIONSHIP OF TECHNOLOGICAL ADVANCES TO NEW DATA USE AND RETENTION STRATEGIES
Enabling Technologies and Related Developments
Opportunities for New Organizational Structures
5 A NEW STRATEGY FOR ARCHIVING THE NATION'S SCIENTIFIC AND TECHNICAL DATA
Fundamental Principles for Long-term Data Retention
The Proposed National Scientific Information Resource Federation
Recommendations for the Creation of the NSIR Federation
Recommendations Specifically for NARA
Recommendations Specifically for NOAA
REFERENCES
APPENDIX A List of Acronyms
APPENDIX B Minority Opinion
This study is dedicated in fond memory of Marjorie Courain.
Summary
Scientific data reflect both the organization and the chaos of the natural world. They stimulate us to develop concepts, theories, and models to make sense of the patterns they represent. The resulting abstractions are the formal and systematic ideas that constitute the understanding of relationships between causes and consequences, and perhaps may enable prediction of future sequences of events. Because scientists transform data from the material world into ideas, the observations of objects and processes in the physical world are the stimuli of scientific thought. Data are thus the seeds of scientific ideas.
There are strong motivations for preserving scientific observations:
Many observations about the natural world are a record of events that will never be repeated exactly. Examples include observations of an atmospheric storm, a deep ocean current, a volcanic eruption, and the energy emitted by a supernova. Once lost, such records can never be replaced.
Observed data provide a baseline for determining rates of change and for computing the frequency of occurrence of unusual events. They specify the observed envelope of variability. The longer the record, the greater our confidence in the conclusions we draw from it.
A data record may have more than one life. As scientific ideas advance, new concepts may emerge in the same or entirely different disciplines from study of observations that led earlier to different kinds of insights. New computing technologies for storing and analyzing data enhance the possibilities for finding or verifying new perspectives through reanalysis of existing data records. Thus, the relative importance of data, both current and historical, can change dramatically, often in entirely unanticipated directions.
The substantial investments made to acquire data records justify their preservation. The cost of preservation will almost always be small in comparison with the cost of observation. Because we cannot predict which data will yield the most scientific benefit in years ahead, the data we discard today may be the data that would have been invaluable tomorrow.
The assembled record of observational data thus has dual value: it is simultaneously a history of events in the natural world and a record of human accomplishment. The history of the physical world is an essential part of our accumulating knowledge, and the underlying data form a significant part of that heritage. They also portray a history of our scientific and technological development.
There are numerous socioeconomic reasons, in addition to the compelling scientific and historical motivations, for the long-term retention of observational, as well as certain types of experimental, data. For example, historical climate data have had well-documented uses in a broad range of applications in the manufacturing, energy, agriculture, transportation, communications, engineering, construction, insurance, and entertainment sectors. Such applications are common as well for other types of observational data on the Earth's environment. Experimental data in the physical sciences also have many industrial and other practical uses.
Today we can foresee the possibility of using the national resource of scientific data more advantageously than ever before as technological advances open new vistas for managing scientific information. Advances in data storage technologies make the long-term retention of virtually all data both feasible and affordable. The existence of the Internet and of the emerging National Information Infrastructure (NII) enables nationwide sharing and application of data that reside in appropriately configured databases.
Our new power to store, distribute, and access data and information is changing the way we work and think. However, the communities involved in the creation, retention, and use of scientific data about the physical world are not optimally organized. They commonly work toward disparate goals, are not well connected, and do not take full advantage of technological and conceptual advances in data management and communication. An entirely new approach to the long-term preservation of scientific data is now both feasible and essential. It must take advantage of advancing technology and of distributed communications and management structures to empower both the creators and the users of such data.
This study, performed at the request of the National Archives and Records Administration (NARA), and partially supported by the National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA), identifies the major issues regarding efforts to archive and use data in the physical sciences, establishes retention criteria and appraisal guidelines for those data, reviews important technological advances and related opportunities, and proposes a new strategy to help ensure access to the data by future generations.
The Challenge of Effective Preservation and Use of Scientific Data
The results of scientific research are disseminated in this country through a hybrid system that includes professional society and other not-for-profit publishers, the commercial sector, and the government. The formal journals are published largely by the professional society and commercial sectors, while government agencies manage less formal reports (gray literature). Secondary abstracting and indexing services provide access to this literature, increasingly by electronic means. While there are strains in this system because of rising costs, increasing workload, and issues related to the protection of intellectual property, it has served U.S. science well and has been an invaluable link in the process of translating scientific advances into further advances, useful technology, and economic benefits.
The current system, however, is not well suited to handle the scientific and technical electronic databases that are the focus of this study. The cost of maintaining these databases is typically too great to be covered by user fees; instead these databases must be considered part of the national scientific heritage. Some government agencies have accepted responsibility for maintaining and disseminating the data resulting from their research and development. In some cases, this system is working reasonably well, but in others there are problems even with providing current access. Archiving for the long term raises questions in all cases, however.
A general problem prevalent among all scientific disciplines is the low priority attached to data management and preservation by most agencies. Experience indicates that new research projects tend to get much more attention than the handling of data from old ones, even though the payoff from optimal utilization of existing data may be greater.
With regard to laboratory data, government programs have existed since the 1960s to compile results from the world scientific literature, to check the data carefully, and to prepare databases of critically evaluated data. Despite chronic underfunding, these programs have produced databases of lasting value to the nation, and the government investment in creating and maintaining these databases has been repaid many times over.
In the area of observational databases, the situation is mixed. Federal agencies collect large amounts of observational data, which in many cases are continuously added to the available record of Earth and space processes. The data sets resulting from these activities are sometimes well-documented and maintained in readily accessible form; in many other cases, however, while the data are saved, they are exceedingly difficult or impossible to access or use, and thus are effectively unavailable.
The most important deficiencies are in the documentation, access, and long-term preservation of data in usable form. Insufficient documentation is a generic problem that affects, in varying degrees, all the classes of data addressed in this study. Furthermore, few of the federal data centers can give adequate attention to long-term archiving because they are stretched thin by current demands and inadequate resources. Even the data that are archived may become inaccessible because they are not regularly migrated to new storage media as the hardware and software used to access the data become obsolete or inoperable.
Another major problem inhibiting access to data is the lack of directories that describe what data sets exist, where they are located, and how users can access them. In many cases the existence of the data is unknown outside the original scientific groups, and even if known, there frequently is not enough information for a potential user to assess their relevance and usefulness. The lack of adequate directories adversely affects the exploitation of our national data resources and leads to unnecessary duplication of effort.
A significant fraction of the archived scientific data is held by the federal agencies that collected the data as part of their mission. However, a large amount of valuable scientific data gathered with federal funds is never archived or made accessible to anyone other than the original investigators, many of whom are not government employees. In many instances, the organizations and individuals that receive government contracts or grants for scientific investigations are under no obligation to retain the data collected, or to place them in an accessible archive at the conclusion of the project. Thus, data sets that commonly are gathered at great expense and effort are not broadly available and ultimately may be lost, squandering valuable scientific resources and much of the public investment spent in acquiring them. Clearly, there is a great need for the agencies to get more return on their investment in science by the simple expedient of making the data collected under their auspices accessible to others.
Finally, the holdings of scientific and technical data by NARA in electronic or any other form are very small in comparison with the data holdings of the federal agencies and the organizations supported by them. Moreover, NARA's budget for its Center for Electronic Records, which has the formal responsibility for archiving all types of federal electronic records, was only $2.5 million in FY 1994, a budget lower than that of many of the individual agency data centers reviewed by the committee in this study. Given NARA's current and projected level of effort for archiving electronic scientific data, it is obvious that NARA will be unable to take custody of the vast majority of these scientific data sets. Therefore, a coordinated effort involving NARA, other federal agencies, certain nonfederal entities, and the scientific community is needed to preserve the most valuable data and ensure that they will remain available in usable form indefinitely. The challenge is to develop data management and archiving procedures that can handle the rapid increases in the volumes of scientific data, and at the same time maintain older archived data in an easily accessible, usable form. An important part of this challenge is to persuade policymakers that scientific data and information are indeed a precious national resource that should be preserved and used broadly to advance science and to benefit society.
Retention Criteria and the Appraisal Process
The National Archives and Records Administration appraises records on the basis of their informational and evidential value. It is concerned with records of long-term value, those records that will probably have value long after they cease to have immediate, or primary, uses. The value of scientific and technical data is primarily informational and is based on the scientific content of the records, rather than on the evidence they provide concerning the activities of the agency that collected or created them.
Recommendations
The recommendations below regarding the retention criteria and appraisal process should be applied by those responsible for stewardship to all physical science data. Similar criteria and appraisal guidelines must be developed for data in other disciplines. This is a topic of primary concern not only to NARA, NOAA, and NASA, but to all scientists, data managers, and archivists who work with such records.
As a general rule, all observational data that are nonredundant, useful, and documented well enough for most primary uses should be permanently maintained. Laboratory data sets are candidates for long-term preservation if there is no realistic chance of repeating the experiment, or if the cost and intellectual effort required to collect and validate the data were so great that long-term retention is clearly justified. For both observational and experimental data, the following retention criteria should be used to determine whether a data set should be saved: uniqueness, adequacy of documentation (metadata), availability of hardware to read the data records, cost of replacement, and evaluation by peer review. Complete metadata should define the content, format or representation, structure, and context of a data set.
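To make the retention criteria concrete for readers who maintain data systems, the five criteria and the four metadata elements named above can be written down as a simple appraisal record. The following is only a minimal sketch in Python: the class name, field names, and the yes/no treatment of each criterion are illustrative assumptions, since the committee does not prescribe a scoring method or say how the criteria should be weighed against one another.

```python
from dataclasses import dataclass, field

# The four things complete metadata should define, per the text above.
METADATA_ELEMENTS = ("content", "format", "structure", "context")


@dataclass
class Appraisal:
    """Hypothetical appraisal record for one data set (all names illustrative)."""
    name: str
    is_unique: bool                                # uniqueness
    metadata: dict = field(default_factory=dict)   # adequacy of documentation
    hardware_available: bool = True                # hardware exists to read the records
    replaceable_at_low_cost: bool = False          # cost of replacement
    peer_review_favorable: bool = False            # evaluation by peer review


def missing_metadata(appraisal):
    """Return the metadata elements that are absent or empty."""
    return [e for e in METADATA_ELEMENTS if not appraisal.metadata.get(e)]


def open_concerns(appraisal):
    """List the criteria that argue against retention.

    Assumption: each criterion is a simple flag and all concerns are
    reported side by side; weighing them is left to the appraisers.
    """
    concerns = []
    if not appraisal.is_unique:
        concerns.append("not unique")
    gaps = missing_metadata(appraisal)
    if gaps:
        concerns.append("metadata missing: " + ", ".join(gaps))
    if not appraisal.hardware_available:
        concerns.append("no hardware to read records")
    if appraisal.replaceable_at_low_cost:
        concerns.append("cheap to replace")
    if not appraisal.peer_review_favorable:
        concerns.append("peer review not favorable")
    return concerns


if __name__ == "__main__":
    # Hypothetical example data set, not taken from the report.
    example = Appraisal(
        name="example observational record",
        is_unique=True,
        metadata={"content": "described", "format": "described", "context": "described"},
        peer_review_favorable=True,
    )
    print(open_concerns(example))
```

A record with no open concerns would be a strong candidate for retention under these criteria; a record with concerns is not automatically discarded, because the appraisal itself remains a judgment by those most knowledgeable about the data.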
The appraisal process must apply the established criteria while allowing for the evolution of criteria and priorities and must be able to respond to special events, such as when the survival of data sets is threatened. All stakeholders (scientists, research managers, information management professionals, archivists, and major user groups) should be represented in the broad overarching decisions regarding each class of data. The appraisal of individual data sets, however, should be performed by those most knowledgeable about the particular data, primarily the principal investigators and project managers. In some cases, they may need to involve an archivist or information resources professional to assist with issues of long-term retention.
Classified data must be evaluated according to the same retention criteria as unclassified data in anticipation of their long-term value when eventually declassified. Evaluation of the utility of classified data for unclassified uses needs to be done by stakeholders with the requisite clearances to access such data.
Opportunities Created by Technological Advances for New Data Use and Retention Strategies
Rapid progress in information technology continually alters both the quantity and the quality of scientific information and periodically stimulates fundamental modification of data management and archiving strategies. Recent technological advances have enabled new methods and strategies for data storage and retrieval and have created better ways of connecting users to data resources and to each other. Moreover, the evolving technologies are catalysts for revising organizational structures to manage distributed scientific data archives much more effectively.
Table S.1 provides a summary of new technologies and related developments that enable a new strategy for the management of scientific and technical data. These advances in information technologies and data management support the creation of a highly distributed, federated management structure for our nation's scientific information resources.

TABLE S.1 New Technologies and Related Developments That Enable a New Strategy for the Management of Scientific and Technical Data

| New Technology Trends and Related Developments | Key Features | What Is Enabled? |
|---|---|---|
| High-performance computer networks | Distributed functions; rapid delivery of large data volumes | Location of databases and archives where best managed; collaborative work; distributed organizations; distributed responsibility |
| Low and declining cost of storage | Inexpensive backup; continually declining cost; ease of migration | Deferral of archiving decisions; trust in distributed management due to safe storage backup |
| Advanced data management | Ability to rigorously and formally manage diverse data types | More complex data structures (other than "flat files") handled in archives, with great potential advantages |
| Changing requirements for information technology professionals | Ability of personnel with lower technical skills to succeed in data management roles | Ability to entrust scientific data management in a distributed environment |
| High reliability of technology components | Availability of better components and connections; reduced procurement and operations costs | Reduced cost and effort in data migration; trusted connections for communication and collaboration |
| Development and acceptance of standards | Agreement on terms, interfaces, media, procedures | Reduced effort to communicate and apply results of others; ability to concentrate on mission issues and not on technology support |

A New Strategy for Archiving the Nation's Scientific and Technical Data
In order to respond adequately to the imperatives for preserving data about the physical universe and to take advantage of the technological advances described above, the federal government should create an integrated and adaptive infrastructure and related processes for providing ready access to the national resource of scientific and technical data and related information. Such an effort must support the needs of data originators, users, and custodians across all phases of the data life cycle, from origin to use by future generations. The committee believes that the following principles should guide the effort of the government agencies in the long-term retention of scientific and technical data:
Data are the lifeblood of science and the key to understanding this and other worlds. As such, data acquired in federal or federally funded endeavors, which meet established retention criteria, are a critical national resource and must be protected, preserved, and made accessible to all people for all time.
The value of scientific data lies in their use. Meaningful access to data, therefore, merits as much attention as acquisition and preservation.
Adequate explanatory documentation, or metadata, can eliminate one of today's greatest barriers to use of scientific data.
A successful archive is affordable, durable, extensible, evolvable, and readily accessible.
The only effective and affordable archiving strategy is based on distributed archives managed by those most knowledgeable about the data.
Planning activities at the point of data origin must include long-term data management and archiving.
The Proposed National Scientific Information Resource Federation
The committee believes that the federal government should create a National Scientific Information Resource Federation, an evolutionary and collaborative network of scientific and technical data centers and archives, to take on the challenge of providing effective access to and preservation of important data and related information. Such an initiative would begin to exploit fully our nation's significant investment in the physical (and other) sciences and the data acquired with that investment. Several critical concepts must govern any federated management structure for it to function properly (Handy, 1992):
Subsidiarity: the power is assumed to lie with the subordinate units of an organization. Power can be relinquished, but not taken away. The subordinate units typically are best qualified to make operational decisions that directly affect them and that they will be implementing. The central management is allowed only those powers needed to ensure that the subordinates do not damage the organization. It is clear that the strengths of the current system for managing scientific and technical data and information in the United States are distributed among a number of diverse data centers and archives, both within and outside the government. A successful federation of these existing institutions would recognize that they are the locations of expertise on their respective data holdings. Thus the central organization should be small and should not micromanage the day-to-day operations of the subsidiary organizations.
Pluralism: the members are interdependent. In a federation, the individual subsidiary organizations recognize the advantages of belonging to the federation, because of products or services that can be obtained from other elements in the federation. The existence of many specialized data centers and archives, as well as the possibility of creating new ones in a networked environment, can offer significant economies of scale and improved sharing of ideas and expertise. What is good for the subsidiary element also should be good for the whole. Pluralism, coupled with subsidiarity, guarantees a measure of democracy in the federation.
Standardization: interdependence requires compatible languages, communications, basic rules of conduct, and units of measurement. These elements may be summarized as technical and procedural standardization. Standards that are developed by consensus of the subsidiary elements (e.g., the participating data centers, archives, and researchers) are widely recognized as essential to the successful management of data.
Separation of powers (responsibilities): a system of checks and balances is necessary to ensure that the central authority does not take on unnecessary power. This principle must be incorporated into the federation's organizational structure.
Strong leadership: the central coordinating element or executive office must act as the standard bearer, promoting the federation's established goals and objectives while reminding the subsidiary organizations of the importance of carrying out their responsibilities.
A federated data management system would be consistent with the goal of the National Information Infrastructure to distribute information resources broadly throughout our society. The technology is available to make a fully networked, but highly distributed system of data centers and archives both feasible and desirable. Such a system would be efficient in providing access to scientific data and information to a large number of potential users and would maximize the government's return on the very large investment that initially went into acquiring those data. From an organizational standpoint, a federated management structure would allow the disparate elements to continue to specialize in what they each do best and to fulfill their individual organizational mandates, while providing some efficiencies of scale and political leverage in addressing the most pressing issues. The committee believes this approach is especially timely and important in an era of federal government budget reductions.
Recommendations
The committee thus recommends that the federal government take the following steps for adequately preserving and providing access to data about our physical universe:
Adopt the National Scientific Information Resource (NSIR) Federation concept as an integral part of the National Information Infrastructure (NII). This concept must encompass not only an electronic network, but also individuals, organizations, communities, data resources, procedures, guidelines, and associated activities of data generation, management, custodianship, and use. The NSIR Federation thus should provide the means for defining a coherent approach to managing the life cycle of scientific data. This approach should be developed and implemented through consensus of collaborating organizations with diverse and autonomous missions. The interagency Global Change Data and Information System is an example of a prototype NSIR Federation, focused on data for a specific set of interdisciplinary science problems. The NSIR Federation would build on such efforts, providing for better coordination and interaction among them, and would help organize fledgling efforts to preserve and provide broad access to data in other disciplines.
The administration should take the steps necessary to fully define and create the NSIR Federation. There are at least two potential focal points within the administration for planning such an activity. These are the interagency Information Infrastructure Task Force for the NII and the National Science and Technology Council. A convocation of representatives from the scientific, data and information management, and archiving communities would be a good way to help define and inaugurate this initiative.
Following the formal authorization by the federal government for creating the NSIR Federation, the principal parties, including NARA and NOAA, should conclude agreements for the implementation of a distributed archive system. The system should involve all relevant institutions, including nongovernmental entities that are funded by the federal government or that maintain data that were acquired with federal funds. As a general principle, data collected by an agency should remain with that agency indefinitely. The committee recognizes that this recommendation may require significant operational changes for agencies other than NOAA, and even some changes with respect to NOAA's data activities. Furthermore, the associated agencies in the NSIR Federation must work together, under the lead of a small executive office with the expertise to establish data management guidelines and minimum criteria for adequate metadata that could be applied across the entire Federation. The executive office could be either a high-level interagency coordinating committee or a new office at an appropriate federal agency, such as the National Science Foundation, which has a broad scientific and technical as well as communication mandate. In any case, the executive office should resist the typical tendency toward bureaucratic accretion of power, personnel, and resources, as well as the tendency to consolidate and centralize data holdings. A management council consisting of representatives of the member organizations should be created to help ensure that the executive office function remains fully responsive to all members of the federation.
Data access and preservation services should be implemented on the most cost-effective basis possible for the Federation. For example, one institution should provide a service to one or more other institutions in order to exploit potential economies of scale and focal points of expertise. This measure might increase the cost to the providing institution, but would decrease the overall cost to the federation, the government, and the taxpayer.
The institutions belonging to the NSIR Federation should develop a process for collaborating effectively on specific initiatives. This process should provide a mechanism to define and prioritize data management and preservation initiatives, to establish the required agreements between collaborating organizations, and to secure funding for each initiative. Each participating organization would contribute to the federation according to its particular strengths and in a manner consistent with the founding charter. In addition, an independent advisory board consisting of experts from user groups should be formed in support of each initiative.
The NSIR Federation should develop a national resource of information technology that is consistent with its chartered objectives and that can be effectively distributed to institutions that must manage data. These technologies would include complete products, designs, guidelines, standards, and methodologies. A related long-term technology strategy, or "technology navigation" function, should be developed to help guide these efforts.
The NSIR Federation should institute an independently managed process for awarding NSIR certification to member scientific institutions and their data and information systems on the basis of well-defined criteria and standards. The certification process should be managed by a nongovernmental, not-for-profit organization, which would receive technical guidance from the participating federal agencies. The certification needs to have credibility in the community, so that nonmember institutions will aspire to attain certification and have it tagged to their products. The certification also should be something that commercial value-added providers seek to increase the credibility of their products.
It also is important for the committee to state what the NSIR Federation should not be. It should not become an expensive bureaucratic entity. The executive office must not impose any standards or information technologies from above that have not been validated through a consensus process of the member organizations. Finally, the executive office must not attempt to micromanage the operations of the participants, nor should it have any direct control over their budgets and funding allocations.
Recommendations Specifically for NARA
Although NARA has a legislative mandate to preserve federal records, it cannot today, nor will it likely ever be able to, act as the custodian of most physical science data. The data volume is too great in relation to the very low funding appropriated to NARA, the NARA staff do not have the specialized scientific knowledge, the interagency linkages are not in place, and a huge infrastructure similar to that which already exists at other agencies would need to be duplicated by NARA. In addition, the designation of a federal record is sometimes irrelevant to the archival process for scientific and technical data, and many data of long-term interest do not meet the existing definition of a federal record.* Hence, NARA has a special role as a partner in the archiving process for scientific and technical data sets that is different from its traditional role as the nation's archives.
*"'[Federal] records' includes all books, papers, maps, photographs, machine readable materials, or other documentary materials, regardless of physical form or characteristics, made or received by an agency of the United States Government under Federal law or in connection with the transaction of public business and preserved or appropriate for preservation by that agency or its legitimate successor as evidence of the organization, function, policies, decisions, procedures, operations, or other activities of the Government or because of the informational value of the data in them" (44 U.S.C. 3301).
The committee makes the following specific recommendations to NARA in addition to those made elsewhere in this report:
NARA should strengthen its liaison with each federal agency that produces scientific and technical data to ensure that appropriate attention is devoted to their long-term retention in a distributed storage environment.
NARA should form standing advisory committees with managers of scientific data, historians, and scientific researchers to address the retention and appraisal of scientific and technical data collections and related issues.
NARA should collaborate with other agencies that maintain long-term custody of data to develop an effective access mechanism to these distributed archives. The initial step should focus on locator systems and evolve toward a transparent access system.
Finally, NARA should work with the scientific community and potential sources of scientific data to develop adaptable performance criteria for data formats and media, rather than mandating narrow and inflexible product standards.
Recommendations Specifically for NOAA
As the largest holder of earth sciences data in the United States, NOAA has a vast amount of scientific data stored at a number of facilities across the country. NOAA thus has an especially important role in the preservation of our nation's observational data on the physical environment. The committee makes the following specific recommendations to NOAA:
NOAA should place a higher priority on documenting and establishing directories of its data holdings.
NOAA, with the active cooperation of NARA, should lead efforts to better define technology-independent standards for archiving, storing, and transmitting the data within its purview.
Finally, NOAA, as well as every other federal science agency, should ensure that: all its data are shared and readily available; it fulfills its responsibility for quality control, metadata structures, documentation, and creation of data products; it participates in electronic networks that enable access, sharing, and transfer of data; and it expressly incorporates the long-term view in planning and carrying out its data management responsibilities.
The creation of the committee's proposed NSIR Federation would help provide a collaborative mechanism and more sustained peer pressure to meet these objectives, and thus enhance the value of scientific and technical data and information resources to the nation.
1 Introduction
Standing at the intersection of past and future, we humans are fascinated with the events of yesteryear and intrigued with what tomorrow will bring. Our prehistoric ancestors began the process of recording aspects of the environment that were important to them (Marshack, 1985; Boorstin, 1992). Today we are curious about many more worlds, ranging from those of atomic size to those of cosmic scale. With instruments on Earth and in space, we seek to capture views of reality that will help us understand nature and our relationship to it.
Scientific data reflect both the organization and the chaos of the natural world. They stimulate us to develop concepts, theories, and models to make sense of the patterns they represent. The resulting abstractions are the product of scientific endeavor, the goal being to develop the formal and systematic ideas that constitute the understanding of relationships between causes and consequences and perhaps may enable prediction of future sequences of events. Because scientists transform data from the material world into ideas, the observations of objects and processes in the physical world are the stimuli of scientific thought. Data are thus the seeds of scientific ideas.
Science generally works by proceeding from data to understanding through a process of organizing the data and analyzing their implications. The following definitions, adapted from Setting Priorities for Space Research: Opportunities and Imperatives (NRC, 1992a), indicate how the process works:
Data are numerical quantities or other factual attributes derived from observation, experiment, or calculation.
Information is a collection of data and associated explanations, interpretations, or other textual material concerning a particular object, event, or process.
Knowledge is information organized, synthesized, or summarized to enhance comprehension, awareness, or understanding.
Understanding is the possession of a clear and complete idea of the nature, significance, or explanation of something; it is the power to render experience intelligible by ordering particulars under broad concepts.
This process is cyclical. New data confirm or refute existing theories and stimulate new understanding, which generates new and deeper questions that often need entirely new sets of observations to begin the process of answering them. New understanding also leads to increased technological capability, and that in turn makes new observations possible and again allows us to contemplate more sophisticated questions.
Thus observations and scientific progress are intertwined; data from the physical world ensure that science is founded on reality as we try to answer the unending "how" and "why" questions that are part of being human. The answers become understanding that enables us to develop schemes for predicting, or not being surprised by, future events. And understanding, we hope, ultimately leads to wisdom about our interactions with the world around us.
Imperatives for Preserving Data on Our Physical Universe
The scientific reasons for preserving data derive from the fact that observations, knowledge, and understanding are cumulative. Thus we believe that the more complete the record, the more we can extract from it.
Many observations about the natural world are a record of events that will never be repeated exactly. Examples include observations of an atmospheric storm, a deep ocean current, a volcanic eruption, and the energy emitted by a supernova. Once lost, such records can never be replaced.
Observed data provide a baseline for determining rates of change and for computing the frequency of occurrence of unusual events. The longer the record, the greater our confidence in the conclusions we draw from it. Our traditional observational records have portrayed frozen instants of reality. If preserved, they will continue to provide insights, but if neglected, they will melt away.
A data record is also worth preserving because it may have more than one life. As scientific ideas advance, new concepts emerge in the same or entirely different disciplines from study of observations that led earlier to different kinds of insights. New computing technologies for storing and analyzing data enhance the possibilities for finding or verifying new perspectives through reanalysis of existing data records. Thus, the relative importance of data, both current and historical, can change dramatically, often in entirely unanticipated directions. This means that the reanalysis of data, even in the distant future, may bring new understanding, which will again increase the value of those data over that which we might have assigned to them at the time of their archiving.
Finally, the substantial investments made to acquire data records usually justify their preservation. The cost of preservation will almost always be small in comparison with the cost of observation. Because we cannot predict which data will yield the most scientific benefit in years ahead, the data we discard today may be the data that would have been invaluable tomorrow.
The assembled record of observational data thus has dual value: it is simultaneously a history of events in the natural world and a record of human accomplishment. The history of the physical world is an essential part of our accumulating knowledge, and the underlying data form a significant part of that heritage. They also portray a history of our scientific and technological development.
With appropriate explanatory documentation, often referred to as metadata, the data demonstrate the increasing sophistication of our attempts to understand our natural surroundings and the technological capabilities we apply to the task. Preserved for study by future generations, the data will speak across the years about what we tried to do, where we succeeded, and where we failed. With increasing capabilities for analyzing and conceptualizing patterns in data, those who follow may find in our archived data important clues that we could not or did not see. At the same time, our descendants will be grateful that we preserved a sufficiently long history of their world that they can make important decisions about their own future.
There are numerous socioeconomic reasons, in addition to the compelling scientific and historical motivations, for the long-term retention of observational, as well as certain types of experimental, data. For example, historical climate data have had well-documented uses in a broad range of applications in manufacturing, energy, agriculture, transportation, communications, engineering, construction, insurance, and entertainment (OTA, 1994). Such applications are common for other types of observational data on the Earth's environment. Experimental data in the physical sciences also have many industrial and other practical uses. Additional examples of the long-term uses of the various physical science data are provided in the next chapter.
A New Future for Scientific Data
The collections of scientific data acquired with government and private support are the foundation for our understanding of the physical world and for our capabilities to predict changes in that world. In the years ahead, the volumes of those collections of data will increase dramatically. They will stimulate advances in our scientific understanding and in our applications of that understanding to pursue important national goals. The scientific data in federal, state, and private databases thus constitute a critical national resource, one whose value increases as the data become more readily and broadly available.
Today, we can foresee the possibility of using the national resource of scientific data more advantageously than ever before, as technological advances open new vistas for managing and accessing scientific information. Growing computational power enables new approaches to the analysis, management, and application of data. Advances in data storage technologies make the long-term retention of virtually all data both feasible and affordable. The existence of the Internet and of the emerging National Information Infrastructure (NII) enables unprecedented nationwide sharing and application of data that reside in appropriately configured databases. Automatic search procedures, file transfer capabilities, and the accelerating use of the World Wide Web functions on the Internet illustrate the power of the contemporary technology. It is important to note that these enabling technologies have emerged in a short time span; equally rapid advances can be anticipated in the years ahead, which will further facilitate the search for and access to the nation's data resources.
Our new power to store and distribute data and information is changing the way we work and think. However, the communities involved in the creation, retention, and use of scientific data about the physical world are not optimally organized. They commonly work toward disparate goals, are not well connected, and do not take full advantage of technological and conceptual advances in data management and communication. An entirely new approach to the long-term preservation of scientific data is now both feasible and essential. It must take advantage of advancing technology and of distributed communications and management structures to empower both the creators and the users of such data.
This study identifies the major issues regarding existing efforts to archive and use data in the physical sciences, establishes retention criteria and appraisal guidelines for those data, reviews important technological advances and related opportunities, and proposes a new strategy to ensure access to the data by future generations.
2 The Challenge: Preservation and Use of Scientific Data
We advance our understanding of the physical universe by building on current and past studies in individual disciplines, by collecting and analyzing new types of data, and by using past observations in entirely new ways not envisioned when the data were initially collected. The more complete the record of scientific data and information, the more new understanding and knowledge we can extract from it. Observations of natural phenomena typically represent a record of events that will never be repeated in a dynamic universe that continually changes in time and varies in space. New scientific advances have had significant, sometimes profound, societal and economic impacts and may be expected to be equally important in the future. Scientific data and information are at the heart of these advances and are essential for new discoveries. Therefore, they constitute a precious national resource.
The sections that follow describe briefly the two major types of data that are of critical importance in the physical sciences: experimental laboratory data in physics, chemistry, and materials sciences, and observational data in the earth and space sciences. In each of these broad areas the progress that has been made to date in terms of long-term preservation and accessibility is characterized, and the key issues are identified. More comprehensive descriptions of the status of long-term data retention in the various physical science discipline areas are in the volume of working papers prepared as background for this report (NRC, 1995).
Experimental Laboratory Data
The experimental sciences have progressed over the centuries by building on the concepts, theories, and factual information resulting from each generation of scientific inquiry. The observations of Tycho Brahe were used by Kepler to develop his laws of planetary orbits, and Newton's formulation of mechanics drew upon the previous work of Galileo, Kepler, and others. A century of measurements on properties of the chemical elements provided the raw material needed for Mendeleev to construct his periodic table. The history of science is rich in examples where the introduction of new, often revolutionary, concepts rested on data that had been preserved from previous scientific investigations. Furthermore, the technology of tomorrow is often based on the laboratory data of today or yesterday.
The explosive growth of science in this century provides many other examples of the key role of data from previous experiments. When Townes and Schawlow published their landmark 1958 paper that demonstrated the theoretical possibility of building a laser, intensive efforts were started to find a real physical system that would meet the necessary requirements. Data on atomic spectra, some of them 60 to 70 years old, provided the key to creation of the first working gas laser. If it had been necessary to make new measurements on every conceivable system in order to select the most promising for trial, the invention of the laser and all the new technology and economic benefits that it has brought would have been delayed for many years.
The crash program to improve rocket propulsion systems following the launch of the first Soviet Sputnik provides another example. Data on the thermodynamic properties of a wide range of substances were essential to the efforts to optimize rocket engine performance. A concerted government program was started to build a database of thermodynamic properties for rocket engine design. Although some new laboratory measurements were required, many of the needed data were in the scientific literature, some published as early as 1880. The availability of these older data significantly aided the rocket engine program.
Data generated by scientists and engineers in the fields of physics, chemistry, and materials science have traditionally been published in research journals, which serve both a current dissemination and an archival function. This journal system has served science well for 300 years. Many scientific libraries throughout the country provide access to these journals. Because back volumes are kept in libraries in many different places, there is little danger of irreparable loss from a natural catastrophe. Many scientific societies also have depository systems that allow authors to submit voluminous data sets that cannot be published in the journals because of lack of space. The societies maintain these archives, generally on microfilm, and supply copies on request.
While the growing use of electronic recording and storage techniques is already affecting the traditional journal system, we can expect publishers to take advantage of the new technology to meet new needs. Scientific societies are beginning to implement electronic archives for preserving data that are too voluminous to publish in paper formats. For example, the American Chemical Society recently began to make data from papers in its leading journal (Journal of the American Chemical Society) available on the Internet. It is a natural step from the paper and microfilm archives that such societies now maintain to the electronic archives of the future. Clearly, these private sector archives must be an integral part of the overall concept of a "National Scientific Information Resource."
Electronically recorded data in the laboratory physical sciences are of two forms, original experimental measurements and evaluated compilations of published data. These are examined here in turn.
Original Experimental Measurements
Recent decades have seen significant changes in the form of "original data." A raw experimental result was, in the past, typically a measured value such as a voltage or distance. The investigator read these measurements from instruments, wrote them in a notebook, treated them arithmetically to obtain the desired scientific variable from the raw measurement, and interpreted them. The original measurements were eventually discarded in most cases. Today, many raw data are acquired and processed electronically as soon as they are entered into the computer, so that only the processed data exist long enough for anyone to look at. With rapid, automated data acquisition and manipulation, the option exists to keep electronic data and reanalyze them as required. However, automated data collection often results in large volumes of insignificant data, so that in many experiments the data stream is screened and most of the data are discarded in real time by a computer program or by the experimenter. For example, spectroscopists used to keep, at least temporarily, the photographic plates or recorder charts from which they had taken measurements. Now the spectral features may be analyzed electronically immediately upon measurement, and only the attributes of relevant features are recorded. The fraction of the raw data that is saved after initial processing may be small, sometimes less than one part in 10,000. In virtually all cases, there is no justification for preserving the raw data, because the experiment can be repeated in those rare instances in which an unanticipated future interest appears.
When considering laboratory data of this kind, it is usually best to recognize that no one knows as much about the original data as the original experimenter. If the experimenter does not find the raw data worth preserving (and worth documenting), then the data are probably not going to be of use to anyone else. Because the number of stages of processing (e.g., replication, averaging, coordinate transformations, applying corrections, and so on) differs for every type of measurement and undergoes continual evolution as new techniques are introduced, it would be fruitless to try to formulate generic retention criteria for all types of laboratory data.
However, there are certain classes of laboratory data (where "laboratory" is used in a broad sense) that should be candidates for preservation if properly documented, because it would be impossible or impractical to reproduce the measurements. Some of the data taken in large plasma physics facilities fall in this category, because reproduction of the facilities would be extremely costly. A more striking example is the spectroscopic and other measurements from nuclear tests in the atmosphere, which it is hoped will never be reproduced. On a more mundane level, properties of engineering materials, measured as a part of large government research and development programs, provide many data of possible interest in the future. Such data are acquired as a small step in a larger program and usually are not published in the scientific literature or disseminated by the usual channels. They would be costly to reproduce because many of the materials were specially prepared with unique fabrication technology. Examples include polymer and sensor data from the Strategic Defense Initiative, engineering data from the National Aeronautics and Space Administration (NASA), and the superconducting materials measurements carried out to develop magnet fabrication techniques for the canceled Superconducting Super Collider. Even though this project will not be completed, the materials measurements should be saved, because they may well be applicable to future engineering projects.
Evaluated Compilations
Compilations resulting from the critical analysis of a large body of data from the scientific literature are a separate area for consideration. Well-known examples include thermodynamic property compilations such as the National Institute of Standards and Technology's Joint Army-Navy-Air Force (JANAF) tables and the thermophysical properties disseminated by the Department of Defense's Center for Information and Data Analysis and Synthesis at Purdue University (see the Physics, Chemistry, and Materials Sciences Data Panel report in the NRC (1995) report for a detailed discussion of these examples). The Department of Energy operates several data evaluation centers in nuclear physics and chemistry. In such centers, the data and backup documentation are not impossible to replace; they simply represent so much effort and exercise of specialized scientific judgment that it would be extremely costly to redo the work. The cost of not having the data available, although usually difficult to measure other than anecdotally, can be much higher than the cost of preserving them. In particular, if it becomes necessary in the future to expand or extend the compilation, the full documentation (e.g., data extracted from references, fitting programs, notes on the analysis techniques, and the like) will provide a valuable base for the new work. A major concern in considering these data collections is how the data and the underlying documentation can be preserved and made accessible if the centers producing them lose their funding or expert personnel. This concern increases as government agencies downsize their activities.
Observational Data In The Physical Sciences
Over the past two decades, the National Research Council and other groups have issued numerous reports that have addressed data management issues, including long-term retention requirements, for digital observational data in the earth and space sciences (NRC, 1982, 1984, 1986a,b, 1988a,b, 1990, 1992b, 1993; GAO, 1990a,b; Haas et al., 1985; NAPA, 1991). Most of these reports have focused quite narrowly on the data management or archiving problems of specific disciplines or agencies, and none has addressed comprehensively the issues associated with the long-term retention of observational and experimental data in the physical sciences.
Major Characteristics of Observational Data
Observational data sets, like laboratory data, include digital information (in both written and electronic form), graphical records, and verbal descriptions. The records exist as ink on paper, punched paper, film (including microforms), magnetic tape of many types (including videotape), magnetic disk, and digital optical media (including CD-ROM). Over the past three decades, however, the dominant form of data collection and storage has been electronic.
Observational data can be characterized by the collection and management practices applied throughout the life cycle of their existence. One might characterize two major practices driven by the funding models for conducting the underlying science. The "big science" funding model creates a funding umbrella for multiple individuals and institutions to conduct coordinated data acquisition, investigation, and publication. Often, these large programs adopt a standard approach for life-cycle data management. However, there is usually little standardization among the big science programs. Examples of such programs include the World Ocean Circulation Experiment, the World Climate Research Program, and NASA's Mission to Planet Earth (CENR, 1994). The other funding model, "small science," funds individuals or small groups of individuals to conduct independent data acquisition, analysis, and publication. Typically, these investigators plan, design, and implement their own data management strategy with little interaction with the rest of the scientific community. The data generated under both models have long-term value, both for science and for the broader interests of the nation.
Specific subdisciplines also impose different requirements on long-term data management. For instance, while there is general agreement within the physical oceanography community on the definition of standard observation variables and the processes of measuring those variables, the same cannot be said for biological oceanography. Because of differences in measuring techniques, lack of community agreement on naming standards, and the scientific process by which biology progresses, data management for biological data sets is inherently more complex than in physical oceanography. The data from these two subdisciplines will have to accommodate multiple naming schemes and alternate taxonomies. Therefore, data managers and archivists have to deal with differing approaches and vocabularies among disciplines, evolution of discipline research paradigms over time, and diverging concepts and methods within a discipline.
Scientific research leads to the creation of data that can be processed and interpreted at different levels of complexity. Typically, each level of processing adds value to the original (level-0) data by summarizing the original product, synthesizing a new product, or providing an interpretation of the original data. The processing of data leads to an inherent paradox that may not be readily apparent. The original unprocessed, or minimally processed, data are usually the most difficult to understand or use by anyone other than the expert primary user. With every successive level of processing, the data tend to become more understandable and often better documented for the nonexpert user. One might therefore assume that it is the most highly processed data products that have the greatest value for long-term preservation, because they are more easily understood by a broader spectrum of potential users. In fact, just the opposite is usually the case for observational data, for it is only with the original unprocessed data that it will be possible to recreate all other levels of processed data and data products. To do so, however, requires preservation of the necessary information about processing steps and ancillary data.
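The point that higher-level products can be regenerated from level-0 data, provided the processing steps and ancillary data are also kept, can be illustrated with a minimal sketch. Everything in it is hypothetical: the raw counts, the calibration gain and offset, the function names, and the averaging window are invented for illustration and do not describe any particular mission or archive.

```python
from statistics import mean

# Hypothetical level-0 record: raw instrument counts, exactly as acquired.
LEVEL0_COUNTS = [102, 98, 105, 110, 95, 101]

# Hypothetical ancillary data that must be archived alongside the raw counts.
CALIBRATION = {"gain": 0.5, "offset": -10.0}


def to_level1(counts, calibration):
    """Apply the archived calibration to turn raw counts into physical units."""
    return [calibration["gain"] * c + calibration["offset"] for c in counts]


def to_level2(values, window):
    """Summarize calibrated values as non-overlapping block averages."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


# Recreate the derived products from the preserved level-0 data.
level1 = to_level1(LEVEL0_COUNTS, CALIBRATION)
level2 = to_level2(level1, window=3)

# A later, revised calibration can be applied only because level 0 was kept.
REVISED_CALIBRATION = {"gain": 0.52, "offset": -10.0}
revised_level2 = to_level2(to_level1(LEVEL0_COUNTS, REVISED_CALIBRATION), window=3)
```

If only `level2` had been archived, neither `level1` nor `revised_level2` could be recovered, which is the paradox the paragraph describes.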
Another important characteristic of observational data is their volume. In this respect, observational data can be divided into two different classes: small-volume and large-volume data sets. The majority of traditional ground-based, in situ observations form small-volume data sets because they are based on individually conducted measurements or sample collections. Satellite and other remotely sensed observations generally form large-volume data sets.
The committee defines small-volume data sets as those with volumes that are small in relation to the capacity of low-cost, widely available storage media and related hardware. The hardware and software to write and produce CD-ROMs are now generally available for less than $10,000, and personal computers capable of reading CD-ROMs are being marketed as home-use, consumer items. For example, the total volume of the small-volume oceanographic data is projected to be less than 50 gigabytes by 1995, and thus the entire historical data set for all observations could be stored on fewer than 100 CD-ROMs. This is fewer discs than many people have in their compact disc music collections.
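The arithmetic behind this estimate can be checked with a short sketch. The capacity of 650 megabytes per CD-ROM is an assumption (a common nominal figure for the period), not a number given in this report, and the decimal gigabyte is assumed for simplicity.

```python
import math

ARCHIVE_GIGABYTES = 50          # projected upper bound stated in the text
CDROM_MEGABYTES = 650           # assumed nominal capacity of one CD-ROM
MEGABYTES_PER_GIGABYTE = 1000   # decimal convention, assumed for simplicity


def discs_needed(archive_gb, disc_mb):
    """Return the number of whole discs needed to hold the archive."""
    return math.ceil(archive_gb * MEGABYTES_PER_GIGABYTE / disc_mb)


discs = discs_needed(ARCHIVE_GIGABYTES, CDROM_MEGABYTES)
print(discs)  # 77
```

Under these assumptions the full 50 gigabytes fits on 77 discs, consistent with the statement that fewer than 100 CD-ROMs would suffice.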
Issues such as archiving cost, longevity of media, and maintenance of the data holdings are not the dominant considerations with regard to retaining small-volume data sets. Rather, the major issue with respect to this class of data is the completeness of the descriptive information, or metadata. If a data set has been properly prepared and documented, the operations required to migrate the data should be amenable to significant automation and therefore pose only a minor challenge to the long-term maintenance of the archive. Further, these data may be widely distributed with simple replication of the media. For example, the various NOAA and NASA data centers have provided copies of their data sets to many users for a number of years.
A different problem is posed by large-volume data sets. The biggest data sets typically come from Earth observation satellite sensors and space science missions, and are challenging to some contemporary storage devices. However, it is clear that for the data set to exist at all, an adequate storage medium capable of capturing and maintaining the data for some time period must exist when the data are generated. Further, the time period for reliable, initial storage should at least cover the lifetime of the data set at the organization acquiring and using the data before the records need to be migrated to new media or transferred to another organization, such as NOAA or NARA. In addition, during the initial storage period, there are likely to be major increases in the density of mass storage accompanied by significant decreases in the cost of storage of the data. Thus, data sets that are challenging today will gradually be transformed to "small-volume" status in the future, as advancing technology increases the capacity and lowers the cost of storage devices. Nevertheless, it is important to note that the largest data sets (e.g., larger than one terabyte) can present significant organizational and management problems that require special analysis of the data flow, volume, access, and timing characteristics.
Observational Data in the Space and Earth Sciences
Astronomy and Astrophysics Data
Astronomy and astrophysics are observational sciences; that is, they are based on what the sky provides and we collect. Therefore, in many astronomical investigations there is no such thing as "repeating an experiment" with the expectation of getting the same results. Many objects have properties that change with time either because of their intrinsic nature (e.g., variable stars), evolution (e.g., stars going supernova), or reasons yet unknown. It happens quite frequently that a highly variable object is found in satellite data and subsequent archival research in optical plates allows its identification as a given type of star.
Astronomy and astrophysics data are acquired by both ground-based and space-based observatories. Ground-based observatories, which are operated by universities or other nonprofit organizations (e.g., Association of Universities for Research in Astronomy, the Smithsonian Institution) and funded by these organizations or by the National Science Foundation (NSF), have traditionally been used to study the sky at visible wavelengths. Since the Second World War, astronomers have used improving technologies to observe at radio and infrared wavelengths. Consortia of universities, including both U.S. and foreign institutions, are constructing new telescopes, which use advanced technology to build larger mirrors that will allow us to look deeper into the universe. Radio observatories range from smaller ones operated by universities to larger national facilities, such as the National Radio Astronomy Observatory, funded by NSF. Most telescopes are for individual observing programs, but some are dedicated to systematic sky surveys.
Data from ground observations have traditionally been the property of the observer; therefore, observatories have no standard policies for data archiving. The exceptions are some big projects, such as the Palomar Sky Survey, where data either are made public and sold or are archived within the university or observatory. Some centers, such as the National Radio Astronomy Observatory, the National Optical Astronomy Observatories, and the Harvard-Smithsonian Center for Astrophysics, have begun to archive most data obtained from major telescopes. These data are valued and used broadly by astronomers. Nevertheless, archival activities remain of generally low priority.
Although the older astronomical data consist of photographic plates and other analog data, virtually all data today are collected digitally. There also have been major efforts to digitize old photographic data to allow their analysis by computer. An example of this is the digitization of a whole-sky survey by the Space Telescope Science Institute, and this survey is now available for sale on CD-ROM from the Astronomical Society of the Pacific. Recently, the astronomical community adopted a standard format for transfers of digital files (FITS). With the advent of digital data, there also has been an evolution from individual data analysis packages to a few widely distributed packages (e.g., IRAF, AIPS, VISTA, XANADU), which provide standard tools for baseline analysis.
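The report does not describe the FITS format itself. As a rough illustration of why a shared, self-describing format helps long-term use, the sketch below parses a few header records under the commonly documented FITS layout: each header "card" is 80 characters, the keyword occupies the first 8 characters, and a value is signaled by "= " in the next two. The function name and the sample cards (including the target name) are hypothetical, and the parser deliberately ignores parts of the standard such as escaped quotes and continued strings.

```python
def parse_card(card):
    """Parse one 80-character FITS header card into (keyword, value, comment)."""
    if len(card) != 80:
        raise ValueError("a FITS header card is exactly 80 characters long")
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Cards such as COMMENT, HISTORY, or END carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        # Quoted string value; this sketch ignores embedded '' escapes.
        start = body.index("'") + 1
        end = body.index("'", start)
        comment = body[end + 1:].partition("/")[2].strip()
        return keyword, body[start:end].rstrip(), comment
    raw, _, comment = body.partition("/")
    raw = raw.strip()
    if raw in ("T", "F"):
        value = raw == "T"          # logical value
    else:
        try:
            value = int(raw)        # integer value
        except ValueError:
            value = float(raw)      # floating-point value
    return keyword, value, comment.strip()


# Hypothetical sample cards, padded to the fixed 80-character width.
cards = [
    "SIMPLE  =                    T / conforms to FITS standard".ljust(80),
    "BITPIX  =                   16 / bits per data value".ljust(80),
    "OBJECT  = 'M31     '           / hypothetical target name".ljust(80),
    "END".ljust(80),
]
header = {key: value for key, value, _ in map(parse_card, cards) if value is not None}
```

Because the description of the data travels inside the file in a fixed, documented layout, a reader written decades later needs only the format specification, not the original analysis package.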
Because of the filtering and distortion produced by the Earth's atmosphere, the amount of energy emitted by celestial bodies that can be detected on the ground is limited significantly. Observations from space above the atmosphere remove such limitations. From their inception, space astronomy and astrophysics have been mostly under NASA's purview, although some important experiments have been financed by the Department of Defense. The data are collected through telescopes and detectors placed on airborne devices (balloons or planes), rockets, NASA's Space Shuttle, and orbiting satellites. The largest volume of data is collected by satellites, and most of these missions are international collaborations. The U.S. portion has always been handled by NASA.
Within NASA, space astronomy and astrophysics are organized in different wavelength-based disciplines, reflecting the organization in the scientific community. These disciplines include the infrared, whose main data center is the Infrared Processing and Analysis Center in Pasadena, California, where the data from the Infrared Astronomical Satellite mission are archived; the optical and ultraviolet, with data centers at the Space Telescope Science Institute in Baltimore, Maryland, where the Hubble Space Telescope data are archived, and at the NASA Goddard Space Flight Center in Greenbelt, Maryland, where the International Ultraviolet Explorer archive resides; and high-energy astrophysics, which maintains x-ray data at the Einstein Observatory Data Center in Cambridge, Massachusetts.
Table 2.1 provides a representative sample of NASA Astrophysics Archives. The earlier NASA astrophysics projects were so-called "principal investigator" missions, where a contract was awarded to a group of principal investigators, who built the hardware, received the data from the experiments, and analyzed and interpreted them. These principal investigators had no clearly stated guidelines to prepare data for archiving, other than to deliver the reduced data to the NASA data depository at the National Space Science Data Center (NSSDC) at the NASA Goddard Space Flight Center. Documentation generally was minimal, and the data, which often were not well-documented or well-organized, were difficult to retrieve for scientific use, even if they were adequately physically preserved.
It has become fully apparent, however, that the uniqueness and high acquisition cost of these space data make their effective preservation and archiving a high priority. Even after the active operation of a space observatory has ended, the data typically are retrieved and used by scientists for many more years. As a result, the situation has improved considerably at the NSSDC in recent years. Moreover, NASA now funds wavelength-specific scientific data centers to process the data, eliminate anomalies in the data, and provide software for scientific analysis.
TABLE 2.1 A Representative Sample of NASA Astrophysics Archives, by Satellite Mission

| | High Energy Astrophysical Observatory 2 | International Ultraviolet Explorer | Infrared Astronomical Satellite | Hubble Space Telescope |
| --- | --- | --- | --- | --- |
| Data type | X-ray data | Ultraviolet data | Infrared data | Optical/Ultraviolet data |
| Year of launch | 1978 | 1978 | 1983 | 1990 |
| Duration | 2.5 years | Ongoing | 300 days | Ongoing |
| Total data volume (gigabytes) | ~100 | ~100 | ~150 | ~5500 by year 2005 |
| Data center | Einstein Observatory Data Center, Cambridge, Massachusetts | National Space Science Data Center, Greenbelt, Maryland | Infrared Processing and Analysis Center, Pasadena, California | Space Telescope Science Institute, Baltimore, Maryland |
Planetary Science Data
Planetary data also are acquired by both ground-based and space-based observations. Planetary data include observations of the entire physical system and forces affecting a planet or other body, including the geology and geophysics, atmosphere, rings, and fields. The sensors used collect data across much of the electromagnetic spectrum. Currently, most planetary observations are supported by NASA, either as the direct result of planetary missions or as ground-based observations that support a mission. Over the past three decades, NASA has sent robotic spacecraft to every planet in the solar system except Pluto, to two asteroids, and to a comet. Men have walked on the Moon, performed experiments there, and returned samples. The knowledge we have about the bodies in the solar system, with the exception of our own planet, comes mostly from space missions. In some cases, such as the gas giants Jupiter, Saturn, Uranus, and Neptune, robotic space probes have provided most of our current knowledge. Many of the satellites of the other planets were no more than points of light with minimal spectral and light-curve measurements before the Voyager mission. Now each is recognized as a separate world with highly individual characteristics.
The scientific and historical importance of space-based planetary observations, the realization that additional missions cannot replicate the original observations, and the expense of planetary missions all prompted NASA to create the Planetary Data System (PDS) to improve the acquisition, archiving, and distribution of planetary data. The developers and current staff of the PDS recognize that the data from planetary missions make up the scientific capital of the agency's planetary exploration program and that these data are a national resource. The PDS tries to acquire all existing planetary data from NASA's missions and even from international ventures, in order to have a complete archive of our exploration of the solar system. In addition to the space-based measurements, the PDS accepts relevant ground-based observations and laboratory measurements that support planetary missions by providing baseline or calibration data. A basic condition for acceptance is that the data set must be properly documented and include all relevant ancillary data, including planet and spacecraft ephemerides, calibration tables, and experimenter notes about the shortcomings of the data. Members of the PDS scientific staff and scientists in the community who have expertise within the relevant disciplines peer-review each data set.
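The acceptance condition described above amounts to a completeness check on what accompanies a data set. A minimal sketch of such a check follows; it is not the PDS's actual procedure, and the item names, file names, and function are hypothetical, chosen only to mirror the ancillary materials listed in the text.

```python
# Hypothetical checklist mirroring the ancillary items named in the text.
REQUIRED_ITEMS = (
    "documentation",
    "ephemerides",
    "calibration_tables",
    "experimenter_notes",
)


def missing_items(submission):
    """Return the required items that are absent or empty in a submission."""
    return [item for item in REQUIRED_ITEMS if not submission.get(item)]


# Hypothetical submission in which the calibration tables were left out.
submission = {
    "documentation": "instrument_description.txt",
    "ephemerides": "spacecraft_and_planet_ephemerides.tab",
    "calibration_tables": "",
    "experimenter_notes": "notes_on_known_gaps.txt",
}
gaps = missing_items(submission)
print(gaps)  # ['calibration_tables']
```

An automated check of this kind can only confirm that the items are present; judging whether the documentation is adequate remains the job of the peer reviewers the text describes.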
One of the more important contributions of the PDS, especially with regard to the ongoing preservation of data in a useful form, is the electronic "publication" of the majority of the data from many planetary missions in the form of CD-ROMs. These include not only the data, but also documentation, format specifications, ancillary data, and even, in some cases, display and analysis tools.
Space Physics Data
Space physics involves the study of the largest structures in the solar system: the plasma environments of the planets and other bodies and the solar wind. Those environments consist of plasmas ranging from low energies (the thermal component) to charged particles of high energies, including cosmic rays accelerated by galactic processes. They also consist of the magnetic fields (if they exist) of planets or the Sun, as well as electrostatic and electromagnetic fields generated from natural instabilities in plasmas and charged-particle populations. Furthermore, in many locales, such as comets and the Earth's ionosphere, dust and neutral gases play an important role in mediating the behavior of plasmas and electromagnetic fields. As a consequence, the field of space physics requires a broad array of sensors and instruments at all levels of complexity.
Many instruments make in situ observations, but novel techniques enable remote sensing of various plasma regimes. Because some of the most apparent manifestations of space physics processes result in the northern lights and in planetary-scale modifications of the terrestrial magnetic field (and subsequent catastrophic effects on power grids and communications), space physics relies heavily on a wide array of ground-based observations, including magnetometers, ionospheric sounders, incoherent radar facilities, all-sky cameras, and photometers. In addition, a broad range of ground-based and space-based solar monitors has become crucial to study the correlations between various disruptions in the terrestrial plasma environment and solar activity, including sunspots, flares, and prominences.
For many reasons, it is essential to preserve space physics data for long periods of time. The Sun drives solar-terrestrial relationships, and many studies require observations over 22-year solar cycles. During this cycle the Sun reverses its magnetic polarity twice and goes through periods of increased activity with sunspots and associated flares. At solar activity minimum, flare and sunspot activity decreases, but expanded coronal holes appear. Long intervals of records are required because each solar cycle is different from previous ones and because there are long-term deviations, such as the Maunder minimum, from "normal" patterns. From the terrestrial point of view, there are motions of the magnetic dipole and even magnetic field reversals on time scales of thousands of years.
Because many space physics observations are taken in situ, models of the magnetosphere need data collected by many spacecraft, having different kinds of orbits and trajectories. To make sense out of data from one of these missions, it is important to be able to examine what another spacecraft in a different orbit found. Only by preserving the data from numerous missions do we acquire a sufficient archive.
Space physics has generated about 50 gigabytes of data per year over the last 30 years. The field has enjoyed this extraordinary productivity primarily because most missions were in Earth orbit and were tracked continuously for years. Many of these data sets were "archived" by sending the tapes and sometimes the relevant documentation to the NSSDC. Copies of the data on microfilm or on other media were sent there as well. Unfortunately, for every well-prepared, thoroughly documented space physics data set at the NSSDC, there are several poorly prepared and improperly documented data sets. For the earliest space missions, the archiving techniques were undeveloped, and archiving was not deemed a high priority. Thus, there are many data at the NSSDC that most scientists would find difficult to use with only the information originally supplied. Given the recent emphasis on the proper preservation of data and the importance of archiving, prompted in part by two General Accounting Office reports (1990a,b) and also by a heightened awareness and desire for high-quality archives by the community, many recently archived data sets are in better condition than their predecessors. Even though the Space Physics Data System has been in existence only since 1993, the more advanced data activities in other disciplines have influenced the space physics community favorably. Hence, it is becoming more likely that the data now being submitted are of a higher quality, have more adequate documentation, and are more complete than earlier data sets.
NOAA, NSF, the Department of Defense, private and educational institutions, and foreign organizations typically support the ground-based observations. Most of these data, not managed by NASA, eventually come under the purview of the National Geophysical Data Center, operated by NOAA at Boulder, Colorado. The center's holdings consist of over 300 digital and analog databases, some of which are very large. However, many important data sets still reside solely in the hands of the original investigators, the military, or foreign sources.
Atmospheric Science Data
Atmospheric science data sets are diverse and present a variety of problems for distribution, archiving, and later interpretation. Some data sets on the atmosphere stand out as the largest in any scientific discipline, particularly those from remote sensing by satellite or radar; others consist of contributions from thousands of individuals all over the world, and the provenance of those data is sometimes uncertain. Many data sets span decades, and a few span more than a century, with accompanying problems due to lack of homogeneity in measurement techniques and sampling strategies. The largest atmospheric science data holdings in the United States are those of the federal government. However, significant amounts of material are available only from state or private sources.
Not all atmospheric data sets are large and conspicuous; many are small. There are hundreds of data sets of only a few megabytes or less. There are also many medium-sized data sets that range from perhaps 100 megabytes to tens of gigabytes, as well as very large data sets, many terabytes in volume. Table 2.2 provides a sampling of some of the larger data sets. Data volume does not drive the cost of archiving small-sized and medium-sized data sets if proper technical choices are made. Rather, it is the labor-intensive process of readying a data set for indefinite preservation that can be costly.
Many atmospheric data sets are dynamic, continually growing or being otherwise modified. Because weather keeps occurring, observational time series from operational meteorological activities are never "complete." In contrast, field programs usually have finite extent, and the resulting data sets have a definite end. However, many recent large, complex field programs have spawned associated monitoring activities that have continued after the initial phases of the project. Despite the frequent usage of the term "experiment" to denote field programs, these intensive efforts are observational, rather than experimental, exercises. Some truly experimental data exist, including a few data sets that include the results from such work as sensor development and tests, fluid dynamics experiments, thermodynamic measurements, and laboratory chemical studies. Nevertheless, the vast majority of atmospheric science data describe observations of ever-changing phenomena, and thus they are unique, valuable, and irreplaceable.
For much meteorological and climate research, as well as for many applications, it is essential to have archives of global data. This goal has been largely achieved in the United States, although older data sets still need to be digitized. Collectively, U.S. archives have the best sets of global data of any nation, particularly for data since the early 1950s. However, many valuable data stored in other nations are inaccessible to U.S. scientists (and in some cases are inaccessible to those nations' scientists as well).
Meteorological and other atmospheric data are used for varying purposes on different time scales. It is convenient to delineate three: (1) real-time or current, (2) recent past or short-term retrospective, and (3) distant past or retrospective. Meteorological data are probably used by a wider segment of the U.S. population than other scientific data, because they relate directly to practical, daily concerns. There is a large lay audience for weather and climate information.
The real-time or current use of most data sets usually motivates decisions on collection strategies and therefore quality. For example, the primary reason for collecting most meteorological data is for operational weather forecasting and warning, including forecasting for aviation operations. These data are perishable, and timeliness and spatial resolution are more important than absolute accuracy and continuity.
There are many recent past or short-term retrospective uses of meteorological data that can be of great significance. In this context, short term typically means from yesterday to a few weeks, or occasionally a few months, ago. A good example of such usage of data is in monitoring the development of a drought, a significant function for predicting crop yields. The transportation industry uses past data for verification of weather conditions for delay claims.
Most retrospective uses require data ranging in age from several months to the traditional (though now suspect) 30-year averaging periods used for climate normals. The National Climatic Data Center handles over 100,000 data requests per year. The state climatologists and regional climate centers also process about this many. Legal proceedings and insurance claims often require accurate meteorological records for corroboration of witness testimony, criminal investigations, and validations of weather claims related to accidents and property damage. Farmers and agronomists need data covering months to years for studies of pesticide residue and toxicology, decisions about pesticide spraying, planning of fertilizer usage, and crop selection. Architects and building engineers require site-specific data on heating and cooling needs, wind stresses, snow loads, and solar availability. Airport designers need prevailing wind patterns. Utility planners need aggregate heating and cooling loads for their areas.
Long-term retrospective uses of atmospheric data are the primary concern in this study. These uses are highly diverse and difficult to predict, and they make great demands on the data and their associated metadata.
TABLE 2.2 Volume of Selected Data Sets in Atmospheric Sciences

| Type of Data Set | Comments | Dates | Years | Volume |
|---|---|---|---|---|
| Atmospheric In Situ Observations | | | | |
| World upper air | Two times per day, 1,000 stations | 1962-1993 | 32 | 25 GB |
| World land surface | Every 3 hours, 7,500 stations | 1967-1993 | 27 | 60 GB |
| World ocean surface | Every 3 hours (40,000 observations per day) | 1854-1993 | 139 | 15 GB |
| World observations during First GARP Global Experiment | Surface and aloft, but not satellite | 1978-1979 | 1 | 10 GB |
| U.S. surface | Daily, now 9,000 stations | 1900-1993 | 94 | 15 GB |
| Selected Analyses (mostly global) | | | | |
| Main National Meteorological Center analyses | Two times per day, increasing at 4 GB/year | 1945-1993 | 48 | 50 GB |
| National Meteorological Center advanced analyses | Four times per day, increasing at 19 GB/year | 1990-1993 | 4 | 58 GB |
| National Center for Atmospheric Research's ocean observations and analyses | Thirty-eight data sets | | | 8 GB |
| European Center for Medium Range Weather Forecasting advanced analyses | Four times per day, increasing at 8 GB/year | 1985-1993 | 9 | 76 GB |
| Selected Satellites | | | | |
| NOAA geostationary satellites | Half-hour, visible and infrared | 1978-1993 | 16 | 130 TB |
| NOAA polar orbiting satellites | | 1978-1993 | 15 | |
| Sounders (TIROS Operational Vertical Sounder) | | | 15 | 720 GB |
| Advanced Very High Resolution Radiometer (4-km coverage, 5 channel) | | | 15 | 5 TB |
| NASA Earth Observing Satellite-AM | In development, 88 TB/year, level-1 data | 1998- | | |
| U.S. Radar Data | | | | |
| Domains of 30 to 60 km | | 1973-1991 | 19 | 1 GB |
| Next Generation Radar System (NEXRAD) [a] | 650 GB per radar each year, 104 TB/year for 160-site system | 1997- | | 100s TB |

Notes: Many other atmospheric data sets have volumes of only 1 to 500 MB.
1 MB (megabyte) = 10^6 bytes; 1 GB (gigabyte) = 10^9 bytes; 1 TB (terabyte) = 10^12 bytes.
[a] First radars were deployed in 1993.
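The NEXRAD entry in Table 2.2 rests on simple arithmetic that can be reproduced directly. The following is a minimal sketch, assuming the decimal unit definitions given in the table notes and treating every radar site as producing the same yearly volume; the function and variable names are illustrative and do not come from the report.

```python
# Decimal units as defined in the notes to Table 2.2.
MB = 10**6
GB = 10**9
TB = 10**12


def annual_volume_bytes(per_site_bytes_per_year: int, n_sites: int) -> int:
    """Yearly volume for a network of sites that each produce the same amount.

    Assumes uniform output per site, which is a simplification.
    """
    return per_site_bytes_per_year * n_sites


# Figures quoted in Table 2.2: 650 GB per radar each year, 160-site system.
nexrad_per_year = annual_volume_bytes(650 * GB, 160)
print(nexrad_per_year / TB)  # 104.0, matching the 104 TB/year in the table
```

Using integer byte counts and converting only for display avoids rounding drift when volumes are summed across many data sets.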
Most of the uses discussed above do not need data covering more than a few decades. Several of these applications, however, require the longest time series we can provide.
When technology advances and alters the method of data collection, there is a strong impetus to scrap the data collected by "obsolete" technology. However, these old data may become critical in the future. A notable example involves upper air wind profiles. These were originally collected by kites and later by radiosondes carried on balloons. With the onset of the space program, there was an urgent need for detailed low-altitude wind data for analysis of stresses on rockets at launch. Appropriate data could not be obtained from radiosondes, because of their high ascent rate, but older kite-based data, which had been scheduled for disposal, were available. Fortunately, they had not yet been destroyed when they were again needed.
There have been dramatic retrospective uses for military purposes (e.g., Jacobs, 1947). Planning for the D-day invasion of France, bombing runs over Japan, and the recent desert war in Iraq all required detailed climatic information, some long thought useless but not yet discarded. Such unexpected uses require the retention of many types of data from many places for a long time. Since the first flights of meteorological satellites in 1959, we already have had several examples of important retrospective uses of satellite data sets. For instance, a combination of reprocessed Nimbus-7 satellite data and old data from the Dobson network helped to confirm the recurring seasonal loss of stratospheric ozone over the Antarctic in the early 1980s.
If meteorologists are to study past weather events, such as severe hurricanes, damaging winter storms, or outbreaks of tornadoes, they must have at their disposal all data for the periods of time and geographical areas involved. Hurricane track records spanning more than a century are still regularly used for both research and operational purposes.
An increasingly significant use of meteorological data is the monitoring of the climate of the planet. Although barely two decades ago the study of climate was not a very high priority, today climate research issues are prominent; some of the nation's leading scientists specialize in climate studies, and policymakers seek information on likely climatic conditions of the future. The importance of old atmospheric data has become clear, but the reanalysis of these old data in the search for trends has often found them inadequate and poorly documented. The growing interest in global climate change and the difficulties with historical data that it helped uncover have strongly motivated earth scientists to take a serious interest in the long-term preservation of atmospheric data. Similarly, studies of long-term water and land usage require time series of many decades, or more. Such data needs also apply to planning aquifer usage and studies on deforestation and desertification.
Some historians examine connections between environmental conditions and human events. The time scales studied can range from the immediate, such as the influence of weather on battles, to the very long term, such as the rise or decline of a civilization affected by water availability. Workers in this field often search through the oldest existing data and have even provided meteorological information to atmospheric scientists from unconventional sources such as diaries and agricultural records.
Contemporary arrangements for the storage and archiving of atmospheric data are diverse and complex, and they present many problems. Some of these arrangements could be improved. Atmospheric data are in many locations, and they have a broad range of life cycles. Difficult problems arise in preparing metadata, packaging data for extended archiving, motivating researchers to prepare their data for use by others, and simply dealing with the large size of some of the atmospheric data sets. Criteria for identifying data sets to save indefinitely are not necessarily obvious. Finally, any proposed solutions must be made in full recognition of their impact on budgets and other resources.
Geoscience Data
Spatially, the domain covered by the geosciences extends from the Earth's core to the surface and into space. Temporally, it covers broad trends from the remote origins of the Earth to possible future scenarios, but it also is concerned with rapidly varying, often short-lived phenomena. Data in the geosciences fall into two broad categories. One is the observation and description of unique events, such as earthquakes, volcanic eruptions, and floods. In most cases, such data need to be archived for a long time period, regardless of their quality. The other category consists of observations of quantities continuous in space and time, such as gravity and the Earth's magnetism and structure, seismic sampling, and groundwater distribution.
The volume of geoscience data obtained with public funding has increased dramatically over the past few decades. This increase is the result of several converging factors, including the extremely varied types of observational data collected by the scientific community; the large volumes available through better measurement techniques, more sophisticated instrumentation, and advancing computer technology; and increasing demand from not only the scientific community but also the general public, including engineers, lawyers, and statisticians. Nongovernmental and commercial institutions also are major collectors and sources of pertinent data.
Two examples, the Landsat database and the nation's holdings of seismic data, illustrate many of the characteristics and issues inherent in the long-term archiving of geoscience data. Other examples are provided in the working paper of the Geoscience Data Panel (NRC, 1995).
The Landsat database consists of multispectral images of the Earth's surface, which have been accumulating since the launch of Landsat 1 in July 1972. The archive includes digital tapes of multispectral image data in several formats, black-and-white film, and false-color composites of synoptic views of the Earth's surface, all from 700 km in space. This database thus constitutes an important record of the evolving characteristics of the Earth's land surface, including that of the United States, its territories, and possessions. The record documents not only the results of various federal government policies and programs, but also those of many state and local governments and private programs and activities. It further provides documentation of the impact of various large-scale episodic events, such as floods, storms, and volcanic eruptions, and is of great value to both current and future public and private activities.
Landsat data are currently available in either image or digital form from the Earth Resources Observing System (EROS) Data Center in Sioux Falls, South Dakota. The Landsat satellites were originally under the control of NASA. However, in 1980 they became the responsibility of NOAA. The currently operational Landsat 4 and 5 spacecraft were placed under control of the EOSAT Company in 1985. Under EOSAT's control, the data are not in the public domain, are significantly more expensive, and carry proprietary restrictions on their use. Beginning with the launch of Landsat 7, responsibility for the Landsat system will pass back to NASA, which will build and launch the satellite in the late 1990s. NASA will operate the systems and deliver the data to the EROS Data Center for distribution. The data will once again be in the public domain, although the EROS Data Center still plans to charge more than the marginal cost of reproduction in fulfilling user requests. It is now widely recognized that the shift to private control of the Landsat system significantly reduced the access to and use of the data.
As of January 1993 the Landsat database contained more than 100,000 tapes of varying density and formats, and over 2,850,000 frames of hard-copy imagery. Digital Landsat data are usually delivered to users as magnetic tapes. Other media, such as CD-ROMs and streaming tapes, also may soon be used. Data requests occur most frequently in reference to a particular geographic location, commonly expressed as latitude and longitude, for a particular time of the year, and meeting certain cloud cover limitations.
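The typical request described above (a location, a time of year, and a cloud cover limit) maps naturally onto a catalog filter. The sketch below is purely illustrative: the record layout, field names, and sample entries are hypothetical and do not describe the actual EROS Data Center catalog. It also assumes simple bounding boxes that do not cross the 180-degree meridian.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SceneRecord:
    """One catalog entry; every field name here is hypothetical."""
    scene_id: str
    south: float            # bounding box, decimal degrees
    north: float
    west: float
    east: float
    acquired: date
    cloud_cover_pct: float


def find_scenes(catalog: Iterable[SceneRecord], lat: float, lon: float,
                months: Set[int], max_cloud_pct: float) -> List[SceneRecord]:
    """Return scenes covering (lat, lon), acquired in one of the given
    calendar months (any year), with cloud cover at or below the limit."""
    return [
        s for s in catalog
        if s.south <= lat <= s.north
        and s.west <= lon <= s.east
        and s.acquired.month in months
        and s.cloud_cover_pct <= max_cloud_pct
    ]


# Made-up entries for illustration only; these are not real Landsat scenes.
CATALOG = [
    SceneRecord("demo-001", 42.5, 44.5, -98.0, -95.5, date(1975, 7, 14), 10.0),
    SceneRecord("demo-002", 42.5, 44.5, -98.0, -95.5, date(1984, 7, 2), 65.0),
    SceneRecord("demo-003", 42.5, 44.5, -98.0, -95.5, date(1991, 1, 20), 5.0),
    SceneRecord("demo-004", 30.0, 32.0, -91.0, -89.0, date(1991, 7, 9), 0.0),
]

summer = find_scenes(CATALOG, lat=43.5, lon=-96.7,
                     months={6, 7, 8}, max_cloud_pct=20.0)
print([s.scene_id for s in summer])  # ['demo-001']
```

Filtering on calendar month rather than on a date range reflects the "time of the year" wording: a user comparing summer conditions across two decades wants every July, not one particular July.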
Landsat data are used widely across the spectrum of geoscience applications in both civilian and military operations and research. These include such applications as studies of the impact of human activities on the environment, land-use planning and resource-allocation decisions, disaster assessment, measurement and assessment of renewable and nonrenewable resources, and many others. They are used also by the general public in any context where views of the Earth's surface are needed. Examples include such diverse applications as visual aids in elementary and secondary education, background for highway maps, and illustrations for magazine articles about various regions of the world.
The Landsat database is unique because data from any given area may be available at sampled instants over a period of more than 20 years, thus making possible for the first time the study of slowly varying phenomena on Earth. Even though data from the early 1970s may now have a low frequency of use, their potential value remains high and they represent a significant archival record.
In contrast to the Landsat database, seismic data are broadly distributed rather than concentrated in one data center or system. This example focuses primarily on seismic data from earthquakes and explosions, both nuclear and chemical. Some federal agencies, notably the U.S. Geological Survey (USGS) and NOAA's National Geophysical Data Center, collect and archive important seismic exploration data. In addition, the Department of Defense (DOD), Department of Energy (DOE), U.S. Nuclear Regulatory Commission (USNRC), USGS, and NOAA have been and continue to be engaged in the collection and archiving of earthquake and explosion data. These agency programs are carried out independently of one another with the result that each agency has its own data management and archiving policies and practices. Consequently, these data holdings are greatly distributed among the agencies in fundamentally different forms and formats.
Global earthquake data have been acquired systematically since the early 1960s, when the U.S. Coast and Geodetic Survey of the Department of Commerce deployed a global seismic network of about 130 stations called the World-Wide Standardized Seismographic Network (WWSSN) and produced an archive of photographic film "chips" of the 24-hour/day recordings at all stations. Researchers and other users could obtain copies of these analog data at modest cost. The success of this precursor to today's global digital network cannot be overestimated, because the availability of a global data set in standard format from well-calibrated instruments permitted previously impossible studies of global seismicity patterns, earthquake source mechanisms, and the Earth's structure. These studies have led to a vastly improved understanding of the dynamics of the Earth as a whole, including tectonic plate movements, generation of new ocean floor, evolution of the Earth's crust, and occurrences of destructive earthquakes and volcanic eruptions.
The USNRC has funded the operation of regional seismic networks over much of the United States, some since the early 1970s, in support of programs for the siting and safety of nuclear power plants. USGS also has co-funded or separately funded regional networks for earthquake hazard assessments in seismogenic areas of the United States. However, changes in the funding priorities of USGS and USNRC in recent years have resulted in the interruption or discontinuation of some of these networks, particularly in the eastern United States. This has adversely affected data flow and seismic research. Seismic data have been archived in a broadly distributed, nonuniform mode by the organizations (mostly universities) that collected the data from the various networks. Many of these data have long-term value for characterizing in detail the tectonic activity of seismogenic areas in the United States.
In addition to the federal agencies, several private sector organizations now collect, distribute, and archive seismic data sets of long-term significance. The Incorporated Research Institutions for Seismology (IRIS), a not-for-profit consortium of universities and private research organizations, is engaged in a major development of a global digital seismic network of about 100 continuously recording stations (the Global Seismic Network) in cooperation with USGS. The project also includes a versatile, portable digital seismic array of up to 1,000 stations that can be deployed for various time intervals for special seismological studies. Data sets from the global and portable arrays are being permanently archived at the IRIS Data Management Center (DMC) in Seattle, Washington. The DMC also serves as the International Federation of Digital Networks' center for continuous digital data, which adds observations from many additional stations to the archive. IRIS funding for this activity comes primarily from NSF and DOD. Finally, individual universities, such as the California Institute of Technology, the University of California at Berkeley, the University of Alaska, the University of Washington, Columbia University, Memphis State University, and St. Louis University, also maintain archives of the seismic data that they collect.
The volume of digital data currently held and anticipated to be acquired by the IRIS DMC is summarized in Table 2.3. Although some data sets have been completed because they are project- or program-specific, most of the current operations continue to add large amounts of new data and implement new technology for recording, storage, retrieval, and distribution, thereby creating a dynamic, highly distributed archive whose holdings and access protocols change with time. For example, the IRIS DMC recently began providing both archived and near-real-time data on the Internet, thereby greatly facilitating rapid access.

TABLE 2.3 Summary of Actual and Projected Data Volumes Archived in the IRIS Data Management Center
Year columns give projected data volumes (gigabytes/year).

| | Number of Instruments | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 |
|---|---|---|---|---|---|---|---|
| GSN | 100 | 1,159 | 2,359 | 3,959 | 6,003 | 8,047 | 10,091 |
| FDSN | 146 | 370 | 670 | 1,070 | 1,530 | 2,050 | 2,670 |
| JSP arrays | 5 | 1,095 | 2,190 | 3,650 | 5,475 | 7,300 | 9,125 |
| OSN | 30 | 0 | 0 | 15 | 58 | 218 | 498 |
| PASSCAL-BB | 500 | 1,318 | 2,277 | 3,556 | 5,154 | 7,073 | 9,312 |
| PASSCAL-RR | 500 | 542 | 885 | 1,341 | 1,912 | 2,597 | 3,397 |
| Regional-Trig | 500 | 150 | 290 | 490 | 730 | 1,030 | 1,390 |
| Total | 1,781 | 4,634 | 8,671 | 14,081 | 20,862 | 28,315 | 36,483 |

Note: Abbreviations are as follows:
GSN: Global Seismic Network (IRIS)
FDSN: Federation of Digital Seismic Networks
JSP: Joint Seismic Program (with the former Soviet Union) (IRIS)
OSN: Ocean Seismic Network
PASSCAL-BB: Program for Array Studies of the Continental Lithosphere Broadband (IRIS)
PASSCAL-RR: Program for Array Studies of the Continental Lithosphere Regional Recordings (IRIS)
Regional-Trig: Regional Triggered Recordings
[a] Projected numbers by year 2000.
Source: IRIS Data Management Center, private communication, 1994.
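The figures in Table 2.3 are internally consistent, which can be confirmed by summing each year's column. The sketch below transcribes the table's values and compares the computed sums with the published totals. Reading the year-to-year difference as the volume added in that year assumes the figures are cumulative holdings, which the table does not state outright; the variable names are illustrative.

```python
# Values transcribed from Table 2.3, in gigabytes, for 1994 through 1999.
YEARS = (1994, 1995, 1996, 1997, 1998, 1999)
VOLUMES_GB = {
    "GSN":           (1159, 2359, 3959, 6003, 8047, 10091),
    "FDSN":          (370, 670, 1070, 1530, 2050, 2670),
    "JSP arrays":    (1095, 2190, 3650, 5475, 7300, 9125),
    "OSN":           (0, 0, 15, 58, 218, 498),
    "PASSCAL-BB":    (1318, 2277, 3556, 5154, 7073, 9312),
    "PASSCAL-RR":    (542, 885, 1341, 1912, 2597, 3397),
    "Regional-Trig": (150, 290, 490, 730, 1030, 1390),
}
PUBLISHED_TOTALS_GB = (4634, 8671, 14081, 20862, 28315, 36483)

# Sum each year's column across all networks.
computed_totals = tuple(sum(column) for column in zip(*VOLUMES_GB.values()))
assert computed_totals == PUBLISHED_TOTALS_GB

# Difference between consecutive totals. Interpreting this as "volume added
# per year" is an assumption that the table lists cumulative holdings.
yearly_increase = [later - earlier
                   for earlier, later in zip(computed_totals, computed_totals[1:])]
for year, added in zip(YEARS[1:], yearly_increase):
    print(year, added)
```

A check of this kind is cheap to run whenever a flattened table is re-keyed, and it catches transposed digits that proofreading tends to miss.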
Significant volumes of exploratory seismic data obtained by geophysical contractors are held by the Department of the Interior. These data are used by the federal government and by petroleum companies in preparing for oil and gas exploration activities. There are, however, various proprietary restrictions on access to these data by other users.
In summary, the sources of seismic data are diverse, the archiving is highly distributed, and the data are in many different formats with different metadata structures. Moreover, data sets with long-term scientific and historical value reside in both federal and nongovernmental organizations, although in most of the latter cases federal funds have paid at least in part for their acquisition, archiving, and distribution.
The users of seismic data are many and diverse as well. They include federal and state government agencies, universities, and private industry, particularly the petroleum industry. Thousands of individuals are direct or indirect users of seismic data. Certainly, the public as a whole is an end user of historical seismic data and information, including the location, magnitude, and damage associated with earthquakes around the world.
Most seismic data sets have been or are now used both for operational purposes and for research, although for operational activities the data are used primarily immediately following their collection. Examples of their use for operational activities include tsunami warning and the rapid determination of the magnitude, location, and fault mechanism of destructive earthquakes and their aftershocks, both to inform the public and to assist in emergency response and special monitoring. On a longer time scale the data are used for hazard reduction and seismic safety in seismogenic regions, including local zoning decisions for future development, and siting and safety of critical facilities such as nuclear power plants. Data are obtained and used for continuous global monitoring of earthquake activity and of threshold or comprehensive test bans on underground nuclear explosions. Of course, there also is a broad spectrum of research that uses historical seismic data, including studies of the physics of earthquake and explosive sources, propagation effects on seismic signals, imaging of the Earth's structures at all scales, seismicity patterns, and earthquake prediction or hazard estimation. Older data are important and are commonly used for most of these types of research. For example, establishing the recurrence rate for larger-magnitude earthquakes requires decades to centuries of observations, even in the most seismically active areas.
In conclusion, most of the seismic data have long-term value for scientific research, disaster mitigation, and various socioeconomic uses. The data are archived in a broadly distributed manner. However, only a fraction of the archived data are under the direct control of federal government agencies, and it appears that many of these data sets are not considered official federal records. Except for most commercial exploratory seismic data, federal funds have paid for much of the instrumentation, station operation and maintenance, collection, storage, and distribution of seismic data. These important seismic data sets should be kept indefinitely in a form accessible to both the scientific community and other users.
Ocean Science Data
The oceans and atmosphere are turbulent fluids, constantly changing over many spatial and temporal scales. The numerous types of data that describe the oceans are often unrelated to one another, and even those that are related frequently have nonlinear and poorly understood interactions. For example, temperature data from a specific point and time in the North Atlantic cannot be accurately predicted from data collected in the same place the year before, or even the week before, or from data collected at the same time 1,000 kilometers or even 100 kilometers away, or from salinity data collected at the same place and time. Each datum contributes unique information as long as it is accurate, corresponds to a different physical quantity, is obtained from a different time and place, and cannot be accurately computed from other existing data.
One source of oceanographic data is the field program. Large and small field programs conducted in support of specific research projects are the prime contributors of in situ and in vitro observational data sets for all the ocean disciplines. In situ data sets are those that are derived by processing the measurements from sensors immersed directly into the ocean environment. Processing of in situ data is largely automated, and so the data sets are relatively dense. In vitro data sets are produced by laboratory analyses of samples collected from the ocean environment. These laboratory analyses combine sophisticated measurement equipment with labor- and time-intensive procedures. Therefore, in vitro data are typically sparse. Remotely sensed observations also may be associated with field program data by synchronizing in situ sampling with the use of remote sensing platforms.
The harsh and remote nature of the world ocean environment has inhibited the establishment of a routine data collection system. Although several remote sensing platforms do provide daily monitoring of ocean surface conditions on a global basis, continuous measurement of subsurface conditions with adequate time and space resolution for effective monitoring is not a reality. The lack of continuous and comprehensive oceanographic data may contribute most to the inconsistent data management practices and lack of community-wide standards for data reporting and exchange in the ocean disciplines. Because of the need for daily global prediction, such standards and practices are much more highly developed in the atmospheric community. The establishment of the Global Ocean Observation System presents an opportunity to engage the ocean community in the identification and implementation of appropriate standards.
Like other observational data, oceanographic data extend beyond directly or remotely measured observations of the environment. The data products based on the analyses, interpretations, and presentations of aggregates of observations also must be considered in the design, implementation, and maintenance of any data management and archiving mechanism. The more traditional products, such as parameter grids and output from ocean models, will surely be supplemented from innovative sources likely to emerge from the interactive scientific collaboration and value-added services that are becoming increasingly available through electronic networks.
The principal federal agency ocean data holdings are at the NOAA National Oceanographic Data Center (NODC), the NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, and at several Navy centers, which hold mostly classified data sets. In addition, significant amounts of data are held by the universities.
Located in Washington, D.C., the NODC archives physical, chemical, and biological oceanographic data collected by other federal agencies, including data collected by principal investigators under grants from the National Science Foundation; state and local government agencies; universities and research institutions; and private industry. The center also obtains foreign data through bilateral exchanges with other nations and through the facilities of World Data Center A for Oceanography, which is operated by the NODC under the auspices of the National Academy of Sciences. The NODC provides a broad range of oceanographic data and information products and services to thousands of users worldwide, and increasingly, these data are being distributed on CD-ROMs and on the Internet. Table 2.4 presents a summary of the NODC's data holdings.
The PO.DAAC is a major federally sponsored oceanographic data center, which is operated by the California Institute of Technology's Jet Propulsion Laboratory in Pasadena, California. As one element of the NASA Earth Observing System Data and Information System, the mission of the PO.DAAC is to archive and distribute data on the physical state of the oceans. Unlike the data at the NODC, most of the data sets at the PO.DAAC are derived from satellite observations. Data products include sea-surface height, surface-wind vector, surface-wind stress vector, surface-wind speed, integrated water vapor, atmospheric liquid water, sea-surface temperature, sea-ice extent and concentration, heat flux, and in situ data that are related to the satellite data. The satellite missions that have produced these data include the NASA Ocean Topography Experiment (TOPEX/Poseidon, done in cooperation with France), Geos-3, Nimbus-7, and Seasat; the NOAA Polar-Orbiting Operational Environmental Satellite series; and the DOD's Geosat and Defense Meteorological Satellite Program.
Summary of Major Issues
The results of scientific research are disseminated in this country through a hybrid system that includes professional society and other not-for-profit publishers, the commercial sector, and the government. The formal journals are published largely by the professional society and commercial sectors, while government agencies manage less formal reports (gray literature). Secondary services, such as abstracting and indexing, provide access to this literature, increasingly by electronic means. While there are strains in this system because of rising costs, increasing workload, and issues related to the protection of intellectual property, it has served U.S. science well and has been an invaluable link in the process of translating scientific advances into further advances, useful technology, and economic benefits.
The current system, however, is not well suited to handle the scientific electronic databases that are the focus of this study. The costs of maintaining these databases are typically too great to be covered by user fees; instead, these databases must be considered part of the national scientific heritage. Some government agencies have accepted responsibility for maintaining and disseminating data resulting from their own research and development. In some cases, this system is working reasonably well, but in others there are problems even with providing current access. Archiving for the long term raises questions in all cases, however.
TABLE 2.4 National Oceanographic Data Center Data Holdings (as of October 1994)

Discipline                          Volume (megabytes)

Physical/Chemical Data
  Master data files
    Buoy data (wind/waves)                       9,679
    Currents                                     4,290
    Ocean stations                               1,645
    Salinity/temperature/depth                   1,557
    BT temperature profiles                        872
    Sea level                                      125
    Marine chemistry/marine pollutants              89
    Other                                           68
    Subtotal                                    18,325
  Individual data sets, for example
    Geosat data sets                            12,841
    CoastWatch data                             60,000
    Levitus Ocean Atlas 1994 data sets           4,743
    Other (estimated)                           11,000
    Subtotal                                    88,584
  Total Physical/Chemical                      106,909

Marine Biological Data
  Master data files
    Fish/shellfish                                 115
    Benthic organisms                               69
    Intertidal/subtidal organisms                   30
    Plankton                                        32
    Marine mammal sighting/census                   21
    Primary productivity                             7
    Subtotal                                       274
  Individual data sets, for example
    Marine bird data sets                           52
    Marine mammal data sets                          4
    Marine pathology data sets                       4
    Other (estimated)                              200
    Subtotal                                       260
  Total Biological                                 534

Total Data Holdings                            107,443

Source: NOAA, private communication, 1994.

A general problem common to all scientific disciplines is the low priority attached to data management and preservation. Experience indicates that new experiments tend to get much more attention than the handling of data from old ones, even though the payoff from optimal utilization of existing data may be greater. For instance, according to figures supplied by NOAA, NOAA's budget for its National Data Centers in FY 1980 was $24.6 million, and their total data volume was approximately one terabyte. In FY 1994, the budget was only $22.0 million (not adjusted for inflation), while the volume of their combined data holdings was about 220 terabytes! During this same period, the overall NOAA budget increased from $827.5 million to $1.86 billion.
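The NOAA figures quoted above lend themselves to a quick back-of-the-envelope comparison. The following Python sketch is illustrative only: it uses just the numbers given in the text (budgets in nominal dollars, volumes as the approximate values quoted), and all variable and function names are hypothetical.

```python
# Figures quoted in the text (nominal dollars; volumes are approximate).
DATA_CENTER_BUDGET_1980 = 24.6e6   # NOAA National Data Centers, FY 1980
DATA_CENTER_BUDGET_1994 = 22.0e6   # NOAA National Data Centers, FY 1994
VOLUME_1980_TB = 1.0               # "approximately one terabyte"
VOLUME_1994_TB = 220.0             # "about 220 terabytes"
NOAA_BUDGET_1980 = 827.5e6         # overall NOAA budget, FY 1980
NOAA_BUDGET_1994 = 1.86e9          # overall NOAA budget, FY 1994


def dollars_per_terabyte(budget, volume_tb):
    """Data-center budget divided by the volume of data held."""
    return budget / volume_tb


cost_1980 = dollars_per_terabyte(DATA_CENTER_BUDGET_1980, VOLUME_1980_TB)
cost_1994 = dollars_per_terabyte(DATA_CENTER_BUDGET_1994, VOLUME_1994_TB)

# Average compound growth of the holdings over the 14 fiscal years.
years = 1994 - 1980
annual_volume_growth = (VOLUME_1994_TB / VOLUME_1980_TB) ** (1 / years) - 1

# Growth of the overall agency budget over the same period.
noaa_budget_ratio = NOAA_BUDGET_1994 / NOAA_BUDGET_1980

print(f"FY 1980: ${cost_1980:,.0f} per terabyte")
print(f"FY 1994: ${cost_1994:,.0f} per terabyte")
print(f"Average annual growth in holdings: {annual_volume_growth:.0%}")
print(f"Overall NOAA budget ratio, FY 1994 to FY 1980: {noaa_budget_ratio:.2f}")
```

Under these assumptions, the data-center funding available per terabyte held fell from $24.6 million in FY 1980 to $100,000 in FY 1994, a factor of roughly 246, while the overall agency budget more than doubled.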
With regard to laboratory data, government programs have existed since the 1960s to compile results from the world scientific literature, to check the data carefully, and to prepare databases of critically evaluated data. For instance, the National Institute of Standards and Technology operates its Standard Reference Data Program, which covers a broad range of data in physics, chemistry, and materials science. The Department of Energy also supports a number of data centers of this type. Despite chronic underfunding, these programs have produced databases of lasting value to the nation. To cite one example, the Mass Spectral Database managed by the National Institute of Standards and Technology, the National Institutes of Health, and the Environmental Protection Agency contains spectra of over 60,000 compounds. It has been installed in many thousands of mass spectrometers that are being used for monitoring environmental pollution, designing drugs, characterizing new materials, and many other applications. The government investment in creating and maintaining this database has been repaid many times over.
In the area of observational databases, the situation is mixed. Federal agencies collect large amounts of observational data, which in many cases are continuously added to the available record of Earth and space processes. The data sets resulting from these activities sometimes are well-documented and maintained in readily accessible form; but in many other cases, they are exceedingly difficult or impossible to access or use, and thus are effectively unavailable. In general, the agencies and other organizations do a good job of making data and information available to the scientists (primary users) during the active stages of projects and for some time afterward. Examples of notable successes include the NASA Planetary Data System, where the premise has been that the data have long-term value and must be accessible indefinitely into the future, and the NOAA National Data Centers, where the policy is to migrate archived data to new media every 10 years.
Technological advances have kept pace with the large growth in data volumes in scientific disciplines such that the long-term retention of all or nearly all of the data collected is feasible. Indeed, in most fields the entire collection of data from the past is not large in comparison with the current and anticipated data volumes that will be collected during only a year or two. However, significant fractions of the older data are difficult or in some cases impossible to access, because they have not been transferred to new storage media. This transfer often has received low priority because many data management and data retention activities are chronically underfunded and just handling the current data flow uses nearly all of the available resources. Thus, many valuable data sets are stored on low-density round tapes or on specialized magnetic tape media requiring hardware that is now obsolete or inoperable. For example, a large volume of the early Landsat coverage of the Earth resides on tapes that cannot be read by any existing hardware. Recent data-rescue efforts have been successful in getting older data into accessible form, but these efforts are time-consuming and costly. The reason these efforts have been undertaken, particularly in the observational sciences, is the recognition that retrospective data are vital to understanding long-term changes in natural phenomena. Given the extraordinarily rapid advances in computing and storage technology in recent years, planned periodic migration of data to new media will be increasingly important in all scientific disciplines to ensure long-term access to our scientific data resources.
It is axiomatic that a database has limited utility unless the auxiliary information required to understand and use it correctly (the metadata) is included in the record. An unambiguous description of the storage format is obviously essential for interpretation of an electronic database. The requirement is even more stringent to support meaningful access to data over the long term, because the hardware, software, and even the language by which formats are described will likely be different decades and centuries from now. The same is true regarding the scientific details of the content of the data. Auxiliary information such as environmental conditions (e.g., temperature and pressure), method of calibrating the instruments, and data analysis techniques must be given to be able to fully and correctly use the data. Providing this information is time consuming and costly if done retrospectively, but much less so if it is prepared at the time the data are collected. Documentation that is inadequate for understanding and using the data greatly diminishes the value of the data, particularly for secondary and tertiary users.
Another major problem inhibiting access to data is the lack of directories that describe what data sets exist, where they are located, and how users can access them. This, too, is especially a problem for potential secondary and tertiary users. In many cases the existence of the data is unknown outside the primary user groups, and even if known, there frequently is not enough information for a potential user to assess their relevance and usefulness. This realization has resulted in an interagency effort, led by NASA, to build a Master Directory of Global Change Data and Information. This Master Directory is intended to inform users of where data sets of potential interest reside and how to access them. Similar directories are needed in other scientific disciplines, as well as across all disciplines. The lack of adequate directories adversely affects the exploitation of our national data resources and commonly leads to unnecessary duplication of effort.
A significant fraction of the archived scientific data is held by the federal agencies that collected the data as part of their mission. However, a large amount of valuable scientific data gathered with federal funds is never archived or made accessible to anyone other than the original investigators, many of whom are not government employees. In many instances, the organizations and individuals that receive government contracts or grants for scientific investigations are under no obligation to retain the data collected, or to place them in a publicly accessible archive at the conclusion of the project. At best, scientists in the same field may be able to obtain desired data sets on an ad hoc basis by contacting the original investigators directly; secondary and tertiary users typically are unaware of the existence of the data and have no mechanism (other than personal contact) to access the data. Thus, data sets that commonly are gathered at great expense and effort are not broadly available and ultimately may be lost, squandering valuable scientific resources and much of the public investment spent in acquiring them. Clearly, there is a great need for the agencies to get more return on their investment in science by the simple expedient of making the data collected under their auspices accessible to others.
As seen from the discussion in earlier sections and addressed in detail in the individual discipline panel reports (NRC, 1995), there is a large and diverse collection of scientific data and information extant in federal agencies and nonfederal organizations, including state and local agencies, universities, not-for-profit institutions, and the private sector. At a minimum, those data that are acquired with the support of federal funding should be regarded as part of the National Scientific Information Resource.
Finally, NARA's holdings of scientific and technical data in electronic or any other form are very small in comparison to the data holdings of these other organizations. Moreover, NARA's budget for its Center for Electronic Records, which has formal responsibility for archiving all types of federal electronic records, was only $2.5 million in FY 1994, a budget lower than that of many of the individual agency data centers reviewed by the committee in this study. Given NARA's current and projected level of effort for archiving electronic scientific data, it is obvious that NARA will be unable to take custody of the vast majority of the scientific data sets that require archiving. Therefore, a coordinated effort involving NARA, other federal agencies, certain nonfederal entities, and the scientific community is needed to preserve the most valuable data and ensure that they will remain available in usable form indefinitely. The challenge is to develop data management and archiving infrastructure and procedures that can handle the rapid increases in the volumes of scientific data, and at the same time maintain older archived data in an easily accessible, usable form. An important part of this challenge is to persuade policymakers that scientific data and information are indeed a precious national resource that should be preserved and used broadly to advance science and to benefit society.
3 Retention Criteria and the Appraisal Process
The National Archives and Records Administration appraises and retains records on the basis of their informational and evidential value. It is concerned with records of long-term value, that is, those records that will probably have value long after they cease to have immediate, or primary, uses. Although scientific databases can provide evidence of the research conducted by an agency, their value is primarily informational; it is based on the content of the records rather than on their description of activities by the agency that collected or created them.
Special problems arise in appraising scientific data for their long-term value, particularly beyond the community of research scientists working in the specific field to which the measurements refer. Scientific data are voluminous, constantly increasing, and often difficult for those in other fields to use in their original formats. The data typically are expensive to collect, provide baselines for future observations, enhance understanding of other data, and are of immense importance for advancing scientific knowledge and for educating new scientists. The data also are important to an understanding of the world in which we live; the data (or the conclusions drawn from them) may be important to economists, historians, statisticians, politicians, and the general public. At the same time, it is difficult to predict the full value of the data to researchers and other users decades or centuries from now, although past experience has shown that scientific data collected many years ago provide unique contributions to new understanding of our physical universe.
Retention Criteria
The criteria that follow are to be used during the appraisal process to determine retention of physical science data. They should be applied by those responsible for stewardship to all physical science data, whether created by small individual projects or in the course of large-scale research programs. Similar criteria and guidelines must be developed for data in other disciplines. This is a topic of primary concern not only to NARA, NOAA, and NASA, but to all scientists, data managers, and archivists who work with such records, and was provided in the charge to the committee as a central issue. Although the committee found that many retention criteria apply to both the observational and the laboratory sciences, significant differences are noted below. The metadata requirements, which tend to be either poorly understood or ignored, are given particular emphasis. Additional details and distinctions are discussed in the working papers of the discipline panels (NRC, 1995).
Criteria Common to Both Observational and Laboratory Sciences
Uniqueness of data. Do other authenticated copies of the data under consideration already exist in an accessible repository that meets NARA standards of permanence and security? If so, are they adequately backed up? If the answers are yes, the data set need not necessarily be retained.
Accessibility: adequacy of documentation. Though we might wish that all data sets were of high quality and accompanied by detailed metadata, that is not always the case. At a minimum, the metadata should be sufficient for a scientist working in the discipline to make use of the data set. If documentation is lacking or is so poor that a data set is not likely to be of value to someone interested in data of that type, or the data are more likely to mislead than to inform, that data set should have a low priority for archiving, or perhaps should not be archived even if resources are available. Nevertheless, the committee does not believe that many data sets should be purged because they lack sufficient documentation. The vast majority of data sets now meet minimum standards of documentation, which means that a skilled user either is given sufficient information or can figure it out. Adequacy of documentation is thus but one criterion to consider in the appraisal of data for long-term retention. Metadata requirements are discussed in greater detail below.
Accessibility: availability of hardware. Is the hardware needed to access the data obsolescent, inoperable, or otherwise unavailable? If so, the data are not usable. Decisions on whether to keep such data should be based on the feasibility of building or acquiring the necessary hardware, the usability of the data if they were accessible, and the nature of the data set, if known. To avoid this situation, migration of data to current storage media should be part of the normal routine to maintain the archive.
Cost of replacement. Could the data be reacquired if a future national need for the data were to arise? If so, would reacquisition of the data be more costly than their preservation? For the observational sciences, the answer is almost always that the data cannot be reacquired. The exception is with a data set in a discipline in which the changes of nature are so slow that the data could be recaptured at another time. For example, data on the fossil record of evolution contained in stratigraphic rock units could be reacquired. The laboratory sciences generate data that can, in principle, be reacquired. The question is whether the data can be reproduced at an acceptable cost. Data sets in the laboratory sciences that are candidates for long-term preservation can be classified into three generic types: (1) massive records and data from an original experiment, particularly a costly "mega-experiment," that there is no realistic chance of replicating (e.g., data obtained from expensive facilities such as plasma fusion devices, or data of interest in physics and chemistry derived from special events such as nuclear tests); (2) unique, perhaps sample-dependent or environment-dependent, engineering data, many of which never reach the published literature; and (3) critically evaluated compilations of data from a large number of original sources, together with the backup data and documentation on selection of recommended values, that represent tremendous accumulated effort.
Peer review. Has the data set undergone a formal peer review to certify its integrity and completeness, or is there documented evidence of use of the data set in publications in peer-reviewed journals? Have expert users provided evidence that this data set is as described in the documentation? Formal review of data sets is not now common. It should be encouraged, however, especially in the observational sciences. A good model is the peer review system for NASA's Planetary Data System. In the laboratory sciences, the critically evaluated compilations of data referred to in Chapter 2 have undergone extensive peer review.
Differences Between the Observational and the Laboratory Sciences
Data derived from laboratory experiments, such as the hardness of steel produced in a particular melt, differ from data based on observations of transient natural phenomena, such as the records of the 1993 midwestern floods. Thus, they stimulate different questions related to data preservation issues. As has already been noted, one difference arises from the fact that transient natural phenomena are not reproducible; the fact that the resulting observational data are "snapshots in time" sometimes means that the data have historical or evidential value in addition to their informational value. Observational data sets that provide a continuous time-series record of the physical universe, or of human impact upon it, are important to future generations for comparison and the identification of trends. In addition, many observational data sets represent major engineering or worker-intensive collection activities that warrant documentation and could not feasibly be carried out again.
Experimenters have good reason to believe that if and when their data are recreated in the future, instruments will be better. In many experiments, raw data (e.g., the initial sensor readings before any transformations, conversions, averaging, or corrections are made) may exist only for a fleeting instant before they are discarded or further processed. Even when raw (level-0) data are acquired and saved, principal investigators frequently fail to provide appropriate documentation because they do not expect anyone else to use these data. Instead, the processed data sets are more likely to have adequate metadata and meet the committee's other criteria for retention.
Quite the opposite situation seems to prevail for the observational sciences, where many secondary scientific users feel they need to be able to get back to the level-0 data and are becoming more active in demanding that the collectors of the data provide adequate metadata.
Special Issues in the Retention of Observational Data
All observational data that are nonredundant, reliable, and usable by most primary users should be permanently maintained. This judgment is based on the committee's belief that advancing technologies and better data management practices make it possible to stay ahead of the growing data volumes, as discussed in Chapter 4. It also is likely that it will be more expensive to reappraise data sets than simply to keep them. If the committee is wrong on these two counts, it may be possible that the volume of the data can be reduced through sampling techniques and through intelligent selection of the data sets of highest priority, as explained below.
Data sampling issues arise in measurement systems and in considering archival strategies to provide ready user access. Even before a data manager faces archiving decisions, many sampling rate decisions already have been made. For example, in the atmospheric sciences, we could easily sample temperature sensors and wind gauges 100 times per minute, but that frequency is unnecessary for nearly all uses. In general, it is necessary to keep only data properly sampled in time and space; that is, the sampling interval must be such that the most-rapidly-varying component is not aliased. At least two samples per cycle are required according to the Sampling Theorem. Thus reduction of oversampled data to the minimum sampling rate needed, coupled with lossless data compression, can significantly reduce data volumes with no loss of scientific content. However, if the phenomena of interest are slowly varying, then more rapid fluctuations, which might have value for other purposes, can be filtered out and the data reduced to retain the desired data unaliased; this technique can further reduce the data volume at the expense of losing higher-frequency data. The archiving of only "representative" subsets of our largest data sets is often suggested, but the notion raises difficult issues in statistics, data management philosophy, and budgeting. In concept, there may be acceptable procedures for the long-term archiving of representative subsets of large data sets, but no effective methodology exists today to choose those that would satisfy the needs of future users.
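The two-samples-per-cycle rule can be turned into a simple calculation of how far an oversampled record may be thinned. The sketch below is a minimal illustration, not a procedure prescribed by the committee: the function names are hypothetical, and rates are expressed in samples (or cycles) per minute. It assumes the record truly contains no variation faster than the stated limit; otherwise the data would first need to be low-pass filtered, as the text notes.

```python
import math


def min_sampling_rate(max_frequency):
    """Lowest sampling rate giving two samples per cycle of the fastest
    component. This is the theoretical bound; in practice a margin above
    it is usually kept."""
    return 2.0 * max_frequency


def decimation_factor(current_rate, max_frequency):
    """Largest whole-number factor by which an oversampled record can be
    thinned while keeping at least two samples per cycle."""
    required = min_sampling_rate(max_frequency)
    if current_rate < required:
        raise ValueError("record is already undersampled and may be aliased")
    return max(1, math.floor(current_rate / required))


def thin(samples, factor):
    """Keep every `factor`-th sample. This is safe only if the record has
    no content above the stated maximum frequency."""
    return samples[::factor]
```

For a sensor sampled 100 times per minute whose fastest component of interest completes one cycle per minute (an assumed value, not one from the text), `decimation_factor(100.0, 1.0)` gives 50; that is, every fiftieth sample could be kept.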
An example of the approach to deciding which observational data sets to retain comes from the atmospheric sciences. In this field the value of a data set as part of a long time series is an important criterion for archiving decisions. The temperature record for a given year from a station operating over a century is much more valuable than a similar record from a nearby station with a shorter lifetime. Studies of climate change and other types of environmental change find long time series to be essential. For example, confirmation of the seasonal stratospheric ozone depletion over the Antarctic in the 1980s required reference back to the Dobson column ozone data from the first half of this century for comparative purposes. The U.S. Historical Climate Network data are a high priority for archiving because they represent a long time series of high-quality data, with excellent metadata; this combination of attributes of data of a common type makes the overall data set exceptionally valuable.
Metadata Issues
The committee has arrived at several related conclusions concerning the importance of documentation, or metadata, to the effective archiving of scientific data. These include the following:
Effective archiving needs to begin whenever a decision to collect data is made.
Originators of data should prepare them initially so they can be archived or passed on without significant additional processing.
The greatest barrier to contemporary and future use of scientific data by other researchers, policymakers, educators, and the general public is lack of adequate documentation.
A data set without metadata, or with metadata that do not support effective access and assessment of data lineage and quality, has little long-term use.
For data sets of modest volume, the major problem is completeness of the metadata, rather than archiving cost, longevity of media, or maintenance of data holdings.
Lack of effective policies, procedures, and technical infrastructure rather than technology is the primary constraint in establishing an effective metadata mechanism.
This suite of conclusions led the committee to recommend that "adequacy of documentation" be a critical evaluation criterion for data set retention. The following discussion illuminates the multiple perspectives of metadata, the essence of the problem, and important elements of any metadata solution.
Perspectives on Metadata
The term metadata often is used to denote "data about data," that is, the auxiliary information needed to use the actual data in a database properly and to avoid possible misinterpretation of those data. The term is used in many scientific disciplines, but not always with precisely the same meaning. Some comments on different types of metadata may be helpful.
The most basic class of metadata comprises the information that is essential to any use of the data. An obvious example is the units in which physical quantities are expressed. If units are not specified, the numbers are ambiguous; at best, the user must attempt to deduce the units by comparison with other data sources. In dealing with observational data, the coordinates and the coordinate system (spatial and temporal) obviously must be specified. Laboratory data are often sensitive functions of some environmental condition such as temperature or pressure. For example, the boiling point of a liquid varies with pressure, so that a boiling point value has no meaning unless the pressure is specified. Although this is well known, many mistakes occur when a user assumes a value taken from a compilation to be a boiling point at normal atmospheric pressure, while it actually refers to a reduced pressure.
A significant problem in planning a long-term data archive is simple carelessness on the part of the creators and custodians of the data. Current practitioners in a scientific field may implicitly understand what the units or environmental conditions are. Shortcuts are taken by the authors that cause no problem in communicating with their contemporary colleagues (although they may be confusing to those in a different discipline), but practices and language can change over a generation or two. For a long-term archive, even the most obvious metadata should be specified in detail.
Beyond this basic type of metadata, there is auxiliary information that is not needed by the majority of users (present or future), but is of interest to a few specialists. Included here are the parameters that have only a slight influence on the data in question, so that most users do not need to know about them. For example, the typical user of a database of atomic spectra is concerned only with the wavelength and a rough value of the intensity of each spectral line. However, a few users who are trying to extract further information from the data may want to know the conditions under which the spectrum was recorded, such as the current density, type of electrode, and gas pressure. Referring to the JANAF Thermochemical Tables, which are discussed in the Physics, Chemistry, and Materials Sciences Data Panel report (NRC, 1995), most users are perfectly content with the values given (along with the confidence that the compilers did a good job of selecting the most reliable values). A minority of users, however, will want more details on how the data were analyzed, such as whether the heat capacity values were fitted to a fifth-degree polynomial or a cubic spline, and so forth.
Perhaps the most pervasive form of metadata is the accuracy of the values. To a purist, no number has meaning unless it is accompanied by an estimate of uncertainty. Specifying the uncertainty of each data point increases the size and complexity of the database, but sometimes may be necessary. At a minimum, the metadata should include general comments on the maximum expected errors, even if a quantitative measure such as standard deviation cannot be given.
Finally, the term metadata is sometimes understood to encompass the full documentation necessary to trace the pedigree of the database. For laboratory data, this includes citations to all the primary research papers relevant to the database. A critical evaluation of especially important quantities (such as the fundamental physical constants or key thermodynamic values) may end up with only a few hundred data points, but include massive documentation and citations to a hundred years of literature. In such cases the metadata occupy far more space than the data themselves.
From this discussion, it is evident that metadata can span the range from a few simple statements about the data to very extensive (and expensive) documentation. It is difficult to give general guidelines on the amount of metadata needed; each case must be considered in the context of how future users may use the data and what auxiliary information they will need. Some guidance may be obtained from formal efforts to set metadata standards for experimenters to follow in preserving their data. In chemistry, for example, many organizations have developed detailed recommendations on reporting data from specific subfields. These have been collected in a recent book, Reporting Experimental Data (ACS, 1993). The American Society for Testing and Materials Committee E49 on Computerization of Material Property Data has an ambitious program to develop consensus standards for metadata requirements for databases of properties of engineering materials. These documents emphasize that metadata requirements must be approached on a case-by-case basis and must involve experts in each field.
The conclusion is that metadata, whatever the particular form, are crucial to the use of almost every data set and must be included in any archiving plan. The necessary metadata usually add very little to the storage requirements, but may require considerable intellectual effort to prepare, especially if they are assembled retrospectively rather than when the data are first collected.
The preceding discussion defines metadata from the perspective of the research scientist. An additional, and somewhat overlapping, perspective is provided by the computer science community. In this community, the term metadata refers to the specification of electronic representation of individual data items, the logical structure of groups of data items, and the physical access and storage media and formats that hold the data. To the computer scientist or database administrator, the contextual data that the research scientist refers to as metadata encompass other data entities. In fact, divergence can exist even among research scientists as to the differences between data and metadata. What is metadata for one may be data for the other.
In view of this confusion, the committee has chosen to keep the term metadata and to explicitly define its fundamental components. As such, the committee views metadata as representing information that supports the effective use of data from creation through long-term use. It spans four ancillary realms: content, format or representation, structure, and context. The content realm identifies, defines, and describes primary data items including units, acceptable values, and so forth. The representation realm specifies the physical representation of each value domain, often technology dependent, and the physical storage structure of aggregated data items, often arbitrary. The structure realm defines the logical aggregation of items into a meaningful concept. The context realm typically supplies the lineage and quality assessment of the primary data. It includes all ancillary information associated with the collection, processing, and use of the primary data. On the basis of this explicit definition, the following section describes metadata objectives, implementation issues, and potential for defining a standardized framework.
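The four realms can be made concrete with a small record structure. The sketch below is illustrative only: the class and field names (`ContentItem`, `DatasetMetadata`, `missing_realms`) and the sample values are assumptions introduced for this example, not a schema defined by the committee.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Content realm: identifies and describes one primary data item."""
    name: str
    definition: str
    units: str
    valid_range: tuple  # (minimum, maximum) acceptable values


@dataclass
class DatasetMetadata:
    """Hypothetical container covering the four metadata realms."""
    content: list = field(default_factory=list)         # ContentItem entries
    representation: dict = field(default_factory=dict)  # encoding, storage layout
    structure: dict = field(default_factory=dict)       # logical grouping of items
    context: dict = field(default_factory=dict)         # lineage, quality, processing

    def missing_realms(self):
        """Return the names of realms that have no entries yet."""
        realms = ("content", "representation", "structure", "context")
        return [name for name in realms if not getattr(self, name)]


# Illustrative values only; a real record would come from the data originator.
record = DatasetMetadata(
    content=[ContentItem("total_ozone", "column ozone amount",
                         "Dobson units", (0.0, 1000.0))],
    representation={"encoding": "ASCII", "record_layout": "fixed-width"},
    structure={"observation": ["station_id", "date", "total_ozone"]},
)
print(record.missing_realms())
```

A check of this kind could flag a data set whose lineage and quality information (the context realm) has not yet been supplied, which is the gap the committee singles out as most damaging to long-term use.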
Analysis of Metadata: From Challenge to Solution
The problem of data set documentation is receiving increased attention in the context of scientific data management. In the earth sciences, global climate change research and general environmental concerns have ignited interest in a more interdisciplinary and long-term approach to conducting science. Interdisciplinary collaboration requires more effective sharing of data and information among individual researchers, disciplines, programs, and institutions, all of which may operate under different paradigms or have different terminology for similar concepts (NRC, in press). Further, long-term research requires that researchers be able to access and compare data sets that were created by past researchers and collected in different contexts by different technologies. Therefore, to support the interdisciplinary sharing and long-term usefulness of data, adequate metadata must be included within a framework that accomplishes the following objectives: provides meaningful selection criteria for accessing pertinent data; supports the translation of logical concepts and terminology among communities; supports the exchange of data stored in differing physical formats; and enhances the assessment of data sets by consumers.
A critical question is how to motivate the user community to participate in the process of metadata preparation and standardization. The issue of motivation is best addressed by the value system of the community itself. It may be argued that the problem will not be solved until the production of verified data sets and their provision to scientific colleagues become more highly valued activities. Developments such as the peer-reviewed publication of data sets should contribute to this shift in values. However, until these activities are assimilated into the fabric of career advancement, such as being incorporated into criteria for tenure in academic institutions, progress will continue to be slow and uneven.
Nevertheless, there are a number of specific actions that can be taken to promote the preparation and standardization of metadata. Funding agencies could help facilitate change by requiring and enforcing minimal documentation of data sets created under their grants (as well as other desirable data management and archiving practices discussed elsewhere in this report). This will not be an effective mechanism, however, unless the minimal standards for consistency and completeness are provided as a target for grantees and as a measuring stick for the funding agent. To be effective, these standards must be created through the collaboration of researchers, data managers, librarians, archivists, and policymakers.
Individuals and institutions in the scientific community could contribute by recognizing that data management and the provision of appropriate documentation of data are an essential science infrastructure function spanning all disciplines. Greater cost-effectiveness, consistency, and quality can be achieved if the many diverse data management activities are better coordinated. The essential requirement for making these value system changes and developing effective solutions is the recognition that all segments of the scientific community need to be educated on this issue. Funding agencies and the scientific community thus must move forward together in the development of a coherent strategy for end-to-end management that focuses on metadata requirements as a major element.
The ultimate solution for metadata handling will include an approach that not only supports the documentation of a data set throughout its life cycle, but also supports evolutionary documentation requirements. For example, early in the development and use of an instrument system, the scientific community may not be able to specify completely what metadata will be important for the effective use of the observations produced by this system. In this case, some of the documentation may include free-form narratives without the benefit of controlled vocabularies. Documentation of this nature is useful only to a limited audience that understands the specialized vocabulary of the source instrument, project, discipline, or institution. In addition, it is still difficult to make these descriptions useful to an automated agent performing a search on behalf of a user. As instrument use becomes more routine, this documentation could evolve to a more structured, but not cumbersome, form. One potentially useful approach constrains the textual descriptions to a well-defined, controlled vocabulary. If the vocabulary is clearly specified and made easily available with the data and associated documentation, users beyond those closely associated with the creation of the data set may be able to use this information to assess its relevance, significance, and reliability. Eventually, this more structured alternative will evolve into the specification of structured records with appropriately defined fields, standard value domains, and relationships with data set records. The committee also expects that improvements in software for natural language understanding will enable the automatic translation of free-form narratives into easily searched metadata fields.
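The step from free-form narrative to a controlled vocabulary can be illustrated with a short sketch. The vocabulary, the field names, and the `check_record` helper below are all hypothetical, chosen only to show how constrained terms make documentation checkable by software; they do not describe any existing standard.

```python
# Hypothetical controlled vocabulary: field name -> permitted terms.
VOCABULARY = {
    "instrument_type": {"spectrophotometer", "radiometer", "sonde"},
    "platform": {"ground", "balloon", "aircraft", "satellite"},
}


def check_record(record, vocabulary=VOCABULARY):
    """Separate a documentation record into accepted fields and problems.

    Values found in the vocabulary are kept as structured metadata;
    everything else is reported so that a person can review it.
    """
    accepted, problems = {}, []
    for field_name, value in record.items():
        allowed = vocabulary.get(field_name)
        if allowed is None:
            problems.append(f"{field_name}: field not in vocabulary")
        elif value.lower() not in allowed:
            problems.append(f"{field_name}: unknown term {value!r}")
        else:
            accepted[field_name] = value.lower()
    return accepted, problems


accepted, problems = check_record({
    "instrument_type": "Spectrophotometer",
    "platform": "rooftop",
    "notes": "recalibrated after move",
})
```

In this sketch the free-form `notes` field is not rejected outright but flagged for review, which matches the evolutionary path described above: narrative documentation remains usable while the structured fields become searchable by an automated agent.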
An equally important component of the metadata solution is the identification and detailed definition of classes of information that are critical to the complete and consistent documentation of data sets. Information modeling techniques can be used to develop these classes of information, some of which will have clear, concise definitions and a set of defined attributes, while others will be identified but will not have clearly defined attributes or boundaries with other classes. The resulting information model should present a technology-independent description of metadata entities and their relationships with the primary data. The model should identify metadata that may be generalized across all classifications of data sets and usage patterns, as well as accommodate specialized needs. Such a model should provide the basis for intelligent information policies, data management practices, and metadata standards. The information policies, however, must not saddle data providers with long, cumbersome "forms" to fill out. That would discourage the contribution of the data themselves, and the committee recognizes that data with incomplete documentation are better than no data at all. Nevertheless, appropriately established metadata standards do not necessarily need to be difficult or costly to apply, and therefore need not be onerous to the data provider. An example of a generalized metadata framework in the observational sciences is presented in the working paper of the Ocean Sciences Data Panel (NRC, 1995).
Other Elements of the Appraisal Process
A data management plan should be created for any new research project or mission plan, consistent with the requirements of OMB (1994) Circular A-130. A good example of this is the Project Data Management Plan of the NASA National Space Science Data Center (NASA, 1992). At a minimum, those individuals who have responsibility for implementing the data management plan and ensuring accessibility and maintenance of the data should play a key role in the subsequent appraisal process.
Most individual investigators and peer reviewers do not recognize their roles as appraisers for archival purposes, but the views of these experts should weigh heavily in the decisions relating to long-term value or permanency of the data obtained. The principal investigators and project managers who collect and analyze the data clearly have the best sense of how long the data will be valuable for their own scientific purposes. Primary users also can provide a detailed understanding regarding the uses of the data for their own discipline, but they may not comprehend the long-term value of the data for application to other research or national problems. Because such primary users and other data collectors sometimes do not think beyond their own needs, the agencies should work with NARA to provide good documentation at the inception of scientific projects, especially documentation that would be useful to secondary and tertiary users. Although providing more extensive documentation often may be viewed as an extra burden by the principal investigators and data managers, the labor and expense can be minimized if it is planned at the inception of a project, whereas it is extremely difficult after the project is completed. Proper data management practices can be promoted by considering data management in the evaluation of an investigator's past performance.
Because many scientific endeavors require participation by a number of agencies and organizations, it is important to coordinate data management activities and assign responsibilities for the maintenance of the data during periods of primary use. NARA is currently responsible for the final appraisal of federal records and the determination of their value as accessions to the permanent national collection under its statutory mandate. However, NARA should take advantage of the expertise of the other participants involved throughout the life cycle of the data.
The committee believes that all stakeholders (scientists, research managers, information management professionals, archivists, and major user groups) should be represented in the broad, overarching decisions regarding each class of data. The appraisal of individual data sets, however, should be seen as an ongoing, informal process associated with the active research use of the data, and therefore should be performed by those most knowledgeable about the particular data, primarily the principal investigators and project managers. In some cases, they may need to involve an archivist or information resources manager to help with issues of long-term retention. Although the committee believes that formal appraisals should be kept to a minimum, appraisals should be performed according to the data management plan established for each project.
Although the committee was not expressly charged with advising on classified data, there is an obvious need to save classified scientific data as well. The complete records of the atmospheric atomic bomb tests are a clear example. It is more difficult to provide and assess metadata for a classified data set, and it costs more to maintain classified data. Also, there is a trade-off between the value of the data for national security, the risk to national security if the data are declassified, and the potential value to society of having the data declassified. Thus, it is highly beneficial and cost-effective to have mechanisms in place that consider these issues periodically for any given classified data set and that promote declassification when appropriate.
Recommendations
The committee makes the following recommendations regarding the retention criteria and appraisal process for physical science data:
As a general rule, all observational data that are nonredundant, useful, and documented well enough for most primary uses should be permanently maintained. Laboratory data sets are candidates for long-term preservation if there is no realistic chance of repeating the experiment, or if the cost and intellectual effort required to collect and validate the data were so great that the long-term retention is clearly justified. For both observational and experimental data, the following retention criteria should be used to determine whether a data set should be saved: uniqueness, adequacy of documentation (metadata), availability of hardware to read the data records, cost of replacement, and evaluation by peer review. Complete metadata should define the content, format or representation, structure, and context of a data set.
The appraisal process must apply the established criteria while allowing for the evolution of criteria and priorities, and be able to respond to special events, such as when the survival of data sets is threatened. All stakeholders (scientists, research managers, information management professionals, archivists, and major user groups) should be represented in the broad, overarching decisions regarding each class of data. The appraisal of individual data sets, however, should be performed by those most knowledgeable about the particular data, primarily the principal investigators and project managers. In some cases, they may need to involve an archivist or information resources professional to assist with issues of long-term retention.
Classified data must be evaluated according to the same retention criteria as unclassified data in anticipation of their long-term value when eventually declassified. Evaluation of the utility of classified data for unclassified uses needs to be done by stakeholders with the requisite clearances to access such data.
4 The Opportunities: The Relationship of Technological Advances to New Data Use and Retention Strategies
Rapid progress in information technology continually alters both the quantity and the quality of scientific information and periodically stimulates fundamental modification of data management and archiving strategies. Recent technological advances have enabled new methods and strategies for data storage and retrieval and have created better ways of connecting users to data resources and to each other. Moreover, the evolving technologies are catalysts for revising organizational structures to manage scientific data archives much more effectively in a distributed manner.
Assumptions about effective management of scientific data that have been long and firmly held are being directly challenged by new information technology. These assumptions have been based on experience with management of paper records, generally in domains outside of science. Some of the outdated assumptions that are rapidly losing their relevance include the following:
- Physical possession of the data is essential to their management and archiving. This principle has outlived its usefulness in the context of electronic physical science data and has made access difficult for legitimate users. Electronic information is easily copied and disseminated. This feature removes constraints imposed by the limited physical access. Because most government physical science data are considered to be in the public domain, the constraints of copyright and fee collection to the free movement of data are removed as well.
- Cost of an archive increases in proportion to collection size and use. Physical archive cost is a function of space, as well as cataloging, repair, and access efforts. Improved inventory technology has eased some of the cost burden over the last several years, but, fundamentally, archives with large physical holdings operate in traditional ways with linearly scaling costs. Such costs actually discourage use, since physical handling of items scales with use, whereas budgets reflect usage indirectly. In contrast, electronic information storage and management costs have declined as rapidly as the costs of computer technology and processing over the last 30 years. There is no foreseeable end to this process. Storing and using the next byte will be cheaper than storing and using the most recent byte for a long time to come.
- Only archivists and librarians have the capabilities to manage archived data. While librarians and archivists are important advisors and participants in scientific data management, the dominant management responsibility falls to the scientific community and its designated scientific data managers (who are a blend of scientist, computer scientist, and librarian/archivist). If practicing scientists do not participate in the management of scientific information, such data will fall into obscurity or obsolescence.
- The locator information (catalog) about the managed objects is simple and compact. Finding relevant scientific information often requires searching the full content and this content generally is not in the conveniently compressed form of text. For example, to search for all data sets where the stratospheric ozone concentration is less than some ad hoc threshold in some region, one would need to execute a complex algorithm on every data sample covering the region in question. Queries such as this become even more complex if the region of interest is determined after retrieval (e.g., how many days in a row was the areal extent of the ozone hole over open ocean greater than 5,000 square kilometers?). The selection and use of scientific data to solve complex problems can be simplified through the use of the concept of browsing information based on content. Browsing often involves examination of large numbers of samples and data volumes. Specialized "browsing products" can be defined to locate records of interest. For the query examples above, low-resolution ozone maps could be used to find candidate data sets with high probability of relevance. Information about the processes (including sensor characteristics, computer program capabilities, and calibration points) used to develop the data set is needed for its proper use. Such information increases the size and complexity of the locator service.
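The ozone query above shows why a text catalog alone cannot answer content-based questions. A minimal sketch of how a low-resolution browse product might be scanned is given below. Everything in it is an assumption made for illustration: the function names, the equal-area grid cells (real latitude-longitude grids have cells of varying area), the sample threshold of 220 Dobson units, and the tiny 2 x 2 maps.

```python
def hole_area_km2(ozone_map, ocean_mask, threshold, cell_area_km2):
    """Area of open-ocean cells whose ozone value is below the threshold."""
    cells = 0
    for ozone_row, mask_row in zip(ozone_map, ocean_mask):
        for value, is_ocean in zip(ozone_row, mask_row):
            if is_ocean and value < threshold:
                cells += 1
    return cells * cell_area_km2


def longest_run_of_days(daily_maps, ocean_mask, threshold,
                        cell_area_km2, min_area_km2):
    """Longest run of consecutive days with hole area above min_area_km2."""
    longest = current = 0
    for ozone_map in daily_maps:
        area = hole_area_km2(ozone_map, ocean_mask, threshold, cell_area_km2)
        current = current + 1 if area > min_area_km2 else 0
        longest = max(longest, current)
    return longest


# Hypothetical browse product: one coarse map per day, values in Dobson units.
ocean_mask = [[True, True], [True, False]]
days = [
    [[200, 210], [300, 100]],
    [[200, 250], [210, 100]],
    [[230, 250], [210, 100]],
    [[100, 100], [100, 100]],
]
run = longest_run_of_days(days, ocean_mask, threshold=220,
                          cell_area_km2=3000, min_area_km2=5000)
print(run)
```

Because the scan touches only the coarse maps, it can narrow the search to a few candidate periods before any full-resolution data are retrieved, which is the role the text assigns to browsing products.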
The remainder of this chapter describes how advancing information technologies enable the data manager, librarian, and archivist to deal with the challenges of scientific data management in a collaborative fashion with the scientific user community.
Enabling Technologies and Related Developments
Table 4.1 provides a summary of aspects of scientific data management changed by new technologies and related developments. These six areas are discussed in more detail below.
High-Performance Computer Networks
The rapid expansion of computer networks and their use for electronic mail and database access have obviated the need for researchers and other users of scientific and technical data to be in physical proximity to colleagues, information resources, and even advanced technical facilities. This has presented a menu of choices about the best means to distribute data and the responsibility of managing them.
A worldwide, "virtual" library is being created on the Internet. Application programs such as Mosaic are demonstrating the power of free and simple navigation across an ocean of available resources. Improving network capacity, reliability, performance, and security measures are helping to make these resources more widely accessible and useful.
High-performance networks also support movement of information for new applications (e.g., for producing safely managed backup copies, "profiling" information for individual user's needs, or staging data through a number of refinement steps in different locations for focused research). Networks support collaborative work and research projects that span traditional research boundaries. Such work requires easy access to a variety of data sources at once.
High-performance networks enable scientific data resources to be widely distributed and managed by groups of scientists. Users thus are freed to concentrate on the most effective use of the data, rather than on their own data management issues. Networks can provide a vehicle for regularly distributing backup copies of data and metadata to ensure safe storage. Distribution of data to users can be done via the network in addition to, or instead of, via physical media such as tapes and CD-ROMs. Data can be linked together to help users navigate among related items. This kind of linking is at the heart of the World Wide Web concept and is brought to users by Mosaic. The population of information providers (e.g., people who can contribute to the knowledge base) has now grown to include all networked members of a user population. Such contributions can be as simple as an annotation on an existing item, or as complex as a fully processed and peer-reviewed new item. Most profoundly, the evolving network infrastructure enables new concepts for distribution of functions and responsibility in organizations (NRC, 1994).
TABLE 4.1 New Technologies and Related Developments That Enable a New Strategy for the Management of Scientific and Technical Data

| New Technology Trends and Related Developments | Key Features | What Is Enabled? |
| --- | --- | --- |
| High-performance computer networks | Distributed functions; rapid delivery of large data volumes | Location of databases and archives where best managed; collaborative work; distributed organizations; distributed responsibility |
| Low and declining cost of storage | Inexpensive backup; continually declining cost; ease of migration | Deferral of archiving decisions; trust in distributed management due to safe storage backup |
| Advanced data management | Ability to rigorously and formally manage diverse data types | More complex data structures (other than "flat files") handled in archives, with great potential advantages |
| Changing requirements for information technology professionals | Ability of personnel with lower technical skills to succeed in data management roles | Ability to entrust scientific data management in a distributed environment |
| High reliability of technology components | Availability of better components and connections; reduced procurement and operations costs | Reduced cost and effort in data migration; trusted connections for communication and collaboration |
| Development and acceptance of standards | Agreement on terms, interfaces, media, procedures | Reduced effort to communicate and apply results of others; ability to concentrate on mission issues and not on technology support |

Although networks can provide a quick and easy means to distribute data, it must be noted that CD-ROMs have been used to distribute data for several years and have been very successful. CD-ROMs not only permit users to have a huge local library of data, but they often come with a better set of data access tools than are normally available. Some data sets are large enough that the most cost-effective method to deliver them is on media such as Exabyte tapes (8 mm).
Low and Declining Cost of Storage
As for most aspects of computer hardware, the cost of storage has declined continuously and rapidly for the 30 years of the modern computer age. New storage technology is also increasingly compact and supports ever greater access speeds (Gelsinger et al., 1989). The historical trends are expected to continue for up to 20 years. Already, laboratory engineering results confirm this projection for at least the next decade. The most significant implication is that the decisions about sampling or discarding scientific data can generally be deferred, particularly for data sets for which the necessary metadata exist and whose quality has been certified. For relatively smaller data sets, the deliberation regarding long-term retention may well cost more than the recurring acts of migration. The cost of storage is small in relation to overall mission or investigation costs and therefore should not be a decision driver. Experience suggests, however, that the funds to meet these costs need to receive special protection in the annual agency budget cycles. The support for the data management aspects of scientific missions has typically had a lower priority than the data collection aspects. The low cost of storage also implies that the incremental cost of supporting a remote safe copy of data and metadata also will be small, except for the very large data sets. Therefore, over the next few decades, data received and stored may be expected to be cheaply and quickly migrated to new technologies when storage media reach their nominal limits of reliability or for convenience of improved access.
It is important not to expect a perpetual advantage from this technological discontinuity. The fact that data require significant time periods for their migration must be considered. The cost decay trend will slow down at some point in the future, causing the overall cost of storage to return to something closer to the linear relationship to volume. We also must be realistic and expect that funds will not always be available to save and back up every data set. Decisions on retention or sampling will have to be made.
Nevertheless, the already low and continually declining cost of storage allows a priori decisions to be made in certain circumstances to keep scientific data sets indefinitely. Backup or safe storage copies of data are becoming more affordable as data migration becomes less expensive with smaller, faster, and cheaper storage devices. Reliability also is improving with new software-based archive systems (including migration and backup features). However, there is an enhanced need for ongoing technology monitoring by an appropriate body for media, standards, and migration automation. Such monitoring should be incorporated in any scientific data management and archiving strategy.
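The cost argument in the preceding paragraphs can be made concrete with a small calculation. The sketch below is illustrative only: the first-year cost, the annual rate of decline, and the year at which the decline stops are assumed values chosen for the example, not figures from this report, and the function name is hypothetical. It compares the cumulative cost of keeping a fixed-size data set while unit storage costs keep falling with the case in which the decline halts, after which cumulative cost grows roughly linearly with time.

```python
def cumulative_storage_cost(first_year_cost, annual_decline, years,
                            decline_stops_after=None):
    """Total cost of holding a fixed-size data set for `years` years.

    first_year_cost: cost of storing the data set in year 0 (arbitrary units).
    annual_decline: fractional drop in unit storage cost per year (0.3 = 30%).
    decline_stops_after: number of years for which the decline applies;
        None means the decline continues for the whole period.
    All values passed in below are hypothetical.
    """
    total = 0.0
    yearly = first_year_cost
    for year in range(years):
        total += yearly
        if decline_stops_after is None or year < decline_stops_after:
            yearly *= (1.0 - annual_decline)
    return total


# Assumed numbers: 100 cost units in the first year, 30% decline per year.
declining = cumulative_storage_cost(100.0, 0.30, years=50)
stalled = cumulative_storage_cost(100.0, 0.30, years=50, decline_stops_after=5)

# With a sustained decline the total approaches the geometric-series limit
# first_year_cost / annual_decline, so indefinite retention has a bounded cost.
limit = 100.0 / 0.30
print(f"sustained decline, 50 years: {declining:.1f} (limit {limit:.1f})")
print(f"decline stops after 5 years: {stalled:.1f}")
```

Under these assumptions the sustained-decline total stays bounded however long the data are kept, while the stalled case keeps accumulating a fixed cost every year, which is the distinction the text draws between an a priori decision to keep data indefinitely and the eventual need for retention or sampling decisions.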
The rapid change of storage technologies suggests that efforts to protect today's scientific data legacy must be accelerated. The obsolescence of media types and recorders/players is occurring within shorter and shorter time periods. This implies that "salvage" activities will be increasingly difficult for data left out of migrations to new media. This "join or be left behind" by-product of rapid technological change intensifies short-term budget pressures on archives. It demands in response a strong management commitment to provide resources and save important data sets.
If digital data are to survive, it is of fundamental importance to manage and constrain the costs of archive maintenance. The problem is that new data will be coming in, old data will need to be migrated to new media, the building will need to be repaired, and there usually will not be a lot of extra money for new equipment or added staff. To avoid problems, the data migration process in the system design must be almost totally automated. This refinement often has not been achieved, and it can cause unnecessary budget difficulties. Finally, it is essential for agencies to preserve all the hardware and software necessary to access all their data until the data have been successfully migrated or otherwise disposed of.
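As one way to picture an automated migration step, the following minimal sketch copies every file from an old storage location to a new one and verifies each copy before reporting success. It is an illustration under stated assumptions, not a method described in this report: two directory trees stand in for the old and new media, SHA-256 checksums are used as the verification technique, and the names `sha256_of` and `migrate_tree` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(old_root, new_root):
    """Copy every file under old_root to new_root and verify each copy.

    Returns a list of (relative_path, checksum, verified) records. The old
    copy is never deleted here, so the source stays readable until the
    migration is known to have succeeded.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    records = []
    for source in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file metadata
        before = sha256_of(source)
        after = sha256_of(target)
        records.append((str(relative), before, before == after))
    return records


# Demonstration with temporary directories standing in for old and new media.
with tempfile.TemporaryDirectory() as workdir:
    old = Path(workdir) / "old_media"
    new = Path(workdir) / "new_media"
    (old / "station_a").mkdir(parents=True)
    (old / "station_a" / "obs.dat").write_bytes(b"example observations")
    report = migrate_tree(old, new)

for name, checksum, verified in report:
    print(name, checksum[:12], "ok" if verified else "MISMATCH")
```

Leaving the source untouched until every record is verified mirrors the advice above to preserve the means of access until the data have been successfully migrated.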
Advanced Data Management
There are signs that data management technology is beginning to address and, perhaps, to catch up with the complexities of the very large volumes of scientific data. Improvements have occurred in database management systems, hierarchical file systems, data representation standards, query optimizers, data distribution techniques, specialized access methods, and data security tools (Silberschatz et al., 1991). Further, investment in standards and cooperative approaches is accelerating, fueled in part by the demands of medicine, education, entertainment, journalism, financial services, and other commercial applications. While competing approaches and inconsistent vocabulary create near-term confusion, the attention and investment levels bode well for the longer-term capability to go beyond "flat file" representations of data that need to be archived. The new tools and techniques are more descriptive of the data, their heritage, the processes that have worked upon the data, and the relationships of data to each other.
New data management technology will enable easier representation of more diverse types of scientific data. Because of the rigor that new techniques require (e.g., for self-documentation or for precise definition of access methods), long-term archives will benefit from data structures other than flat files. The new technology also implies that the creation of a richer set of metadata will be easier to implement and that these data will be of high scientific value for content-based retrievals. To realize the potential of this enabled facility with metadata, the scientific community will have to accept and support efforts to develop and apply new metadata requirements.
The Changing Requirements for Information Technology Professionals
Information technology professionals with high skill levels can now be found in all parts of the United States and around the world. But as they bring the information technology industry to higher levels of maturity, the effect is to reduce the complexity of major tasks in managing information. Such tasks previously required their skilled use of sophisticated assembly language or job control language (JCL) programming. JCL programming refers to the steps in the old days that one used at the system console to get programs to run, attach the right files, print to the right printer, and similar functions. Today, much of this work is masked, made automatic, and controlled through icons and other means. These tasks can now be performed by competent scientists or professionals with lower technical skills, rather than by highly trained specialists. Because more functions can be completely handled by machines, management of the data can be greatly automated and operated by less skilled individuals. The data themselves can be widely distributed without fear of loss, particularly with a backup copy in safe storage.
Over the next 5 to 10 years, the costs for information technology professionals at individual scientific data centers and archives can be dramatically reduced. The reasons for the reduction in costs include more automatic processes for storage management, rudimentary learning capability in systems, services performed by end users based on their preferences, improved systems management, higher component reliability, improved application of standards, and vendor consistency with standards.
Although the dominant trend will be for a smaller, less technically skilled staff to manage the physical aspects of the archive, there will be a pressing demand for fewer, highly skilled people who blend the skills of physical scientist, computer scientist, and archivist. These people must be able to handle the intellectual challenges of bridging these disciplines while providing the coaching and direction to help develop data and operations standards for scientific communities.
High Reliability of Technology Components
Microprocessors, new storage media technologies, mature software, error correction capabilities, improved packaging, and reduced power consumption have all made significant contributions to the reliability of computer systems and networks. What was recently considered unreliable, requiring constant attention and expensive repair, is now regarded as reliable and not worthy of effort to repair. Although precautions have always been taken to protect against loss of valuable data, many of these precautions are now built into the base of mature software or are increasingly familiar parts of facilities' operating procedures.
High reliability of technology supports a capacity for high levels of trust and the ability to widely distribute functions and databases. These distributed systems can achieve the same levels of quality and trust as centralized archives through the use of the same underlying hardware and software technology, operating procedures, safe storage of copies, and high-quality (error-corrected) telecommunication connections. High reliability has enabled new applications such as the World Wide Web, in which context switching from one machine to the next on a worldwide basis is readily accomplished. Increased reliability also has allowed computing technology to be put into the hands of business managers, consumers, and shop clerks. Without such reliability, maintenance effort would outweigh productivity benefit. As a result, powerful organizational or operational frameworks can be built, much as new materials enable new architecture or new machines.
Development and Acceptance of Standards
The development of effective standards has been pivotal to promoting the widespread use of electronic information. Communication protocols such as TCP/IP have fueled the growth of the Internet. Other format standards for documents support their interchange. For example, the Standard Generalized Markup Language (SGML) provides a uniform way of formatting textual documents so that they can be read by different document processing tools. The HyperText Markup Language (HTML) is a standard used to represent and link documents; it is used to describe pages viewed with Internet viewers such as Mosaic. Hardware and software standards such as the instruction set architectures for microprocessor-based computers, modem protocols, media formats, and query languages also have played critical roles.
Standards can simplify many of the traditional data management jobs. For example, the time that would be used to decipher a tape format is saved and the job of installing a new application is facilitated. Having effective standards in place reduces the level of tedious, nonproductive effort and frees up time for new tasks for the archivist. Standards determined now will typically be in effect for long periods of time, perhaps a decade or more, with some small evolutionary augmentations. This means that a baseline of appropriate standards can be selected for a body of information with some reasonable expectation that they will not be quickly replaced. When it appears that the existing standards baseline needs to be updated, the information can then be migrated to a new one. A deliberate data migration strategy based on standards tracking is possible.
The role of standards certainly is not limited to the general computing community. Scientific teams and discipline groups continuously work to codify best practices, definitions, and algorithms. These are propagated as community standards. Standards developed by the scientific community are often the most important to promote and apply. If properly promulgated, they can enable improved understanding, broader collaboration, and facilitation of the data management and related research.
Finally, it should be emphasized that standards and guidelines to support long-term archiving must not inhibit innovation, or the evolution of information systems and technology. Often the best standards and guidelines are those that are independent of technology.
Opportunities For New Organizational Structures
With rapid technological improvements and newly enabled capabilities, it is sometimes easy to forget the importance of long-term commitment by managers to policy and resource requirements. No technological changes will by themselves replace the basic, unsung efforts of high-quality scientific data management. In fact, although technology itself can improve the availability of data, truly accessible and useful scientific information will be achieved only through such management commitment. This commitment must be based on a coherent strategy for life-cycle management of data, including technology acquisition, data and information management practices, and technology-independent standards to ensure that the minimum levels of data content and consistency for research uses are met. Further, such a comprehensive strategy will be successful only with the active and committed involvement of the scientific community itself. The level of effort and change that may be required to achieve this community involvement should not be underestimated, and fundamental change to the value system of the community may be required.
Nevertheless, as discussed above, technological advances allow the creation of new infrastructure, challenging existing organizational assumptions. Effective organizational designs based on new allocations of responsibility are enabled. For scientific data management, the technological changes support organizations with the following attributes:
Widely distributed responsibility. New telecommunications, data management, and standards technology allows for high levels of trust in distributed data management. Physical possession of data by archivists is no longer essential. The wide availability of information technology professionals and other skilled data managers (along with the lower technical skill levels actually needed) enhances the ability to distribute the data more broadly and increase user participation. Such distribution of data and their ownership (whether actual or implied) by user groups improves the utility of the data and helps create important support for long-term retention.
High-value peer-to-peer communication. With access to data and to people online, a variety of new collaborative relationships can develop. Information can be broadcast to interested individuals in a timely fashion. Data can be provided directly to field researchers to focus new data collection. Physical proximity and formal lines of communication are no longer vital to effective organizational operation. Indeed, closed, highly structured organizations often will be uncompetitive or fail to take full advantage of innovation.
Specialized data centers. Distribution of resources implies that some specific locations can specialize and yet still contribute effectively to all. Specialized groups or institutions could be created in a scientific discipline or in some aspect of data management, archives, or standards. Designation of such specialized centers, in addition to those already in existence, is a significant mechanism for achieving economies of scale, reducing overall costs while enhancing the effectiveness of certain functions for the benefit of all.
Explicit long-term (technology) strategies. A long-term technology strategy needs to be developed. The rapidly changing base of technology requires that a deliberate sequence of phases be selected, through which data and data management will migrate. The constant evolution of information technologies demands that an organizational element take on this "technology navigation" function.
Measurement as a vital tool. In a fast-paced, and, perhaps, widely distributed effort, metrics are important to clearly communicate expectations of performance, register results, and help in detecting weak spots for corrective action. In particular, metrics could be established to determine data set use and to support archiving strategy decisions. Metrics also could be developed to help ensure high-quality service and proper data protection.
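The idea of metrics for data set use can be sketched in a few lines. The example below is a hypothetical illustration: the access-log format, the data set names, and the two metrics chosen (number of requests and days since the last request) are assumptions made for the sketch, not measures prescribed by the committee.

```python
from collections import Counter
from datetime import date


def usage_metrics(log, holdings, as_of):
    """Summarize requests per data set and days since the last request.

    log: iterable of (data_set_name, request_date) pairs.
    holdings: names of all data sets held, including ones never requested.
    as_of: the date on which the summary is computed.
    """
    counts = Counter(name for name, _ in log)
    last_used = {}
    for name, when in log:
        if name not in last_used or when > last_used[name]:
            last_used[name] = when
    summary = {}
    for name in holdings:
        last = last_used.get(name)
        summary[name] = {
            "requests": counts.get(name, 0),
            "days_since_last_request":
                (as_of - last).days if last is not None else None,
        }
    return summary


# Hypothetical access log and holdings list.
access_log = [
    ("ocean_temps", date(1994, 3, 14)),
    ("ocean_temps", date(1994, 12, 2)),
    ("solar_flux", date(1993, 6, 30)),
]
holdings = ["ocean_temps", "solar_flux", "seismic_1970s"]

summary = usage_metrics(access_log, holdings, as_of=date(1995, 1, 1))
for name, stats in summary.items():
    print(name, stats)
```

A data set that is never requested shows up with zero requests and no last-request date. Such a figure is an input to an archiving decision, not the decision itself, since low use alone does not establish low scientific value.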
5 A New Strategy for Archiving the Nation's Scientific and Technical Data
The scientific and technical data held by federal government agencies and by other institutions supported by federal funds constitute an extremely valuable national resource. Unfortunately, in many cases this resource can be exploited only with great difficulty because key elements of the infrastructure for broad and easy access to it are incomplete or missing.
Currently, the most important development within the federal government for improving the management and long-term retention of scientific and technical data is the National Information Infrastructure (NII) initiative. The NII focuses on the application of public, private, and academic resources to define, implement, and maintain an evolving network of knowledge resources (IITF, 1993). This infrastructure will be the foundation for information-centered enterprises of the next century (NRC, 1994). The scientific community, whose lifeblood is widely available data and information, must become fully engaged in this national effort. A coherent strategy needs to be defined and implemented, to combine new technological capability with a new way of doing business throughout all phases of the scientific information life cycle (observation, measurement, analysis, interpretation, application, dissemination, and education).
An effective information infrastructure must build on enabling technologies to create an integrated and adaptive system that is easily accessible to all potential users. Each user community will have its own view of what the NII means to its enterprise and how the NII can best serve its users because the NII will be made up of many separate "enterprise information infrastructures." The existing scientific and technical data centers and archives already constitute a separate enterprise information infrastructure, which must become fully integrated into the NII.
In the discussion that follows, the committee lays out a three-part strategy for the long-term retention of scientific and technical data. The elements of this strategy are based on the technological advances outlined in Chapter 4 and on the issues raised in Chapter 2, which provide the context and the need for action.
The strategy begins with a set of fundamental principles for the long-term retention of scientific and technical data. The second major element outlines the committee's proposal to form a National Scientific Information Resource Federation, which would provide a coordination mechanism for end-to-end management of networked scientific and technical data facilities. The final sections highlight some specific recommendations for NARA and NOAA in their long-term retention of scientific and technical data.
Fundamental Principles For Long-Term Data Retention
In order to respond adequately to the imperatives for preserving data about the physical universe and eventually to create an integrated, adaptive, and accessible infrastructure, the federal government should help establish effective and affordable processes for providing ready access to the vast national resource of scientific and technical data and related information. The process must support the needs of data originators, users, and custodians across all phases of the data life cycle, from origin to use by future generations. The committee believes that the following principles should guide the effort of the government agencies in the long-term retention of scientific and technical data:
Data are the lifeblood of science and the key to understanding this and other worlds. As such, data acquired in federal or federally funded endeavors, which meet established retention criteria, are a critical national resource and must be protected, preserved, and made accessible to all people for all time. The original collection and analysis of scientific and technical data traditionally have been used primarily to support the scholarly publication of scientific interpretation by individual investigators. The availability of complete and consistent data sets for broader uses, both within and outside the scientific community, would significantly increase the return on the investment made in obtaining those data and provide insights not attainable if the original data were lost or unusable.
The value of scientific data lies in their use. Meaningful access to data, therefore, merits as much attention as acquisition and preservation. Technology can make data available through fast computers, large-bandwidth networks, massive storage capabilities, and portable media. However, if the paths to data are obscure, or there is no way for a user to determine what is significant and relevant, then the data become inaccessible and are effectively lost.
Adequate explanatory documentation, or metadata, can eliminate one of today's greatest barriers to use of scientific data. The problem of inadequate metadata is amplified when users are removed from the point of origin by being in a different discipline, by having a different level of expertise, or by time. Addressing this problem comprehensively will make data useful in the broadest possible context.
A successful archive is affordable, durable, extensible, evolvable, and readily accessible. These terms may appear to be vague targets, but they imply basic goals. The costs of developing, operating, and using an archive must not be excessive. The archive must endure the ravages of long-term use, and it must be able to extend broadly the services it offers and the records it manages. It must evolve to support the assimilation of new technology, policies, procedures, and uses. Finally, an archive is not effective if a broad population of users cannot use it. The archiving system thus should provide multiple levels of access to any subset of its holdings, although holdings not accessed often may not require a sophisticated access mechanism.
The only effective and affordable archiving strategy is based on distributed archives managed by those most knowledgeable about the data. Archive centers generally should be at the agencies or institutions that collect the data, and they should be responsible for archiving and providing access to the data as long as the agency's or institution's mission and scientific competence continue to encompass the subject field. Physical transfers of the data should be avoided if possible, so agencies and institutions will need to allocate adequate resources to the entire life cycle of their data holdings.
Planning activities at the point of data origin must include long-term data management and archiving. This principle is recognized in the Office of Management and Budget Circular A-130 on the "Management of Federal Information Resources" (OMB, 1994). The scientific information management spectrum spans data collected from a sensor to the scholarly publications that report scientists' interpretations of the data. Scientists, information technology professionals, data managers, librarians, and archivists must unify their expertise in the establishment of a coherent strategy for end-to-end data and information management. Although these communities traditionally have not worked closely together, their combined knowledge and effort are now required. The benefit of incorporating planning at the point of origin is that it is cheaper and more effective to plan for retention than to reconstruct data sets later.
The Proposed National Scientific Information Resource Federation
The committee believes that the federal government should create a National Scientific Information Resource Federation, an evolutionary and collaborative network of scientific and technical data centers and archives, to take on the challenge of providing effective access to and preservation of important scientific and technical data and related information. Such an initiative would begin to exploit more fully our nation's significant investment in the physical (and other) sciences and the data acquired with that investment. In the discussion that follows, the committee reviews the basic elements of a federated management structure, describes some notable examples of existing federal government organizations for large-scale distributed data management, and outlines the most important aspects of the proposed National Scientific Information Resource Federation.
Elements of a Federated Management Structure
Several critical concepts must govern any federated management structure for it to function properly. These include the notions of subsidiarity, pluralism, standardization, the separation of powers, and strong leadership at all levels (Handy, 1992).
Subsidiarity means that power is assumed to lie with the subordinate units of an organization and can be relinquished, but not taken away. The subordinate units typically are best qualified to make operational decisions that directly affect them and that they will be implementing. The central management is allowed only those powers needed to ensure that the subordinates do not damage the organization. For example, the Constitution of the United States reserves only specified powers for the federal government, with any unstated powers belonging to the states. Applied to the situation at hand, it is clear that the strengths of the current system for managing scientific and technical data and information in the United States are distributed among a number of diverse data centers and archives, both within and outside the government. A successful federation of these existing institutions would recognize that they are the locations of expertise on their respective data holdings. Thus the central organization should be small and should not micromanage the day-to-day operations of the subsidiary organizations.
Pluralism may be defined as interdependence of the members. In a federation, the individual subsidiary organizations recognize the advantages of belonging to the federation, because of products or services that can be obtained from other elements in the federation. As noted in the previous chapter, the existence of many specialized data centers and archives, as well as the possibility of creating new ones in a networked environment, can offer significant economies of scale and improved sharing of ideas and expertise. What is good for the subsidiary element also should be good for the whole. Pluralism, coupled with subsidiarity, guarantees a measure of democracy in the federation.
Interdependence, in turn, requires standardization of languages, communications, basic rules of conduct, and units of measurement. These elements may be summarized as technical and procedural standardization. This too was discussed in Chapter 4, regarding the development of standards in software, hardware, and data management. Standards that are developed by consensus of the subsidiary elements (e.g., the participating data centers, archives, and researchers) are widely recognized as essential to the successful management of data.
A separation of powers (responsibilities), with a system of checks and balances, is necessary to ensure that the central authority does not take on unnecessary power. This principle must be incorporated into the federation's organizational structure.
Finally, a federation requires strong leadership that is effective, yet not overbearing. The central coordinating element or executive office must act as the standard bearer, promoting the federation's established goals and objectives while reminding the subsidiary organizations of the importance of carrying out their responsibilities.
Examples of Distributed Data Management Organizations
Successful examples of a federated management structure are numerous in the private sector (Handy, 1992). More specifically, however, there already are two large-scale, federal government, distributed data management groups that embody many, though not all, of the federated management attributes outlined above. These are the Interagency Working Group on Data Management for Global Change and the Federal Geographic Data Committee.
Interagency Working Group on Data Management for Global Change
In 1990, Congress formally established the U.S. Global Change Research Program (GCRP), "aimed at understanding and responding to global change, including the cumulative effects of human activities and natural processes on the environment, [and] to promote discussions toward international protocols in global change research" (CENR, 1994). The activities of the GCRP are coordinated by the Committee on Environment and Natural Resources (CENR), under the President's National Science and Technology Council.
The timely availability of a broad spectrum of scientific data and information, from both governmental and nongovernmental sources, is fundamental to meeting the goals of this program. A Global Change Data and Information System (GCDIS) is being created to facilitate access to and use of the data and information necessary to support global change research. The federal organizations involved in the GCDIS planning include the Departments of Agriculture, Commerce, Defense, Energy, Interior, and State, as well as the Environmental Protection Agency, the National Aeronautics and Space Administration, and the National Science Foundation.
According to The U.S. Global Change Data and Information System Draft Implementation Plan (CENR, in press), the GCDIS is building on the resources and responsibilities of each participating agency, linking the data and information services of the agencies to each other and to the users. The system thus is composed largely of the separately funded components contributed by the participating agencies. It is supplemented by a minimal amount of crosscutting new infrastructure through the use of standards, common management approaches, technology sharing, and data policy coordination. Neither a lead agency nor a separately funded budget for the GCDIS is planned; rather, implementation of the system is being coordinated through the Interagency Working Group on Data Management for Global Change (IWGDMGC). Decision making, therefore, is done through a consensus process based on the common interests of all participants.
Plans for the GCDIS recognize that the global change data must be available for a very long time, regardless of the changing interests of the researcher, group, or agency that originally collected and analyzed the observations. Although each agency participating in the GCDIS is expected to manage, store, and maintain the data sets under its purview, the plan does allow an agency to designate another GCDIS agency to archive some of its data. The participating agencies are expected to adhere to government standards for media, storage, and handling as prescribed by NARA and the National Institute of Standards and Technology. The agency archives associated with the GCDIS access system will be staffed by professionals who understand the data and their sources. The IWGDMGC expects to develop guidelines for preparing data sets and associated documentation for long-term retention at the participating agencies. Ideally, the GCDIS archives also will be associated with research groups, both within and outside government, who, as principal users of those data, will verify quality and documentation of the data.
The GCDIS plan gives each agency responsibility for its own data-purging policies, although interagency coordination procedures will be developed to prevent the loss of important data sets. Before any data sets are purged, however, an agency will be required to notify the IWGDMGC of its plans at least one year in advance, and to allow other GCDIS agencies to indicate their requirements for those data, or to agree to assume responsibility for the archiving of those data. In the event that no agreement can be reached on the disposition of a data set identified for purging, existing NARA procedures will apply (CENR, in press).
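The purge-coordination procedure just described can be read as a short decision sequence. The sketch below encodes one simplified interpretation of it. The function names, the outcome strings, and the treatment of a notice given on 29 February are all assumptions made for illustration; the draft implementation plan itself, not this code, defines the actual procedure.

```python
from datetime import date


def earliest_purge_date(notice_date):
    """Earliest permitted purge date: one calendar year after the notice."""
    try:
        return notice_date.replace(year=notice_date.year + 1)
    except ValueError:
        # Notice given on 29 February: assume 1 March of the following year.
        return date(notice_date.year + 1, 3, 1)


def review_purge_plan(notice_date, planned_purge_date,
                      adopting_agency=None, stated_requirements=()):
    """Return a disposition for a data set identified for purging.

    adopting_agency: another agency that agrees to archive the data, if any.
    stated_requirements: agencies that say they still need the data but have
        not agreed to take over its archiving.
    """
    if planned_purge_date < earliest_purge_date(notice_date):
        return "notice period too short"
    if adopting_agency is not None:
        return f"transfer archiving responsibility to {adopting_agency}"
    if stated_requirements:
        # Interpreted here as a case in which no agreement has been reached.
        return "no agreement: existing NARA procedures apply"
    return "purge may proceed under the agency's own policy"


# Hypothetical cases.
print(review_purge_plan(date(1995, 3, 1), date(1995, 12, 1)))
print(review_purge_plan(date(1995, 3, 1), date(1996, 6, 1),
                        adopting_agency="Agency B"))
print(review_purge_plan(date(1995, 3, 1), date(1996, 6, 1),
                        stated_requirements=("Agency C",)))
```

The ordering of the checks reflects the text: the notice period is a precondition, an agreed transfer resolves the case, and only an unresolved disagreement falls through to NARA procedures.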
Federal Geographic Data Committee
The other major federal data coordination entity important to the long-term management of observational data (including some data from the biological and social sciences) is the Federal Geographic Data Committee (FGDC). The Office of Management and Budget (OMB) established the FGDC in 1990 to develop a National Spatial Data Infrastructure (NSDI) to work toward the coordinated development, use, sharing, and dissemination of geographic data (OMB, 1990). Participating government organizations include the Departments of Agriculture, Commerce, Defense, Energy, Housing and Urban Development, Interior, State, and Transportation, as well as the Environmental Protection Agency, Federal Emergency Management Agency, Library of Congress, National Aeronautics and Space Administration, National Archives and Records Administration, and Tennessee Valley Authority. In fulfilling its mandate, the FGDC carries out the following activities, among others: promotes the development, maintenance, and management of distributed database systems that are national in scope for geographic data; encourages the development and implementation of standards, exchange formats, specifications, procedures, and guidelines; promotes technology development, transfer, and exchange; and promotes interaction with other existing federal coordinating mechanisms that have interest in the generation, collection, use, and transfer of spatial data (FGDC, 1994).
The FGDC has received authority and some limited funding to pursue these objectives. Specifically, Executive Order 12906 on "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure," assigns to the FGDC the responsibility to coordinate the federal government's development of the NSDI. That Executive Order also instructs the FGDC to involve state and local governments in its NSDI activities, and to use the expertise of academia, professional societies, the private sector, and others as necessary to assist the FGDC.
The FGDC has established a matrix of subcommittees and working groups according to discipline-related data categories and interests. The working group issues include a framework for data, a clearinghouse for data, standards, technology, and data archiving. The FGDC plans for data archiving are still being developed, however.
Creation of the National Scientific Information Resource Federation
The two examples cited above indicate that a federated management structure for highly distributed scientific data can be created. In fact, between these two groups, the life-cycle management of many of the data that are the topic of this report is beginning to be systematically approached. Nevertheless, as discussed in this report and in the volume of working papers (NRC, 1995), many important gaps and inadequacies remain in the management and retention of our nation's scientific data and related information. The committee believes that these deficiencies can best be addressed by a comprehensive federated system, a National Scientific Information Resource (NSIR) Federation, that builds on the successes of the existing groups and helps coordinate them with other data management entities that still need improvement.
There are many reasons why it is now propitious to establish a system of federated data management, with an emphasis on long-term retention. From a policy perspective, it would be consistent with the goal of the National Information Infrastructure to distribute information resources broadly throughout our society, with the federal government acting as facilitator for such activities. The technology is available to make a fully networked, but highly distributed, system of data centers and archives both feasible and desirable. Such a system would be efficient in providing access to scientific data and information to a large number of potential users and would maximize the government's return on the significant investment that initially went into acquiring those data. From an organizational standpoint, a federated management structure would allow the disparate elements to continue to specialize in what they each do best and to fulfill their individual organizational mandates, while providing some efficiencies of scale and political leverage in addressing the most pressing issues. Moreover, this type of approach is especially timely and important in an era of federal government budget reductions. The committee therefore envisions a broadly networked organization, which would be implemented through the collaboration of the federal government's scientific and technical agencies as well as commercial and noncommercial organizations outside the government, and integrated into the emerging National Information Infrastructure.
Most of the elements of the NSIR Federation are already in place. These include the data centers and field archives run by several of the federal agencies that are among the primary generators and collectors of the nation's scientific data and information. In addition to holding data, these centers and archives have highly skilled staff with the requisite expertise. The organizations are widely distributed, both geographically and by discipline.
The existing data centers and field archives, however, do not approach the federated organizational model for several reasons. There is no unifying organization among the various elements, there is wide disparity in the quality and depth of service provided, and few of them have a charter to preserve data "permanently." Although NARA has the statutory charter to preserve federal records in perpetuity, its current and projected holdings of electronic scientific records are very small. While the committee does not believe that NARA's archives of scientific data should increase substantially, it found little evidence of activity within the scientific and technical agencies that would indicate that their ability to provide for long-term retention and access to their data would improve without some restructuring.
A fundamental precept is that those most familiar with scientific data, the scientists themselves, are in the best position to oversee the management of those data (NRC, 1982). In light of the volume and diversity of scientific data, a distributed approach that maintains the data closest to the primary user community is the most effective method for managing them. As mentioned above, several agencies have adopted an approach of caring for their data in systems of field archives or discipline data centers. Although these agencies have devoted significant attention to the preservation of data, their concern is limited to providing immediate service to primary users of the data for their originally intended purpose. Little thought has been given to the perpetual archiving of the data within most agencies, with the notable exception of NARA and NOAA, which already have a statutory mandate that allows them to preserve data collected by the federal government. Because it is not possible to be sure that any data center will exist in perpetuity, some mechanism must be in place to ensure that the data will be retained by an appropriate organization within or outside the government in the event that the continued existence of a data center is jeopardized.
If a lead agency can be determined for a subject matter, then it should take responsibility for coordination of scientific data on that subject, no matter which agency has physical ownership or custody of those data. The committee recognizes, however, that some data sets are largely of interest at the boundaries of disciplines or agency charters and that consequently these may be more difficult to manage or document properly. Large data sets that are of an interdisciplinary nature cause special problems in this regard. For these complex situations, no simple rule will take the place of negotiations among the involved agencies to make the necessary arrangements for long-term archiving. Indeed, every agency should assume the obligation to keep its holdings of scientific data in usable form, even if the data are not in active use, until agreeing on disposition of those data with NARA or another agency.
In addition to the agency-administered data centers, there are educational or private concerns that hold and administer data important to one or more agencies, such as the archived data from the NOAA Geostationary Operational Environmental Satellites at the University of Wisconsin or the seismic data held by the Incorporated Research Institutions for Seismology. While some of these nonfederal archives are firmly associated with one or more federal agencies through contractual and funding relationships, in other cases a one-to-one association is less clear. It follows that a well-defined chain of responsibility must be established for all data that are to be preserved. This decision should be made by the individuals and institutions most closely associated with and interested in those data, and it should be made with due consideration for cost efficiency, appropriate expertise, scientific interest, and convenience, among other factors. Establishing a clear connection between a field archive and an agency should in no way limit the community of users served by the archive, but should ensure an orderly and secure path of responsibility for the data.
The structure of the nation's scientific and technical organizations continues to change. In some instances, institutions or even agencies will merge, while in other cases, organizations may disappear. When such changes occur, it is likely that the scientific interests formerly represented by those organizations will be subsumed by existing or new agencies or organizations. The general topology of the NSIR Federation, however, would not change.
The committee does not anticipate that the creation and implementation of the Federation will require much additional funding, if any, because it will consist primarily of improving linkages and coordination among existing data centers, archives, and related organizations within a highly decentralized management structure. Moreover, any costs incurred in this process should be more than offset by the improvements in efficiency and access to the data and related information resources.
Recommendations For The Creation Of The NSIR Federation
The committee thus recommends that the federal government take the following steps for adequately preserving and providing access to data about our physical universe:
Adopt the National Scientific Information Resource (NSIR) Federation concept as an integral part of the National Information Infrastructure (NII). This concept must encompass not only an electronic network, but also individuals, organizations, communities, data resources, procedures, guidelines, and associated activities of data generation, management, custodianship, and use. The NSIR Federation should provide the foundation for defining a coherent approach to management of the life cycle of scientific data, with the goal of providing broad and effective access to all potential users as cost effectively as possible. The Federation should be developed and implemented through consensus of collaborating organizations with diverse and autonomous missions. The GCDIS, in particular, is an example of a prototype NSIR, focused on data for a specific set of interdisciplinary science problems. The NSIR Federation would build on such efforts, providing for better coordination and interaction among them, and would help organize fledgling efforts to preserve and provide access to data in other disciplines.
The administration should take the steps necessary to fully define and create the NSIR Federation. There are at least two potential focal points within the administration for planning such an activity. These are the interagency Information Infrastructure Task Force for the NII and the National Science and Technology Council. The NSIR Federation could be created in a manner similar to the creation of the Federal Geographic Data Committee and its National Spatial Data Infrastructure (e.g., through an Office of Management and Budget Circular and Executive Order), or of the Interagency Working Group on Data Management for Global Change and its Global Change Data and Information System (e.g., through legislation in cooperation with the administration). A convocation of representatives from the scientific, data and information management, and archiving communities would be a good way to define and inaugurate this initiative, focusing on the most significant issues and problems identified at the end of Chapter 2.
Following the formal authorization by the federal government for creating the NSIR Federation, the principal parties, including NARA and NOAA, should conclude agreements for the implementation of a distributed archive system. The system should involve all relevant institutions, including nongovernmental entities that are funded by the federal government or that maintain data that were acquired with federal funds. As a general principle, data collected by an agency should remain with that agency indefinitely. The committee recognizes that this recommendation may require significant operational changes for agencies other than NOAA, and even some changes with respect to NOAA's data activities. In addition, NARA should consider concluding interagency agreements to give formal recognition of this process as appropriate. Furthermore, the associated agencies in the NSIR Federation must work together, under the lead of a small, coordinating executive office with the expertise to establish data management guidelines and minimum criteria for adequate metadata that could be applied across the entire Federation. The executive office could be either a high-level interagency coordinating committee, similar to the FGDC, or a new office at an appropriate federal agency, such as the National Science Foundation, which has a broad scientific and technical as well as communication mandate. In any case, the executive office should resist the typical tendency toward bureaucratic accretion of power, personnel, and resources, and the tendency to consolidate and centralize data holdings. A management council consisting of representatives of the member organizations should be created to help ensure that the central executive function remains fully responsive to all members of the Federation.
Data access and preservation services should be implemented on the most cost-effective basis possible for the Federation. For example, one institution may provide a service to one or more other institutions in order to exploit potential economies of scale and focal points of expertise (e.g., the specialized data centers suggested in Chapter 4). This measure might increase the cost to the providing institution, but would decrease the overall cost to the federation, the government, and the taxpayer. An example of this is the method by which backup copies of data might be kept. NARA may have at any given time the most cost-effective "vault" in which to keep physically separate backup copies of data for all agencies, and, hence, the federal government would save money by increasing NARA's budget to provide this service for the other agencies. On the other hand, if cost trade-off studies were to find that a single large "vault" is not as cost-effective as distributed facilities, then each agency would be responsible for its own backup. In all NSIR Federation activities, emphasis should be placed on control of costs, with the most successful methods used by individual members identified and shared with all other members.
The institutions belonging to the NSIR Federation should develop a process for collaborating effectively on specific initiatives. This process should provide a mechanism to define and prioritize data management and preservation initiatives, to establish the required agreements between collaborating organizations, and to secure funding for each initiative. Each participating organization would contribute to the Federation according to its particular strengths and in a manner consistent with the founding charter. In addition, an independent advisory body consisting of experts from user groups should be formed in support of each initiative.
The NSIR Federation should develop a national resource of information technology that is consistent with its chartered objectives and that can be effectively distributed to institutions that must manage data. These technologies would include complete products, designs, guidelines, standards, and methodologies. A related long-term technology strategy, or "technology navigation" function, should be developed, as suggested in Chapter 4.
The NSIR Federation should institute an independently managed process for awarding NSIR certification to member scientific institutions and their data and information systems on the basis of well-defined criteria and standards. The certification process should be managed by a nongovernmental, not-for-profit organization, which would receive technical guidance from the participating federal agencies. The certification needs to have credibility in the community so that nonmember institutions will aspire to attain certification and have it tagged to their products. The certification also should be something that commercial value-added providers will seek to increase the credibility of their products.
It also is important for the committee to state what the NSIR Federation should not be. It should not become an expensive bureaucratic entity. The executive office must not impose any standards or information technologies from above that have not been validated through a consensus process of the member organizations. Finally, the executive office must not attempt to micromanage the operations of the participants, nor should it have any direct control over their budgets and funding allocations.
Recommendations Specifically For NARA
So that NARA can better fulfill its responsibilities in the long-term retention of scientific and technical data, the committee recommends that NARA strengthen its liaison with each federal agency that produces such data to ensure that appropriate attention is devoted to long-term data retention in a distributed storage environment.
As shown earlier in this report, NARA cannot today, nor will it likely ever be able to, act as the custodian of most physical science data. The data volume is too great in relation to the funding appropriated to NARA, the NARA staff do not have the necessary specialized scientific knowledge, the interagency linkages are not in place, and a huge infrastructure similar to that which already exists at other agencies would need to be duplicated at NARA. The agencies closest to the data sets and best equipped to deal with them are themselves already struggling with these issues. However, NARA does have great expertise in issues involving the long-term storage of data and the packaging requirements for data to be of value to future users.
The committee therefore believes that NARA's role should be primarily advisory or consultative, to help ensure that the agencies that are the actual custodians of data at the working level follow all the relevant federal laws and guidelines in taking care of the data. The committee suggests that scientific data and related information should go to NARA's physical possession only as a last resort, when the agency that collected the data can no longer provide access for the user community. As has already been noted, scientific data are best maintained by the agency that originally acquired those data as long as there is any regular active use. The holding agencies should collect, analyze, store, and make available the maximum feasible amount of relevant physical science data, consistent with the principles and goals set forth for the NSIR Federation and with the retention criteria and appraisal guidelines discussed above.
Currently, agencies inform NARA of their intentions for their federal records, including scientific data, through various schedules. All agencies are required to schedule records when they reach 30 years of age, although they are encouraged to do so earlier. The National Climatic Data Center even provides schedules for data that it plans to hold indefinitely, noting that intention. For most types of records, the pressure to schedule provides the useful function of preventing an agency from simply warehousing continually increasing volumes of unused records without examination. For data that an agency does not wish to destroy, but that are not frequently accessed, NARA makes available storage space without taking ownership. If NARA did not provide some worthiness test for records before agreeing to provide storage for another agency, the Federal Records Centers could become inundated with records of little value or potential for future use.
As discussed in this report, we are heading increasingly toward a system of distributed archives for electronic records. Data sets are distributed among various physical locations, and the expertise to interpret these data sets is likewise already distributed and becoming more so. The rapid increase in computer networks within the United States and in the rest of the world is beginning to significantly affect the way people access information. There is a lessening need for data users and providers to physically possess the data they need or distribute, and users are increasingly unaware of the source location(s) of the data they are accessing. NARA therefore should continue to study arrangements regarding the physical custody of electronic records, the relationship between NARA and other agencies, and how these will and should be affected by the expansion of electronic networks.
During the course of this study, the committee found that with the exception of some staff members at government data centers, many government scientists and most nongovernment scientists are not aware of the requirements of the Records Disposal Act (44 U.S.C. 3301 et seq.). Even some of those entrusted with large quantities of valuable data were largely unaware of NARA and its related responsibilities until contacted by the committee, or by its panels. This may be partially because scientists, even those within the federal government, sometimes do not respond to the bureaucratic requirements of their own institutions. The committee is encouraged that NARA is working to address this problem. Nevertheless, many panel visitors and members observed that the NARA brochures have an authoritarian and legalistic tone and are not conducive to establishing productive partnerships with NARA. NARA's future effectiveness in overseeing and advising on the archiving of scientific and technical data requires that it improve its relations with other agencies and institutions.
As a corollary, none of the committee's suggestions should be construed to imply that NARA should issue additional proclamations or regulations. The goal should be to present more carrots than sticks. For example, NARA should consider providing rewards and recognition to researchers, managers, and funders for developing and implementing successful data retention plans, with appropriate metadata. With better communications and greater sensitivity to the needs of the scientific community, NARA can play the role of a "service provider" and "appraisal consultant." For instance, NARA is already working with the DOD Legacy Resource Management Program to identify and preserve cultural resources under DOD jurisdiction. NARA and this DOD program together have sponsored a conference to assist military contractors in preserving their documentary heritage. The committee suggests that NARA pursue other such collaborations in the same spirit of partnership.
As a matter of formal responsibility and training, NARA staff are more concerned with long-term archiving issues than most staff at other agencies. NARA therefore can serve an essential role in reminding agencies of the long-term value of data and should regularly provide advice to agencies that keep scientific data on hand for extended periods of time. NARA also should conduct continuous research on retention and appraisal issues to remain well-informed. The committee recommends that NARA form standing advisory committees with managers of scientific data, historians, and scientific researchers to address the retention and appraisal of scientific and technical data collections, and related issues.
Unfortunately, NARA has almost no scientific expertise within its ranks (except related to physical records preservation). Despite the large amounts of scientific information within some federal records, NARA officials have indicated that they do not believe that they could keep a scientist on the staff interested in the work and do not plan to hire any permanent scientific personnel. Nevertheless, NARA will continue to be faced with difficult issues involving the archiving of scientific data. In the interim, the committee suggests that NARA should arrange for temporary staff assignments from the active scientific ranks of the federal government on a frequent as-needed basis. Given the great challenges that NARA will face from scientific data and the proven ability of other agencies to hold scientifically trained personnel in data management positions, NARA should rethink its position and consider creating a cadre of permanent staff with scientific expertise.
NARA also might consider setting up an in-house database to track federal holdings, especially to anticipate problems with data sets housed in other agencies that may eventually need NARA protection or other help from NARA. To do this effectively would require establishing a set of contacts in other agencies with people who understand the databases in the agency collections.
This brings us to the need for a more general locator function, or "directory of directories," for the NSIR Federation's network of networks. Archives must not be viewed or managed as data cemeteries, with only rare and dwindling visits after the deposition of data. The provision of broad access to data must be part of archive design and construction, and thus some sort of broad locator is much needed. The committee is encouraged by the recent interagency efforts, organized by the Office of Management and Budget, to develop a Government Information Locator Service. Nevertheless, there is a need for a NARA-maintained directory of archived data within its own system. This should include archived records maintained by other government agencies and federally funded institutions that are recognized as part of a distributed archive system overseen broadly by NARA. The committee recommends that NARA collaborate with other agencies that maintain long-term custody of data to develop an effective access mechanism to these distributed archives. The initial step should focus on locator systems and evolve toward a transparent access system.
Finally, with regard to its requirements for accession of data, NARA should work with the scientific community and potential sources of scientific data to develop adaptable performance criteria for data formats and media, rather than mandating narrow and inflexible product standards. The goal would be to meet NARA's basic need to ensure long-term usability while also enabling accession of data, such as images and structures, that cannot be accommodated by NARA's current restrictive file-format and media standards.
Recommendations Specifically For NOAA
As the largest holder of earth sciences data in the United States, NOAA has a vast amount of scientific data stored at many facilities across the country. The primary storage sites are the National Data Centers, which include the National Climatic Data Center (NCDC), the National Oceanographic Data Center (NODC), and the National Geophysical Data Center (NGDC). Each of these data centers now has its own on-line information service. The data centers are accessible through common nodes, for example through NOAA's web server or NASA's Master Directory server. Thus a user who understands the structure of NOAA's data holdings can navigate through the different data centers, look for data of interest in each center's holdings, and retrieve the data over the Internet. However, it is not possible to search NOAA's data holdings with the same precision and accuracy with which one can search for bibliographic data, through, for example, the Current Contents or INSPEC databases. The diversity and volume of data that the National Data Centers hold and regularly receive make it difficult to produce an overall directory for all of NOAA's data holdings. In particular, NCDC receives daily all of the weather information for the United States. Without such a general directory it is difficult for users to query across NOAA archives to locate and integrate diverse data. Moreover, once the user finds data, the variety of storage formats and data types makes access cumbersome. Thus, the committee encourages NOAA to be ambitious. Development of a new comprehensive directory covering all NOAA's holdings of geoscience data would set the standard for other agencies and would make the data much more accessible to the public.
This directory may incorporate capabilities of the many different on-line directory services currently in use at the National Data Centers, but the emphasis should be on connectivity, data access, and information. For this reason, NOAA should concentrate first on the more recent digital data that can most easily be incorporated into such a directory system. Efforts to get older analog data digitized should continue, although some data may have to remain in their original format. An important facet of this directory is to list, along with the directory entry, how to locate and access the data. Once they have located the data of interest, most users want mainly to retrieve the data in a form that they can use for further analysis.
Thus, the directory should specify the actual location of the data, as well as the methods by which the data can be acquired. Under the present NOAA system, acquisition involves a formal ordering procedure and the transfer of funds, at least for any data that must be transferred via tape or hard copy. Experimental NOAA systems (NOAA's Satellite Active Archive) make it possible to order limited satellite imagery over the network at no cost. For those orders requiring the transfer of funds, the directory service should be able to estimate the cost of the data order so that the user can factor cost into the decision to order.
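A directory entry of the kind described in this paragraph might be sketched as follows. This is a hypothetical illustration, not a description of any NOAA system: the class name `DirectoryEntry`, its fields, and the flat pricing model (a handling fee plus a per-megabyte charge, with network retrieval assumed to be free) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory record for one data set."""
    title: str
    data_center: str      # where the data are actually held, e.g. "NCDC"
    access_method: str    # "network", "tape", or "hardcopy"
    size_mb: float
    # Assumed pricing model: fixed handling fee plus a per-megabyte charge.
    handling_fee: float = 0.0
    fee_per_mb: float = 0.0

    def estimated_cost(self) -> float:
        """Estimate the order cost so a user can weigh it before ordering."""
        if self.access_method == "network":
            return 0.0  # assumption: network retrieval carries no charge
        return round(self.handling_fee + self.fee_per_mb * self.size_mb, 2)


# Usage with made-up figures.
entry = DirectoryEntry(
    title="Example satellite imagery subset",
    data_center="NCDC",
    access_method="tape",
    size_mb=200.0,
    handling_fee=25.0,
    fee_per_mb=0.5,
)
print(entry.data_center, entry.access_method, entry.estimated_cost())
```

Keeping the location, the access method, and the cost estimate in one record lets a user decide whether to order without leaving the directory.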
This interconnected NOAA directory service also would assist the NOAA data centers in their management of data. By having access to tools and techniques developed at other NOAA data centers and elsewhere in the data storage community, the NOAA data centers would be better able to stay abreast of new developments and to incorporate them into their data access systems. Similarities among various earth science data and the emerging need for interdisciplinary research make it necessary to implement such an overall directory for managing NOAA data, for both data location and access. As noted earlier, NOAA already has started to develop data directories, on-line data systems, and data access.
NOAA and NASA have made progress in data rescue and in deriving better products from old data. Since 1990, NCDC has copied thousands of tapes of satellite data that were at the end of their useful shelf life. The NOAA/NASA Pathfinder program was established to make the satellite data more generally available to researchers and to calculate new products; it has been an effective program. Although the committee supports activities to preserve old data, rescued data (including data moved to better media and analog data that have been digitized) are of little value if they cannot be accessed or retrieved. The committee advocates more emphasis on improving access to data for interested users.
Most federal agencies are now aware that storage and retrieval of data are important. Problems arise because each agency, and sometimes even different parts of the same agency, sets up data centers and facilities, and each of these establishes its own type of system. In addition, because the technology for storing data changes frequently, it is difficult if not impossible to decide just what hardware and software system should be used. This uniqueness of systems often hinders system portability and the exchange of data among systems.
There are some approaches and procedures that are designed to be technology-independent and therefore can be used to avoid some of these problems. Moreover, the technological and portability requirements for archiving, storage, and transmission are different, so a "universal" format will not work. An archival format must be utterly portable and self-describing, on the assumption that, apart from the transcription device, neither the software nor the hardware that wrote the data will be available when the data are read. A storage format should be optimized for retrieving any addressable subset of a data set. A secondary, but important, consideration is the ease with which the storage format may be cast into a transmission format. A transmission format should be optimized for ease of conversion to other formats, accommodation of both data and metadata in a single data stream, portability, and extensibility (i.e., accommodating data and metadata types and structures not yet invented). Because both NOAA and NARA have a long-term archival problem, the committee suggests that they work together to locate and test hardware and software units that can be used for this technology-independent approach. By locating the most simple common technologies, it should be possible to set up systems that are sufficiently capable, but yet are able to interact with each other. Once a few of these "standards" are set up and operating, it is likely that other users will want to run this suite of software. Ideally, this type of project would be best carried out under the auspices of the NSIR Federation.
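To make the idea of a self-describing archival format concrete, the following is a minimal sketch in which a plain-text header names each field and its unit ahead of the data, so that a reader needs nothing beyond the file itself. The header keywords (BEGIN-HEADER, field=, unit=) and the function names are assumptions invented for this example and do not correspond to any existing standard; the sketch also assumes that field names and values contain no commas or semicolons.

```python
import io


def write_self_describing(stream, description, fields, rows):
    """Write metadata and data into one plain-text stream.

    fields is a list of (name, unit) pairs; rows is a list of value tuples.
    """
    stream.write("BEGIN-HEADER\n")
    stream.write(f"description={description}\n")
    for name, unit in fields:
        stream.write(f"field={name};unit={unit}\n")
    stream.write("END-HEADER\n")
    for row in rows:
        stream.write(",".join(str(value) for value in row) + "\n")


def read_self_describing(stream):
    """Recover the description, field list, and records using only the header."""
    if stream.readline().strip() != "BEGIN-HEADER":
        raise ValueError("stream does not start with a header")
    description = ""
    fields = []
    for line in stream:
        line = line.strip()
        if line == "END-HEADER":
            break
        if line.startswith("description="):
            description = line[len("description="):]
        elif line.startswith("field="):
            name_part, unit_part = line.split(";", 1)
            fields.append((name_part[len("field="):], unit_part[len("unit="):]))
    names = [name for name, _unit in fields]
    # Values are returned as text; interpreting them is left to the reader.
    records = [dict(zip(names, line.strip().split(","))) for line in stream if line.strip()]
    return description, fields, records


if __name__ == "__main__":
    buffer = io.StringIO()
    write_self_describing(
        buffer,
        "Hypothetical annual mean temperature series",
        [("year", "calendar year"), ("temperature", "degrees Celsius")],
        [(1990, 14.2), (1991, 14.1)],
    )
    buffer.seek(0)
    print(read_self_describing(buffer))
```

Because data and metadata travel in a single stream of ordinary text, the same layout also illustrates, in a limited way, the properties asked of a transmission format; a storage format optimized for retrieving subsets would normally require an index, which this sketch omits.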
Considering the foregoing discussion, the committee makes the following recommendations:
NOAA should place a higher priority on documenting and establishing directories of its data holdings.
Furthermore, NOAA, with the active cooperation of NARA, should lead efforts to better define technology-independent standards for archiving, storing, and transmitting the data within its purview.
Finally, NOAA, as well as every other federal science agency, should ensure that all its data are shared and readily available; it fulfills its responsibility for quality control, metadata structures, documentation, and creation of data products; it participates in electronic networks that enable access, sharing, and transfer of data; and it expressly incorporates the long-term view in planning and carrying out its data management responsibilities.
The creation of the committee's proposed NSIR Federation would help provide a collaborative mechanism and more sustained peer pressure to meet these objectives, and thus enhance the value of scientific and technical data and information resources to the nation.
Appendix A
List of Acronyms
CD-ROM  Compact Disk-Read Only Memory
CENR  Committee on Environment and Natural Resources
DMC  Data Management Center
DOD  Department of Defense
DOE  Department of Energy
EROS  Earth Resources Observing System
ESDM  Earth Science Data Management
FGDC  Federal Geographic Data Committee
FITS  Flexible Image Transport System
GARP  Global Atmospheric Research Program
GCDIS  Global Change Data and Information System
GCRP  Global Change Research Program
GILS  Government Information Locator Service
HTML  HyperText Markup Language
IRIS  Incorporated Research Institutions for Seismology
IWGDMGC  Interagency Working Group on Data Management for Global Change
JANAF  Joint Army-Navy-Air Force
JCL  Joint Control Language
NARA  National Archives and Records Administration
NCDC  National Climatic Data Center
NGDC  National Geophysical Data Center
NII  National Information Infrastructure
NOAA  National Oceanic and Atmospheric Administration
NODC  National Oceanographic Data Center
NRC  National Research Council
NSDI  National Spatial Data Infrastructure
NSF  National Science Foundation
NSIR  National Scientific Information Resource
NSSDC  National Space Science Data Center
OMB  Office of Management and Budget
PDS  Planetary Data System
PO.DAAC  Physical Oceanography Distributed Active Archive Center
SGML  Standard Generalized Markup Language
TCP-IP  Transmission Control Protocol-Internet Protocol
USGS  United States Geological Survey
USNRC  United States Nuclear Regulatory Commission
WWSSN  World-Wide Standardized Seismographic Network
Appendix B
Minority Opinion
This report has a wealth of good material in it, but I feel that I must write a minority opinion on one main issue, the committee's recommendation to create the NSIR Federation. I think that the exact functions of the NSIR Federation are still not clear enough to immediately form it, especially since mechanisms to coordinate data activities already exist.
A group such as the NSIR Federation would not be a good method to set the hardware standards that are used in data systems (networks, tapes, etc.). The coordinated part of data directory efforts can be built around present interagency work. It is reasonable that NARA should request lists of data sets intended for long-term archival, but most of the process of evaluating data sets needs to be kept close to the working level. The discussion of standardization in the report should not be interpreted to mean that all agencies and archives should be forced to adopt certain standards and rework their data holdings into a common form and format. There are other concerns for which an analysis of the issues could be useful, but I believe that the NSIR Federation requires a better description of tasks and more debate before such a new body is established. Otherwise we may have more coordination, more systems, more cost, and less data.
Consider the important task of developing information about data. Information about data sets is needed in at least two or three levels of detail. At the highest level of information, the Master Directory methods that are in place for the GCDIS can be adopted (or even simplified more) to describe the data sets. This interagency Directory Interchange Format (DIF) is used nationally and internationally. We need to keep it simple enough so that people will submit the information. Some agency-level catalog efforts for data sets have existed since about 1968, and became more serious in the late 1970s. We should build on the GCDIS catalog efforts, and certainly not invent more complicated systems. Other data information efforts are needed, but they will be based on a bottom-up flow of ideas, on workshops, and the like. Each data system does not have to do exactly the same thing, but they must be easy to use. It is not clear that a formal NSIR Federation is needed to coordinate this.
How does the NSIR Federation relate to other data coordinating mechanisms? The Interagency Working Group on Data Management for Global Change (IWGDMGC) meets regularly to help coordinate data issues across many "global change" disciplines, which include air, water, ice, rocks, soils, and some biology. It seems to me that the IWGDMGC and the proposed NSIR Federation are mainly trying to do the same thing. They cover much of the same turf in terms of disciplines. They both want information about data, access to data, and data that will exist for more than 20 years. If we create separate organizations doing roughly the same thing, then it becomes even less likely that key agency