Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science – ongoing work
Raul Palma
Poznan Supercomputing and Networking Center, Poland
Dagstuhl seminar: Reproducibility of Data-Oriented Experiments in e-Science
January 2016
Context
• Project ID: 674907
• Project Type: RIA
• Start Date: 01.10.2015
• Duration: 36 months
• Website: TBC
• Maximum Grant Amount: 6,649,002 €
• Total funded effort in person-months: 663
• Coordinator: European Space Agency
• Contact Person: Mirko Albani (ESA)
EVEREST Consortium
Key objectives
• Establish a VRE e-infrastructure for Earth Science (ES), addressing the needs of different ES communities to facilitate their collaborative work and research, in order to:
  • Discover, access, assess and process existing and new heterogeneous ES datasets and preserved knowledge held by distributed data centres
  • Share data, models, algorithms, scientific results and their own experiences within a community or across communities
  • Capture, annotate and store the workflows, processes and results of their research activities
  • Ensure the long-term sustainability and preservation of data, models, workflows, tools and services developed by existing communities
• Validate the VRE with four main Virtual Research Communities (VRCs):
  • Sea Monitoring VRC
  • Natural Hazards VRC (floods, geological, weather, wildfires)
  • Land Monitoring VRC
  • Supersites VRC (volcanoes and seismic)
Key objectives
• Define, implement and validate the Research Object (RO) concepts and technologies within the ES context as the means for sharing information and establishing more effective collaboration in the VRE
Reproducibility aspects
Earth Science Research and Information Lifecycle (high level story)
Experimental Science (to compare)
[Figure: research lifecycle in experimental science. Labels: Background, Hypothesis, Assumptions, Input data, Method; Experiment Results (data); Scientific Interpretation; Publication, Results (Data); Contribution to Science (communicate contribution to the community); Contribution to Research Community; Peer review: "Are these novel findings? Was the method sound?"; Reader: "I trust that this method is sound."; Reuse (incremental).]
Supersite Science - ES VRC (more concrete story)
• Historical science, mostly based on past observations, as opposed to experimental science
• Testing hypotheses is not normally the main activity
• Main activities of the VRC:
  • measure geophysical parameters in the natural environment
  • derive information on the effects of the phenomena and processes
  • model this information to generate space/time representations of geophysical phenomena
  • provide these representations to risk management stakeholders
  • use the information to develop theories or confirm hypotheses
Supersite VRC operational scenario
With a Supersite agreement in place:
• In situ data providers (normally local monitoring agencies) provide open access to their data collections, including raw and processed data, under a data policy
• Space agencies acquire and distribute satellite EO data (under personal licenses that must be signed)
• Authorized scientists should be able to access and display the data online, process them using community tools, validate the results, model the validated data, generate research products and build consensus on scientific information for end-users
• Authorized (local) end-users should be able to access the scientific information online and provide feedback
• The general public should be able to browse part of the data, the published results, and part of the scientific information provided to users (if the latter authorize disclosure)
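The three access levels in this scenario can be sketched as a small role-to-permission table. This is a minimal Python sketch under stated assumptions: the role names, the action names (such as `process_data`) and the `is_allowed` helper are hypothetical and do not describe any EVEREST component; only the three user groups and the disclosure condition come from the scenario above.

```python
from enum import Enum


# Hypothetical roles mirroring the three user groups named in the scenario.
class Role(Enum):
    SCIENTIST = "authorized scientist"
    END_USER = "authorized end-user"
    PUBLIC = "general public"


# Hypothetical action names; the slide lists activities, not a permission model.
PERMISSIONS = {
    Role.SCIENTIST: {
        "access_data", "display_data", "process_data", "validate_results",
        "model_data", "generate_products", "build_consensus",
    },
    Role.END_USER: {"access_scientific_info", "provide_feedback"},
    Role.PUBLIC: {
        "browse_public_data", "browse_published_results",
        "browse_disclosed_scientific_info",
    },
}


def is_allowed(role: Role, action: str, disclosure_authorized: bool = False) -> bool:
    """Return True if `role` may perform `action` under this sketch's policy."""
    # Public access to scientific information depends on end-user consent.
    if action == "browse_disclosed_scientific_info" and not disclosure_authorized:
        return False
    return action in PERMISSIONS[role]
```

For example, `is_allowed(Role.PUBLIC, "process_data")` returns `False`, while an authorized scientist may process data. A real deployment would more likely delegate this to the VRE's identity and access management layer rather than a hard-coded table.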
Research Objects in Supersite VRC
Current main use scenarios:
• Documentation/communication
• Reproducibility of scientific results
Documentation/communication
• Document best practices (WFs, analysis methods, monitoring methods, etc.)
• Training purposes
• Provide long-term preservation of scientific knowledge (how data are analyzed, how results are validated, etc.)
• Provide long-term preservation of end-user stories (demonstrating scientist-end-user interactions)
• Public dissemination
• Provide good management of intellectual property, through licensing and PID/DOI, to allow fast recognition of work
• Others TBD
Reproducibility of scientific results
• Execute “standard” WFs for data analysis/modelling, in order to:
  • validate results
  • generate “standard” products (e.g. deformation maps) as mass products
  • support training
• Test algorithms and data, either by modifying the WF to execute new analysis methods/models on the same dataset, or by executing the original WF on datasets from different Supersites
• Others TBD
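One way to make "executing the original WF" checkable is to record each run's method, parameters and input/output digests, then re-run and compare. The Python sketch below assumes a workflow step can be wrapped as a plain function; `run_workflow`, `reproduces` and the `moving_average` stand-in method are hypothetical illustrations, not EVEREST or RO tooling.

```python
import hashlib
import json
from typing import Callable, Sequence

# A "method" here is any function mapping a dataset and parameters to results.
Method = Callable[[Sequence[float], dict], list]


def digest(obj) -> str:
    """Stable SHA-256 digest of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def run_workflow(method: Method, dataset: Sequence[float], params: dict) -> dict:
    """Run one workflow step and record what is needed to re-execute it."""
    results = method(dataset, params)
    return {
        "method": method.__name__,
        "params": params,
        "input_digest": digest(list(dataset)),
        "output_digest": digest(results),
        "results": results,
    }


def reproduces(record: dict, method: Method, dataset: Sequence[float]) -> bool:
    """Re-run with the recorded parameters and compare both digests."""
    rerun = run_workflow(method, dataset, record["params"])
    return (rerun["input_digest"] == record["input_digest"]
            and rerun["output_digest"] == record["output_digest"])


# Hypothetical stand-in analysis method: a moving average with a window size.
def moving_average(data: Sequence[float], params: dict) -> list:
    w = params["window"]
    return [round(sum(data[i:i + w]) / w, 6) for i in range(len(data) - w + 1)]


record = run_workflow(moving_average, [1.0, 2.0, 3.0, 4.0], {"window": 2})
```

Running the recorded method on a different Supersite dataset, or swapping in a different method on the same dataset, produces a new record rather than a reproduction. Exact digest comparison is deliberately strict; floating-point results from different HW or SW versions may differ slightly, so a tolerance-based comparison against reference data may also be needed.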
Some issues in reproducibility
• The VRC is not (yet) using formalized WFs. Their use, and the use of ROs, must be promoted through a simple, incremental approach.
• Data access may be tricky, since data formats and metadata could depend on the Supersite.
• Some datasets (and most results) are not maintained by external sources and should be stored in the VRE (and exported as web services to the outside).
• WF reproducibility can be a problem, since WFs could use a mix of COTS and scientific SW, with licensing, HW compatibility and computational resource issues. Web processing services are not used at present.
• WFs are rarely fully automated. Some may require considerable manual intervention. Others use a trial-and-error procedure: during repeated executions one could discard some data or choose different parameters. In general, some internal WF decisions may be based on expert judgment and should be documented.
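Because internal WF decisions may rest on expert judgment, one lightweight option is to log each manual step alongside the run. The sketch below is a hypothetical structure (the `Decision` and `RunLog` classes and their fields are assumptions, not an existing RO vocabulary) that captures discarded data and chosen parameters together with the rationale, and serializes them to JSON so they could be aggregated into an RO.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Decision:
    """One manual or expert-judgment step taken during a workflow run."""
    step: str        # hypothetical WF step name
    action: str      # e.g. "discard_data" or "set_parameter"
    detail: dict     # what was discarded, or which value was chosen
    rationale: str   # the expert's justification, in free text
    author: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass
class RunLog:
    """Decisions recorded for one (possibly repeated) workflow execution."""
    run_id: str
    decisions: list = field(default_factory=list)

    def record(self, step: str, action: str, detail: dict,
               rationale: str, author: str) -> None:
        self.decisions.append(Decision(step, action, detail, rationale, author))

    def to_json(self) -> str:
        # asdict recurses into the nested Decision dataclasses.
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; step names and reasons are hypothetical.
log = RunLog("run-001")
log.record("image_selection", "discard_data", {"images": ["img_17"]},
           "strong atmospheric artefacts", "analyst A")
```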
Research Object example
RO example for the Supersite VRC
• Ground deformation mapping is a typical use case for this VRC. It may be carried out by different researchers on different volcanoes, or even on the same volcano.
• It normally consists of two consecutive WFs:
  • the analysis of a multitemporal InSAR image dataset to calculate ground displacement time series
  • the validation of the results by comparison with other data or results
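The validation WF compares the InSAR results with other data. As one illustration, assuming GPS displacements have already been projected onto the SAR line of sight and referenced to the same point, agreement at the N shared epochs t can be summarized by a root-mean-square difference, sqrt((1/N) Σ (d_InSAR(t) - d_GPS(t))²). The function name and the dictionary input format are assumptions; real comparisons usually also need temporal interpolation and reference-frame handling.

```python
import math


def rmse_at_common_epochs(insar: dict, gps: dict) -> float:
    """Root-mean-square difference between two displacement time series.

    Both inputs map an epoch label (e.g. an ISO date) to displacement in mm.
    Only epochs present in both series are compared.
    """
    common = sorted(set(insar) & set(gps))
    if not common:
        raise ValueError("no common epochs to compare")
    squared = [(insar[t] - gps[t]) ** 2 for t in common]
    return math.sqrt(sum(squared) / len(squared))
```

What RMSE counts as acceptable agreement is a scientific judgment for the VRC, not something this sketch decides.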
[Figure: RO for Volcano deformation mapping]
• The main engine of the WF is the analysis SW (COTS): SARscape, which requires IDL. Other scientists may be more comfortable using other SW, or even remote processing services (such as those provided by the GEP).
• Input data are normally accessed through remote web services: ESA Virtual Archive, Sentinel Hub, DLR Supersite portal, ASI Data Gateway.
• Validation data (GPS time series, previous deformation data, levelling data) are not always provided as a service.
• Output results must be placed in the VRC database and exported as web services. They are subsequently used by other scientists during a consensus process to generate a final product for the end-users.
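To make the RO contents concrete, the Python sketch below aggregates the resources described above into a simple JSON manifest. It does not follow any particular RO serialization format; the `Resource` and `ResearchObject` classes, the role labels and the `example:` identifiers are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Resource:
    uri: str                  # hypothetical identifier or service endpoint
    role: str                 # e.g. "input", "workflow", "validation", "output"
    description: str
    as_service: bool = False  # True if accessed or exported as a web service


@dataclass
class ResearchObject:
    title: str
    software: list = field(default_factory=list)   # runtime dependencies
    resources: list = field(default_factory=list)

    def aggregate(self, resource: Resource) -> None:
        self.resources.append(resource)

    def by_role(self, role: str) -> list:
        return [r for r in self.resources if r.role == role]

    def manifest(self) -> str:
        # asdict converts the nested Resource objects to plain dicts.
        return json.dumps(asdict(self), indent=2)


ro = ResearchObject("Volcano deformation mapping", software=["SARscape (COTS)", "IDL"])
ro.aggregate(Resource("example:sar-stack", "input",
                      "Multitemporal SAR image dataset from a remote archive", True))
ro.aggregate(Resource("example:insar-wf", "workflow", "Multitemporal InSAR analysis WF"))
ro.aggregate(Resource("example:validation-wf", "workflow", "Validation WF"))
ro.aggregate(Resource("example:gps-ts", "validation", "GPS time series (not a service)"))
ro.aggregate(Resource("example:displacement-ts", "output",
                      "Ground displacement time series, exported as a service", True))
```

The `as_service` flag is meant to reflect the distinction drawn above: input data and outputs are reached through web services, while validation data often are not.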