Posted on 24 May 2015

Why are quality control and quality assurance important for the legacy of GEOTRACES through its database?

Adam Leadbetter (alead@bodc.ac.uk), British Oceanographic Data Centre

Outline

- Data matter!

- Why compatible data?

- The GEOTRACES database

- A data intensive future…

1. Data Matter!

Presenter
Presentation Notes
Some quotes to set the context for my talk

Data matter!

“A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.”

Santiago Ramón y Cajal (Nobel Prize winner, 1906), in Advice for a Young Investigator (1897)

Data matter!

“You are not finished until you have done the research, published the results, and published the data, receiving formal credit for everything. Preserve or Perish”

Mark Parsons, US National Snow and Ice Data Center, Data Management for the International Polar Year (2006)

2. Why compatible data?

Why compatible data?

“If HTML and the [World Wide] Web made all the online documents look like one huge book, [compatibility] will make all the data in the world look like one huge database.”

Sir Tim Berners-Lee, W3C, Weaving the Web (1999)

Why compatible data?

- The Linked Data cloud is built on compatible data

- Similarly, the GEOTRACES database builds on compatible data

- How?

Why compatible data?

- Intercalibration (IC) for QC / QA

- Only on the legacy database

- A distinction must be made where IC has not happened

- There may be older “compliant data” that do not meet the standards

Why compatible data?

- Standards

- Metadata
  - Bottle: type & make
  - Filter: type & make
  - Analytical method

- Parameter codes

- Allows data merging & long-term data archiving

Presenter
Presentation Notes
GEOTRACES has set metadata standards, based on BODC’s standards, which feed into European and global standards (SeaDataNet / ISO / INSPIRE). The aim is enough metadata to load the data into the system: who, what, when, where and why (project / cruise reports). Passing QC / QA means you’ve met the standard; the data are screened. More detail in the sections below.
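To make the metadata slide more concrete, here is a minimal Python sketch of a metadata record holding the fields listed above (who, what, when, where, plus bottle, filter, analytical method and parameter code), a check for empty fields that would block loading, and a merge test based on parameter codes. The class, field and function names and the parameter code value are hypothetical illustrations, not the GEOTRACES or BODC schema or vocabulary.

```python
from dataclasses import dataclass, fields


@dataclass
class SampleMetadata:
    """Hypothetical metadata record; field names are illustrative, not an official schema."""
    originator: str         # who
    cruise: str             # what / why (links to project and cruise reports)
    date: str               # when (ISO 8601 date string)
    latitude: float         # where
    longitude: float
    bottle_type: str
    bottle_make: str
    filter_type: str
    filter_make: str
    analytical_method: str
    parameter_code: str     # code from a controlled vocabulary (placeholder value below)


def missing_fields(record: SampleMetadata) -> list[str]:
    """Names of fields left empty, i.e. the reasons this record could not be loaded."""
    return [f.name for f in fields(record) if getattr(record, f.name) in ("", None)]


def mergeable(a: SampleMetadata, b: SampleMetadata) -> bool:
    """Records from different cruises can share a column only if their parameter codes match."""
    return a.parameter_code == b.parameter_code


record = SampleMetadata(
    originator="A. N. Other", cruise="EXAMPLE-CRUISE", date="2010-06-01",
    latitude=50.0, longitude=-20.0,
    bottle_type="example bottle", bottle_make="example maker",
    filter_type="example filter", filter_make="",
    analytical_method="example method", parameter_code="HYPOTHETICAL_DFE",
)
print(missing_fields(record))  # ['filter_make']
```

Keying the merge on a shared parameter code, rather than on free-text column names, is what lets data from different submitters end up in the same column.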

Why compatible data?

- Merging

- Allows easy management of “crossover stations”

- Marked as “fixed stations” in the database

- Enables comparison of data between cruises
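A crossover station comparison can be sketched as pairing samples that two cruises took at the same station by depth and differencing their values. This minimal Python sketch assumes each cruise profile is a dict mapping depth (m) to a measured value, and uses a hypothetical 5 m depth tolerance; the function name and the profiles are made-up illustrations, not GEOTRACES data or database code.

```python
def crossover_differences(profile_a, profile_b, depth_tol=5.0):
    """Pair samples from two cruises that occupied the same (crossover) station.

    Each profile maps sample depth in metres to a measured value. For every depth
    in profile_a, the nearest depth in profile_b is found; if the two depths agree
    within depth_tol metres, a (depth_a, depth_b, value_a - value_b) tuple is kept.
    """
    if not profile_b:
        return []
    pairs = []
    for depth_a, value_a in sorted(profile_a.items()):
        # Nearest sampled depth on the other cruise
        depth_b = min(profile_b, key=lambda d: abs(d - depth_a))
        if abs(depth_b - depth_a) <= depth_tol:
            pairs.append((depth_a, depth_b, value_a - profile_b[depth_b]))
    return pairs


# Made-up example profiles (hypothetical values, not real data)
cruise_a = {100.0: 0.52, 500.0: 0.61, 1000.0: 0.70}
cruise_b = {102.0: 0.50, 998.0: 0.71, 2000.0: 0.75}

for depth_a, depth_b, diff in crossover_differences(cruise_a, cruise_b):
    print(f"{depth_a:.0f} m vs {depth_b:.0f} m: difference {diff:+.2f}")
# 100 m vs 102 m: difference +0.02
# 1000 m vs 998 m: difference -0.01
```

The 500 m sample on the first cruise has no partner within the tolerance, so it is left out rather than compared against a distant depth.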

Why compatible data?

- Mantra

“To make the data accessible and usable in 5, 10, 30… years’ time without the need to contact the data originator.”

Presenter
Presentation Notes
This does not take the credit away from the originator. But on a bad day I can’t remember what I was doing that morning… This is the ultimate goal of QA / QC-ing the data that are submitted.

3. The GEOTRACES database

http://www.bodc.ac.uk/geotraces/

Presenter
Presentation Notes
Global coverage. Yellow = cruises that have happened; black = cruises from the IPY (International Polar Year) years; red = planned sections.

The GEOTRACES database

- Key parameters
  - Trace elements
  - Stable isotopes
  - Radioactive isotopes
  - Radiogenic isotopes
  - Others, to allow future work to be done

- Supporting parameters
  - Salinity, temperature, O₂, nutrients

http://www.bodc.ac.uk/geotraces/

Presenter
Presentation Notes
Trace elements, e.g. Fe (essential micronutrient) and Mn (tracer of Fe inputs and redox cycling). Stable isotopes: δ¹⁵N, δ¹³C. Radioactive isotopes: ²³⁰Th, ²³¹Pa. Radiogenic isotopes: Pb, Nd. Particles / aerosols. Nutrients: nitrate, phosphate, silicic acid.

The GEOTRACES database

- 2014: Intermediate data product

- It will only include:
  - Submitted data (get your data in by 2013)
  - Intercalibrated data
  - Data passed by the IC committee
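The three inclusion criteria above amount to a simple filter. Here is a minimal Python sketch, assuming a hypothetical Submission record with a submission year and two yes/no flags; the names are illustrative, and in practice the intercalibration decision is made by the IC committee, not by code.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical summary of one submitted dataset; fields are illustrative only."""
    name: str
    submitted_year: int
    intercalibrated: bool
    passed_ic_committee: bool


def eligible_for_2014_product(s: Submission, deadline_year: int = 2013) -> bool:
    """True only if all three criteria hold: submitted by the deadline,
    intercalibrated, and passed by the IC committee."""
    return (
        s.submitted_year <= deadline_year
        and s.intercalibrated
        and s.passed_ic_committee
    )


submissions = [
    Submission("dataset A", 2012, True, True),
    Submission("dataset B", 2014, True, True),    # submitted after the deadline
    Submission("dataset C", 2013, False, False),  # not intercalibrated yet
]
print([s.name for s in submissions if eligible_for_2014_product(s)])  # ['dataset A']
```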

Presenter
Presentation Notes
Data product led by Reiner Schlitzer at AWI (Alfred Wegener Institute)

The GEOTRACES database

Presenter
Presentation Notes
DOIs bring us back to the Parsons quote: the full data lifecycle is achieved… Data publications: e.g. ESSD (Earth System Science Data); the RMetS/Wiley Geoscience Data Journal; data letters in G3 (Geochemistry, Geophysics, Geosystems).

4. A data intensive future

Presenter
Presentation Notes
A few thoughts on the future. Many ideas are borrowed from Fox (RPI) and Diviacco (OGS, Trieste).

A data intensive future

“We know more than we can tell.”

Michael Polanyi, Fellow of the Royal Society, The Tacit Dimension (1967)

Presenter
Presentation Notes
So how can we tell more? We have to define where data fit into our scientific lives, and maybe even examine the way in which we conduct science.

A data intensive future

[Diagram: Data → Information → Knowledge, from producers to consumers. Labels: context; experience; creation / gathering; presentation / organization; integration / conversation.]

A data intensive future

Induction: Observation → Pattern → Tentative hypothesis → Theory

Deduction: Theory → Hypothesis → Observation → Confirmation

A data intensive future

Abduction: a method of logical inference, introduced by C. S. Peirce, that comes prior to induction and deduction; its colloquial name is having a “hunch”.

• It starts when an inquirer considers a set of seemingly unrelated facts,

• armed with an intuition that they are somehow connected, and …

• But it is data intensive!!

• And this can be a job for visualization!!!

Presenter
Presentation Notes
This is supported by comparable, compatible data: the GEOTRACES project database is a perfect platform for abductive reasoning.

Conclusions

- Data matter – and increasingly so!

- The GEOTRACES data assembly centre aids in making data compatible

- The GEOTRACES database will be a big legacy

- Who knows how it may end up being used?

Conclusions

- Low-quality data have higher costs

- High-quality data require communication

- Need a planned QA & QC strategy

- Investment in training

- Best practices

- Use appropriate tooling

- Extensive metadata to prevent “data entropy”

Robinson, Meyer & Lenhardt (2012). Eos 93(19), 189

Thank you

alead@bodc.ac.uk, @AdamLeadbetter