© University of Reading 2008 Reading e-Science Centre 9 September 2008 Harmonization of...

9 September 2008 © University of Reading 2008 www.reading.ac.uk Reading e-Science Centre Harmonization of environmental data using the Climate Science Modelling Language Jon Blower, Alastair Gemmell (Reading e-Science Centre) Andrew Woolf, Dominic Lowe, Arif Shaon (STFC e-Science Centre) Stephen Pascoe (British Atmospheric Data Centre) Keiran Millard, Quillon Harphem (HR Wallingford)


Page 1:

9 September 2008

© University of Reading 2008 www.reading.ac.uk

Reading e-Science Centre

Harmonization of environmental data using the Climate Science Modelling Language

Jon Blower, Alastair Gemmell (Reading e-Science Centre)

Andrew Woolf, Dominic Lowe, Arif Shaon (STFC e-Science Centre)

Stephen Pascoe (British Atmospheric Data Centre)

Keiran Millard, Quillon Harphem (HR Wallingford)

Page 2:

We need to integrate and compare lots of different types of data…

Page 3:

[Figure: four panels comparing SSM/I (satellite), ERA-40 (re-analysis product), HadCM3 (low-res. climate GCM) and HiGEM (hi-res. climate GCM, new physics). Credit: Putt, Gurney and Haines]

…for validating numerical models…

Page 4:

… calibrating instruments …


Page 5:

…data assimilation…

[Figure legend]

Black line: control run

Green stars: observations

Red line: assimilation run

Axis label: time

Page 6:

… and making predictions

• Flood prediction

• Search and rescue

• Climate prediction

Page 7:

Where we are now (mostly)

Separate websites for each data provider

Page 8:

The need for harmonization

• Each community has evolved its own means for presenting data:
– File formats
– Metadata conventions
– Coordinate systems

• These are not usually mutually compatible

• … and vital metadata can be missing

• No widely-accepted standards exist for certain types of data

• Hence scientists spend lots of time dealing with low-level technical issues

• Need a common view onto all these datasets

Page 9:

Open Geospatial standards

• Aim to describe all geographic data

• XML encoding
– Geography Markup Language

• Web Services for data exchange

• Rooted in international standards

• Mandated by European INSPIRE directive

• But fiendishly complex

• Evolved from map-oriented systems
– Vertical and temporal information not handled cleanly

Page 10:

Bridging the gap: CSML

• Climate Science Modelling Language
– Abstract data model defined using ISO/OGC approach
– XML encoding based upon GML

• Adapts open geospatial standards to environmental science data
– “Best of both worlds”

• Wraps existing data
– Doesn’t expect providers to convert data

• Data are seen as geographical “features”, not as a file system

Page 11:

Selected CSML Feature Types

• PointSeriesFeature (time series at a point)

• ProfileFeature (vertical profile at a point)

• GridSeriesFeature (series of multidimensional grids)

• SwathFeature (single satellite sweep)

• SectionFeature (vertical section)

Feature Types are classified by their geometry
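
One way to picture this classification in Java is a small enumeration that pairs each feature type named on the slide with the geometry that defines it. This is only an illustrative sketch: the `FeatureType` enum, its `geometry()` method and the `FeatureTypeDemo` class are hypothetical names and are not taken from the Java-CSML library.

```java
// Hypothetical sketch: the enum and method names are illustrative only,
// not the Java-CSML API. The geometry descriptions come from the slide.
enum FeatureType {
    POINT_SERIES("time series at a point"),
    PROFILE("vertical profile at a point"),
    GRID_SERIES("series of multidimensional grids"),
    SWATH("single satellite sweep"),
    SECTION("vertical section");

    private final String geometry;

    FeatureType(String geometry) {
        this.geometry = geometry;
    }

    /** Returns the geometry that distinguishes this feature type. */
    String geometry() {
        return geometry;
    }
}

class FeatureTypeDemo {
    public static void main(String[] args) {
        // List every feature type alongside its defining geometry.
        for (FeatureType type : FeatureType.values()) {
            System.out.println(type + ": " + type.geometry());
        }
    }
}
```

Because the geometry is the classifier, a plotting routine could choose a sensible default plot from the feature type alone, without knowing which file format the data came from.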

Page 12:

Harmonizing 2 databases using CSML

• Different data providers, different internal representation
– Met Office “MIDAS” dataset
– “Environmental Change Network” dataset

• Modelled both databases as collections of CSML PointSeriesFeatures

• Allowed sharing of plotting and analysis tools
– CSML-XML documents converted to maps, plots and KML

• Intermediate step via XML not necessary in ideal world
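
The idea of modelling two differently structured sources as the same feature type can be sketched as follows. This is a minimal illustration under loud assumptions: the interface, both provider classes, their storage layouts and the kelvin/Celsius difference are all invented for the example. They do not reflect the real MIDAS or Environmental Change Network schemas, nor the Java-CSML API.

```java
import java.util.ArrayList;
import java.util.List;

/** Common view shared by both providers: a time series of values at one location. */
interface PointSeriesFeature {
    String stationName();
    int size();
    double time(int i);   // hours since an arbitrary reference time (assumed)
    double value(int i);  // temperature in degrees Celsius (assumed)
}

/** Provider A (assumed layout): parallel arrays, temperatures stored in kelvin. */
final class ProviderASeries implements PointSeriesFeature {
    private final String station;
    private final double[] hours;
    private final double[] kelvin;

    ProviderASeries(String station, double[] hours, double[] kelvin) {
        if (hours.length != kelvin.length) {
            throw new IllegalArgumentException("hours and kelvin must have equal length");
        }
        this.station = station;
        this.hours = hours.clone();
        this.kelvin = kelvin.clone();
    }

    @Override public String stationName() { return station; }
    @Override public int size() { return hours.length; }
    @Override public double time(int i) { return hours[i]; }
    // Convert on access so the stored data stay untouched (the wrapper does not rewrite them).
    @Override public double value(int i) { return kelvin[i] - 273.15; }
}

/** Provider B (assumed layout): one row per observation, {hour, degrees Celsius}. */
final class ProviderBSeries implements PointSeriesFeature {
    private final String site;
    private final List<double[]> rows = new ArrayList<>();

    ProviderBSeries(String site, List<double[]> rows) {
        this.site = site;
        for (double[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("each row must be {hour, degC}");
            }
            this.rows.add(row.clone());
        }
    }

    @Override public String stationName() { return site; }
    @Override public int size() { return rows.size(); }
    @Override public double time(int i) { return rows.get(i)[0]; }
    @Override public double value(int i) { return rows.get(i)[1]; }
}

/** A shared analysis tool that sees only the common feature type. */
final class SeriesAnalysis {
    private SeriesAnalysis() {}

    static double mean(PointSeriesFeature series) {
        if (series.size() == 0) {
            throw new IllegalArgumentException("series is empty");
        }
        double sum = 0.0;
        for (int i = 0; i < series.size(); i++) {
            sum += series.value(i);
        }
        return sum / series.size();
    }
}

class HarmonizationDemo {
    public static void main(String[] args) {
        PointSeriesFeature a = new ProviderASeries(
                "station-A", new double[] {0, 1}, new double[] {283.15, 285.15});
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {0, 10.0});
        rows.add(new double[] {1, 14.0});
        PointSeriesFeature b = new ProviderBSeries("site-B", rows);

        // The same routine serves both providers.
        for (PointSeriesFeature f : new PointSeriesFeature[] {a, b}) {
            System.out.printf("%s: mean = %.2f degC%n", f.stationName(), SeriesAnalysis.mean(f));
        }
    }
}
```

In this sketch the wrappers work directly on in-memory objects, which is the situation the last bullet describes: no intermediate XML document is needed once both sources present the same feature type.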

Page 13:

Java-CSML

• Need reusable libraries to apply CSML more widely

• Aim is to reduce cost of developing data-driven applications

• Interoperates with other means of modelling data in Java:
– GeoAPI, Common Data Model

• High-level analysis/visualization routines completely decoupled from low-level data access

Page 14:

Java-CSML: Design attempts

1. Transform CSML’s XML schema to Java code using automated tool
• Led to very deeply-nested code

2. Based upon OGC-sponsored GeoAPI
• Incomprehensible unless very familiar with ISO standards
• GeoAPI is a moving target

3. Based on well-known Java concepts
• Accessible to “typical” Java programmer
• Compatibility with other data models assured through wrappers
• Insulated against inevitable changes to standards
• More code needs to be written by Java-CSML designers
• Less code needs to be written by users

Page 15:

Java-CSML Application 1: Coastal oceanography decision support system

Red line: Smartbuoy data

Blue dots: model output

Page 16:

Behind the scenes

Smartbuoys (via Web Feature Service)

Physical model (via NetCDF files)

Biological model (via OPeNDAP server)

Java-CSML wrappers

Java-CSML plotting routines

Page 17:

Java-CSML Application 2: Atmospheric ozone

Control run

Assimilation run

Page 18:

Specializing CSML Features

• A generic data model can’t encode all possible metadata without becoming extremely complex

• In CSML generic feature types can be specialized
– cf. object-oriented inheritance

• Hence core data model retains simplicity

[Diagram: ArgoProfileFeature, which adds the field "int qualityFlag", specializes ProfileFeature]
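
The specialization shown on this slide maps naturally onto Java inheritance. The sketch below is hedged accordingly: only the names `ProfileFeature`, `ArgoProfileFeature` and the field `int qualityFlag` come from the slide. The constructor arguments, accessor methods, the `ProfileTools` helper and the depth convention are assumptions made for illustration, and the meaning of individual flag values is left unspecified.

```java
/** Generic feature: a vertical profile of values at a single point. */
class ProfileFeature {
    private final double longitude;
    private final double latitude;
    private final double[] depths;  // metres below the surface (assumed convention)
    private final double[] values;

    ProfileFeature(double longitude, double latitude, double[] depths, double[] values) {
        if (depths.length == 0 || depths.length != values.length) {
            throw new IllegalArgumentException(
                    "depths and values must be non-empty and equal in length");
        }
        this.longitude = longitude;
        this.latitude = latitude;
        this.depths = depths.clone();
        this.values = values.clone();
    }

    double longitude() { return longitude; }
    double latitude() { return latitude; }
    int size() { return depths.length; }
    double depth(int i) { return depths[i]; }
    double value(int i) { return values[i]; }
}

/** Specialized feature: carries Argo-specific metadata on top of the generic profile. */
class ArgoProfileFeature extends ProfileFeature {
    private final int qualityFlag;

    ArgoProfileFeature(double longitude, double latitude,
                       double[] depths, double[] values, int qualityFlag) {
        super(longitude, latitude, depths, values);
        this.qualityFlag = qualityFlag;
    }

    int qualityFlag() { return qualityFlag; }
}

final class ProfileTools {
    private ProfileTools() {}

    /** Generic routine: works for ProfileFeature and every specialization of it. */
    static double deepestValue(ProfileFeature profile) {
        int deepest = 0;
        for (int i = 1; i < profile.size(); i++) {
            if (profile.depth(i) > profile.depth(deepest)) {
                deepest = i;
            }
        }
        return profile.value(deepest);
    }
}

class SpecializationDemo {
    public static void main(String[] args) {
        ArgoProfileFeature argo = new ArgoProfileFeature(
                -30.0, 45.0, new double[] {5, 100, 500}, new double[] {18.2, 12.1, 7.4}, 1);
        // Generic tools accept the specialized type unchanged.
        System.out.println("Deepest value: " + ProfileTools.deepestValue(argo));
        // Only Argo-aware code needs to know about the extra metadata.
        System.out.println("Quality flag: " + argo.qualityFlag());
    }
}
```

The generic routine never sees `qualityFlag`, so the core model stays small while Argo-aware applications can still reach the extra metadata.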

Page 19:

Java-CSML Application 3: Ocean data assimilation

[Figure labels: ArgoProfileFeature, ProfileFeature]

Red lines: Argo data

Blue lines: model output

Page 20:

Summary

• CSML bridges gap between bottom-up (science) and top-down (GIS) approaches to modelling data
– Wraps existing data holdings

• Data modelled as Feature Types distinguished by geometry and “sensible plotting”
– Complexity managed through feature inheritance

• Doesn’t attempt to model everything!
– Other technologies deal with discovery, provenance, security…

• Java-CSML framework allows data intercomparison applications to be built quickly
– Automates tedious and error-prone tasks

Page 21:

Wider lessons

• “Interoperable” data formats not necessarily suitable for storage
– Because no single data model can satisfy every application
– Abstraction usually leads to data loss!

• Trade-offs between scope and complexity
– Don’t attempt to put everything in one specification

• Symbiotic relationship between standards, tools and applications
– Must be developed in parallel