Quantifying uncertainty in the UK carbon flux

Tony O’Hagan, University of Sheffield

IMA Scottish Branch, 29 May 2008

Outline

The carbon flux problem

Quantifying input uncertainties

Propagating uncertainty

Results

Computer models

In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes

For understanding, prediction, control

There is a growing realisation of the importance of uncertainty in model predictions

Can we trust them?

Without any quantification of output uncertainty, it’s easy to dismiss them

Examples

Climate prediction

Molecular dynamics

Nuclear waste disposal

Oil fields

Engineering design

Hydrology

Carbon flux

Vegetation can be a major factor in mitigating the increase of CO2 in the atmosphere

And hence reducing the greenhouse effect

Through photosynthesis, plants take up atmospheric CO2

Carbon builds new plant material and O2 is released

But some CO2 is released again

Respiration, death and decay

The net reduction of CO2 is called Net Biosphere Production (NBP)

I will refer to it as the carbon flux

Complex processes modelled in SDGVM

Sheffield Dynamic Global Vegetation Model

CTCD

The Centre for Terrestrial Carbon Dynamics was a NERC Centre of Excellence

Now part of National Centre for Earth Observation

One major exercise was to estimate the carbon flux from vegetation in England and Wales in 2000

SDGVM run at each of 707 pixels over England & Wales

4 plant functional types (PFTs)

Principal output is NBP

Many inputs

SDGVM C flux outputs for 2000

Map of SDGVM estimates shows positive flux (C sink) in North, but negative (C source) in Midlands

Total estimated flux is 9.06 Mt C

Highly dependent on weather, so will vary greatly between years

Accounting for uncertainty

There are several sources of uncertainty

Uncertain inputs

PFT parameters, defining plant growth etc

Soil structure

Land cover types

Weather

Model structure

All models are wrong!

Two main challenges

Formally quantifying these uncertainties

Propagating input uncertainty through the model

Progress to date

A paper dealing with uncertainty in plant functional inputs and soil inputs

Kennedy, O'Hagan, Anderson et al (2008). Quantifying uncertainty in the biospheric carbon flux for England and Wales. J. Royal Statistical Society A 171, 109-135.

A paper showing how to quantify uncertainty in land cover

Cripps, O'Hagan, Quaife and Anderson (2008). Modelling uncertainty in satellite derived land cover maps. http://tonyohagan.co.uk/academic/pub.html

Recent work combines these

Still need to account for uncertainty in weather and model structure

Quantifying input uncertainties

Plant functional type parameters: expert elicitation

Soil composition: simple analysis from extensive data

Land cover: more complex analysis of ‘confusion matrix’ data

Elicitation

Beliefs of expert (developer of SDGVMd) regarding plausible values of PFT parameters

Four PFTs – Deciduous broadleaf (DBL), evergreen needleleaf (ENL), crops, grass

Many parameters for each PFT

Key ones identified by preliminary sensitivity analysis

Important to allow for uncertainty about mix of species in a pixel and role of parameter in the model

In the case of leaf life span for ENL, this was more complex

ENL leaf life span

Correlations

PFT parameter value at one site may differ from its value in another

Because of variation in species mix

Common uncertainty about average over all species induces correlation

Elicit beliefs about average over whole UK

ENL joint distributions are mixtures of 25 components, with correlation both between and within years
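The way a shared unknown average induces correlation between sites can be seen in a simple additive sketch. The symbols below (theta_s, mu, delta_s, tau^2, sigma^2) are illustrative assumptions and not the model elicited in the talk, where the ENL distributions are mixtures.

```latex
% Illustrative additive model (an assumption, not the elicited model):
% \theta_s : value of a PFT parameter at site s
% \mu      : uncertain average over all species, common to every site
% \delta_s : site-specific deviation due to species mix
\theta_s = \mu + \delta_s, \qquad
\operatorname{Var}(\mu) = \tau^2, \quad
\operatorname{Var}(\delta_s) = \sigma^2, \quad
\mu,\ \delta_s,\ \delta_t \text{ mutually independent } (s \neq t)

% The only shared term is \mu, so it alone drives the covariance.
\operatorname{Cov}(\theta_s, \theta_t) = \operatorname{Var}(\mu) = \tau^2
\quad\Longrightarrow\quad
\operatorname{Corr}(\theta_s, \theta_t) = \frac{\tau^2}{\tau^2 + \sigma^2}
```

Under these assumptions the correlation approaches 1 when uncertainty about the common average dominates, and 0 when site-to-site variation dominates.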

Soil composition

Percentages of sand, clay and silt, plus bulk density

Soil map available at high resolution

Multiple values in each SDGVM site

Used to form average (central estimate)

And to assess uncertainty (variance)

Augmented to allow for uncertainty in original data (expert judgement)

Assumed independent between pixels

Land cover map

LCM2000 is another high resolution map

Obtained from satellite images

Vegetation in each pixel assigned to one of 26 classes

Aggregated to give proportions of each PFT at each site

But data are uncertain

Field data are available at a sample of pixels

Countryside Survey 2000

Table of CS2000 class versus LCM2000 class is called the confusion matrix

CS2000 versus LCM2000 matrix

Not symmetric

Rather small numbers

         LCM2000
CS2000   DBL   ENL   Grass   Crop   Bare
DBL       66     3      19      4      5
ENL        8    20       1      0      0
Grass     31     5     356     22     15
Crop       7     1      41    289      9
Bare       2     0       3      8     81

Modelling land cover

The matrix tells us about the probability distribution of LCM2000 class given the true (CS2000) class

Subject to sampling errors

But we need the probability distribution of true PFT given observed PFT

Posterior probabilities as opposed to likelihoods

We need a prior distribution for land cover

We used observations in a neighbourhood

Implicitly assuming an underlying smooth random field

And the confusion matrix says nothing about spatial correlation of LCM2000 errors

We again relied on expert judgement

Using a notional equivalent number of independent pixels per site
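The step from likelihoods to posterior probabilities can be sketched with Bayes' theorem applied to the confusion matrix counts. This is a minimal illustration only: it assumes that rows of the matrix are the true (CS2000) class and columns the LCM2000 class, it uses a made-up prior, and it ignores both the sampling error in the counts and the spatial correlation that the actual analysis had to handle. The function name `posterior_true_given_observed` is hypothetical.

```python
import numpy as np

classes = ["DBL", "ENL", "Grass", "Crop", "Bare"]

# Counts from the CS2000 versus LCM2000 matrix.
# ASSUMPTION: rows are the true (CS2000) class, columns the LCM2000 class.
counts = np.array([
    [66, 3, 19, 4, 5],
    [8, 20, 1, 0, 0],
    [31, 5, 356, 22, 15],
    [7, 1, 41, 289, 9],
    [2, 0, 3, 8, 81],
], dtype=float)

# Likelihoods: P(LCM2000 class | true class), one row per true class.
likelihood = counts / counts.sum(axis=1, keepdims=True)


def posterior_true_given_observed(prior, observed):
    """Bayes' theorem: P(true | observed) is proportional to
    P(observed | true) * P(true)."""
    j = classes.index(observed)
    unnormalised = likelihood[:, j] * prior
    return unnormalised / unnormalised.sum()


# HYPOTHETICAL prior for one pixel, standing in for information
# from neighbouring observations.
prior = np.array([0.10, 0.05, 0.45, 0.35, 0.05])

post = posterior_true_given_observed(prior, "DBL")
for name, p in zip(classes, post):
    print(f"P(true = {name} | LCM2000 = DBL) = {p:.3f}")
```

With this invented prior, a pixel labelled DBL by LCM2000 still carries appreciable probability of being grass, which is the kind of reallocation the overall proportions reflect.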

Overall proportions

Red lines show LCM2000 proportions

Clear overall biases

Analysis gives estimates for all PFTs in each SDGVM site

With variances and correlations

Propagating uncertainty

Uncertainty analysis: problems with simple Monte Carlo approach

Emulation: Gaussian process emulation

The MUCM project

Uncertainty analysis

We have a computer model that produces output y = f(x) when given input x

But for a particular application we do not know x precisely

So X is a random variable, and so therefore is Y = f(X)

We are interested in the uncertainty distribution of Y

How can we compute it?

Monte Carlo

The usual approach is Monte Carlo

Sample values of x from its distribution

Run the model for all these values to produce sample values y_i = f(x_i)

These are a sample from the uncertainty distribution of Y

Typically requires thousands of samples of input parameters

And in this case we would need to run SDGVM 4 × 707 times for each sample!

Neat but impractical if it takes minutes or hours to run the model
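The Monte Carlo recipe above can be sketched in a few lines. The function `toy_model` and the input distribution are hypothetical stand-ins chosen so that the example runs instantly; they are not SDGVM or its elicited inputs. The final lines only restate the slide's arithmetic on the number of runs, with 5000 samples as an assumed value for "thousands".

```python
import numpy as np


def toy_model(x):
    """HYPOTHETICAL cheap stand-in for an expensive simulator such as SDGVM."""
    return np.sin(3.0 * x[..., 0]) + 0.5 * x[..., 1] ** 2


rng = np.random.default_rng(0)
n_samples = 5000  # assumed value for "thousands of samples"

# ASSUMED input distribution: two independent normal inputs.
x = rng.normal(loc=[0.5, 1.0], scale=[0.1, 0.2], size=(n_samples, 2))

# One model run per sampled input; y is a sample from the distribution of Y.
y = toy_model(x)

mc_mean = y.mean()
mc_var = y.var(ddof=1)
mc_se = np.sqrt(mc_var / n_samples)  # Monte Carlo standard error of the mean
print(f"mean of Y = {mc_mean:.4f} (s.e. {mc_se:.4f}), variance of Y = {mc_var:.4f}")

# Cost for the carbon flux problem: 4 PFTs at each of 707 sites per sample.
runs_per_sample = 4 * 707
total_runs = runs_per_sample * n_samples
print(f"{runs_per_sample} runs per sample, {total_runs} runs in total")
```

Even at one minute per run, a budget of this size would be far out of reach, which is the motivation for replacing the simulator with an emulator.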

Emulation

A computer model encodes a function that takes inputs and produces outputs

An emulator is a statistical approximation of that function

Estimates what outputs would be obtained from given inputs

With statistical measure of estimation error

Given enough training data, estimation error variance can be made small

So what?

A good emulator estimates the model output accurately

with small uncertainty

and runs “instantly”

So we can do uncertainty analysis etc fast and efficiently

Conceptually, we use model runs to learn about the function

then derive any desired properties of the model

GP solution

Treat f(·) as an unknown function with Gaussian process (GP) prior distribution

Use available runs as observations without error, to derive posterior distribution (also GP)

Make inference about the uncertainty distribution

E.g. the mean of Y is the integral of f(x) with respect to the distribution of X

Its posterior distribution is normal conditional on GP parameters
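Written out, the statement about the mean of Y looks as follows. The notation (M, D, p, m^*, v^*) is assumed here for illustration and does not appear on the slides; the result holds because M is a linear functional of a Gaussian process.

```latex
% M is the mean of Y, an unknown quantity because f is unknown.
% p(x) is the density of the uncertain input X.
M = \mathrm{E}[Y \mid f] = \int f(x)\, p(x)\, dx

% Assumed notation: given the training runs D and the GP parameters,
% f has posterior mean function m^*(x) and covariance function v^*(x, x').
M \mid D \sim \mathcal{N}\!\left(
  \int m^*(x)\, p(x)\, dx,\;
  \iint v^*(x, x')\, p(x)\, p(x')\, dx\, dx'
\right)
```

For convenient choices of covariance function and input distribution these integrals are available in closed form, so no sampling of the simulator is needed.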

Why GP emulation?

Simple regression models can be thought of as emulators

But error estimates are invalid

We use Gaussian process emulation

Nonparametric, so can fit any function

Error measures can be validated

Analytically tractable, so can often do uncertainty analysis etc analytically

Highly efficient when many inputs

Reproduces training data correctly

2 code runs

Consider one input and one output

Emulator estimate interpolates data

Emulator uncertainty grows between data points

3 code runs

Adding another point changes estimate and reduces uncertainty

5 code runs

And so on
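The behaviour shown in the 2, 3 and 5 run pictures can be reproduced with a small emulator sketch. It assumes a zero-mean GP with a squared-exponential covariance and fixed hyperparameters, which is a simplification: a practical emulator would normally include a mean function and estimate its hyperparameters. The `simulator` function and the design points are hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_new, jitter=1e-10):
    """Zero-mean GP conditioned on noise-free runs.

    Returns the posterior mean and standard deviation at x_new.
    The small jitter is only there for numerical stability.
    """
    k_tt = sq_exp_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_nt = sq_exp_kernel(x_new, x_train)
    mean = k_nt @ np.linalg.solve(k_tt, y_train)
    reduction = np.sum(k_nt * np.linalg.solve(k_tt, k_nt.T).T, axis=1)
    prior_var = sq_exp_kernel(x_new, x_new).diagonal()
    var = np.clip(prior_var - reduction, 0.0, None)
    return mean, np.sqrt(var)


def simulator(x):
    """HYPOTHETICAL one-input, one-output simulator."""
    return np.sin(x) + 0.1 * x


# Nested designs with 2, 3 and 5 code runs (assumed locations).
designs = {
    2: np.array([1.0, 5.0]),
    3: np.array([1.0, 3.0, 5.0]),
    5: np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
}
x_grid = np.linspace(1.0, 5.0, 81)

max_sd = {}
for n_runs, x_train in designs.items():
    mean, sd = gp_posterior(x_train, simulator(x_train), x_grid)
    max_sd[n_runs] = sd.max()
    print(f"{n_runs} runs: largest emulator s.d. on the grid = {max_sd[n_runs]:.3f}")
```

The posterior mean passes through every training run, the standard deviation is essentially zero there and grows in the gaps, and each added run shrinks the largest gap uncertainty.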

BACCO

This has led to a wide ranging body of tools for inference about all kinds of uncertainties in computer models

All based on building the GP emulator of the model from a set of training runs

This area is now known as BACCO: Bayesian Analysis of Computer Code Output

BACCO includes

Uncertainty analysis

Sensitivity analysis

Calibration

Data assimilation

Model validation

Optimisation

Etc…

All within a single coherent framework

MUCM

Managing Uncertainty in Complex Models

Large 4-year research grant

Started in June 2006

7 postdoctoral research assistants

4 PhD studentships

Based in Sheffield, Durham, Aston, Southampton, LSE

Objective: to develop BACCO methods into a robust technology that is widely applicable across the spectrum of modelling applications

Emulation of SDGVM

We built GP emulators of all 4 PFTs at 30 of the 707 sites

Estimates (posterior means) and uncertainties (variances and covariances) inter-/extrapolated to the other sites by kriging

Uncertainty due to both emulation and kriging separately accounted for

Sensitivity analysis for one site/PFT

Used to identify the most important inputs. These are the ones we needed to formulate uncertainty about carefully.

Results

Mean NBP corrections

NBP standard deviations

Aggregate across 4 PFTs

Panels: Mean NBP, Standard deviation

England & Wales aggregate

PFT           Plug-in estimate (Mt C)   Mean (Mt C)   Variance ((Mt C)²)
Grass         5.28                      4.37          0.2453
Crop          0.85                      0.43          0.0327
Deciduous     2.13                      1.80          0.0221
Evergreen     0.80                      0.86          0.0048
Covariances                                           -0.0081
Total         9.06                      7.46          0.2968
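The totals in the table can be checked directly from the per-PFT rows. The sketch below assumes that the Covariances row is the total contribution of the between-PFT covariance terms to the variance of the sum; under that reading the rows add up to the quoted totals of 9.06, 7.46 and 0.2968.

```python
import math

# Figures copied from the England & Wales aggregate table.
plug_in = {"Grass": 5.28, "Crop": 0.85, "Deciduous": 2.13, "Evergreen": 0.80}
mean = {"Grass": 4.37, "Crop": 0.43, "Deciduous": 1.80, "Evergreen": 0.86}
variance = {"Grass": 0.2453, "Crop": 0.0327, "Deciduous": 0.0221, "Evergreen": 0.0048}

# ASSUMPTION: this is the total covariance contribution to the variance
# of the sum, not a single pairwise covariance.
covariance_term = -0.0081

total_plug_in = sum(plug_in.values())
total_mean = sum(mean.values())
total_variance = sum(variance.values()) + covariance_term
total_sd = math.sqrt(total_variance)

print(f"plug-in total = {total_plug_in:.2f} Mt C")
print(f"mean total    = {total_mean:.2f} Mt C")
print(f"variance      = {total_variance:.4f} (Mt C)^2")
print(f"std deviation = {total_sd:.3f} Mt C")
```

The standard deviation of roughly 0.54 Mt C puts the scale of the uncertainty next to the shift from the plug-in estimate to the posterior mean.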

Sources of uncertainty

The total variance of 0.2968 is made up as follows

Variance due to PFT and soil inputs = 0.2642

Variance due to land cover uncertainty = 0.0105

Variance due to interpolation/emulation = 0.0222

Land cover uncertainty much larger for individual PFT contributions

Dominates for ENL

But overall tends to cancel out
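The relative size of the three components is easy to compute from the figures above. This is plain arithmetic on the quoted values; the dictionary keys are shorthand labels introduced here. The components sum to 0.2969, which agrees with the quoted total of 0.2968 up to rounding.

```python
# Variance components quoted above, in (Mt C)^2.
components = {
    "PFT and soil inputs": 0.2642,
    "land cover": 0.0105,
    "interpolation/emulation": 0.0222,
}
quoted_total = 0.2968

component_sum = sum(components.values())
shares = {name: value / component_sum for name, value in components.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% of the variance")
print(f"sum of components = {component_sum:.4f} (quoted total {quoted_total})")
```

On these figures the PFT and soil inputs account for close to nine tenths of the aggregate variance, with land cover the smallest contribution.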

Conclusions

Bayesian methods offer a powerful basis for computation of uncertainties in model predictions

Analysis of E&W aggregate NBP in 2000

Good case study for uncertainty and sensitivity analyses

But needs to take account of more sources of uncertainty

Involved several technical extensions

Has important implications for our understanding of C fluxes

Policy implications

Finally

This was joint work with many others

Plant, soil and earth observation – Shaun Quegan, Ian Woodward, Mark Lomas, Tristan Quaife, Andreas Heinemeyer, Phil Ineson

Statistics – Marc Kennedy, John Paul Gosling, Ed Cripps, Keith Harris, Clive Anderson

Links

http://www.shef.ac.uk/ctcd

http://mucm.group.shef.ac.uk

http://tonyohagan.co.uk/academic