2013 Workshop on Quantitative Evaluation of Downscaled Data

Keith Dixon1, Katharine Hayhoe2, John Lanzante1, Anne Stoner2, Aparna Radhakrishnan3, V. Balaji4, Carlos Gaitán5

“Perfect Model” Experiments: Testing Stationarity in Statistical

Downscaling (NCPP Protocol 2)

IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS?

1 2 3 DRC Inc. 4 Princeton Univ. 5 Univ. of Oklahoma

2013 Workshop on QuantitativeEvaluation of Downscaled Data

Goals of this presentation1. Define the ‘stationarity assumption’ inherent to

statistical downscaling of future climate projections.2. Present our ‘perfect model’ (aka ‘big brother’) approach

to quantitatively assess the extent to which the stationarity assumption holds.

3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…)

4. Introduce options to extend and supplement the method (setting the hurdle at different heights).

5. Invite statistical downscalers to consider testing their methods within the perfect model framework.

6. Garner feedback from workshop participants.

This project does not produce any data files that one would use in real world applications.

DATA INFO KNOW-LEDGE

WISDOM

The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (& GCMS), and to communicate that knowledge so that better informed decisions can be made.

REAL WORLD APPLICATION:

? ? ?

Start with 3 types of data sets … lacking observations of the futureCompute transform functions in training step (1 example below) Produce downscaled refinements of GCM (historical and/or future)Skill: Compare Obs to SD historical output (e.g., cross validation)Cannot evaluate future skill -- left assuming transform functionsapply equally well to past & future -- “The Stationarity Assumption”

TARGET

PREDICTORS

“PERFECT MODEL” EXPERIMENTAL DESIGN:Start with 4 types of data sets – Hi-res GCM output as proxy for obs& coarsened version of Hi-res GCM output as proxy for usual GCM

? ? ?

Proxy for observations in real-world applicationProxy for GCM output in real-world application

Daily time resolution~25km grid spacing

194 x 114 grid22k pts

~200km grid spacing64:1

~64 of the smaller25km resolution grid cells fit withinthe coarser cell

Daily time resolution~25km grid spacing

194 x 114 grid

Follow the same interpolation/regriddingsequence to produce coarsened data sets for the future climate projections

“PERFECT MODEL” EXPERIMENTAL DESIGN:Compute transform functions in training step (1 example below) Produce downscaled output for historical and future time periods Can directly evaluate skill both for the historical period and the future using the Hi Res GCM output as “truth” – Test Stationarity

QuantitativeTests of Stationarity: The extent to which

SKILL computed for the FUTURE CLIMATE PROJECTIONS is diminished relative to the

SKILL computed for the HISTORICAL PERIOD

GFDL-HiRAM-C360 experimentsAll data used in the perfect model testswere derived from GFDL-HiRAM-C360 (C360) model simulations conducted following CMIP5 high resolution time slice experiment protocols.

We consider a pair of 30-year long “historical”experiments (1979-2008), labeled H2 and H2here…

For the period 2086-2095, we make useof a pair of 3 member ensembles -- a total of six 10-year long experiments, identified as the “C” and “E” ensemblesto the right (C warms more than E)...

Same GCM used to generate 2 sets of future projections (3 members each)1979-2008 model climatology

2086-2095 projection “C” (3x10yr)

2086-2095 projection “E” (3x10yr)

"E" Regional Warming [K]

"C" Regional Warming [K]

0 1 2 3 4 5 6 7 8

4.1

6.2

5.0

7.2

Land All pts

Q: How High a Hurdle does this Perfect Model approach present?

A: Varies geographically,by variable of interest,time period of interest, etc.

Coarsened data isslightly cooler withSlightly smaller std dev.… not too challenging

August Daily Max Temperaturesfor a point in Oklahoma

(2.5C histogram bins)

approx. +7Cmean warming

August Daily Max Temperaturesfor a point in Oklahoma


August Daily Max Temperaturesfor a point just NE of San Fancisco

Coarsened data ismore ‘maritime’ &HiRes target is more ‘continental’.

Presents more of achallenge during thehistorical period than the Oklahoma example.


August Daily Max Temperaturesfor a point just NE of San Fancisco


Note that in several locations there is a tendency for ARRM cross-validated downscaled output to have too low standard deviations just offshore and too high std. dev. over coastal land

Map produced by NCPP Evaluation Environment ‘Machinery’

NEXT: A sampling of results…

Intended to be illustrative…not exhaustive, nor systematic.

Anne Stoner (TTU) will show more.

Area Mean Time Mean Absolute Downscaling ErrorsΣ | (Downscaled Estimate – HiRes GCM) |

(NumDays)

1979-2008 "C" 2086-2095 "E" 2086-20950.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

0.63

0.900.760.82

1.14

0.98

All Pts Land

+40% +20%

Start with a summary: ARRM downscaling errors are larger for daily max temp at end of 21stC than for 1979-2008

Mean Absolute downscaling Error (MAE) during 1979-2008Σ | (Downscaled Estimate – HiRes GCM) |

(60 * 365)

1979-2008

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: Downscaling MAE for 1979-2008

Mean absolute downscaling error during 2086-2095“E” Projections (3 member ensemble)

"E" 2086-2095

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: MAE pattern for “E” projections (+5,4C)

Mean absolute downscaling error during 2086-2095“C” Projections (3 member ensemble)

"C" 2086-2095

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: MAE pattern for “C” projections (+7,6C)

Mean Climate Change Signal Difference (ARRM minus Target) “C” Projection

Geographic Variations: Bias pattern for “C” projections (+7,6C)

J F M A M J J A S O N D

Red = “C” Blue=“E”

J F M A M J J A S O N DEntire US48 Domain

Looking at how well the stationarity assumption holds in different seasons

Where and when ratio=1.0, the stationarity assumption fully holds (i.e., no degradation in mean absolute downscaling error during 2085-2095 vs. the 1979-2008 period using in training.)

4

3

2

1

tasmax

…comparing near-coastal land vs. interior


J F M A M J J A S O N DBlue= coastal SC-CSCBlack = full US48+ domainGreen = interior SC-CSC


“C” ensemble

4

3

2

1

“C” ensembletasmax

A clear intra-month MAE trend in some but not all months


“sawtooth” has lower values in cooler part of month, higher in warmer part of month

“C” ensembletasmax

Options for extending this ‘perfect model-based’ exploration of statistical downscaling stationarity

More SDMethods

More ClimateVariables & Indices

Different HiResClimate Models

Different Emissions Scenarios & Times

Use of SyntheticTime Series

Alter GCM-baseddata in known waysto ‘Raise the Bar’

Mix & Match GCMs’Target & Predictors

?? MoreIdeas ??

GFDL-HiRAM-C360 experimentsData sets used in the perfect model tests are beingmade available to the community. This enables others to use our exp design to test the stationarity assumption for their own SD method.Also, we invite SD developers to consider collaborating with us, so that their techniques &perspectives may be more fully incorporated into this evolving perfect model-based research & assessment system.For more info, visit www.gfdl.noaa.gov/esd_eval &www.earthsystemcog.org/projects/gfdl-perfectmodel

In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method.

from us

from us

In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method.

from us

SD method

of choice

from us

Obviously, one does not need to perform tests using all 22,000+ grid points. Below we indicate the locations of

a set of 16 points we’ve used for some development work.(color areas are 3x3 with grid point of interest in the center)

Anne Stoner and Katharine Hayhoe with more ‘perfect model’ analyses…

Keith Dixon1, Katharine Hayhoe2, John Lanzante1, Anne Stoner2, Aparna Radhakrishnan3, V. Balaji4, Carlos Gaitán5

“Perfect Model” Experiments: Testing Stationarity in Statistical

Downscaling (NCPP Protocol 2)

IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS?

1 2 3 DRC Inc. 4 Princeton Univ. 5 Univ. of Oklahoma

2013 Workshop on QuantitativeEvaluation of Downscaled Data

“PERFECT MODEL” EXPERIMENTAL DESIGN:Compute transform functions in training step (1 example below) Produce downscaled output for historical and future time periods Can directly evaluate skill both for the historical period and the future using the Hi Res GCM output as “truth” – Test Stationarity

Options for extending this ‘perfect model-based’ exploration of statistical downscaling stationarity

More SDMethods

More ClimateVariables & Indices

Different HiResClimate Models

Different Emissions Scenarios & Times

Use of SyntheticTime Series

Alter GCM-baseddata in known waysto ‘Raise the Bar’

Mix & Match GCMs’Target & Predictors

?? MoreIdeas ??

This project does not produce any data files that one would use in real world applications.

DATA INFO KNOW-LEDGE WISDOM

The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (& GCMS), and to communicate that knowledge so that better informed decisions can be made.

Odds-n-ends from day 1* Not all SD codes run on a cell phone* Not All GCMs have 200km atmos grid (50km)* GCMs are developed primarily as research tools – not prediction tools

Goals of this presentation1. Define the ‘stationarity assumption’ inherent to

statistical downscaling of future climate projections.2. Present our ‘perfect model’ (aka ‘big brother’) approach

to quantitatively assess the extent to which the stationarity assumption holds.

3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…)

4. Introduce options to extend and supplement the method (setting the hurdle at different heights).

5. Invite statistical downscalers to consider testing their methods within the perfect model framework.

6. Garner feedback from workshop participants.

2013 Workshop on Quantitative Evaluation of Downscaled Data

Documents

Transcript of 2013 Workshop on Quantitative Evaluation of Downscaled Data