
Large-scale Indicators for Severe Weather

Eric Gilleland∗ Matthew Pocernich† Harold E. Brooks‡ Barbara G. Brown§

Patrick Marsh¶

Abstract

Trends in extreme values of a large-scale indicator for severe weather (specifically, convective available potential energy (CAPE) multiplied by 0-6 km wind shear (shear)) are investigated using the generalized extreme value distribution with trends in the location parameter. The study primarily looks at reanalysis observational data for the entire globe, but also performs an initial analysis of a global climate model (CCSM3) over the United States.

Results for global trends from the reanalysis data set are similar to those found previously for the frequency of high values of this indicator. Comparison of the reanalysis data over the United States with the CCSM3 output shows numerous discrepancies, some of which are known problems with both the reanalysis and climate model output for precipitation.

Key Words: Extreme values, GEV, severe weather, climate models, reanalysis data

1. Introduction

Severe weather typically occurs on fine scales that cannot currently be resolved by large-scale climate models. Past studies on climate change have focused primarily on average weather conditions such as mean temperatures, but more recently concern has arisen regarding the impact of climate change on more severe weather phenomena (e.g., tornadoes, hurricanes, hail storms, strong winds, etc.), as these types of phenomena can have huge impacts on society in terms of both lives and economic losses.

In order to glean information about extremes under a changing climate, one approach is to investigate large-scale indicators of severe weather. That is, are there variables that can be resolved by existing climate models that can be used to make inferences about the intensity and/or frequency of severe weather for different climate scenarios? It is known that concurrently high values of convective available potential energy (CAPE, J/kg) and 0-6 km wind shear (m/s, henceforth shear) weakly discriminate between the types of storms indicated in table 1 (e.g., Brooks et al., 2003; Rasmussen and Blanchard, 1998). Figure 1 shows discrimination plots for different categories of severe storms (table 1) for the product of CAPE and shear.¹

Figure 2 shows density scatter plots of shear against CAPE, which make it clear that high values of these two variables rarely occur simultaneously.

Trends in the frequency of threshold exceedances for the product of CAPE and shear, as well as in the intensity of this variable, have been explored using a global reanalysis

∗ Research Applications Laboratory, National Center for Atmospheric Research, 3450 Mitchell LN, Boulder, CO, 80301, Email: [email protected]

† Research Applications Laboratory, National Center for Atmospheric Research, 3450 Mitchell LN, Boulder, CO, 80301

‡ National Severe Storms Laboratory, Norman, OK

§ Research Applications Laboratory, National Center for Atmospheric Research, 3450 Mitchell LN, Boulder, CO, 80301

¶ University of Oklahoma, Norman, OK

¹ The product of CAPE and shear has statistical advantages as well as easier discriminatory properties over looking at the two bivariately.

Figure 1: Discrimination plots for categories of severe storms as described in table 1. Probability density function (pdf) graphs are shown in the top panel, and cumulative distribution function (cdf) graphs in the bottom panel; both are stratified by storm severity category.

Table 1: Definitions for categories of severe storms as shown in figure 1.

Category                    Definition
Non-severe                  Hail < 1.9 cm (3/4 in.) diameter, winds < 55 kts, no tornado
Severe                      Hail ≥ 1.9 cm diameter, winds ≥ 55 kts and < 65 kts, or tornado
Significant non-tornadic    Hail ≥ 5.07 cm (2 in.) diameter, winds ≥ 65 kts
Significant tornadic        Same as significant non-tornadic, with F2 (or greater) tornado

Figure 2: Density scatter plots of shear vs. CAPE.

data set with this variable derived from the existing data set (Pocernich et al., 2008). Here, the focus rests on issues pertaining to the analysis of the intensities of concurrently large values of these variables.

We describe the global reanalysis data and climate model output used here in section 2, followed by an overview of the statistical methods in section 3. Section 4 gives the initial findings from evaluations on the global reanalysis data, and section 5 gives the results for the climate model output. Finally, discussion of future work is given in section 6.

2. Measurements

Attention is given to an observational data set consisting of global reanalyses of radio soundings, as well as to climate model output over the continental United States from a global climate model. These sets of measurements are described in the following two subsections.

2.1 Global Reanalysis Observations

The reanalysis data are on a 1.875° × 1.915° lon-lat grid with over 17 thousand points, with temporal spacing of 6 hours for 42 years (1958-1999). Further details about the reanalysis data can be found in Brooks et al. (2003).

2.2 Global Climate Model Output

Initial exploration of these variables from the current climate as output from the CCSM3 model is also underway. Initially, the output is for 756 grid points at 1.4° × 1.4° resolution over the United States.

3. Statistical Methods

Because the focus of the present study is on the behavior of large values of a process, it is natural to use extreme value analysis (EVA). We describe the general models applied here in the next subsection, and discuss estimation subsequently.

3.1 Extreme Value Models

Similar to the central limit theorem for sums, the maxima for a sample of independent and identically distributed random variables follow asymptotically one of three types of distributions, provided that the limiting distribution is non-degenerate. These three types can be written as a single family of extreme value distributions, known as the generalized extreme value (GEV) distribution, and given by

$$G(z) = \exp\left\{ -\left[ 1 + \frac{\xi}{\sigma}(z - \mu) \right]_{+}^{-1/\xi} \right\}, \qquad (1)$$

where µ, ξ ∈ (−∞, ∞) and σ > 0 are parameters, and y+ = max(y, 0). The shape parameter, ξ, determines the type of the distribution: ξ = 0 gives the light-tailed Gumbel distribution (defined by continuity), ξ < 0 gives the Weibull distribution with bounded upper tail at µ − σ/ξ, and ξ > 0 yields the heavy-tailed Fréchet distribution with bounded lower tail at µ − σ/ξ. Similar results hold for exceedances over thresholds, but such models are left here for future work.
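As an illustration of distribution (1) and its three tail types, the following is a minimal Python sketch that uses only the standard library. The function name `gev_cdf` and the numerical tolerance used to switch to the Gumbel limit are assumptions made for this example, not part of the original analysis.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) of Eq. (1).

    xi = 0 (within a small tolerance) is treated as the Gumbel limit.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case, defined by continuity as xi -> 0.
        return math.exp(-math.exp(-s))
    y = 1.0 + xi * s
    if y <= 0:
        # Outside the support: below the lower bound mu - sigma/xi when xi > 0,
        # above the upper bound mu - sigma/xi when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-y ** (-1.0 / xi))
```

For example, with µ = 0 and σ = 1 the upper bound for ξ = −0.5 is µ − σ/ξ = 2, so the function returns 1 for any z above 2.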

When interest is in modeling the GEV distribution (1), one is most often interested in the extreme quantiles, referred to as return levels in this context. Because (1) is invertible, the 1 − p quantiles, zp, are easily obtained as

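$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[ 1 - \{-\log(1 - p)\}^{-\xi} \right], & \xi \neq 0 \\ \mu - \sigma \log\{-\log(1 - p)\}, & \xi = 0 \end{cases} \qquad (2)$$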
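The return-level formula (2) can be sketched in a few lines of Python. The function name `gev_return_level` is hypothetical, and the parameter values in the usage note are illustrative only, not estimates from the reanalysis.

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Return level z_p of Eq. (2): the (1 - p) quantile of the GEV."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y_p = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Gumbel case.
        return mu - sigma * math.log(y_p)
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))
```

For annual maxima, the 20-year return level corresponds to p = 1/20, i.e., the 95% quantile; `gev_return_level(0.05, mu, sigma, xi)` evaluates it for any fitted triple of parameters.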

In order to analyze trends, or incorporate covariate information, it is natural to model them within the parameters themselves. Typically, models of the following form are considered.

$$\mu(t) = \sum_{i=0}^{n_\mu} \mu_i f_i(t), \qquad \sigma(t) = \sum_{j=0}^{n_\sigma} \sigma_j g_j(t), \qquad \xi(t) = \sum_{k=0}^{n_\xi} \xi_k h_k(t),$$

where f, g, h are functions (e.g., sine and cosine, identity, exponential, etc.) and t denotes covariates or trend variables. Care must be taken when incorporating covariates into the scale parameter, σ, in order to ensure that it is positive everywhere. Usually, a log link function (i.e., an exponential inverse link) is used so that the model is of the form $\ln \sigma(t) = \sum_{j=0}^{n_\sigma} \sigma_j g_j(t)$.
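To make the covariate models concrete, here is a small Python sketch in which the location is a linear combination of basis functions and the scale is modeled on the log scale so that it stays positive. The names `linear_predictor`, `gev_parameters` and `BASIS`, and the choice of a constant-plus-linear basis, are assumptions for illustration.

```python
import math

def linear_predictor(coefs, basis, t):
    """Evaluate sum_i coefs[i] * basis[i](t)."""
    return sum(c * f(t) for c, f in zip(coefs, basis))

def gev_parameters(t, mu_coefs, log_sigma_coefs, xi, basis):
    """Return (mu(t), sigma(t), xi) with a log link on the scale parameter."""
    mu_t = linear_predictor(mu_coefs, basis, t)
    # Exponentiating the linear predictor guarantees sigma(t) > 0.
    sigma_t = math.exp(linear_predictor(log_sigma_coefs, basis, t))
    return mu_t, sigma_t, xi

# Hypothetical basis: a constant term and a linear term in t.
BASIS = (lambda t: 1.0, lambda t: float(t))
```

Even a strongly negative slope on the log scale, e.g. `gev_parameters(10, (100.0, 2.0), (1.0, -0.5), 0.1, BASIS)`, yields a positive scale, which would not be guaranteed if σ(t) were modeled linearly.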

3.2 Estimation

Distribution (1) leads to the following log-likelihood equation.

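$$\log L(\theta; \mathbf{z}) = -n \log \sigma - \left( 1 + \frac{1}{\xi} \right) \sum_{i=1}^{n} \log\left[ 1 + \xi \frac{z_i - \mu}{\sigma} \right] - \sum_{i=1}^{n} \left[ 1 + \xi \frac{z_i - \mu}{\sigma} \right]^{-1/\xi}, \qquad (3)$$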

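subject to the constraint that 1 + ξ(zi − µ)/σ > 0 for i = 1, ..., n. For the Gumbel case, the log-likelihood simplifies to

$$-n \log \sigma - \sum_{i=1}^{n} \frac{z_i - \mu}{\sigma} - \sum_{i=1}^{n} \exp\left[ -\frac{z_i - \mu}{\sigma} \right]. \qquad (4)$$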

There is no analytical solution to the optimization over the parameters for (3) and (4). Therefore, numerical optimization routines are required to find the maximum likelihood estimates (MLEs) for the three parameters. For small data sets, it is usual to estimate the parameters using L-moments (e.g., Hosking and Wallis, 1997), or the generalized MLE (GMLE) method of Martins and Stedinger (2000), as more stable solutions can be found. However, it is not possible to incorporate trends into the parameter estimates using the L-moments approach. The GMLE approach requires some prior input, which we do not have here. Bayesian estimation (e.g., Coles and Tawn, 1996) is, of course, also possible, and future work will investigate such methods.
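The objective handed to a numerical optimizer can be sketched as a negative log-likelihood. The Python function below is a minimal illustration, assuming a linear trend in the location parameter and a constant scale parameterized on the log scale; the name `gev_nll` and the parameter ordering are hypothetical and not taken from the original analysis.

```python
import math

def gev_nll(params, z, years):
    """Negative GEV log-likelihood with location mu0 + mu1 * year.

    params = (mu0, mu1, log_sigma, xi). Returns math.inf when the
    support constraint 1 + xi * (z_i - mu_i) / sigma > 0 is violated.
    """
    mu0, mu1, log_sigma, xi = params
    sigma = math.exp(log_sigma)
    total = 0.0
    for zi, ti in zip(z, years):
        s = (zi - (mu0 + mu1 * ti)) / sigma
        if abs(xi) < 1e-8:
            # Gumbel limit, the negative of one term of Eq. (4).
            total += log_sigma + s + math.exp(-s)
        else:
            y = 1.0 + xi * s
            if y <= 0:
                return math.inf
            # Negative of one term of Eq. (3).
            total += log_sigma + (1.0 + 1.0 / xi) * math.log(y) + y ** (-1.0 / xi)
    return total
```

A function of this form can be passed to a general-purpose minimizer (for example `scipy.optimize.minimize`, if SciPy is available); setting mu1 to zero recovers the stationary model.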

The likelihoods (3) and (4) can also be written with the incorporation of covariates in the parameters, and iterative likelihood ratio tests can be used to test for significant improvements in the model fits. AIC and BIC approaches are also possible, but are not used here.
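A likelihood ratio test between two nested fits can be sketched as follows, assuming the larger model adds exactly one parameter (for instance, the slope µ1), so that the deviance is compared against a chi-square distribution with one degree of freedom. The function name is hypothetical; the p-value uses the identity that the chi-square(1) survival function equals erfc(sqrt(x/2)).

```python
import math

def likelihood_ratio_test(nll_null, nll_alt, alpha=0.05):
    """Compare nested models that differ by ONE parameter.

    nll_null, nll_alt: minimized negative log-likelihoods of the smaller
    and larger model. Returns (deviance, p_value, reject_null).
    """
    deviance = max(2.0 * (nll_null - nll_alt), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(deviance / 2.0))
    return deviance, p_value, p_value < alpha
```

For models that differ by more than one parameter, the chi-square reference distribution needs the corresponding degrees of freedom, which this sketch does not cover.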

Figure 3: 20-year return levels for annual maximum CAPE × shear (csmax) estimated by the GEV (left) and the 95% quantile of the reanalysis (right). No spatial correlation is accounted for in either graph.

4. Initial results for global reanalysis data

Initial investigations have centered on fitting the generalized extreme value (GEV) distribution to annual maxima of the product of CAPE and shear (henceforth, csmax). Figure 3 shows the GEV-estimated 20-year return levels from having fit the GEV individually at each grid point (i.e., no spatial correlation taken into account) as well as the empirical 20-year return levels obtained from the reanalysis csmax (i.e., the 95% quantile taken at each grid point). Of course, estimating a high quantile from such a short record of data is questionable, so the graph on the right is not a very accurate assessment of the "true" 20-year return level. Nevertheless, it is clear that although the GEV estimates seem to reproduce the correct spatial structure, they are everywhere too small. This may be a result of large uncertainty in the estimates (not shown), and may be overcome by employing Bayesian estimation (Richard L. Smith, personal communication; see also Coles and Pericchi, 2003).
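The construction of the csmax series at a single grid point can be sketched as below: form the product of CAPE and shear at each time step and keep the largest value per year. The function name `annual_maxima` and the flat-list input layout are assumptions for illustration; the actual data handling in the study is not described in the text.

```python
def annual_maxima(years, cape, shear):
    """Annual maxima of CAPE * shear at one grid point.

    years, cape, shear: equal-length sequences, one entry per time step
    (e.g., every 6 hours). Returns a dict {year: maximum product}.
    """
    best = {}
    for year, c, s in zip(years, cape, shear):
        value = c * s
        if year not in best or value > best[year]:
            best[year] = value
    return dict(sorted(best.items()))
```

The resulting values, one per year, are the kind of series to which the GEV is fit at each grid point.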

To obtain information about trends in csmax over the 42 years of global reanalysis data, temporal covariates are investigated in the parameters of the GEV. Iteratively more complicated models are tried and tested for significance using the likelihood-ratio test. Where any trends are detected for these data, the only significant ones are linear in the location parameter. That is,

µ(year) = µ0 + µ1 · year

Some significant trends in the scale parameter are found, but these occur at grid points where the reanalysis data are less believable, such as the polar regions. Therefore, these models are not used.

Figure 4 shows the results from fitting a linear trend in the location parameter of the GEV. Point-wise significance of these trends is also checked. The spatial pattern of the intercept (or constant term) recovers the general pattern of high values of csmax, and the slope terms show a pattern similar to that found from the frequency analysis (Pocernich et al., 2008, not shown). Four regions of interest are inspected more closely. Figure 5 shows these regions without

Figure 4: Intercept (left) and slope (right) terms from fitting a linear trend in the location parameter of the GEV. No significance test is performed in this graph.

testing for significance, and figure 6 shows them with point-wise significance. It is important to account for both spatial correlation and multiple testing, however, especially when analyzing over so many points. Therefore, figure 7 shows the results from applying the false discovery rate (fdr) test proposed by Ventura et al. (2004). Significant positive trends (i.e., increasing csmax intensities) are found off the eastern coasts of Asia even after accounting for spatial correlation and multiple testing issues. Some significant decreases in extreme csmax intensities after applying the fdr to the point-wise significance tests are also observed for southern South America, whereas no significant trends remain over the United States. Very little trend activity is detected over Europe, but there exist a few locations of increasing (northern Germany, southern Scandinavia, south-eastern Europe) and decreasing (northern Sweden) trends.
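The idea behind a false discovery rate correction can be illustrated with the standard Benjamini-Hochberg step-up rule applied to the point-wise p-values. This is a sketch under that assumption; the exact variant and settings used for figure 7 are not given in the text, and the function name `fdr_reject` and the level q = 0.05 are chosen for the example.

```python
def fdr_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (same order as p_values) marking the
    tests declared significant while controlling the FDR at level q.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k with p_(k) <= q * k / n.
        if p_values[idx] <= q * rank / n:
            k_max = rank
    reject = [False] * n
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

With ten hypothetical p-values of which five fall below 0.05, this rule may retain only the two smallest, which mirrors how trends that are point-wise significant can disappear once multiple testing is taken into account.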

5. Results for current climate (1980-1999) as output by CCSM3 over the United States

Figure 8 shows the median annual maximum (AM) csmax over 1980-1999 from CCSM3 (left) and the reanalysis data (right). While the spatial patterns are similar, there are noticeable differences. Furthermore, there are substantial discrepancies in intensities (in both directions), as can be more easily seen in figure 9, which shows their differences (CCSM3 median AM csmax − reanalysis median AM csmax).

Performing traditional verification (i.e., point-to-point), which does not account for small spatial discrepancies, shows that the CCSM3 does only slightly better than a completely random model (skill score (SS) of only about 0.5), and not as well as simply using the previous year's reanalysis data (table 2). However, it should be noted that the reanalysis data are not necessarily the "truth" as they, for example, show the higher values of csmax on the lee side of the Rockies as being a bit too far to the east, whereas the CCSM3 captures this spatial feature better than the reanalysis (Harold E. Brooks, personal communication). Similar results are obtained for other aggregations (apart from the median) of csmax, and indeed other series besides AM (e.g., Marsh et al., 2007).

Because of the lack of agreement between the CCSM3 output and the reanalysis, and the lack of confidence in what the "truth" really is, one must be careful in making strong assertions about csmax under a changing climate. However, it is important to realize that the

Figure 5: Trends in location parameter (i.e., the slope term) for csmax over four regions of interest. No significance test is performed in these graphs.

Figure 6: Trends in location parameter (i.e., the slope term) for csmax over four regions of interest. Point-wise significance test is performed in these graphs.

Figure 7: Trends in location parameter (i.e., the slope term) for csmax over four regions of interest. False discovery rate (fdr) applied to significance tests performed in these graphs.

Figure 8: Median annual maximum (AM) series of csmax from 1980-1999 for CCSM3 (left) and reanalysis (right).

[Figure 8 panel titles: "Median AM cape*shear CCSM3 (1980−1999)" (left) and "Median AM cape*shear reanalysis (1980−1999)" (right); color scale from 0 to 200000.]

Table 2: Traditional verification results from comparing the CCSM3 median AM csmax (forecast) against the reanalysis median AM csmax (1980-1999).

Measure              Value
MAE                  10,660
ME                   4,835
MSE                  1.8 × 10^8
MSE - baseline       3.5 × 10^8
MSE - persistence    1.6 × 10^6
SS - baseline        0.488
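The measures in table 2 can be sketched with the usual point-to-point definitions. The skill score below is assumed to be the common MSE-based form, SS = 1 − MSE / MSE_reference; the text does not state the formula explicitly, and the function names are hypothetical.

```python
def mean_error(forecast, observed):
    """ME: average of (forecast - observed)."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)

def mean_absolute_error(forecast, observed):
    """MAE: average of |forecast - observed|."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)

def mean_squared_error(forecast, observed):
    """MSE: average of (forecast - observed)^2."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)

def skill_score(mse_forecast, mse_reference):
    """MSE skill score: 1 is perfect, 0 matches the reference forecast."""
    return 1.0 - mse_forecast / mse_reference
```

With the rounded entries of table 2, `skill_score(1.8e8, 3.5e8)` is about 0.486, close to the tabulated 0.488; the small gap is presumably due to rounding of the MSE values. Under the same assumed formula, the much smaller persistence MSE gives a strongly negative score, consistent with the statement that the CCSM3 does not do as well as the previous year's reanalysis.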

Figure 9: Difference between median AM csmax from CCSM3 output minus reanalysis.

[Figure 9 panel title: "Median AM CCSM3 − Reanalysis (1980−1999)"; color scale from −40000 to 20000.]

climate is essentially the distribution (not necessarily the mean) from which weather is derived. Therefore, an ensemble of climate models should be examined rather than just a single realization of the distribution.

H.E. Brooks (personal communication) recently found that using the following derived variable in place of CAPE better discriminates severe storms (figure 10).

$$W_{\max} = \sqrt{2 \cdot \mathrm{CAPE}} \qquad (5)$$

Another advantage of using Wmax in Eq. (5) above over CAPE is that the units are now in m/s, the same as shear. Figure 11 shows the median AM shear × Wmax (henceforth swmax) for both reanalysis (left) and CCSM3 output (right). As expected, there are still large differences in intensities, which can be more easily seen in figure 12, which shows the differences. The model shows lower values than the reanalysis over North Dakota and southern Minnesota, as well as southern Texas, the Caribbean, Florida, and off the southern east coast. The model generally projects higher values of swmax over the Rockies and Appalachians, with much higher values over southern Arizona and into Mexico.
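Equation (5) and the derived product can be written out directly. In this small Python sketch the function names `wmax` and `swmax_value` are hypothetical, and the input values in the usage note are illustrative rather than taken from the data.

```python
import math

def wmax(cape):
    """W_max = sqrt(2 * CAPE), Eq. (5); CAPE in J/kg gives W_max in m/s."""
    if cape < 0:
        raise ValueError("CAPE must be non-negative")
    return math.sqrt(2.0 * cape)

def swmax_value(cape, shear):
    """Product of 0-6 km shear (m/s) and W_max (m/s)."""
    return shear * wmax(cape)
```

For instance, CAPE = 2000 J/kg gives Wmax of roughly 63 m/s, and with a shear of 20 m/s the product is roughly 1265, in units of m²/s².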

6. Initial Conclusions, Future and Ongoing Work

The initial analysis of extreme intensities of csmax complements the work of Pocernich et al. (2008), where the frequencies of threshold exceedances are studied. The next logical step would be to model the two aspects simultaneously using a peaks over threshold (POT) extreme value approach. Further, recent findings by H.E. Brooks (personal communication) show that Wmax (5) instead of CAPE may be more useful in identifying likely severe weather scenarios from large-scale phenomena. Additionally, it has been recommended that use of Bayesian estimation may help to reduce uncertainty in GEV-estimated return levels, and to yield more accurate estimates consistent with the data.

Other possible future directions include using other climate model output in addition to the CCSM3 runs used here. One source of output that will be studied, for example, is the set of cases from the North American Regional Climate Change Assessment Program (NARCCAP, http://www.narccap.ucar.edu/). Another possibility is to use global climate model output to initialize such regional climate models in order to more directly investigate severe weather distributions under a changing climate.

Acknowledgments

This work is supported by the Weather and Climate Impacts Assessment Science Program (http://www.assessment.ucar.edu/), which is funded by the National Science Foundation (NSF). The authors thank Harold Brooks and Patrick Marsh for providing us with the global reanalysis data and climate model output, as well as for their consultations. We also thank Richard L. Smith for insightful suggestions of future directions.

References

Brooks, H., J. Lee, and J. Craven, 2003: The spatial distribution of severe thunderstorm and tornado environments from global reanalysis data. Atmos. Res., 67–68, 73–94.

Coles, S. and L. Pericchi, 2003: Anticipating catastrophes through extreme value modelling. Appl. Statist., 52, 405–416.

Figure 10: (0-6 km) Shear against Wmax with red contours showing the conditional probabilities of significant severe storms over the United States. Figure courtesy of H.E. Brooks.

Figure 11: Median (1980-1999) AM shear × Wmax for reanalysis (left) and CCSM3 output (right).


Coles, S. and J. Tawn, 1996: A Bayesian analysis of extreme rainfall data. Appl. Statist., 45, 463–478.

Hosking, J. and J. Wallis, 1997: Regional frequency analysis: An approach based on L-moments. Cambridge University Press, Cambridge, UK, 240 pp.

Marsh, P., H. Brooks, and D. Karoly, 2007: Assessment of the severe weather environment in North America simulated by a global climate model. Atmospheric Science Letters, 1–7, doi:10.1002/asl.159.

Martins, E. and J. Stedinger, 2000: Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Res., 36, 737–744.

Pocernich, M., E. Gilleland, H. Brooks, B. Brown, and P. Marsh, 2008: Analysis of atmospheric conditions conducive to small scale extreme events from larger scale global reanalysis data. Manuscript in preparation.

Rasmussen, E. and D. Blanchard, 1998: A baseline climatology of sounding-derived supercell and tornado forecast parameters. Wea. Forecasting, 13, 1148–1164.

Ventura, V., C. Paciorek, and J. Risbey, 2004: Controlling the proportion of falsely rejected hypotheses when conducting multiple tests with climatological data. J. Climate, 17, 4343–4356.

Figure 12: Median AM shear × Wmax for CCSM3 minus reanalysis (1980-1999).

