Forecasting a Better Future CMAS Workshop, October 28, 2003 Forecasting Air Quality with Models-3...

Forecasting a Better Future

CMAS Workshop, October 28, 2003

Forecasting Air Quality with Models-3 Components: Performance Expectations

John N. McHenry, Chief Scientist

Baron Advanced Meteorological Systems

c/o North Carolina Supercomputing Center

3021 Cornwallis Road, Research Triangle Park, NC, 27709

Email: [email protected]

Phone: 919-248-9237



Co-Authors

Stuart McKeen (2), William F. Ryan (3), Nelson Seaman (4), Janusz Pudykiewicz (5), Georg Grell (6), Ariel Stein (8), Carlie Coats (1), Sarav Arunachalam (7), Jeff Vukovich (7), Wayne Angevine (2), Brian Eder (9)

(1) Baron Advanced Meteorological Systems, RTP, NC

(2) NOAA Aeronomy Laboratory, Boulder, CO

(3) The Pennsylvania State University

(4) NOAA/NWS Office of Science and Technology, Suitland, MD

(5) Meteorological Service of Canada, Dorval, Quebec, CANADA

(6) NOAA Forecast Systems Laboratory, Boulder, CO

(7) Univ. of N. Carolina, Carolina Environmental Program, Chapel Hill, NC

(8) NOAA Office of Research and Development, Silver Spring, MD

(9) NOAA Office of Research and Development, RTP, NC



Outline

• Introduction: What is Numerical Air Quality Prediction?

• Overview: NOAA 2001-2002 pilot programs in Numerical Air Quality Prediction

• 2001 NOAA “Early Start” Pilot: Forecast Technique Intercomparison for Aug. 1-10, 2001 Episode

• 2002 NOAA Pilot: Model Intercomparison for Aug 5-29, 2002

• Results and Conclusions: benchmark expectations for the Eta-CMAQ Forecast System



What is Numerical Air Quality Prediction (NAQP)?

• NWP model(s) such as MM5, Eta

• Anthropogenic and Biogenic Emissions Model(s) such as SMOKE and its component models

• Photochemical/Particulate Atmospheric Chemistry Model(s) such as CMAQ, MAQSIP-RT, CHRONOS, HYSPLIT-CHEM

• OR: Integrated Met/Chem models such as MM5-Chem, WRF-Chem

• Data Ingest

• Model Output

• Product Dissemination within operational forecasting deadlines




BAMS Component Models and Data Flows

[Flow diagram; boxes: MM5V3.4 (NC WxScope), SMOKE Emissions Processing and Modeling System, Met Data Ingest, MAQSIP-RT Photochemical Model, Guidance Products]




Operational SMOKE Implementation: Single Instantiation




Typical Output Guidance Products




Some recent history at MCNC (then) and BAMS (now)

• 1998: Single 36km grid over NE US

• 1999: Single 36km grid over NE US; 45km 2/3 CONUS Grid plus Texas 15km and Houston 5km

• 2000: Removed original 36km grid; added SE US 15km grid

• 2001/02: Added NE 15km grid; Boston and Birmingham 5km grids

• 2003:

– Inaugurated BAMS

– Added expanded Mid-Atlantic 15km grid; re-configured SE US grid; added demo Midwest 15km grid



Overview: NOAA 2001-2002 pilot programs in Numerical Air Quality Prediction

• Test Existing Numerical Air Quality Prediction Systems and their components in preparation for selection and eventual deployment of a national forecasting capability at NCEP

• Provide information crucial to the development of a National Weather Service forecast system capable of meeting the needs of operational air quality forecasters

• Phase 1: Summer 2001 “Early Start” Initiative

– MAQSIP-RT running in real time in New England, with evaluation against current techniques

– MM5-Chem and HYSPLIT-Chem (offline testing)

• Phase 2: Summer 2002 Pilot coordinated with the New England Air Quality Study (NEAQS)

– MAQSIP-RT, MM5-Chem, and HYSPLIT-Chem all running in real time, providing forecasting guidance to the NOAA Ship “Ron Brown” collecting atmospheric chemical data off the New England Coast



Phase 1 Results: Aug 1-10, 2001 NE/Mid-Atlantic Episode

MAQSIP-RT Tested Against Previously Existing Forecast Techniques Available in Summer 2001 in New England

• Typical NE US Ozone Episode

• MAQSIP-RT 15km NE 36-hour forecasts

– Evaluated against hourly surface monitor data

– Statistical model forecast comparison for PHL: Peak 1h

– Multi-forecast comparison for New England: Peak 8h

– Canadian CHRONOS model

– Official NESCAUM (human issued) forecasts

– Persistence

• Aug 5 excluded because of computer problems



Aug 1-10, 2001 Episode

MAQSIP-RT and SMOKE 45km and 15km domains in reference to MM5 45km domain. USGS land-cover categories are shown. The NE 15km domain is outlined in yellow. The MM5 15km domain and 5km domains are not shown.




500 hPa analysis, prepared by NCEP, for 1200 UTC on August 7, 2001. Solid contours are geopotential heights in dm, dashed contours are temperature in Celsius. Station data follows the standard convention.




•August 1-2: Onset of Higher O3 and Air Mass Differences

•On August 1, an upper level trough was just offshore, with surface high pressure centered over central Maryland.

•Early on August 2, an area of low pressure developed southeast of Cape Hatteras (HAT) at the base of the departing trough. Onshore flow was enhanced as the center of high pressure moved offshore, providing a cooler, cleaner maritime air mass to the southern Mid-Atlantic, while in New England winds re-circulated as high pressure passed to the south.

•August 3-5: Frontal Boundaries

•The first in a series of short waves crossed New England early on August 3 driving a cold front, preceded by a prefrontal trough, over the eastern Great Lakes.

•As its upper level support moved rapidly offshore on August 4, the low-level frontal boundary became quasi-stationary along a line from Portland, Maine (PSM) to Pittsburgh (PIT). Significant cloud cover occurred east of the I-95 Corridor as the upper level low lingered offshore.

•August 6-10: The “High Tide of Summer.”

•The frontal boundary dissipated over the region on August 6 and temperatures warmed as the upper level ridge pushed slightly east

•August 7 was hot. On August 8, a “back door” cold front dropped quickly across eastern New England, reaching just north of Providence (PVD) by 1800 UTC.

•As the short wave departed on August 9, the upper level ridge oscillated eastward. Boundary layer winds backed to the west-southwest, the band of highest O3 became oriented directly along the I-95 Corridor, and peak concentrations rose.

•A vigorous cold front approached the region on August 10.




Model predicted (left) and observed (right) peak 1-hour-average O3 for August 2 (top) and 4 (bottom), 2001. Observed O3 courtesy EPA AIRNow (http://www.epa.gov/airnow).









Model predicted (left) and observed (right) peak 1-hour-average O3 for August 7 (top) and 8 (bottom), 2001. Observed O3 courtesy EPA AIRNow (http://www.epa.gov/airnow).










Hourly predicted (blue line) and observed (red line) O3 for the Tioga monitor located in north-central Pennsylvania (41°38′41″ N, 70°56′21″ W; top), and domain-wide-mean hourly predicted and observed O3 concentrations for August 9, 2001 (bottom)




Domain-average hourly observed and predicted O3 concentrations for the periods August 1-4 and August 6-10 for the entire NE 15 km model domain




MAQSIP-RT Forecast Evaluated Against Hourly Surface Monitor Data




MAQSIP-RT Comparison Against Statistical Model in Philadelphia

• PHL statistical model in continuous use for 5+ years

• Multiple linear regression using a 10-year database updated annually

• Three distinct forecast algorithms used and combined

• Meteorological inputs come from NWP models selected at run-time based on operational assessment of NWP strengths and weaknesses for a given forecast scenario




Conclusions from 1h evaluation

•MAQSIP-RT meets or exceeds EPA performance criteria for regulatory application, but in forecast mode:

– Gross error in the 15-27% range throughout the episode (EPA criterion: 35%)

– Normalized bias –9.7% (EPA criterion: ±5-15%)

– Unpaired peak prediction accuracy was not computed

Conclusions from peak-1h evaluation against PHL Statistical Model

•MAQSIP-RT met or exceeded the raw statistical guidance for PHL:

– Both models’ Mean Absolute Error about 12 ppb

– Median Absolute Error: MAQSIP-RT (7.3 ppb) versus PHL (~10.8 ppb)

– MAQSIP-RT carried better forecasts on five of the nine days evaluated
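The slides do not give the formulas behind these statistics. As a rough illustration, the sketch below uses commonly used definitions: normalized bias and gross error as the mean of the per-pair relative error, alongside mean bias, MAE, median absolute error, and RMSE. The function name `discrete_stats`, the `cutoff_ppb` parameter, and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean, median


def discrete_stats(forecast, observed, cutoff_ppb=0.0):
    """Paired forecast/observation error statistics (concentrations in ppb).

    Assumption: pairs whose observation is at or below `cutoff_ppb` are
    dropped, which also avoids dividing by zero in the normalized measures.
    """
    pairs = [(f, o) for f, o in zip(forecast, observed) if o > cutoff_ppb]
    if not pairs:
        raise ValueError("no forecast/observation pairs above the cutoff")
    diffs = [f - o for f, o in pairs]
    return {
        "mean_bias_ppb": mean(diffs),
        "mae_ppb": mean(abs(d) for d in diffs),
        "median_abs_error_ppb": median(abs(d) for d in diffs),
        "rmse_ppb": mean(d * d for d in diffs) ** 0.5,
        # Normalized measures: average of per-pair relative errors, in percent.
        "normalized_bias_pct": 100.0 * mean((f - o) / o for f, o in pairs),
        "gross_error_pct": 100.0 * mean(abs(f - o) / o for f, o in pairs),
    }


# Made-up example: two monitors, one over-predicted and one under-predicted.
stats = discrete_stats(forecast=[50.0, 90.0], observed=[40.0, 100.0])
```

In this made-up example the two errors cancel in the mean bias but not in the MAE or gross error, which is why the slides report bias and error measures side by side.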




MAQSIP-RT Comparison Against Other Operational Forecast Methods in New England for Peak 8HR Forecasts

Map of the northeastern United States showing 67 surface ozone monitor locations at which forecasts are issued daily. The 15km MAQSIP-RT NE forecast domain is outlined in red. Corridor monitors are enclosed within the yellow “finger,” with distinction between coastal (blue dots) and interior (green dots) corridor monitors shown. Western-rural monitors are shown as red dots, coastal monitors as orange dots, and the single Great Lakes monitor as the yellow dot near the eastern shore of Lake Ontario.




[Bar chart: bias (ppbv, axis from -25 to 25) by date of episode (Aug 1-4 and Aug 6-10, 2001) and for the whole episode; series PER_bias, NEF_bias, CHR_bias, MAQ_bias]

Episode and daily bias statistics for peak 8-hour-average O3 at all 67 forecasted monitors for the August 1-10, 2001 episode. MAQSIP-RT (MAQ) is given in green, NE forecasts (NEF) in blue, CHRONOS (CHR) in red and Persistence (PER) in brown. All measures are in ppbv.




As in previous slide, but for mean absolute error (MAE), top, and RMSE (bottom)




Scatter plot of MAQSIP-RT forecasts and observations for forecasted monitors grouped by western-rural monitors (yellow), interior corridor monitors (purple), coastal monitors (blue), and the Great Lakes monitor (red). Observations are plotted as a function of the model forecast; best linear fit statistics are given in the border.




Scatter plot of peak 8-hour-average O3 for the Interior Corridor forecast monitors for MAQSIP-RT and NE forecasts. Observations are plotted as a function of model forecasts; best linear fit statistics are given in the border.




Spatial peak 8-hour-average 24 h ozone forecasts (CHRONOS, top left; MAQSIP-RT, top right; Official NE Forecast, bottom left) versus gridded observations (bottom right, courtesy EPA AIRNow) for August 2, 2001, using identical color scales following the US EPA color scale




Same as in previous slide, except for August 7, 2001




Same as in previous slide, except for August 9, 2001




Peak 8h forecast performance measures for selected sub-regions (all measures in ppbv).




Contingency table for threshold forecasts.




Skill score results for forecast methods. “Blend” refers to a 50-50 weighted average of both numerical forecasts.




Study Conclusions

•Overall model performance by MAQSIP-RT was quite good

•The model, executed in real time as a forecast system, met or exceeded EPA performance criteria for regulatory air quality models

•Moreover, it performed consistently with current benchmark statistical forecast methods with respect to metropolitan-wide peak 1-hour-average O3 in the PHL area

•In the NE US Traditional Measure Forecast Evaluation:

•MAQSIP-RT improved on expert forecasts, persistence, and the CHRONOS numerical model by a variety of traditional (bias, MAE, rmse, IA) measures, taken over the whole set of monitors

•MAQSIP-RT performed best in two key sub-regions, the WRMs that “define” the regional background O3 concentrations, and the CMs that are often subject to abrupt air mass changes.

•The expert forecasts were slightly better in the interior of the I-95 corridor, reflecting model difficulties in resolving the effects of steep near-urban precursor gradients as well as forecaster experience in this environment




Study Conclusions

•In the NE US Categorical (threshold) Measure Forecast Evaluation:

•MAQSIP-RT outperformed the CHRONOS model and provided results similar to the expert New England forecasters for an 8-hour-average of 85ppbv as the threshold

(This threshold represents the cutoff between relatively good and relatively poor air quality and is used to trigger Ozone Action Day advisories in New England)

•The strengths of MAQSIP-RT with respect to high O3 threshold forecasts are the lack of a systematic bias and a relatively low false alarm rate.

•Overall, these results suggest that MAQSIP-RT, if used as the only source of forecast information, would have provided OAD and health advisories statistically indistinguishable from those issued by the expert New England forecasters for this episode.



Phase 2 Results: Aug 5-29, 2002, NE Quadrant of US

• comparison of three prototypical air quality forecast models in preparation for operational development at NOAA/NCEP

• MAQSIP-RT• MM5-Chem (Grell, FSL)• HYSPLIT-Chem (Stein, ARL)

• Evaluated for period Aug 5-29, 2002



Phase 2: Summer 2002 NOAA Pilot

MM5-Chem Domains




HYSPLIT-Chem Domain




Peak 1-Hr Concs Peak 8-Hr Concs

MAQSIP-RT

MM5-Chem

HYSPLIT-Chem

Model = (0.63) Obs + 24.4 Model = (0.66) Obs + 21.5

Model = (0.68) Obs + 30.1 Model = (0.70) Obs + 24.9

Model = (0.48) Obs + 35.4 Model = (0.39) Obs + 32.2

Scatter plots of the model versus AQS for both 1- and 8-hour maximum ozone concentrations (ppb) with exceedance thresholds and least squares regression indicated.




[Figure: three bar-chart panels (MAQSIP, MM5-Chem, HYSPLIT) showing RMSE (ppb), NME (%), MB (ppb), and NMB (%) for the concentration ranges All, <65, 65-84, 85-103, and >=104; error axis 0 to 40, bias axis -40 to 40]

Errors and Biases over concentration ranges corresponding to EPA’s AQI for maximum 8-hr forecast
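As a hedged illustration of how errors and biases might be tabulated by concentration range, the sketch below groups forecast/observation pairs into the ranges named on the chart (All, <65, 65-84, 85-103, >=104) and computes RMSE, NME, MB, and NMB for each. Several things here are assumptions rather than details from the slides: that pairs are binned by the observed value, that the bin edges are in ppb, that NME and NMB are normalized by the sum of observations, and the name `binned_stats` and the sample data.

```python
import math

# Bin labels follow the chart; binning on the observed value (ppb) is an assumption.
BINS = [
    ("<65", lambda o: o < 65),
    ("65-84", lambda o: 65 <= o < 85),
    ("85-103", lambda o: 85 <= o < 104),
    (">=104", lambda o: o >= 104),
]


def binned_stats(forecast, observed):
    """Return RMSE, NME, MB and NMB for all pairs and for each concentration bin."""
    pairs = list(zip(forecast, observed))
    groups = {"All": pairs}
    for label, in_bin in BINS:
        groups[label] = [(f, o) for f, o in pairs if in_bin(o)]

    results = {}
    for label, members in groups.items():
        if not members:
            continue  # skip empty bins instead of dividing by zero
        diffs = [f - o for f, o in members]
        obs_sum = sum(o for _, o in members)
        results[label] = {
            "n": len(members),
            "rmse_ppb": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
            "mb_ppb": sum(diffs) / len(diffs),
            # Normalized by the sum of observations (assumed definition).
            "nme_pct": 100.0 * sum(abs(d) for d in diffs) / obs_sum if obs_sum else None,
            "nmb_pct": 100.0 * sum(diffs) / obs_sum if obs_sum else None,
        }
    return results


# Made-up peak 8-hr values (ppb) for five monitor-days.
binned = binned_stats(
    forecast=[60.0, 65.0, 80.0, 100.0, 64.0],
    observed=[50.0, 70.0, 90.0, 110.0, 60.0],
)
```

In this made-up sample the low bin is over-predicted and the higher bins are under-predicted, the same qualitative pattern the pilot study reports for all three models.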




Discrete evaluation results

Categorical evaluation results




Skill Score also used to compare models, using the RMSE of the Persistence forecast as a baseline

Statistically, the persistence forecast and the model forecast can be expressed as:

P = μ + E_P (1)

M = μ + E_M (2)

where P is the value forecast by persistence, M is the value forecast by a model, μ is the true value, and E_P and E_M are the errors associated with the persistence forecast and the model forecast, respectively.

If the model forecast outperforms the persistence forecast, then E_M must be smaller than E_P. Based on (1) and (2), the skill score can be defined as:

SS = [ (E_P – E_M) / E_P ] × 100% (3)

where E_P and E_M can be any valid error metric, such as RMSE or NME (in this study, RMSE is used to calculate the skill score).
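A minimal sketch of the skill score defined above, using RMSE as the error metric as the study does. The function names and the sample values are illustrative assumptions.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error of paired forecasts and observations."""
    diffs = [f - o for f, o in zip(forecast, observed)]
    if not diffs:
        raise ValueError("need at least one forecast/observation pair")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def skill_score(model, persistence, observed):
    """Percent improvement of the model error over the persistence error.

    Positive: the model beats persistence. Negative: persistence is better.
    """
    e_p = rmse(persistence, observed)
    e_m = rmse(model, observed)
    if e_p == 0:
        raise ValueError("persistence error is zero; skill score is undefined")
    return 100.0 * (e_p - e_m) / e_p


# Made-up values (ppb): the model halves the persistence RMSE.
observed = [60.0, 80.0]
persistence = [70.0, 70.0]
model = [65.0, 75.0]
ss = skill_score(model, persistence, observed)
```

With these values the persistence RMSE is 10 ppb and the model RMSE is 5 ppb, so the skill score is 50%. A model with a larger RMSE than persistence would give a negative score.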




Persistence Forecast

• Temporal Persistence: defined as peak-1h or peak-8h observed value at a given AQS observation location persisting 24-hours into the future

• Spatial Persistence uses Location A’s observed values as Location B’s Forecast, if A is the nearest location to B among all the available AQS stations within the model domain
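The two persistence baselines can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the slides do not say how "nearest" is measured, so great-circle (haversine) distance is assumed, and the station names, coordinates, and values are made up.

```python
import math


def temporal_persistence(daily_peaks):
    """Forecast for each day is the previous day's observed peak at the same site.

    The first day has no prior observation, so its forecast is None.
    """
    return [None] + list(daily_peaks[:-1])


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed measure of 'nearest')."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def spatial_persistence(stations):
    """Forecast at each station is the observed value at its nearest other station.

    `stations` maps a station name to (lat, lon, observed_peak_ppb).
    """
    forecasts = {}
    for name, (lat, lon, _) in stations.items():
        others = [
            (haversine_km(lat, lon, o_lat, o_lon), o_obs)
            for other, (o_lat, o_lon, o_obs) in stations.items()
            if other != name
        ]
        if others:
            forecasts[name] = min(others, key=lambda item: item[0])[1]
    return forecasts


# Hypothetical stations: A and B are close together, C is far away.
stations = {
    "A": (40.0, -75.0, 80.0),
    "B": (40.1, -75.0, 90.0),
    "C": (45.0, -70.0, 50.0),
}
temporal = temporal_persistence([70.0, 85.0, 60.0])
spatial = spatial_persistence(stations)
```

Station C illustrates the weakness of spatial persistence in sparse networks: its "nearest" neighbor is hundreds of kilometers away, so the borrowed value may say little about local conditions.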




Spatial RMSE Comparisons: Provide additional background for interpreting the spatial skill score

[Figure: three map panels (MAQSIP-RT, MM5-Chem, HYSPLIT-Chem) spanning longitudes -92 to -68 and latitudes 34 to 48, with RMSE color-coded: Blue = 0-10 PPB; Green = 10-20 PPB; Red = 20-30 PPB; Black = 30-55 PPB]




Skill Scores (TSS = temporal score; SSS = spatial score) for maximum 1-hr forecast

Skill Scores for maximum 8-hr forecast

As measured against persistence RMSE as baseline



Summer 2002 NOAA Pilot Study Conclusions

•Accuracy statistics for all three models exceeded predefined NWS goals (the 90% accuracy categorical statistic)

•All models over-predict when O3 concentrations are lower and under-predict otherwise

•Each model produces substantial errors

•None of the models predicted exceedances well as shown by low Critical Success Indices

•False Alarm Rates, or the number of times that a model predicted an exceedance when none occurred, were high for all models for both 1-hr and 8-hr forecasts

•MAQSIP-RT beat the persistence forecast over time as measured by the Skill Score RMSE except when concentrations are greater than 120 ppb for maximum 1-hr forecast and 104 ppb for maximum 8-hr forecast

•Both MM5-Chem and HYSPLIT perform worse than the persistence forecast as measured by the Skill Score
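The categorical statistics named in these conclusions (accuracy, Critical Success Index, false alarms) can be derived from a 2x2 contingency table of forecast versus observed exceedances. The sketch below uses the conventional forecast-verification definitions, which is an assumption since the slides do not spell them out; the `>=` comparison against an 85 ppbv 8-hour threshold and the name `categorical_stats` are likewise illustrative.

```python
def categorical_stats(forecast, observed, threshold_ppb=85.0):
    """Contingency-table statistics for exceedance forecasts."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_exc, o_exc = f >= threshold_ppb, o >= threshold_ppb
        if f_exc and o_exc:
            hits += 1
        elif o_exc:
            misses += 1
        elif f_exc:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as None.
        return num / den if den else None

    total = hits + misses + false_alarms + correct_negatives
    return {
        "accuracy": ratio(hits + correct_negatives, total),
        "bias": ratio(hits + false_alarms, hits + misses),
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
        "csi": ratio(hits, hits + misses + false_alarms),
    }


# Made-up sample: one hit, one miss, one false alarm, two correct negatives.
mixed = categorical_stats(
    forecast=[90.0, 70.0, 88.0, 60.0, 50.0],
    observed=[95.0, 90.0, 70.0, 55.0, 40.0],
)

# A forecast that never predicts an exceedance, with one observed exceedance in ten.
never = categorical_stats(forecast=[50.0] * 10, observed=[90.0] + [50.0] * 9)
```

In the second made-up case the forecast never predicts an exceedance yet scores 90% accuracy, with POD and CSI of zero. When exceedances are rare, accuracy is dominated by correct negatives, so it says little about exceedance skill on its own.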



Eta-CMAQ: What should we expect?

• Currently, NOAA has established a single, statistically weak performance metric to define “adequacy” in forecast performance for Eta-CMAQ:

Accuracy of 90% in distinguishing exceedances versus non-exceedances (peak-1h versus peak-8h needs clarification)

• At a minimum, metrics that are consistent with today’s best available model, resulting from NOAA’s own pilot studies, should define the adequacy of a “national” forecast capability.

• These metrics should include regulatory, discrete, categorical, and spatial/temporal skill measures, be developed over regional subsets of the data, and be available for objective comparison with other models and forecast methods

• “Accuracy” as defined in the contingency table should probably be only a secondary metric among this set



Eta-CMAQ: What should we expect?

Some Suggested Minimum Criteria for Acceptable Peak 8-HR Performance:

Categorical

1. Bias .6 – 1.4

2. CSI at least 15%

3. POD at least 25%

4. FAR under 70%

Discrete

1. Mean Bias < ~5 PPB

2. MAE < ~13 PPB

3. RMSE < ~18 PPB

Skill Scores

1. Should be positive.

Regulatory

1. Gross Error ~ 25%

2. Normalized Bias under 15%
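One way to apply the suggested criteria is as a simple checklist over a set of computed metrics. The sketch below is only an illustration: the thresholds are transcribed from the list above, but the way each approximate value ("~", "under", "at least") is turned into a comparison, the use of absolute values for the bias measures, and the metric key names are all assumptions.

```python
# Thresholds come from the suggested criteria; the exact comparison operators
# and the absolute-value treatment of bias are assumptions.
CRITERIA = {
    "categorical_bias": lambda v: 0.6 <= v <= 1.4,
    "csi_pct": lambda v: v >= 15.0,
    "pod_pct": lambda v: v >= 25.0,
    "far_pct": lambda v: v < 70.0,
    "mean_bias_ppb": lambda v: abs(v) < 5.0,
    "mae_ppb": lambda v: v < 13.0,
    "rmse_ppb": lambda v: v < 18.0,
    "skill_score_pct": lambda v: v > 0.0,
    "gross_error_pct": lambda v: v <= 25.0,
    "normalized_bias_pct": lambda v: abs(v) < 15.0,
}


def check_criteria(metrics):
    """Return {metric name: True/False} for every supplied metric with a criterion."""
    return {
        name: CRITERIA[name](value)
        for name, value in metrics.items()
        if name in CRITERIA
    }


# Hypothetical metrics for one model and one region.
report = check_criteria({
    "categorical_bias": 1.0,
    "csi_pct": 10.0,
    "far_pct": 65.0,
    "mean_bias_ppb": -6.0,
    "skill_score_pct": 12.0,
})
```

Keeping the criteria in one table makes it straightforward to run the same check over regional subsets of the data or over different models.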



Contact Information

John N. McHenry, Chief Scientist

Baron Advanced Meteorological Systems

c/o North Carolina Supercomputing Center

3021 Cornwallis Road, Research Triangle Park, NC, 27709

Email: [email protected]

Phone: 919-248-9237

• Web: http://www.baronams.com

• Web: http://www.baronservices.com





