TF5651_ch07

7/27/2019 TF5651_ch07

1/16

Chapter 7

Analysis of an air quality data set

In this chapter we make a detailed analysis of a comprehensive set of air pollution concentrations and their associated meteorological measurements. I would like to thank Professor David Fowler and Dr Robert Storeton-West of the Centre for Ecology and Hydrology (CEH) for their help in supplying the data. The measurements were taken over the period 1 January to 31 December 1993 at an automatic monitoring station operated by CEH at their Bush Estate research station, Penicuik, Midlothian, Scotland. Three gas analysers provided measurements of O3, SO2, NO and NOx; NO2 was obtained as the difference between NOx and NO. Windspeed, wind direction, air temperature and solar radiation were measured by a small weather station. The signals from the instruments were sampled every 5 s by a data logger, and hourly average values were calculated and stored.
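The two processing steps just described (averaging 5 s samples into hourly means, and taking NO2 as NOx minus NO) can be sketched as follows. This is only an illustration: the function names and the sample readings are invented here, and the real logger software is not described in the text.

```python
from statistics import fmean

SAMPLES_PER_HOUR = 3600 // 5  # one reading every 5 s gives 720 readings per hour


def hourly_means(samples):
    """Average consecutive blocks of 720 five-second readings into hourly means."""
    n_hours = len(samples) // SAMPLES_PER_HOUR
    return [
        fmean(samples[h * SAMPLES_PER_HOUR:(h + 1) * SAMPLES_PER_HOUR])
        for h in range(n_hours)
    ]


def no2_from_difference(nox_ppb, no_ppb):
    """NO2 is not measured directly: it is NOx minus NO for the same hour."""
    return [nox - no for nox, no in zip(nox_ppb, no_ppb)]


# Made-up readings: two hours of constant signal from each analyser channel.
nox_samples = [10.0] * 720 + [12.0] * 720
no_samples = [2.0] * 720 + [3.0] * 720
no2_hourly = no2_from_difference(hourly_means(nox_samples), hourly_means(no_samples))
print(no2_hourly)
```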

7.1 THE RAW DATA SET

There were 8760 hours in the year 1993. Hence a full data set would involve 8760 means of each of the nine quantities, or 78 840 data values in all. An initial inspection of the data set showed that it was incomplete. This is quite normal for air quality data: there are many reasons for loss of data, such as instrument or power failures or planned calibration periods. Any missing values have to be catered for in subsequent data processing. The data availability for this particular data set is given in Table 7.1.

Many of these lost values were due to random faults and were uniformly distributed throughout the year. The reliability of gas analysers depends largely on the simplicity of their design, and the ranking of the analysers in this example is quite typical. Ozone analysers based on UV absorption are very straightforward instruments that rarely fail provided that regular servicing is carried out. UV fluorescence sulphur dioxide analysers are rather more complicated, and NOx analysers even more so. The lower availability for the nitrogen oxides in this case was in fact due to an extended period at the start of the year when problems were being experienced with the analyser.

2002 Jeremy Colls


Table 7.1 Data availability for the 1993 CEH data set

Measurement        Number of hours available   Percentage of 8760
Ozone              8569                        97.8
Sulphur dioxide    8251                        94.2
Nitric oxide       7010                        80.2
Nitrogen oxides    7010                        80.2
Nitrogen dioxide   7010                        80.2
Windspeed          8443                        96.4
Wind direction     8459                        96.6
Air temperature    8650                        98.7
Solar radiation    8642                        98.7
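The arithmetic behind the size of the full data set and the availability percentages is simple enough to check directly. The sketch below uses the hour counts for ozone and sulphur dioxide from Table 7.1; the function and variable names are illustrative only.

```python
HOURS_IN_1993 = 365 * 24  # 1993 was not a leap year, so 8760 hours
N_QUANTITIES = 9          # five gas concentrations plus four meteorological variables


def availability_percent(hours_present, hours_possible=HOURS_IN_1993):
    """Percentage of the possible hourly means that were actually recorded."""
    return round(100.0 * hours_present / hours_possible, 1)


full_data_set = HOURS_IN_1993 * N_QUANTITIES
ozone_availability = availability_percent(8569)
so2_availability = availability_percent(8251)
print(full_data_set, ozone_availability, so2_availability)
```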

The raw data for gas concentrations, wind speed and wind direction are shown as time sequences in Figure 7.1. Plotting the data in this form is an excellent rapid check on whether there are serious outliers (data values lying so far outside the normal range that they are probably spurious). Differences in the general trends of the concentrations through the year are also apparent. The O3 concentration (Figure 7.1(a)) rose to a general peak in April to May before declining steadily through to November, and the most common values at any time were around half the maxima of the hourly means. Sulphur dioxide concentrations (Figure 7.1(b)) were typically much smaller, although the maxima were nearly as great as for ozone. Typical NO concentrations (Figure 7.1(c)) were low throughout the year, although the intermittent maxima were higher than for the other gases. NO2 concentrations (Figure 7.1(d)) were high around May and November, and particularly low in June/July. There were occasions when the concentrations of any of these gases increased and declined very rapidly; they appear as vertical lines of points on the time series. Superficially, there does not appear to be any systematic relationship between the timing of these occasions for the different gases. The windspeed (Figure 7.1(e)) declined during the first half of the year and then remained low. The wind direction (Figure 7.1(f)) was very unevenly distributed, being mainly from around either 200° (just west of south) or 0° (north).

7.2 PERIOD AVERAGES

The plots shown in Figure 7.1 give an immediate impression of the variations during the year, but are not of much use for summarising the values for comparison with other sites or with legislated standards, for example. We therefore need to undertake further data processing. The most straightforward approach is to explore the time variations by averaging the hourly means over different periods. First, we need to decide how to handle those missing values.

Figure 7.1 Time series of hourly means for the 1993 CEH data set, each plotted against hour of the year: (a) ozone concentration/ppb; (b) sulphur dioxide concentration/ppb; (c) nitric oxide concentration/ppb; (d) nitrogen dioxide concentration/ppb; (e) windspeed/(m/s); (f) wind direction/degrees.

Figure 7.2 Time series of: (a) daily, (b) weekly and (c) monthly means for the 1993 CEH data set. Each panel shows concentration/ppb for O3, SO2, NO, NOx and NO2 against day, week or month of the year.

When we plotted the (nominal) 8760 hourly means, a few missing values did not have a big effect on the appearance. As we average over longer periods, the number of values decreases to 365 daily means, 52 weekly means and only 12 monthly means, so that the effect of missing values becomes proportionately greater. Before we start averaging, we must decide on a protocol that includes as much of the data as possible, but does not create an average value when there is simply not enough data to justify it. For example, consider the calculation of a daily average from the 24 individual hourly averages that contribute to it. If one value is missing, and the sequence of values before and after is varying smoothly, then it is legitimate to replace the missing value with the average of the adjacent values. If the sequence varies erratically, this procedure serves little purpose. Instead, we can ignore the missing value, and calculate the daily mean as the average of the remaining 23 h, arguing that the whole day was still well represented. If only 12 or 8 h remain (as might happen if a faulty instrument was reinstated in the early afternoon), then this argument loses credibility and the whole day should be discarded. The same idea can be applied to the calculation of weekly, monthly and annual averages, with a requirement that, say, 75% of the contributing values be present if the average is to be calculated. This philosophy must particularly be adhered to when the data are missing in blocks, so that no measurements have been taken over significant proportions of the averaging period. For example, it is clear from Figure 7.1(d) that the annual average NO2 concentration would not include any of January or February, and therefore might not be representative of the year as a whole.
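A minimal sketch of such a protocol is given below, assuming that missing hourly means are stored as None and that the 75% threshold mentioned above is applied to every averaging period. The function names are hypothetical, and the threshold is a parameter because the text offers 75% only as an example.

```python
from statistics import fmean


def period_mean(values, min_fraction=0.75):
    """Mean of the valid entries, or None when too few of them are present.

    Missing readings are represented by None (an assumption of this sketch).
    """
    valid = [v for v in values if v is not None]
    if not values or len(valid) / len(values) < min_fraction:
        return None  # not enough data to justify an average
    return fmean(valid)


def daily_means(hourly, min_fraction=0.75):
    """Split a sequence of hourly means into 24 h blocks and average each block."""
    return [
        period_mean(hourly[i:i + 24], min_fraction)
        for i in range(0, len(hourly), 24)
    ]


# Day 1 has one missing hour and is kept; day 2 has only 12 valid hours and is discarded.
day_1 = [20.0] * 23 + [None]
day_2 = [20.0] * 12 + [None] * 12
print(daily_means(day_1 + day_2))
```

The same period_mean function can be applied to blocks of daily means to form weekly or monthly values, so that a long gap propagates upwards as a missing average rather than as a misleading one.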

In Figure 7.2(a) to (c) the 1993 data for gas concentrations are presented as daily, weekly and monthly averages respectively. The short-term variations are successively reduced by the longer averaging periods, and the trends that we originally estimated from the raw data become clearer.

7.3 ROSES

In Section 6.1.1 we discussed the influence of wind direction on the pollutant concentration at a point. We have analysed the CEH dataset specifically to highlight any such dependencies. The 360° of the compass were divided into 16 sectors of 22.5° each. The hourly means taken when the wind was from each sector were then averaged, and the values plotted in the form shown in Figure 7.3. These diagrams are known as roses or rosettes. Figure 7.3(a) shows that the wind direction was usually from between South and South-west, and Figure 7.3(b) that these were also the winds with the highest average speeds. Figure 7.3(c) gives the ozone rose for the year. The almost circular pattern is expected because ozone is formed in the atmosphere on a geographical scale of tens of km, rather than being emitted from local sources. Hence the concentration should be relatively free of directional dependence. Sulphur dioxide, on the other hand, is a primary pollutant which will influence concentrations downwind of specific sources. Figure 7.3(d) indicates possible sources to the North, South-east and West-north-west of the measurement site. Concentrations from the sector between South and South-west (the predominant wind direction) are the lowest. The roses for NO (Figure 7.3(e)) and NO2 (Figure 7.3(f)) are not so well defined. These gases are both primary pollutants (dominated by NO) and secondary pollutants (dominated by NO2). Hence we can see both patterns of directional dependence, with NO behaving more like SO2, and NO2 more like O3.

Figure 7.3 Direction roses of wind frequency, wind speed and pollutant gas concentration.

Figure 7.4 shows the location of the monitoring site in relation to local topographical features and pollution sources, knowledge of which can help to understand the pollution data. Two factors strongly influence the wind rose: the Pentland Hills run north-east to south-west, and the Firth of Forth estuary generates north-south sea breezes. The combined effect of both factors produces the strongly south to south-west wind rose which was seen in Figure 7.3(a). The main sources responsible for the primary pollutants are urban areas and roads. The city of Edinburgh, which lies 10 km to the north, generates the northerly SO2 peak seen on Figure 7.3(d). Although the small town of Penicuik lies close to the south, there is apparently no SO2 peak from that direction, nor is there an identifiable source responsible for the south-east SO2 peak. Detailed interpretation of such data cannot be made without a detailed source inventory, since weak, low sources close to the monitor can produce similar signals to those from stronger, higher, more distant sources.

Figure 7.4 The topography, urban areas and roads around the CEH measurement site at Penicuik.

7.4 DIURNAL VARIATIONS

Another way of gaining insight into pollutant occurrence is to average the concentrations according to the hour of the day. We must be careful, though, to allow for the ways in which the characteristics of the days themselves change during the year. In Figures 7.5 and 7.6, we have carried out diurnal analyses for the months of June and December respectively.
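A diurnal analysis of this kind can be sketched as below. The example assumes that the year of hourly means is held in a list whose index 0 is the first hour of 1 January, that missing values are None, and that no clock changes were applied by the logger; none of these details is stated in the text, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def diurnal_profile(hourly, month, year=1993):
    """Average the hourly means of one calendar month by hour of the day (0 to 23)."""
    start = datetime(year, 1, 1)
    by_hour = defaultdict(list)
    for i, value in enumerate(hourly):
        stamp = start + timedelta(hours=i)  # start time of the i-th hour of the year
        if value is not None and stamp.month == month:
            by_hour[stamp.hour].append(value)
    return {hour: fmean(values) for hour, values in sorted(by_hour.items())}


# Synthetic check: a series equal to the hour of the day returns that hour as its average.
synthetic = [float(i % 24) for i in range(8760)]
june_profile = diurnal_profile(synthetic, month=6)
print(june_profile[16])
```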

In June, the period of daylight is long and the peak solar radiation high (Figure 7.5(a)). The air temperature is warm (Figure 7.5(b)) and shows a pronounced afternoon increase in response to the solar radiation. Average windspeeds are low, with convective winds increasing during daylight hours (Figure 7.5(c)). The ozone concentration (Figure 7.5(e)) shows a background level of about 23 ppb, with photochemical production increasing this concentration to a peak of 30 ppb at around 1600. The diurnal variation of SO2 (Figure 7.5(f)) is quite different: there are sharp peaks centred on 1000 and 1700 which result from local emissions, and no clear dependence on solar radiation. As with the pollutant roses, the diurnal variations of NO and NO2 are a blend of these two behaviours.

Figure 7.6(a) to (h) shows the corresponding variations during December. Now, the days are short and solar energy input is at a minimum. Air temperatures are low and barely respond to the sun, and windspeeds are higher and almost constant through the day. As a consequence of these changes, ozone shows no photochemical production in the afternoon. Somewhat surprisingly, SO2 has lost all trace of the 1000 and 1700 spikes, although these remain very clear for NO. The pattern for NO2 is very similar in December and June.

7.5 SHORT-TERM EVENTS

So far in this chapter, we have grouped data in different ways specifically to smooth out short-term variations and clarify patterns. We can also benefit from a detailed look at shorter periods of measurements. Figure 7.7 shows the time series for one particular period of 300 h (between 4100 and 4400 h, or roughly from 20 June to 3 July). The wind direction was generally southerly, except for two periods of about 50 h each when it swung to the north and back several times. When the wind direction changed, there were bursts of higher concentrations of NO, NO2 and SO2, and the O3 background concentration was disturbed. These changes were probably associated with emissions from a local source that was only upwind of the measurement site when the wind was from one particular narrow range of directions. This would not only bring the primary pollutants, but also excess NO to react with the O3 and reduce the concentration of the latter.

Figure 7.5 Average diurnal variations in June.

Figure 7.6 Average diurnal variations in December.

Figure 7.7 Variations of gas concentration with wind direction over a single period of 300 h.

7.6 FREQUENCY DISTRIBUTIONS

As discussed in Chapter 4, the concentrations of air pollutants often show a log-normal frequency distribution, i.e. the logarithms of the concentrations are distributed normally. We have analysed the hourly and daily means from the Centre for Ecology and Hydrology data set to confirm this. The overall range of the concentrations that occurred over the year was divided into subranges, and the number of values that fell within each subrange was counted. This is the frequency distribution. These counts were then expressed as a percentage of the total number, and summed by subrange to give the cumulative frequency distribution. If the frequency distribution is log-normal, then the cumulative distribution plots as a straight line on log-probability axes. In Figures 7.8 and 7.9, the distributions for the hourly and daily means of the five gases are shown. Those for SO2, NO and NOx are quite linear, NO2 less so, and O3 not at all. The O3 distribution is characteristic of a variable that has a significant background component: the concentration was not low for as high a proportion of the time as the log-normal form requires.

Figure 7.8 Cumulative frequency distributions of hourly-mean pollutant concentrations (O3, NOx, NO2, NO and SO2). Vertical axis: gas concentration/ppb (logarithmic scale); horizontal axis: proportion of the time for which the hourly mean concentration exceeded the value given on the axis/% (probability scale).
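The counting procedure described above can be sketched as follows. The subrange edges and sample values are invented for illustration; the text does not say which subranges were used for the CEH data.

```python
def cumulative_frequency(values, bin_edges):
    """Return (upper_edge, percent_in_subrange, cumulative_percent) for each subrange.

    Each subrange includes its lower edge and excludes its upper edge.
    Missing values (None) are ignored.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid concentrations to analyse")
    rows = []
    cumulative = 0.0
    for lower, upper in zip(bin_edges[:-1], bin_edges[1:]):
        count = sum(1 for v in valid if lower <= v < upper)
        percent = 100.0 * count / len(valid)
        cumulative += percent
        rows.append((upper, percent, cumulative))
    return rows


# Made-up concentrations/ppb, with subranges that double in width (an arbitrary choice).
sample_ppb = [1, 2, 3, 4, 6, 8, 12, 20]
edges_ppb = [0, 2, 4, 8, 16, 32]
for upper, percent, cumulative in cumulative_frequency(sample_ppb, edges_ppb):
    print(upper, percent, cumulative, 100.0 - cumulative)
```

The last printed column is the percentage of the time for which the upper edge was exceeded, which is the quantity on the horizontal axes of Figures 7.8 and 7.9.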

Certain statistical parameters can be derived from the log-probability curves. Commonly quoted are the concentrations below which the value falls for 50% (median), 90%, 98% and 99% of the time. The 98% value of daily means is used in European Union Directives on air quality; it is equivalent to stating that the concentration should not be exceeded for more than seven days in the year. The values indicated by Figures 7.8 and 7.9 are extracted in Table 7.2. It is clear that one such parameter alone is not sufficient to define the distribution. If the distribution plots as a straight line, we can measure the gradient, which is equivalent to the standard geometric deviation of the sample. Then the median and gradient completely define the population distribution. There are more complex formulations of the log-normal distribution that can be used to describe non-linear data sets.
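These parameters can also be computed directly from the data instead of being read from a plot. The sketch below uses a simple nearest-rank percentile (one of several common definitions, chosen here as an assumption) and obtains the standard geometric deviation as the exponential of the standard deviation of the log concentrations. It also shows why the 98th percentile of daily means corresponds to about seven days: 2% of 365 days is 7.3 days.

```python
import math
from statistics import fmean, stdev


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p per cent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def geometric_stats(values):
    """Geometric mean and standard geometric deviation of the positive values."""
    logs = [math.log(v) for v in values if v > 0]
    return math.exp(fmean(logs)), math.exp(stdev(logs))


# Days in a year on which the 98th percentile of daily means is exceeded.
days_above_98th = 365 * (100 - 98) / 100

# Made-up data spaced evenly in log terms, so both results are close to 10.
geo_mean, geo_sd = geometric_stats([1.0, 10.0, 100.0])
print(days_above_98th, geo_mean, geo_sd)
```

For a truly log-normal sample the geometric mean coincides with the median, which is why the median and the gradient together are enough to define the distribution.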

Figure 7.9 Cumulative frequency distributions of daily-mean pollutant concentrations (O3, NOx, NO2, NO and SO2). Vertical axis: gas concentration/ppb (logarithmic scale); horizontal axis: proportion of the time for which the daily mean concentration exceeded the value on the axis/% (probability scale).

7.7 FURTHER STATISTICAL ANALYSES

Other standard statistical parameters can be used to describe the data. The

summary statistics for the 1993 data are given in Table 7.3.

Finally, we can apply the ideas on the relationships between the period maxima that were outlined in Chapter 4. If the maximum 1-h concentration is $C_{\max,1\,\mathrm{h}}$, and the maximum over any other period $t$ (in hours) is $C_{\max,t}$, then we should find that $C_{\max,t} = C_{\max,1\,\mathrm{h}}\,t^{q}$, where $q$ is an exponent for the particular gas. For the 1993 data set, the maximum values over the different averaging periods are shown in Table 7.4. Plotting $\log C_{\max,t}$ against $\log t$ gives the results shown in Figure 7.10, in which the gradients of the lines give the values of $q$ for the different gases.

Table 7.2 Percentiles of hourly and daily means (concentrations in ppb)

             Hourly means (per cent)     Daily means (per cent)
Pollutant    50     90     98     99     50     90     98     99
O3           24     34     40     42     24     32     36     38
SO2          1.5    6      13     15     2      5      9      11
NO           1      3      16     27     1      4      13     18
NOx          5      21     41     56     7      18     30     39
NO2          5      17     27     31     7      14     22     23

Table 7.3 Summary statistics for the 1993 CEH data set

       Hourly means/ppb                  Daily means/ppb          Weekly means/ppb
Gas    Mean    Median   Standard        Median   Standard        Median   Standard
                        deviation                deviation                deviation
O3     23.6    24.7     9.2             23.8     7.4             23.4     5.8
SO2    2.6     1.6      3.1             2.0      2.2             2.7      1.4
NO     1.7     0.3      7.5             0.7      3.7             1.3      2.0
NOx    9.2     5.8      11.6            7.5      7.8             8.9      5.3
NO2    7.5     5.3      6.9             6.7      5.2             7.8      27.2

Table 7.4 Values of Cmax,t for the different pollutants (concentrations in ppb)

                   Pollutant
Cmax,t             O3        SO2      NO       NOx      NO2
Cmax, 1 h          57        47       185      186      72
Cmax, 1 day        43        16       38       57       37
Cmax, 1 week       34        7        11       24       16
Cmax, 1 month      31        5        7        20       13
q                  -0.095    -0.35    -0.51    -0.35    -0.27
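The exponent q can be estimated as the least-squares gradient of log Cmax,t against log t. The sketch below does this for the O3 and SO2 maxima listed in Table 7.4. It assumes that one month is 730 h (8760/12), which the text does not specify, and the function name is illustrative.

```python
import math

# Averaging periods in hours; the length taken for one month is an assumption.
PERIOD_HOURS = [1, 24, 168, 730]

# Maximum concentrations/ppb over 1 h, 1 day, 1 week and 1 month, from Table 7.4.
CMAX_PPB = {
    "O3": [57, 43, 34, 31],
    "SO2": [47, 16, 7, 5],
}


def fit_exponent(hours, cmax):
    """Least-squares gradient of log10(cmax) against log10(hours)."""
    xs = [math.log10(t) for t in hours]
    ys = [math.log10(c) for c in cmax]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


q = {gas: fit_exponent(PERIOD_HOURS, values) for gas, values in CMAX_PPB.items()}
for gas, value in q.items():
    print(gas, round(value, 3))
```

The fitted gradients are negative because the maximum falls as the averaging period lengthens, and their magnitudes agree closely with the q row of Table 7.4. The gradient for O3 is much shallower than that for SO2, reflecting its large background component.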

Figure 7.10 Correlations between the maximum period average and the averaging period. Horizontal axis: number of hours (logarithmic scale, with 1 hour, 1 day, 1 week and 1 month marked); vertical axis: maximum concentration during period/ppb (logarithmic scale); one line each for O3, NOx, NO2, NO and SO2.
