TF5651_ch07


    Chapter 7

    Analysis of an air quality data set

    In this Chapter we will make a detailed analysis of a comprehensive set of air

    pollution concentrations and their associated meteorological measurements. I

    would like to thank Professor David Fowler and Dr Robert Storeton-West of the

    Centre for Ecology and Hydrology (CEH) for their help in supplying the data.

    The measurements were taken over the period 1 January to 31 December 1993

    at an automatic monitoring station operated by CEH at their Bush Estate

    research station, Penicuik, Midlothian, Scotland. Three gas analysers provided

    measurements of O3, SO2, NO, and NOx; NO2 was obtained as the difference

    between NOx and NO. Windspeed, wind direction, air temperature and solar radiation were measured by a small weather station. The signals from the instruments were sampled every 5 s by a data logger, and hourly average values

    calculated and stored.
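
    As a minimal sketch of the logging arithmetic described above, the following Python fragment averages 5 s samples into hourly means and forms NO2 by difference. The function names and the use of plain lists are assumptions made for illustration; the source does not describe the data logger's actual software.

```python
SAMPLE_INTERVAL_S = 5
SAMPLES_PER_HOUR = 3600 // SAMPLE_INTERVAL_S  # 720 samples in each hour


def hourly_means(samples):
    """Average consecutive blocks of 720 samples; a trailing partial hour is dropped."""
    n_hours = len(samples) // SAMPLES_PER_HOUR
    return [
        sum(samples[h * SAMPLES_PER_HOUR:(h + 1) * SAMPLES_PER_HOUR]) / SAMPLES_PER_HOUR
        for h in range(n_hours)
    ]


def no2_by_difference(nox_ppb, no_ppb):
    """NO2 obtained as NOx minus NO, value by value (both in ppb)."""
    return [nox - no for nox, no in zip(nox_ppb, no_ppb)]
```

    Because averaging is linear, taking the difference of the hourly means gives the same result as averaging the 5 s differences.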

    7.1 THE RAW DATA SET

    There were 8760 hours in the year 1993. Hence a full data set would involve 8760

    means of each of the nine quantities, or 78 840 data values in all. An initial inspection of the data set showed that it was incomplete. This is quite normal for air quality data: there are many reasons for loss of data, such as instrument or

    power failures or planned calibration periods. Any missing values have to be

    catered for in subsequent data processing. The data availability for this particular

    data set is given in Table 7.1.

    Many of these lost values were due to random faults and were uniformly distributed throughout the year. The reliability of gas analysers depends largely on

    the simplicity of their design, and the ranking of the analysers in this example is

    quite typical. Ozone analysers based on UV absorption are very straightforward instruments that rarely fail provided that regular servicing is carried out. UV fluorescence sulphur dioxide analysers are rather more complicated, and NOx analysers even more so. The lower availability for the nitrogen oxides in this case

    was in fact due to an extended period at the start of the year when problems were

    being experienced with the analyser.
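
    The availability percentages in Table 7.1 are simply the number of valid hours divided by the 8760 h in the year. A short sketch of that arithmetic follows; the dictionary layout and function name are illustrative assumptions, while the hour counts are those listed in Table 7.1.

```python
HOURS_IN_1993 = 8760  # 365 days x 24 h

# Number of valid hourly means for each measurement (from Table 7.1)
hours_available = {
    "Ozone": 8569,
    "Sulphur dioxide": 8251,
    "Nitric oxide": 7010,
    "Nitrogen oxides": 7010,
    "Nitrogen dioxide": 7010,
    "Windspeed": 8443,
    "Wind direction": 8459,
    "Air temperature": 8650,
    "Solar radiation": 8642,
}


def availability_percent(hours, total=HOURS_IN_1993):
    """Valid hours as a percentage of the hours in the year, to one decimal place."""
    return round(100.0 * hours / total, 1)


for name, hours in hours_available.items():
    print(f"{name:16s} {hours:5d} {availability_percent(hours):5.1f}")
```

    Run on these counts, the arithmetic gives 80.0% for the 7010 h of nitrogen oxides data, marginally below the tabulated 80.2%; the other rows agree with the table to one decimal place.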

    2002 Jeremy Colls


    Table 7.1 Data availability for the 1993 CEH data set

    Measurement         Number of hours available   Percentage of 8760
    Ozone               8569                        97.8
    Sulphur dioxide     8251                        94.2
    Nitric oxide        7010                        80.2
    Nitrogen oxides     7010                        80.2
    Nitrogen dioxide    7010                        80.2
    Windspeed           8443                        96.4
    Wind direction      8459                        96.6
    Air temperature     8650                        98.7
    Solar radiation     8642                        98.7

    The raw data for gas concentrations, wind speed and wind direction are shown

    as time sequences in Figure 7.1. Plotting the data in this form is an excellent rapid

    check on whether there are serious outliers (data values lying so far outside

    the normal range that they are probably spurious). Differences in the general

    trends of the concentrations through the year are also apparent. The O3 concentration (Figure 7.1(a)) rose to a general peak in April to May before declining steadily through to November, and the most common values at any time were around half the maxima of the hourly means. Sulphur dioxide concentrations

    (Figure 7.1(b)) were typically much smaller, although the maxima were nearly

    as great as for ozone. Typical NO concentrations (Figure 7.1(c)) were low

    throughout the year, although the intermittent maxima were higher than for the

    other gases. NO2 concentrations (Figure 7.1(d)) were high around May and

    November, and particularly low in June/July. There were occasions when the

    concentrations of any of these gases increased and declined very rapidly; they

    appear as vertical lines of points on the time series. Superficially, there does not

    appear to be any systematic relationship between the timing of these occasions for the different gases. The windspeed (Figure 7.1(e)) declined during the first

    half of the year and then remained low. The wind direction (Figure 7.1(f)) was

    very unevenly distributed, being mainly from around either 200° (just west of south), or 0° (north).

    7.2 PERIOD AVERAGES

    The plots shown in Figure 7.1 give an immediate impression of the variations during the year, but are not of much use for summarising the values for

    comparison with other sites or with legislated standards, for example. We

    therefore need to undertake further data processing. The most straightforward

    approach is to explore the time variations by averaging the hourly means over

    different periods. First, we need to decide how to handle those missing values.


    Figure 7.1 Time series of hourly means for the 1993 CEH data set.

    [Figure not reproduced. Panels: (a) ozone hourly means, (b) sulphur dioxide hourly means, (c) nitric oxide hourly means; each plots concentration/ppb against hour of the year.]


    [Figure not reproduced. Panels: (d) nitrogen dioxide hourly means (NO2 concentration/ppb), (e) windspeed hourly means (m/s), (f) wind direction/degrees; each plotted against hour of the year.]

    Figure 7.1 Continued.


    [Figure not reproduced. Panels: (a) daily means for all pollutants against day of the year, (b) weekly means against week of the year, (c) monthly means against month of the year; each plots concentration/ppb for O3, SO2, NO, NOx and NO2.]

    Figure 7.2 Time series of: (a) daily, (b) weekly and (c) monthly means for the 1993 CEH data set.


    When we plotted the (nominal) 8760 h means, a few missing values did not have

    a big effect on the appearance. As we average over longer periods, the number of

    values decreases to 365 daily means, 52 weekly means and only 12 monthly means, so that the effect of missing values becomes proportionately greater.

    Before we start averaging, we must decide on a protocol that includes as much of

    the data as possible, but does not create an average value when there is simply not

    enough data to justify it. For example, consider the calculation of a daily average

    from the 24 individual hourly averages that contribute to it. If one value is

    missing, and the sequence of values before and after is varying smoothly, then it

    is legitimate to substitute the missing value with the average of the adjacent

    values. If the sequence varies erratically, this procedure serves little purpose.

    Instead, we can ignore the missing value, and calculate the daily mean as the average of the remaining 23 h, arguing that the whole day was still well represented. If only 12 or 8 h remain (as might happen if a faulty instrument was

    reinstated in the early afternoon), then this argument loses credibility and the

    whole day should be discarded. The same idea can be applied to the calculation

    of weekly, monthly and annual averages, with a requirement that, say, 75% of the

    contributing values be present if the average is to be calculated. This philosophy

    must particularly be adhered to when the data is missing in blocks, so that no

    measurements have been taken over significant proportions of the averaging

    period. For example, it is clear from Figure 7.1(d) that the annual average NO2 concentration would not include any of January or February, and therefore might

    not be representative of the year as a whole.
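
    The averaging protocol described above can be sketched in a few lines of Python. This is one possible implementation, not the procedure actually used for the CEH data: the function names, the use of None to mark a missing hour and the default 75% threshold (the figure suggested in the text) are assumptions. Whether a sequence varies smoothly enough to justify filling a gap is left to the analyst.

```python
def period_mean(values, min_fraction=0.75):
    """Mean of the values present, or None if too few are present.

    `values` holds every contributing value for the period, with None
    marking a missing one.
    """
    present = [v for v in values if v is not None]
    if not values or len(present) < min_fraction * len(values):
        return None  # not enough data to justify an average
    return sum(present) / len(present)


def fill_single_gaps(values):
    """Replace an isolated missing value by the mean of its two neighbours."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        if values[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            filled[i] = (values[i - 1] + values[i + 1]) / 2
    return filled
```

    With these definitions a day with 23 valid hours still yields a daily mean, whereas a day with only 12 valid hours returns None and is discarded.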

    In Figure 7.2(a) to (c) the 1993 data for gas concentrations are presented as daily, weekly and monthly averages respectively. The short-term variations are successively reduced by the longer averaging periods, and the trends that we originally

    estimated from the raw data become clearer.

    7.3 ROSES

    In Section 6.1.1 we discussed the influence of wind direction on the pollutant

    concentration at a point. We have analysed the CEH dataset specifically to

    highlight any such dependencies. The 360° of the compass were divided into 16 sectors of 22.5° each. The hourly means taken when the wind was from

    each sector were then averaged, and the values plotted in the form shown in

    Figure 7.3. These diagrams are known as roses or rosettes. Figure 7.3(a) shows

    that the wind direction was usually from between South and South-west, and

    Figure 7.3(b) that these were also the winds with the highest average speeds. Figure 7.3(c) gives the ozone rose for the year; the almost circular pattern is

    expected because ozone is formed in the atmosphere on a geographical scale

    of tens of km, rather than being emitted from local sources. Hence the

    concentration should be relatively free of directional dependence. Sulphur

    dioxide, on the other hand, is a primary pollutant which will influence


    Figure 7.3 Direction roses of wind frequency, wind speed and pollutant gas concentration.

    concentrations downwind of specific sources. Figure 7.3(d) indicates possible sources to the North, South-east and West-north-west of the measurement site. Concentrations from the sector between South and South-west (the predominant wind direction) are the lowest. The roses for NO (Figure 7.3(e)) and NO2 (Figure 7.3(f)) are not so well defined. These gases are both primary pollutants

    (dominated by NO) and secondary pollutants (dominated by NO2). Hence we


    can see both patterns of directional dependence, with NO behaving more like

    SO2, and NO2 more like O3.
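
    The sector averaging behind a pollutant rose can be sketched as follows. The division into 16 sectors of 22.5° is taken from the text; centring the first sector on north, the sector labels and the function names are assumptions made for this illustration.

```python
N_SECTORS = 16
SECTOR_WIDTH = 360.0 / N_SECTORS  # 22.5 degrees

SECTOR_NAMES = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector_index(direction_deg):
    """Sector number 0..15; sector 0 is assumed to span 348.75 to 11.25 degrees."""
    shifted = (direction_deg % 360.0) + SECTOR_WIDTH / 2
    return int(shifted // SECTOR_WIDTH) % N_SECTORS


def pollutant_rose(directions_deg, concentrations):
    """Mean concentration per sector; hours with either value missing are skipped."""
    totals = [0.0] * N_SECTORS
    counts = [0] * N_SECTORS
    for direction, conc in zip(directions_deg, concentrations):
        if direction is None or conc is None:
            continue
        i = sector_index(direction)
        totals[i] += conc
        counts[i] += 1
    # Sectors that never occurred have no mean and are reported as None
    return [t / n if n else None for t, n in zip(totals, counts)]
```

    Passing the wind speed in place of the concentration gives the wind speed rose, and the per-sector counts alone give the wind frequency rose.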

    Figure 7.4 shows the location of the monitoring site in relation to local topographical features and pollution sources, knowledge of which can help to understand the pollution data. Two factors strongly influence the wind rose: the Pentland Hills run north-east to south-west, and the Firth of Forth estuary generates north-south sea breezes. The combined effect of both factors produces

    the strongly south to south-west wind rose which was seen in Figure 7.3(a). The

    main sources responsible for the primary pollutants are urban areas and roads.

    The city of Edinburgh, which lies 10 km to the north, generates the northerly

    SO2 peak seen on Figure 7.3(d). Although the small town of Penicuik lies close

    to the south, there is apparently no SO2 peak from that direction, nor is there an identifiable source responsible for the south-east SO2 peak. Detailed interpretation of such data cannot be made without a detailed source inventory, since weak, low sources close to the monitor can produce similar signals to those from stronger, higher, more distant sources.

    Figure 7.4 The topography, urban areas and roads around the CEH measurement site at Penicuik.


    7.4 DIURNAL VARIATIONS

    Another way of gaining insight into pollutant occurrence is to average the concentrations according to the hour of the day. We must be careful, though, to allow

    for the ways in which the characteristics of the days themselves change during the

    year. In Figures 7.5 and 7.6, we have carried out diurnal analyses for the months

    of June and December respectively.
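
    A diurnal analysis of this kind amounts to grouping the hourly means by hour of the day within the chosen month. The sketch below assumes that the hourly series is indexed by hour of the year, starting at 00:00 on 1 January 1993, and that missing hours are marked by None; neither convention is stated in the source.

```python
from datetime import datetime, timedelta

YEAR_START = datetime(1993, 1, 1)  # assumed time of hourly value number 0


def diurnal_profile(hourly_values, month):
    """Mean value for each hour of the day (0..23) within one calendar month.

    hourly_values[k] is the hourly mean for hour k of the year.
    """
    totals = [0.0] * 24
    counts = [0] * 24
    for k, value in enumerate(hourly_values):
        if value is None:
            continue
        stamp = YEAR_START + timedelta(hours=k)
        if stamp.month == month:
            totals[stamp.hour] += value
            counts[stamp.hour] += 1
    return [t / n if n else None for t, n in zip(totals, counts)]


# Usage: june = diurnal_profile(ozone_hourly, 6); december = diurnal_profile(ozone_hourly, 12)
```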

    In June, the period of daylight is long and the peak solar radiation

    high (Figure 7.5(a)). The air temperature is warm (Figure 7.5(b)) and shows a

    pronounced afternoon increase in response to the solar radiation. Average

    windspeeds are low, with convective winds increasing during daylight

    hours (Figure 7.5(c)). The ozone concentration (Figure 7.5(e)) shows a background level of about 23 ppb, with photochemical production increasing this concentration to a peak of 30 ppb at around 1600. The diurnal variation of SO2 (Figure 7.5(f)) is quite different: there are sharp peaks centred on 1000 and

    1700 which result from local emissions, and no clear dependence on solar

    radiation. As with the pollutant roses, the diurnal variations of NO and NO2 are

    a blend of these two behaviours.

    Figure 7.6(a) to (h) shows the corresponding variations during December. Now, the days are short and solar energy input is a minimum. Air temperatures are low and barely respond to the sun; windspeeds are higher and almost constant through the day. As a consequence of these changes, ozone shows no photochemical

    production in the afternoon. Somewhat surprisingly, SO2 has lost all trace of

    the 1000 and 1700 spikes, although these remain very clear for NO. The pattern

    for NO2 is very similar in December and June.

    7.5 SHORT-TERM EVENTS

    So far in this chapter, we have grouped data in different ways specifically to

    smooth out short-term variations and clarify patterns. We can also benefit from

    a detailed look at shorter periods of measurements. Figure 7.7 shows the time series for one particular period of 300 h (between 4100 and 4400 h, or roughly from 20 June to 3 July). The wind direction was generally

    southerly, except for two periods of about 50 h each when it swung to the north

    and back several times. When the wind direction changed, there were bursts of

    higher concentrations of NO, NO2 and SO2, and the O3 background concentration was disturbed. These changes were probably associated with emissions from a local source that was only upwind of the measurement site when the wind

    was from one particular narrow range of directions. This would not only bring

    the primary pollutants, but also excess NO to react with the O3 and reduce the

    concentration of the latter.
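
    The correspondence between hour of the year and calendar date quoted above can be checked with a short calculation. The sketch assumes that hour 0 is 00:00 on 1 January 1993; the function name is illustrative.

```python
from datetime import datetime, timedelta


def hour_of_year_to_datetime(hour, year=1993):
    """Calendar time of a given hour of the year, counting from 00:00 on 1 January."""
    return datetime(year, 1, 1) + timedelta(hours=hour)


start = hour_of_year_to_datetime(4100)
end = hour_of_year_to_datetime(4400)
print(start.isoformat(), "to", end.isoformat())
```

    Under that assumption, hour 4100 falls on 20 June and hour 4400 on 3 July, in agreement with the dates given in the text.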


    Figure 7.5 Average diurnal variations in June.


    Figure 7.6 Average diurnal variations in December.


    Figure 7.7 Variations of gas concentration with wind direction over a single period of 300 h.


    7.6 FREQUENCY DISTRIBUTIONS

    As discussed in Chapter 4, the concentrations of air pollutants often show a log-normal frequency distribution, i.e. the logarithms of the concentrations are

    distributed normally. We have analysed the hourly and daily means from the Centre

    for Ecology and Hydrology data set to confirm this. The overall range of the

    concentrations that occurred over the year was divided into subranges, and the

    number of values that fell within each subrange was counted. This is the frequency

    distribution. These counts were then expressed as a percentage of the total number,

    and summed by subrange to give the cumulative frequency distribution. If the

    frequency distribution is log-normal, then the cumulative distribution plots as a

    straight line on log-probability axes. In Figures 7.8 and 7.9, the distributions for the hourly and daily means of the five gases are shown. Those for SO2, NO and NOx are quite linear, NO2 less so, and O3 not at all. The O3 distribution is characteristic

    [Figure not reproduced. Log-probability plot of gas concentration/ppb (logarithmic axis, 1 to 100) against the proportion of the time for which the hourly mean concentration exceeded the value given on the axis/%, for O3, NOx, NO2, NO and SO2.]

    Figure 7.8 Cumulative frequency distributions of hourly-mean pollutant concentrations.


    of a variable that has a significant background component: the concentration was

    not low for as high a proportion of the time as the log-normal form requires.
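
    The counting procedure described above (subranges, counts, percentages, cumulative sum) can be sketched as follows. The function names are assumptions, and the choice of subrange edges is left to the caller; values outside the outermost edges are simply not counted in this sketch.

```python
def frequency_distribution(values, edges):
    """Count the values falling in each subrange [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def cumulative_percent(counts):
    """Running total of the counts as a percentage of all counted values."""
    total = sum(counts)
    if total == 0:
        return []
    running, out = 0, []
    for c in counts:
        running += c
        out.append(100.0 * running / total)
    return out


def exceedance_percent(counts):
    """Percentage of counted values lying above the upper edge of each subrange."""
    return [100.0 - c for c in cumulative_percent(counts)]
```

    The exceedance form is the one plotted in Figures 7.8 and 7.9, whose horizontal axes show the proportion of the time for which a concentration was exceeded.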

    Certain statistical parameters can be derived from the log-probability curves.

    Commonly quoted are the concentrations below which the value falls for 50%

    (median), 90%, 98% and 99% of the time. The 98% value of daily means is used

    in European Union Directives on air quality; it is equivalent to stating that the

    concentration should not be exceeded for more than seven days in the year. The

    values indicated by Figures 7.8 and 7.9 are extracted in Table 7.2. It is clear that one such parameter alone is not sufficient to define the distribution. If the distribution is linear, we can measure the gradient, which is equivalent to the geometric standard deviation of the sample. Then the median and gradient completely

    define the population distribution. There are more complex formulations of the

    log-normal distribution that can be used to describe non-linear data sets.
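
    To illustrate how a median and a geometric standard deviation define a log-normal distribution, the sketch below estimates both from a sample and then evaluates any percentile from them. It is a hedged example only: the function names are assumptions, zero or missing values are skipped because their logarithm is undefined, and the standard library's NormalDist supplies the normal quantile. The last line repeats the arithmetic behind the seven-day equivalence of the 98th percentile of daily means.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_parameters(values):
    """Geometric mean (the log-normal median) and geometric standard deviation."""
    logs = [math.log(v) for v in values if v is not None and v > 0]
    return math.exp(mean(logs)), math.exp(stdev(logs))


def lognormal_percentile(median, gsd, percent):
    """Concentration not exceeded for `percent` of the time under a log-normal model."""
    z = NormalDist().inv_cdf(percent / 100.0)  # standard normal quantile
    return median * gsd ** z


# A 98th percentile of daily means is exceeded on 2% of 365 days, i.e. about 7 days
days_exceeded = 0.02 * 365
```

    Comparing lognormal_percentile(median, gsd, 98) with the 98% value read from the measured distribution gives a quick indication of how well the log-normal form describes a given gas.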

    [Figure not reproduced. Log-probability plot of gas concentration/ppb (logarithmic axis, 1 to 100) against the proportion of the time for which the daily mean concentration exceeded the value on the axis/%, for O3, NOx, NO2, NO and SO2.]

    Figure 7.9 Cumulative frequency distributions of daily-mean pollutant concentrations.


    7.7 FURTHER STATISTICAL ANALYSES

    Other standard statistical parameters can be used to describe the data. The

    summary statistics for the 1993 data are given in Table 7.3.

    Finally, we can apply the ideas on the relationships between the period maxima that were outlined in Chapter 4. If the maximum 1-h concentration is $C_{\max,1\,\mathrm{h}}$, and the maximum over any other period $t$ (in hours) is $C_{\max,t}$, then we should find that $C_{\max,t} = C_{\max,1\,\mathrm{h}}\, t^{-q}$, where $q$ is an exponent for the particular gas. For the 1993 data set, the maximum values over the different averaging periods are shown in Table 7.4. Plotting $\log C_{\max,t}$ against $\log t$ gives the results shown in Figure 7.10, in which the magnitudes of the gradients of the lines give the values of $q$ for the different gases.

    Table 7.2 Percentiles of hourly and daily means

    Pollutant   Hourly means (per cent)      Daily means (per cent)
                50     90     98     99      50     90     98     99
    O3          24     34     40     42      24     32     36     38
    SO2         1.5    6      13     15      2      5      9      11
    NO          1      3      16     27      1      4      13     18
    NOx         5      21     41     56      7      18     30     39
    NO2         5      17     27     31      7      14     22     23

    Table 7.3 Summary statistics for the 1993 CEH data set

    Gas    Hourly means/ppb                        Daily means/ppb              Weekly means/ppb
           Mean   Median   Standard deviation      Median   Standard deviation  Median   Standard deviation
    O3     23.6   24.7     9.2                     23.8     7.4                 23.4     5.8
    SO2    2.6    1.6      3.1                     2.0      2.2                 2.7      1.4
    NO     1.7    0.3      7.5                     0.7      3.7                 1.3      2.0
    NOx    9.2    5.8      11.6                    7.5      7.8                 8.9      5.3
    NO2    7.5    5.3      6.9                     6.7      5.2                 7.8      27.2

    Table 7.4 Values of Cmax,t for the different pollutants

    Cmax,t           O3      SO2     NO      NOx     NO2
    Cmax, 1 h        57      47      185     186     72
    Cmax, 1 day      43      16      38      57      37
    Cmax, 1 week     34      7       11      24      16
    Cmax, 1 month    31      5       7       20      13
    q                0.095   0.35    0.51    0.35    0.27

    [Figure not reproduced. Log-log plot of maximum concentration during period/ppb (1 to 1000) against number of hours, with the averaging periods 1 hour, 1 day, 1 week and 1 month marked, for O3, NOx, NO2, NO and SO2.]

    Figure 7.10 Correlations between the maximum period average and the averaging period.
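
    The exponents in Table 7.4 can be recovered approximately from the tabulated maxima by fitting a straight line to log Cmax,t against log t. The sketch below uses an ordinary least-squares fit and takes one month as 30 days (720 h); both choices are assumptions, since the source states only that q is given by the gradients of the lines in Figure 7.10. Because the maxima fall as the averaging period lengthens, the fitted gradient is negative and q is reported as its magnitude.

```python
import math

# Averaging periods in hours: 1 h, 1 day, 1 week, 1 month (30 days assumed)
AVERAGING_HOURS = [1, 24, 168, 720]

# Maximum period means in ppb, from Table 7.4
C_MAX_PPB = {
    "O3": [57, 43, 34, 31],
    "SO2": [47, 16, 7, 5],
    "NO": [185, 38, 11, 7],
    "NOx": [186, 57, 24, 20],
    "NO2": [72, 37, 16, 13],
}


def fit_exponent(hours, c_max):
    """Least-squares slope of log(Cmax,t) against log(t), returned as q = -slope."""
    x = [math.log(t) for t in hours]
    y = [math.log(c) for c in c_max]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return -sxy / sxx


for gas, maxima in C_MAX_PPB.items():
    print(f"{gas:4s} q = {fit_exponent(AVERAGING_HOURS, maxima):.3f}")
```

    Under these assumptions the fitted values lie close to those listed in the last row of Table 7.4.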