Ensuring High Quality Data


Page 1: Ensuring High Quality Data

October 1999 PM Data Analysis Workbook: Data Validation 1

Ensuring High Quality Data

• The Importance of Data Validation

• Data Validation Levels
– Level I: Field and Laboratory Checks
– Level II: Internal Consistency Checks and Examples
– Level III/IV: Unusual Value Identification and Examples

• Validation of PM2.5 Mass

• Information to be Provided with PM Sampler Data

• Are Measurements Comparable?

• National Contract Lab Responsibilities

• Data Access

• Sample Size Issues

• References

• Appendix: Criteria Tables for PM2.5 Mass Validation
– Critical Criteria Table
– Operational Evaluations Table
– Systematic Issues

“The purpose of data validation is to detect and then verify any data values that may not represent actual air quality conditions at the sampling station.” (U.S. EPA, 1984)

Page 2: Ensuring High Quality Data

The Importance of Data Validation

• Data validation is critical because serious errors in data analysis and modeling results can be caused by erroneous individual data values.

• Data validation consists of procedures developed to identify deviations from measurement assumptions and procedures.

• Timely data validation is required to minimize the generation of additional data that may be invalid or suspect and to maximize the recoverable data.

Do data validation early!

[Figure: schematic with the labels "effort to recover data", "data recovery", "time", and "data collection". Main et al., 1998]

Page 3: Ensuring High Quality Data

The Importance of Data Validation

• The quality and applicability of data analysis results are directly dependent upon the inherent quality of the data. In other words, data validation is critical because serious errors in data analysis and modeling results can be caused by erroneous individual data values. The EPA's PM2.5 speciation guidance document provides quality requirements for sampling and analysis. The guidance document also discusses data validation including the suggested four-level data validation system. It is the monitoring agency’s responsibility to prevent, identify, correct, and define the consequences of difficulties that might affect the precision and accuracy, and/or the validity, of the measurements.

• Once the quality assured data are provided to data analysts, additional data validation steps need to be taken. Given the newness and complexity of the PM2.5 speciation monitoring and sample analysis methods, errors are likely to pass through the system despite rigorous application of quality assurance and validation measures by the monitoring agencies. Therefore, data analysts should also check the validity of the data before conducting their analyses.

• While some quality assurance and data validation can be performed without a broad understanding of the physical and chemical processes of PM (such as ascertaining that the field or laboratory instruments are operating properly), some degree of understanding of these processes is required. Key issues to understand include PM physical, chemical, and optical properties; PM formation and removal processes; and sampling artifacts, interferences, and limitations. These topics were discussed in the introduction and references therein. The analyst should also understand the measurement uncertainty and laboratory analysis uncertainty. These uncertainties may differ significantly among samplers and analysis methods, which, in turn, has an effect on the interpretation and uses of the data (e.g., in source apportionment).

Page 4: Ensuring High Quality Data

Data Validation Procedures and Tools

Data validation tools for PM are in development

Page 5: Ensuring High Quality Data

Data Validation Levels

• Level I. Routine checks during the initial data processing and generation of data (e.g., check file identification; review unusual events, field data sheets, and result reports; do instrument performance checks).

• Level II. Internal consistency tests to identify values in the data that appear atypical when compared to values of the entire data set.

• Level III. Current data comparisons with historical data to verify consistency over time.

• Level IV. Parallel consistency tests with data sets from the same population (e.g., region, period of time, air mass) to identify systematic bias.

U.S. EPA, 1999a

Page 6: Ensuring High Quality Data

Level I: Field and Laboratory Checks

• Verify computer file entries against data sheets.

• Flag samples when significant deviations from measurement assumptions have occurred.

• Eliminate values for measurements that are known to be invalid because of instrument malfunctions.

• Substitute data from a backup data acquisition system if the primary system fails.

• Adjust measurement values for quantifiable calibration or interference bias.

Chow et al., 1996

Page 7: Ensuring High Quality Data

Level II: Internal Consistency Checks

• Compare collocated samplers (scatter plots, linear regression).

• Check sum of chemical species vs. PM2.5 mass (multielements Al to U + sulfate + nitrate + ammonium ions + OC + EC - sulfur).

• Check physical and chemical consistency (sulfate vs. total sulfur, soluble potassium vs. total potassium, soluble chloride vs. chlorine, babs vs. elemental carbon).

• Balance cations and anions.

• Balance ammonium.

• Investigate nitrate volatilization and adsorption of gaseous organic carbon.

• Prepare material balances and crude mass balances.

Chow, 1998

Page 8: Ensuring High Quality Data

Level II: Consistency Check Guidelines

Consistency Check | Expectation
Difference between PM10 and PM2.5 | PM2.5 ≤ PM10
Sum of individual chemical species and PM2.5 | species sum < PM2.5
Ratio of water-soluble sulfate by IC to total sulfur by XRF | ~ 3
Ratio of chloride by IC to chlorine by XRF | < 1
Ratio of water-soluble potassium by AAS to total potassium by XRF | < 1
babs compared to elemental carbon | good correlation

IC = ion chromatography; XRF = energy dispersive X-ray fluorescence; AAS = atomic absorption spectrophotometry

Chow, 1998
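The guideline expectations above can be turned into a simple per-sample screen. The following is a minimal Python sketch and not part of the workbook: the function name, the argument names, and the tolerance band around the expected sulfate-to-sulfur ratio of about 3 are all assumptions chosen for illustration.

```python
def consistency_flags(pm25, pm10, species_sum, sulfate_ic, sulfur_xrf,
                      chloride_ic, chlorine_xrf, k_soluble_aas, k_total_xrf,
                      sulfate_ratio_band=(2.0, 4.0)):
    """Return the guideline checks that one sample fails.

    All concentrations are assumed to share the same units (e.g. ug/m3).
    The band around the expected sulfate/sulfur ratio of ~3 is an
    illustrative assumption, not a value taken from the workbook.
    """
    flags = []
    if pm25 > pm10:
        flags.append("PM2.5 exceeds PM10")
    if species_sum >= pm25:
        flags.append("species sum not below PM2.5 mass")
    low, high = sulfate_ratio_band
    if sulfur_xrf > 0 and not (low <= sulfate_ic / sulfur_xrf <= high):
        flags.append("sulfate/sulfur ratio far from 3")
    if chlorine_xrf > 0 and chloride_ic / chlorine_xrf >= 1:
        flags.append("chloride/chlorine ratio not below 1")
    if k_total_xrf > 0 and k_soluble_aas / k_total_xrf >= 1:
        flags.append("soluble/total potassium ratio not below 1")
    return flags
```

A flagged sample is a candidate for closer review, not an automatically invalid one; in practice the comparisons would also need to allow for measurement uncertainty.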

Page 9: Ensuring High Quality Data

Example: Compare Collocated Samplers

• Data from collocated samplers should be compared, both between samplers of the same type and between different sampler types.

• During the 1995 Integrated Monitoring Study (IMS95) in California, the collocated PM2.5 samplers (same type) at Bakersfield showed excellent agreement.

• SSI 1 and TEOM measurements did not correlate very well during the winter/fall season. The two samplers showed much better agreement during March-September (not shown).

[Figure: Collocated Comparison, SSI 1 and TEOM (Winter/Fall). Two panels, Bakersfield (Oct. - Feb.) and Sacramento (Oct. - Feb.), plotting TEOM (µg/m3) vs. SSI 1 (µg/m3) on axes from 0 to 200, each with a 1:1 line and a linear regression fit (Reg.). Regression statistics printed on the panels: Slope = 0.61, Intercept = 7.4, r = 0.82, N = 14; and Slope = 0.65, Intercept = 3.0, r = 0.84, N = 58.]

Chow, 1998
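The slope, intercept, r, and N reported for a collocated comparison can be reproduced with an ordinary least-squares fit. The sketch below is illustrative only: the function name is an assumption, and the paired values are hypothetical numbers, not data from IMS95.

```python
import math

def compare_collocated(x, y):
    """Ordinary least-squares fit of y on x, plus the Pearson correlation.

    x and y are paired concentrations from two collocated samplers.
    Assumes neither series is constant. Returns (slope, intercept, r, n).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, n

# Hypothetical paired 24-hr values (ug/m3), not data from the study
ssi = [10.0, 20.0, 30.0, 40.0]
teom = [12.0, 18.0, 24.0, 30.0]
slope, intercept, r, n = compare_collocated(ssi, teom)
```

A slope well below 1 together with a positive intercept, as in the winter/fall panels, indicates that the second sampler reads lower than the first at high concentrations.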

Page 10: Ensuring High Quality Data

Example: Check Sum of Chemical Species vs. PM2.5 Mass

• Compare the sum of species to the PM2.5 mass measurements.

• The comparison shown here indicates an excellent correlation (r=0.98).

• The sum of species concentrations is lower than the reported mass because the sum of species does not include oxygen.

Sum of species

= Multielements (from Al to U)

+ Ions (SO4=, NO3–, NH4+)

+ Carbon (OC, EC)

– Sulfur

* Exclude Cl– and K+ to avoid double-counting.

[Figure legend: 1:1 line; Reg. = linear regression fit]

Chow, 1998
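The sum-of-species calculation described on this slide can be sketched in a few lines of Python. The function name and the dictionary layout for the element data are assumptions made for this example; the arithmetic follows the terms listed on the slide.

```python
def sum_of_species(elements, sulfate, nitrate, ammonium, oc, ec):
    """Sum of measured species for comparison with PM2.5 mass.

    elements maps element symbol -> concentration for the multielement
    analysis (Al to U). Elemental sulfur is subtracted because it is
    already counted in sulfate. The Cl- and K+ ions are simply not added,
    since Cl and K are already present in the element total.
    All inputs are assumed to be in the same units (e.g. ug/m3).
    """
    multielements = sum(elements.values())
    sulfur = elements.get("S", 0.0)
    return multielements + sulfate + nitrate + ammonium + oc + ec - sulfur

# Hypothetical sample values (ug/m3)
elements = {"Al": 0.1, "Si": 0.3, "S": 1.0, "K": 0.1, "Fe": 0.2}
total = sum_of_species(elements, sulfate=3.0, nitrate=2.0,
                       ammonium=1.5, oc=4.0, ec=1.0)
```

The result would then be plotted against, or divided by, the gravimetric PM2.5 mass for each sample.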

Page 11: Ensuring High Quality Data

Example: Check Chemical and Physical Consistency (1 of 2)

• Chemical and physical consistency checks include comparing sulfate with total sulfur (sulfate should be about three times the sulfur concentrations) and comparing soluble potassium with total potassium.

• In the examples shown, the sulfur data compare well while the potassium data comparison shows a considerable amount of scatter.

[Figures: Sulfate vs. Total Sulfur, with 3:1 line and linear regression fit (Reg.); Soluble Potassium vs. Total Potassium, with 1:1 line and linear regression fit (Reg.)]

Chow, 1998

Page 12: Ensuring High Quality Data

Example: Check Chemical and Physical Consistency (2 of 2)

• Another consistency check that can be performed (if data are available) is to compare the elemental carbon concentrations with particle absorption (babs) measurements.

• In the example shown, the two measurements agree well.

[Figure: babs vs. Elemental Carbon, with linear regression fit (Reg.)]

Chow, 1998

Page 13: Ensuring High Quality Data

Example: Anion and Cation Balance

• Equations to calculate anion and cation balance (moles/m3)

Anion equivalence: e = [Cl–]/35.453 + [NO3–]/62.005 + [SO4=]/48.03

Cation equivalence: e = [Na+]/23.0 + [K+]/39.098 + [NH4+]/18.04

Plot cation equivalents vs. anion equivalents

[Figure legend: Reg. = linear regression fit]

Chow 1998
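As a rough illustration of the anion and cation equivalence equations, the Python sketch below divides each ion concentration by the divisor given on the slide and sums the results. The names are assumptions for this example, and the code assumes mass concentrations in µg/m3, so that the sums come out in microequivalents per cubic meter.

```python
# Divisors as given on the slide (sulfate uses half its molar mass
# because it carries two charges).
ANION_EQ_WEIGHTS = {"Cl": 35.453, "NO3": 62.005, "SO4": 48.03}
CATION_EQ_WEIGHTS = {"Na": 23.0, "K": 39.098, "NH4": 18.04}

def equivalents(concentrations, eq_weights):
    """Sum of concentration / equivalent weight over the listed ions.

    Ions missing from `concentrations` are treated as zero.
    """
    return sum(concentrations.get(ion, 0.0) / weight
               for ion, weight in eq_weights.items())

# Hypothetical sample (ug/m3)
anions = {"Cl": 0.2, "NO3": 6.2, "SO4": 4.8}
cations = {"Na": 0.1, "K": 0.1, "NH4": 3.6}
anion_eq = equivalents(anions, ANION_EQ_WEIGHTS)
cation_eq = equivalents(cations, CATION_EQ_WEIGHTS)
```

Plotting `cation_eq` against `anion_eq` for every sample gives the balance plot described on the slide; points far from the 1:1 line point to a missing or mismeasured ion.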

Page 14: Ensuring High Quality Data

Example: Ammonia Balance

• Equations to calculate ammonia balance (µg/m3)

Calculated ammonium based on NH4NO3 and NH4HSO4 = 0.29 (NO3–) + 0.192 (SO4=)

Calculated ammonium based on NH4NO3 and (NH4)2SO4 = 0.29 (NO3–) + 0.38 (SO4=)

Plot calculated ammonium vs. measured ammonium for both forms of sulfate

Chow 1998
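The two calculated-ammonium equations can be evaluated side by side for each sample. This is a minimal sketch using the coefficients from the slide; the function name and the example concentrations are assumptions.

```python
def calculated_ammonium(nitrate, sulfate):
    """Ammonium implied by measured nitrate and sulfate.

    Returns a pair: (assuming NH4NO3 + NH4HSO4, assuming NH4NO3 + (NH4)2SO4).
    Inputs and outputs are assumed to share the same mass units (e.g. ug/m3).
    """
    as_bisulfate = 0.29 * nitrate + 0.192 * sulfate
    as_sulfate = 0.29 * nitrate + 0.38 * sulfate
    return as_bisulfate, as_sulfate

# Hypothetical sample (ug/m3)
low, high = calculated_ammonium(nitrate=10.0, sulfate=5.0)
```

Measured ammonium that falls between the two values is consistent with a mix of ammonium bisulfate and ammonium sulfate.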

Page 15: Ensuring High Quality Data

Example: Nitrate Volatilization Check

• Particularly for the western U.S., the analyst should understand the extent of possible nitrate volatilization in the data set.

• This example shows that nitrate volatilization was significant during the summer.

Chow 1998

San Joaquin Valley, CA

Page 16: Ensuring High Quality Data

Example: Adsorption of Gaseous OC Check

• Some VOCs evaporate from a filter (negative artifact) during sampling while others are adsorbed (positive artifact).

• The top figure shows the organic carbon (OC) concentrations on the backup filters were frequently 50% or more of the front filter concentrations. The error bars reflect measurement standard deviation.

• The bottom figure shows the ratio of the backup OC to the front filter OC as a function of PM2.5 mass. Relatively larger organic vapor artifacts at lower PM2.5 concentrations suggest that particles provide additional adsorption sites on the front filters (Chow et al., 1996).

Chow 1998

Page 17: Ensuring High Quality Data

Example: Material Balance

Material balance

= Geological ( [ 1.89 Al ] + [ 2.14 Si ] + [ 1.4 Ca ] + [ 1.43 Fe ] )

+ Organic carbon ( 1.4 OC )

+ Elemental carbon

+ Ammonium nitrate ( 1.29 NO3– )

+ Ammonium sulfate ( 1.38 SO4= )

+ Remaining trace elements (excluding Al, Si, Ca, Fe, and S)

+ Unidentified

Chow 1998

Denver, CO Core Sites
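The material balance terms can be computed per sample as in the Python sketch below. The multipliers come from the slide; the function name, the argument names, and the treatment of "unidentified" as measured mass minus the sum of the other components are assumptions made for this example.

```python
def material_balance(mass, al, si, ca, fe, oc, ec, nitrate, sulfate,
                     other_trace):
    """Split a measured mass concentration into material balance components.

    other_trace is the sum of trace elements excluding Al, Si, Ca, Fe, and S.
    'unidentified' is assumed here to be the measured mass minus the sum of
    the identified components; it goes negative if they exceed the mass.
    """
    components = {
        "geological": 1.89 * al + 2.14 * si + 1.4 * ca + 1.43 * fe,
        "organic carbon": 1.4 * oc,
        "elemental carbon": ec,
        "ammonium nitrate": 1.29 * nitrate,
        "ammonium sulfate": 1.38 * sulfate,
        "remaining trace elements": other_trace,
    }
    components["unidentified"] = mass - sum(components.values())
    return components

# Hypothetical sample (ug/m3)
balance = material_balance(mass=30.0, al=0.5, si=1.0, ca=0.5, fe=0.5,
                           oc=5.0, ec=2.0, nitrate=4.0, sulfate=3.0,
                           other_trace=0.2)
```

A large or negative unidentified fraction is itself a validation signal worth tracing back through the individual measurements.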

Page 18: Ensuring High Quality Data

Example: Crude Mass Balance

Calculated Mass =

Geological Material ( [ 1.89 aluminum ] + [ 2.14 silicon ] + [ 1.4 calcium ] + [ 1.43 iron ] )

+ Combustion Byproducts ( babs 8.6 )

+ Secondary Sulfate ( 3 total elemental sulfur )

• Crude mass balances can be constructed to investigate estimated source contributions.

• Do the crude estimates make sense spatially and temporally?

[Figure: a) 06/05/95 (average PM10 mass = 52.8 ± 19.1 µg/m3). Percent contribution (0% to 100%) of Crustal, Combustion, Sulfate, and Others at each site, with sites grouped by site type (Industrial, Vacant Land, Residential, Commercial, Construction). Las Vegas, NV.]

Chow 1998

Page 19: Ensuring High Quality Data

Level III/IV: Unusual Value Identification

• Extreme values

• Values that normally track the values of other variables in a time series

• Values that normally follow a qualitatively predictable spatial or temporal pattern

Chow et al., 1996

The first assumption upon finding a measurement that is inconsistent with physical expectations is that the unusual value is due to a measurement error.

If, upon tracing the path of the measurement, nothing unusual is found, the value can be assumed to be a valid result of an environmental cause.
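One way to screen a series for extreme values is a median-based (modified z-score) test. This technique is not named in the workbook; it is offered here only as an illustrative sketch, and the function name and the threshold of 3.5 are assumptions.

```python
import statistics

def flag_extreme_values(values, threshold=3.5):
    """Return the indices of values with a modified z-score above threshold.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme value does not hide itself by inflating the
    spread. Returns an empty list when the MAD is zero.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical daily concentrations with one suspicious spike
series = [50, 52, 48, 51, 49, 53, 47, 400]
suspects = flag_extreme_values(series)
```

In line with the guidance above, a flagged value is first traced for measurement error; if nothing unusual is found, it is kept as a valid result of an environmental cause.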

Page 20: Ensuring High Quality Data

Example: Unusual Value Identification

• Potassium nitrate (KNO3) is a major component of all fireworks.

• This figure shows all available PM2.5 K+ data from all North American sites, averaged to produce a continental average for each day during 1988-1997.

• Fourth of July celebration fireworks are clearly observed in the potassium time series.

• Fireworks displays on local holidays/events could have a similar effect on data.

Regional averaging and count of sample numbers were conducted in Voyager, using variations of the Voyager script on p. 6 of the Voyager Workbook Kvoy.wkb. Additional averaging and plotting were conducted in Microsoft Excel.

Poirot (1998)

Page 21: Ensuring High Quality Data

Data Validation Continues During Data Analysis

• Two source apportionment models were applied to PM2.5 data collected in Vermont, and the results of the models were compared.

• Excellent agreement for the selenium source was observed for part of the data while the rest of the results did not agree well.

• Further investigation showed that the period of good agreement coincided with a change in laboratory analysis, with an accompanying change in detection limit and measurement uncertainty; the two models treat these quantities differently.

Poirot, 1999

Page 22: Ensuring High Quality Data

Validation of PM2.5 Mass

• Consistent validation of PM2.5 mass concentrations across the U.S. is needed. To aid in this, three tables of criteria were developed and are provided in the appendix to this section of the workbook.

• Observations that do not meet each and every criterion on the Critical Criteria Table should be invalidated unless there are compelling reasons and justification not to do so.

• Criteria that are important for maintaining and evaluating the quality of the data collection system are included in the Operational Evaluations Table. Violation of a criterion or a number of criteria may be cause for invalidation.

• Criteria important for the correct interpretation of the data but that do not usually impact the validity of a sample or group of samples are included on the Systematic Issues Table.

U.S. EPA, 1999c

Page 23: Ensuring High Quality Data

Information to be Provided with PM Sampler Data

Measurement | Variations
Flow rate | 30-s max interval, average for sample period, CV for period, 5-min. average out-of-specifications
Sample volume |
Ambient temperature | 30-s interval; min, max, average for period
Barometric pressure | 30-s interval; min, max, average for period
Filter temperature | 30-s interval; 30-s interval differential out-of-spec.; max differential from ambient, date and time of occurrence
Date and time |
Sample start and stop time settings |
Sample period start time |
Elapsed sample time | Actual and out-of-spec.
1-min. power interruptions | Start time of first 10
User-entered information | For example, sampler and site identification

40 CFR 50 Appendix L, Table L-1

These supplemental measurements will be useful to help explain or caveat unusual data

Page 24: Ensuring High Quality Data

Are Measurements Comparable?

• Example comparison of 24-hr average TEOM (from hourly measurements), IMPROVE (gravimetric mass from the A filter), and FRM PM2.5 mass measurements made in New Haven, CT during the third and fourth quarters of 1998.

• During the colder months at this site, the TEOM seems to report a lower concentration than the FRM.

[Figure: PM2.5 mass measurements in New Haven, CT, 1998. TEOM, IMPROVE, and FRM mass (µg/m3, axis 0 to 70) by date, 07/01/1998 to 12/30/1998.]

[Figure: TEOM, IMPROVE and FRM mass for PM2.5 at New Haven, CT, July-December 1998. IMPROVE and TEOM mass (µg/m3) vs. FRM mass (µg/m3), both axes 0 to 50.]

[Table: PM2.5 average values (µg/m3), New Haven, CT, 1998, with the number of samples in each calculated average. For example, in the third quarter, TEOM and IMPROVE samples ran concurrently on 24 days. The ten values where all three samplers ran are a subset of the 24.]

Graham, 1999

Page 25: Ensuring High Quality Data

National Contract Lab Responsibilities

Discussion of national contract laboratory responsibilities to be added.

Page 26: Ensuring High Quality Data

Data Access (1 of 2)

Official data sources:

– AIRS Data via public web at http://www.epa.gov/airsdata

– AIRS Air Quality System (AQS) for registered users; register with EPA/NCC (703-487-4630)

– PM2.5 websites via public web

– PM2.5 Data Analysis Workbook at http://capita.wustl.edu/databases/userdomain/pmfine/

– EPA PM2.5 Data Analysis clearinghouse at http://www.epa.gov/oar/oaqps/pm25/

– Northern Front Range Air Quality Study at http://nfraqs.cira.colostate.edu/index2.html

– NEARDAT at http://capita.wustl.edu/NEARDAT

Page 27: Ensuring High Quality Data

Data Access (2 of 2)

Secondary data sources:

– Meteorological parameters from National Weather Service (NWS) at http://www.nws.noaa.gov

– Meteorological parameters from PAMS/AIRS AQS; register with EPA/NCC (703-487-4630)

– Collocated or nearby SO2, nitrogen oxides, CO, VOC from AIRS AQS

– Private meteorological agencies (e.g., forestry service, agricultural monitoring, industrial facilities)

Page 28: Ensuring High Quality Data

Sample Size Issues

How complete must data be to show that an area meets the NAAQS for PM?

U.S. EPA, 1999b

Sample size requirements for data analyses will vary depending upon the analysis type, the analysis goals, the variability in the data, and other factors.

Standard | Data completeness to show you meet the standards
Daily PM2.5 | Single site: at least 75% of the scheduled sampling days per quarter
Daily PM10 | Single site: at least 75% of the scheduled sampling days per quarter
Annual PM2.5 | Single site: if each quarter has at least 75% of the scheduled sampling days, the annual mean for that year and site is valid. Community monitoring zone: In each of the three years, at least one site must have a valid annual mean. The valid sites may be the same every year, or may vary from year to year.
Annual PM10 | Single site: at least 75% of the scheduled sampling days per quarter.
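The 75% completeness rule in the table can be checked mechanically. The Python sketch below covers only the single-site rule as stated here; the function names are assumptions, and the full data handling conventions (U.S. EPA, 1999b) contain details that this example does not attempt to reproduce.

```python
def quarter_is_complete(valid_days, scheduled_days, minimum=0.75):
    """True if at least 75% of the scheduled sampling days have valid data."""
    return scheduled_days > 0 and valid_days / scheduled_days >= minimum

def annual_mean_is_valid(quarters):
    """Single-site annual check: every one of the four quarters is complete.

    quarters is a list of four (valid_days, scheduled_days) pairs.
    """
    return (len(quarters) == 4
            and all(quarter_is_complete(v, s) for v, s in quarters))

# Hypothetical year with every-third-day sampling
year = [(23, 30), (25, 30), (30, 31), (24, 30)]
ok = annual_mean_is_valid(year)
```

Running the check quarter by quarter during the sampling year, rather than at year end, leaves time to schedule make-up samples where allowed.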

Page 29: Ensuring High Quality Data

Summary

• Data validation is vital because serious errors in data analysis and modeling results can be caused by erroneous individual data values.

• This workbook section provides a discussion of data validation levels, example validation checks, and other information important to the data validation process.

Page 30: Ensuring High Quality Data

References

Ayers G.P., Keywood M.D., Gras J.L. (1999) TEOM vs. manual gravimetric methods for determination of PM2.5 aerosol mass concentrations. Atmos. Environ., 33, pp. 3717-3721.

Chow J.C. and Watson J.G. (1998) Guideline on speciated particulate monitoring. Draft report 3 prepared by Desert Research Institute for the U.S. EPA Office of Air Quality Planning and Standards. August.

Chow J.C. (1998) Descriptive data analysis methods. Presentation prepared by Desert Research Institute for the U.S. EPA in Research Triangle Park, November.

Chow J.C., Watson J.G., Lu Z., Lowenthal D.H., Frazier C.A., Solomon P.A., Thuillier R.H., Magliano K. (1996) Descriptive analysis of PM2.5 and PM10 at regionally representative locations during SJVAQS/AUSPEX. Atmos. Environ., 30(12), pp. 2079-2112.

Chow J.C. (1995) Measurement methods to determine compliance with ambient air quality standards for suspended particles. J. Air Waste Manage. Assoc., 45, pp. 320-382.

Graham J. (1999) Personal communication.

Homolya J.B., Rice J., Scheffe R.D. (1998) PM2.5 speciation - objectives, requirements, and approach. Presentation. September.

Main H.H., Chinkin L.R., and Roberts P.T. (1998) PAMS data analysis workshops: illustrating the use of PAMS data to support ozone control programs. Web page prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC by Sonoma Technology, Inc., Petaluma, CA, <http://www.epa.gov/oar/oaqps/pams/analysis> STI-997280-1824, June.

Poirot R. (1999) Personal communication.

Poirot R. (1998) Tracers of opportunity: Potassium. Paper available at http://capita.wustl.edu/PMFine/Workgroup/SourceAttribution/Reports/In-progress/Potass/ktext.html

U.S. Environmental Protection Agency (1984) Quality assurance handbook for air pollution measurement systems, Volume II: ambient air specific methods (interim edition), EPA/600/R-94/0386, April.

U.S. Environmental Protection Agency (1999a) Particulate matter (PM2.5) speciation guidance document. Available at http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/specpln3.pdf

U.S. Environmental Protection Agency (1999b) Guideline on data handling conventions for the PM NAAQS. EPA-454/R-99-008, April.

U.S. Environmental Protection Agency (1999c) PM2.5 mass validation criteria. Available at http://www.epa.gov/ttn/amtic/pmqa.html

Page 31: Ensuring High Quality Data

Critical Criteria Table

U.S. EPA, 1999c

Page 32: Ensuring High Quality Data

Operational Evaluations Table (1 of 2)

U.S. EPA, 1999c

Page 33: Ensuring High Quality Data

Operational Evaluations Table (2 of 2)

U.S. EPA, 1999c

Page 34: Ensuring High Quality Data

Systematic Issues

U.S. EPA, 1999c