Lecture 23: Brief Introduction to Data quality By Austin Troy ------Using GIS-- Introduction to GIS.

Lecture 23: Brief Introduction to Data Quality

By Austin Troy

------Using GIS--Introduction to GIS

©2005 Austin Troy

Data Quality

•Two key components of quality in data are accuracy and precision

•Error is a result of both inaccuracy and imprecision in the data; it is a general term encompassing lack of reliability

•GIS data quality is, in theory, a compromise between needs and costs

•In practice it is usually about what is available

Data Quality

•Cost of data is a reflection of its accuracy and precision

•Because lower-quality data tend to be cheaper and more available, a very common problem in GIS is the inappropriate use of data

•A critical step in developing a GIS is deciding “what is accurate enough?”

•This is a function of needs, cost, accessibility and time

•User needs determine accuracy and, in general, accuracy determines price

Accuracy

What is accuracy?

• “the degree to which information on a map or in a digital database matches true or accepted values.”

• From Kenneth E. Foote and Donald J. Huebner http://www.colorado.edu/geography/gcraft/notes/error/error_f.html

• It is also a reflection of how closely a measurement represents the actual quantity measured

• Accuracy is a reflection of the number and severity of errors in a dataset or map.

Precision

• Quality is also a function of “precision”

• Precision is the level of exactness of a measurement. The more precise a measurement is, the smaller the unit in which it is recorded

• Hence, a measurement down to a fraction of a cm is more precise than a measurement to a cm

• However, data with a high level of precision can still be inaccurate—this is due to errors

• Each application requires a different level of precision

Precision

• Each application requires a different level of precision

• Engineering and surveying applications typically require high levels of precision; they may be measuring to a millimeter

• On the other end of the spectrum, studies of weather patterns or crop cover require much less precision

• Precise data are costly: for example, carefully surveyed point locations needed by utilities to record the locations of pumps, wires, pipes and transformers cost $5-20 per point to collect

Positional Accuracy and Precision

• One of the primary types of error in GIS is positional error—that is, errors in 2D (x,y) and in the 3rd dimension (height)

• Positional accuracy and precision are functions of the scale at which the digital layer was created

• If created from digitizing a paper map, the minimum usable scale of the digital layer is considered the scale of that map

• Scale is a function of the map’s resolution

Positional Accuracy

• Positional accuracy standards specify that acceptable positional error varies with scale

• Data can have a high level of precision but still be positionally inaccurate

• Positional error is inversely related to precision and to amount of processing

Measurement of Accuracy

Accuracy is often stated as a confidence interval: e.g. 104.2 cm +/- .01 means the true value lies between 104.19 and 104.21 cm

One of the key measurements of positional accuracy is root mean squared error (RMSE): the squared difference between the observed and expected value for observation i, summed across all n observations, divided by the total number of observations, with the square root taken of the result:

RMSE = sqrt( (1/n) * Σ (observed_i - expected_i)^2 )

This is just a standardized measure of error—how close the predicted measure is to observed
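
As a rough illustration of the RMSE definition above, the sketch below compares a few digitized coordinates with surveyed check-point values. The function name `rmse` and the sample numbers are made up for this example and are not from the lecture.

```python
import math

def rmse(observed, expected):
    """Root mean squared error between paired observed and expected values."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Squared difference for each observation i
    squared_diffs = [(o - e) ** 2 for o, e in zip(observed, expected)]
    # Mean of the squared differences, then the square root
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))

# Hypothetical digitized x-coordinates versus surveyed check points (metres)
digitized = [100.0, 205.0, 298.0, 402.0]
surveyed = [101.0, 203.0, 300.0, 400.0]
error_m = rmse(digitized, surveyed)
print(f"RMSE: {error_m:.2f} m")
```

Because the differences are squared before averaging, a few large errors raise the RMSE more than many small ones.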

Positional Error

• Different agencies have different standards for positional error

• Example: USGS horizontal positional requirements (the National Map Accuracy Standards) state that 90% of all tested points must be within 1/30th of an inch, measured at map scale, for maps at scales larger than 1:20,000, and within 1/50th of an inch for maps at scales of 1:20,000 or smaller

Positional Error

• USGS accuracy standards on the ground:

1:4,800 ± 13.33 feet

1:10,000 ± 27.78 feet

1:12,000 ± 33.33 feet

1:24,000 ± 40.00 feet

1:63,360 ± 105.60 feet

1:100,000 ± 166.67 feet
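
The ground distances listed above follow from converting the allowed error on the printed map into feet at each scale. A minimal sketch of that arithmetic is shown here; the function name is hypothetical, and the handling of a map at exactly 1:20,000 is an assumption of the sketch rather than something taken from the table.

```python
def ground_tolerance_feet(scale_denominator):
    """Ground distance (feet) matching the allowed map error at a given scale."""
    # Allowed error on the map sheet: 1/30 inch at larger scales,
    # 1/50 inch at smaller scales.
    # Assumption: exactly 1:20,000 is placed in the 1/50 inch class.
    inverse_inches = 30 if scale_denominator < 20000 else 50
    ground_inches = scale_denominator / inverse_inches
    return ground_inches / 12  # 12 inches per foot

for denom in (4800, 10000, 12000, 24000, 63360, 100000):
    print(f"1:{denom:,}  +/- {ground_tolerance_feet(denom):.2f} feet")
```

For example, at 1:24,000 one fiftieth of an inch on the map covers 480 inches on the ground, which is 40 feet.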

See image from U. Colorado showing accuracy standards visually

Hence, a point on a map represents the center of a spatial probability distribution of its possible locations

Thanks to Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder for links

Positional Error

• A critical point to remember is that “zooming” in on a digital map does not increase the level of accuracy

• The accuracy and precision are based on the scale of the digital layer’s original parent source

• To see this, let’s look at river data derived from sources at three scales and three levels of precision

• 1:2,000,000 - small scale

• 1:100,000 - medium scale

• 1:24,000 - large scale

Positional Error-some examples

Attribute Precision

• Attribute accuracy and precision refer to quality of non-spatial, attribute data

• Precision for numeric data means lots of digits

• Example: recording income down to cents, rather than just dollars

• Precision for categorical data means lots of categories

• Example: Anderson LU level 3 versus level 1
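
The two kinds of attribute precision above can be sketched in a few lines. The income figure and the land-use codes are invented for illustration, and the helper `generalize` assumes hierarchical codes in which each level adds one digit, so that a coarser class is given by the leading digits.

```python
from decimal import Decimal, ROUND_HALF_UP

# Numeric precision: the same hypothetical income recorded to cents or to dollars
income = Decimal("41250.37")
to_cents = income.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
to_dollars = income.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

def generalize(code, level):
    """Collapse a hierarchical land-use code to a coarser level (leading digits)."""
    digits = str(code)
    if not 1 <= level <= len(digits):
        raise ValueError("level must be between 1 and the number of digits in the code")
    return int(digits[:level])

# Categorical precision: hypothetical three-digit (level 3) codes
detailed = [111, 112, 411, 523]
coarse = [generalize(c, 1) for c in detailed]  # one-digit (level 1) codes

print(to_cents, to_dollars)
print(len(set(detailed)), "detailed classes ->", len(set(coarse)), "coarse classes")
```

Going from level 3 to level 1 merges distinct detailed classes into the same broad class, which is the loss of categorical precision described above.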

Conceptual Accuracy

• Misclassifications result from differences in judgment or in the automated classification tools

• The accuracy of classifications will depend on the precision.

• The less precise your classifications, the less likely there will be errors

• If just classifying as “land and water”, that is not very precise, and not likely to result in an error
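
The trade-off between classification precision and error can be seen with a small made-up example. The class names, the check sites and the lookup table below are all hypothetical; the sketch only shows that the same mapped data agree with the reference more often once the classes are collapsed to "land" and "water".

```python
def agreement(mapped, truth):
    """Fraction of sites where the mapped class equals the reference class."""
    matches = sum(1 for m, t in zip(mapped, truth) if m == t)
    return matches / len(truth)

# Hypothetical detailed classes at six check sites
truth = ["deciduous forest", "conifer forest", "lake", "stream", "pasture", "cropland"]
mapped = ["conifer forest", "conifer forest", "lake", "lake", "cropland", "cropland"]

# Hypothetical lookup collapsing the detailed classes to just land and water
coarse = {
    "deciduous forest": "land", "conifer forest": "land",
    "pasture": "land", "cropland": "land",
    "lake": "water", "stream": "water",
}

fine_score = agreement(mapped, truth)
coarse_score = agreement([coarse[m] for m in mapped], [coarse[t] for t in truth])
print(f"detailed classes: {fine_score:.0%} correct")
print(f"land/water only:  {coarse_score:.0%} correct")
```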

Other measures of data quality

• Logical consistency

• Completeness

• Data currency/timeliness

• Accessibility

• These apply to both attribute and positional data

Logical Consistency

• Do data follow rules of logic?

• Attribute Example: is something classified as both water and as commercially zoned land?

• Geospatial example: Do lines intersect when they should not (e.g. with power lines)? Do polygons fail to close on themselves?
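
Two of the checks mentioned above are simple enough to sketch directly. The field names `land_cover` and `zoning` and the sample records are assumptions made for this example; a real GIS package would typically provide its own topology and attribute validation tools.

```python
def ring_is_closed(ring):
    """A polygon ring is closed when its first and last vertices coincide."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def conflicting_attributes(record):
    """Flag a record classified as water that also carries commercial zoning."""
    return record.get("land_cover") == "water" and record.get("zoning") == "commercial"

# Hypothetical rings: the first returns to its start vertex, the second does not
closed_ring = [(0, 0), (4, 0), (4, 3), (0, 0)]
open_ring = [(0, 0), (4, 0), (4, 3), (0, 3)]

# Hypothetical parcel records
parcels = [
    {"id": 1, "land_cover": "water", "zoning": "commercial"},
    {"id": 2, "land_cover": "urban", "zoning": "commercial"},
]
flagged = [p["id"] for p in parcels if conflicting_attributes(p)]

print(ring_is_closed(closed_ring), ring_is_closed(open_ring))
print("records failing the attribute rule:", flagged)
```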

Completeness

• Is a data layer complete or lacking in coverage?

• Examples: does a layer on roads leave out some roads? If so, does it do so systematically or randomly? Does a database of buildings in a city leave out some buildings?

• Example where completeness is crucial: a database of houses used to notify neighbors when a noxious facility is proposed. Imagine if a number of residents were left out.
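
One simple way to gauge completeness, assuming an independent reference list exists, is to compare feature identifiers between the layer and that list. The identifiers and the function name below are hypothetical.

```python
def completeness(layer_ids, reference_ids):
    """Return the share of reference features found in the layer, and the missing IDs."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference list is empty")
    missing = reference - set(layer_ids)
    return 1 - len(missing) / len(reference), sorted(missing)

reference_addresses = ["A1", "A2", "A3", "A4", "A5"]  # hypothetical authoritative list
layer_addresses = ["A1", "A2", "A4"]                  # hypothetical GIS layer
share, missing = completeness(layer_addresses, reference_addresses)
print(f"{share:.0%} complete; missing: {missing}")
```

Inspecting which records are missing, and not only how many, helps show whether the omissions are systematic or random.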

Currency and Timeliness

• Since some things change faster than others, the importance of timeliness in data depends on what is being displayed

• By the time data have been digitized, they are often out of date; e.g. tax parcels

• Updates are key, but the frequency of updates should depend on what is being displayed.

• Temporal validity must be stated: this tells someone using a map how long the data are considered valid
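
If each layer carries a collection date and a stated validity period, a currency check reduces to a date comparison. The layer names, dates and validity periods below are invented for illustration only.

```python
from datetime import date, timedelta

def is_current(collected, valid_days, on_date):
    """True if on_date falls within the stated validity period after collection."""
    return collected <= on_date <= collected + timedelta(days=valid_days)

# Hypothetical layers: (collection date, stated validity period in days)
layers = {
    "tax_parcels": (date(2004, 1, 15), 365),
    "bedrock_geology": (date(1985, 6, 1), 365 * 50),
}
check_date = date(2005, 6, 1)
stale = [name for name, (collected, days) in layers.items()
         if not is_current(collected, days, check_date)]
print("layers past their stated validity:", stale)
```

Fast-changing themes such as parcels get a short validity period here, while slow-changing ones such as geology get a long one, mirroring the point that update frequency should depend on the theme.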

Currency and Timeliness

• Streets are another data set where currency is important; in the slide's map, blue represents all the additional streets built between 1990 and 2000

Conflation

• Used when one layer is better in one way and another is better in another, and you wish to get the best of both

• A way of reconciling the best geometric and attribute features from two layers into a new one

• Very commonly used for the case where one layer has better attribute accuracy or completeness and another has better geometric accuracy or resolution

• Also used where a newer layer is produced for some theme but has lower resolution than the older one

Two general types of Conflation

• Attribute conflation: transferring attributes from an attribute rich layer to features in an attribute poor layer

• Feature conflation: improvement of features in one layer based on coordinates and shapes in another, often called rubber sheeting. The user either transforms all features or specifies certain features to be kept fixed

Attribute conflation

• The more spatially accurate layer is referred to as the base, coordinate or target layer

• The layer with more accurate attribution is referred to as the reference, or non-base layer

• TIGER line files: good attribution, poor positional accuracy; USGS DLGs: the opposite. Attribute conflation is frequently used by third-party vendors to assign the rich attribute data of TIGER to the positionally accurate DLGs. Nodes are matched by iteratively rubber sheeting the reference layer to the base layer until matching nodes fall within a certain tolerance. Then line features are matched up.
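
The node-matching idea can be illustrated with a much simpler stand-in for the procedure described above. The sketch below does not perform iterative rubber sheeting; it swaps in a single-pass nearest-neighbour match within a tolerance, and the coordinates, street names and function name are hypothetical.

```python
import math

def conflate_attributes(base_nodes, reference_nodes, tolerance):
    """Copy attributes from the nearest reference node within tolerance onto each base node."""
    result = []
    for bx, by in base_nodes:
        best_attrs, best_dist = None, tolerance
        for (rx, ry), attrs in reference_nodes:
            dist = math.hypot(bx - rx, by - ry)
            if dist <= best_dist:
                best_attrs, best_dist = attrs, dist
        # None means no reference node fell within the tolerance
        result.append(((bx, by), best_attrs))
    return result

# Hypothetical base layer: positionally accurate nodes without attributes
base = [(100.0, 200.0), (150.0, 260.0), (400.0, 400.0)]
# Hypothetical reference layer: offset nodes that carry street names
reference = [
    ((103.0, 198.0), {"name": "Main St"}),
    ((146.0, 263.0), {"name": "Elm St"}),
]
matched = conflate_attributes(base, reference, tolerance=10.0)
for coords, attrs in matched:
    print(coords, attrs)
```

The output keeps the base layer's coordinates and takes the attributes from the reference layer, which is the division of roles described in the bullets above. Base nodes with no reference node inside the tolerance are left unattributed.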

Conflation examples

Source: Stanley Dalal, GIS cafe