Lecture 23: Brief Introduction to Data Quality
By Austin Troy
Using GIS -- Introduction to GIS
©2005 Austin Troy
Data Quality
• Two key components of quality in data are accuracy and precision
• Error is a result of both inaccuracy and imprecision in the data; it is a general term encompassing lack of reliability
• GIS data quality is, in theory, a compromise between needs and costs
• In practice it is usually about what is available
Data Quality
• The cost of data is a reflection of its precision
• Because lower-quality data tend to be cheaper and more available, a very common problem in GIS is the inappropriate use of data
• A critical step in developing a GIS is deciding “what is accurate enough?”
• This is a function of needs, cost, accessibility and time
• User needs determine accuracy and, in general, accuracy determines price
Accuracy
What is accuracy?
• “the degree to which information on a map or in a digital database matches true or accepted values.”
• From Kenneth E. Foote and Donald J. Huebner http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
• It is also a reflection of how closely a measurement represents the actual quantity measured
• Accuracy is a reflection of the number and severity of errors in a dataset or map.
Precision
• Quality is also a function of “precision”
• Precision is the level of exactness in a measurement. The more precise a measurement is, the smaller the unit in which it is recorded
• Hence, a measurement down to a fraction of a cm is more precise than a measurement to a cm
• However, data with a high level of precision can still be inaccurate because of errors
• Each application requires a different level of precision
Precision
• Engineering and surveying applications typically require high levels of precision; they may be measuring to a millimeter
• On the other end of the spectrum, studies of weather patterns or crop cover require much less precision
• Precise data are costly: for example, carefully surveyed point locations needed by utilities to record the locations of pumps, wires, pipes and transformers cost $5-20 per point to collect
Positional Accuracy and Precision
• One of the primary types of error in GIS is positional error—that is, errors in 2D (x,y) and in the 3rd dimension (height)
• Positional accuracy and precision are functions of the scale at which the digital layer was created
• If created from digitizing a paper map, the minimum usable scale of the digital layer is considered the scale of that map
• Scale is a function of the map’s resolution
Positional Accuracy
• Positional accuracy standards specify that acceptable positional error varies with scale
• Data can have a high level of precision but still be positionally inaccurate
• Positional error is inversely related to precision and to the amount of processing
Measurement of Accuracy
Accuracy is often stated as a confidence interval: e.g. 104.2 cm +/- 0.01 cm means the true value lies between 104.19 cm and 104.21 cm
One of the key measurements of positional accuracy is root mean squared error (RMSE): the squared difference between the observed and expected value for each observation i is summed across all observations and divided by the total number of observations, and the square root of that mean is then taken
This is just a standardized measure of error: how close the predicted measure is to the observed one
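As a rough illustration of the root mean squared error described above, the sketch below computes it for a handful of check points along one coordinate axis. The function name `rmse` and the sample coordinates are hypothetical and chosen only for illustration; positional work often combines the x and y differences, which is not shown here.

```python
import math


def rmse(observed, expected):
    """Root mean squared error between paired observed and expected values."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Squared difference for each observation i
    squared_diffs = [(o - e) ** 2 for o, e in zip(observed, expected)]
    # Mean of the squared differences, then its square root
    mean_squared_error = sum(squared_diffs) / len(squared_diffs)
    return math.sqrt(mean_squared_error)


# Hypothetical x-coordinates in meters: digitized values vs. surveyed values
digitized = [100.0, 205.0, 298.0, 404.0]
surveyed = [103.0, 201.0, 298.0, 404.0]

print(rmse(digitized, surveyed))  # prints 2.5
```

Here the differences are -3, 4, 0 and 0 meters, so the mean squared error is 25 / 4 = 6.25 and its square root is 2.5 meters.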
Positional Error
• Different agencies have different standards for positional error
• Example: USGS horizontal positional requirements state that 90% of all tested points must be within 1/30th of an inch (measured on the map) for maps at scales larger than 1:20,000, and within 1/50th of an inch for maps at scales of 1:20,000 or smaller
Positional Error
• USGS accuracy standards on the ground:
1:4,800 ± 13.33 feet
1:10,000 ± 27.78 feet
1:12,000 ± 33.33 feet
1:24,000 ± 40.00 feet
1:63,360 ± 105.60 feet
1:100,000 ± 166.67 feet
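The ground distances listed above follow from multiplying the allowed error on the map by the scale denominator. A minimal sketch of that arithmetic is given below; the function name `ground_tolerance_feet` is hypothetical, and the code assumes that a 1:20,000 map itself falls in the 1/50 inch class.

```python
FEET_PER_INCH = 1 / 12


def ground_tolerance_feet(scale_denominator):
    """Allowed horizontal error on the ground, in feet, for a 1:scale_denominator map."""
    # Assumption: scales larger than 1:20,000 (smaller denominators) allow
    # 1/30 inch on the map; all other scales allow 1/50 inch.
    map_tolerance_inches = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    # Map inches -> ground inches -> ground feet
    return map_tolerance_inches * scale_denominator * FEET_PER_INCH


for denominator in (4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denominator:,}  +/- {ground_tolerance_feet(denominator):.2f} feet")
```

For example, at 1:24,000 the map tolerance of 1/50 inch corresponds to 480 inches, or 40 feet, on the ground.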
See image from U. Colorado showing accuracy standards visually
Hence, a point on a map represents the center of a spatial probability distribution of its possible locations
Thanks to Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder for links
Positional Error
• A critical point to remember is that “zooming” in on a digital map does not increase the level of accuracy
• The accuracy and precision are based on the scale of the digital layer’s original parent source
• To see this, let’s look at river data derived from sources at three scales and three levels of precision
• 1:2,000,000 - small scale
• 1:100,000 - medium scale
• 1:24,000 - large scale
Positional Error: some examples
Attribute Precision
• Attribute accuracy and precision refer to the quality of non-spatial, attribute data
• Precision for numeric data means lots of digits
• Example: recording income down to cents, rather than just dollars
• Precision for categorical data means lots of categories
• Example: Anderson land use (LU) classification level 3 versus level 1
Conceptual Accuracy
• Misclassification results from differences in judgment or in the automated classification tools
• The accuracy of classifications will depend on the precision.
• The less precise your classifications, the less likely there will be errors
• If just classifying as “land and water”, that is not very precise, and not likely to result in an error
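The trade-off between classification precision and classification error can be sketched with a small, made-up sample. The labels, the `overall_accuracy` function and the `coarse` helper below are all hypothetical; the point is only that collapsing detailed classes into “land” and “water” removes the opportunities for disagreement.

```python
def overall_accuracy(classified, reference):
    """Share of samples whose classified label matches the reference label."""
    matches = sum(c == r for c, r in zip(classified, reference))
    return matches / len(reference)


def coarse(label):
    """Collapse a detailed class into the two-class land/water scheme."""
    return "water" if label == "lake" else "land"


# Hypothetical labels for eight sample sites (reference = field-checked)
reference = ["forest", "forest", "cropland", "urban",
             "lake", "lake", "urban", "cropland"]
classified = ["forest", "cropland", "cropland", "urban",
              "lake", "lake", "forest", "urban"]

detailed = overall_accuracy(classified, reference)
collapsed = overall_accuracy([coarse(c) for c in classified],
                             [coarse(r) for r in reference])

print(detailed)   # prints 0.625
print(collapsed)  # prints 1.0
```

Three of the eight sites are confused between land classes, so the detailed scheme scores 5/8, while every site is still on the correct side of the land/water split.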
Other measures of data quality
• Logical consistency
• Completeness
• Data currency/timeliness
• Accessibility
• These apply to both attribute and positional data
Logical Consistency
• Do data follow rules of logic?
• Attribute example: is something classified as both water and as commercially zoned land?
• Geospatial example: Do lines intersect when they should not (e.g. with power lines)? Do polygons fail to close on themselves?
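Two of the checks mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: polygon rings are plain lists of (x, y) tuples, records are dictionaries with hypothetical `land_cover` and `zoning` fields, and the function names are made up. Checking for unwanted line intersections is not covered.

```python
def ring_is_closed(ring):
    """A polygon ring is closed when its first and last vertices coincide.

    At least four vertices are needed: three distinct corners plus the
    repeated starting vertex.
    """
    return len(ring) >= 4 and ring[0] == ring[-1]


def contradictory_attributes(record):
    """Flag a record classified as water that also carries commercial zoning."""
    return record["land_cover"] == "water" and record["zoning"] == "commercial"


# Hypothetical geometry: one closed triangle and one unclosed ring
closed_ring = [(0, 0), (4, 0), (4, 3), (0, 0)]
open_ring = [(0, 0), (4, 0), (4, 3)]

# Hypothetical attribute records
parcels = [
    {"id": 1, "land_cover": "water", "zoning": "commercial"},
    {"id": 2, "land_cover": "urban", "zoning": "commercial"},
]

print(ring_is_closed(closed_ring))  # prints True
print(ring_is_closed(open_ring))    # prints False
print([p["id"] for p in parcels if contradictory_attributes(p)])  # prints [1]
```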
Completeness
• Is a data layer complete or lacking in coverage?
• Examples: does a layer on roads leave out some roads? If so, does it do so systematically or randomly? Does a database of buildings in a city leave out some buildings?
• Example where completeness is crucial: a database of houses used to notify neighbors when a noxious facility is proposed. Imagine if a bunch of people were left out.
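One simple way to gauge completeness, assuming an independent reference list exists, is to compare identifiers in the layer against that list. The sketch below uses made-up house identifiers and a hypothetical `completeness_report` function; it does not address whether omissions are systematic or random.

```python
def completeness_report(layer_ids, reference_ids):
    """Return the share of reference features present in the layer and the missing ones."""
    layer, reference = set(layer_ids), set(reference_ids)
    missing = sorted(reference - layer)
    share_present = len(reference & layer) / len(reference)
    return share_present, missing


# Hypothetical house identifiers: an independent reference list vs. the GIS layer
reference_houses = ["A1", "A2", "A3", "A4", "A5"]
layer_houses = ["A1", "A2", "A4"]

share, missing = completeness_report(layer_houses, reference_houses)
print(share)    # prints 0.6
print(missing)  # prints ['A3', 'A5']
```

In the notification example, the two missing houses are the neighbors who would never hear about the proposed facility.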
Currency and Timeliness
• Since some things change faster than others, the importance of timeliness in data depends on what is being displayed
• By the time data have been digitized, they are often out of date; e.g. tax parcels
• Updates are key, but the frequency of updates should depend on what is being displayed.
• Temporal validity must be stated: this tells someone using a map how long the data are considered valid
Currency and Timeliness
• Streets are another data set where currency is important; blue represents all the additional streets built between 1990 and 2000
Conflation
• Used when one layer is better in one way and another is better in another, and you wish to get the best of both
• A way of reconciling the best geometric and attribute features from two layers into a new one
• Very commonly used where one layer has better attribute accuracy or completeness and another has better geometric accuracy or resolution
• Also used where a newer layer is produced for some theme but has lower resolution than the older one
Two general types of Conflation
• Attribute conflation: transferring attributes from an attribute-rich layer to features in an attribute-poor layer
• Feature conflation: improvement of features in one layer based on coordinates and shapes in another, often called rubber sheeting. The user either transforms all features or specifies certain features to be kept fixed
Attribute conflation
• The more spatially accurate layer is referred to as the base, coordinate or target layer
• The layer with more accurate attribution is referred to as the reference, or non-base layer
• TIGER line files have good attribution but poor positional accuracy; USGS DLGs are the opposite. Attribute conflation is frequently used by third-party vendors to assign the rich attribute data of TIGER to the positionally accurate DLGs. Nodes are matched by iteratively rubber sheeting the reference layer to the base layer until matching nodes fall within a certain tolerance. Then line features are matched up.
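The attribute-transfer step can be sketched in a heavily simplified form. The code below is not the vendors' procedure: it skips rubber sheeting and line matching entirely, treats features as single points, and simply copies attributes from the nearest reference feature that falls within a tolerance. The function name, the dictionary layout and the sample data are all hypothetical.

```python
import math


def conflate_attributes(base_features, reference_features, tolerance):
    """Copy attributes from the nearest reference feature within `tolerance`.

    Base features keep their own (more accurate) coordinates; a base feature
    with no reference feature inside the tolerance gets empty attributes.
    """
    result = []
    for base in base_features:
        nearest, nearest_dist = None, None
        for ref in reference_features:
            d = math.dist(base["xy"], ref["xy"])
            if d <= tolerance and (nearest_dist is None or d < nearest_dist):
                nearest, nearest_dist = ref, d
        attrs = dict(nearest["attrs"]) if nearest is not None else {}
        result.append({"xy": base["xy"], "attrs": attrs})
    return result


# Hypothetical base layer: accurate positions, no attributes
base_nodes = [{"xy": (0.0, 0.0)}, {"xy": (100.0, 0.0)}, {"xy": (500.0, 500.0)}]

# Hypothetical reference layer: shifted positions, rich attributes
reference_nodes = [
    {"xy": (3.0, 4.0), "attrs": {"name": "Main St"}},
    {"xy": (104.0, -3.0), "attrs": {"name": "Oak Ave"}},
]

merged = conflate_attributes(base_nodes, reference_nodes, tolerance=10.0)
for feature in merged:
    print(feature["xy"], feature["attrs"])
```

The first two base nodes each lie 5 units from a reference node and inherit its name, while the third has no reference node within the tolerance and is left without attributes.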
Conflation examples
Source: Stanley Dalal, GIS cafe