Geographic data validation
![Page 1: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/1.jpg)
Geographic data validation
![Page 2: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/2.jpg)
Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations
![Page 4: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/4.jpg)
Basic concepts
• Quality
• Faithful representation of a feature
• Quality of data related to quality of output
• GIGO principle (garbage in, garbage out)
• Data have the potential to be used in ways unforeseen when collected.
• The value of the data is directly related to their fitness for a variety of uses.
![Page 5: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/5.jpg)
Basic concepts
• Fitness-for-use
• The suitability of a set of data for a specific purpose
• A.K.A. usability
• Should not be confused with quality
• Quality: abstract
• Usability: specific
• A low-quality dataset may still have high usability
![Page 6: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/6.jpg)
Basic concepts
• Precision
o Closeness of repeated measurements to a given value, either correct or not
• Accuracy
o Closeness of a measurement to the true value
![Page 7: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/7.jpg)
Precision vs Accuracy
![Page 8: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/8.jpg)
Basic concepts
• Precision
o Closeness of repeated measurements to a given value, either correct or not
• Accuracy
o Closeness of a measurement to the true value
• Precision is an intrinsic property of the measurements
• Accuracy depends on knowing the true value of the variable
• Data validation: assessing the accuracy
• Compare against a reference value
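As a rough numerical illustration of the two definitions above, the sketch below treats precision as the spread of repeated measurements around their own mean, and accuracy as the distance of that mean from a reference value. The readings and the reference latitude are invented for the example, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def precision_and_accuracy(measurements, reference):
    """Return (spread, bias) for repeated measurements of one quantity.

    spread: population standard deviation (smaller = more precise)
    bias:   absolute difference between the mean and the reference
            value (smaller = more accurate)
    """
    centre = mean(measurements)
    return pstdev(measurements), abs(centre - reference)

# Hypothetical repeated latitude readings of the same site, in decimal degrees.
reference_lat = 40.0
tight_but_off = [40.50, 40.51, 40.49, 40.50]    # precise, not accurate
loose_but_centred = [39.6, 40.4, 39.8, 40.2]    # accurate on average, not precise

spread_a, bias_a = precision_and_accuracy(tight_but_off, reference_lat)
spread_b, bias_b = precision_and_accuracy(loose_but_centred, reference_lat)
```

Note that the spread can be computed from the measurements alone, while the bias cannot be computed without the reference value.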
![Page 10: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/10.jpg)
Why do we need validation?
![Page 11: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/11.jpg)
Why do we need validation?
![Page 12: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/12.jpg)
Why do we need validation?
• This was a striking example, but more subtle issues can (and actually do) happen
• We need to develop techniques and methodologies to explore the data
• In other words, we need to validate the data
• Validating gives a sense of the reliability of the records, and clues on how to improve them
![Page 14: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/14.jpg)
How to assess?
• Different techniques apply, depending on the aim of the assessment
• Remember that high-quality datasets are more likely to show high fitness-for-use
• Ideally, check for quality
• If we know the purpose, check for fitness for that purpose
![Page 15: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/15.jpg)
How to assess?
• Work with geographic information à la Darwin Core
• Work with individual records as well as collections of data
• Start with the most basic pieces of information
• Look for coherence with other pieces of information
• If they are not coherent, why?
• Modify the information to see whether it then fits
• At more advanced levels, make use of available taxonomic or temporal information
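One way to read "modify the information and see if it fits" is to generate the variants that common errors would produce and test each against an expected area. The sketch below is only an illustration: the function names and the rectangular region are hypothetical, and a bounding box is a crude stand-in for a real boundary.

```python
def candidate_fixes(lat, lon):
    """Yield (label, lat, lon) variants produced by common coordinate errors."""
    yield "as recorded", lat, lon
    yield "latitude sign flipped", -lat, lon
    yield "longitude sign flipped", lat, -lon
    yield "both signs flipped", -lat, -lon
    yield "latitude and longitude swapped", lon, lat

def in_box(lat, lon, box):
    """True if the point lies inside a (min_lat, max_lat, min_lon, max_lon) box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def fitting_variants(lat, lon, box):
    """Return the labels of all variants that fall inside the expected box."""
    return [label for label, la, lo in candidate_fixes(lat, lon)
            if in_box(la, lo, box)]

# Hypothetical expected region: latitude 36..44, longitude -10..4.
expected_box = (36.0, 44.0, -10.0, 4.0)

# A record stored with a positive longitude that should have been negative.
suggestions = fitting_variants(40.4, 8.6, expected_box)
```

A variant that fits is only a hint about what may have gone wrong; it does not prove that the correction is right.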
![Page 16: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/16.jpg)
How to assess?
• Tools
• Spreadsheet: Microsoft Excel, LibreOffice Calc…
o Well-known environment
o Visually easy
• OpenRefine
o Spreadsheet-like, but with some enhanced features
• Scripts
o Database scripts: work directly at the source
o Other programming languages: enhanced capabilities
• GIS software
o Often linked with other tools, such as spreadsheets or scripts
![Page 17: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/17.jpg)
![Page 18: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/18.jpg)
Visualizations
• Visual exploration of the record set
• Useful for a first-level assessment
• Primary visualization for geographic data: maps
• The next picture shows several issues that can be detected using a map…
![Page 19: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/19.jpg)
![Page 20: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/20.jpg)
Coordinate transposition
• This happens when latitude is stored in the longitude field and vice versa
• Usually difficult to detect on a one-by-one basis
• But when looking at the whole picture…
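A simple range test catches only the most obvious transpositions, which is part of why the issue is hard to spot record by record. The sketch below (with a hypothetical function name) flags a pair that is invalid as stored but becomes valid once the two values are swapped.

```python
def looks_transposed(lat, lon):
    """True when the pair is invalid as given but valid once swapped.

    Latitude must lie in [-90, 90] and longitude in [-180, 180].
    """
    valid_as_given = -90 <= lat <= 90 and -180 <= lon <= 180
    valid_if_swapped = -90 <= lon <= 90 and -180 <= lat <= 180
    return (not valid_as_given) and valid_if_swapped

# A latitude of -122.4 is impossible, so this pair is flagged.
obvious = looks_transposed(-122.4, 37.8)

# Both values are below 90 in absolute terms, so a swap cannot be
# detected from the numbers alone; a map or a country check is needed.
undetectable = looks_transposed(10.0, 20.0)
```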
![Page 21: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/21.jpg)
Zero vs Null
• One of the most common issues
• Storing 0 (zero) instead of leaving the field empty
• This happens with some data management systems
• Latitude 0 and longitude 0 are stored meaning “unknown coordinates”
• But data users cannot know that; it is not what the standard says
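Records sitting at exactly latitude 0, longitude 0 can be listed for review with a short filter. This is a minimal sketch: it assumes records are dictionaries keyed by the Darwin Core terms decimalLatitude and decimalLongitude, and the sample records are invented.

```python
def zero_coordinate_records(records):
    """Return the records whose latitude and longitude are both exactly 0."""
    return [
        r for r in records
        if r.get("decimalLatitude") == 0 and r.get("decimalLongitude") == 0
    ]

# Hypothetical records; None stands for a field that was left empty.
records = [
    {"occurrenceID": "a", "decimalLatitude": 0, "decimalLongitude": 0},
    {"occurrenceID": "b", "decimalLatitude": None, "decimalLongitude": None},
    {"occurrenceID": "c", "decimalLatitude": 0.0, "decimalLongitude": 6.5},
    {"occurrenceID": "d", "decimalLatitude": 41.4, "decimalLongitude": 2.2},
]

suspect_ids = [r["occurrenceID"] for r in zero_coordinate_records(records)]
```

Record "c" is left alone on purpose: a point on the equator with a non-zero longitude is a legitimate location.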
![Page 22: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/22.jpg)
Negation
• Forgetting or altering the positive/negative sign of the coordinates
• Usually forgetting the minus sign
• The most common source: transforming from DMS (degrees, minutes, seconds) to DD (decimal degrees) without taking “W” or “S” into account
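The conversion itself is short, which makes it easy to forget the hemisphere. A minimal sketch, with a hypothetical function name, that makes the hemisphere letter a required argument so the sign cannot be dropped silently:

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    "S" and "W" give negative values; "N" and "E" give positive values.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# 78° 47' 52" S and 35° 50' 31" E
lat = dms_to_dd(78, 47, 52, "S")
lon = dms_to_dd(35, 50, 31, "E")
```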
![Page 23: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/23.jpg)
Check against country
• The easiest way of checking for these issues is to check whether the coordinates fall inside the specified country…
• Of course, only if we have a country value to check against
• Two ways
• Use GIS software
• Use web services like GeoNames (we will see this in the OpenRefine session)
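To show what the country check amounts to, here is a small self-contained sketch using a ray-casting point-in-polygon test. It is not how GIS software or GeoNames does it internally; the country "XX" and its square outline are invented, the function names are hypothetical, and coordinates are treated as planar, which ignores the antimeridian and the poles.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical, heavily simplified outline of an imaginary country "XX".
# Real outlines would come from a GIS layer.
OUTLINES = {"XX": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]}

def outside_stated_country(record, outlines=OUTLINES):
    """True if the record's point is outside the outline of its stated country."""
    polygon = outlines[record["countryCode"]]
    return not point_in_polygon(
        record["decimalLongitude"], record["decimalLatitude"], polygon
    )

records = [
    {"occurrenceID": "a", "countryCode": "XX",
     "decimalLatitude": 5.0, "decimalLongitude": 5.0},
    {"occurrenceID": "b", "countryCode": "XX",
     "decimalLatitude": 5.0, "decimalLongitude": -5.0},
]
suspect = [r["occurrenceID"] for r in records if outside_stated_country(r)]
```

Record "b" lands outside the outline, and flipping the sign of its longitude would bring it back inside, which hints at a negation error.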
![Page 24: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/24.jpg)
Georeferencing
• Intermediate check
• If we have locality information and coordinates, we can check if they match
• Georeferencing is a tough task, and prone to uncertainties, so some level of imprecision is to be expected
• Make good use of the “uncertainty” fields in Darwin Core!
• But still…
![Page 25: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/25.jpg)
55.932576, 13.132359
Anahuac NWR (UTC 049)
Grandville
POINT(-1.3223333 53.44958)
Marine Nature Study Area
78º 47’ 52” S; 35º 50’ 31” E
Stewart Park
POINT(-1.1735004 53.358746)
Backyard
My Habitat
55.932576, 13.132359
Wilderness Park, north of 14th St.
28054
Delaney Conservation Area
57.3, 11.9
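When a record carries both coordinates and a locality that has been georeferenced independently, the match can be approximated by comparing the distance between the two points with the stated uncertainty (for instance the Darwin Core term coordinateUncertaintyInMeters). The sketch below uses the haversine formula on a spherical Earth; the function names are hypothetical and the georeferenced locality point is invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def within_uncertainty(record_point, locality_point, uncertainty_m):
    """True if the two (lat, lon) points are no further apart than uncertainty_m."""
    return haversine_m(*record_point, *locality_point) <= uncertainty_m

recorded = (55.932576, 13.132359)   # coordinates stored with the record
georeferenced = (55.94, 13.13)      # hypothetical point obtained from the locality text

ok_at_1km = within_uncertainty(recorded, georeferenced, 1000)
ok_at_500m = within_uncertainty(recorded, georeferenced, 500)
```

The same pair of points passes or fails depending on the uncertainty recorded, which is why filling in the uncertainty fields matters.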
![Page 26: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/26.jpg)
Multi-domain checks
• Using information from different sources to check quality
• Especially use taxonomic information to improve geospatial data
• Most basic example: check data against a range map
• If the point falls inside the range map of the specified species, OK
• Sometimes, temporal information is useful
![Page 28: Geographic data validation](https://reader035.fdocuments.net/reader035/viewer/2022062517/56813c7f550346895da620ec/html5/thumbnails/28.jpg)
Considerations
• NEVER modify the original data
• Data cleaning is a human task, and thus, it is not error-free
• Information we believe is wrong may be right
• Make an “improved copy” of the data
• Or “flag” the records as inaccurate
• Re-share the improvements
• With the community: so that others don’t have to reinvent the wheel
• With the original owners of the data: so that they can correct the errors at the source
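The "improved copy" and "flag" ideas can be combined: run the checks on a copy and record the outcome in an extra field, leaving the source untouched. This is only a sketch; the "flags" field, the check names and the sample records are hypothetical.

```python
import copy

def flag_records(records, checks):
    """Return a deep copy of records with an added 'flags' list per record.

    checks maps a flag name to a function that returns True when the
    record shows that issue. The input list is never modified.
    """
    flagged = copy.deepcopy(records)
    for record in flagged:
        record["flags"] = [name for name, check in checks.items() if check(record)]
    return flagged

checks = {
    "zero_coordinates": lambda r: r["decimalLatitude"] == 0
    and r["decimalLongitude"] == 0,
    "latitude_out_of_range": lambda r: r["decimalLatitude"] is not None
    and abs(r["decimalLatitude"]) > 90,
}

original = [
    {"occurrenceID": "a", "decimalLatitude": 0, "decimalLongitude": 0},
    {"occurrenceID": "b", "decimalLatitude": 120.5, "decimalLongitude": 45.0},
    {"occurrenceID": "c", "decimalLatitude": 41.4, "decimalLongitude": 2.2},
]
improved = flag_records(original, checks)
```

Because the flags say what looked wrong rather than overwriting values, the original owners can review each case and decide whether it really is an error.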