Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Christian Gendreau, Université de Montréal / Canadensys
David P. Shorthouse, Université de Montréal / Canadensys
Marie-Élise Lecoq, GBIF France
Tim Robertson, GBIF
Darwin Core Archive (DwC-A)
The Darwin Core standard does not impose strong rules on the content associated with any Darwin Core term.
Current GBIF DwC-A Validator
Original goal: “… test Darwin Core Archives as specified in the Darwin Core Text Guide.”
http://tools.gbif.org/dwca-validator/
Current GBIF DwC-A Validator
Original target: Darwin Core Archives that are simple and can be created using simple custom scripts.
“… make sure GBIF and others can read the information as expected.”
Current GBIF DwC-A Validator
• Validates archive structure
• Offers web presence
  – Report viewer
  – API
Next GBIF DwC-A Validator?
New goal: extend validation to the content of the archive
https://github.com/gbif/dwca-validator
Current content validators
• Atlas of Living Australia sandbox
• VertNet – Spatial quality
• GBIF Spain – Darwin Test
• Encyclopedia of Life – dwc-validator
• Scratchpads – dwca-validator
• GlobalNames – dwc-archive ruby gem
• … and many more
See Appendix 1 for links
What do we need?
• Accommodate different scopes
• Configuration/customization
  – Use more knowledge when available
• Web access (page and API)
Scopes
• Data entry
• Desktop software
  – Scientific workflow
  – Statistical software
• Integrated Publishing Toolkit (IPT)
• National nodes
• Aggregators
Configuration/Customization
• Where will the validator be used?
• Can we provide more information?
  – e.g. I know all the dates in my file should be in ISO format
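A hint such as "all my dates are ISO" could be passed to the validator as a configuration flag that switches a date check from lenient to strict. The following is a minimal Python sketch under that assumption; `ValidatorConfig`, `dates_are_iso` and `check_event_date` are hypothetical names for illustration and are not taken from the dwca-validator project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValidatorConfig:
    # Hypothetical setting: the publisher asserts that every date is ISO 8601.
    dates_are_iso: bool = False


def check_event_date(value: str, config: ValidatorConfig) -> bool:
    """Return True if the value is acceptable under the given configuration."""
    if not config.dates_are_iso:
        # Without the hint, this sketch only checks that the field is not blank.
        return bool(value.strip())
    try:
        date.fromisoformat(value)  # accepts YYYY-MM-DD
        return True
    except ValueError:
        return False
```

With the flag set, `check_event_date("28/10/2013", ValidatorConfig(dates_are_iso=True))` is rejected, while the same value passes under the default configuration.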
Components
• Library
• Web
• Extension Support
Library
• Define structure for validation process
• Provide a validation framework enabling sharing
• Close to the Darwin Core specification
Web
• Web page to submit archive or URL
• Report viewer
• API
Extension Support
• Include domain knowledge
• Propose interpreted data
Internals
• Validation types
  – Structure
    • Metadata
  – Records: Rows
    • Field data (e.g. date, coordinates)
  – Records: Columns
    • ID uniqueness
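The difference between row-level and column-level validation can be sketched as follows: a row check looks at one record in isolation, while a column check such as ID uniqueness needs every value of that column. This is an illustrative Python sketch only; the function names are hypothetical, and `occurrenceID` and `decimalLatitude` are used simply as familiar Darwin Core terms.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, str]


def validate_row(record: Record) -> List[str]:
    """Row-level check: looks at one record in isolation."""
    errors = []
    try:
        lat = float(record.get("decimalLatitude", ""))
        if not -90.0 <= lat <= 90.0:
            errors.append("decimalLatitude out of range")
    except ValueError:
        errors.append("decimalLatitude is not a number")
    return errors


def duplicated_ids(records: Iterable[Record], id_field: str = "occurrenceID") -> List[str]:
    """Column-level check: needs every value of one column to detect duplicates."""
    counts = Counter(r.get(id_field, "") for r in records)
    return sorted(value for value, n in counts.items() if n > 1)
```

Row checks can be applied record by record as the file is read; the uniqueness check can only report once the whole column has been seen.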
Internals – Record level
• Validation chain
  – Composed of chain elements
  – Possible parallelism
Internals – Record level
• Immutable chain element
  – Self-contained
    • Never relies on another chain element
  – Ordering independent
    • Same behaviour wherever the element is used in the chain
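The properties listed above can be illustrated with a small Python sketch. It assumes a hypothetical `RequiredField` element and `run_chain` helper, neither of which is the project's actual API: each element is immutable, reads only the record it is given, and never sees another element's result.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class RequiredField:
    """Hypothetical chain element: reports a problem if a field is blank."""
    term: str

    def validate(self, record: Record) -> Optional[str]:
        if not record.get(self.term, "").strip():
            return f"{self.term} is missing"
        return None


def run_chain(elements: Sequence[RequiredField], record: Record) -> List[str]:
    """Apply every element to the record; no element sees another's result."""
    results = (element.validate(record) for element in elements)
    return [message for message in results if message is not None]
```

Because the elements are self-contained, reversing the chain yields the same set of messages, which is what makes running elements in parallel possible.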
But what if I really need ordering?
Internals - Composition
• Composed chain element
• Exposed as one chain element
Composition example
• Mandatory Latitude/Longitude
  – Check record completion on lat/long
  – Check decimal lat/long value
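One way to read this example is that the two checks must run in a fixed order, so they are wrapped into a single composed element and the ordering stays hidden inside it. A minimal Python sketch under that reading follows; `compose`, `latlong_present` and `latlong_decimal` are hypothetical names, and the coordinate ranges are the usual geographic limits rather than anything stated in the slides.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]
# A chain element here is any function returning an error message or None.
ChainElement = Callable[[Record], Optional[str]]


def latlong_present(record: Record) -> Optional[str]:
    for term in ("decimalLatitude", "decimalLongitude"):
        if not record.get(term, "").strip():
            return f"{term} is missing"
    return None


def latlong_decimal(record: Record) -> Optional[str]:
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, ValueError):
        return "latitude/longitude is not a decimal number"
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return "latitude/longitude out of range"
    return None


def compose(*steps: ChainElement) -> ChainElement:
    """Run steps in a fixed order, stopping at the first failure."""
    def composed(record: Record) -> Optional[str]:
        for step in steps:
            message = step(record)
            if message is not None:
                return message
        return None
    return composed


# Exposed to the chain as a single element; the ordering stays internal.
mandatory_latlong = compose(latlong_present, latlong_decimal)
```

From the outside, `mandatory_latlong` behaves like any other chain element, so the chain itself remains ordering independent.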
Configuration example
• Select mandatory Darwin Core terms
  – scientificName must be provided
• Restrict bounding box
  – decimalLatitude and decimalLongitude must be between
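Both settings could be carried in a small configuration object that the validator reads at run time. The Python sketch below assumes a hypothetical `CONFIG` dictionary and `validate` function; the bounding-box numbers are placeholders chosen for illustration, since the transcript does not give the actual limits.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical configuration; the bounding-box values are illustrative only.
CONFIG = {
    "mandatory_terms": ["scientificName"],
    # (min latitude, max latitude, min longitude, max longitude)
    "bounding_box": (40.0, 85.0, -142.0, -52.0),
}


def validate(record: Record, config: dict = CONFIG) -> List[str]:
    """Check mandatory terms and the bounding box described by the configuration."""
    errors = []
    for term in config["mandatory_terms"]:
        if not record.get(term, "").strip():
            errors.append(f"{term} must be provided")
    min_lat, max_lat, min_lon, max_lon = config["bounding_box"]
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, ValueError):
        errors.append("coordinates are missing or not numeric")
    else:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            errors.append("coordinates fall outside the configured bounding box")
    return errors
```

Keeping the rules in data rather than in code is what would let a configuration be shared between installations.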
Customization example
• Apply your own controlled vocabulary
  – Use your own dictionary for a term
  – ControlledVocabularyEvaluationRule
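The idea of a user-supplied dictionary for a term can be sketched in a few lines of Python. This does not reproduce the API of `ControlledVocabularyEvaluationRule`, which the slide only names; `check_vocabulary` and the sample dictionary for `basisOfRecord` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

Record = Dict[str, str]

# Hypothetical user-supplied dictionary for the Darwin Core term basisOfRecord.
MY_BASIS_OF_RECORD: FrozenSet[str] = frozenset(
    {"preservedspecimen", "humanobservation", "fossilspecimen"}
)


def check_vocabulary(record: Record, term: str, allowed: FrozenSet[str]) -> Optional[str]:
    """Return an error message if the term's value is not in the dictionary."""
    value = record.get(term, "").strip()
    if value.lower() not in allowed:
        return f"{term}: '{value}' is not in the controlled vocabulary"
    return None
```

The comparison is case-insensitive in this sketch; whether that is desirable depends on the vocabulary being applied.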
Extension Example
• Suggester, link to narwhal-processor
  – Suède -> ISO 3166-2:SE
  – URI -> http://sws.geonames.org/2661886
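A suggester of this kind proposes an interpreted value next to the raw one instead of rejecting the record. The Python sketch below uses a hypothetical in-memory lookup table and a hypothetical `suggest_country` function; it does not call narwhal-processor or GeoNames, and only the single mapping shown on the slide is included.

```python
from typing import Dict, Optional

# Hypothetical lookup table; a real suggester would draw on a maintained dictionary.
COUNTRY_SUGGESTIONS: Dict[str, Dict[str, str]] = {
    "suède": {"code": "SE", "uri": "http://sws.geonames.org/2661886"},
}


def suggest_country(raw_value: str) -> Optional[Dict[str, str]]:
    """Propose an interpreted value without modifying the original data."""
    return COUNTRY_SUGGESTIONS.get(raw_value.strip().lower())
```

A value with no entry in the table returns `None`, leaving the decision to the data publisher.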
Collaborative
• Share configuration
• Share customization (dictionary)
• Implement new reusable components
  – e.g. validation on a specific DwC-A extension
Collaboration
• Where to go?
  – https://github.com/gbif/dwca-validator
• Who can contribute?
  – Everyone
• What is needed?
  – Ideas, constructive comments
  – Code review, feedback
Project status
• Not yet released
• Command line interface available
Follow the project on GitHub
Acknowledgments
Special thanks
• SiB Colombia
• SiB Brazil
• Peter Desmet
• John Wieczorek
• Dag Endresen
• …
Appendix 1
DwC content validators
• Atlas of Living Australia sandbox: http://sandbox.ala.org.au/datacheck/
• VertNet – Spatial quality: displayed on occurrence pages at http://portal.vertnet.org/search
• GBIF Spain – Darwin Test: http://www.gbif.es/darwin_test/Darwin_Test_in.php
• Encyclopedia of Life – dwc-validator: http://services.eol.org/dwc_validator/
Appendix 1 (continued)
• Scratchpads – dwca-validator: https://github.com/edwbaker/dwca_validator/
• GlobalNames – dwc-archive ruby gem: https://github.com/GlobalNamesArchitecture/dwc-archive