Risks to Digital Geospatial Data
Long-Term Preservation of At-Risk Digital Geospatial Data:
The North Carolina Geospatial Data Archiving Project
Steve Morris, NCSU Libraries
Note: Percentages based on the actual number of respondents to each question 2
Risks to Digital Geospatial Data
.shp
.mif
.gml
.e00
.dwg
.dgn
.bsb
.bil
.sid
Risks to Digital Geospatial Data
Producer focus on current data: also, archiving data does not guarantee “permanent access”
Future support of data formats in question: need to migrate formats or allow for emulation
Data failure: “bit rot”, media failure
Preservation metadata requirements: descriptive, administrative, technical, DRM
Shift to “streaming data” for access
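The “bit rot” and media failure risk above is commonly addressed with periodic fixity checks. The following is a minimal sketch, assuming SHA-256 digests stored in a simple in-memory manifest; the function names and the sample file name are hypothetical and are not part of the project's documented tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest no longer matches."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-in for an archived data file.
        sample = root / "parcels.shp"
        sample.write_bytes(b"hypothetical shapefile bytes")
        manifest = build_manifest(root)
        print(changed_files(root, manifest))  # nothing has changed yet
        sample.write_bytes(b"hypothetical shapefile bytez")
        print(changed_files(root, manifest))  # the altered file is reported
```

A check like this only detects damage; recovery still depends on keeping independent copies.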
Today’s geospatial data as tomorrow’s cultural heritage
Time series – vector data: Parcel Boundary Changes 2001-2004, North Raleigh, NC
Time series – Ortho imagery: Vicinity of Raleigh-Durham International Airport 1993-2002
NC Geospatial Data Archiving Project
Partnership between university library (NCSU) and state GIS agency (NCCGIA)
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information
Objective: engage existing state/federal geospatial data infrastructures in preservation
Local Government GIS in NC
Data resources are highly distributed and subject to frequent update
More detailed, current, accurate than federal/state data resources
North Carolina local agency GIS environment:
100 counties, 95 with GIS
85 counties with high resolution orthophotography
Growing number of municipal systems
Hundreds of millions of dollars investment
NCGDAP Targeted Content
Resource Types:
GIS “vector” (point/line/polygon) data
Digital orthophotography
Digital maps
Tabular data (e.g. assessment data)
Content Producers:
Mostly state, local, regional agencies
Some university, not-for-profit, commercial
Selected local federal projects
Work plan in a Nutshell
Work from existing data inventories
NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt”
Partnership: work with existing geospatial data infrastructures (state and federal)
Technical approach: blend emerging digital library technologies with geospatial technologies
Metadata: METS, FGDC, PREMIS?, GeoDRM?
Repository: DSpace and others
Big Challenges
Format migration paths
Management of data versions over time
Preservation metadata
Harnessing geospatial web services
Preserving cartographic representation
Keeping content repository-agnostic
Preserving spatial databases
More …
Vector Data Format Issues
Vector data much more complicated than image data
‘Archiving’ vs. ‘Permanent access’
An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology, then it does not provide permanent access
Piles of XML need to be widely understood piles
GML: need widely accepted application schemas (like OSMM?)
The spatial database conundrum
Export feature classes, and lose topology, annotation, relationships, etc.
… or use the spatial database as the primary archival platform (some are now thinking this way)
Managing Time-versioned Content
Many local agency data layers continuously updated
E.g., some county cadastral data updated daily—older versions not generally available
Individual versioned datasets will wander off from the archive
How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?
How do we certify concurrency and agreement between the metadata and the data?
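One way to approach the two questions above is to index every archived version by a digest of its data, so that a stray copy can be matched back to its version record and to the metadata captured with it. The following is a minimal sketch under that assumption; the `VersionRecord` and `VersionRegistry` names, the identifiers, and the in-memory storage are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRecord:
    dataset_id: str       # hypothetical archive identifier
    version_date: str     # ISO 8601 date of the snapshot
    data_sha256: str      # digest of the data as archived
    metadata_sha256: str  # digest of the metadata captured with it


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


class VersionRegistry:
    """In-memory index from data digest to the archived version record."""

    def __init__(self) -> None:
        self._by_data_digest: dict[str, VersionRecord] = {}

    def register(self, dataset_id: str, version_date: str,
                 data: bytes, metadata: bytes) -> VersionRecord:
        record = VersionRecord(dataset_id, version_date,
                               digest(data), digest(metadata))
        self._by_data_digest[record.data_sha256] = record
        return record

    def identify(self, data: bytes) -> Optional[VersionRecord]:
        """Return the record for a stray copy, or None if it is unknown."""
        return self._by_data_digest.get(digest(data))

    def metadata_agrees(self, data: bytes, metadata: bytes) -> bool:
        """True only if this metadata is the one archived with this data."""
        record = self.identify(data)
        return record is not None and record.metadata_sha256 == digest(metadata)


if __name__ == "__main__":
    registry = VersionRegistry()
    registry.register("county-parcels", "2004-01-15",
                      b"parcel data, January", b"<metadata>January</metadata>")
    found = registry.identify(b"parcel data, January")
    print(found.version_date if found else "unknown copy")
    print(registry.metadata_agrees(b"parcel data, January",
                                   b"<metadata>March</metadata>"))
```

A digest match only works for byte-identical copies; a dataset that has been re-exported or reprojected would need some other identifier carried with it.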
Preservation Metadata Issues
FGDC Metadata
Many flavors, incoming metadata needs processing
Cross-walk elements to PREMIS, MODS?
Metadata wrapper
METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
Need a geospatial industry solution for the ‘METS-like problem’
GeoDRM a likely trigger—wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)
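To make the wrapper idea concrete, the following is a minimal sketch of a METS document that embeds an FGDC record and points at one data file, built with Python's standard `xml.etree.ElementTree`. The object identifier, file path, and FGDC fragment are hypothetical, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, fgdc_xml: str, data_href: str) -> ET.Element:
    """Wrap one FGDC record and one file reference in a METS element tree."""
    mets = ET.Element(q("mets"), {"OBJID": object_id})

    # Descriptive metadata section: the FGDC record is embedded as-is.
    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "FGDC"})
    xml_data = ET.SubElement(wrap, q("xmlData"))
    xml_data.append(ET.fromstring(fgdc_xml))

    # File section: a pointer to the data, not the data itself.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"), {"USE": "archival"})
    file_el = ET.SubElement(group, q("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_href})

    # Structural map: ties the file to its descriptive metadata.
    struct = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct, q("div"), {"DMDID": "DMD1"})
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})
    return mets


if __name__ == "__main__":
    # Hypothetical, heavily truncated FGDC record.
    fgdc = ("<metadata><idinfo><citation><citeinfo>"
            "<title>Example parcels</title>"
            "</citeinfo></citation></idinfo></metadata>")
    doc = build_mets("example-0001", fgdc, "data/parcels_2004.zip")
    print(ET.tostring(doc, encoding="unicode"))
```

Preservation (PREMIS) or rights metadata would go in additional METS sections, which this sketch leaves out.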
Harnessing Geospatial Web Services
Automated content identification: ‘capabilities files,’ registries, catalog services
WMS (Web Map Service) for batch extraction of image atlases?
last ditch capture option
preserve cartographic representation
retain records of decision-making process
… feature services (WFS) later.
Rights issues in the web services space are ambiguous … GeoDRM in development
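Batch extraction of an image atlas from a WMS could work by splitting an extent into a grid and issuing one GetMap request per cell. The following is a minimal sketch that only builds the request URLs, using standard WMS 1.1.1 GetMap parameters; the endpoint, layer name, and extent are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def tile_bboxes(minx, miny, maxx, maxy, cols, rows):
    """Split an extent into a cols x rows grid of (minx, miny, maxx, maxy)."""
    width = (maxx - minx) / cols
    height = (maxy - miny) / rows
    for row in range(rows):
        for col in range(cols):
            x0 = minx + col * width
            y0 = miny + row * height
            yield (x0, y0, x0 + width, y0 + height)


def getmap_url(base_url, layer, bbox, srs="EPSG:4326",
               size=1024, image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one tile.

    WIDTH and HEIGHT are equal, so tiles should be square in map units
    to avoid a stretched image.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(f"{value:.6f}" for value in bbox),
        "WIDTH": size,
        "HEIGHT": size,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint, layer name, and one-degree extent.
    endpoint = "https://example.org/wms"
    for bbox in tile_bboxes(-79.0, 35.5, -78.0, 36.5, cols=2, rows=2):
        print(getmap_url(endpoint, "parcels", bbox))
```

Saving each request URL alongside the returned image would keep a record of exactly which layers, styles, and extent produced it.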
Preserving Cartographic Representation
The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:
Intellectual choices about symbolization, layer combinations
Data models, analysis, annotations
Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration
Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem
Repository Architecture Issues
Interest in how geospatial content interacts with widely available digital repository software
Focus on salient, domain-specific issues
Challenge: remain repository agnostic
Avoid “imprinting” on repository software environment
Preservation package should not be the same as the ingest object of the first environment
Tension between exploiting repository software features vs. becoming software dependent
Preserving Spatial Databases
Spatial databases in general vs. ESRI Geodatabase “format”
Not just data layers and attributes—also topology, annotation, relationships, behaviors
ESRI Geodatabase archival issues: XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
Growing use of geodatabases by municipal, county agencies
Some looking to Geodatabase as archival platform (in addition to feature class export)
NCGDAP Philosophy of Engagement
Take the data as is, in the manner in which it can be obtained
Provide feedback to producer organizations/inform state geospatial infrastructure
“Wrangle” and archive data
Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive
Questions?
Contact:
Steve Morris
Head of Digital Library Initiatives
NCSU Libraries
Phone: (919) 515-1361
NCGDAP website: http://www.lib.ncsu.edu/ncgdap/