Risks to Digital Geospatial Data

Long-Term Preservation of At-Risk Digital Geospatial Data: The North Carolina Geospatial Data Archiving Project. Steve Morris, NCSU Libraries

Transcript of Risks to Digital Geospatial Data

Page 1: Risks to Digital Geospatial Data

Long-Term Preservation of At-Risk Digital Geospatial Data:

The North Carolina Geospatial Data Archiving Project

Steve Morris, NCSU Libraries

Page 2: Risks to Digital Geospatial Data

Note: Percentages based on the actual number of respondents to each question

Risks to Digital Geospatial Data

.shp

.mif

.gml

.e00

.dwg

.dgn

.bsb

.bil

.sid
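
As a rough illustration of how an archive might take stock of the formats listed above, the sketch below counts incoming files by extension. The format names are the common expansions of these extensions; the sample file names and the `inventory` helper are hypothetical and not part of the project's tooling.

```python
from collections import Counter
from pathlib import PurePath

# Common names for the extensions listed on the slide.
FORMATS = {
    ".shp": "ESRI Shapefile",
    ".mif": "MapInfo Interchange Format",
    ".gml": "Geography Markup Language",
    ".e00": "ArcInfo interchange (export) file",
    ".dwg": "AutoCAD drawing",
    ".dgn": "MicroStation design file",
    ".bsb": "BSB raster nautical chart",
    ".bil": "Band interleaved by line raster",
    ".sid": "MrSID compressed raster",
}


def inventory(filenames):
    """Count files per known format; anything else is grouped as 'unrecognized'."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower()  # case-insensitive match
        counts[FORMATS.get(ext, "unrecognized")] += 1
    return counts


# Hypothetical file names, for illustration only.
sample = ["parcels.shp", "PARCELS.SHP", "roads.e00", "ortho_1993.sid", "notes.txt"]
result = inventory(sample)
```

A real inventory would also need to group multi-file formats (a shapefile is several files sharing a base name), which this sketch does not attempt.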

Page 3: Risks to Digital Geospatial Data

Risks to Digital Geospatial Data

Producer focus on current data

Also, archiving data does not guarantee “permanent access”

Future support of data formats in question

Need to migrate formats or allow for emulation

Data failure

“Bit rot”, media failure

Preservation metadata requirements

Descriptive, administrative, technical, DRM

Shift to “streaming data” for access
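
The “bit rot” risk above is commonly addressed with fixity checks: record a checksum at ingest and recompute it later. The following is a minimal sketch assuming SHA-256 and a local file; the file name and helper names are hypothetical, and the slides do not say which checksum the project uses.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded):
    """True if the file still matches the checksum recorded at ingest."""
    return sha256_of(path) == recorded


# Demonstration on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parcels.shp")
    with open(path, "wb") as f:
        f.write(b"original bytes")
    recorded = sha256_of(path)           # stored at ingest time
    intact_before = verify(path, recorded)

    with open(path, "wb") as f:          # simulate silent corruption
        f.write(b"original bytez")
    intact_after = verify(path, recorded)
```

A checksum only detects damage; recovery still depends on keeping independent copies.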

Page 4: Risks to Digital Geospatial Data

Page 5: Risks to Digital Geospatial Data

Page 6: Risks to Digital Geospatial Data

Page 7: Risks to Digital Geospatial Data

Page 8: Risks to Digital Geospatial Data

Today’s geospatial data as tomorrow’s cultural heritage

Page 9: Risks to Digital Geospatial Data

Time series – vector data

Parcel Boundary Changes 2001-2004, North Raleigh, NC

Page 10: Risks to Digital Geospatial Data

Time series – Ortho imagery

Vicinity of Raleigh-Durham International Airport 1993-2002

Page 11: Risks to Digital Geospatial Data

NC Geospatial Data Archiving Project

Partnership between university library (NCSU) and state GIS agency (NCCGIA)

Focus on state and local geospatial content in North Carolina (state demonstration)

Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information

Objective: engage existing state/federal geospatial data infrastructures in preservation

Page 12: Risks to Digital Geospatial Data

Local Government GIS in NC

Data resources are highly distributed and subject to frequent update

More detailed, current, accurate than federal/state data resources

North Carolina local agency GIS environment

100 counties, 95 with GIS

85 counties with high-resolution orthophotography

Growing number of municipal systems

Hundreds of millions of dollars in investment

Page 13: Risks to Digital Geospatial Data

NCGDAP Targeted Content

Resource Types

GIS “vector” (point/line/polygon) data

Digital orthophotography

Digital maps

Tabular data (e.g. assessment data)

Content Producers

Mostly state, local, regional agencies

Some university, not-for-profit, commercial

Selected local federal projects

Page 14: Risks to Digital Geospatial Data

Work Plan in a Nutshell

Work from existing data inventories

NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt”

Partnership: work with existing geospatial data infrastructures (state and federal)

Technical approach: blend emerging digital library technologies with geospatial technologies

Metadata: METS, FGDC, PREMIS?, GeoDRM?

Repository: DSpace and others

Page 15: Risks to Digital Geospatial Data

Big Challenges

Format migration paths

Management of data versions over time

Preservation metadata

Harnessing geospatial web services

Preserving cartographic representation

Keeping content repository-agnostic

Preserving spatial databases

More …

Page 16: Risks to Digital Geospatial Data

Vector Data Format Issues

Vector data much more complicated than image data

‘Archiving’ vs. ‘Permanent access’

An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology, then it does not provide permanent access

Piles of XML need to be widely understood piles

GML: need widely accepted application schemas (like OSMM?)

The spatial database conundrum

Export feature classes, and lose topology, annotation, relationships, etc.

… or use the spatial database as the primary archival platform (some are now thinking this way)

Page 17: Risks to Digital Geospatial Data

Managing Time-versioned Content

Many local agency data layers continuously updated

E.g., some county cadastral data updated daily—older versions not generally available

Individual versioned datasets will wander off from the archive

How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?

How do we certify concurrency and agreement between the metadata and the data?
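
One way to approach the question of certifying agreement between metadata and data is to bind the two together in a per-snapshot manifest. The sketch below is only an illustration of that idea, not the project's method: the manifest fields, dataset identifier, and byte strings are all hypothetical.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def make_manifest(dataset_id, snapshot_date, data_bytes, metadata_bytes):
    """Record which metadata record was paired with which data snapshot."""
    return {
        "dataset_id": dataset_id,
        "snapshot_date": snapshot_date,
        "data_sha256": digest(data_bytes),
        "metadata_sha256": digest(metadata_bytes),
    }


def agrees(manifest, data_bytes, metadata_bytes):
    """True only if both objects are the ones the manifest was built from."""
    return (manifest["data_sha256"] == digest(data_bytes)
            and manifest["metadata_sha256"] == digest(metadata_bytes))


# Hypothetical stand-ins for a 2004 parcel snapshot and its metadata record.
data_2004 = b"parcel geometry, 2004 snapshot"
meta_2004 = b"<metadata>2004 snapshot</metadata>"

manifest = make_manifest("county-parcels", "2004-06-30", data_2004, meta_2004)
manifest_json = json.dumps(manifest, sort_keys=True)

matched = agrees(manifest, data_2004, meta_2004)
mismatched = agrees(manifest, data_2004, b"<metadata>2001 snapshot</metadata>")
```

This detects a mismatched pairing at the byte level only; it cannot tell whether the metadata content accurately describes the data.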

Page 18: Risks to Digital Geospatial Data

Preservation Metadata Issues

FGDC Metadata

Many flavors, incoming metadata needs processing

Cross-walk elements to PREMIS, MODS?

Metadata wrapper

METS (Metadata Encoding and Transmission Standard) vs. other industry solutions

Need a geospatial industry solution for the ‘METS-like problem’

GeoDRM a likely trigger: a wrapper to enforce licensing (MPEG-21 references in OGC Web Services 3)
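
To make the “metadata wrapper” idea concrete, here is a minimal sketch of a METS document that embeds an FGDC record and points at one archived file. It uses only the Python standard library and has not been validated against the METS schema; the FGDC fragment, file location, checksum, and IDs are placeholders, and a real profile would need more (for example a header and administrative metadata for PREMIS).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(fgdc_root, file_href, sha256_hex):
    """Wrap an FGDC record and one file reference in a bare-bones METS tree."""
    mets = ET.Element(q("mets"))

    # Descriptive metadata section: the FGDC record is embedded as-is.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q("mdWrap"), MDTYPE="FGDC")
    ET.SubElement(wrap, q("xmlData")).append(fgdc_root)

    # File section: one archived file with its checksum.
    file_sec = ET.SubElement(mets, q("fileSec"))
    grp = ET.SubElement(file_sec, q("fileGrp"), USE="archival")
    f = ET.SubElement(grp, q("file"), ID="FILE1",
                      CHECKSUM=sha256_hex, CHECKSUMTYPE="SHA-256")
    ET.SubElement(f, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href})

    # Structural map: ties the file to its descriptive metadata.
    smap = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(smap, q("div"), DMDID="DMD1")
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return mets


# Placeholder FGDC fragment and file location, for illustration only.
fgdc = ET.fromstring(
    "<metadata><idinfo><citation><citeinfo>"
    "<title>County parcels</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
mets = build_mets(fgdc, "file:///archive/parcels.zip", "0" * 64)
xml_text = ET.tostring(mets, encoding="unicode")
```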

Page 19: Risks to Digital Geospatial Data

Harnessing Geospatial Web Services

Page 20: Risks to Digital Geospatial Data

Harnessing Geospatial Web Services

Automated content identification: ‘capabilities files,’ registries, catalog services

WMS (Web Map Service) for batch extraction of image atlases?

last-ditch capture option

preserve cartographic representation

retain records of decision-making process

… feature services (WFS) later.

Rights issues in the web services space are ambiguous … GeoDRM in development
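
The batch-extraction idea above can be sketched as splitting an area of interest into a grid and issuing one WMS GetMap request per cell. The code below only builds the request URLs, using the standard WMS 1.1.1 GetMap parameters; the endpoint, layer name, and extent are hypothetical, and a real harvest would first read the service's capabilities document to find valid layers, formats, and spatial reference systems.

```python
from urllib.parse import urlencode


def tile_bboxes(minx, miny, maxx, maxy, cols, rows):
    """Yield (minx, miny, maxx, maxy) for each cell of a cols x rows grid."""
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    for r in range(rows):
        for c in range(cols):
            yield (minx + c * dx, miny + r * dy,
                   minx + (c + 1) * dx, miny + (r + 1) * dy)


def getmap_url(base_url, layer, bbox, width=1024, height=1024):
    """Build a WMS 1.1.1 GetMap URL for one tile."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                       # default style
        "SRS": "EPSG:4326",                 # lon/lat; assumed to be supported
        "BBOX": ",".join(f"{v:.6f}" for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, layer, and extent.
BASE = "https://example.org/wms"
urls = [getmap_url(BASE, "parcels", b)
        for b in tile_bboxes(-79.0, 35.5, -78.0, 36.5, cols=2, rows=2)]
```

Each captured image would need to be stored with its request parameters and capture date, since the rendered map is only meaningful alongside the extent and layer that produced it.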

Page 21: Risks to Digital Geospatial Data

Preserving Cartographic Representation

Page 22: Risks to Digital Geospatial Data

Preserving Cartographic Representation

The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:

Intellectual choices about symbolization, layer combinations

Data models, analysis, annotations

Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration

Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem

Page 23: Risks to Digital Geospatial Data

Repository Architecture Issues

Interest in how geospatial content interacts with widely available digital repository software

Focus on salient, domain-specific issues

Challenge: remain repository agnostic

Avoid “imprinting” on repository software environment

Preservation package should not be the same as the ingest object of the first environment

Tension between exploiting repository software features vs. becoming software dependent

Page 24: Risks to Digital Geospatial Data

Preserving Spatial Databases

Spatial databases in general vs. ESRI Geodatabase “format”

Not just data layers and attributes—also topology, annotation, relationships, behaviors

ESRI Geodatabase archival issues

XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication

Growing use of geodatabases by municipal, county agencies

Some looking to Geodatabase as archival platform (in addition to feature class export)

Page 25: Risks to Digital Geospatial Data

NCGDAP Philosophy of Engagement

Take the data as is, in the manner in which it can be obtained

Provide feedback to producer organizations/inform state geospatial infrastructure

“Wrangle” and archive data

Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive

Page 26: Risks to Digital Geospatial Data

Questions?

Contact:

Steve Morris

Head of Digital Library Initiatives

NCSU Libraries

[email protected]

Phone: (919) 515-1361

NCGDAP website: http://www.lib.ncsu.edu/ncgdap/