The Right Combination: Using DDI and PREMIS for data preservation
Parul Sharma & Sally Vermaaten
March 2012
Outline
1. The context – drivers for preservation
2. The problem – challenges faced when trying to re-use data
3. Our solution – metadata for data management & preservation
4. Our recommendations – strategies for making the right metadata choices
1. THE CONTEXT: DRIVERS FOR PRESERVATION
Data is a cross-domain concern
Geospatial data
Scientific data
Statistical data
Financial and commercial data
There are many drivers for data preservation
Legal mandates
Verification
Uniqueness of data
Cost of data collection
Data re-use
An example of data re-use at Statistics New Zealand
2. THE PROBLEM: CHALLENGES FACED WHEN TRYING TO RE-USE DATA
Common challenges to re-use/preservation of any type of digital object
I can’t find it
I can’t open it (wrong hardware/software)
I’m not sure it is the right thing
Unique challenges to re-use/preservation of structured data
I’m not sure it is the authoritative data
I don’t understand the meaning of the data - data is not self-descriptive
I can’t use the data because I can’t harmonize it with other data
3. OUR SOLUTION: METADATA FOR DATA MANAGEMENT & PRESERVATION
Our solutions
To support these processes…
Metadata is key
We could invent our own standard for recording metadata, but there is a better way…
How?
Discover! Dublin Core
+
Describe! Data Documentation Initiative (DDI)
+
Preserve! PREservation Metadata: Implementation Strategies (PREMIS)
Comparison of standards coverage
Dublin Core
– Discovery information about a resource (e.g. Title, Creator, Publication date)
DDI
– Surveys and outputs (Series and Studies)
– Methodology & quality information
– Classifications used
– Dataset descriptions
– Variables used
– Links to documentation
PREMIS
– Objects (significant characteristics, checksums, basic identifying information)
– Events (preservation actions)
– Agents
– Rights
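As an illustration of the checksums listed under PREMIS Objects in the comparison, the sketch below computes a file digest and returns it in a small dictionary. The function name `fixity_record` and the dictionary layout are assumptions made for this example; only the key names `messageDigestAlgorithm`, `messageDigest` and `size` are borrowed from PREMIS semantic units, and the result is not a valid PREMIS record.

```python
import hashlib
from pathlib import Path


def fixity_record(path: Path, algorithm: str = "sha256") -> dict:
    """Return checksum details for one file (hypothetical helper, not PREMIS XML)."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "file": path.name,                    # illustrative identifier only
        "messageDigestAlgorithm": algorithm,  # name borrowed from PREMIS
        "messageDigest": digest.hexdigest(),  # name borrowed from PREMIS
        "size": path.stat().st_size,          # size in bytes
    }
```

Recomputing the digest later and comparing it with the stored value is one way to confirm that a preserved file has not changed.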
Metadata to support re-use
DDI
PREMIS
4. OUR RECOMMENDATIONS: STRATEGIES FOR MAKING THE RIGHT METADATA CHOICES
Metadata Top Tips
1. Create structures that will allow you to re-use metadata tools
2. Use standards that are fit for your content so users can re-use
3. Consider overlap between standards so you’re using the right standard for the right job
4. Provide standards-based tools and capture at point of creation to improve quality and efficiency
1. Create structures that will allow you to re-use metadata tools
Set yourself up to be able to use the same tools to harvest and mine your metadata (e.g. handy reports, searching across content types) by:
– developing a standard structure that can support all your content types
– and recording generic information in generic metadata standards
Data_1500
    DublinCore.xml
    PREMIS.xml
    Original
        data.sas7bdat
        questionnaire.doc
    ArchiveMaster
        Data
            data.csv
        Documentation
            questionnaire.pdf
        Metadata
            DDI.xml
Database_0120
    DublinCore.xml
    PREMIS.xml
    Original
        database.mdb
    ArchiveMaster
        Header
            metadata.xsd
            metadata.xml
        Content
            Schema1
                Table1
                    table.xsd
                    table.xml
Non-format specific metadata
Format specific structure & metadata
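Because every package in the example structure carries a DublinCore.xml file regardless of content type, one tool can search across all of them. The sketch below is a minimal illustration of that idea, not the presenters' tooling: the function name `harvest_titles` is hypothetical, and it assumes each DublinCore.xml uses the standard Dublin Core elements namespace and sits directly inside the package folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard Dublin Core elements namespace (assumed to be used in DublinCore.xml).
DC_NS = "http://purl.org/dc/elements/1.1/"


def harvest_titles(archive_root: Path) -> dict:
    """Map each package folder name to the dc:title values found in it."""
    titles = {}
    # Every package, whatever its content type, keeps DublinCore.xml at its top level.
    for dc_file in sorted(archive_root.glob("*/DublinCore.xml")):
        root = ET.parse(dc_file).getroot()
        titles[dc_file.parent.name] = [
            element.text
            for element in root.iter(f"{{{DC_NS}}}title")
            if element.text
        ]
    return titles
```

The same loop works for Data_1500 and Database_0120 alike, which is the benefit of recording generic information in a generic standard at a fixed location.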
2. Use standards that are fit for your content so users can re-use
Enable future re-use and understanding by recording format or content-specific metadata in fit-for-purpose standards, e.g.
– DDI for statistical data
– SIARD for databases
– MIX for images
3. Consider overlap between standards so you’re using the right standard for the right job
Basic identifying information
– DDI: Title, Creator, PublicationDate, ID
– Dublin Core: Title, Creator, Date, Identifier
– Useful to duplicate? Yes
Access information
– DDI: Access Conditions
– PREMIS: Rights entity
– Dublin Core: Rights
– Useful to duplicate? No – PREMIS is most expressive and generic location
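Where basic identifying information is deliberately duplicated, a small crosswalk can keep the two copies consistent. The sketch below uses the field labels from the comparison; pairing PublicationDate with Date and ID with Identifier is an assumption based on the order in which they are listed, and the function names are hypothetical. Real DDI and Dublin Core files would need their own XML parsing first.

```python
# Field labels from the comparison: DDI name -> Dublin Core name (pairing assumed).
DDI_TO_DUBLIN_CORE = {
    "Title": "Title",
    "Creator": "Creator",
    "PublicationDate": "Date",
    "ID": "Identifier",
}


def to_dublin_core(ddi_fields: dict) -> dict:
    """Copy the basic identifying fields of a DDI-style record into Dublin Core names."""
    return {
        dc_name: ddi_fields[ddi_name]
        for ddi_name, dc_name in DDI_TO_DUBLIN_CORE.items()
        if ddi_name in ddi_fields
    }


def mismatches(ddi_fields: dict, dc_fields: dict) -> list:
    """List Dublin Core fields whose value differs from the DDI source."""
    expected = to_dublin_core(ddi_fields)
    return sorted(
        name for name, value in expected.items() if dc_fields.get(name) != value
    )
```

Access information is left out of the crosswalk on purpose, since the recommendation is to record it once, in PREMIS.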
4. Provide standards-based tools and capture at point of creation to improve quality and efficiency
At first, you may need to capture or collate all metadata about data yourself.
Think ahead about tools you might be able to provide to data experts to allow them to record the information directly in the standard if possible.
Takeaways
1. Organisations have many reasons to re-use data over time
2. There are unique challenges to preserving data
3. Where possible, save yourself some work and make your metadata more harvestable and data more understandable by using international standards like DDI and PREMIS
4. When you use metadata standards like DDI and PREMIS together:
– create generic structures
– use fit-for-purpose standards for specific content
– consider information overlap
– ‘delegate’ metadata capture where possible