The future of the DCC
a centre of expertise in data curation and preservation
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
The future of the DCC
Chris Rusbridge
E-Science Workshop April 2009
Contents
• Curation & integrated science
• Poetry & Philosophy of D H Rumsfeld
• Designated Community & Knowledge Base
• DCC services
• Future of the DCC
Curation
• Wikipedia
  • Curator: a content specialist responsible for an institution's collections and, together with a publications specialist, their associated collections catalogs.
  • Digital Curation: the curation, preservation, maintenance, collection and archiving of digital assets
  • Sheer curation: an approach to digital curation where curation activities are quietly integrated into the normal work flow of those creating and managing data and other digital assets.
• DCC: Digital curation is maintaining and adding value to a trusted body of digital information for current and future use.
Integrated Science
• The application of multiple scientific disciplines to one or more core scientific challenges
• Examples of integrated sciences?
  • Archaeology
  • Environmental sciences
Integrated Science implications
• Scientists will be using unfamiliar data, therefore
• Data curators and managers must make their data available for unfamiliar users!
• And now for something unfamiliar?
Poetry & Philosophy of D H Rumsfeld
Hart Seely, April 2, 2003, SLATE http://www.slate.com/id/2081042/
A Confession
‘Once in a while,
I'm standing here, doing something.
And I think,
"What in the world am I doing here?"
It's a big surprise.’
—May 16, 2001, interview with the New York Times
The Unknown
‘As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.’
—Feb. 12, 2002, Department of Defense news briefing
The 4th Rumsfeld?
• 3 epistemological classes (???)
  • Known knowns
  • Known unknowns
  • Unknown unknowns
• 4th class?
  • Unknown knowns?
  • Critical issue for cross-disciplinary sciences
Some OAIS Concepts?
• Knowledge Base: allows a consumer to understand something
• Designated Community: the set of consumers for whom the archive curates something
• Representation Information: helps you interpret a data object yielding an information object
• The amount and nature of RepInfo required is dependent on the Knowledge Base of the Designated Community
  • If you curate for project colleagues in the short term, little if any RepInfo required
  • If you curate for those unfamiliar with the data, more RepInfo is needed
• (All broadly interpreted!)
CCSDS (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf
Time
• KB is f1(DC, t)
• DC is f2(t)
• RepInfo needed is f3(f1(DC, t), f2(t))
  • (but none of these concepts can be precisely defined!)
• If DC is small and t is short (months to a year or so), then both may be ignored, and RepInfo be assumed part of the KB
• If DC is extensive (eg cross-discipline) and t is long (5 years to 25 plus), then RepInfo must be articulated
• If t is very long, most bets are off (post-hoc reconstruction likely to be needed)
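The rules of thumb on this slide can be sketched as a small decision function. This is only an illustration: the function name `repinfo_strategy`, the flag `community_is_extensive`, and both numeric cut-offs are assumptions chosen to echo the slide's loose wording. The slide gives no figure for "very long" and does not cover the in-between cases, so the sketch falls back to the cautious answer there.

```python
def repinfo_strategy(community_is_extensive: bool,
                     horizon_years: float,
                     short_years: float = 1.5,
                     very_long_years: float = 50.0) -> str:
    """Rough RepInfo strategy for a Designated Community (DC) and time horizon t.

    short_years is an assumed reading of "months to a year or so".
    very_long_years is an assumed cut-off; the slide does not quantify "very long".
    """
    if horizon_years > very_long_years:
        # "most bets are off"
        return "expect post-hoc reconstruction"
    if not community_is_extensive and horizon_years <= short_years:
        # small DC, short t: RepInfo lives in people's heads
        return "RepInfo can be assumed part of the Knowledge Base"
    # Extensive DC and long t, plus (by assumption) every case the slide leaves open.
    return "articulate RepInfo explicitly"
```

For example, `repinfo_strategy(False, 0.5)` corresponds to curating for project colleagues in the short term, while `repinfo_strategy(True, 10)` corresponds to a cross-discipline community over a decade.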
What might RepInfo include
• Structure information: file format definitions, etc
• Semantic information: data dictionaries, code books etc
• Robust methods (working code?)
• Not to mention many kinds of metadata, provenance, documentation of hidden assumptions, etc
• Cross-domain schemas one approach to articulating RepInfo?
• (Never perfect, of course)
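One way to picture the categories listed on this slide is as a simple record attached to a data object. The class below is a hypothetical sketch, not an OAIS-defined schema: the name `RepInfo`, its four fields, and the helper `missing_categories` are all invented here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepInfo:
    """Hypothetical container for Representation Information about one data object."""
    structure: List[str] = field(default_factory=list)  # e.g. file format definitions
    semantics: List[str] = field(default_factory=list)  # e.g. data dictionaries, code books
    methods: List[str] = field(default_factory=list)    # e.g. pointers to working code
    other: List[str] = field(default_factory=list)      # metadata, provenance, hidden assumptions

    def missing_categories(self) -> List[str]:
        """Return the names of categories that have no entries yet."""
        return [name for name, items in vars(self).items() if not items]


# Example values are invented for illustration.
info = RepInfo(structure=["CSV column layout"], semantics=["code book v1"])
gaps = info.missing_categories()  # ['methods', 'other']
```

A check like `missing_categories` only shows which categories are empty; it says nothing about whether what has been recorded is enough for a given Designated Community.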
What about Rumsfeld 4?
• Biggest concern with unfamiliar user is clashing concepts, eg different baselines, units, geographies, granularity
  • Especially where terms are ambiguous or differently interpreted
• The KBs of two DCs conflict, potentially silently
  • Happens all the time, of course
• The unspoken: tacit knowledge, unknown knowns!
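A small sketch of why silent clashes matter: if units and baselines are written down with the data, a mismatch between two communities' series can at least be detected before the values are combined. The `Series` class, the `find_clashes` function, and the sample values in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Series:
    """A hypothetical data series with its unit and baseline stated explicitly."""
    name: str
    unit: str
    baseline: str
    values: Tuple[float, ...]


def find_clashes(a: Series, b: Series) -> List[str]:
    """List the declared attributes (unit, baseline) on which two series disagree."""
    clashes = []
    for attr in ("unit", "baseline"):
        left, right = getattr(a, attr), getattr(b, attr)
        if left != right:
            clashes.append(f"{attr}: {left!r} vs {right!r}")
    return clashes
```

With invented data, `find_clashes(Series("site A", "m", "chart datum", (1.2, 1.3)), Series("site B", "ft", "mean sea level", (4.1, 4.0)))` reports both a unit and a baseline clash. The check only works for what has been declared: knowledge that stays tacit never reaches a field like `unit`, which is exactly the "unknown knowns" problem.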
Timing
• Curation starts before creation
  • Before project proposal!
• Data acquisition should not happen at the end
  • Continuous acquisition much better?
• Enforcement… or credit for data?
Other curation issues of concern
• Sustainability (work on your survival)
• Succession (what happens to your data if you don’t)
• Data audit (know what you’ve got)
• Data risk assessment (assess your chances of loss)
• Repository external audit???
• Provenance & computational lineage
• Archiving database changes
• Community proxy roles: help your communities develop data standards & data practices
• DCC has tools & support for some of these…
… and Research Outputs?
• Need more semantically aware texts to support cross-community understanding
• Coded up (cf microformats, RDFa)
  • People
  • Citations & references
  • Science features (eg chemicals, reactions)
  • Graphs, spectra, tables linking to
    • Supplementary data
• PDF is pretty bad at this
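As a rough illustration of a "coded up" citation, the sketch below renders a reference as an HTML fragment carrying RDFa attributes, so that software can pick out the creator and title. The function name `rdfa_citation` is hypothetical, the `prefix` attribute is RDFa 1.1 syntax, and the choice of the Dublin Core elements vocabulary (`dc:creator`, `dc:title`) is an assumption made for this example rather than anything the slide prescribes.

```python
from html import escape


def rdfa_citation(uri: str, title: str, creator: str) -> str:
    """Render one citation as an HTML fragment with RDFa attributes.

    `about` names the cited resource; each `property` labels the enclosed text.
    """
    return (
        '<p prefix="dc: http://purl.org/dc/elements/1.1/" '
        f'about="{escape(uri, quote=True)}">'
        f'<span property="dc:creator">{escape(creator)}</span>, '
        f'<cite property="dc:title">{escape(title)}</cite>'
        "</p>"
    )


# The OAIS reference cited earlier in this talk, marked up as an example.
fragment = rdfa_citation(
    "http://public.ccsds.org/publications/archive/650x0b1.pdf",
    "Reference Model for an Open Archival Information System (OAIS)",
    "CCSDS",
)
```

A reader sees an ordinary citation, while a harvester can read the same markup as statements about the cited document.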
DCC Phase 3
• Post January 2010?
• Smaller (2/3 budget if we’re lucky)
• Joint planning with JISC
• More tightly managed (hub and spoke)
• No development (says JISC)
• Core services plus optional additional services
• 1st draft seen by JSR
• Evaluation reported to JISC
• Feedback session next week
Proposed core services
• Reference Resources and Exemplars
• Training and Staff Development
• Expertise, Advice, Consultancy and Hands-on Support
• Community-building and Information-sharing activities
• Data Management and Sharing Plans
• Policy and Strategic Development
• Providing Access to Tools and Toolkits
Possible additional services
• Development of Tools, Toolkits, Wizards and Templates
• Infrastructure Services
• Model licences for data
• Data citation guidelines
Relationship to UKRDS?
• Overlap of territory
• Aiming for complementarity rather than conflict
• DCC becomes core part of UKRDS
• Some issues about the vision, though
What do you want from the DCC?