Planning and Managing Digital Library & Archive Projects

Presented at METRO on March 23, 2011

Metropolitan New York Library Council ~ March 23, 2011

Dr. Anthony Cocciolo ~ Assistant Professor

Pratt Institute ~ School of Information and Library Science

Workshop Schedule

10am – 1pm

Introduction & Workshop Overview

Developing a Strategy for Success

Managing Digital Assets: Born-digital and conversion

1pm – 2pm – Lunch!

2 – 4pm

Creating an Infrastructure: Technical, Organizational and Resources

Evaluating your Project

What is a Digital Library?

A focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection.

Witten, Bainbridge and Nichols (2010)

Digital Archives

Geostoryteller

Introductions

Name

What are you currently up to? (Student, Working as Librarian, Archivist, etc. at X Institution, Looking for work)

Why are you interested in this class? (Starting a Digital Library, my boss made me, etc.)

Planning & Managing

Digital Library & Archive Projects

Developing a Strategy

for Success

Digital Libraries and Archives are Socio-technical systems.

Setting an agenda for a Digital Library/Archive Project

Trends in Information Use

If it’s not easy to get at...

Social media, social nature of information

Community Needs Assessment

Survey, make it representative

Focus groups, Interviews

Problems with…

Use your institution's creativity; hold a design event.

Sample Size Calculator
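A representative survey needs enough respondents. The slide does not say which formula the calculator uses, so the sketch below assumes one common choice: Cochran's formula with a finite population correction, a 95% confidence level (z = 1.96) and the most conservative response proportion (0.5). The function name and defaults are illustrative.

```python
import math


def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Estimate how many survey responses are needed.

    Assumes simple random sampling. `z` is the z-score for the chosen
    confidence level (1.96 for 95%); `proportion` of 0.5 gives the
    largest, most cautious sample size.
    """
    # Sample size for an infinitely large population (Cochran's formula).
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    # Finite population correction for a community of known size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical community of 1,000 members, +/- 5% margin of error.
    print(sample_size(1000))
```

Under these assumptions a community of 1,000 needs about 278 responses, and the required number grows slowly as the community gets larger.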

Design Event

Have one or more people facilitate the event and be responsible for moving it forward. Schedule a 2.5-4 hour event, with a working lunch in the middle.

Assemble various stakeholders from across the institution. Provide background information.

Divide into groups with members of diverse backgrounds

Icebreaker activity, warm-up activities (looking at good & bad digital libraries with targeted questions), and design the digital library user experience, using simple materials (markers, etc.)

Present out to the group as a whole

[Design mockup: PocketKnowledge, Teachers College, Columbia University. A community "pocket" page ("Money Class") with a search box; browsing by communities, tags, authors and uploaders; thumbnail and list views; sorting by alphabetical order, date or popularity; filtering by role (student, staff, faculty, other); and related communities that can be intersected with the current one.]

A good strategy should…

be focused on your users and how it will benefit them.

Focus on the needs of the collection, divorced from this factor, could lead you to a product with no users.

Grant funders: worst thing is to create something that just sits there (no impact, low use).

On Strategy

• How will this digital project impact your community?

• What will community members learn from this project? How will you know if they have learned something from your project?

• Why would someone be intrinsically motivated to use your digital library?

• How will your project advance specific learning outcomes (class goals), or more general learning outcomes (critical thinking, illiteracies)?

Talking Strategy

• Get into groups of 4

• Pick a digital project you have worked on or are hoping to start working on. What is your strategy for success?

– Who is your community? How will it impact your community? What will individuals learn from using it? Why is it an important project? Why do you think your strategy is a good one? How will you know if it is successful?

Planning & Managing

Digital Library & Archive Projects

Managing Digital Assets:

Born-digital and conversion

Living in a hybrid world

• Two paradigms:

– Digitizing artifacts paradigm

• History / Old Stuff

• Finite

• Not something that will go on forever (although to some degree we will always discover old objects; archaeology)

– Capturing digital material paradigm

– Bizarre middle ground

Born digital

• Does the person own the material they are giving to you?

– Is it copyrighted? How about Creative Commons licensing?

• Terms of use – what will the creator allow you to do with it?

• Formats: do you have the best copy?

• Who will create metadata for it?

Digital Conversion

• Can you digitize? Who can you make that digitization available to?

– Legal

• Preservation: if it is falling apart (e.g., audio, film)

• Public Domain – life of author + 70 years

• International Publication, Only make available to your community

• DMCA

• Litigious Persons – Dance Project

– Ethical – LHA project

Making Digital Images

• Create Digital Masters

– Can create a variety of derivatives from the master for access needs

• What scanning settings to choose?

– Use the Cornell approach (using Quality Index)

– Choose an already developed standard for type of visual media

Bitonal: ppi = 3QI / (0.039h)

Color/Gray: ppi = 2QI / (0.039h)

QI: barely legible (3.0), marginal (3.6), good (5.0), and excellent (8.0); h is the height in mm of the smallest detail
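As a worked example of the two Quality Index formulas above, the sketch below computes the scanning resolution for a target QI and a smallest detail height in millimetres. The function name and the sample document (1 mm smallest character) are illustrative assumptions; the constants come from the slide.

```python
def required_ppi(quality_index, detail_height_mm, bitonal=True):
    """Scanning resolution (pixels per inch) for a target Quality Index.

    Bitonal scans use a factor of 3, color and grayscale scans a factor
    of 2. The 0.039 constant converts the detail height from mm to inches.
    """
    factor = 3 if bitonal else 2
    return factor * quality_index / (0.039 * detail_height_mm)


if __name__ == "__main__":
    # Hypothetical page whose smallest character is 1 mm high,
    # scanned for "excellent" quality (QI = 8.0).
    print(round(required_ppi(8.0, 1.0)))                  # bitonal
    print(round(required_ppi(8.0, 1.0, bitonal=False)))   # color/gray
```

For this example the bitonal scan comes out at roughly 615 ppi and the color or grayscale scan at roughly 410 ppi, which is why 600 ppi bitonal is a common setting for printed text.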

Some problems

• Would not be a problem if this was a derivative of a digital master.

• Uses Arial font, not invented until 1982 (1906 document)

• Lost page numbers

• Headers and footers? Usually include a bit of citation information.

• Formatting is not faithful to original

• Other info? Advertisements?

• Lose any traces of how this was bound as a book (context it was used). Makes you start to question the authenticity, especially if the PDF gets disconnected from the rest of the collection (e.g., this PDF was “discovered”). Would a historian want to use this?

• Human error and computer error when converting the image to digital text

• CS way of thinking: but all the data is there!

Digitizing Audio

• The minimum:

– 44.1 kHz

– 16-bit

– Stereo, 2-Channel

– More info in Sound Directions book (web reference)
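The minimum settings above translate directly into storage needs for uncompressed (PCM) audio masters. The sketch below does that arithmetic; the function name is illustrative, and it ignores file headers and any compression.

```python
def pcm_audio_bytes(seconds, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    one_hour = pcm_audio_bytes(3600)
    # 44,100 samples/s x 2 bytes x 2 channels = 176,400 bytes per second.
    print(one_hour)                      # 635040000
    print(round(one_hour / 1_000_000))   # about 635 MB per hour
```

At these settings a collection of 1,000 hours of tape needs roughly 635 GB for one copy of the masters, before backups.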

Metadata

DACS

EAD

MARC

Other output formats

Dublin Core

1. TITLE 2. CREATOR 3. SUBJECT 4. DESCRIPTION 5. PUBLISHER 6. CONTRIBUTORS 7. DATE 8. TYPE

9. FORMAT 10. IDENTIFIER 11. SOURCE 12. LANGUAGE 13. RELATION 14. COVERAGE 15. RIGHTS MANAGEMENT
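To make the element list concrete, the sketch below serializes a record as simple Dublin Core XML with Python's standard library. The namespace URI is the Dublin Core elements namespace; the `record` wrapper element, the function name and the sample values are hypothetical, and a real CMS will have its own export format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(record):
    """Serialize a dict of Dublin Core element names to an XML string.

    `record` maps lowercase element names (title, creator, ...) to text.
    The <record> wrapper is an illustrative choice, not a standard.
    """
    root = ET.Element("record")
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical item; all values are made up for illustration.
    print(dublin_core_xml({
        "title": "Sample oral history interview",
        "creator": "Example Historical Society",
        "date": "2011-03-23",
        "format": "audio/wav",
        "rights": "Copyright status undetermined",
    }))
```

Keeping records in a plain structure like this makes it easier to produce the other output formats from the same source.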

Computer generated metadata

• Determining the language of a digital document is very accurate (99+% correct)

Most Digital Libraries are run on a CMS

• The user interface for the database management system (like MySQL), making the DB user-friendly and appropriate for website’s function.

• Usually a public-side and staff side; varying degrees of control of the CMS.

• YouTube is a big CMS.

• A CMS runs on one or more servers.

• Server

– Running an OS, such as Linux, Mac OS X Server, Windows Server 2008. Dif.

– Database server: like MySQL, Oracle

– Content Management System: like Omeka, DSpace

– File System: Containing digital files (.wav, .pdf, etc.)

Switches and Routers, connected to Internet Service Providers or other Wide Area Networks, Academic Networks

Internet (same thing as the other blob below)

CMS Infrastructure

• LAMP

– Linux – the operating system – like Windows or Mac OS X except good for web servers

– Apache – the webserver – responds to HTTP requests

• The Microsoft equivalent is IIS – Internet Information Server. Apache is run mostly on Linux and Mac OS X Server, and occasionally on Windows.

– MySQL – the relational database management system

– PHP – the programming language that the CMS is written in

• Contrast with WAMP, Server vs. Personal Computer

Outsourcing

• Create a detailed projected timeline

– What date you can expect each deliverable.

– Don’t let the timeline slip; hold the vendor accountable for the timeline; ask for discounts if the vendor slips from the timeline

• Create a detailed budget

– Itemize each component

Handout example

Planning & Managing

Digital Library & Archive Projects

Creating an infrastructure: Technical,

Organizational & Resource

Hollywood

• Fewer than half of the feature films before 1950 have survived

– Less than 20% survive from the 1920s

• One of the biggest movies of 1954.

• Nominated for 6 Academy Awards, winner of 2

• Winner of 2 Golden Globes

Archival Masters

• With the advent of TV and ability to re-broadcast movies on TV, followed by advent of VHS players, Hollywood began to realize that there was a monetary incentive to keep archival masters so the film could be reproduced onto different media (TVs, VHS tape, DVD).

Film Preservation

• “Film in the Freezer”, “Store and Ignore”

• Private Vaults

Long term access

• Hollywood: Want to ensure archival masters for at least 100 years

– Most libraries and archives strive for something like “eternal” access.

Challenge

• There is no hardware and software that can ensure long term access alone; the media will break down in anywhere from 5 to 10 years.

• “Store and ignore” while concentrating on environmental conditions (like humidity & temperature) will not work.

– For example, magnetic hard drives cannot be stored on a shelf for long periods of time. This is because the internal lubrication will be affected by “stiction,” where internal components lock up. Magnetic hard drives should be powered on and spinning. Even then, they have a limited operational lifetime.

Doing Digital Preservation

• Permanence in the digital sense means ongoing and systematic preservation process; an active management approach is required.

• It is more like maintaining a car, than putting a book on a shelf.
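One routine task in an active management approach is a periodic fixity check: record a checksum for every file, then recompute the checksums later and flag any file that has changed or gone missing. The slides do not prescribe this technique; the sketch below is one possible illustration using SHA-256 from Python's standard library, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files whose checksum differs from the manifest or that are missing."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built when files enter the collection can be stored alongside the backups and rechecked on a schedule, in the same spirit as a car's regular service.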

Implications (1)

• That means that the data will be migrated on a schedule

– Factor migration time (labor), costs in budget and in strategic plans

• Should be talking in terms of $/TB/year

– Labor and electricity costs should be factored in, not just media costs

– Should be including backup and other multiple copies you will be making

• Example last week was misleading, must always factor in time.
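The $/TB/year figure can be sketched as a small calculation that adds up media for every copy, labor and electricity, then divides by the amount stored and the number of years covered. The function name, the cost breakdown and every dollar amount below are hypothetical; substitute your own institution's figures.

```python
def cost_per_tb_year(media_cost_per_copy, copies, labor_cost,
                     electricity_cost, terabytes, years):
    """Storage cost in dollars per terabyte per year.

    All costs are totals for the whole period of `years`. Media cost is
    multiplied by the number of copies (primary plus backups).
    """
    total = media_cost_per_copy * copies + labor_cost + electricity_cost
    return total / (terabytes * years)


if __name__ == "__main__":
    # Hypothetical 4 TB collection kept for 5 years in 3 copies.
    print(cost_per_tb_year(
        media_cost_per_copy=1200,   # drives for one copy
        copies=3,
        labor_cost=6000,            # staff time, including migration
        electricity_cost=900,
        terabytes=4,
        years=5,
    ))  # 525.0
```

In this made-up case media is only about a third of the total, which is the point of including labor and electricity.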

Implications (2)

• Media (CDs, DVDs, Blu-rays, Gold DVDs) or hard drives on a shelf or under a desk is not a good digital archive strategy.

– If you see this, know that it is bad practice, and work to change it.

• (Trusted) Digital Repository that is (almost) always powered, redundant, and backed-up is the best strategy.

Implications (3)

• Heavy use is one of the best defenses against digital loss.

– Patrons will notice if something is amiss.

– Complete opposite of physical preservation.

Managing Digital Content

• Physical media is almost never an appropriate digital preservation strategy. Most commercial sites aren’t either.

Trusted Digital Repository

• You can make your own Trusted Digital Repository or join a group that has one.

Organizational Infrastructure

• Policy framework

– Mission statement

• Financial sustainability/framework (Columbia example)

• Organizational viability

– Have a succession plan

Technology

• Redundant hard disks

• Backup, move to offsite, security

• Physical security, staff w/security

• Physical environment (Air conditioning, above 80 deg F, redundant)

• Electricity (UPS, Backup generator, surge, voltage regulator), Power is always on.

• Piggy back on what IT is already doing, if they are doing an enterprise records management system (e.g., Banner, PeopleSoft, Datatel).

Planning & Managing

Digital Library & Archive Projects

Evaluating your Project

On Evaluating

Evaluation is usually started after something has completed or has had time to be used.

Used to inform decisions (replication, discontinuation, refinements, more investment, etc.)

Alternative is to do mini-evaluations with user community as you develop.

This can be a challenge if you don’t have a user community yet (e.g., have your mom try it out).

Evaluation is not the same as usability

Evaluation Methods

Quantitative: Analysis of numerical data (surveys, logs)

Criticized for not getting at what people really think

Qualitative: Analysis of words (e.g., interview transcript), pictures, objects

Criticized for being biased, not representative

Mixed Methods: Depending on decisions that you are trying to make, you may want to triangulate (use multiple methods to get at what you are looking for). Example: Survey, Focus Groups & Transaction Log Analysis. Of course, the ability to do all that depends on budget & time constraints.
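Transaction log analysis can start very simply. The sketch below counts successful page requests per URL path from web server log lines. It assumes the logs are in the Common Log Format that Apache can write; the function name and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)


def count_successful_requests(lines):
    """Count requests per path, keeping only HTTP 200 responses."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group(5) == "200":
            counts[match.group(4)] += 1
    return counts


if __name__ == "__main__":
    # Made-up log lines for illustration.
    sample = [
        '192.0.2.1 - - [23/Mar/2011:10:00:00 -0400] "GET /items/12 HTTP/1.1" 200 5120',
        '192.0.2.7 - - [23/Mar/2011:10:02:11 -0400] "GET /items/12 HTTP/1.1" 200 5120',
        '192.0.2.9 - - [23/Mar/2011:10:05:42 -0400] "GET /items/99 HTTP/1.1" 404 312',
    ]
    for path, hits in count_successful_requests(sample).most_common():
        print(path, hits)
```

Counts like these say what was used, not why, which is where the survey and focus group data come in.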

Sampling

• Whichever method you use, sampling is important

– Get a representative sample that accurately represents the entire population

• Sampling is not important where you capture 100% of the data, such as in transaction log analysis

• Qualitative Methods

– You can reduce the interpretive bias by using formal qualitative data analysis methods

• Use independent coders of transcripts to see the extent to which your interpretations coincide.
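One way to quantify how far two independent coders agree is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The slides do not name a statistic, so this choice is an assumption; the function name and the sample codes are illustrative.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labeled the same segments.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(codes_a)
    # Share of segments where the two coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical codes assigned to four transcript segments.
    coder_1 = ["access", "access", "cost", "cost"]
    coder_2 = ["access", "access", "cost", "access"]
    print(cohens_kappa(coder_1, coder_2))  # 0.5
```

Here the coders agree on three of four segments, but half of that agreement would be expected by chance, so kappa is 0.5.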

Compare alongside past projects

Thank you.

Anthony Cocciolo

acocciol@pratt.edu