Collaborative Digital Libraries
Collaborative Digital Libraries: Their Virtual Collections and Aggregating Their Metadata
Research Proposal
Bonnie MacGregor
San Jose State University
Libr 285
Spring 2010
Introduction
Libraries, museums, and archives offer a rich medley of information, artifacts, and primary source materials that reflect our shared human interests and history. While each institution remains essential in performing its traditional role, many are focusing their efforts on creating collaborative digital libraries and making their virtual collections visible and accessible to the world. A digital library that blends resources from different institutions depends on collaborative exchanges and contributions. Working together, these specialized digital libraries and their virtual collections can be distributed across different servers, be owned by different organizations, and be displayed in many different orderings and arrangements. Defining and describing these virtual collections is essential to making them visible and accessible to users, but not all institutions describe their resources in the same way, nor do all institutions follow the same descriptive standards.
Although the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has made it easier to share and harvest item-level descriptive metadata across institutional lines, one concern is that item-level metadata retains implicit contextual information tied to the local setting in which it was created. When that item-level metadata is removed from its context, implicit and referential information is lost (Foulonneau et al., 2005). Furthermore, when the metadata is harvested into larger, heterogeneous collections of records, users may find it difficult to retrieve the results they need, because the records have lost this contextual information during aggregation.
“Contextual information about the authority of a resource, its relationship to other resources, its format and type, geographic and temporal coverage, and restrictions and usage rights can be lost when item-level metadata is aggregated without the retention of implicit context” (Foulonneau et al., 2005, p. 32).
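To make the problem concrete, the following Python sketch builds an OAI-PMH ListRecords request and reads simple Dublin Core (oai_dc) records from a harvested response. The verb, metadata prefix, and namespaces are those defined by OAI-PMH 2.0; the repository URL, set name, and record content are hypothetical, and the response is embedded as a string rather than fetched over the network. The parsed record shows how little of its original context travels with it: a title and a date, plus a set identifier that means something only to the originating repository.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request for simple Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # optional: harvest only one set
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> list:
    """Return the identifier, set memberships, and DC fields of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        dc_fields = {}
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                name = element.tag.split("}", 1)[1]  # strip the namespace URI
                dc_fields.setdefault(name, []).append(element.text)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            "dc": dc_fields,
        })
    return records


# Hypothetical harvested response from a local-history repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:photo-17</identifier>
        <datestamp>2010-01-15</datestamp>
        <setSpec>local-history</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Main Street, looking north</dc:title>
          <dc:date>1912</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai", "local-history"))
for record in parse_records(SAMPLE_RESPONSE):
    print(record)
```

Nothing in the parsed record states which town "Main Street" refers to, who holds the rights, or which collection the photograph belongs to; that knowledge lived in the originating repository's local setting.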
The National Science Digital Library (NSDL) is one example of a highly specialized union database that aggregates resources from many institutions and offers organized access to high-quality resources and tools. The NSDL team at Cornell University has recently created a new open-source library platform called NCore (for NSDL Core), and one of the key elements of its architecture is the aggregation. Although the NCore data model is complex, of particular interest here are its aggregation objects, a special type of data object that collects and provides key contextual information from both the harvested item-level metadata and the collection-level metadata.
Research Question
Do NSDL NCore’s aggregation objects improve a digital object’s contextual metadata?
Literature Review
According to the Digital Library Reference Model (DELOS), a “digital library is an organization that comprehensively collects, manages and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content of measurable quality” (Chang et al., 2004, p. 335). Digital libraries are freed from the boundaries of physical space and media and operate as rich, adaptive networked systems. Digital libraries and their virtual collections offer users a wealth of enriched access points and alternative methods for browsing and exploration, while allowing their collections to be segmented, rearranged, annotated, enhanced, and integrated in ways not possible before. Digital libraries thus offer the advantage of providing access to multiple objects held in separate collections and repositories, and of aggregating those objects so that they can coexist in different arrangements. Yet a commonly noted risk is the loss of contextual metadata when objects from different repositories are aggregated. “Item-level metadata records are typically written at a level of descriptive granularity most appropriate to a local application, when item-level metadata is removed from that context, implicit and referential information is lost” (Foulonneau, Cole, Habing, and Shreeves, 2005, p. 32). In “Using Collection Descriptions to Enhance an Aggregation of Harvested Item-Level Metadata,” Foulonneau et al. (2005) note that contextual information on the authority of a resource, its relationship to other resources, its format and type, its geographic and temporal coverage, and its use restrictions and rights can be lost. This information is vital for articulating the scope, intent, and function of a record, not only at the collection level but also at the item level.
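One way to limit this loss, in the spirit of Foulonneau et al. (2005), is to carry collection-level descriptions alongside harvested item records. The Python sketch below is a minimal illustration under stated assumptions, not the method those authors implemented: the CollectionDescription fields, the enrich function, and the added is_part_of and provider keys are hypothetical, and the rule that item-level values always take precedence over collection-level values is a design choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CollectionDescription:
    """Hypothetical collection-level record kept by the aggregator."""
    title: str
    provider: str
    coverage: List[str] = field(default_factory=list)
    rights: Optional[str] = None


def enrich(item: Dict[str, List[str]], coll: CollectionDescription) -> Dict[str, List[str]]:
    """Copy an item's fields and fill gaps with context from its collection."""
    enriched = {name: list(values) for name, values in item.items()}
    # Item-level values win; collection-level values only fill missing fields.
    if not enriched.get("rights") and coll.rights:
        enriched["rights"] = [coll.rights]
    if not enriched.get("coverage") and coll.coverage:
        enriched["coverage"] = list(coll.coverage)
    # Always record which collection and provider the item came from.
    enriched["is_part_of"] = [coll.title]
    enriched["provider"] = [coll.provider]
    return enriched


photo = {"title": ["Main Street, looking north"], "date": ["1912"]}
collection = CollectionDescription(
    title="Local History Photographs",       # all values here are invented
    provider="Example Public Library",
    coverage=["Example County, 1900-1930"],
    rights="Educational use only",
)
print(enrich(photo, collection))
```

The item record itself is left unchanged; the enriched copy is what an aggregator would index, so the local record and the aggregated view can differ without either being overwritten.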
Defining context
Contextual information is captured through metadata, which may include bibliographic data, provenance data, and social and cultural data. More recently, context has come to be understood more expansively, as a means of understanding patterns of use, pedagogical goals, the nature of learners' educational systems, and learners' abilities, preferences, and prior knowledge. It can also refer to opinions, comments, and reviews about library resources and to their history of use (Lagoze, Krafft, Payette, and Jesuroga, 2005). According to Krafft, Birkland, and Cramer (2008), many libraries have identified user-contributed content, personalization, and repurposing of content as essential value-added features of “Next Generation” digital libraries: features that improve the context around resources and enrich them with new information and relationships that express the usage patterns and knowledge of the library community. “The digital library then becomes the milieu for information collaboration and accumulation – much more than just a place to find information and access it” (Lagoze et al., 2006, p. 2). Many digital libraries have relied on a simple information model based on the union catalog: a ‘search and access’ model that, at its core, collects, indexes, and provides queries over a catalog of metadata records (Geisler et al., 2002). Recently, a consensus has emerged that a more expansive view of digital libraries is necessary, one that identifies digital libraries as collaborative, adaptive, and reflexive systems and that elaborates the definition of user-contributed contextual information.
“They should be collaborative, allowing users to contribute knowledge to the library, through annotations, reviews, and the like, or passively through their patterns of resource use. In addition, they should be contextual, expressing the expanding web of inter-relationships and layers of knowledge that extend among selected primary resources. In this manner, the core of the digital library should be an evolving information base, weaving together professional selection and the ‘wisdom of crowds’” (Lagoze et al., 2006, p. 57).
The basic ‘search and access’ record-oriented model employed by most digital (and traditional)
libraries has a limited ability to fully model this multi-dimensional information context (Lagoze
et al., 2005).
Defining Aggregators
The National Science Digital Library (NSDL) aims to push the frontiers and capabilities of digital library technology through its open-source software platform, NCore. NCore is intended as a general-purpose platform that “can support digital library/repository needs ranging from cultural heritage materials in the arts and humanities, to scholarly communication and collaboration, to education at every level in every discipline” (Krafft et al., 2008, p. 313). NCore's central data model and architecture are complex, reflecting the need to represent the many types of description offered by its contributors. Resources themselves are not homogeneous: a digital library will collect a variety of resources, e.g., images, audio, simulations, and multimedia learning objects. Supporting this diversity increases modeling complexity, in particular the question of how best to present information at the user-interface level while also representing the special characteristics of each type of resource, that is, its context. “In such an environment, data surrounding a resource, such as a subject’s metadata or membership in an aggregation, does not purely originate from a single cohesive and consistent curation policy, but from a variety of independent agents with their own motivations” (Krafft et al., 2008, p. 314).
Of particular importance to this examination of the NSDL NCore model is the system's implementation of ‘aggregator’ objects. These aggregations are first-class objects that occupy a central role in representing and mediating context within the system. Five primary objects are involved in the schema: “the resource object that contains or specifies content, a metadata object that contains structured statements about a resource, an aggregation object that collects resources with other aggregations into a set, a metadata provider object that provides provenance information, and finally an agent object that specifies the source for the metadata statements and the selector for aggregations” (Krafft et al., 2008, p. 314). Together, these objects function as the building blocks of the many complex structures within the digital library. The second major release of the NSDL technical infrastructure, NSDL 2.0, supports the creation of this web of context around library resources, with the stated aim that users will be able to discover resources by their context.
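The five object types quoted above can be pictured as a small graph. The Python sketch below is a hypothetical, in-memory illustration of those relationships, not NCore's actual implementation or API: the class and field names paraphrase Krafft et al.'s (2008) descriptions, and contexts_of is an invented helper showing how nested aggregations let one resource be discovered through several contexts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Agent:
    name: str  # source of metadata statements and selector of aggregations


@dataclass
class Resource:
    url: str  # the content itself, or a pointer to it


@dataclass
class MetadataProvider:
    agent: Agent  # provenance: who supplied a set of statements


@dataclass
class Metadata:
    describes: Resource
    provider: MetadataProvider
    statements: Dict[str, str]


@dataclass
class Aggregation:
    label: str
    selector: Agent
    members: List[object] = field(default_factory=list)  # Resources or Aggregations


def contexts_of(target: Resource, aggregations: List[Aggregation]) -> List[str]:
    """Labels of aggregations containing `target`, directly or via nesting."""
    def contains(agg: Aggregation, seen: Set[int]) -> bool:
        if id(agg) in seen:  # guard against cyclic aggregations
            return False
        seen.add(id(agg))
        return any(
            m is target or (isinstance(m, Aggregation) and contains(m, seen))
            for m in agg.members
        )
    return [agg.label for agg in aggregations if contains(agg, set())]


# Invented example: one simulation, placed in context by a curator and a teacher.
curator, teacher = Agent("Collection curator"), Agent("Physics teacher")
sim = Resource("https://example.org/pendulum-simulation")
record = Metadata(sim, MetadataProvider(curator), {"title": "Pendulum simulation"})
collection = Aggregation("Physics simulations", curator, [sim])
lesson = Aggregation("Lesson plan: simple harmonic motion", teacher, [sim])
unit = Aggregation("Unit: oscillations", teacher, [lesson])

print(contexts_of(sim, [collection, lesson, unit]))
```

Here the simulation belongs to three contexts: a curated collection, a teacher's lesson plan, and, through that lesson plan, a larger unit. This is the kind of web of context that NSDL 2.0 is meant to let users discover.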
Typically, a user must examine a resource's catalog information or else examine the resource itself. Over several years of operation, NSDL has consistently received suggestions that users do not want a simple list of resources but rather want to understand how to use them. The context of a resource (what benchmarks or educational standards it meets, how it relates to other resources, how teachers have incorporated it into lesson plans, and what teachers, scientists, and librarians have to say about it) is critical to making the digital library effective (The National Science Digital Library [NSDL], 2006). Contextualization is a critical component of active learning: gaining an understanding of a concept involves relating to an idea in a meaningful way and seating it cognitively in personal experience or understanding (NSDL, 2006, p. 18). NSDL 2.0 is intended to represent the web of related information around and among library resources, and to make it easy for qualified library users to understand and add new context to content within the library. This study therefore seeks to evaluate whether the National Science Digital Library's (NSDL) NCore aggregation objects have in fact improved contextual metadata, based on an analysis of respondents' level of satisfaction with records retrieved from the system.
Methodology
Study Population
The target study population is students currently enrolled in the master's program of the San Jose State University School of Library and Information Science. Because of the subject matter of the curriculum, these students are expected to be knowledgeable about the technical information infrastructures, classifications, and terminology associated with computer engineering. They should also have a specialized understanding of information retrieval processes and human-computer interaction. Respondents in this exploratory study must have completed at least 25 units in the graduate program, to ensure that they have the knowledge needed to complete this evaluation study reliably.
Sampling Design
This study will use systematic sampling with a random start to select respondents. Using the University's current enrollment records, a list will be compiled of all potential respondents who meet the sampling frame requirement described above. The study aims to identify at least 100 people who fit this profile and to recruit at least 50 of them as respondents. To avoid bias, a sampling interval k, a number between one and ten, will be selected at random, and selection will begin at a randomly chosen element among the first k names on the list; that element and every kth element after it will be chosen for inclusion (Babbie, 2009). For example, if 7 is the chosen sampling interval, every seventh name on the list will be chosen, beginning with a randomly selected name among the first seven. Because this is a probability sampling technique, every member of the sampling frame has an equal chance of being selected, which helps ensure that the sample is representative of the population from which it was drawn.
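The selection procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming the sampling frame is an ordered list of eligible students; the placeholder IDs, the seed, and the systematic_sample helper are hypothetical, and a fixed seed is used only so that a draw can be reproduced and audited.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(frame: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Choose a random start in 1..k, then take that element and every k-th one after it."""
    if not 1 <= k <= len(frame):
        raise ValueError("sampling interval must be between 1 and the frame size")
    start = rng.randint(1, k)          # 1-based random start within the first interval
    return list(frame[start - 1::k])   # slicing with step k yields every k-th element


# Hypothetical frame of 100 eligible students (placeholder IDs, not real records).
frame = [f"student_{i:03d}" for i in range(1, 101)]
rng = random.Random(2010)              # fixed seed so the draw can be reproduced
k = rng.randint(1, 10)                 # random interval between one and ten
sample = systematic_sample(frame, k, rng)
print(f"interval={k}, selected={len(sample)}, first={sample[0]}")
```

With 100 names, the sample size depends directly on k: an interval of 2 yields 50 names, while an interval of 7 yields 14 or 15, so the interval effectively sets how many students can be invited.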
Although no probability sample is ever perfectly representative, systematic sampling carries an additional danger: periodicity. If the elements of the list are arranged in a cyclical pattern that coincides with the sampling interval, the sample may be biased (Babbie, 2009). The study's facilitators will watch for this problem, and if predictable or periodic patterns begin to emerge, a new sampling interval will be chosen and/or a new list of names will be generated.
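One simple way to watch for periodicity is to compare how often some attribute of the list (for example, an advising group or enrollment cohort) appears in the sample versus in the full frame. The Python sketch below assumes such an attribute is available for every name; the labels, the looks_periodic helper, and the 0.2 tolerance are hypothetical choices for illustration, not thresholds taken from Babbie (2009).

```python
from collections import Counter
from typing import Dict, Sequence


def category_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Proportion of each category label in a list."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def looks_periodic(frame_labels: Sequence[str], sample_labels: Sequence[str],
                   tolerance: float = 0.2) -> bool:
    """Flag a sample whose category shares drift from the frame's by more than `tolerance`."""
    frame_shares = category_shares(frame_labels)
    sample_shares = category_shares(sample_labels)
    return any(abs(sample_shares.get(label, 0.0) - share) > tolerance
               for label, share in frame_shares.items())


# Hypothetical list whose order repeats every five names: one "A" then four "B".
frame_labels = (["A"] + ["B"] * 4) * 20
aligned = frame_labels[0::5]   # interval 5 matches the cycle: every pick is "A"
offset = frame_labels[0::3]    # interval 3 does not match the cycle

print(looks_periodic(frame_labels, aligned))  # True
print(looks_periodic(frame_labels, offset))   # False
```

A flagged sample would prompt the remedy described above: choosing a new interval or producing a newly ordered list.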
A letter of intent (see Appendix A) will be emailed to each randomly selected respondent. This cover letter will describe the purpose of the study, when and where it will be administered, its confidentiality terms and conditions, and compensation. A more in-depth introduction and literature review will also be distributed with the letter of intent, to explain the intentions of the study and what it hopes to accomplish.
Data Collection Instruments
Standardized survey questionnaires administered during face-to-face interviews will be the primary means of data collection for this evaluation study (see Appendix B). Interviews will be conducted in groups of ten. Because individuals are the unit of analysis, and their level of satisfaction with the contextual metadata of particular records is under evaluation, survey research is well suited to measuring the attitudes and opinions of each respondent (Babbie, 2009). Interviews will be semi-structured, allowing for both efficiency and follow-up probing.
Study subjects will be required to perform basic information retrieval (IR) tasks on the NCore platform administered through NSDL. Subjects will be asked to query the system and retrieve one book record, one image record, and one primary document record. Respondents will then observe the characteristics of each record, paying special attention to ‘contextual information’ (as defined in the literature review above). Respondents will have 30 minutes to complete these searches.
Once respondents have completed these IR tasks, each will sit down with the lead investigator or another properly trained interviewer, who will ask a series of open-ended and closed-ended questions from a prepared questionnaire. Each interviewer will digitally record every interview he or she administers and will take additional notes where subjects give more elaborate responses. Interviewers will act as neutral mediums, and their presence should not affect a respondent's perceptions. Each interviewer must transcribe responses verbatim: “No attempt should be made to summarize, paraphrase, or correct bad grammar” (Babbie, 2009, p. 265).
According to Babbie (2009), interview surveys have several advantages: they achieve higher response rates and higher completion rates, they reduce the number of incomplete answers, and they allow the interviewer to make observations such as respondents' reactions to questions. Survey interviews are also flexible: many questions can be asked on a given topic, which gives the researcher considerable flexibility in analysis.
Data analysis techniques
Before any statistical analysis can be performed, the data must be converted into numerical form. Code categories will be developed after data collection has been completed, so that they reflect both the research purpose and the logic that emerges from the data (Babbie, 2009). Because the survey interview contains both open-ended and closed-ended questions, relying on what emerges from the data will be essential in determining code categories that are both exhaustive and mutually exclusive. Although hired interviewers will be responsible for data collection, data analysis will be performed solely by the principal investigator, which eliminates the need to train coders in the definitions of code categories and in how to use those categories properly. To reduce discrepancies in the coding scheme, the principal investigator will ask a colleague to code a sample of the data independently, in order to establish whether similar assignments are being made and to highlight any discrepancies.
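The comparison between the principal investigator's and the colleague's codes can be summarized numerically. The Python sketch below computes simple percent agreement and Cohen's kappa, a widely used statistic that corrects agreement for chance; kappa is a swapped-in technique that the proposal does not specify, and the ten coded answers are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of responses to which both coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same non-empty set of responses")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    observed = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both coders independently pick the same category.
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)


# Invented codes for ten open-ended answers (1, 2, 3 = hypothetical code categories).
investigator = [1, 2, 2, 3, 1, 1, 2, 3, 3, 2]
colleague = [1, 2, 3, 3, 1, 2, 2, 3, 3, 2]

print(percent_agreement(investigator, colleague))        # 0.8
print(round(cohens_kappa(investigator, colleague), 3))   # 0.697
```

Low agreement on a particular category would signal that its definition needs revision before the full data set is coded.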
A codebook will then be created that converts data categories into numerical codes. The codebook will document where each variable is located and what each code means; the codes ultimately represent the different attributes of the variables under evaluation (Babbie, 2009). In essence, the codebook tells the researcher where to find each variable and what each code assigned to it represents. The principal investigator will review each questionnaire and code the data directly onto it. Once coding is complete, the investigator will enter the data into an Excel spreadsheet that can later be imported into statistical analysis software.
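A codebook of this kind maps naturally onto a lookup table. The Python sketch below codes a few closed-ended items from the questionnaire in Appendix B and writes the result as CSV, a format that Excel and most statistical packages can import. The variable names, the numeric codes chosen for each answer, and the sample answers are hypothetical; only the questions and their answer options come from the questionnaire.

```python
import csv
import io
from typing import Dict, List

# Hypothetical codebook: variable -> (questionnaire item, {answer: numeric code}).
CODEBOOK = {
    "q1_retrieved_all": ("Question 1", {"Yes": 1, "No": 0}),
    "q2_satisfaction": ("Question 2", {str(n): n for n in range(1, 6)}),
    "q5_relationships": ("Question 5", {"Yes": 1, "No": 0}),
    "q7_amount": ("Question 7", {"Too much": 1, "Too little": 2, "Undecided": 3}),
}


def code_response(answers: Dict[str, str]) -> Dict[str, int]:
    """Convert one respondent's answers to numeric codes.

    An answer missing from the codebook raises KeyError, flagging a coding error.
    """
    return {var: codes[answers[var]] for var, (_item, codes) in CODEBOOK.items()}


def to_csv(rows: List[Dict[str, int]]) -> str:
    """Render coded rows as CSV text; serial numbers are assigned in order here for illustration."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["serial_no", *CODEBOOK])
    writer.writeheader()
    for serial_no, row in enumerate(rows, start=1):
        writer.writerow({"serial_no": serial_no, **row})
    return buffer.getvalue()


# Invented answers from two respondents.
raw = [
    {"q1_retrieved_all": "Yes", "q2_satisfaction": "4",
     "q5_relationships": "No", "q7_amount": "Too little"},
    {"q1_retrieved_all": "No", "q2_satisfaction": "2",
     "q5_relationships": "No", "q7_amount": "Undecided"},
]
print(to_csv([code_response(r) for r in raw]))
```

To write a file instead of a string, the same DictWriter can be given a file opened with `open("responses.csv", "w", newline="")`, as the csv module documentation recommends.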
Once the data have been fully quantified, quantitative analysis will begin. Univariate analysis, the analysis of a single variable at a time, will be performed on the data (Babbie, 2009). In presenting the univariate data, measures of central tendency (averages) will be used. The mode, the most frequently occurring attribute, will be the primary measure of central tendency. The advantage of averages lies in their reduction of raw data to the most manageable form: a single number (or attribute) can represent all the detailed data collected (Babbie, 2009).
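For the Likert item in Question 2, this analysis amounts to a frequency distribution plus a few averages. The Python sketch below uses only the standard library, and the ten responses are invented, not study data. It reports the mode and, for comparison, the median and mean; because Likert responses are ordinal, the mode and median are often preferred, and the mean is shown only as a point of reference.

```python
from collections import Counter
from statistics import mean, median, multimode

# Invented responses to Question 2 (1 = lowest satisfaction, 5 = highest).
satisfaction = [4, 3, 4, 5, 2, 4, 3, 3, 4, 5]

# Frequency distribution: how many respondents chose each point on the scale.
frequencies = Counter(satisfaction)
for level in range(1, 6):
    count = frequencies[level]
    print(f"{level}: {count} ({count / len(satisfaction):.0%})")

print("mode:", multimode(satisfaction))   # every most-frequent value, in case of ties
print("median:", median(satisfaction))
print("mean:", mean(satisfaction))
```

Using multimode rather than mode keeps ties visible: if two scale points were equally common, both would be reported instead of silently picking one.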
Project Schedule
The table below shows the main objectives or tasks to be accomplished. A full year has been allocated to complete this study. Objectives fall under an initial stage and four subsequent stages of the overall project design and schedule, and each stage has an approximate completion period within the year. Because some processes are highly iterative, flexible deadlines will allow revisions to take place, but tasks are expected to be completed by the end of the time allocated.
Stage | Objectives | Approx. Scheduled Completion (2010)
Initial Stage | Form research team; identify research topic, purpose, objectives, and outcomes; define research methods | January - February
Stage I | Complete thorough literature review; define research methods and measurement techniques; complete research proposal; submit proposal to library director for approval; submit proposal to SJSU Institutional Review Board (IRB) for approval; design questionnaire | March - May
Stage II | Contact respondents; schedule interviews; train interviewers; conduct data collection | June - August
Stage III | Code data for analysis; perform data analysis; complete draft of research findings | September - November
Stage IV | Review any changes or corrections; submit for publication | December
Qualifications
The principal investigator is currently a graduate student in the School of Library and Information Science at San Jose State University. With a keen interest in library, museum, and archival practices, she continually evaluates new technologies and studies that seek to blur the boundaries between institutional practices in order to create more dynamic and collaborative libraries. Having worked at two academic libraries as well as in a museum archives department, the investigator has specialized knowledge of professional museum standards and of curatorial methods, procedures, and techniques, alongside more traditional and specialized library practices. Her research interests include digital and hybrid libraries, open-source architectures, and special collections.
Significance of Work & Summary
“The ultimate goal of digital library evaluation is to study how digital libraries transform research, education, learning and life” (Chowdhury, Landoni, and Gibb, 2006, p. 659). Digital libraries are difficult to evaluate because of their richness and complexity and the variety of their uses and users. Recent developments in the field have significantly influenced the ways in which users access and use electronic information, and evaluation studies have typically focused on information retrieval and usability. To date, there is no standard model for digital library evaluation, nor is there a comprehensive set of models and toolkits that digital libraries can use (Chowdhury et al., 2006).
There is a need for more studies that focus on other factors involved in digital library creation, such as implementation issues, hardware, software, networking, data formats, access and transfer times, failure rates, and development and maintenance costs. This study seeks to contribute quantitative data to research on the hardware and implementation issues faced by digital libraries. By grounding the study in a real-world application, NCore, it aims to provide a current quantitative analysis of how well NSDL NCore aggregation objects improve contextual information. The results will be analyzed based on respondents' level of satisfaction with records' metadata.
Studies such as this one, which deal with complex hardware and software elements, are important not only for their contribution to the field but also for the leadership role they can play, since other organizations can adopt methods and practices developed for NCore. Such studies can also assist with strategic planning for services and management, investigate the use and impact of this open-source information architecture, and suggest ways in which existing services can be improved.
References
Babbie, E. R. (2009). The practice of social research (12th ed.). Pacific Grove, CA: Wadsworth Publishing.
Chang, M., Leggett, J., Furuta, R., Kerne, A., Williams, P., Burns, S., & Bias, R. (2004, June). Collection understanding. Presented at the Joint Conference on Digital Libraries, (Tucson, Arizona), ACM, 334-342.
Chowdhury, S., Landoni, M., & Gibb, F. (2006). Usability and impact of digital libraries: A review. Online Information Review, 30(6), 656-680.
Foulonneau, M., Cole, T., Habing, T., & Shreeves, S. (2005, June). Using collection descriptions to enhance an aggregation of harvested item-level metadata. Presented at the Joint Conference on Digital Libraries, (Denver, Colorado), ACM, 32-41.
Geisler, G., Giersch, S., McArthur, D., & McClelland, M. (2002, July). Creating virtual collections in digital libraries: Benefits and implementation issues. Presented at the Joint Conference on Digital Libraries, (Portland, Oregon), ACM, 210-218.
Krafft, D., Birkland, A., & Cramer, E. (2008, June). NCore: Architecture and implementation of a flexible, collaborative digital library. Presented at the Joint Conference on Digital Libraries, (Pittsburgh, Pennsylvania), ACM, 313-322.
Lagoze, C., Krafft, D., Payette, S., & Jesuroga, S. (2005). What is a digital library anymore, anyway? Beyond search and access in the NSDL. D-Lib Magazine, 11(11). Retrieved from http://www.dlib.org/dlib/november05/lagoze/11lagoze.html
Lagoze, C., Krafft, D., Cornwell, T., Eckstrom, D., Jesuroga, S., & Wilper, C. (2006). Metadata aggregation and “automated digital libraries”: A retrospective on the NSDL experience. Presented at the Joint Conference on Digital Libraries, (Chapel Hill, NC), ACM, 33-67.
Lagoze, C., Krafft, D., Cornwell, T., Eckstrom, D., Jesuroga, S., & Wilper, C. (2006). Representing contextualized information in the NSDL. Presented at the European Conference on Digital Libraries, (Alicante, Spain), Springer, 1-12.
Marshall, B., Zhang, Y., Chen, H., Lally, A., Shen, R., Fox, E., & Cassel, L. (2003). Convergence of knowledge management and e-learning: The GetSmart experience. Presented at the ACM/IEEE Joint Conference on Digital Libraries, (Houston, TX), ACM, 49-67.
The National Science Digital Library. (2006). NSDL 2006 annual report: Leveraging collaborative networks. Retrieved from http://nsdl.org/news/?pager=publication
Appendix A: Recipient’s Letter of Intent
Hello!
You have been selected through a random sampling method to participate in a study sponsored by the Association of College and Research Libraries (ACRL) in collaboration with the San Jose State University School of Library and Information Science.
This study seeks to evaluate whether the National Science Digital Library’s (NSDL) NCore aggregation objects have improved contextual metadata, based on users’ level of satisfaction with records retrieved from the system.
You will be asked to perform a simple information retrieval (IR) process on the NSDL digital library for up to 30 minutes. Once records have been retrieved and evaluated, you will sit down with an interviewer, who will administer a questionnaire and record your responses. From start to finish, the study will take approximately one hour. The results of this study will help further our understanding of the role contextual metadata plays when digital libraries aggregate their collections from differing repositories.
This survey is voluntary, and you may refuse to participate if you wish. Participation in this study involves no direct benefits or risks to you. All respondents who complete the study will be given a one-year subscription to WIRED magazine. All participants’ responses will be kept confidential, and all identifiable information will be removed from the results. Only researchers involved in this study will have access to the data collected from the survey.
If you are interested in participating in our study, please contact the principal investigator using the contact information below.
This project has been reviewed and approved by the SJSU Institutional Review Board. Questions about your rights as a participant may be sent to IRB Coordinator Alena Filip by email at [email protected] or by phone at (408) 555-2479. If you have any questions about this study, please contact the principal investigator, Bonnie MacGregor, by email at [email protected].
Thank you for your participation! Best regards, ACRL & SJSU
Appendix B: Survey/Questionnaire
The National Science Digital Library’s NCore Metadata
Name of interviewer: ____________________ Date of interview: ______________________
Serial No. ______________________________
1) Were you able to retrieve all three records from the NSDL online catalog? Yes No
2) Based on your observation of each record’s information, circle the number that best represents your level of satisfaction with the data provided. [On this Likert scale, 1 is the lowest level of satisfaction and 5 is the highest.]
1 2 3 4 5
3) Besides each record’s bibliographic data (for example, author, title, and year published), please circle all other types of data that were present in the record.
Provenance data Descriptive statements Comments Opinions Reviews Restrictions Usage rights
4) Based on your answer to the previous question, please describe what you felt was missing from the data or what you would have found helpful that was not included in the record. (Open-ended)
5) After retrieving the three required records, do you feel that each record’s contextual information adequately reveals relationships to other materials within the collection?
Yes No
6.) Were you able to add content to the record?
If yes, please explain how If no, please explain why
7.) When observing a record’s metadata, did you feel that there was too much information provided or too little?
Too much Too little Undecided
8.) Were you able to identify the originating provider (repository) to which a record belonged? Yes No
9.) Was the format and type of record evident just from observing the metadata? Or were more steps in the IR process needed to locate such information? Please explain. [Open-ended]
10.) NSDL produced rich and dynamic results based on my information request.
Strongly Agree Agree Disagree Strongly Disagree No Opinion