The Basics of OAI
An Introduction to the Protocol for Metadata Harvesting
Sarah Shreeves, University of Illinois at Urbana-Champaign
Basics and Beyond July 27, 2004
Outline
What the OAI protocol is & what it is not
Place in digital library infrastructure
How it works (basically)
Challenges for data / service providers
OAI-PMH is a tool
Moves metadata (not content) from a data provider to a service provider (or harvester)
A set of rules that defines the communication between two systems (like FTP and HTTP)
Build once, use for many applications – a building block for digital library services
Facilitates the federation of metadata
OAI-PMH is not…
Metadata
A search tool
A database
Open Access
Who uses OAI?
Approximately 400 data providers
Basic building block of the National Science Digital Library (NSDL); OAIster
Incorporated into DSpace and Eprints.org
Part of CONTENTdm, Michigan’s DLXS, and other products
International use
Basic OAI-PMH Concepts
“Aggregated search” rather than “Federated search”
Data providers – support OAI-PMH as a means to expose metadata
Service providers – ‘harvest’ metadata from data providers via OAI-PMH
OAI-PMH based upon HTTP and XML
OAI-PMH requires use of simple Dublin Core BUT supports and encourages use of other metadata schemas
Each OAI record has a unique, persistent identifier and a datestamp
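As an illustration of the identifier requirement: identifiers such as `oai:docsouth.unc.edu:12` (in the sample record later in this talk) follow the optional `oai-identifier` convention from the OAI guidelines, namely the prefix `oai`, a repository namespace (usually a domain name), and a local identifier. The protocol itself only requires each identifier to be a URI that is unique within the repository. A minimal Python sketch that splits such an identifier, assuming the convention is followed (`split_oai_identifier` is a hypothetical helper):

```python
def split_oai_identifier(identifier: str) -> tuple:
    """Split "oai:<namespace>:<local id>" into (namespace, local id).

    Only the first two colons are delimiters; the local part may itself
    contain colons.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return parts[1], parts[2]


print(split_oai_identifier("oai:docsouth.unc.edu:12"))  # ('docsouth.unc.edu', '12')
```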
[Diagram: three OAI data providers (a digital management system, a database, and a collection of XML files) each receive an OAI request and return an OAI response; an OAI harvester gathers the responses into aggregated metadata, which supports services.]
Examples of OAI Service Providers
OAIster: http://oaister.umdl.umich.edu/o/oaister/
Engineering, Computer Science, and Physics: http://g118.grainger.uiuc.edu/engroai/
Open Language Archives Community: http://www.language-archives.org/
How OAI Works (Technically)
6 distinct ‘verbs’ or requests
OAI requests are sent via HTTP
Responses are sent in valid XML
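[Diagram: the OAI harvester (service provider) sends an HTTP request containing an OAI verb to the OAI data provider, which sits in front of a digital management system; the data provider returns an HTTP response of valid XML, and the harvester adds it to its aggregated metadata.]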
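The request side of this exchange is an ordinary HTTP GET whose query string carries the verb and its arguments. A minimal Python sketch of building such a URL with the standard library (the helper name `build_oai_url` is hypothetical; the base URL is the sample repository used on the verb slides):

```python
from urllib.parse import urlencode


def build_oai_url(base_url: str, verb: str, **args: str) -> str:
    """Return an OAI-PMH GET URL: the verb plus its arguments as query parameters."""
    params = {"verb": verb}
    for name, value in args.items():
        # "from" is a Python keyword, so callers pass it as from_; strip the underscore.
        params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


BASE = "http://aerialphotos.grainger.uiuc.edu/oai.asp"
print(build_oai_url(BASE, "Identify"))
# http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
print(build_oai_url(BASE, "ListRecords", metadataPrefix="oai_dc", from_="2004-01-01"))
# http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc&from=2004-01-01
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns the XML response. `urlencode` percent-encodes the colons in OAI identifiers as `%3A`; standard URL decoding on the server turns them back into the same characters used in the unencoded sample URLs.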
An OAI Record
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>United States -- History -- Civil War, 1861-1865 -- Religious aspects.</subject>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Soldiers -- Religious life -- Confederate States of America.</subject>
      <subject>Soldiers -- Confederate States of America -- Conduct of life.</subject>
      <subject>Confederate States of America -- Church history.</subject>
      <subject>Sin.</subject>
      <publisher>[Raleigh, N. C.: s. n., between 1861 and 1865]</publisher>
      <date>2003-04-24T13:15:52Z</date>
      <type>Text</type>
      <format>text/html</format>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
      <language>en-us</language>
    </oai_dc:dc>
  </metadata>
</record>
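A harvester typically reads the header (identifier, datestamp, set membership) separately from the metadata payload. The Python sketch below parses an abridged copy of the record above with the standard library's `xml.etree.ElementTree`; the helper name `parse_record` and the shape of the returned dictionary are illustrative choices, not part of OAI-PMH. Note that the header `identifier` (the OAI identifier) and the Dublin Core `identifier` (the item's URL) are different elements in different namespaces.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Abridged copy of the record shown above.
RECORD_XML = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns="http://purl.org/dc/elements/1.1/"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Sin.</subject>
      <type>Text</type>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Split an OAI record into header fields and a dict of Dublin Core values."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{OAI_NS}}}header")
    fields = {}
    dc = root.find(f"{{{OAI_NS}}}metadata/{{{OAI_DC_NS}}}dc")
    if dc is not None:
        for child in dc:
            # child.tag looks like "{http://purl.org/dc/elements/1.1/}title".
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
    return {
        "identifier": header.findtext(f"{{{OAI_NS}}}identifier"),
        "datestamp": header.findtext(f"{{{OAI_NS}}}datestamp"),
        "sets": [s.text for s in header.findall(f"{{{OAI_NS}}}setSpec")],
        "dc": fields,
    }


record = parse_record(RECORD_XML)
print(record["identifier"], record["dc"]["title"])
```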
OAI “VERBS”
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord
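Each verb accepts a fixed set of arguments, marked on the verb slides as required (R), optional (O), or exclusive (X). The table below encodes those rules as the OAI-PMH 2.0 specification states them (the specification also allows `resumptionToken` on ListSets). `check_arguments` is a hypothetical helper that reports problems using the protocol's `badVerb` and `badArgument` error codes; it is a sketch, not a complete validator (it does not check argument values).

```python
from typing import NamedTuple, Optional


class VerbRule(NamedTuple):
    required: frozenset
    optional: frozenset
    exclusive: Optional[str] = None


# Selective-harvesting rules shared by ListIdentifiers and ListRecords.
_LIST_RULE = VerbRule(frozenset({"metadataPrefix"}),
                      frozenset({"from", "until", "set"}),
                      "resumptionToken")

VERB_RULES = {
    "Identify": VerbRule(frozenset(), frozenset()),
    "ListMetadataFormats": VerbRule(frozenset(), frozenset({"identifier"})),
    "ListSets": VerbRule(frozenset(), frozenset(), "resumptionToken"),
    "ListIdentifiers": _LIST_RULE,
    "ListRecords": _LIST_RULE,
    "GetRecord": VerbRule(frozenset({"identifier", "metadataPrefix"}), frozenset()),
}


def check_arguments(verb: str, args: dict) -> list:
    """Return a list of problems with a request; an empty list means it is well formed."""
    rule = VERB_RULES.get(verb)
    if rule is None:
        return [f"badVerb: {verb}"]
    names = set(args)
    if rule.exclusive is not None and rule.exclusive in names:
        # An exclusive argument may only be accompanied by the verb itself.
        others = sorted(names - {rule.exclusive})
        if others:
            return [f"badArgument: {rule.exclusive} cannot be combined with {', '.join(others)}"]
        return []
    problems = [f"badArgument: missing {name}" for name in sorted(rule.required - names)]
    problems += [f"badArgument: illegal {name}"
                 for name in sorted(names - rule.required - rule.optional)]
    return problems


print(check_arguments("GetRecord", {"identifier": "oai:docsouth.unc.edu:12"}))
# ['badArgument: missing metadataPrefix']
```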
Identify
Purpose: Return general information about the archive and its policies (e.g., datestamp granularity)
Parameters: None
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
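The granularity reported by Identify matters because the `from` and `until` arguments a harvester sends should not be finer than the granularity the repository supports. A sketch that reads the main policy fields from an Identify response; the response below is a hypothetical, abridged example for an imaginary repository at example.org, not output from the sample URL above, and `identify_summary` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abridged Identify response for an imaginary repository.
IDENTIFY_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-07-27T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>oai-admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

FIELDS = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "deletedRecord", "granularity")


def identify_summary(xml_text: str) -> dict:
    """Pull the main policy fields out of an Identify response."""
    identify = ET.fromstring(xml_text).find(OAI + "Identify")
    return {name: identify.findtext(OAI + name) for name in FIELDS}


summary = identify_summary(IDENTIFY_XML)
print(summary["granularity"])  # YYYY-MM-DD
```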
ListSets
Purpose: Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
Parameters: resumptionToken – flow control mechanism (X)
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
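In OAI-PMH, hierarchy among sets is expressed inside the `setSpec` itself: a colon separates levels, so a set `a:b` is a subset of `a`. A sketch that lists sets from a ListSets response and finds a set's parent; the abridged response and the set names (`photos`, `photos:1940`, `maps`) are hypothetical, as are the helper names.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abridged ListSets response.
LISTSETS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>photos</setSpec><setName>Aerial photographs</setName></set>
    <set><setSpec>photos:1940</setSpec><setName>1940 flights</setName></set>
    <set><setSpec>maps</setSpec><setName>Maps</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text: str) -> dict:
    """Map each setSpec to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {s.findtext(OAI + "setSpec"): s.findtext(OAI + "setName")
            for s in root.iter(OAI + "set")}


def parent_spec(set_spec: str):
    """Return the parent setSpec, or None for a top-level set."""
    head, sep, _ = set_spec.rpartition(":")
    return head if sep else None


for spec, name in list_sets(LISTSETS_XML).items():
    print(spec, name, parent_spec(spec))
```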
ListMetadataFormats
Purpose: List metadata formats supported by the archive, as well as their schema locations and namespaces
Parameters: identifier – for a specific record (O)
(O) = optional; (R) = required; (X) = exclusive, i.e. may only appear alongside the verb
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
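A harvester can use ListMetadataFormats to pick the richest format it understands, falling back to the `oai_dc` format that every repository must support. A sketch; the abridged response is hypothetical, and the preference order in `choose_prefix` (a hypothetical helper) is an assumption a service provider would set for itself.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abridged ListMetadataFormats response.
LMF_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
      <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def metadata_formats(xml_text: str) -> dict:
    """Map each metadataPrefix to its metadata namespace."""
    root = ET.fromstring(xml_text)
    return {f.findtext(OAI + "metadataPrefix"): f.findtext(OAI + "metadataNamespace")
            for f in root.iter(OAI + "metadataFormat")}


def choose_prefix(available: dict, preferred=("marc21", "oai_dc")) -> str:
    """Pick the first preferred prefix the repository offers, falling back to oai_dc."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    # OAI-PMH requires repositories to support unqualified Dublin Core (oai_dc).
    return "oai_dc"


print(choose_prefix(metadata_formats(LMF_XML)))  # marc21
```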
ListIdentifiers
Purpose: List headers for all items corresponding to the specified parameters
Parameters:
from – start date (O) and/or until – end date (O)
set – set to harvest from (O)
metadataPrefix – metadata format to list identifiers for (R)
resumptionToken – flow control mechanism (X)
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
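ListIdentifiers returns headers only, which lets a harvester see what has changed before fetching full records. A header may carry `status="deleted"` when a repository keeps track of deletions. A sketch that collects headers from a hypothetical, abridged response (`list_headers` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abridged ListIdentifiers response.
LISTIDS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2004-07-01</datestamp>
      <setSpec>photos</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2004-07-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def list_headers(xml_text: str) -> list:
    """Return identifier, datestamp, and deletion status for each header."""
    root = ET.fromstring(xml_text)
    return [{
        "identifier": h.findtext(OAI + "identifier"),
        "datestamp": h.findtext(OAI + "datestamp"),
        "deleted": h.get("status") == "deleted",
    } for h in root.iter(OAI + "header")]


for header in list_headers(LISTIDS_XML):
    print(header)
```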
GetRecord
Purpose: Returns the metadata for a single item in the form of an OAI record
Parameters:
identifier – unique id for item (R)
metadataPrefix – metadata format for the record (R)
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
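If a GetRecord request cannot be answered, the repository reports the problem inside a normal XML response, as an `<error>` element carrying a code such as `idDoesNotExist` or `cannotDisseminateFormat`. A sketch that turns that element into a Python exception; `OAIError` and `check_for_errors` are hypothetical names, and the error response is a made-up example.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Made-up error response for a GetRecord request with an unknown identifier.
ERROR_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-07-27T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:999"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""


class OAIError(Exception):
    """An OAI-PMH protocol error reported by the repository."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_for_errors(xml_text: str) -> ET.Element:
    """Raise OAIError for the first <error> element; otherwise return the parsed root."""
    root = ET.fromstring(xml_text)
    error = root.find(OAI + "error")
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root


try:
    check_for_errors(ERROR_XML)
except OAIError as exc:
    print(exc)  # idDoesNotExist: No matching identifier
```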
ListRecords
Purpose: Retrieves metadata records for multiple items
Parameters:
from – start date (O)
until – end date (O)
set – set to harvest from (O)
resumptionToken – flow control mechanism (X)
metadataPrefix – metadata format (R)
Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
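For repeated harvests, a service provider usually asks only for records whose datestamp is on or after its previous harvest (`from` is inclusive), written at the granularity the repository reports through Identify. A sketch under those assumptions; `format_datestamp` and `incremental_args` are hypothetical helpers, and `metadataPrefix` is fixed to `oai_dc` for simplicity.

```python
from datetime import datetime, timezone
from typing import Optional


def format_datestamp(moment: datetime, granularity: str) -> str:
    """Render an aware datetime in UTC at the repository's reported granularity."""
    moment = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        return moment.strftime("%Y-%m-%d")
    # The only other granularity OAI-PMH 2.0 allows is YYYY-MM-DDThh:mm:ssZ.
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def incremental_args(last_harvest: datetime, granularity: str,
                     set_spec: Optional[str] = None) -> dict:
    """ListRecords arguments selecting records datestamped on or after last_harvest."""
    args = {"metadataPrefix": "oai_dc",
            "from": format_datestamp(last_harvest, granularity)}
    if set_spec is not None:
        args["set"] = set_spec
    return args


print(incremental_args(datetime(2004, 7, 27, 13, 5, tzinfo=timezone.utc), "YYYY-MM-DD"))
# {'metadataPrefix': 'oai_dc', 'from': '2004-07-27'}
```

Because `from` is inclusive, a day-granularity harvest re-fetches records from the day of the previous harvest; since records are keyed by identifier, those repeats simply replace the stored copies.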
Other Pieces of OAI
Flow Control
Sets
Multiple metadata schemas
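Flow control works through the `resumptionToken`: when a list response is incomplete, the repository ends it with a token, and the harvester sends that token (alone with the verb) to get the next part; an empty token marks the final part. A sketch of that loop for ListRecords; `harvest_records` is a hypothetical helper, and `fetch` is a hypothetical caller-supplied function that performs the HTTP request, which keeps the loop testable without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_records(fetch: Callable[[dict], str], **args: str) -> Iterator[ET.Element]:
    """Yield every <record> from a ListRecords harvest, following resumptionTokens.

    `fetch` receives the request arguments (including "verb") and returns the
    XML response body as text.
    """
    params = {"verb": "ListRecords", **args}
    while True:
        root = ET.fromstring(fetch(params))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty resumptionToken marks the last part of the list.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests send only the verb and the exclusive resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

In practice `fetch` might combine `urllib.parse.urlencode` with `urllib.request.urlopen`; a production harvester would also need to handle HTTP 503 responses with a Retry-After header and OAI errors such as `noRecordsMatch`.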
Challenges for the OAI Community
A relatively recent protocol, with no established best practices (yet)
‘Shareability’ of metadata:
Heterogeneity of items described
Loss of context / information loss
Knowledge structures differ, so…
Native metadata schemas differ
Controlled vocabularies differ
Use and presentation of items differ
Metadata for different communities
http://digital.lib.umn.edu/IMAGES/reference/mswp/MPW00476.jpg
Metadata for different communities
http://images.library.uiuc.edu:8081/cgi-bin/viewer.exe?CISOROOT=/tdc&CISOPTR=746
Loss of Context: Record in OAI aggregation
Context: Record in native database
Loss of context / data
Sense / Completeness of Metadata
identifier: http://images.umdl.umich.edu/cgi/i/image/image-idx?view=entry;subview=detail;cc=fish3ic;entryid=X-0802;viewid=1004_112
publisher: UMMZ Fish Division
format: jpeg
type: image
subject: 1926-05-18
subject: 1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18
language: UND
description: Flora and Fauna of the Great Lakes Region;
Granularity of Description: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"
Digital Image of "Cotton Coverlet with Embroidered Butterfly Design"
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
Type: Image
Granularity of Description: Excerpt of Metadata Record Describing “American Woven Coverlet”
Digital Image of "American Woven Coverlet"
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
Range of vocabularies in use
Element: top three controlled vocabularies used (% of respondents who identified a controlled vocabulary)
Subject: LCSH (73%); LC TGM I (27%); AAT (17%)
Format: LC TGM II (17%); AAT (10%); MIME types (8%); AACR2 (8%)
Type: LC TGM II (21%); DCMI Type (13%); AACR2 (10%)
Personal names: LC Name Authority File (67%)
Geographic names: LCSH (27%); LC Name Authority File (25%); Getty Thesaurus of Geographic Names (15%)
Data providers can:
Create metadata for interoperability
Reusable metadata - think beyond your local users and environment
Use well-structured, well-defined schemas; move beyond simple DC
Use and identify controlled vocabularies
Service Providers can…
Analyze metadata, then cluster and normalize some aspects of it
Communicate with data providers about their metadata
Build custom interfaces and selective views for target audiences / domains
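Normalization can be as simple as mapping free-text `type` values onto the DCMI Type Vocabulary. The sketch below is tried against type values from the records shown earlier (`image`, `Text`, `cultural; physical object; original`); the `SYNONYMS` table is a hypothetical example of mappings a service provider would curate for itself, and `normalize_type` / `normalize_types` are hypothetical helper names.

```python
from typing import Optional

# The DCMI Type Vocabulary, a controlled list of values for dc:type.
DCMI_TYPES = ["Collection", "Dataset", "Event", "Image", "InteractiveResource",
              "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
              "StillImage", "Text"]

# Hypothetical local mappings for free-text values seen in harvested records.
SYNONYMS = {"photograph": "StillImage", "physical object": "PhysicalObject",
            "still image": "StillImage", "moving image": "MovingImage"}


def normalize_type(value: str) -> Optional[str]:
    """Map one harvested dc:type value to a DCMI Type term, or None if unrecognized."""
    cleaned = value.strip().rstrip(".").lower()
    for term in DCMI_TYPES:
        if term.lower() == cleaned:
            return term
    return SYNONYMS.get(cleaned)


def normalize_types(value: str) -> list:
    """Split a semicolon-delimited dc:type and keep the parts that map to DCMI terms."""
    terms = (normalize_type(part) for part in value.split(";"))
    return sorted({term for term in terms if term})


print(normalize_types("cultural; physical object; original"))  # ['PhysicalObject']
```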
Resources
OAI for beginners tutorial: http://www.oaforum.org/tutorial/
OAI Frequently Asked Questions: http://www.openarchives.org/documents/FAQ.html
IMLS Digital Collections and Content Project: http://imlsdcc.grainger.uiuc.edu/
Recap
OAI protocol is a tool
OAI is easy - metadata is hard
Better metadata = better interoperability
Contact Information
Sarah Shreeves
Project Coordinator, IMLS Digital Collections and Content
University of Illinois Library at Urbana-Champaign
Email: [email protected]
Phone: 217-244-7809
Website: http://imlsdcc.grainger.uiuc.edu/