Lsr vpresntation

Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources. Joanne Archer and John Schalow, University of Maryland Libraries.


Page 1: Lsr vpresntation

Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources

Joanne Archer and John Schalow

University of Maryland Libraries

Page 2: Lsr vpresntation

Jargon

Crawler

Web Harvesting

Seed

Harvest

Crawl

Page 3: Lsr vpresntation

Wayback Machine

Page 4: Lsr vpresntation

Options for Web Harvesting

In House Program

e.g. Pandora, Web Curator Tool

Pro: flexibility

Con: $$$

Off the Shelf Software

e.g. HTTrack, Adobe Web Capture

Pro: inexpensive

Con: not scalable

Third Party Subscription

e.g. Web Archiving Service, Archive-It

Pro: ease of use

Con: $

Page 5: Lsr vpresntation

Key Questions for Harvesting Projects

uniqueness

ephemerality

research value

harvest frequency

scope

Page 6: Lsr vpresntation

Maryland’s Pilot Harvests (2008-2010)

Historic Preservation

Maryland State Documents

Page 7: Lsr vpresntation

Why harvest these areas?

• Collections are unique

• Builds on existing strengths in print collections

• Large amount of material migrating to the web

Page 8: Lsr vpresntation

Key Questions for Harvesting Projects

uniqueness

ephemerality

research value

harvest frequency

scope

Page 9: Lsr vpresntation

Harvesting

Page 10: Lsr vpresntation

Harvesting Challenges:

• JavaScript

• Streaming media

• Form- and database-driven content

• Password-protected sites

• robots.txt files

• Multiple hosts/subdomains
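To illustrate the robots.txt challenge listed above, here is a minimal sketch in Python of how a crawler might decide whether it is allowed to fetch a page. The robots.txt content, the user-agent name "example-harvester", and the page paths are all hypothetical; the slides do not say what rules any harvested site actually publishes or which crawler was used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "example-harvester" and both paths are assumptions, not from the slides.
blocked = is_allowed(
    parser, "example-harvester", "http://www.preservemd.org/private/report.pdf"
)
open_page = is_allowed(
    parser, "example-harvester", "http://www.preservemd.org/about/"
)
```

A crawler that honors these rules would skip the first URL and fetch the second, which is one reason a harvest can come back missing parts of a site.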

Page 11: Lsr vpresntation

Single host = www.preservemd.org

Multiple hosts = www.umd.edu, www.lib.umd.edu
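The single-host versus multiple-host distinction can be sketched as a scope check that a crawler runs on each link it discovers. This is an illustrative Python sketch only: the function names, the `include_subdomains` flag, and the rule of dropping a leading "www." to find the base domain are assumptions, not the behavior of any particular harvesting tool.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def in_scope(url: str, seed_hosts: list[str], include_subdomains: bool = False) -> bool:
    """Decide whether a discovered URL belongs to the harvest.

    With include_subdomains=False only exact seed hosts match.
    With include_subdomains=True any host under the seed's base domain
    matches (base domain = seed host without a leading 'www.', an
    assumption made for this sketch).
    """
    host = host_of(url)
    for seed in seed_hosts:
        seed = seed.lower()
        if host == seed:
            return True
        if include_subdomains:
            base = seed[4:] if seed.startswith("www.") else seed
            if host == base or host.endswith("." + base):
                return True
    return False


# Single host: every page lives on the seed host itself.
single = in_scope("http://www.preservemd.org/news", ["www.preservemd.org"])

# Multiple hosts: a library page sits on a different host than the seed.
strict = in_scope("http://www.lib.umd.edu/special", ["www.umd.edu"])
relaxed = in_scope(
    "http://www.lib.umd.edu/special", ["www.umd.edu"], include_subdomains=True
)
```

Under the strict rule the library page falls outside a harvest seeded at www.umd.edu, so it would need either its own seed or a relaxed scope rule.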

Page 12: Lsr vpresntation

End-User Access

Page 13: Lsr vpresntation

End-User Access

collection note

subject heading

general material designation

URLs

uniform title

Page 14: Lsr vpresntation

Conclusions

Challenges

• Start-up costs

• What to collect

• Metadata creation

But we are well prepared to meet the challenges

Page 15: Lsr vpresntation

Questions?

• Joanne Archer: [email protected]

• John Schalow: [email protected]