The Future of Archiving

description

Raj Kumar and I (from the Internet Archive), and Allison Vanderslice (from SF Heritage YP) gave a talk as part of the SF Architectural Heritage lecture series. From the blurb: "Come hear from the Internet Archive’s George Oates about how digital archiving works, see highlights from their San Francisco history collections, and learn about how these resources will influence the future of preservation. Perhaps even Heritage’s own collection could be digitized in the future…the possibilities are endless!" http://www.sfheritage.org/upcoming_events/lecture-series/

Transcript of The Future of Archiving

Page 1: The Future of Archiving

Some rights reserved by mattdork

hello.

Monday, September 19, 2011

Hi, I’m Raj Kumar, and this is George Oates. We work at the Internet Archive, and we’re here today to talk to you about digital archiving, what the Internet Archive is, and how it might help you in your work. There’ll be a little time at the end for Q&A.

The Internet Archive:
- a 501(c)(3) non-profit
- building a digital library
- Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
- “universal access to all knowledge”

Page 2: The Future of Archiving

Why digitize?

- Because it’s an inexpensive way to preserve something forever.
- 10 cents a page, including digitization costs, OCR, and lifetime storage costs

Page 3: The Future of Archiving

Why digitize?

- It becomes easy to increase public access to archival material.
- Don't have to travel to a library
- Accessible audio versions of books.
- Full text search across almost 3 million texts, and the web archive

Page 4: The Future of Archiving

Some rights reserved by heather

- not a traditional library
- all of our materials are available online on archive.org

Page 5: The Future of Archiving

By rkumar

- 2.88 petabytes of hard drives - enough storage for about 2 billion books.
- we have 10.5 petabytes online
- paired storage

Page 6: The Future of Archiving

archive.org

All our materials are accessible on archive.org
- 500,000 movies and videos
- 1,000,000 audio recordings
- 3 million scanned texts
- 150,000,000,000 web pages

Page 7: The Future of Archiving

- Known as the “Wayback Machine”
- 165 billion URLs
- Started collecting web pages in 1996
- We now crawl the web for the Library of Congress and many national libraries (UK, France, Spain, Chile, Australia), for 43 US states, and for about 200 other partners.
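
As a rough illustration of how an archived web page is addressed in the Wayback Machine, here is a minimal Python sketch. The helper name `wayback_url` and the `example.org` address are made up for illustration; the `web.archive.org/web/<timestamp>/<url>` pattern is the publicly visible address form, and the Wayback Machine serves the capture nearest to the requested timestamp.

```python
from datetime import date

# Public address prefix of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, day: date) -> str:
    """Build a Wayback Machine address for the capture nearest to `day`.

    The timestamp is written as YYYYMMDDhhmmss; midnight is assumed here
    because only a calendar day is supplied.
    """
    timestamp = day.strftime("%Y%m%d") + "000000"
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# example.org is a placeholder site, not one shown in the talk.
print(wayback_url("http://example.org/", date(2000, 8, 17)))
```

The function only builds the address; it does not check that a capture exists for that day.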

Page 8: The Future of Archiving

August 17, 2000

Page 9: The Future of Archiving

TV, Movies, Audio

- 500,000 moving images
- full length movies, tv shows, home movies, advertisements
- anyone can upload their movie for free
- San Francisco-specific collections:
  - Prelinger archive
    - Trip down Market St
    - Lost Landscapes
  - SFGTV and SFGTV2 (board of supervisors, planning commission meetings, etc)
  - UCSF Tobacco archives, BAVC, Ourmedia

Page 10: The Future of Archiving

http://www.archive.org/details/TV-SFGTV

New shows are available online an hour after they air.

Page 11: The Future of Archiving

archive.org/911

Archive.org/details/911

Understanding 9/11 – Television news archive

Present one week of TV news for study, research, and analysis

- “Television is our pre-eminent medium of information, entertainment and persuasion, but until now it has not been a medium of record. This Archive attempts to address this gap by making TV news coverage of this critical week in September 2001 available to those studying these events and their treatment in the media.”

- 3000 hours of TV news footage from 20 channels around the world

Page 12: The Future of Archiving

http://www.archive.org/search.php?query=san%20francisco%20AND%20mediatype%3Aetree

- 1,000,000 audio recordings
- Anyone can upload for free
- almost 100,000 live concert recordings
  - popularized by the Grateful Dead
  - growing by 50/day

- Librivox – 5000 audio books

- Old Time Radio
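
The search address shown on this slide is just a query string that has been percent-encoded. A minimal Python sketch of how such an address can be assembled follows; the helper name `search_url` is hypothetical, and only the `search.php?query=` form visible in the slide's URL is assumed.

```python
from urllib.parse import quote

# Search endpoint as it appears in the slide's URL.
SEARCH_ENDPOINT = "http://www.archive.org/search.php"


def search_url(*terms: str) -> str:
    """Join search terms with AND and percent-encode the result."""
    query = " AND ".join(terms)
    # quote() turns spaces into %20 and the colon into %3A.
    return f"{SEARCH_ENDPOINT}?query={quote(query)}"


# Live concert recordings (mediatype "etree") that mention San Francisco.
print(search_url("san francisco", "mediatype:etree"))
```

Swapping in a different place name or media type gives a new search address.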

Page 13: The Future of Archiving

Book Scanning

http://www.archive.org/stream/sanfranciscobloc1906octbloc#page/n7/mode/2up

- Almost 3 million text items
- Mostly public-domain books before 1923 with audio (text-to-speech) versions

- 300,000 modern audio books for those with NLS print-disabled credentials

Page 14: The Future of Archiving

1,000 books scanned EVERY day
24 scanning centers in 5 countries, and we hope for more.
high-resolution archival-quality color scans

Page 15: The Future of Archiving

Zoom in with online bookreader
Searchable PDFs with OCR, Original uncropped camera images available

Page 16: The Future of Archiving

We’re also scanning microfilm, which is much faster than scanning individual books. Here’s an example: the record of the population census from 1790 to 1930, scanned from microfilm from the collections of the Allen County Public Library and originally from the United States National Archives and Records Administration.

Page 17: The Future of Archiving

Examples of Cross Writing from Boston Public Library

Page 18: The Future of Archiving

Physical archive
- Don't want books to be thrown away after they are digitized
- We want libraries that are de-accessioning their materials to send them to us before they send them to a landfill
- The physical is the authentic and original version
- Goal is 10 Million books

Page 19: The Future of Archiving

Books, boxes, pallets, shipping containers...

Over to you, George!

Page 20: The Future of Archiving

openlibrary.org

http://openlibrary.org/

Hi - I’m George Oates and I run the Open Library project at the Internet Archive. I’d like to talk to you a bit about what can happen once you’ve digitized things. As well as work from the Internet Archive, I’d also like to show you some examples of other projects around the web that explore digital preservation...

Page 21: The Future of Archiving

A “Wikipedia for Books”

There’s a twist though... this library catalog is editable, by anyone, like a Wikipedia for books.

Page 22: The Future of Archiving

http://openlibrary.org/subjects/search

Page 23: The Future of Archiving

http://openlibrary.org/subjects/place:san_francisco

Page 24: The Future of Archiving

http://openlibrary.org/subjects/place:san_francisco

Page 25: The Future of Archiving

http://openlibrary.org/subjects/place:san_francisco

Page 26: The Future of Archiving

California, San Francisco (Calif.), United States, San Francisco Bay Area, Chinatown (San Francisco, Calif.), New York, Hunters Point (San Francisco, Calif.), San Francisco Bay Area (Calif.), South of Market (San Francisco, Calif.), Mission District (San Francisco, Calif.), Western Addition (San Francisco, Calif.), Hetch Hetchy Valley (Calif.), Presidio of San Francisco (Calif.), Diamond Heights (San Francisco, Calif.), Golden Gate Park (San Francisco, Calif.), New York (State), North Beach (San Francisco, Calif.), Los Angeles, Northern California, Bayview (San Francisco, Calif.)

http://openlibrary.org/subjects/place:san_francisco

Page 27: The Future of Archiving

http://openlibrary.org/borrow

Page 28: The Future of Archiving

Page 29: The Future of Archiving

Page 30: The Future of Archiving

De Young

Page 31: The Future of Archiving

The Zamorano Club is a group of bibliophiles and collectors based in LA. A jewel in their collection is the “Zamorano 80” - the books they feel best represent California history. It is named after Agustin Zamorano, most noted for bringing the first printing press to California.

This year, I’ve been working with Mary Elings at the Bancroft Library to try to digitize the entire set of these 80 titles. We’re nearly there! And, I’ve collected them into an Open Library list for easy reference and access.

Interesting to note here how related subjects are aggregated from the constituent titles. The system does that work for us.

http://openlibrary.org/people/george08/lists/OL6387L/Zamorano_80_Editions

Page 32: The Future of Archiving

The annals of San Francisco by Frank Soulé, John H. Gihon, James Nisbet first published in 1855

http://www.archive.org/stream/annalsofsanfranc00soul#page/n27/mode/2up

Page 33: The Future of Archiving

Colonel John Geary, last alcalde & first mayor of San Francisco

http://www.archive.org/stream/annalsofsanfranc00soul#page/n745/mode/1up

1849 - unanimously elected to the post of First Alcalde - Big Cheese.

Colonel Geary immediately set about the organization of the city, and the establishment of an efficient police force. The task was herculean. Pandemonium had to be quieted - chaos reduced to order. Here was a large maritime city, with a population of about twenty thousand persons, and embracing a strange medley of dangerous and desperate characters - without a solitary officer, or a single law to govern or control them. All these rebellious elements had to be subdued, and good citizens made of daring bravados. This task fell upon the alcalde, who had to perform the duties of every one of the customary officers of a city and county jurisdiction.

On that happy note, I’d like to take a quick tour of some other useful digital preservation projects out there on the internet...

Page 34: The Future of Archiving

flickr.com/commons

Page 35: The Future of Archiving

Page 36: The Future of Archiving

Photograph of the Effect of Earthquake on Houses Built on Loose or Made Ground After the 1906 San Francisco Earthquake, 1906 By The U.S. National Archives

http://www.flickr.com/photos/usnationalarchives/5553722800/in/photostream/

Page 37: The Future of Archiving

By Museum of Photographic Arts Collections in San Diego - circa 1880

http://www.flickr.com/photos/mopa1/5711511770/in/photostream/

Page 38: The Future of Archiving

The City from California Street By Museum of Photographic Arts Collections - circa 1880

http://www.flickr.com/photos/mopa1/5710949415/sizes/l/in/photostream/

Page 39: The Future of Archiving

burritojustice.com

http://burritojustice.com/2011/06/27/1905-sf-sanborn-maps-now-in-color/

Canadian guy, loves The Mission.

Page 40: The Future of Archiving

http://burritojustice.com/2011/06/27/1905-sf-sanborn-maps-now-in-color/

Page 41: The Future of Archiving

http://www.davidrumsey.com/luna/servlet/view/search?sort=Pub_List_No_InitialSort%2CPub_Date%2CPub_List_No%2CSeries_No&q=Pub_Title%3D%22Insurance+Maps.+San+Francisco%2C+California.+Published+by+Sanborn-Perris+Map+Co.+Limited%2C+115+Broadway%2C+New+York.+1899.+Scale%2C+50+Ft.+to+an+Inch.+Copyright+1899%2C+by+the+Sanborn-Perris+Map+Co.+Limited.%22&pgs=50&res=1

Page 42: The Future of Archiving

http://www.davidrumsey.com/

Page 43: The Future of Archiving

“You can pry my burrito out of my cold, dead hand.”

Jon began studying the old Southern Pacific train station at Valencia and 25th

Page 44: The Future of Archiving

Jon began studying the old Southern Pacific train station at Valencia and 25th

http://burritojustice.com/2011/06/27/1905-sf-sanborn-maps-now-in-color/

Page 45: The Future of Archiving

BernalDweller, June 27, 2011 10:19 pm

Lots of street renamings in SW Bernal. Jarboe was Jefferson, Tompkins was Union, Ogden was Old Hickory. I’ve spent some time researching street name origins in Bernal…must delve further. Great resource!

The thread is full of interested people throwing in all sorts of information.

Page 46: The Future of Archiving

Some rights reserved by Paul Hagon

Mike Migurski put out a call... to help “geo-rectify” the pages of the Sanborn atlas: to connect them with contemporary map tiles, and stamp them with a latitude and longitude.

I jumped in to help with the interaction design: how to make it easy to align an old map with a new one.

Page 47: The Future of Archiving

maptcha.org

Page 48: The Future of Archiving

maptcha.org

Page 49: The Future of Archiving

maptcha.org

Page 50: The Future of Archiving

maptcha.org

Page 51: The Future of Archiving

maptcha.org

It was amazing. Within about 2 days of Mike announcing the Sanborn release, about 400 people added all 700 pages to the contemporary map. (There’s still a bit of confirmation happening, but overall - amazingly fast!)

Page 52: The Future of Archiving

maptcha.org

If you click on any of the little thumbnails, you’ll get to a bigger version and be able to see maps & pages nearby.

Page 53: The Future of Archiving

oldsf.org

OLD SF is a project built by Dan Vanderkam and Raven Keller. Dan went through the SFPL’s photography collection and “geo-coded” photos wherever he could. That means adding latitude/longitude data. That allowed him to add their photos to a map, like you see here.

http://www.oldsf.org/about

Page 54: The Future of Archiving

looking back to that similar view we saw before from the Museum of Photographic Arts Collections

Corner California and Mason looking down Mason to Bay
1906 April 27

OldSF.org

http://www.oldsf.org/#ll:37.791835|-122.410818,e:AAC-3157|672,m:37.79001|-122.41202|16
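
The OldSF address on this slide carries the latitude and longitude in the part after the `#`. Here is a minimal Python sketch that pulls those values back out. The function name `parse_oldsf_fragment` is hypothetical, and the reading of the fields (`ll` as the photo location, `e` as a photo identifier, `m` as the map center and zoom) is a guess from the URL's shape, not documented behavior of the site.

```python
def parse_oldsf_fragment(url: str) -> dict[str, list[str]]:
    """Split the part after '#' into named fields.

    Fields are separated by commas, a field name ends at the first colon,
    and the values inside a field are separated by '|'.
    """
    fragment = url.split("#", 1)[1]
    fields: dict[str, list[str]] = {}
    for part in fragment.split(","):
        key, _, value = part.partition(":")
        fields[key] = value.split("|")
    return fields


url = ("http://www.oldsf.org/#ll:37.791835|-122.410818,"
       "e:AAC-3157|672,m:37.79001|-122.41202|16")
fields = parse_oldsf_fragment(url)

# Assumption: 'll' holds latitude then longitude, in that order.
lat, lon = (float(value) for value in fields["ll"])
print(lat, lon)
```

Once a photo has coordinates like these, it can be placed on any map that understands latitude and longitude.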

Page 55: The Future of Archiving

View of downtown San Francisco from Stockton and California streets
ca. 1920

http://www.oldsf.org/#ll:37.792244|-122.407558,e:AAB-3087|526,m:37.79001|-122.41202|16

Page 56: The Future of Archiving

menus.nypl.org

http://menus.nypl.org/

With approximately 40,000 menus dating from the 1840s to the present, The New York Public Library’s restaurant menu collection is one of the largest in the world, used by historians, chefs, novelists and everyday food enthusiasts. Trouble is, the menus are very difficult to search for the greatest treasures they contain: specific information about dishes, prices, the organization of meals, and all the stories these things tell us about the history of food and culture.

As of Monday September 12, 2011, there have been 542,029 dishes transcribed from 9,557 menus (that’s how many they’ve digitized to date).

Page 57: The Future of Archiving

menus.nypl.org

Page 58: The Future of Archiving

menus.nypl.org

Page 59: The Future of Archiving

menus.nypl.org

Page 60: The Future of Archiving

menus.nypl.org

Corned beef on 3,142 menus, so far.

Page 61: The Future of Archiving

zooniverse.org

zooniverse.org

home to the internet's largest, most popular and most successful citizen science projects

Page 62: The Future of Archiving

oldweather.org

http://www.oldweather.org/vessels/4caf8530cadfd3419700d28d

Page 63: The Future of Archiving

oldweather.org

Page 64: The Future of Archiving

digitization

description

distribution

translation

Re-presentation

To conclude... digital preservation is not just about turning paper into pictures. There’s a lot more opportunity than that.

It’s important to consider how digital materials are described and distributed. - No Known Restrictions / digital proliferation

Enthusiasts out there can supplement your metadata, sometimes to a voracious degree! They can also help with the heavy lifting of transcription. In the digital world, you want *more* descriptions of things, not fewer. The more ways people can find your content in the network, the better. You can see amazing examples of this sort of description working incredibly well on sites that allow tagging and other metadata creation by the public.

Transforming “old data” into new, like attaching a lat/lon to a photo, will allow that digital artifact to be re-presented and re-mixed with other things, and will provide additional context.

And now, I’ll hand over to Allison, from SF Heritage YP, to talk through a case study on using materials from IA and OL...

Page 65: The Future of Archiving

Using the Internet Archive A Case Study: The San Francisco Waterfront

Page 67: The Future of Archiving

Prelinger Collection: 1934 Strike

Page 68: The Future of Archiving

Prelinger Collection: 1934 Strike

Page 69: The Future of Archiving

Prelinger Collection: San Francisco Scenes, 1920s

Page 71: The Future of Archiving

Harbor Rules, Regulations and rates

Page 72: The Future of Archiving

San Francisco City Directories

Page 73: The Future of Archiving

The California Architect and Building News

Page 74: The Future of Archiving

Open Library search for "Ferry Building San Francisco": 13 hits

- Ferry Building complex by San Francisco (Calif.). Dept. of City Planning. 1 edition - first published in 1983
- Union depot and ferry house, San Francisco by San Francisco Port Commission. 1 edition - first published in 1978
- The Ferry Building by Nancy Olmsted. 1 edition - first published in 1998
- Ferry Building marketplace by William Wilson & Associates. 1 edition - first published in 1998
- Ferry Building State Park by Joint Committee of the Northern California Chapter of the American Institute of Architects and the California Association of Landscape Architects. 1 edition - first published in 1955
- Request for qualifications by San Francisco Port Commission. 1 edition - first published in 1978
- City Walks: San Francisco by Christina Henry de Tessan. 1 edition - first published in 2004
- Remembered Treasures of San Francisco by Tro Harper. 1 edition - first published in 2003
- Aircraft accident report by United States. National Transportation Safety Board. 99 editions - first published in 1975
- Fort Point by Mary K. Grassick. 3 editions - first published in 1994

Filters shown alongside the results:
- Ebook? yes 0; no 13
- Author: Mary K. Grassick 3; San Francisco Port Commission. 2; Tro Harper 1; San Francisco (Calif.). Dept. of City Planning. 1; United States. National Transportation Safety Board. 1
- Subjects: Buildings, structures 7; Ferry Station Post Office Building (San Francisco, Calif.) 6; History 3; Waterfronts 3; Historic sites 2
- Places: California 8; San Francisco 7; San Francisco (Calif.) 5; Golden Gate National Recreation Area (Calif.) 2; United States 2
- Times: 1983 1; 20th century 1
- First published: 1978 2; 1998 2; 1905 1

Page 75: The Future of Archiving

Open Library: History of the San Francisco District

Page 76: The Future of Archiving

Page 77: The Future of Archiving

Page 78: The Future of Archiving
