A Systemwide View of Library Collections

Brian Lavoie, OCLC Research; Roger C. Schonfeld, Ithaka. CNI Spring Task Force Meeting, April 5, 2005


Ithaka

A Systemwide View of Library Collections

Brian Lavoie, OCLC Research
Roger C. Schonfeld, Ithaka

CNI Spring Task Force Meeting April 5, 2005


Systemwide View of Library Collections

Print collections have been changing, as the distinction between local and external resources is increasingly blurred due to resource sharing

Digitization combined with network technologies creates opportunities for one “copy” of a resource to be shared across many libraries

These forces inevitably are going to lead to a shift in focus to the resources of the “system,” rather than individual library collections


Mass Digitization

Great deal of public and private investment in digitization programs … e.g., JSTOR, ARTstor - and of course mass digitization spearheaded via GooglePrint

Digitization opportunities unlimited; resources are not …
• How to determine priorities? What programs of digitization will be necessary to meet the needs of the scholarly community?

Print Preservation

From a systemwide perspective, what preservation framework makes most sense for print resources?

How have preservation frameworks changed over time?

As retrospective materials become increasingly available in digital form, will new frameworks for print preservation be necessary?


What Are We Going to Do Today?

The kinds of collaborations necessary to begin to take advantage of a systemwide perspective are very hard, both from economic and political standpoints

We will not be proposing any answers!

Instead, we thought to take advantage of the WorldCat resource – which affords the broadest view of print collections – to build a bridge from a local perspective to the beginnings of a systemwide perspective

Today’s presentation focuses on print books


Data Sources

WorldCat: world’s largest and most comprehensive bibliographic database
• > 20,000 libraries worldwide have contributed to the development of WorldCat

Copy of WorldCat from January 2005:
• ~55 million records

Copy of WorldCat holdings file from January 2005:
• ~950 million holdings

Data Source Limitations

Not all published materials are cataloged in WorldCat

Not all library holdings are represented in WorldCat

Largely reflects North American library collections

So … WorldCat does not embody the whole universe of library collections and holdings – but it’s a very good approximation!


1. The “Systemwide Collection”

Size

Age

How Many “Books” Are Held in the Systemwide Collection?

[Bar chart, built up over four slides; y-axis: number of records, 0 to 60,000,000]
Total WorldCat records: 54,831,000
Language-based or manuscript monographs: 45,269,000
Language-based or manuscript monographs, excluding government documents and theses/dissertations: 35,251,000
Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only: 31,923,000
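The narrowing from all WorldCat records to print books can be read as a simple filter funnel. The sketch below is only an illustration using the four counts shown on the chart; the function name `funnel` and the shortened step labels are assumptions made for this example, not part of the original analysis.

```python
# Record counts taken from the slide's bar chart (January 2005 WorldCat copy).
# The labels are shortened here for readability.
FILTER_STEPS = [
    ("Total WorldCat records", 54_831_000),
    ("Language-based or manuscript monographs", 45_269_000),
    ("... excluding government documents and theses/dissertations", 35_251_000),
    ("... in print format only", 31_923_000),
]


def funnel(steps):
    """Return (label, count, share_of_total, removed_by_this_step) for each step."""
    total = steps[0][1]
    previous = total
    rows = []
    for label, count in steps:
        rows.append((label, count, count / total, previous - count))
        previous = count
    return rows


rows = funnel(FILTER_STEPS)
for label, count, share, removed in rows:
    print(f"{label}: {count:,} ({share:.1%} of total, {removed:,} removed)")
```

Read this way, the print books counted on the final bar amount to a little under three fifths of all WorldCat records in the snapshot.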


Works and Manifestations

FRBR (Functional Requirements for Bibliographic Records):
• Hierarchy of bibliographic entities
• Works, Expressions, Manifestations, Items

Work: distinct intellectual or artistic creation
• e.g., Macbeth

Manifestation: physical embodiment of an expression of a work
• e.g., Macbeth, Folger Shakespeare Library edition, published in paperback by Washington Square Press (2004)

WorldCat records describe FRBR manifestations

Works identified using OCLC “FRBRization” algorithm
• Converts MARC21 bibliographic databases into FRBR “work-sets”
• http://www.oclc.org/research/software/frbr/
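To make the work/manifestation distinction concrete, here is a minimal sketch of grouping manifestation-level records into work-sets. It is not the OCLC algorithm: the key used here (normalized author plus normalized title) is a deliberate simplification, and the record fields, the function names, and the sample records are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def work_key(record):
    """Hypothetical work key: normalized author plus normalized title."""
    return (normalize(record["author"]), normalize(record["title"]))


def group_into_work_sets(records):
    """Map each work key to the list of manifestation records that share it."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record)
    return dict(work_sets)


# Invented sample records, one per manifestation.
records = [
    {"id": 1, "author": "Shakespeare, William", "title": "Macbeth"},
    {"id": 2, "author": "Shakespeare, William", "title": "Macbeth."},
    {"id": 3, "author": "Shakespeare, William", "title": "Hamlet"},
]

work_sets = group_into_work_sets(records)
for key, members in work_sets.items():
    print(key, [m["id"] for m in members])
```

In this toy input the two Macbeth records collapse into one work with two manifestations, while Hamlet forms a work of its own. Real bibliographic data needs far more careful matching than a plain author/title key.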

Most Book Works Have Few Manifestations

[Bar chart; y-axis: 0 to 35,000,000]
Manifestations: 31,923,000
Works: 26,025,000

Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only

Print Book Manifestations and Works – and Digital Manifestations

[Bar chart; y-axis: 0 to 35,000,000]
Manifestations: 31,923,000
Works: 26,025,000
Digital Manifestations: 121,689

Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only
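The three counts on this chart can be turned into two simple ratios. The short calculation below is offered as a rough illustration only; the variable names are invented for this example, and comparing digital manifestations with print manifestations assumes the two counts are scoped the same way, which the slide does not spell out.

```python
# Counts as shown on the chart.
PRINT_MANIFESTATIONS = 31_923_000
PRINT_WORKS = 26_025_000
DIGITAL_MANIFESTATIONS = 121_689

# Average number of print manifestations per print work.
manifestations_per_work = PRINT_MANIFESTATIONS / PRINT_WORKS

# Digital manifestations per thousand print manifestations.
digital_per_thousand_print = 1000 * DIGITAL_MANIFESTATIONS / PRINT_MANIFESTATIONS

print(f"Manifestations per work: {manifestations_per_work:.2f}")
print(f"Digital manifestations per 1,000 print manifestations: {digital_per_thousand_print:.1f}")
```

An average only slightly above one manifestation per work is consistent with the earlier slide title, "Most Book Works Have Few Manifestations".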

How Old Are the Components of the Systemwide Collection? Cumulative Book Works/Manifestations Over Time

[Line chart; x-axis: year, 1700 to 2000; y-axis: cumulative count, 0 to 35,000,000; series: Manifestations, Works]

How Old Are the Components of the Systemwide Collection? Book Works/Manifestations per Year

[Line chart; x-axis: year, 1700 to 2000; y-axis: count per year, 0 to 800,000; series: Manifestations, Works]

Age of Works and Manifestations: Relative to 1923 (millions)

[Bar chart; y-axis: 0 to 30 (millions); bars: Manifestations, Works; each split into Pre-1923 and 1923 and After; data labels: 18% / 82% and 17% / 83%]


2. Individual Collections Cumulate to Form the System

How will digitization bring them together virtually?

Minimal Overlap: Book Works Held by X or More Libraries (in millions)

[Bar chart; x-axis: Number of Libraries (1 or more, 2 or more, … 10 or more, 100 or more); y-axis: 0 to 30 (millions)]

Works Held Broadly: Book Works Held by X or More Libraries (in millions)

[Bar chart; x-axis: Number of Libraries (10, 50, 100, 200, 300, 400, 500 or more); y-axis: 0 to 7 (millions)]

Works Held Broadly: Book Works Held by X or More Libraries, as Percent of Total Book Works

[Bar chart; x-axis: Number of Libraries; y-axis: 0% to 30%]
10 or more: 24%
50 or more: 9%
100 or more: 6%
200 or more: 4%
300 or more: 2%
400 or more: 2%
500 or more: 1%
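A "held by X or more libraries" figure is a count over per-work holdings totals. The following sketch shows one way such a table could be computed, assuming a list with one holdings count per work; the function name `held_by_at_least` and the sample data are invented for illustration and are not WorldCat data.

```python
from bisect import bisect_left


def held_by_at_least(holdings_per_work, thresholds):
    """For each threshold X, return (number of works held by X or more libraries, share of all works)."""
    ordered = sorted(holdings_per_work)
    total = len(ordered)
    result = {}
    for x in thresholds:
        # Everything at or after the first position where the value is >= x qualifies.
        count = total - bisect_left(ordered, x)
        result[x] = (count, count / total if total else 0.0)
    return result


# Invented sample: one entry per work, giving the number of holding libraries.
sample_holdings = [1, 1, 1, 2, 3, 12, 55, 120, 640, 1]

summary = held_by_at_least(sample_holdings, [10, 50, 100, 500])
for x, (count, share) in summary.items():
    print(f"{x} or more: {count} works ({share:.0%})")
```

Sorting once and using binary search keeps the cost low when many thresholds are evaluated against a large list of works.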


The Virtual System in Practice

GooglePrint digitization initiative

Questions:
• How many print books does this initiative potentially impact?
• What proportion of “systemwide print book collection” does this represent?
• Overlap (how much held broadly? how much held uniquely?)

Forthcoming paper from OCLC researchers that will offer some perspective on these questions

Hopefully, work like this will help to establish a set of important questions/metrics that need to be addressed when:
• Considering digitization initiatives
• Considering implications of a changing world of research and learning for collections


3. How Is Rareness Distributed through the System?

Systemwide Holdings of Print Works

[Pie chart]
1 holding: 37%
2 holdings: 14%
3-5 holdings: 16%
More than 5 holdings: 33%

More than 9 million works are held only once

[Bar chart; x-axis: number of holdings (1, 2, 3, 4, 5, 6 to 10, 11 to 20, 21-50, 51-100, 100+); y-axis: number of works, 0 to 12,000,000]
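The distribution on this chart comes from sorting works into holdings bands. Below is a minimal sketch of that binning, assuming a list with one holdings count per work. The band labels follow the chart's axis, but treating "100+" as "more than 100" is an assumption, since the chart's labels overlap at exactly 100; the function names and the sample data are invented.

```python
from collections import Counter

# (label, lowest holdings count, highest holdings count or None for open-ended).
# Assumption: "100+" is read as 101 and above so the bands do not overlap.
BINS = [
    ("1 holding", 1, 1),
    ("2 holdings", 2, 2),
    ("3 holdings", 3, 3),
    ("4 holdings", 4, 4),
    ("5 holdings", 5, 5),
    ("6 to 10 holdings", 6, 10),
    ("11 to 20 holdings", 11, 20),
    ("21-50 holdings", 21, 50),
    ("51-100 holdings", 51, 100),
    ("100+ holdings", 101, None),
]


def bin_label(holdings):
    """Return the label of the band that a holdings count falls into."""
    for label, low, high in BINS:
        if holdings >= low and (high is None or holdings <= high):
            return label
    raise ValueError(f"holdings count must be at least 1, got {holdings}")


def holdings_distribution(holdings_per_work):
    """Count works per band, keeping the chart's band order and including empty bands."""
    counts = Counter(bin_label(h) for h in holdings_per_work)
    return {label: counts.get(label, 0) for label, _, _ in BINS}


# Invented sample: one entry per work, giving the number of holding libraries.
sample_holdings = [1, 1, 1, 2, 3, 12, 55, 120, 640, 1]

distribution = holdings_distribution(sample_holdings)
for label, count in distribution.items():
    print(f"{label}: {count}")
```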


4. What Systemwide Preservation Frameworks Have Served Us?

The Growth and Peak in Average Holdings Over Time

[Line chart; x-axis: Age in Years, 0 to 200; y-axis: Average Holdings, 0 to 45; series: Manifestations, Works]

Steady, Gradual Nineteenth Century Growth in Works Held Many Times…

[Chart; x-axis: decade, 1801-1810 to 1891-1900; y-axis: number of works, 0 to 200,000; series by number of holdings: 2 to 10, 11 to 50, 51 to 100, 101 to 200, 201 to 400, 400 to 1000, 1000+]

…Rapid Postwar Increase in Works Held Many Times

[Chart; x-axis: decade, 1911-1920 to 1991-2000; y-axis: number of works, 0 to 2,500,000; series by number of holdings: 2 to 10, 11 to 50, 51 to 100, 101 to 200, 201 to 400, 400 to 1000, 1000+]

Of Works with Multiple Holdings, Steady Increase Through the 1960s in the Proportion Held Many Times

[Chart; x-axis: decade, 1801-1810 to 1991-2000; y-axis: share of works, 0% to 100%; series by number of holdings: 1000+, 400 to 1000, 201 to 400, 101 to 200, 51 to 100, 11 to 50, 2 to 10]


Summary and Discussion


Summary: Findings

1. Roughly 26 million print title works, represented in 32 million print title manifestations, are held by OCLC member libraries. This should be seen as a minimum in considering the number of printed books over time. Half of the books date from the period since 1977. How can a mass digitization strategy effectively manage the intellectual property ramifications of this finding?

2. Publications are distributed across a wide number of libraries, and any mass digitization strategy that ignores this distributional reality is likely to omit numerous works. How should this finding impact the library system’s planning for a massive format migration?


Summary: Findings

3. Rareness is very common within the system. This has been recognized by many librarians but is not always taken into account in policy development. How will any future print preservation strategy address this reality? Can data on rareness help to inform digitization strategies?

4. Redundancy in holdings across the system has changed over time. Has this made our framework for preservation more or less secure? What lessons should be drawn as we consider other print preservation strategies, such as paper repositories, particularly in the era of mass digitization? What lessons might there be for digital preservation?


More information …

More in-depth article forthcoming …

Contact us with comments and questions:
• Brian Lavoie: [email protected]
• Roger C. Schonfeld: [email protected]