IMPACT Final Conference - Richard Boulderstone

IMPACT Conference 2011 Richard Boulderstone Director, eStrategy & Programmes October 2011


Richard Boulderstone gives first keynote - Strategic Digital Overview at the BL

Transcript of IMPACT Final Conference - Richard Boulderstone

Page 1: IMPACT Final Conference - Richard Boulderstone

IMPACT Conference 2011

Richard Boulderstone

Director, eStrategy & Programmes

October 2011

Page 2: IMPACT Final Conference - Richard Boulderstone

Fantastic Project!


Highly collaborative

Addressing a common set of issues across Europe

Will have multi-year benefits for organisations that do digitisation

Will result in much richer and more value-added applications

Will benefit the citizens of Europe for many years to come

Could finish here… However, I would like to talk about:

My views on print, digitisation, OCR, apps and the future…

Page 3: IMPACT Final Conference - Richard Boulderstone


The British Library

Exists for everyone who wants to do research – for academic, personal, and commercial purposes.

Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences…

Receives a copy of every item published in the UK.

Holds over 150 million items, with 3 million items added each year.

Used by over 16,000 people each day (on site and online).

Page 4: IMPACT Final Conference - Richard Boulderstone


2020 Mission & Vision

Digitisation provides long-lasting digital copy

Digital content can support advanced analysis

Digital is easier to access

Digital content has much greater reach

We can only accomplish these objectives with partners

Page 5: IMPACT Final Conference - Richard Boulderstone


Physical Collections

Physical Item

British Library has 150M items in collection

Estimated number of pages: 5,000M

Therefore average number of pages per item = 33

CENL (Conference of European National Libraries) Survey 2006: 400M items in national libraries

Estimate 13,200M pages (33 * 400M)

Lots to digitise!
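The arithmetic behind this estimate can be checked with a short sketch. The figures are the ones quoted on the slide. The variable names are illustrative, and truncating the average to 33 is an assumption made to match the slide's own working.

```python
# Figures quoted on the slide, in millions.
BL_ITEMS_M = 150       # British Library collection items
BL_PAGES_M = 5_000     # estimated pages in the BL collection
CENL_ITEMS_M = 400     # items in European national libraries (CENL Survey 2006)

# Average pages per item; the slide uses the truncated value 33.
avg_pages_exact = BL_PAGES_M / BL_ITEMS_M
avg_pages = int(avg_pages_exact)

# Extrapolate the BL average to all CENL national libraries.
cenl_pages_m = avg_pages * CENL_ITEMS_M

print(f"Average pages per item: {avg_pages_exact:.2f} (slide uses {avg_pages})")
print(f"Estimated CENL pages: {cenl_pages_m:,}M")
```

The extrapolation assumes other national libraries have roughly the same pages-per-item profile as the British Library, which the slide takes as given.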

Page 6: IMPACT Final Conference - Richard Boulderstone


Digital not Digitalis

Born-Digital: Normally contemporary material that we acquire in digital form (eJournals, eBooks, Web Sites, etc.).

Digitised: Digital image of physical collection item (Newspapers, Books, Manuscripts, Journals, Audio, etc.).

Not… Digitalization: The administration of digitalis (fox glove) or one of its active constituents to a patient or an animal so that the required physiological changes occur in the body; also, the state of the body resulting from this. (Oxford English Dictionary)

Page 7: IMPACT Final Conference - Richard Boulderstone


Digitisation – Create Images

Physical Item → Digitisation → Digitised Item

BL has digitised 57M objects, around 1% of physical collection

However, partnership with Brightsolid is digitising the newspaper collection (fee service): up to an additional 40M pages

Google to digitise 250,000 books (80M pages)

Cost to digitise: initially much more than £1 per page, more recently less than £1 per page

For entire BL collection, estimated storage required at 10 Mbytes/page is 50 Petabytes (5 * 10^16 bytes)

CENL Survey 2006: 4.8M items; 2012 projection: 17M items (~4%)
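The storage and coverage figures above can be reproduced with a minimal sketch. It assumes decimal units (1 Mbyte = 10^6 bytes, 1 Petabyte = 10^15 bytes), takes the 5,000M page and 400M item totals from the Physical Collections slide, and assumes the 57M "objects" are compared against pages, since that is what yields roughly 1%.

```python
# Assumption: decimal units, as implied by the slide's 5 * 10^16 figure.
MBYTE = 10**6
PETABYTE = 10**15

BL_PAGES = 5_000 * 10**6          # estimated pages in the BL collection
BYTES_PER_PAGE = 10 * MBYTE       # 10 Mbytes per digitised page

total_bytes = BL_PAGES * BYTES_PER_PAGE
total_petabytes = total_bytes / PETABYTE

# Share of the BL collection digitised so far (objects compared against pages).
bl_digitised_pct = 100 * 57_000_000 / BL_PAGES

# CENL 2012 projection: 17M digitised items out of 400M held.
cenl_2012_pct = 100 * 17_000_000 / 400_000_000

print(f"Storage: {total_bytes:.0e} bytes = {total_petabytes:.0f} PB")
print(f"BL digitised so far: {bl_digitised_pct:.1f}%")
print(f"CENL 2012 projection: {cenl_2012_pct:.2f}%")
```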

Page 8: IMPACT Final Conference - Richard Boulderstone


OCR – Gateway to Advanced Digital Functionality

Physical Item → Digitisation → Digitised Item → Optical Character Recognition → Digital Item

Digital Item (METS XML, truncated):

<?xml version="1.0" encoding="UTF-8" ?>

<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mets="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2 …

OCR works very well for modern collections, with high accuracy rates

However, some way to go for older material (Going Grey? Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images, Tracy Powell & Gordon Paynter, NLNZ)

Vital for advanced digital functionality

IMPACT has made significant progress in this area

How good can it get? Rose Holley, NLA

Require high accuracy for researchers to trust. Good: 98-99%. Poor: below 90%.
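As a rough illustration of how accuracy bands like these might be applied, the sketch below scores OCR output against a hand-corrected reference. The slide does not say how accuracy is measured, so character accuracy based on edit distance is an assumption here. The function names are hypothetical, and the range between 90% and 98% is left unlabelled because the slide gives no name for it.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 1 - edit distance / reference length, floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - levenshtein(reference, ocr_output) / len(reference))


def accuracy_band(accuracy: float) -> str:
    """Bands from the slide: good at 98% and above, poor below 90%."""
    if accuracy >= 0.98:
        return "good"
    if accuracy < 0.90:
        return "poor"
    return "unlabelled"  # the slide gives no name for 90-98%


reference = "The British Library holds over 150 million items."
ocr_output = "The Brltish Library ho1ds over 150 million iterns."
accuracy = char_accuracy(reference, ocr_output)
print(f"{accuracy:.1%} -> {accuracy_band(accuracy)}")
```

Even a handful of misread characters in one sentence drops the score below the "good" band, which is why older material with degraded print is so much harder to make trustworthy for researchers.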

Page 9: IMPACT Final Conference - Richard Boulderstone


Adding Value To Collection Items

Physical Item → Digitisation → Digitised Item → Optical Character Recognition → Digital Item (METS XML)

Indexing

Basic Search & Discovery

Text Analysis

Text Mining

Image Comparison

Specialist Applications

Application Programming Interface (API)

Social Networking

Collect & Store: Comments, Annotations, Additions

Do we need all these applications?

Are they value for money?

Page 10: IMPACT Final Conference - Richard Boulderstone


Commercial Break…..

Page 11: IMPACT Final Conference - Richard Boulderstone


Value of Digitisation

Splashes and Ripples: Synthesizing the Evidence on the Impact of Digital Resources, 2011 - Eric T. Meyer, Oxford Internet Institute

JISC-funded review of the value of digitisation projects

Examined 12 JISC-funded digitisation projects

Various types of benefits analysed:

Quantitative: Analytics, Income, Log Files, Scientometrics, Surveys, Webometrics

Qualitative: Content Analysis, Feedback, Focus Groups, Interviews, Referrer

Page 12: IMPACT Final Conference - Richard Boulderstone


Webometrics for 12 JISC-funded Digitisation Projects

Monthly statistics for 12 JISC-funded Digitisation Projects

Does this tell us whether we should do these projects?....

Page 13: IMPACT Final Conference - Richard Boulderstone


Print vs Digital

Factor | Print | Digital | Winner
Durability | Good (some not so good, e.g. newspapers) but eventually destroyed through use | Requires specialist system to retain for ever, but possible | Tie
Look & Feel | Original item | Good simulations possible; also multi-layer digitisation; electronic comparisons provide additional utility | Tie
Search | Only catalogue | With good OCR, full text | Digital
Distribution | Slow, expensive & cumbersome | Fast, cheap, entire internet | Digital
Linking, text mining, social networking | Not possible | Potentially | Digital
Revenue | Very limited opportunities | Already have a number of revenue-generating apps | Digital

Digital Wins!!!

Page 14: IMPACT Final Conference - Richard Boulderstone


CENL Survey Digitised Items: Potential

Enormous Potential for digitisation

“If we match the total physical holdings national libraries against digital holdings (objects) of a library it becomes clear that content digitisation still in its infancy and how enormous the potential for digitisation of content in National Libraries is.”

Page 15: IMPACT Final Conference - Richard Boulderstone


My Vision

Cost reductions in storage technologies, mass digitisation processes and application development make it possible for the first time to imagine digitising the entire holdings of major Libraries.

This creates the opportunity to allow all citizens to experience, enjoy, learn from and build on the World’s Knowledge.

Page 16: IMPACT Final Conference - Richard Boulderstone


Concluding Comments

Digitisation projects have created a fantastic resource for scholars, researchers and the public

European National Libraries, including the British Library, will have digitised around 4% of their collections by 2012

Funding, standards, copyright, technology and interoperability will remain major issues

However, these programmes have the potential to radically improve access to collections across Europe and beyond

We will need to work together to unleash the potential of these resources…

IMPACT is a great example of this collaboration