Data Mining the Largest Library Database in the World

23
Data Mining the Largest Library Database in the World Roy Tennant OCLC Research Leveraging WorldCat

description

Presented at the OCLC EMEA Regional Council Meeting, 26 February 2013, Strasbourg, France

Transcript of Data Mining the Largest Library Database in the World

Page 1: Data Mining the Largest Library Database in the World

Data Mining the Largest Library Database in the World

Roy TennantOCLC Research

Leveraging WorldCat

Page 2: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Worldcat.org/identities/

Algorithmically constructed from WorldCat records

Page 3: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Viaf.org

A Union database of authority records

Page 4: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

The Responsible Party

Thom HickeyChief Scientist

OCLC Research

Page 5: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

290+ million records

Page 6: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Language Coverage

Percentage of records for non-English materials

30 June 2012

60.2%

274 million

36.5 million

25.5 million11.3

million4.7 million4.3 million3.6 million3.5 million

Total

GermanFrenchSpanishItalianDutch Russian Latin

Page 7: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Worldcat.org/identities/

Page 8: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 9: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 10: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

(J.K. Rowling)

(Diana Gabaldon)

(Galileo)

Page 11: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 12: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 13: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Viaf.org

Page 14: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

VIAF Participants

Page 15: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 16: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

“Super” Authority File

Page 17: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 18: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 19: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Our Cataloging Future

“Moving from cataloging to catalinking”

Eric Miller, Zepheira

Page 20: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Page 21: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Some Lessons• Widespread collaboration is essential• Normalizing the data is essential• Normalizing the data is complicated• Everything is interrelated:

– You can’t bring names together if titles don’t match

– You can’t bring titles together if names don’t match

• Batch mode processing still rules (but we’re getting better and faster at it)

Page 22: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Conclusions

• Data mining isn’t just useful, it’s essential• Extracting data from MARC that is useful in

other contexts is possible, but will require sophisticated processing

• Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work

• Thankfully, we are doing it, but there is much more to be done

Page 23: Data Mining the Largest Library Database in the World

E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L

Roy Tennant

[email protected]

@rtennant

roytennant.com