Authorities Futures T Hickey OCLC. Why authorities?

Post on 27-Mar-2015




Authorities Futures

T Hickey OCLC

Why authorities?

Searching

Browsing

Variations on Tchaikovsky

• NACO: Tchaikovsky, Peter Ilich,‡1840-1893
• German: Cajkovskij, Petr I.‡1840-1893
• French: Cajkovskij‡Piotr Ilʹic‡1840-1893
• Cyrillic: Чайкoвский, Пётр Ильич (1840-1893)
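Because one person is entered under many forms, an authority file works as a cross-reference index from variant forms to a single authorized heading. Below is a minimal sketch of that idea, not NACO's or OCLC's actual procedure. It folds case, diacritics, punctuation and the ‡ subfield marker before lookup. The names `normalize`, `INDEX` and `authorized_form`, and the folding rules themselves, are hypothetical.

```python
import unicodedata
from typing import Dict, Optional

# Display form returned for every variant (hypothetical choice: the NACO form).
AUTHORIZED = "Tchaikovsky, Peter Ilich, 1840-1893"

# Variant forms from the slide, used as cross-references.
VARIANTS = [
    "Tchaikovsky, Peter Ilich,‡1840-1893",   # NACO
    "Cajkovskij, Petr I.‡1840-1893",         # German
    "Cajkovskij‡Piotr Ilʹic‡1840-1893",      # French
    "Чайкoвский, Пётр Ильич (1840-1893)",    # Cyrillic
]


def normalize(heading: str) -> str:
    """Fold a heading to a comparison key (hypothetical rules)."""
    # Treat the subfield delimiter as a word break.
    text = heading.replace("‡", " ")
    # Decompose accented letters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold, turn punctuation into spaces, collapse runs of whitespace.
    text = "".join(ch if ch.isalnum() else " " for ch in text.casefold())
    return " ".join(text.split())


# Cross-reference index: normalized variant -> authorized heading.
INDEX: Dict[str, str] = {normalize(v): AUTHORIZED for v in VARIANTS}


def authorized_form(heading: str) -> Optional[str]:
    """Return the authorized heading for a known variant, else None."""
    return INDEX.get(normalize(heading))


if __name__ == "__main__":
    print(authorized_form("ČAJKOVSKIJ, Petr I. 1840-1893"))
    print(authorized_form("Tschaikowsky, Peter Iljitch 1840-1893"))
```

Folding maps "ČAJKOVSKIJ, Petr I." to the same key as the German form. A spelling that was never recorded as a cross-reference, such as "Tschaikowsky, Peter Iljitch", still returns `None`. That is why the number of variant forms matters.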

More ways to say Chajkowskii

• Ciaikovsky, Piotr Ilic 1840-1893
• Tschaikowsky, Peter Iljitch 1840-1893
• Tchaikowsky, Peter Iljitch 1840-1893
• Ciaikovsky, Pjotr Iljc 1840-1893
• Cajkovskij, P. I. 1840-1893
• Tsjaikovsky, Peter Iljitsj 1840-1893
• Czajkowski, Piotr 1840-1893
• Chaikovsky, P. I. 1840-1893
• Csajkovszkij, Pjotr Iljics 1840-1893
• Tsjaikovskiej, Pjotr Iljietsj 1840-1893
• Tjajkovskij, Pjotr Ilitj 1840-1893
• Caikovskis, P. 1840-1893
• Chaikovskii, Petr Ilʹich 1840-1893
• Tchaikovski, P. 1840-1893
• Tchaikovski, Piotr Ilyitch 1840-1893
• Chaikovskii, P. 1840-1893
• Tchaikovsky, P. 1840-1893
• Tchaikovsky, Piotr Ilitch 1840-1893
• Tschaikowsky, Pjotr Iljitsch 1840-1893

• Tschajkowskij, Pjotr Iljitsch 1840-1893
• Tchaikovski, P. I. 1840-1893
• Ciaikovskij, Piotr 1840-1893
• Ciaikovskji, Piotr Ilijich 1840-1893
• Tschaikowski, P. I. 1840-1893
• Tschaikowski, Peter Illic 1840-1893
• Tjajkovskij, Peter 1840-1893
• Chaikovski, Pʹotr Ilich 1840-1893
• Tschaikousky 1840-1893
• Tschaijkowskij, P. I. 1840-1893
• Tschaikowsky, P. I. 1840-1893
• Chaikovski, P. I. 1840-1893
• Tchaikovski, Petr Ilitch 1840-1893
• Ciaikovski, Peter Ilic 1840-1893
• Tschaikowski, Pjotr 1840-1893
• Tchaikowsky, Pyotr 1840-1893
• Sinopov, P. 1840-1893
• Tchaikovskij, Piotr Ilic 1840-1893
• 柴可夫斯基

Wider coverage
• Published, unpublished, objects, licensed, archival

Multiple sources
• Machine generated
• Information professionals, scholars, researchers, enthusiasts

Broader use of APIs
• Multiple views
• Better context
• Better navigation
• More mashups

Authorities touch everything

• 33 nodes
• 132 CPUs
• 528 gigabytes of memory
• 33 terabytes of disk

100-fold speed-up

• 1 hour → <1 minute
• 1 day → 15 minutes
• 1 month → 8 hours
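The cluster and speed-up figures follow from simple division. The totals work out to 4 CPUs, 16 GB of memory and 1 TB of disk per node. Dividing by 100 turns an hour into 36 seconds, a day into 14.4 minutes and a 30-day month into 7.2 hours, so the slide's "15 minutes" and "8 hours" look like rounded-up values. This is a back-of-the-envelope sketch. It assumes an even split across nodes and an ideal, linear 100-fold speed-up, neither of which the slides state.

```python
# Figures taken from the slides.
NODES, CPUS, MEMORY_GB, DISK_TB = 33, 132, 528, 33
SPEEDUP = 100

# Per-node resources, assuming the totals are split evenly across nodes.
per_node = {
    "cpus": CPUS // NODES,
    "memory_gb": MEMORY_GB // NODES,
    "disk_tb": DISK_TB // NODES,
}

HOUR = 3600
DAY = 24 * HOUR
MONTH = 30 * DAY  # assumption: a 30-day month


def after_speedup(seconds: float, speedup: float = SPEEDUP) -> float:
    """Run time in seconds after an ideal (perfectly linear) speed-up."""
    return seconds / speedup


if __name__ == "__main__":
    print(per_node)
    for label, secs in [("1 hour", HOUR), ("1 day", DAY), ("1 month", MONTH)]:
        t = after_speedup(secs)
        print(f"{label}: {t:.0f} s = {t / 60:.1f} min = {t / HOUR:.2f} h")
```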

• Controlling WorldCat
• Virtual International Authority File
• WorldCat Identities

Controlling names in WorldCat

• Has been done semi-manually
  – Encourages review of all links

• For Identities we did this automatically
  – Research copy of WorldCat
  – Very aggressive matching

• How to move links to WorldCat?

Pretend you are a Connexion Client

• Program to:
  – Log in
  – Search for record
  – Verify heading hasn’t changed
  – Insert authorized form
  – Add link
  – Do replace
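The steps above can be sketched as a small program. `FakeCatalog` is a hypothetical in-memory stand-in. It is not the Connexion client or any real OCLC API, and it only illustrates the order of operations. It also shows the safety check that skips a record whose heading changed after matching. Record numbers, headings and the authority identifier are made-up examples.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class Record:
    number: str                           # record number (hypothetical)
    heading: str                          # name heading as currently cataloged
    authority_link: Optional[str] = None  # link to the authority record, once added


class FakeCatalog:
    """Hypothetical in-memory stand-in for a cataloging service (not Connexion)."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records: Dict[str, Record] = {r.number: r for r in records}
        self._sessions: Set[str] = set()

    def login(self, user: str) -> str:
        token = f"session-{user}"
        self._sessions.add(token)
        return token

    def search(self, token: str, number: str) -> Optional[Record]:
        self._require(token)
        found = self._records.get(number)
        # Hand back a copy, so edits only take effect through replace().
        return dataclasses.replace(found) if found is not None else None

    def replace(self, token: str, record: Record) -> None:
        self._require(token)
        self._records[record.number] = record

    def _require(self, token: str) -> None:
        if token not in self._sessions:
            raise PermissionError("log in first")


def control_heading(catalog: FakeCatalog, token: str, number: str,
                    expected_heading: str, authorized: str, authority_id: str) -> bool:
    """Retrieve, verify, update and replace one record; False means skipped."""
    record = catalog.search(token, number)       # transaction 1: retrieve
    if record is None or record.heading != expected_heading:
        return False                             # heading changed since matching
    record.heading = authorized                  # insert authorized form
    record.authority_link = authority_id         # add link
    catalog.replace(token, record)               # transaction 2: replace
    return True


if __name__ == "__main__":
    catalog = FakeCatalog([Record("1", "Cajkovskij, Petr I.")])
    token = catalog.login("batch")
    print(control_heading(catalog, token, "1", "Cajkovskij, Petr I.",
                          "Tchaikovsky, Peter Ilich, 1840-1893", "auth-0001"))
```

Returning `False` instead of overwriting keeps a batch job from clobbering a heading a cataloger edited after the match was computed. Those records can be queued for review.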

Then just replace 26 million records

• Each update takes two transactions
  – Retrieve the record
  – Replace the record

• If it takes 2 seconds/update
  – 52,000,000 seconds
  – ~2 years
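The single-client estimate can be checked directly. 26 million updates at 2 seconds each (retrieve plus replace) comes to 52 million seconds. With 365-day years, that is about 1.65 years, which the slide rounds to ~2 years. The sketch assumes one client working strictly sequentially with no downtime.

```python
RECORDS = 26_000_000            # records to update (from the slide)
SECONDS_PER_UPDATE = 2          # retrieve + replace, from the slide's assumption
SECONDS_PER_YEAR = 365 * 24 * 3600

total_seconds = RECORDS * SECONDS_PER_UPDATE
years = total_seconds / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"{total_seconds:,} seconds = {years:.2f} years")
```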

But, we can run multiple clients

• Connexion can handle 40+ of these clients
  – ~20 records/second

• Offline processing has limited capacity
  – Run 32 clients for 12 hours at 16 updates/second
  – ~700,000 overnight
  – Up to a million/day

• 3 million/week
• 2-3 months elapsed time
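These throughput figures hang together under one assumption: each client still spends 2 seconds per update and clients do not slow each other down. On that basis, 40 clients give 20 updates/second and 32 clients give 16/second. A 12-hour overnight window yields 691,200 updates, the slide's "~700,000". At 3 million a week, 26 million records take about 8.7 weeks, roughly two months. The no-contention assumption is ours and is not stated on the slides.

```python
RECORDS = 26_000_000
SECONDS_PER_UPDATE = 2.0   # same per-update cost as the single-client estimate


def updates_per_second(clients: int) -> float:
    """Aggregate rate, assuming clients run independently with no contention."""
    return clients / SECONDS_PER_UPDATE


peak = updates_per_second(40)                    # "40+ clients"
overnight = updates_per_second(32) * 12 * 3600   # 32 clients for 12 hours
weeks = RECORDS / 3_000_000                      # at 3 million/week

if __name__ == "__main__":
    print(f"peak {peak:.0f}/s, overnight {overnight:,.0f}, {weeks:.1f} weeks")
```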

Virtual International Authority File

VIAF

• DNB (Deutsche Nationalbibliothek) Bib & Authority
• BnF (Bibliothèque nationale de France) Bib & Authority
• LC (Library of Congress) Bib & Authority

VIAF

• ~7.5 million personal name authority records
• ~25 million bibliographic records
• ~1.2 million links between files

Match on

• Names and dates in headings
• Standard numbers
• Titles
• Coauthors
• Publishers
• Personal name as subject
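Once candidate records from two files have been paired, the kinds of evidence listed above can be compared field by field. This is a minimal sketch with hypothetical field names and a deliberately simple rule: any shared evidence counts. The slides do not give VIAF's actual thresholds or how it handles partial dates and partial titles. Titles, coauthors and publishers are assumed to arrive as pre-normalized strings, and the example data is made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One file's view of a person (hypothetical structure)."""
    heading: str
    birth: Optional[int] = None
    death: Optional[int] = None
    titles: FrozenSet[str] = frozenset()
    coauthors: FrozenSet[str] = frozenset()
    publishers: FrozenSet[str] = frozenset()
    standard_numbers: FrozenSet[str] = frozenset()


def match_evidence(a: Candidate, b: Candidate) -> List[str]:
    """List the kinds of corroborating evidence two candidates share."""
    evidence: List[str] = []
    if a.birth is not None and a.death is not None and (a.birth, a.death) == (b.birth, b.death):
        evidence.append("double date")
    if a.titles & b.titles:
        evidence.append("title")
    if a.coauthors & b.coauthors:
        evidence.append("joint author")
    if a.publishers & b.publishers:
        evidence.append("publisher")
    if a.standard_numbers & b.standard_numbers:
        evidence.append("standard number")
    return evidence


def is_match(a: Candidate, b: Candidate) -> bool:
    # Hypothetical rule: headings are not compared directly, because the
    # same person may be entered in different languages or scripts.
    return bool(match_evidence(a, b))
```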

Matching situations

Hickey, Thomas Butler, ‡d 1947-

Tchaikovsky, Peter Ilich

Cajkovskij, Petr I.

Cajkovskij, Petr I. / Tchaikovsky, Peter Ilich / Чайкoвский, Пётр Ильич

What makes a match?

Count  Match evidence
1,338,606  Title
526,234  Double date
67,749  Joint author
47,499  LCCN
15,867  Partial date and partial title
6,454  Partial date and publisher
4,673  Partial title and publisher
4,116  Name as subject
2,158  Standard number

Next steps for VIAF

• Merged display
• Better documentation
• More participants
• Geographics

Australian Identities (in WorldCat)

51,399 Keneally, Thomas
42,679 Fox, Mem
30,301 Travers, P. L.
28,998 Lindsay, Jack
19,179 Marsden, John
16,688 Stead, Christina
15,041 Malouf, David
14,717 Jennings, Paul
13,769 Lawson, Henry
12,612 Winton, Tim

Editing

Merged result

• Immediately visible in Identities
• Persistent in Identities
• Information fed into established channels

Implementation

• SRU/SRW server (Z39.50 for the Web)
• XML returned
• XSLT style sheets transform it to HTML
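An SRU client sends a searchRetrieve request as URL parameters and gets XML back. This is the XML a server-side XSLT would turn into HTML. The sketch below builds an SRU 1.1 request and pulls the hit count and record data out of a response. The endpoint `SRU_BASE`, the CQL index `local.name` and the `urn:example:identity` record format are placeholders, not the actual Identities service. The XSLT step is not shown, since Python's standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

SRU_BASE = "http://example.org/sru"                 # hypothetical endpoint
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}    # SRU 1.1 response namespace


def search_url(cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request URL."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def parse_response(xml_text: str) -> Tuple[int, List[str]]:
    """Return the hit count and the text content of each recordData element."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("srw:numberOfRecords", default="0", namespaces=SRW_NS))
    data = [
        "".join(rd.itertext()).strip()
        for rd in root.findall("srw:records/srw:record/srw:recordData", SRW_NS)
    ]
    return count, data


# Made-up response in the SRU 1.1 shape, for illustration.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>1</numberOfRecords>
  <records>
    <record>
      <recordData><name xmlns="urn:example:identity">Hickey, Thomas Butler, 1947-</name></recordData>
    </record>
  </records>
</searchRetrieveResponse>"""

if __name__ == "__main__":
    print(search_url('local.name = "hickey"'))
    print(parse_response(SAMPLE))
```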

Syndication

• Searchable via SRU, OpenURL
• Sitemaps for harvesters
• HTML for harvesters and mobile devices
• Links in Wikipedia
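Sitemaps tell harvesters which pages exist. In the sitemaps.org format, each page is a `<url><loc>…</loc></url>` entry inside a `<urlset>`, and one file may list at most 50,000 URLs. This is a minimal sketch that writes one or more sitemap files for Identities-style page URLs. The function names are hypothetical. The sitemap index file that would tie several sitemaps together is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000   # per-file limit in the sitemaps.org protocol


def sitemap_xml(urls: Iterable[str]) -> str:
    """Serialize one <urlset> sitemap; ElementTree escapes characters such as '&'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sitemaps(urls: List[str], limit: int = MAX_URLS) -> List[str]:
    """Split a URL list into as many sitemap files as the per-file limit requires."""
    return [sitemap_xml(urls[i:i + limit]) for i in range(0, len(urls), limit)]


if __name__ == "__main__":
    print(sitemap_xml(["http://worldcat.org/identities/lccn-n82-54463"]))
```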

More Identities

Thomas Hickey
Chief Scientist
OCLC

hickey@oclc.org
http://worldcat.org/identities/lccn-n82-54463
http://orlabs.oclc.org/viaf/LC|n82054463
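The two links carry the same Library of Congress control number in two spellings: "n82-54463" in the Identities URL and "n82054463" in the VIAF URL. Converting between them follows the LCCN normalization the Library of Congress publishes, as we understand it. That rule removes blanks, drops anything from a "/" onward, and deletes the hyphen while left-padding the serial part to six digits with zeros. `viaf_lc_key` is a hypothetical helper whose "lccn-" and "LC|" handling is inferred only from the two URLs above.

```python
def normalize_lccn(lccn: str) -> str:
    """Normalize an LCCN (LC rules, as understood here)."""
    s = "".join(lccn.split())          # remove all blanks
    s = s.split("/", 1)[0]             # drop any "/" and everything after it
    if "-" in s:
        prefix, serial = s.split("-", 1)
        s = prefix + serial.zfill(6)   # left-pad the serial part to six digits
    return s


def viaf_lc_key(identities_slug: str) -> str:
    """Hypothetical helper: 'lccn-n82-54463' -> 'LC|n82054463'."""
    prefix = "lccn-"
    if not identities_slug.startswith(prefix):
        raise ValueError(f"not an LCCN-based Identities slug: {identities_slug!r}")
    return "LC|" + normalize_lccn(identities_slug[len(prefix):])


if __name__ == "__main__":
    print(viaf_lc_key("lccn-n82-54463"))
```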