Adolf Knoll, National Library of the Czech Republic
MANUSCRIPTORIUM: SEAMLESS ACCESS TO OLD EUROPEAN WRITTEN HERITAGE
Digitizing manuscripts
1992-1993 – pilot projects for UNESCO
1995-1996 – starting routine work
2000 – launch of national programme for digitization of old manuscripts
2003 – launch of Manuscriptorium DL
2007-2009 – EU ENRICH project to support aggregation service
Today – still growing
Metadata framework
1996 – own SGML approach (a kind of predecessor of XML) – DOBM language (in 1999 recommended by UNESCO for the Memory of the World programme)
2002 – TEI P4 extended MASTER approach (masterx.dtd)
2009 – TEI P5 schema for description of manuscripts (enrich.xsd) / METS rejected
2012/2013 – inclusion of long-term preservation metadata
Two migrations of complex digital documents until co-development of the fully international solution based on TEI P5.
Providing access
In the beginning only off-line
Several manuscripts mounted on the web
Researchers showed interest in on-line access
Manuscriptorium Digital Library launched 10 years ago
Manuscript owners had to agree
Manuscriptorium Digital Library
Central database
Remote data repositories: those of Manuscriptorium and of partner digital libraries
Metadata
TEI P5 enrich.dtd internal format
Document description
Structural map
Possibly image description
Data
WWW recommended formats (JPG, PNG, GIF)
Tile solution for maps
Full texts (TXT, TEI)
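A minimal sketch of how the parts named above (document description, structural map, image references) might be read from a TEI P5 file. It assumes the description sits in msDesc inside the TEI header and that the page structure is carried by facsimile/surface/graphic elements; the shelfmark, library name and URLs are invented for illustration, and the actual enrich schema may constrain these elements differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical record: identifiers and URLs are invented for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example manuscript</title></titleStmt>
      <publicationStmt><p>Illustrative record</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <settlement>Prague</settlement>
            <repository>Example Library</repository>
            <idno>MS X 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <surface n="1r"><graphic url="http://partner.example.org/img/0001.jpg"/></surface>
    <surface n="1v"><graphic url="http://partner.example.org/img/0002.jpg"/></surface>
  </facsimile>
</TEI>"""


def read_record(xml_text):
    """Return (description, pages) extracted from one TEI P5 record."""
    root = ET.fromstring(xml_text)
    # Document description: the manuscript identifier block.
    ident = root.find(".//tei:msDesc/tei:msIdentifier", TEI_NS)
    description = {
        "settlement": ident.findtext("tei:settlement", namespaces=TEI_NS),
        "repository": ident.findtext("tei:repository", namespaces=TEI_NS),
        "idno": ident.findtext("tei:idno", namespaces=TEI_NS),
    }
    # Structural map: ordered pages, each pointing to an image on a remote server.
    pages = [
        (surface.get("n"), surface.find("tei:graphic", TEI_NS).get("url"))
        for surface in root.findall("tei:facsimile/tei:surface", TEI_NS)
    ]
    return description, pages


description, pages = read_record(SAMPLE)
```

The point of the sketch is that the metadata file only references the images by URL, so the images themselves can stay wherever the partner keeps them.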
The problem
Rare collections dispersed in space
Users need to travel:
Physically from one place to another
Virtually from one application to another (different behaviours, rights, tools, opportunities, etc.)
Solution: to take everything under one interface:
Portal: users are navigated to remote applications
Digital Library: users work in one place
Digital library
Central model, e.g. World Digital Library:
Metadata are in the central database
Data (images, full texts) are in the central data repository
Distributed model, Manuscriptorium:
Metadata are in the central database
Data (images, full texts) are in partner repositories
Growth secured through repeated harvests of descriptions and structures
Parallel re-use of data
Virtual aggregation
[Diagram: MNS with its central database and data repository; partner image repositories P1, P2, P3, … Pm, Pn, Po, Px, Pz (P – image repository)]
Seamless aggregation
All metadata indexed in the central database incl. the structure
Images from partner repositories called into the unique presentation interface
Browsing as if everything were in one place
Enhanced use of images
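The idea can be illustrated with a toy model: a central index holds descriptions and page structure, while every page image is only a URL pointing to a partner server. The class and field names below are hypothetical and the URLs are invented; this is a sketch of the principle, not of the actual Manuscriptorium implementation.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Document:
    doc_id: str
    title: str
    page_urls: list = field(default_factory=list)  # images stay on partner servers


class CentralIndex:
    """Toy central database: metadata and structure only, no image files."""

    def __init__(self):
        self._docs = {}

    def ingest(self, doc):
        self._docs[doc.doc_id] = doc

    def search(self, term):
        """Search across all partners at once, because all metadata are central."""
        term = term.lower()
        return [d for d in self._docs.values() if term in d.title.lower()]

    def viewer_pages(self, doc_id):
        """Return (page number, image URL) pairs for one shared viewer."""
        return list(enumerate(self._docs[doc_id].page_urls, start=1))

    def hosts(self):
        """List the partner servers that actually deliver the images."""
        return sorted(
            {urlparse(u).netloc for d in self._docs.values() for u in d.page_urls}
        )


index = CentralIndex()
index.ingest(Document("a-1", "Psalter", ["http://lib-a.example.org/p1.jpg"]))
index.ingest(Document("b-7", "Psalter fragment", ["http://lib-b.example.org/f1.jpg",
                                                  "http://lib-b.example.org/f2.jpg"]))
hits = index.search("psalter")
```

Here a single search returns documents from two different partners, and the viewer simply receives the remote image URLs in page order.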
Cooperation
OAI harvest of agreed profiles
Profiles as large as possible
Internal TEI P5 format able to accommodate:
Library descriptions (MARC-based)
Scientific descriptions (TEI-based)
Off-line batch ingest where OAI inapplicable
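A minimal sketch of one harvesting step over OAI-PMH, assuming a partner exposes a standard ListRecords endpoint. The base URL, the metadata prefix "tei_p5" and the sample response are hypothetical; the real agreed profiles and prefixes are not given in the slides. The sketch only builds the request URL and parses a response, so it runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response from a partner repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:partner.example.org:ms-1</identifier></header></record>
    <record><header><identifier>oai:partner.example.org:ms-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH the resumptionToken is an exclusive argument:
        # it is sent alone, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.findall(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return identifiers, (token or None)  # an empty token marks the last page


first_url = list_records_url("http://partner.example.org/oai", metadata_prefix="tei_p5")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://partner.example.org/oai", resumption_token=token)
```

A full harvester would repeat the request with each returned token until none is left, then convert the harvested records into the internal format.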
Production for Manuscriptorium
Partner has images without suitable metadata (description & structure)
M-TOOL application, now online, producing TEI P5 (enrich.dtd) compatible files
M-CAN application for upload, control, and offer of XML files (behaviour as if in real Manuscriptorium), while images are stored on home servers
User personalization
User personal library for:
His virtual collections
○ Static
○ Dynamic
His virtual documents (any file from any partner library can become a component part of a new document; this one can be described in M-TOOL online in conformity with the TEI P5 specification for description of manuscripts – enrich.dtd)
Manuscriptorium placement
[Diagram: MNS; partner repositories P1 … Pw, Px, Py, Pz; EUROPEANA, TEL, PRIMO, SUMMON, EBSCO DS, CZgateway, CERLMSS]
From whom do the data come
Czech Republic:
National Library (3320)
Moravian Library (470)
Strahov Monastery (319)
National Museum Library (272)
…
Abroad:
Universidad Complutense, Madrid (2902)
Свято-Троицкая Сергиева Лавра (2668)
UnivLib Wroclaw (1839)
UnivLib Köln (1634) – several administered collections
NL, Italy, Firenze (1566)
NL, Spain, Madrid (1444)
Reykjavík (1176) – NL + Arne Magnusson Found.
UnivLib Vilnius (1085)
UnivLib Heidelberg (1025)
eCodices* Switzerland (889)
NL, Romania, Bucureşti (393)
UnivLib Bratislava (241)
UnivLib Zielona Góra (231)
…
23,655 digitized docs, of which 18,077 from abroad, i.e. 76.4% (Dec. 2013)
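The quoted share can be checked from the two totals on the slide. The variable names are illustrative only.

```python
# Totals as given on the slide.
total_docs = 23_655
from_abroad = 18_077

# Share of documents contributed by partners outside the Czech Republic.
share_abroad = round(100 * from_abroad / total_docs, 1)

# Remaining documents come from Czech institutions.
from_czech_republic = total_docs - from_abroad

print(f"{share_abroad}% from abroad, {from_czech_republic} docs from the Czech Republic")
```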
Traffic generators: all visits
1. Direct: 23.47%
2. Google: 21.78%
3. Europeana: 13.89%
4. NL CZ: 5.95%
5. Seznam: 3.58%
6. Cs.wikipedia.org: 2.58%
7. Vychodoceskearchivy.cz: 2.41%
8. Dasp.at: 1.16%
9. Facebook: 0.80%
10. … other partners …
16. TEL: 0.49%
August 2012 – July 2013
Traffic generators: referencing pages (50.52%)
1. Europeana: 27.50%
2. NL CZ: 11.77%
3. Wikipedia CZ: 5.11%
…
6. Facebook: 1.59%
13. TEL: 0.98%
August 2012 – July 2013
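The two traffic slides appear to be related: if the 50.52% is read as the share of all visits that arrive via referencing pages (an interpretation, not stated on the slides), each source's share of referrals is its share of all visits divided by 50.52%. The sketch below recomputes the figures under that assumption, also assuming that "Cs.wikipedia.org" and "Wikipedia CZ" denote the same source.

```python
# Share of all visits arriving via referencing pages (assumed reading of the slide).
REFERRAL_SHARE = 50.52

# Percentages of all visits, from the "all visits" slide.
ALL_VISITS = {"Europeana": 13.89, "NL CZ": 5.95, "Wikipedia CZ": 2.58,
              "Facebook": 0.80, "TEL": 0.49}

# Percentages reported on the "referencing pages" slide.
REPORTED = {"Europeana": 27.50, "NL CZ": 11.77, "Wikipedia CZ": 5.11,
            "Facebook": 1.59, "TEL": 0.98}


def share_of_referrals(share_of_all_visits, referral_share=REFERRAL_SHARE):
    """Convert a share of all visits into a share of referred visits."""
    return 100 * share_of_all_visits / referral_share


recomputed = {name: round(share_of_referrals(value), 2)
              for name, value in ALL_VISITS.items()}

for name in ALL_VISITS:
    print(f"{name}: recomputed {recomputed[name]}%, reported {REPORTED[name]}%")
```

Under this reading the recomputed values agree with the reported ones to within rounding of the published percentages.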
From which countries do the users come
2009 - 2012:
1. Inland (CZ) – 54.3%
2. Germany – 5.5%
3. Poland – 4.3%
4. U.S.A. – 4.0%
5. France – 2.8%
6. Slovakia – 2.7%
7. Italy – 2.7%
8. Spain – 2.6%
9. Austria – 2.5%
10. Romania – 2.1%
2011 - 2012:
1. Inland (CZ) – 52.5%
2. Germany – 5.5%
3. Poland – 4.4%
4. U.S.A. – 3.9%
5. Italy – 3.2%
6. Spain – 2.9%
7. Austria – 2.8%
8. France – 2.8%
9. Romania – 2.5%
10. Slovakia – 2.4%
Known problems
Technical/organizational:
Partner servers do not function
Permanent URLs of images have been changed without update of the OAI harvested profiles
Funding esp. for faster development
Political/cultural:
We are not sure about inclusion of documents from Eastern Asia
Some people, institutions or some countries may dislike aggregation operated by a Czech institution
Some people are unwilling to make their collections widely accessible
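The first two technical problems (unreachable partner servers, image URLs changed without a new harvest) could in principle be detected by probing the image URLs stored in the harvested profiles. The sketch below is one hypothetical way to do that; the function names and the shape of the `profiles` mapping are assumptions, and nothing here describes how Manuscriptorium actually monitors its partners.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=10):
    """Return True if a HEAD request to the URL succeeds with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors (e.g. 404), unreachable servers and timeouts.
        return False


def find_broken_images(profiles, probe=http_probe):
    """Map each document ID to the image URLs that no longer respond.

    `profiles` is assumed to map a document ID to the list of image URLs
    taken from its harvested record.
    """
    broken = {}
    for doc_id, urls in profiles.items():
        dead = [url for url in urls if not probe(url)]
        if dead:
            broken[doc_id] = dead
    return broken
```

The `probe` parameter is injectable so the check can be tried with a stand-in function, without contacting any server:

```python
profiles = {"ms-1": ["http://a.example.org/1.jpg", "http://a.example.org/2.jpg"],
            "ms-2": ["http://b.example.org/1.jpg"]}
report = find_broken_images(profiles, probe=lambda url: not url.endswith("2.jpg"))
```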
Near future if funded enough for development …
Further aggregation
Solution to linguistic problems:
Grapheme variation
External thesauri
Imaging: centrally stored images can be pre-processed to create metadata for search of objects within them
Mark-up of music documents
New and more user-friendly interface
www.manuscriptorium.eu
The Manuscriptorium Digital Library is operated by AiP Beroun Ltd. on behalf of the National Library of the Czech Republic
The National Library:
does not generate any income from Manuscriptorium services
is today the only funding body of Manuscriptorium operation and development (directly or via projects)
www.manuscriptorium.eu
Virtual research environment:
Seamless aggregation, i.e. real-time work on geographically dispersed resources
Saving time and money of researchers (neither physical nor virtual travelling/navigation)
Integrated on-line tools
You are welcome to join us [email protected]
August 2013: 24,892 digitized docs; more than 600 full texts; 303,542 descriptive records