
The Digitisation Process at the

Biblioteca Nacional de España.

HISPANIC DIGITAL LIBRARY

Last update 26/03/2013


http://bdh.bne.es/bnesearch Last update 26/03/2013 Page2

1. CONTENTS

The Digitisation Process at the Biblioteca Nacional de España. Hispanic Digital Library
1. Contents
2. The Hispanic Digital Library (BDH)
3. Prior to digitisation
4. Digitisation
5. Master and Derivative Files
6. Quality Control
7. Metadata
8. Technological Environment
9. Master File Transfer
10. Search Engine
Glossary of Terms and Abbreviations


2. THE HISPANIC DIGITAL LIBRARY (BDH)

The Hispanic Digital Library (BDH) is an online resource that allows users to consult, free of charge, tens of thousands of documents conserved in the holdings of the Biblioteca Nacional de España (BNE). The portal was created in 2008 to contribute to the BNE's task of conserving, managing and disseminating Spain's bibliographic heritage on any medium, and to showcase a systematic digitisation project which began at that time. It originally consisted of the collection of digitised works gradually built up by the BNE over the years as the result of isolated digitisation or copying projects carried out by the Preservation and Conservation Department.

The launch of the portal in 2008 was a clear indication of the BNE's intention both to create a single site for consulting digital works and to digitise its holdings systematically. The project was driven forward by additional funding from Telefónica, which sponsored the undertaking for five years, from 2008 to 2012.

At this stage, systematic digitisation of BNE collections should be considered less a project than a process, one which affects almost all the BNE Departments and Areas. Today, the whole process, from deciding which works to digitise to making them accessible from the portal, has been implemented in nearly all the different services of the Library. This process should therefore be understood not only in terms of improving the services offered to users, but also in terms of the effort required of both the BNE and its staff to adapt to the change.

Although originally based on existing collections, since its launch the project has called for innovations at every stage of its development. We have had to define work procedures, set selection criteria and quality controls, design and implement a new search interface, etc. In addition to the difficulties inherent to the task, it is important to remember that technology is constantly evolving, and criteria and processes have to be continually reviewed and updated.

For some time now, the BNE has been involved in another large project involving the systematic digitisation of old newspapers, co-ordinated by the Newspaper and Magazine Department. This project, the Digital Periodical and Newspaper Library, has succeeded in scanning millions of pages of great historical value that are highly appreciated by users. Since January 2011, BDH users have been able to consult the publications digitised under this project, thereby achieving the initial goal of giving access to all digitised documents in the BNE from a single point of entry.

As a result, the BDH has, in turn, become a small (or not so small) library that has to organise its holdings, describe them, maintain a catalogue, promote access and provide user services; tasks which, as we mentioned earlier, in one way or another involve all BNE departments.


The BDH was developed under a public-private financial partnership, the scope of which was also highly innovative. This agreement has greatly benefited the BNE by enabling it to receive significant funding from Telefónica while maintaining the autonomy required to organise the project and to establish the quality criteria demanded by an institution of this level. The BNE has also retained full ownership of the digitised images.

These documents represent the BNE's contribution to Europeana, the European Digital Library, the aim of which is to enable users to access the digital resources held in archives, libraries, museums and audio-visual archives all over Europe from a single interface. The BDH is also represented in Hispana, the Ministry of Culture's digital object aggregator.

In this document we describe the workflow required to make a holding available to a BDH user, together with the selection, technical and quality criteria used throughout the process.

3. PRIOR TO DIGITISATION

The creation of digital collections involves many different departments and can broadly speaking be divided into the following stages:

3.1 Selection Criteria

As mentioned above, this was the first time the BNE embarked on a digitisation project of this scale, and during the process we have had to create procedures and establish criteria which have at times evolved to satisfy new demands and take advantage of new techniques.

From the beginning, collections were chosen using a thematic approach, and this enabled documents to be organised according to shared features. As different kinds of documents were added, the very concept of what makes a collection had to evolve and broaden as the number of digitised documents increased and new material came in.

The BNE's digitisation project focuses solely on holdings in the public domain, i.e. free of copyright. This refers to works conserved in the BNE whose authors died at least 70 or 80 years ago (depending on their date of birth), provided, of course, that they are not new editions that are themselves protected by the Consolidated Text of the Intellectual Property Act (Act 23/2006, of 7 July) currently in force in Spain.


We should, however, point out that the BNE has undertaken a pilot project to offer copyright-protected content. This project, Enclave, carried out in partnership with publishers, has enabled 2,812 works to be added, which can now be found on the BDH portal. On the portal, users can consult around 20% of each work and, if interested, can navigate to the publisher's page, where they can purchase it.

Aside from public domain, there are other general guidelines for document selection:

• The relevance of their content. Many of the collections have been selected following detailed research by the Bibliographic Information Service, which created lists of documents that are particularly pertinent to a given topic: leisure, travel, science, Latin American independence. At other times documents are selected to offer the complete works of one particular author.

• The interest of the material. The BNE's Reading Rooms selected documents that are interesting in themselves: manuscripts, incunabula, architectural drawings, German engravings…

• Another general criterion which comes into play when it comes to selecting documents is the interest they may have for users. This criterion can be applied on the basis of our librarians' knowledge of their collections and users.

• Another criterion determining document selection is heritage value, the aim being that works traditionally considered to be masterpieces should be represented in the BDH.

• Aspects regarding the physical conservation of the document digitised. Given that a document that has been digitised will be consulted less frequently, digitisation is an excellent way of preserving these delicate works.

• The application of different criteria may sometimes lead to different decisions being taken. This is the case, for instance, in the choice of the editions to be digitised. In collections focussed on a single subject, a single edition was digitised. However, when presenting the entire works of a particular author, successive editions of a work are digitised on the understanding that the variations may be of interest to a specialist.

• In the historical newspaper digitisation project the general criteria are largely the same, with the addition of some minor details. The aim is to present the evolution of the Spanish press from its beginnings up to the early 20th century, always in compliance with Spanish intellectual property laws. The criterion used has been to select newspapers and magazines representative of their time; those that illustrate the broad subject matter of Hispanic periodical and newspaper publication, and of which there are complete collections. Visitors to the periodical and newspaper library will find political, satirical, humorous, scientific, religious, illustrated, entertainment, sports, artistic and literary publications, amongst others.


3.2 Removal from the automated catalogue

The titles selected are marked on Unicorn, the integrated library system (SIGB), in a local MARC field (899), with a code assigned for that purpose, and then removed.

Once a record is considered ideal for the collection, it is important to check whether it has already been digitised by the BNE.

When the title selection phase concludes, the Automation and Process Organisation Area will remove the records marked following the criteria set by the Digital Library Area.

Once removed, the records are uploaded to an internal database, and the task of digitising a particular work begins.

3.3 Lending works

The staff in charge of each room are responsible for lending the selected titles, as well as for supervising the placement and return of the holdings and their transfer to the different people involved in the workflow (services in charge of the technical process and/or digitisers).

In the digitisation project holdings are lent in two stages:

o Selection of holdings for digitisation: At this point holdings are handed over to the Technical Processing Department (modern holdings) or to the Department specialising in the particular type of material (Bibliographic Heritage, Fine Arts and Cartography, Music and Audio-visual materials), for the documents to be reviewed and the candidate copy to be selected. Each room selects an average number of holdings to be removed daily, according to the volume of work that can be handled by the digitisers and the materials available. For modern holdings, this average is 40 works a day. An essential prerequisite for this mass movement of holdings is to attach the bar code containing the copy's IDITEM.

o Gathering the holdings that make up the daily batch(es) for digitisation.

3.4 Review and selection of copies

In this phase, the Technical Processing Department (in the case of modern holdings) or the specialist department (for unique works, including antique items) creates the bibliographic record. This task consists of reviewing the works of interest, detecting duplicate records and registering and deregistering records and/or call numbers. From this it can be seen that the digitisation process also involves overhauling the bibliographic catalogue.

Following this, the copies are examined and the most suitable is selected for digitisation. In the case of modern holdings, this manual inspection is performed directly by an expert from the Preservation and Conservation Department. In other cases, experts are called in whenever required by the staff of the other departments.


3.4.1 Guidelines for deciding whether a copy is suitable for digitisation:

If the book has recently been microfilmed and the state of conservation and characteristics of the original (angle of opening) have enabled suitable microfilms to be taken, the microfilm copy rather than the original should be digitised, except in cases in which the original document has colour pictures, photographs or engravings.

o Scanning should not be performed en masse, and care should be taken when selecting the following documents:

1. Brittle books. If the pages have parts missing, tear easily or fall out of the book, it cannot be scanned automatically. If possible, a copy in better condition must be found or, failing that, the book must be scanned by staff from the Preservation and Conservation Department using their equipment.

2. Books with brittle spines: gutta-percha or perfect bindings, paperbacks and stiff spine bindings (especially between the 15th and 18th centuries). Excessive reinforcement along the spine and gluing may cause the binding to break. All copies that cannot be opened easily to an angle of 135º are rejected.

3. Some bindings use acid glue along the spine. The seam of the copy is examined and all copies with loose sheets that have torn at the fold will be rejected. This problem is usually more frequent on the first and last pages.

4. Bindings broken in the region of the joint (hinge) of the front or back cover.

5. Copies seriously damaged by micro-organisms or insects, evidencing fragments of paper, loose or weak pages.

6. Copies with serious physical problems such as torn out, loose or missing sheets.

7. Copies with fold-out maps or engravings.

8. Copies seriously deformed due to water or incorrect placement. Prints from the 17th and 18th century may show significant deformations in the text.

9. Stiff paper. Although free from acidity problems, books made of thick paper that does not bend easily cannot be scanned.

10. When there are several copies of a work and one of them is a miscellaneous edition, we will choose another copy.

11. When there are several copies of the same work and one of them is bound in the “Agapito” style (instead of a leather or cardboard spine there are remains of rubber), this will be the copy to choose, once it has been confirmed to be in a suitable state of conservation.

12. When dealing with pamphlets (call number VC) and there are several copies, we will always choose the one that is bound.


13. When there are several copies of the same work and one of them has uncut pages, another copy will be chosen. If this is not possible, the incident is noted and the book is sent to the Preservation and Conservation Department.

14. When there is only one copy of a particular work, it must be digitised with special care. It will require special marking or identification.

15. If there are no other options and copies enclosed in conservation boxes have to be used, special care is required, because the boxes may be protecting copies with valuable bindings, or very deteriorated or unique copies.

In these cases we can suppose that:

If the copy has a red dot (withdrawn from reference):

- If it is a unique copy and it is in good condition, it has been given a red dot to guarantee its conservation. In this case, it should be digitised from the existing microfilm, except where the original document contains colour pictures, photographs or engravings.

- If it is not a unique copy, it will have been withdrawn from reference because it is acid or deteriorated. In that case, another copy that is in better condition should be chosen, or it should be digitised from the existing microfilm.

If the copy has a green dot:

- If it is a unique copy, it must be scanned with particular care by specialised staff.

- If it is not a unique copy, it would be better to choose another copy for digitisation because in principle the green dot would indicate the best copy, which is the one that has been copied onto microfilm.
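The red-dot and green-dot guidelines above amount to a small decision procedure. The following Python sketch is one possible reading of them, not the BNE's actual tooling: the `Copy` class, its field names and the returned strings are hypothetical, and the case of a unique red-dot copy in poor condition is left open because the guidelines do not cover it. Reading the colour exception as "digitise the original" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical description of a boxed copy; field names are illustrative."""
    dot: str                        # "red" or "green"
    unique: bool                    # True if the BNE holds no other copy
    good_condition: bool = True
    has_colour_content: bool = False  # colour pictures, photographs or engravings


def recommend(copy: Copy) -> str:
    """Return the action suggested by the dot guidelines for one copy."""
    if copy.dot == "red":
        if copy.unique:
            if not copy.good_condition:
                # The guidelines only describe unique red-dot copies in good condition.
                return "not covered by the guidelines"
            if copy.has_colour_content:
                # Assumed reading of the exception: work from the original.
                return "digitise the original"
            return "digitise from the existing microfilm"
        return "choose a copy in better condition or digitise from the existing microfilm"
    if copy.dot == "green":
        if copy.unique:
            return "scan with particular care by specialised staff"
        return "choose another copy"
    raise ValueError(f"unknown dot colour: {copy.dot!r}")


if __name__ == "__main__":
    print(recommend(Copy(dot="green", unique=False)))
```

Keeping the rules in one function makes the gaps in the written guidelines visible, which may be useful when they are next reviewed.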

3.5 Cataloguing and Classification

Once the most suitable copy has been selected for scanning, the staff in charge of cataloguing and classifying the records:

- Correct and complete the bibliographic description

- Assign the work to a subject-based collection following the abbreviated subject schema used by the BDH, which is based on the UDC (Universal Decimal Classification).


3.6 Planning and batches

Once the titles have been reviewed and the copies selected, in the Digital Library Area:

o Associated descriptive metadata files are produced in MARC 21 format. These files are obtained through UNICORN and are associated with each of the digitisation batches. Before they are sent to the company responsible for uploading them to the BDH Digital Object Management System (DOMS), the files are checked for 899 markers and for the presence or absence of field 856. We thus minimise the risk of introducing works into the digitisation chain for which there is already a quality copy. The files are subsequently transformed into MARC21XML format, with the data required for upload to the DOMS.

o Daily batches to be digitised are formed, keeping the single works and the multiple volume works in separate batches, and organised by collection. Batches are prepared based on two parameters:

The maximum number of pages to digitise per day.

The maximum number of copies a day the lending rooms can provide.

These current limits are: 12,000 pages and/or 80 copies a day. Once the batches have been put together, all those involved in the process receive a list with the basic details to enable them to identify the works and the copies for digitising each day. The areas involved are: Lending rooms, the Preservation and Conservation Department and digitisers.
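The two planning steps in this section, the 899/856 check and the formation of daily batches, can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a record is represented only by the list of its MARC tags, a copy by a dictionary with a `pages` key, and a record is taken to be ready when it carries the 899 mark and has no 856 field (read here as a sign that a digital copy already exists). The function names are hypothetical, and the sketch ignores the separation of single and multi-volume works and the grouping by collection.

```python
from typing import Dict, Iterable, List

# Current daily limits quoted in the text.
MAX_PAGES_PER_DAY = 12_000
MAX_COPIES_PER_DAY = 80


def ready_for_digitisation(record_tags: Iterable[str]) -> bool:
    """Assumed rule: selected (899 present) and not yet digitised (856 absent)."""
    tags = set(record_tags)
    return "899" in tags and "856" not in tags


def build_batches(
    copies: List[Dict],
    max_pages: int = MAX_PAGES_PER_DAY,
    max_copies: int = MAX_COPIES_PER_DAY,
) -> List[List[Dict]]:
    """Greedily fill daily batches without exceeding either limit.

    A single copy larger than max_pages still gets a batch of its own.
    """
    batches: List[List[Dict]] = []
    current: List[Dict] = []
    pages = 0
    for copy in copies:
        over_pages = pages + copy["pages"] > max_pages
        over_copies = len(current) + 1 > max_copies
        if current and (over_pages or over_copies):
            batches.append(current)
            current, pages = [], 0
        current.append(copy)
        pages += copy["pages"]
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    sample = [{"id": i, "pages": 5_000} for i in range(3)]
    print([len(batch) for batch in build_batches(sample)])
```

A greedy fill is the simplest strategy that respects both limits; a real planner would presumably also balance batches across collections and lending rooms.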

3.7 Loan of works to the digitiser

Once the list of batches has been drawn up, the formalities to allow the digitiser to borrow the works are processed, both physically and online, and the actual scanning process can begin.

4. DIGITISATION

We shall now go on to describe the general process followed, from digitisation to uploading to Digitool, the BNE's DOMS:

1. First, the work is scanned, obtaining images in TIFF format at an optical resolution of 300-400 dpi, in greyscale or in colour, depending on the type of work. During scanning the work will be handled according to the conservation procedure in place in the BNE.


2. The TIFF MASTER images are individually subjected to a quality control process to detect any skipped pages or blurred images; any such mistakes are corrected immediately by re-scanning the image. The TIFF MASTER images are immediately stored on a server.

3. The images are then straightened using ACDSee, and any corrections required are made using Photoshop.

4. After scanning, the TIFF MASTER images are cut in half, i.e. one file for each page. They are cut using either WinCorte or Photoshop, thus producing a derivative of the original images. Two types of uncompressed TIFF images are produced, one with a colour chart and metric scale, and a second cut to a single page, with no colour chart or metric scale.


5. Each digitised image (both MASTER TIFF and trimmed TIFF) is marked with the Biblioteca Nacional copy call number.

6. The images are then rechecked to ensure that no black borders remain after automated trimming. Any borders found are trimmed manually.

7. The image is then processed in order to improve the text without losing information, removing any stains and dirt acquired over time.

8. The next process analyses the tilt of the page, which is corrected if necessary. The text is centred manually and each image is resized to a standard size, except for maps and colour prints. This is done by using an average page size to achieve as true an image of the book as possible. The programme was expressly created for this purpose in the BNE.

9. The images are automatically processed to convert them to black and white. This process eliminates any dirt or stains that may remain after the TIFF images are processed.

10. Images for publication are generated in PDF or JPEG format. They then go through the OCR (Optical Character Recognition) process, markers are created on the PDF files and the BNE watermark is inserted using a GIF image file.

11. The file names of the images are then verified using a process that checks that all the files for each copy have exactly the same name, that their sequential number starts at 0000, and that no numbers are missing.

12. A PREMIS preservation metadata structure is generated from each MASTER TIFF file.

13. Each PDF/JPEG file is then associated with its MARC record(s), generating the corresponding METS/MARC/COMPLEX/SIMPLE structure.


14. Before being uploaded to the DOMS, 20% of the titles in the batch are submitted to a quality control review, and within that sample 20% of the pages, including markers, are checked.

Image reliability must be 99.25%, and markers must be 100% reliable. If these quality standards are not met, the image is redigitised.

15. Once quality control has been passed, the digitised works are uploaded to the DOMS (Digitool).
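The file-name verification in step 11 can be sketched in a few lines of Python. The naming pattern used here (a base name, an underscore, a four-digit sequence number and an extension, e.g. `bdh_0000.tif`) is an assumption made for illustration; the document does not state the actual pattern, and `check_names` is a hypothetical helper, not the BNE's programme.

```python
import re
from typing import List

# Assumed pattern: <base name>_<4-digit sequence>.<extension>
PATTERN = re.compile(r"^(?P<name>.+)_(?P<seq>\d{4})\.(?P<ext>[A-Za-z]+)$")


def check_names(filenames: List[str]) -> List[str]:
    """Return a list of problems found in the file names of one copy."""
    problems: List[str] = []
    names = set()
    numbers = []
    for filename in filenames:
        match = PATTERN.match(filename)
        if match is None:
            problems.append(f"unexpected file name: {filename}")
            continue
        names.add(match.group("name"))
        numbers.append(int(match.group("seq")))
    # All files of a copy must share exactly the same base name.
    if len(names) > 1:
        problems.append("more than one base name: " + ", ".join(sorted(names)))
    if numbers:
        # The sequence must start at 0000 ...
        if min(numbers) != 0:
            problems.append("sequence does not start at 0000")
        # ... and must have no gaps between its first and last number.
        present = set(numbers)
        missing = [n for n in range(min(numbers), max(numbers) + 1) if n not in present]
        if missing:
            problems.append("missing numbers: " + ", ".join(f"{n:04d}" for n in missing))
    return problems


if __name__ == "__main__":
    print(check_names(["bdh_0000.tif", "bdh_0002.tif"]))
```

An empty result means the copy passes all three checks; duplicate sequence numbers are not detected in this sketch.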

4.1 Marker Creation Criteria

Depending on whether the book has an index or not and the number of pages, the criteria for marker creation are as follows:

o If the book does NOT have an index, markers are generated that correspond to the following physical or logical structure of the book:

Binding

Cover (title page and author)

General index

Illustrations section (when they appear together)

Bibliography

Prologue introduction

Appendix

Errata

Intellectual division of the contents

Markers will be made respecting the logical order of the book and will be created if the book comprises such parts.

o If the book DOES have an index, markers are set according to the number of pages:

If it has fewer than 500 pages, 25 markers are made.

If it has more than 500 pages, markers are created for 5% of the pages in the book.
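As a worked example of the page-count rule for books with an index, the sketch below assumes that both rules share a 500-page threshold (5% of 500 pages is exactly 25 markers, so the two rules meet at that point) and that fractional results are rounded up. Both points are assumptions, and `marker_count` is a hypothetical name.

```python
def marker_count(pages: int) -> int:
    """Number of markers for an indexed book, by page count.

    Assumptions: the threshold is 500 pages for both rules, a book of
    exactly 500 pages gets 25 markers, and 5% is rounded up.
    """
    if pages <= 0:
        raise ValueError("a book must have at least one page")
    if pages <= 500:
        return 25
    # 5% of the pages, rounded up, using integer arithmetic (pages / 20).
    return (pages + 19) // 20


if __name__ == "__main__":
    for pages in (300, 500, 501, 1000):
        print(pages, marker_count(pages))
```

Under these assumptions a 1,000-page book receives 50 markers, while any book of up to 500 pages receives the fixed 25.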


Example of a marker indicating the preliminaries of a work

5. MASTER AND DERIVATIVE FILES

After a work is digitised, two types of files are produced:

5.1 Preservation Files

A master preservation file (TIFF master) is that which has been prepared to the maximum quality possible for the purposes defined in each case.

Once digitised, the master file is not altered further: it becomes a backup copy and is used to produce derivatives or files for publication. The recommendations given below have been drawn up following the guidelines laid down by different libraries engaged in similar projects. These recommendations should be considered minimum values and may vary as technology and/or the needs of the institution and/or its users change.

5.1.1 Technical Aspects

The following table shows the recommended technical features for each document type, which is given in the first column.

The Objective column shows the purpose of the copy. The Resolution column indicates optical, not interpolated, values on a 1:1 scale. Generally speaking, images include a colour scale if they are digitised in colour and, if not, a grey scale. The best image format for preservation is currently considered to be TIFF. Nevertheless, in the future other formats capable of guaranteeing image quality may become available and must be considered.

DOCUMENT TYPE | OBJECTIVE | RESOLUTION | COLOUR DEPTH

Printed text WITHOUT illustrations, newspapers, pamphlets, typed pages | Image of the text | 300 ppi minimum | 8-bit grey scale *
 | Text with OCR | 400 ppi | 8-bit grey scale *

Music: music scores, annotated scales, music manuscripts | Access to the content | 300 ppi minimum | 8-bit grey scale *
 | Recognition of its material characteristics | 400 ppi | 8-bit grey scale *

Manuscripts: hand-written, typewritten copies | Access to the content | 300 ppi minimum | 8-bit grey scale *
 | Recognition of its material characteristics | 400 ppi | 8-bit grey scale *

Maps: printed characters, printed colour up to a size of 56 cm x 87 cm | Search | 250 ppi minimum ** | 24-bit colour
 | Copy | 400 ppi | 24-bit colour minimum

Photographs: continuous tone, colour | Access to the content | 300 ppi minimum | 8-bit grey scale *
 | Copy | Maximum possible | 24-bit colour minimum

Graphic material | Access to the content | 300 ppi minimum | 8-bit grey scale *
 | Copy | Maximum possible | 24-bit colour

Special or rare books: objects of great value | Recognition of its material characteristics | 300 ppi minimum | 24-bit colour
 | Research into their material characteristics | 600 ppi minimum | 24-bit colour minimum

NOTES

* Colour (24 bits) when the colour is an important feature of the document.

** The resolution (ppi) depends on the size of the map, above all in cases in which the map sections have to be joined together and the file is heavier than 500 MB.
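The minimum resolutions recommended in section 5.1 for preservation masters lend themselves to a simple lookup. The Python sketch below is illustrative only: the shortened keys and the `meets_minimum` helper are hypothetical, rows whose resolution is given as "maximum possible" are omitted, and the map value ignores the size-dependent note.

```python
# Minimum optical resolution (ppi) by (document type, objective),
# transcribed from the preservation table with shortened, hypothetical keys.
MIN_RESOLUTION_PPI = {
    ("printed text", "image of the text"): 300,
    ("printed text", "text with ocr"): 400,
    ("music", "access to the content"): 300,
    ("music", "material characteristics"): 400,
    ("manuscripts", "access to the content"): 300,
    ("manuscripts", "material characteristics"): 400,
    ("maps", "search"): 250,
    ("maps", "copy"): 400,
    ("photographs", "access to the content"): 300,
    ("graphic material", "access to the content"): 300,
    ("special or rare books", "recognition"): 300,
    ("special or rare books", "research"): 600,
}


def meets_minimum(document_type: str, objective: str, ppi: int) -> bool:
    """Check a scan's optical resolution against the recommended minimum."""
    key = (document_type.lower(), objective.lower())
    if key not in MIN_RESOLUTION_PPI:
        raise KeyError(f"no recommendation recorded for {key}")
    return ppi >= MIN_RESOLUTION_PPI[key]


if __name__ == "__main__":
    print(meets_minimum("Maps", "Search", 300))
```

Since the recommendations are described as minimum values that may change, keeping them in a single table like this makes later revisions a one-line edit.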

5.2 Files for publication

Files for publication are produced from the TIFF master images and are processed according to the digitisation specifications laid down by the Hispanic Digital Library.

Files for publication are uploaded to the Hispanic Digital Library as two kinds of objects:


o Simple objects: A bibliographic record with a single digital file (PDF or JPEG).

o METS: A bibliographic record with several digital files (PDFs or JPEGs).

5.2.1 The Format of Files for Publication

Files for publication have two formats, depending on the type of material: PDF and JPEG.

DOCUMENT TYPE | FILE

Printed text taken from microform | PDF with markers and OCR

Printed text taken from the original (including printed music scores) | PDF with markers and OCR

Incunabula digitised directly from the original support medium | PDF with markers, without OCR

Incunabula digitised from microfilms | PDF with markers, without OCR

Graphic material digitised directly from the original medium (engravings, prints, drawings, photographs, posters) | JPEG at 150-200 ppi

Graphic material taken from the negative | JPEG at 150-200 ppi

Maps and plans | JPEG at 150-200 ppi. If the place names and details on the map or plan cannot be read correctly, the JPEG quality will be increased.

Manuscripts digitised directly from the original medium (including hand-written music) | JPEG at 150-200 ppi

5.2.1.1 PDF Files for Publication

For prints, colour illustrations and colourful covers (together with any other outstanding motif where details would be lost if shown in black and white), the images are generated in colour or grey scale in order to achieve a true likeness of the original physical document digitised. This means that some of the PDFs posted on the website may be black and white, others may have front covers in colour and the rest of the work in black and white, and others may be in black and white with inside pages in colour or grey scale.

5.2.1.2 Creation of PDF Files

Cleaning the PDF files: PDFs do not include bindings or any blank pages before the title page that do not contain any information. Likewise, they do not include blank pages following the last page containing information. Other blank sheets in the work will be included in the PDF to avoid altering page numbering.

Watermark on the PDFs: All the pages in the PDF, whether in black and white, grey scale or colour, must have the BNE watermark at the bottom of each page.

Markers: PDF files contain markers with information about the chapters/parts/sections.

5.2.1.3 JPEG Files for Publication

JPEG files for publication are created at a resolution of 150-200 dpi to ensure a quality online viewing experience.

Manuscripts and old books must keep their binding and endpapers or blank sheets, as in most cases they contain information of interest for identifying the possible origin of the book, or how it was made.

5.2.1.4 Watermark on JPEG Files for Publication

The watermark is inserted in the bottom right-hand corner. This should never cover or overlap information from the original.

The file size of the images needs to be controlled; they must not be too large. To keep file size down, in some cases the quality is reduced to around 150 dpi, provided the image does not become pixelated when enlarged.

6. QUALITY CONTROL

The files obtained from the digitisation process pass through quality control before and after being uploaded to Digitool (DOMS); the quality control includes the following tasks:

6.1 Quality Control Prior to Uploading to Digitool

o Depending on the total number of files to be uploaded to the system, we ask the IT Co-ordination Unit to check the Digitool and Oracle database storage conditions. To do this, they are sent the total number of PDF files, the no. of ingest, simple, complex, METS, together with the size of these files in GB.

o The call numbers to be uploaded are checked and corrected.

o XML file check: A 5-10% sample is taken of each of the batches of XML files to be uploaded to Digitool. When checking the XML files, we focus on the following tags:

Header: We check that there is no blank space (e.g. it should be 4500, not 45 0) in the sequences of positions 21-24 of the header.

Tag 008: We check that the positions corresponding to the publishing date are correctly expressed (there should be no blank spaces and the four digits must be complete).

Tag 080: We ensure that this tag appears with its subject code (UDC) or, failing this, that tag 899 contains the simplified subject code created for this purpose and also based on the UDC. We can look up these codes and simplified marks in the working document drawn up for this purpose.

Tag 300: We check this tag to see whether the work is a multiple volume or not, and that the Digitool upload type (simple, mets) is correct.

Tag 856: This tag enables us to check that the record has not already been uploaded to the BDH. It must exactly match the name of the PDF or JPEG file to which the metadata record is going to be associated (call number in digital format). Another important aspect involves manuscripts. In these cases it is best to verify that if the work appears in the BNE General Manuscript Inventory Catalogue, the record has the corresponding 856 link.
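The sampling and tag checks described above can be sketched in Python. This is a minimal illustration, not the BNE's actual tooling: the function names and the way a record is passed in (plain strings for the header, tag 008 and the 856 link) are assumptions. The sketch also assumes that the header check refers to the trailing "4500" of a 24-character MARC header and that the publishing date in tag 008 is the four characters starting at 0-based offset 7, as in MARC 21.

```python
import random

def sample_batch(files, fraction=0.05, seed=None):
    """Pick a random sample (at least one file) from a batch for manual review."""
    rng = random.Random(seed)
    size = max(1, round(len(files) * fraction))
    return rng.sample(files, min(size, len(files)))

def check_record(leader, field_008, field_856_u, file_name):
    """Return a list of problems found in one record (an empty list means it passes)."""
    problems = []
    # Header: the last four characters of the 24-character header should be "4500".
    if len(leader) != 24 or leader[20:24] != "4500":
        problems.append("header: positions 21-24 are not '4500'")
    # Tag 008: the publishing date must be four digits with no blank spaces.
    date1 = field_008[7:11]
    if len(date1) != 4 or not date1.isdigit():
        problems.append("008: incomplete publishing date")
    # Tag 856: the link must exactly match the name of the PDF or JPEG file.
    if field_856_u != file_name:
        problems.append("856: link does not match file name")
    return problems
```

The checks on tags 080/899 and 300 are left out because they depend on the working documents and upload types, which cannot be inferred from a single record.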

6.2 Quality Control After Uploading to Digitool

Digitised objects are uploaded to the Digitool preproduction server. Once uploaded, around 5-10% of the batch contents are analysed, paying particular attention to the following points:

o The metadata are viewed correctly

o The files are viewed correctly

o The markers on the PDF files are consistent with the established criteria.

Once all the problems detected have been resolved, the uploads are migrated from the preproduction to the production environment, which means that the digitised documents are made available to Hispanic Digital Library users.

7. METADATA

Metadata are the set of data associated with digital objects, the purpose of which is to enable digital collections to be described, browsed, used and managed. Metadata are the tools used for specifying the contextual information associated with each document: the content, the transformation log of each digital object, the specifications of the hardware needed to build the emulators, the format of each file, the programmes giving access to each record. The BDH's digital objects contain descriptive metadata and preservation metadata (PREMIS).

7.1 Descriptive Metadata

Descriptive metadata in MARC XML format are produced for each of the digitised works. The ".mrc" files of the works to be digitised are obtained from Unicorn (ISO 2709). These files are broken down into two parts:

o .mrc, which correspond to simple objects (documents comprising a single image). An XML file is generated that will include all the simple objects.

o .mrc, which correspond to complex objects (documents comprising several images). An XML file is generated following the METS XML structure for each complex object to be uploaded to the Hispanic Digital Library.

In order to adapt the descriptive metadata format to the specific Digitool (DOMS) upload characteristics, the following fields are entered into each record:

o Link between image and record (only for simple documents):

<datafield tag="856" ind1="4" ind2="1"> <subfield code="u">Invent_029394.jpeg</subfield></datafield>

o Document type:

<datafield tag="655" ind1="1" ind2="7"> <subfield code="a">Drawings, engravings and photographs</subfield></datafield>
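As an illustration of how the two fields above might be generated programmatically, here is a minimal Python sketch using the standard library. The helper name make_datafield is hypothetical, and the elements are built without a namespace, mirroring the snippets above rather than any confirmed BNE export routine.

```python
import xml.etree.ElementTree as ET

def make_datafield(tag, ind1, ind2, code, value):
    """Build one <datafield> element holding a single <subfield>."""
    field = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    sub = ET.SubElement(field, "subfield", {"code": code})
    sub.text = value
    return field

# Link between image and record (simple documents only).
link = make_datafield("856", "4", "1", "u", "Invent_029394.jpeg")
# Document type.
doc_type = make_datafield("655", "1", "7", "a", "Drawings, engravings and photographs")

link_xml = ET.tostring(link, encoding="unicode")
doc_type_xml = ET.tostring(doc_type, encoding="unicode")
print(link_xml)
print(doc_type_xml)
```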

7.2 Preservation Metadata (PREMIS)

PREMIS preservation metadata that accompany the master files produced in the digitisation phase are also generated. Both file types, master and PREMIS metadata, will be uploaded to the systems designed for this purpose.

Below is a diagram showing the structure of PREMIS metadata that are included in each digitised work. Abbreviations: M=Mandatory / O=Optional / R =Repeatable / NR = Not Repeatable:

1.1 objectIdentifier (M, R)

1.1.1 objectIdentifierType (M, NR)

1.1.2 objectIdentifierValue (M, NR)

1.2 objectCategory (M, NR)

1.3 preservationLevel (O, R) [representation, file]

1.3.1 preservationLevelValue (M, NR) [representation, file]

1.4 significantProperties (O, R)

1.5 objectCharacteristics (M, R) [file, bitstream]

1.5.1 compositionLevel (M, NR) [file, bitstream]

1.5.2 fixity (O, R) [file, bitstream]

1.5.2.1 messageDigestAlgorithm (M, NR) [file, bitstream]

1.5.2.2 messageDigest (M, NR) [file, bitstream]

1.5.3 size (O, NR) [file, bitstream]

1.5.4 format (M, R) [file, bitstream]

1.5.4.1 formatDesignation (O, NR) [file, bitstream]

1.5.4.1.1 formatName (M, NR) [file, bitstream]

1.5.4.1.2 formatVersion (O, NR) [file, bitstream]

1.5.4.2 formatRegistry (O, NR) [file, bitstream]

1.5.4.2.1 formatRegistryName (M, NR) [file, bitstream]

1.5.4.2.2 formatRegistryKey (M, NR) [file, bitstream]

1.5.4.2.3 formatRegistryRole (O, NR) [file, bitstream]

1.5.5 creatingApplication (O, R) [file, bitstream]

1.5.5.1 creatingApplicationName (O, NR) [file, bitstream]

1.5.5.2 creatingApplicationVersion (O, NR) [file, bitstream]

1.5.5.3 dateCreatedByApplication (O, NR) [file, bitstream]

1.5.6 inhibitors (O, R) [file, bitstream]

1.5.6.1 inhibitorType (M, NR) [file, bitstream]

1.5.6.2 inhibitorTarget (O, R) [file, bitstream]

1.5.6.3 inhibitorKey (O, NR) [file, bitstream]

1.6 originalName (O, NR) [representation, file]

1.7 storage (M, R) [file, bitstream]

1.7.1 contentLocation (O, NR) [file, bitstream]

1.7.1.1 contentLocationType (M, NR) [file, bitstream]

1.7.1.2 contentLocationValue (M, NR) [file, bitstream]

1.7.2 storageMedium (O, NR) [file, bitstream]

1.8 environment (O, R)

1.8.1 environmentCharacteristic (O, NR)

1.8.2 environmentPurpose (O, R)

1.8.3 environmentNote (O, R)

1.8.4 dependency (O, R)

1.8.4.1 dependencyName (O, R)

1.8.4.2 dependencyIdentifier (O, R)

1.8.4.2.1 dependencyIdentifierType (M, NR)

1.8.4.2.2 dependencyIdentifierValue (M, NR)

1.8.5 software (O, R)

1.8.5.1 swName (M, NR)

1.8.5.2 swVersion (O, NR)

1.8.5.3 swType (M, NR)

1.8.5.4 swOtherInformation (O, R)

1.8.5.5 swDependency (O, R)

1.8.6 hardware (O, R)

1.8.6.1 hwName (M, NR)

1.8.6.2 hwType (M, NR)

1.8.6.3 hwOtherInformation (O, R)

1.9 signatureInformation (O, R) [file, bitstream]

1.9.1 signature (O, R)

1.9.1.1 signatureEncoding (M, NR) [file, bitstream]

1.9.1.2 signer (O, NR) [file, bitstream]

1.9.1.3 signatureMethod (M, NR) [file, bitstream]

1.9.1.4 signatureValue (M, NR) [file, bitstream]

1.9.1.5 signatureValidationRules (M, NR) [file, bitstream]

1.9.1.6 signatureProperties (O, R) [file, bitstream]

1.9.1.7 keyInformation (O, NR) [file, bitstream]

1.10 relationship (O, R)

1.10.1 relationshipType (M, NR)

1.10.2 relationshipSubType (M, NR)

1.10.3 relatedObjectIdentification (M, R)

1.10.3.1 relatedObjectIdentifierType (M, NR)

1.10.3.2 relatedObjectIdentifierValue (M, NR)

1.10.3.3 relatedObjectSequence (O, NR)

1.10.4 relatedEventIdentification (O, R)

1.10.4.1 relatedEventIdentifierType (M, NR)

1.10.4.2 relatedEventIdentifierValue (M, NR)

1.10.4.3 relatedEventSequence (O, NR)

1.11 linkingEventIdentifier (O, R)

Example of PREMIS implemented in the Hispanic Digital Library

<?xml version="1.0" encoding="UTF-8" ?>
<premis:premis version="2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:premis="info:lc/xmlns/premis-v2" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premisv2-0.xsd">
  <premis:object xsi:type="premis:representation" xmlID="VC_002307-006">
    <premis:objectIdentifier>
      <premis:objectIdentifierType>899$j</premis:objectIdentifierType>
      <premis:objectIdentifierValue>VC/2307/6</premis:objectIdentifierValue>
    </premis:objectIdentifier>
    <premis:preservationLevel>
      <premis:preservationLevelValue>full</premis:preservationLevelValue>
      <premis:preservationLevelDateAssigned>20070529</premis:preservationLevelDateAssigned>
    </premis:preservationLevel>
    <premis:originalName>VC_002307-006</premis:originalName>
  </premis:object>
  <premis:object xsi:type="premis:file">
    <premis:objectIdentifier>
      <premis:objectIdentifierType>File</premis:objectIdentifierType>
      <premis:objectIdentifierValue>VC_002307-006_0001</premis:objectIdentifierValue>
    </premis:objectIdentifier>
    <premis:preservationLevel>
      <premis:preservationLevelValue>full</premis:preservationLevelValue>
      <premis:preservationLevelDateAssigned>20070529</premis:preservationLevelDateAssigned>
    </premis:preservationLevel>
    <premis:objectCharacteristics>
      <premis:compositionLevel>0</premis:compositionLevel>
      <premis:size>1234567</premis:size>
      <premis:format>
        <premis:formatDesignation>
          <premis:formatName>image/tiff</premis:formatName>
          <premis:formatVersion>6.0</premis:formatVersion>
        </premis:formatDesignation>
      </premis:format>
      <premis:creatingApplication>
        <premis:creatingApplicationName>Omniscan</premis:creatingApplicationName>
        <premis:creatingApplicationVersion>11.0</premis:creatingApplicationVersion>
        <premis:dateCreatedByApplication>20090102</premis:dateCreatedByApplication>
      </premis:creatingApplication>
      <premis:objectCharacteristicsExtension>
        <mix:mix xmlns:mix="http://www.loc.gov/mix/v20" xsi:schemaLocation="http://www.loc.gov/mix/v20 http://www.loc.gov/standards/mix/mix20/mix20.xsd">
          <mix:BasicDigitalObjectInformation>
            <mix:byteOrder>big endian</mix:byteOrder>
            <mix:Compression>
              <mix:compressionScheme>Uncompressed</mix:compressionScheme>
            </mix:Compression>
          </mix:BasicDigitalObjectInformation>
          <mix:BasicImageInformation>
            <mix:BasicImageCharacteristics>
              <mix:imageWidth>5530</mix:imageWidth>
              <mix:imageHeight>3210</mix:imageHeight>
              <mix:PhotometricInterpretation>
                <mix:colorSpace>RGB</mix:colorSpace>
              </mix:PhotometricInterpretation>
            </mix:BasicImageCharacteristics>
          </mix:BasicImageInformation>
          <mix:ImageCaptureMetadata>
            <mix:ScannerCapture>
              <mix:scannerManufacturer>Zeutschel</mix:scannerManufacturer>
              <mix:ScannerModel>
                <mix:scannerModelName>OS 10000-90 TT</mix:scannerModelName>
                <mix:scannerModelSerialNo>52008</mix:scannerModelSerialNo>
              </mix:ScannerModel>
            </mix:ScannerCapture>
          </mix:ImageCaptureMetadata>
          <mix:ImageAssessmentMetadata>
            <mix:ImageColorEncoding>
              <mix:BitsPerSample>
                <mix:bitsPerSampleValue>8</mix:bitsPerSampleValue>
              </mix:BitsPerSample>
              <mix:samplesPerPixel>3</mix:samplesPerPixel>
            </mix:ImageColorEncoding>
          </mix:ImageAssessmentMetadata>
        </mix:mix>
      </premis:objectCharacteristicsExtension>
    </premis:objectCharacteristics>
    <premis:originalName>VC_002307-006_0001.tif</premis:originalName>
    <premis:storage>
      <premis:contentLocation>
        <premis:contentLocationType>filepath</premis:contentLocationType>
        <premis:contentLocationValue>VC_002307-006</premis:contentLocationValue>
      </premis:contentLocation>
      <premis:storageMedium>HD 001 Alta</premis:storageMedium>
    </premis:storage>
    <premis:relationship>
      <premis:relationshipType>structural</premis:relationshipType>
      <premis:relationshipSubType>is included in</premis:relationshipSubType>
      <premis:relatedObjectIdentification RelObjectXmlID="VC_002307-006">
        <premis:relatedObjectIdentifierType>899$j</premis:relatedObjectIdentifierType>
        <premis:relatedObjectIdentifierValue>VC/2307/6</premis:relatedObjectIdentifierValue>
        <premis:relatedObjectSequence>1</premis:relatedObjectSequence>
      </premis:relatedObjectIdentification>
    </premis:relationship>
  </premis:object>
</premis:premis>
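A PREMIS record of this kind can be read with any namespace-aware XML parser. The following Python sketch is only an illustration: the SAMPLE string is a shortened excerpt of the example, and the function name list_objects is hypothetical. It lists each object's category (representation or file) together with its identifier value.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example record.
NS = {"premis": "info:lc/xmlns/premis-v2"}
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Shortened excerpt of the example, kept small for illustration.
SAMPLE = """<premis:premis version="2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:premis="info:lc/xmlns/premis-v2">
  <premis:object xsi:type="premis:representation">
    <premis:objectIdentifier>
      <premis:objectIdentifierType>899$j</premis:objectIdentifierType>
      <premis:objectIdentifierValue>VC/2307/6</premis:objectIdentifierValue>
    </premis:objectIdentifier>
  </premis:object>
  <premis:object xsi:type="premis:file">
    <premis:objectIdentifier>
      <premis:objectIdentifierType>File</premis:objectIdentifierType>
      <premis:objectIdentifierValue>VC_002307-006_0001</premis:objectIdentifierValue>
    </premis:objectIdentifier>
  </premis:object>
</premis:premis>"""

def list_objects(premis_xml):
    """Return (xsi:type, identifier value) for every premis:object in the record."""
    root = ET.fromstring(premis_xml)
    result = []
    for obj in root.findall("premis:object", NS):
        kind = obj.get(XSI_TYPE)
        value = obj.findtext(
            "premis:objectIdentifier/premis:objectIdentifierValue", namespaces=NS
        )
        result.append((kind, value))
    return result

objects = list_objects(SAMPLE)
for kind, value in objects:
    print(kind, value)
```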

7.3 Metadata Storage on the DOMS

DIGITOOL is the DOMS that is currently in use at the BNE. This application is designed to manage an institution's digital objects in an efficient and simple manner, putting special emphasis on the conservation and dissemination of these assets. It comprises seven modules, each one of which is designed to respond to the different needs, functions and workflows involved in a digital object life cycle.

The ingest module is used to upload both the objects and their associated metadata.

Digitool complies with the following standards:

o Z39.50 Protocol

o OAI-PMH

o Dublin Core

7.4 Metadata Export. OAI-PMH

The OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is used for sending metadata over the Internet. It uses a client-server architecture, providing metadata in Dublin Core format that can then be harvested. Communication is via the HTTP protocol and the responses are encoded in XML.

At the BDH we use an OAI server http://bibliotecadigitalhispanica.bne.es/OAI-PUB for descriptive metadata "harvesting". This can be done using OAI commands or MEdit-type programmes to harvest specific records, groups of records and OAI sets defined in the BDH.
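As a rough sketch of what an OAI request looks like, the Python snippet below builds request URLs against the server address given above. The verbs ListRecords and GetRecord and the arguments metadataPrefix, set and identifier are defined by the OAI-PMH specification; the set name and record identifier used here are placeholders, not real BDH values, and the snippet only constructs the URLs without contacting the server.

```python
from urllib.parse import urlencode

# OAI server address as given in the text.
BASE_URL = "http://bibliotecadigitalhispanica.bne.es/OAI-PUB"

def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update({key: value for key, value in params.items() if value is not None})
    return BASE_URL + "?" + urlencode(query)

# Harvest a whole set in Dublin Core ("example_set" is a placeholder name).
list_url = oai_request_url("ListRecords", metadataPrefix="oai_dc", set="example_set")
# Fetch one specific record ("oai:example:1" is a placeholder identifier).
record_url = oai_request_url("GetRecord", metadataPrefix="oai_dc", identifier="oai:example:1")

print(list_url)
print(record_url)
```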

OAI-PMH for exporting metadata

8. TECHNOLOGICAL ENVIRONMENT

Generally speaking, the BNE has the following technological infrastructures:

o A digitisation room containing the scanners.

o Bar code scanners for naming folders containing the images produced during digitisation. This enables data to be uploaded to the Preservation System that is currently being created by the IT Co-ordination Unit using the unique identifier ID-ITEM.

o Internet access for management and control of the workflow tool.

o Licences available for Digitool (DOMS) administration.

o Computers for uploading to Digitool (DOMS).

o Storage servers for master and associated PREMIS metadata files.

o Computers for storing the master files.

o Master file control application.

8.1 Scanners

The scanners we use have all the technical features needed for scanning BNE holdings and do not damage the originals in any way. The models used vary according to the type of document being digitised, and follow BNE technical guidelines.

The choice of scanner is based on a number of factors:

• Formats:

1. Size: The formats used range from 1/8th to over A1, taking into account the percentage of documents.

2. Thickness/weight: The holdings contain copies of different thickness.

3. Fold-outs: When considering document format, fold-outs are an additional factor to take into account, both in terms of handling the document and the size of the scanner.

• Document features:

4. Original in colour: Most manuscripts have some coloured motif of documentary interest that is also needed for study purposes. This means that the scanner must be able to accurately reproduce all the colours of the image. As far as colour is concerned, in the case of illuminated codices it is particularly difficult to copy gold tones. Copibook scanners are not advised for originals where colour is an essential characteristic.

5. Binding: Stiff bindings do not allow the book to be fully opened to 180º; likewise, closed bindings may miss some information in the central part of the document. In either case, it is best to use a page by page scanner in order to reduce losses to a minimum and to open the document as flat as possible.

6. Material: A large percentage of manuscripts are written on parchment. The specific features of this medium (corrugated sheets, information missing in the fold, rigidity, etc.) require special handling and the addition of pages that isolate the missing parts, and a scanner capable of focussing on different parts of the document is required.

• State of conservation:

7. Fragility of the material

8. Lack of information: Mutilations

• Diversity: The materials specific to Special Sections have different characteristics which call for a wide range of scanning systems:

9. Parchment codices

10. Acid and friable paper

11. Engravings

12. Ancient bindings

13. Drawings

14. Photographs

15. Ephemera collection

16. Advertising posters and large formats in general

8.1.1 Scanner Types

Generally speaking, the scanners used for digitising the different materials conserved in the BNE are divided into:

• A-Type Scanners: For digitising printed works in grey scale (18th to 19th centuries)

o CopibookHD600; i2s.

o Bookeye 3 –R2

o Book2net

o ScannTECH 602i-6 or 602i-3

• B-Type Scanners: For colour digitisation of hand-written or bound printed works, mainly illustrated, and of loose-sheet graphic holdings (photographs, posters, maps, ephemera collections…)

o Digibook Suprascan A1

o Book2net A1

• C-Type Scanners: For colour digitisation of works requiring particularly careful handling due to the kind of medium (codices, illuminated manuscripts, manuscripts where the inks have soaked through, iron gall inks, ancient bindings with metal elements…)

o Metis DRS5070

• D-Type Scanners: Which include digital cameras for photographic or unbound collections in average format, and for originals of great value that are particularly delicate and cannot be copied using a scanner.

o Nikon D700 (minimum quality)

o Nikon D3

o Sinar 75 digital back (four shots)

The Robot scanner is also employed on works where, because of their physical features and state of conservation, it is possible to use mechanised methods without endangering the document.

Scanners that accept opening angles of 60-90º are also used for works requiring this type of handling.

9. MASTER FILE TRANSFER

In order to prevent loss of information, master files are transferred to a series of storage units set up by the IT Co-ordination Unit. We thus obtain a reliable physical copy of the master and associated metadata files on the BNE systems.

To control this transfer, each digitisation is accompanied by an Excel spreadsheet that is part of an internal database containing the following data:

o Files: No. of files, JPEG or TIFF, the call number contains.

o Mb: Total size of the folder. The size must be given in megabytes, with no thousand separators and with a comma to indicate where the decimals begin. The unit itself is not indicated, e.g. 30589,85

o Location: According to the server structure defined by the BNE … e.g. DM01/Batch1/1/085698

o Resolution: 300, 200, 100, etc. (depending on each image) without adding ppp or dpi, just the number. If this were to vary from image to image in a single call number, the most frequently used resolution is given.

o Format: TIFF.

o Version: ORIGINAL, TRIMMED (as applicable)

o Colour: The possible values in this field are:

B&W,

GREY,

COLOUR,

COLOUR RGB 8,

COLOUR RGB 16.

If the work has several features, we assign the one that predominates.

o Viewing: The possible values in this field are:

SIMPLE, for TIFFs trimmed to a single page

DOUBLE, for original double-page tiffs

o Start date: The start date of the phase in the digitisation project e.g. dd/mm/yyyy.

o End date: The end date of the phase in the digitisation project e.g. dd/mm/yyyy.

o Company: Name of the AD company.

o Machine: Scanner used by the company, e.g. Digibook Scanner Suprascan. If several machines have been used on one work (for instance, because of both colour and black and white images), the main scanner used is indicated.

o Software: Given where possible; otherwise it is left blank, e.g. i2s Digibook Scanner Suprascan A0 10000 RGB.

o Comments: The digitisation phase it belongs to, e.g. P4

The name of each Excel sheet follows the format PHASENAME_discNo_ddmmyyyy, e.g.: F3_disc2_15152010
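Two of the conventions above, the size value and the sheet name, lend themselves to a short Python sketch. The function names are hypothetical, and rounding the size to two decimals is an assumption based on the example value 30589,85; the sheet name is built from a phase name, a disc number and a date written as day, month and four-digit year.

```python
from datetime import date

def format_size_mb(size_mb):
    """Format a size in megabytes: no thousand separators, comma as decimal mark, no unit."""
    return f"{size_mb:.2f}".replace(".", ",")

def sheet_name(phase, disc_number, day):
    """Build a sheet name from phase name, disc number and date (ddmmyyyy)."""
    return f"{phase}_disc{disc_number}_{day:%d%m%Y}"

size_text = format_size_mb(30589.85)
name_text = sheet_name("F3", 2, date(2010, 12, 15))  # illustrative date
print(size_text)
print(name_text)
```

Formatting the value in code avoids locale-dependent spreadsheet settings silently inserting thousand separators.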

9.1 Server Structure

The structure of the IT Co-ordination Unit servers used for file transfer is as follows:

Either DM (for master files)

or DMD (for derivative files)

As the disk space is used up, additional resources are created and numbered: DM01; DM02; DM03…; DMD01, DMD02, DMD03.
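The numbering scheme for these resources can be expressed in a few lines of Python. This is only a sketch: the function name is hypothetical, and two-digit zero padding is inferred from the examples DM01 and DMD01.

```python
def volume_name(kind, number):
    """Return a storage resource name such as DM01 (masters) or DMD01 (derivatives)."""
    if kind not in ("DM", "DMD"):
        raise ValueError("kind must be 'DM' (master files) or 'DMD' (derivative files)")
    return f"{kind}{number:02d}"

masters = [volume_name("DM", n) for n in range(1, 4)]
derivatives = [volume_name("DMD", n) for n in range(1, 4)]
print(masters)
print(derivatives)
```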

10. SEARCH ENGINE

The main purpose of a search engine on a digitisation project is to make the process of browsing through immense volumes of digitised materials as simple and intuitive as possible, and to ensure that the best results are retrieved. The Hispanic Digital Library currently uses SOLR as its search engine, an open-source search platform that will allow the Library to extend and develop its own functionalities by using its source code. Using SOLR, the contents published on Digitool (DOMS) are indexed automatically and are visible on a personalised search interface.

The application's simple search interface

The application's advanced search interface

Through OAI, SOLR indexes both structured content (metadata) and unstructured content (OCR). When used on the BDH, the search engine includes the following functions:

o Basic and concept search

o Auto suggest for completing user queries (autofill)

o Parametric search (browsing filters)

o Hyperlinks (relationships between documents)

o Query expansion

o Highlighted context snippets
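To give an idea of how some of these functions map onto a SOLR request, the Python sketch below builds a query URL with a filter (parametric search) and highlighting (context snippets). The parameters q, fq, hl, hl.fl and wt are standard SOLR request parameters; the server address, core name and field names (material_type, ocr_text) are hypothetical and do not describe the actual BDH configuration.

```python
from urllib.parse import urlencode

# Hypothetical SOLR endpoint; the real BDH address and core name are not documented here.
SOLR_SELECT = "http://localhost:8983/solr/bdh/select"

def build_query(text, filters=(), highlight_field="ocr_text"):
    """Build a SOLR select URL with optional filter queries and highlighting."""
    params = [("q", text)]
    # Each browsing filter becomes one fq (filter query) parameter.
    params += [("fq", item) for item in filters]
    # Ask SOLR to return highlighted snippets from the given field, as JSON.
    params += [("hl", "true"), ("hl.fl", highlight_field), ("wt", "json")]
    return SOLR_SELECT + "?" + urlencode(params)

url = build_query("Quijote", filters=["material_type:manuscript"])
print(url)
```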

SOLR architecture in the BDH

Basic architecture

GLOSSARY OF TERMS AND ABBREVIATIONS

- ACDsee: Digital image editing software.

- BDH: Hispanic Digital Library

- Bits: A bit is the minimum unit of information used in computing. It is a binary digit represented by two values: 0 or 1.

- BNE: Biblioteca Nacional de España

- UDC: Universal Decimal Classification

- Digitool: A Digital Object Management System for managing digital collections, institutional libraries and multimedia holdings. It is a powerful system which enables academic libraries and consortia to manage large collections and provide access to digital resources. The tools it incorporates enable users to control all aspects of digital object management: cataloguing, filing, indexing, publication, preservation and copyright control.

- Dpi: Dots per inch. A unit for measuring the image resolution (quality) of a scanner, printer, etc. It expresses the resolution as the number of dots (pixels) per linear inch.

- Dublin Core: A metadata vocabulary created by the DCMI (Dublin Core Metadata Initiative), an organisation devoted to fostering widespread adoption of interoperable standards and to promoting the development of specialised metadata vocabularies for describing resources. It is the most popular metadata system for describing electronic resources on the internet. It defines a set of properties that may be used when describing a resource (whether or not it is available in electronic format) to facilitate its retrieval.

- JPEG: An image format for storing and transmitting images on the net. Files of this type usually have the extension .jpg or .jpeg. Their compression algorithm is lossy, but it enables file size to be reduced considerably with little visible loss of image quality.

- MARC 21: An international standard traditionally used by libraries all over the world for exchanging bibliographic (cataloguing) information, with modifications that enable electronic resources to be described.

- Watermark: The method usually used for marking paper. A watermark is an image formed by different thickness on a sheet of paper. It is used to avoid document forgery, to show the authenticity of the origin of a document or form, as decoration or to distinguish between different paper mills.

- Megabyte (MB): A unit for measuring the quantity of computer data. It is the unit most frequently used at present, together with its multiple, the gigabyte, and it is used to specify RAM memory capacity, graphic card or CD-ROM memories, or the size of programmes, large files, etc. Storage capacity is usually measured in gigabytes, i.e., in thousands of megabytes.

- METS: METS is the name given to the XML file that contains the data from a bibliographic record formed by several digital files (several PDFs or several JPEGs).

- Metadata: Metadata are the set of data associated with digital objects, the purpose of which is to facilitate digital collection description, search, use and management. They are tools that specify the contextual information associated with each document: their contents, the record of the transformations each digital object undergoes, the specifications of the hardware necessary for building the emulators, the formats of each file, the programmes permitting access to each record.

- OAI-PMH: The OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is an application-independent interoperability tool which allows information to be exchanged so that searches can be performed on information compiled in the different associated repositories from different service providers. The metadata to be transmitted via OAI-PMH must be encoded in unqualified Dublin Core in order to minimise the problems that may result from conversions between multiple formats.

- OCR: Stands for Optical Character Recognition. It is a technology for scanning and recognising characters on any type of document. OCR software quickly and accurately transforms this information into electronic format. It not only captures and scans the data contained in the document, but it also stores them in a file or database in a recognisable format that can be retrieved for use on other applications. OCR technology enables us to scan our documents and manage them electronically in a flexible and secure way. Information can be captured manually from documents or images using a device, such as a scanner, which incorporates this functionality.

- PDF (Portable Document Format): A document storage format developed by Adobe Systems, which is particularly suitable for complex documents (multiple pages, combination of texts and images of different qualities). Some of the advantages of this format include tools for browsing a single document and several different documents, a reliable and secure digital copy, options for searching and retrieving content, including inclusion in search engines.

- PhotoShop: Professional standard image editing software.

- PREMIS: Preservation metadata, which contain the information used by a repository during the digital preservation process.

- DOMS: Digital Object Management System.

- Simplex: Simplex is the name given to the XML file that contains the data from a bibliographic record formed by a single digital file (PDF or JPEG).

- TIFF (Tagged Image File Format): A file format for tagged images, so called because in addition to the actual image data, TIFF files contain "tags" which hold information about the characteristics of the image and are used later when processing the image. This format is generally used for creating high quality images. It creates large files, with no loss, that can be used as master files but are unsuitable for distribution and public access to collections.

- UNICORN: The Integrated Library Management System used by different university libraries. It is currently in use at the BNE.

Please address your queries or suggestions to: [email protected]