Metadata and identifiers for e-journals
Copenhagen 13.-14.3.2000
Juha Hakala
Helsinki University Library
juha.hakala@helsinki.fi
Contents
• Introduction
• Traditional cataloguing
• Full-text indexing
• Embedded metadata + Dublin Core
• DIEPER choices
• Identification of e-journals
Introduction
• Metadata = structured description of a resource
• Structure of metadata is defined in a format
– simple formats (AltaVista)
– complex formats (MARC)
– structured formats (Dublin Core)
• Choices have important cost and quality implications (good is not free)
Traditional cataloguing
• Routinely done for journals (ISSN DB)
• Articles indexed only selectively
– Finnish article index Arto: 1100 journals; 65000 articles + 10 man years annually, 40 libraries co-operate in production
• Extending MARC cataloguing to all digitised articles is too expensive
• Any selection criteria for “good material”?
Full-text indexing
• Will not replace cataloguing...
– In large databases precision still bad
• ...but we should follow what is happening
– RDBMS become document-literate (Oracle Intermedia)
– new search techniques (e.g. fuzzy searching)
– efficient use of language technologies
– knowledge management
Embedded metadata (1)
• Three issues to solve:
– semantics: in which metadata format should my metadata be?
– syntax: is it possible / feasible to embed metadata into this document (does the document format allow inclusion of metadata)?
– once topics 1 & 2 have been solved: are there tools for creating / harvesting / indexing my metadata?
Embedded metadata - syntax
• It must be possible to include metadata in non-compromised form & specify each data element separately
• Most document formats do not allow efficient metadata usage
– “flat files”, image formats, Word97
• “This is Dublin Core identifier element, and there is an ISBN in it”
Embedded metadata - syntax (2)
• HTML 4.0
– META tag enables sophisticated metadata
– Explicit specification for how to embed Dublin Core-based metadata (RFC 2731)
• XML/RDF
– “Resource Description Framework makes data machine understandable”
– very versatile, but may be tough to implement
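As an illustration of the HTML 4.0 option, the following Python sketch renders Dublin Core elements as META tags in the general style of RFC 2731, including the optional scheme attribute that says which kind of identifier a value holds. The function name, the sample record and the schema URI are assumptions made for this example; they do not come from the slides.

```python
from html import escape

# Assumed namespace URI for the 15-element Dublin Core set (not given in the slides).
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record):
    """Render (element, value, scheme) triples as HTML META tags.

    scheme may be None; when present it becomes the META scheme attribute.
    """
    lines = ['<link rel="schema.DC" href="%s">' % DC_SCHEMA]
    for element, value, scheme in record:
        scheme_attr = ' scheme="%s"' % escape(scheme) if scheme else ""
        lines.append('<meta name="DC.%s"%s content="%s">'
                     % (element, scheme_attr, escape(value)))
    return "\n".join(lines)


# Hypothetical record for a digitised journal.
sample_record = [
    ("Title", "Example Journal", None),
    ("Identifier", "0002-8231", "ISSN"),
    ("Language", "en", None),
]

if __name__ == "__main__":
    print(dc_meta_tags(sample_record))
```

Each data element is specified separately, so a harvester can pick out the identifier without guessing at the document's layout.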
Embedded metadata - semantics
• Metadata formats tend to be domain specific, complex and hard to learn
• Dublin Core as an alternative:
– simple (in its basic form)
– generic (no domain dependency)
– extensible (local elements possible)
• Is there any competition left?
Status of Dublin Core Initiative
• maintenance in reliable hands
• 15 elements stable (DC 1.1)
• syntax for HTML 4.0 stable
• core qualifiers under development
– proposals published in December 1999
– agreement in DC-AC in March 2000
– will result in 50-60 qualifiers
Tools for Dublin Core
• Metadata support in Web indexes becoming more popular
• Metadata creation emerging in document management systems
• Text editors: XML support in place, RDF yet to come
DIEPER choices
• Document format will be XML/RDF
– extensible and open document format that will become very popular in the future
• Metadata format will be based on DC
– DC tags: Identifier, Title, Creator, Contributor, Publisher, Language, Subject
– Local tags: e.g. SerialsNumbering, PlaceOfPublication, SizeSourcePrint
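A rough idea of what such a record could look like is sketched below with Python's standard xml.etree.ElementTree module. The slides do not show the actual DIEPER schema, so the local namespace URI, the lower-casing of the DC element names and all sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
LOCAL_NS = "http://example.org/dieper/terms#"  # hypothetical namespace for local tags

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("local", LOCAL_NS)


def build_description(about, dc_fields, local_fields):
    """Build an rdf:RDF element holding one rdf:Description of a resource."""
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(root, "{%s}Description" % RDF_NS,
                         {"{%s}about" % RDF_NS: about})
    for name, value in dc_fields:
        # DC element names are conventionally lower case in RDF/XML.
        ET.SubElement(desc, "{%s}%s" % (DC_NS, name.lower())).text = value
    for name, value in local_fields:
        ET.SubElement(desc, "{%s}%s" % (LOCAL_NS, name)).text = value
    return root


# Hypothetical sample values.
example = build_description(
    "urn:issn:0002-8231",
    [("Title", "Example Journal"), ("Language", "en")],
    [("PlaceOfPublication", "Helsinki")],
)

if __name__ == "__main__":
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the local tags in their own namespace lets a generic Dublin Core harvester ignore them while a DIEPER-aware one can still read them.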
Identifiers for e-journals
• Two different issues:
– how to identify journals themselves
– how to identify articles and possibly sections of articles (table of contents etc.)
• Do we need a resolution mechanism (based on DOI or URN)?
E-journals
• ISSN must be used, also for digitised journals
– digitised version may have the same ISSN as the original paper version
• ISSN should not be embedded in issues / articles, since this enhances recall too much
• Broadened scope: serials + integrating resources
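Because the ISSN is the journal-level key, it may be worth validating it before it is stored in metadata. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2, modulus 11); the rule comes from the ISSN standard rather than from these slides, and the function names are made up for the example.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(digit) * weight
                for digit, weight in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """Return True if the string is an ISSN with a correct check character."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    return issn_check_digit(compact[:7]) == compact[7]


if __name__ == "__main__":
    print(is_valid_issn("0002-8231"))
```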
Issues & articles
• SICI (Serial Item and Contribution Identifier) should be used
• ANSI/NISO standard (1996)
– http://sunsite.berkeley.edu/SICI/
• Not widely supported yet; e-commerce is likely to change this
– need to identify whatever can be sold
• SICI generator available
Properties of SICI
• Extensible: can identify issue/article/section within article
• Can be created automatically (from structured source document)
• Complex
– 0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z
• Can be used as URN or DOI
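To make the structure of the example string easier to see, here is a minimal Python sketch that splits a SICI into its visible parts with a regular expression. The field names follow a common reading of the standard (item, contribution and control segments) and are assumptions of this sketch; it does not verify the check character and is not a full implementation of the standard.

```python
import re

# Simplified pattern: ISSN(chronology)enumeration<contribution>CSI.DPI.MFI;version-check
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"
    r";(?P<version>\d)-(?P<check>[0-9A-Z#])$"
)


def parse_sici(sici):
    """Return the parts of a SICI as a dict, or None if it does not match."""
    match = SICI_PATTERN.match(sici)
    return match.groupdict() if match else None


if __name__ == "__main__":
    parts = parse_sici("0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z")
    for name, value in parts.items():
        print(name, "=", value)
```

The leading ISSN is what ties an issue or article identifier back to the journal-level record.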
URN & DOI
• Umbrella systems that provide e.g. persistent linkage between a reference and the resource via a resolution service
• DOI is a publisher-driven initiative, URN comes from the Internet community
• DOIs can be used as URNs, not vice versa
Digital object identifier
• Consists of a prefix and a suffix, separated by a slash
– 10.1045/february2000-risher
• Suffix may be anything, there is no hint on its content
• Prefix identifies the publisher + indicates where to find a resolution service
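The prefix/suffix split can be shown in a few lines of Python. This is only a sketch: the function name is invented here, and the suffix is deliberately treated as an opaque string, in line with the point that it carries no hint of its content.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError("not a DOI of the form prefix/suffix: %r" % doi)
    return prefix, suffix


if __name__ == "__main__":
    prefix, suffix = split_doi("10.1045/february2000-risher")
    print("prefix:", prefix)
    print("suffix:", suffix)
```

Splitting at the first slash only means a suffix that itself contains slashes stays intact.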
Uniform resource name
• Consists of three parts:
– string urn:
– Namespace identifier (NID)
– Namespace specific string (NSS)
• When NID is known, creating URNs from existing identifiers is trivially easy
• No hint on where to find resolution service
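The following Python sketch shows how an existing identifier could be wrapped into a URN once a NID is known. The NIDs "issn" and "sici" are used purely for illustration, and the list of characters left unescaped is an assumption of this sketch rather than a complete statement of the URN syntax; characters outside it, such as the angle brackets in a SICI, are percent-encoded.

```python
from urllib.parse import quote

# Punctuation left unescaped in the namespace specific string (assumed set).
URN_SAFE = "()+,-.:=@;$_!*'"


def make_urn(nid, identifier):
    """Build urn:<NID>:<NSS> from a namespace identifier and an existing identifier."""
    return "urn:%s:%s" % (nid.lower(), quote(identifier, safe=URN_SAFE))


if __name__ == "__main__":
    print(make_urn("issn", "0002-8231"))
    print(make_urn("sici", "0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z"))
```

Nothing in the resulting string says where a resolver lives, which is why a separate lookup step is needed.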
Business models
• DOI: annual payment for each DOI assigned
– no decision yet on the size of the payment
– flat fee for publisher ID
• URN: no price at all
– but someone has to pay for the resolution services
DIEPER policy
• URNs will be used, in order to enable URN-based resolution services
• ISSN/SICI will be used
• ISSN International Centre will assist in creation of URN resolution services
– ISSN database will be contacted first, in order to get the address of the resolution service
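The two-step lookup described here might work roughly as in the Python sketch below. Everything in it is hypothetical: the in-memory dictionary stands in for the ISSN database, the resolver address is a placeholder, and the assumption that the NSS begins with the nine-character ISSN holds only for the illustrative "issn" and "sici" URN forms.

```python
# Hypothetical stand-in for the ISSN database: ISSN -> address of a resolution service.
ISSN_DATABASE = {
    "0002-8231": "https://resolver.example.org/resolve",
}


def find_resolver(urn, issn_database):
    """Return the resolution service address for an ISSN- or SICI-based URN, or None."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not a URN: %r" % urn)
    nid, nss = parts[1].lower(), parts[2]
    if nid not in ("issn", "sici"):
        raise ValueError("unsupported namespace: %r" % nid)
    issn = nss[:9]  # both forms start with the ISSN, e.g. 0002-8231
    return issn_database.get(issn)


if __name__ == "__main__":
    print(find_resolver("urn:issn:0002-8231", ISSN_DATABASE))
```

A client would then send the full URN to the returned address; that second step is left out because the slides do not describe its protocol.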