Digital Object Identifier workshop doi> Norman Paskin The International DOI Foundation.
Digital Object Identifier workshop
doi>
Norman Paskin The International DOI Foundation
• Background: why DOI
• What the DOI system consists of
• What DOI does
DOI - outline of talk
• Identifiers enable us to manage content
• Physical world: ISBN, ISSN, ISMN, SICI, etc
  – good systems for publishers
• Digital world: ? URL?
  – poor systems for publishers (e.g. E Books)
  – how to use existing identifier systems?
• Make WWW transactions as invisible as telephone transactions
  – machine to machine,
  – not machine to people to machine
Background - why now?
• Digital world enables both use and protection
• Aim is to maximise value of information objects:
  – reduce copy infringement and
  – increase accessibility;
  – need to identify what it is you are managing
• Mass production to mass customisation
  – components must be clearly identifiable
  – and terms defined
The intellectual property background
• International DOI Foundation: founded 1998
  – following demonstration of prototype in 1997
• Not-for-profit; paid membership support
  – similar principles to World Wide Web Consortium
• Open to all interested parties
• Democratic: board elected from members
• Full time staff (Director)
• 40+ organisations (growing)
  – Content owners (text publishers, music, etc)
  – Technology companies
  – Content intermediaries (etc)
DOI - organisation
• Establish a way of identifying content in the digital environment
  – actionable identifier
• Which can be the basis of rights management
  – extensible; can be developed further
DOI: aim
• Identification of content
  – intellectual property in any form
  – precisely
• Actionable identification
  – automation; “click to do something”
  – services
• Interoperability, extensibility
• Open standard
DOI requirements
• Must be consistent
• Must be extensible:
  – technology: changes, e.g. PC, netC, P2P …?; E-books; WAP
  – multimedia: needed, e.g. music clip and image in E-Book with web update (“media convergence”)
  – applications: cannot be known in advance
Key issues:
[Diagram: three parallel development tracks]
• Initial implementation: single redirection (persistent identifier)
• Full implementation: metadata; multiple resolution
• Activity tracking: W3C, WIPO, NISO, ISO, UDDI etc.
• A continuing development activity
DOI: development in three tracks
• An analogy: the telephone system
• A number (or “name”)
  – assign a number to something
  – (compare: telephone number)
• A description
  – what the number is assigned to
  – (compare: directory entry)
• An action
  – make the number do something
  – (compare: the telephone system)
• Policies
  – how to get a phone number; billing
  – (compare: social structures)
DOI: components
[Diagram: the four components]
• NUMBERING: syntax, e.g. 10.1234/5678
• DESCRIPTION: metadata. Pieces of data which describe uniquely that which is identified
• ACTION: resolution. System able to link the number to something useful
• POLICIES: deployment
[Diagram: the four components, extensible]
• NUMBERING: any form of identifier
• DESCRIPTION: <indecs> framework. DOI can describe any form of intellectual property, at any level of granularity
• ACTION: Handle resolution allows a DOI to link to any and multiple pieces of current data
• POLICIES
doi> extensible
• DOI syntax: how the number is made up
  – NISO standard (Z39.84)
  – 10.1000/12345
• 10.1000 = prefix (e.g. publisher, journal, etc)
• 12345 = suffix (combination is unique)
• Suffix can be anything (CrossRef example)
• An opaque string (“a dumb number”)
  – parts do not have separate meaning
• Permanent
  – stays the same if ownership or location changes
1. Numbering
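The prefix/suffix split described on this slide can be sketched in a few lines. This is a minimal illustration, not the NISO Z39.84 grammar: the names `Doi` and `parse_doi` are hypothetical, and the check that a prefix starts with "10." is an assumption drawn from the examples on the slides.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # e.g. "10.1000"
    suffix: str  # opaque string chosen by the assigner


def parse_doi(text: str) -> Doi:
    """Split a DOI into prefix and suffix at the first '/'."""
    # Accept the optional "DOI:" label used on the slides ("DOI:10.1000/123").
    if text.lower().startswith("doi:"):
        text = text[4:]
    prefix, sep, suffix = text.partition("/")
    # Assumption: every prefix starts with "10.", as in the slide examples.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    # The suffix is not inspected further: it is a "dumb number".
    return Doi(prefix, suffix)
```

Because the suffix is opaque, the sketch keeps everything after the first "/" intact, including any further slashes.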
• “What is numbered?”
• Not as simple as you might think:
  1. Not only digital files, but physical things and intangible things.
  2. Not only things, but parts of things.
2. Description
[Diagram: the same content identified in several forms]
• Manuscript (MS): mss #ABC123
• Paper journal/volume/page: Vol/page; ISBN; ISSN; SICI, etc.
• URL
• “Intangible abstraction”: ISTC?
Not only digital things...
• Components
  • Book
    – Chapter
      • Section
        – Figure
• “Granularity”
• Must be able to identify at whatever level is appropriate: functional granularity
Not only things, but parts of things
• Metadata is: Data
• Relationships between data
  – Book: ISBN 0864426437 (data)
  – Price: $12.95 (metadata)
  – Subject: Buenos Aires (metadata)
• One man’s metadata is another man’s data
Description is by metadata
• Not sufficient to assign an identifier without specifying precisely what the entity is
  – “a paper” or “a book” is not precise
  – must be precise, because:
• In an automated world, that specification must be by metadata (able to be used by machines)
• In an interoperable world, that metadata must be
  – unambiguous (“well-formed”)
  – follow a data model (able to be used consistently by machines)
Description is by metadata
Interoperability of data in e-commerce systems
• Broad in scope: generic intellectual property management
  – description, transaction, rights
• Based on tested “real world” models
  – CIS (music industry); IFLA (library cataloguing)
• Wide endorsement of this approach
  – see recent papers Lagoze, Caplan (links at www.doi.org)
• Now in use in applications
  – note especially EPICS/ONIX dictionary
• Extensible, structured, open standard
DOI used indecs framework
• A few (7-8) key pieces of data
  – title, type of content, origin, etc
  – varies according to what is needed (video, book, etc)
• about the object
  – does not include rights metadata
• but interoperates with rights data
  – because based on same data model
  – uses the same terms to mean the same thing
• DOI “Genre” defines key metadata for a family
  – see DOI Handbook
DOI kernel metadata
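A kernel record of this kind might be modelled as a small structured type. The sketch below is only an illustration: the slide names title, type of content and origin, while the class name `KernelMetadata`, the field names, the `extra` mapping for genre-specific elements and the sample values are all hypothetical and not taken from the DOI Handbook.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KernelMetadata:
    doi: str
    title: str
    content_type: str  # "type of content" on the slide
    origin: str
    # Hypothetical slot for the elements a given genre adds.
    extra: dict = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        """Return the names of required elements that are empty."""
        required = ("doi", "title", "content_type", "origin")
        return [name for name in required if not getattr(self, name).strip()]


record = KernelMetadata(
    doi="10.1000/123",
    title="Sample title",
    content_type="book",
    origin="Sample publisher",
)
```

A machine consuming such a record can check `record.missing_fields()` before relying on it, which is one simple reading of "well-formed".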
[Diagram: a user's web browser (etc.) submits the actionable identifier doi> 10.1000/123, which triggers a specified action]
3. Actions
• I have found what I want to link to, but:
  – I have a copy locally; or
  – I use an aggregator; or
  – The publisher provides alternative sources; or
  – I am linked to an authorised E-print archive; or
  – It is available in a public archive (etc)
• so I want to go to the “appropriate copy”
  – rights issues (access control) are implicit
Example issue: getting the appropriate copy
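One way to picture the "appropriate copy" choice is as a lookup over the sources listed above, filtered by what the user is entitled to reach. The sketch is an assumption throughout: the source labels, the preference order (simply the order of the bullets) and the function `appropriate_copy` are hypothetical, and the slides do not prescribe any selection rule.

```python
from typing import Optional

# Hypothetical labels for the sources in the bullets; the order is an assumption.
PREFERENCE = [
    "local",
    "aggregator",
    "publisher_alternative",
    "eprint_archive",
    "public_archive",
]


def appropriate_copy(available: dict[str, str], entitled: set[str]) -> Optional[str]:
    """Return the first available location the user is entitled to use."""
    for source in PREFERENCE:
        if source in available and source in entitled:
            return available[source]
    return None


copies = {
    "aggregator": "https://aggregator.example.org/item",
    "public_archive": "https://archive.example.org/item",
}
```

The `entitled` set stands in for the access-control question that the slide calls implicit.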
• Open Standard using internet
• Distributed, scalable, fast and reliable
• In use now in several places (e.g. Library of Congress)
• Very simple concept, powerful applications
• Fits with other standards (URL, URN, etc)
• Associates a name with “values” (e.g. URL)
  – input DOI
  – output URL (or some other defined value)
• Work by CNRI (Robert Kahn)
DOI uses Handle System®
[Diagram: a web browser's local client sends a DOI to the Global Handle System, receives a URL, and fetches abc.doc from www.pub.com]
DOI          Data type   Handle data
10.123/456   URL         http://srv1.pub.com/.....
             URL         http://srv2.pub.com/.....
             URL         http://srv3.pub.com/.....
             MD          http://lu.cr.com/10.123..
             EM          [email protected]
             IP          10.456/789
[Index column: values not recoverable from the extracted slide]
Background: DOIs resolve to Typed Data
• Multiple typed values per DOI
• Extensible typing
• Query by type
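Resolution to typed data, with an optional query by type, can be sketched as a lookup table. This is an in-memory illustration only and not the Handle System protocol or client API: `TypedValue`, `RECORDS` and `resolve` are hypothetical names, and the example.org addresses are placeholders standing in for the truncated values on the slide.

```python
from typing import NamedTuple, Optional


class TypedValue(NamedTuple):
    type: str  # e.g. "URL", "MD", "IP"; new types can be added freely
    data: str


# Placeholder record: one DOI associated with several typed values.
RECORDS: dict[str, list[TypedValue]] = {
    "10.123/456": [
        TypedValue("URL", "https://srv1.example.org/item"),
        TypedValue("URL", "https://srv2.example.org/item"),
        TypedValue("MD", "https://metadata.example.org/10.123"),
        TypedValue("IP", "10.456/789"),
    ],
}


def resolve(doi: str, value_type: Optional[str] = None) -> list[str]:
    """Return all values for a DOI, or only those of the requested type."""
    values = RECORDS.get(doi, [])
    return [v.data for v in values if value_type is None or v.type == value_type]
```

Calling `resolve("10.123/456", "URL")` returns both URLs, while omitting the type returns every value held for the DOI.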
For convenience we re-draw like this:
[Diagram: INPUT doi> 10.1000/123; OUTPUT typed values URL, URL2, RAP, XYZ, etc.]
• DOI free to use
  – costs paid by assigner
• DOI applies to any Intellectual Property entity
  – copyright focus (Berne/WCT etc)
• Registration agencies to deal with assigning DOIs (and metadata/resolution) for publishers etc
• Business models determined by agencies
• Policies for agencies are now evolving
4. Policies
Digital Object Identifier
• A unique persistent identifier…
  – of a piece of intellectual property
  – in any form (tangible, intangible)
  – defined by some key metadata
  – an opaque string e.g. DOI:10.1000/123
What is DOI?
• “resolvable..”
  – routing, via proven internet technology,
• “to associated state data”…
  – one or more current values of specified types of data (e.g. URL);
  – these data may be, or link to, services
What is DOI?
• “in an information management substrate…”
  – once the (meta)data has been obtained, it can interoperate with other data
  – e.g. about context (subscription etc)
  – to construct services and transactions
  – because (meta)data follows a generic interoperable architecture
What is DOI?
“A unique resolvable identifier and multiple pieces of associated state data in an information management substrate” achieved by:
• Technical implementation + policies
• Two underlying technical tools:
  1. intellectual property: <indecs> framework
  2. resolution: Handle System
What is DOI?
1. Identify the item of intellectual property
  • not its location, because:
  • if the location changes the identifier should stay the same (persistence)
  • the same “resource” can be at several locations at the same time (“multiple copies”)
DOI does this
What are the advantages of DOI?
2. Able to deal with relationships:
  – “this item is a manifestation of that work”
  – “this item is a part of that item”
DOI does this:
• Metadata can express relationships
  – “is part of…” etc
• DOIs can resolve to other DOIs
What are the advantages of DOI?
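The idea that metadata expresses relationships, and that one DOI can lead to another, can be sketched as a table of links that is walked from part to whole. The DOIs, the relation names `is_part_of` and `is_manifestation_of`, and the function `containers` are hypothetical; the <indecs> framework defines its own vocabulary, which is not reproduced here.

```python
# Hypothetical relationship metadata: (source DOI, relation) -> target DOI.
RELATIONS = {
    ("10.1000/figure-2", "is_part_of"): "10.1000/chapter-3",
    ("10.1000/chapter-3", "is_part_of"): "10.1000/book",
    ("10.1000/book", "is_manifestation_of"): "10.1000/work",
}


def containers(doi: str) -> list[str]:
    """Follow "is_part_of" links upward and return every containing item."""
    chain = []
    seen = {doi}  # guards against accidental cycles in the metadata
    current = RELATIONS.get((doi, "is_part_of"))
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = RELATIONS.get((current, "is_part_of"))
    return chain
```

Starting from the figure, the walk passes through the chapter and stops at the book, which has no further "is part of" link.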
3. Apply to any intellectual property entity
  – any format (digital convergence)
  – any granularity (any part of something)
4. Enable complex actions
  – can express relationships between entities
  – interact with data from other sources
  – enables services (automated, predictable) to be constructed
What are the advantages of DOI?
5. Extensible
  • resolution system has capability for trusted transactions (PKI)
  • metadata framework has capability for full rights management architecture
6. Not limited to current environments
  • not just the Web (other Internet applications)
  • not just digital (intangibles etc)
What are the advantages of DOI?
[Diagram: a user's web browser follows a URL and gets “404 not found” or “...has moved to…”]
1. URL is not a persistent identifier
  – it refers to location, not content
2. Same content at two different URLs has two different identifiers
  – cannot use as common reference
“One in five Web links >1yr old may be out of date” (Alta Vista)
Identifiers on the web
[Diagram: a user's web browser follows a URL]
1. Don’t change the URL; “persistence is a social, not a technology, problem”
  • People do change URLs
  • There are good reasons to change URLs
  • Does not deal with multiple copies
Identifiers on the web
[Diagram: a user's web browser requests a name, and an http redirect leads to the URL]
2. Assign a Name and use http redirect
  • Bookmarks and caches save the end point, not the name (in current browsers)
  • Does not deal with multiple copies
Identifiers on the web
[Diagram: a user's web browser sends doi> to a resolver, which returns a URL]
3. Assign a Name and use resolver
  • DOI provides name
  • Multiple resolution
Identifiers on the web
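The third approach, a name plus a resolver, can be sketched as a registry whose entries change while the name stays fixed. This is a toy model under stated assumptions: `NameResolver` is a hypothetical class, not the Handle System, and the example.org addresses are placeholders.

```python
class NameResolver:
    """Maps a persistent name to its current location or locations."""

    def __init__(self) -> None:
        self._locations: dict[str, list[str]] = {}

    def register(self, name: str, urls: list[str]) -> None:
        # Re-registering replaces the locations; the name never changes.
        self._locations[name] = list(urls)

    def resolve(self, name: str) -> list[str]:
        return list(self._locations.get(name, []))


resolver = NameResolver()
resolver.register("10.1000/123", ["https://old.example.org/item"])
# The content moves and gains a mirror; only the registry entry is updated.
resolver.register(
    "10.1000/123",
    ["https://new.example.org/item", "https://mirror.example.org/item"],
)
```

Anyone who stored the name rather than the end point still reaches the content after the move, and the list of locations covers the multiple-copies case that a single redirect cannot.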
[Diagram: a user's web browser sends doi> 10.1000/123 for resolution and receives a URL]
1. DOI is a persistent identifier
2. DOI identifies the content, irrespective of the location
DOI initial implementation
[Diagram: a user's web browser (etc.) sends the actionable identifier doi> 10.1000/123; multiple resolution returns URL, URL2, Data 1, Data 2]
Identifier resolves to any piece of data
Full DOI implementation
[Diagram: a user's web browser (etc.) sends the actionable identifier doi> 10.1000/123 to a resolution service; alongside URL, URL2, Data 1 and Data 2, the specified action is Service 1 @ 10.1000/123]
Digital Object Identifier workshop
doi>
Norman Paskin The International DOI Foundation