Proposals for a new flexible and extensible XML-model for exchange of research information
By
Jens Vindvad, Jens.Vindvad@rbt.no
National Office for Research Documentation, Academic and Special Libraries, Norway
Erlend Øverby, [email protected]
Conduct AS, Oslo, Norway
29 August 2002 CRIS2002 2
RBT
www.rbt.no
Content of presentation
Assumptions
Objectives
Difference in structure and mapping between structures
Validation and MicroSchema
Working model (vocabulary, internal structure)
Namespace
Testing of model
3 assumptions and 1 observation
The Internet is the driving force and the preferred medium for international information exchange.
Today and in the near future, XML is the basic Internet standard for information exchange.
Research Information Systems are based on relational database technology for storage.
Validation of exchanged data against structure and allowed values is “often” sacrificed.
Objectives
We want to exchange Research Information between different CRIS-systems, and other systems as well, using the Internet and XML technology.
We want to be able to validate the exchanged data against a defined structure and allowed values.
Agreeing upon a common XML exchange model, which can be used for information exchange, can help to achieve this objective.
How should the exchange model work
To be able to exchange information, the existing data needs to be transformed/mapped into the structure of the exchange model. To receive information, data in the exchange model has to be transformed/mapped into the receiver’s data model. This eases information exchange when sender and receiver do not share the same data model, and also eases data exchange between different communities.
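The two mapping steps described above can be sketched as a round trip. This is a minimal illustration only: the sender's field names (`tittel`, `aar`), the receiver's field names and the un-namespaced element names are assumptions made for the example, not part of the proposed model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sender record, shaped like a row in the sender's own database.
sender_record = {"tittel": "An example title", "aar": "2000"}

def sender_to_exchange(record):
    """Map the sender's local field names onto exchange-model elements."""
    article = ET.Element("ArticleInJournal")
    ET.SubElement(article, "Title").text = record["tittel"]
    ET.SubElement(article, "PublishingYear").text = record["aar"]
    return ET.tostring(article, encoding="unicode")

def exchange_to_receiver(xml_text):
    """Map exchange-model elements onto the receiver's local field names."""
    article = ET.fromstring(xml_text)
    return {
        "title": article.findtext("Title"),
        "year": int(article.findtext("PublishingYear")),
    }

exchanged = sender_to_exchange(sender_record)   # what travels between systems
received = exchange_to_receiver(exchanged)      # what the receiver stores
print(received)
```

Neither side needs to know the other's data model; each only needs a mapping to and from the shared exchange structure.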
Different structures for CRIS and the exchange model
The CERIF standard is based on relational database technology, which is table oriented.
The exchange model is based on the XML standard by W3C, which has a hierarchical tree structure.
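The structural difference can be illustrated with a small sketch that turns flat, table-oriented rows into a tree. The rows, the element names and the grouping key are hypothetical and chosen only to show the shape of the problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows as a relational join would return them:
# one row per (publication, author) pair, so the title is repeated.
rows = [
    ("r1", "Title A", "Author One"),
    ("r1", "Title A", "Author Two"),
    ("r2", "Title B", "Author Three"),
]

def rows_to_tree(rows):
    """Group flat rows into one <Publication> element per publication id."""
    root = ET.Element("Publications")
    by_id = {}
    for pub_id, title, author in rows:
        pub = by_id.get(pub_id)
        if pub is None:
            # First row for this publication: create its branch of the tree.
            pub = ET.SubElement(root, "Publication")
            ET.SubElement(pub, "IdNumber").text = pub_id
            ET.SubElement(pub, "Title").text = title
            by_id[pub_id] = pub
        ET.SubElement(pub, "Author").text = author
    return root

tree = rows_to_tree(rows)
print(ET.tostring(tree, encoding="unicode"))
```

In the table form the title is repeated on every row; in the tree form it appears once, with the authors nested beneath the publication.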
Exchange between table structure and hierarchical tree structure illustrated
[Figure: table structure (relational databases) alongside hierarchical tree structure (XML-document)]
Information mapping between the exchange model and CRIS (I)
When mapping information between different structures, there is no one-to-one solution. We have to make some choices when defining the exchange model, and no final answer or perfect solution exists.
When building the exchange model we have used the following guidelines:
Data should be placed in elements, not attributes.
Information mapping between the exchange model and CRIS (II)
Characteristics and properties of data should be placed in attributes.
Be as explicit as possible, not implicit. The implicit path can lead to situations where users of the model have to make assumptions about how to handle the information in the model.
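The element/attribute guidelines above can be illustrated with a short sketch. It follows the MainTitle pattern that appears in the test output of this presentation, but drops the namespaces for brevity; the helper name `main_title` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def main_title(title, language):
    """Build a MainTitle: the title text is data (an element),
    the language is a property of that data (an attribute)."""
    element = ET.Element("MainTitle", {"Language": language})
    ET.SubElement(element, "Title").text = title
    return element

example = main_title("An example title", "eng")
print(ET.tostring(example, encoding="unicode"))
# prints: <MainTitle Language="eng"><Title>An example title</Title></MainTitle>
```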
The validation problem
Normally, two alternatives exist for describing and defining the information structure or model of an XML document: a DTD (ISO 8879) or an XML Schema. Both approaches currently share a disadvantage: to validate and check the structure of the information, a description of the whole structure, with all its possibilities and constraints, must exist in one large and inflexible model. This makes it harder to establish efficient validation of data exchange between different systems.
MicroSchema
The idea of a MicroSchema is that it should only describe a very small piece of information, and only such information as is relevant to the specific description. Information that is not relevant to the specific context is described in another MicroSchema.
To be able to express the relevance and the connection between the MicroSchemas, we need to develop a standard method of enhancing the schema specification in order to address the valid elements in the specific context. This will be done using namespaces, by introducing the term "Allow-schema-namespaces".
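One possible reading of "Allow-schema-namespaces" is sketched below. The presentation does not define the mechanism in detail, so everything here is an assumption: each MicroSchema is identified by a namespace and lists the namespaces whose elements may appear directly inside it. The `urn:example:` URIs and the `ALLOWED` registry are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: for each MicroSchema namespace, the namespaces whose
# elements may appear directly inside it. The URIs are illustrative only.
ALLOWED = {
    "urn:example:Article": {"urn:example:Article", "urn:example:Person"},
    "urn:example:Person": {"urn:example:Person"},
}

def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def check(element, errors=None):
    """Collect every child whose namespace the parent's MicroSchema does not allow."""
    errors = [] if errors is None else errors
    allowed = ALLOWED.get(namespace_of(element), set())
    for child in element:
        if namespace_of(child) not in allowed:
            errors.append(child.tag)
        check(child, errors)
    return errors

doc = ET.fromstring(
    '<a:Article xmlns:a="urn:example:Article" xmlns:p="urn:example:Person" '
    'xmlns:x="urn:example:Other">'
    "<p:Person><p:FamilyNames>Heimdal</p:FamilyNames></p:Person>"
    "<x:Unknown/>"
    "</a:Article>"
)
print(check(doc))
# prints: ['{urn:example:Other}Unknown']
```

Each small schema only has to know which neighbouring schemas it admits, so no single large model describing the whole structure is needed.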
Working model
A working model, limited to documentation produced by researchers, has been built.
Successful communication requires common wordings and definitions – to achieve this a vocabulary has been defined.
With vocabulary and guidelines for building an exchange model in place, an internal structure has been established.
Based on vocabulary and internal structure, the XML exchange model is proposed in terms of MicroSchemas and Namespaces.
Vocabulary – output and results
The collection of all types of information produced by the researchers is called output. Outputs are divided into four subgroups: results, communication, documentation and art.
With the exception of art, results are taken to mean the results of research produced by the researcher in person. Examples of results are: publications, patents and products.
Vocabulary - communication
By communication we mean the forms of communication that researchers use in their work. Researchers often need or wish to discuss their ideas and views; these forms of communication are not results of their work, but represent interesting and important steps in the process of producing results. Examples of communication are: conference presentations, workshops, broadcasts and interviews in the press.
Vocabulary - documentation
A researcher has to carry out administrative tasks and produce documentation that cannot be classified as results or forms of communication. This can range from pure administration to high-level professional work. Examples are: reports to the funding institution, computer programs, manuscripts, (thesis).
Vocabulary - art
Art is not necessarily an output of a researcher’s work, but it can be. Art can be seen as a result in itself, a form of communication or a type of documentation, or all of these. Art needs and deserves a classification based on standards used and accepted in the art community. Examples of art are: works of art, exhibitions and performances.
Vocabulary – publication and five-point test
Publication is a commonly used word which, in daily use, does not have a precise and distinct definition. To establish a vocabulary and a namespace, we need the word “publication” and have to give it a precise and distinct definition. To do this we have established a five-point test, which involves: addressee, copies, location, readability and time. The test must be taken in the following order: test against publication, then against communication and finally against documentation.
Internal structure (I)
Based on vocabulary and guidelines for building an exchange model, an internal structure has been established.
The basic elements of the model are HEAD, content and EXTENSIONS. The core of the model is the content; HEAD and EXTENSIONS can be left out.
Internal structure - HEAD
Each schema can contain at most one HEAD; HEAD can be left out. All administrative data should be placed in HEAD, such as when the information object was created, by whom, and who revised or edited it. In transactions between systems, all transactional administration data should also be placed in HEAD. HEAD may also contain EXTENSIONS.
Internal structure - content
The content elements are the core elements on which the model is built. All the content elements could have been put into one element, e.g. BODY. This is not necessary, since we know that all elements that are not HEAD or EXTENSIONS are part of the core model.
Internal structure - EXTENSIONS
The elements in the basic model should be understood and managed by all who want to exchange information. A basic model will, with a high degree of certainty, not satisfy all needs. To meet these needs, the model is made extensible. With this construction, everybody can easily see what is part of the core model and what belongs to a specific extension. Everyone who makes use of an extension has to supply a working namespace for it.
Example: MicroSchema ArticleInJournal
General identifier  Occurrence    Content model
HEAD                Zero or one   mSchema HEAD
TitleInfo           One           mSchema TitleInfo
Author              One or more   mSchema Person
                                  mSchema OrgUnit
RefInToJournal      One           mSchema RefInToJournal
URI                 Zero or one   Uri
Abstract            Zero or more  Text
EXTENSIONS          Zero or one   mSchema EXTENSIONS
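The occurrence column of the table above can be checked mechanically. The following is a minimal sketch, not the authors' validator: it only counts direct children by local name and ignores element order and the content model column. The function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Occurrence rules copied from the ArticleInJournal table: (minimum, maximum),
# where None means "no upper limit".
OCCURRENCE = {
    "HEAD": (0, 1),
    "TitleInfo": (1, 1),
    "Author": (1, None),
    "RefInToJournal": (1, 1),
    "URI": (0, 1),
    "Abstract": (0, None),
    "EXTENSIONS": (0, 1),
}

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def occurrence_errors(article):
    """Return a message for each child that breaks the occurrence rules."""
    counts = {}
    for child in article:
        name = local_name(child.tag)
        counts[name] = counts.get(name, 0) + 1
    errors = [f"unexpected element: {name}" for name in counts if name not in OCCURRENCE]
    for name, (low, high) in OCCURRENCE.items():
        n = counts.get(name, 0)
        if n < low or (high is not None and n > high):
            errors.append(f"{name}: found {n}")
    return errors

# A deliberately incomplete article: the mandatory RefInToJournal is missing.
article = ET.fromstring(
    "<ArticleInJournal><TitleInfo/><Author/><Author/></ArticleInJournal>"
)
print(occurrence_errors(article))
# prints: ['RefInToJournal: found 0']
```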
Namespace - examples
Element name      NS-abbr.  NS-Uri
Output            out       root/Outputs.msc
Results           res       root/output/Results.msc
Publications      pub       root/output/results/Publications.msc
Journal           jour      root/output/results/publications/Journal.msc
ArticleInJournal  aij       root/output/results/publications/ArticleInJournal.msc
Person            pers      root/level1/Person.msc
OrgUnit           org       root/level1/OrgUnit.msc
HEAD              HEAD      root/misc/HEAD.msc
TitleInfo         ti        root/output/TitleInfo.msc
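The namespace table can be used as a lookup from abbreviation to full URI. The sketch below assumes that "root" stands for `http://www.rbt.no/xmlns/cerif`, which is the base that the namespace URIs in the test output of this presentation share; the helper names are hypothetical.

```python
# Namespace table from the slide: abbreviation -> path below "root".
NAMESPACES = {
    "out": "root/Outputs.msc",
    "res": "root/output/Results.msc",
    "pub": "root/output/results/Publications.msc",
    "jour": "root/output/results/publications/Journal.msc",
    "aij": "root/output/results/publications/ArticleInJournal.msc",
    "pers": "root/level1/Person.msc",
    "org": "root/level1/OrgUnit.msc",
    "HEAD": "root/misc/HEAD.msc",
    "ti": "root/output/TitleInfo.msc",
}

# Assumption: "root" is the base URI seen in the test output.
ROOT = "http://www.rbt.no/xmlns/cerif"

def namespace_uri(abbreviation):
    """Replace the leading 'root' of a table entry with the base URI."""
    path = NAMESPACES[abbreviation]
    return ROOT + path[len("root"):]

def clark_name(qualified_name):
    """Expand 'pers:Person' into the '{uri}Person' form used by ElementTree."""
    prefix, local = qualified_name.split(":", 1)
    return "{" + namespace_uri(prefix) + "}" + local

print(namespace_uri("res"))
print(clark_name("pers:Person"))
```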
Test of exchange model
We have so far tested the model against CRIS data from BIBSYS FORSKDOK and data from the library system BIBSYS.
To perform the tests we have developed two XSLT programs (XSL stylesheets), which map/transform the input data into the proposed exchange model.
Test example: Input (I)
<publikasjon>
<f001>
<f001b>r00015557</f001b>
<f001d>A12</f001d>
<f001i>f</f001i>
<f001j>FO02RBRU</f001j>
<f001n>2000-04-03</f001n>
<f001o>2000-04-03</f001o>
</f001>
<f008>
<f008c>eng</f008c>
</f008>
<f020>
</f020>
<f022>
</f022>
<f100>
<f100a>Heimdal, J-H.</f100a>
<f100b>02013300</f100b>
<f100a>Aarstad, H.J.</f100a>
<f100b>02013300</f100b>
<f100a>Olofsson, J.</f100a>
<f100b>02013300</f100b>
</f100>
<f245>
<f245a>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck Carcinoma.</f245a>
</f245>
<f260>
</f260>
Test example: Input (II)
<f300>
<f300a>402 - 407</f300a>
</f300>
<f507>
</f507>
<f509>
<f509a>Laryngocope</f509a>
<f509c>2000</f509c>
<f509f>110</f509f>
<f509h>3</f509h>
<f509x>0023-852X</f509x>
</f509>
</publikasjon>
Test example: Output (I)
<out:Results xmlns:res="http://www.rbt.no/xmlns/cerif/output/Results.msc">
<res:Publications xmlns:pub="http://www.rbt.no/xmlns/cerif/output/results/Publications.msc">
<pub:ArticleInJournal xmlns:aij="http://www.rbt.no/xmlns/cerif/output/results/publications/ArticleInJournal.msc">
<aij:HEAD xmlns:HEAD="http://www.rbt.no/xmlns/cerif/misc/HEAD.msc">
<HEAD:SourceName>BIBSYS FORSKDOK</HEAD:SourceName>
<HEAD:IdNumber>r00015557</HEAD:IdNumber>
<HEAD:ClassificationCode>A12</HEAD:ClassificationCode>
<HEAD:Description>Artikkel i internasjonalt vit. tidsskrift uten referee</HEAD:Description>
<HEAD:Created>2000-04-03</HEAD:Created>
<HEAD:Updated>2000-04-03</HEAD:Updated>
</aij:HEAD>
Test example: Output (II)
<aij:TitleInfo xmlns:ti="http://www.rbt.no/xmlns/cerif/output/TitleInfo.msc">
<ti:MainTitle Language="eng">
<ti:Title>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck Carcinoma.</ti:Title>
</ti:MainTitle>
</aij:TitleInfo>
Test example: Output (III)
<aij:Author xmlns:pers="http://www.rbt.no/xmlns/cerif/level1/Person.msc">
<pers:Person>
<pers:FamilyNames>Heimdal</pers:FamilyNames>
<pers:FirstNames>J-H.</pers:FirstNames>
</pers:Person>
<pers:Person>
<pers:FamilyNames>Aarstad</pers:FamilyNames>
<pers:FirstNames>H.J.</pers:FirstNames>
</pers:Person>
<pers:Person>
<pers:FamilyNames>Olofsson</pers:FamilyNames>
<pers:FirstNames>J.</pers:FirstNames>
</pers:Person>
</aij:Author>
Test example: Output (IV)
<aij:RefInToJournal xmlns:RIn2J="http://www.rbt.no/xmlns/cerif/output/results/publications/journal/RefInToJournal.msc">
<RIn2J:Journal xmlns:jour="http://www.rbt.no/xmlns/cerif/output/results/publications/Journal.msc">
<jour:TitleInfo xmlns:ti="http://www.rbt.no/xmlns/cerif/output/TitleInfo.msc">
<ti:MainTitle Language="eng">
<ti:Title>Laryngocope</ti:Title>
</ti:MainTitle>
</jour:TitleInfo>
<jour:ISSN>0023-852X</jour:ISSN>
</RIn2J:Journal>
<RIn2J:PublishingYear>2000</RIn2J:PublishingYear>
<RIn2J:Volum>110</RIn2J:Volum>
<RIn2J:Issue>3</RIn2J:Issue>
<RIn2J:Pages>402 - 407</RIn2J:Pages>
</aij:RefInToJournal>
</pub:ArticleInJournal>
</res:Publications>
</out:Results>
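The mapping from the input record to the Author part of the output can be sketched as follows. The tests described in this presentation used XSLT stylesheets, which are not shown; this sketch uses Python's standard library instead and covers only the author names. It assumes that each `f100a` field holds "family name, first names" separated by a comma, as in the test input; the function name `map_authors` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the test output.
PERS = "http://www.rbt.no/xmlns/cerif/level1/Person.msc"
AIJ = "http://www.rbt.no/xmlns/cerif/output/results/publications/ArticleInJournal.msc"
ET.register_namespace("pers", PERS)
ET.register_namespace("aij", AIJ)

def map_authors(record):
    """Turn every <f100a>Family, First</f100a> into a pers:Person element."""
    author = ET.Element(f"{{{AIJ}}}Author")
    for name in record.iter("f100a"):
        # Assumption: the text before the first comma is the family name.
        family, _, first = name.text.partition(",")
        person = ET.SubElement(author, f"{{{PERS}}}Person")
        ET.SubElement(person, f"{{{PERS}}}FamilyNames").text = family.strip()
        ET.SubElement(person, f"{{{PERS}}}FirstNames").text = first.strip()
    return author

# Shortened copy of the f100 field from the test input.
record = ET.fromstring(
    "<publikasjon><f100>"
    "<f100a>Heimdal, J-H.</f100a><f100b>02013300</f100b>"
    "<f100a>Aarstad, H.J.</f100a><f100b>02013300</f100b>"
    "</f100></publikasjon>"
)
author = map_authors(record)
print(ET.tostring(author, encoding="unicode"))
```

The serializer may place the namespace declarations differently from the slides, where each element declares the namespace of its children; the element names and values are the same.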