LoG: A Methodology for Metadata Registry-based Management of Scientific Data
July 5, 2002
Doo-Kwon Baik
July 5, 2002 CODATA/DSAO 2002 2
Content
• Motivation
• Objectives
• Related works
• Overview on the MDR
• The scientific data properties
• User levels and the data property
• Data visibility
• The conceptual model of the LoG
• A LoG Framework
• An Example
• Conclusions and Future work
Motivation
• The existing data integration approaches
  • focus only on technical research and system development
  • do not consider the properties of the domain knowledge
The Domain Knowledge
• The domain knowledge property is a very important factor in data integration.
• Many tasks and services depend on the domain knowledge properties.
  • The quality degree and the quantity scope of data integration are defined by the domain knowledge property.
  • Many other services, such as data services and application services, depend on it.
[Figure: the domain knowledge linked to the quality degree of data integration, the quantity scope of data integration, data services (information providing), and application services.]
Objectives
• The objectives of our research are
  • to solve the problems of the existing data integration approaches
  • to analyze and define the domain knowledge properties
    • In this paper, we focus on scientific data.
  • to define the relationships among the domain knowledge properties, users, and metadata
    • i.e., to define the considerations for data integration.
  • to create a new methodology that takes the results of the domain knowledge analysis into account
    • We call it LoG (Localization-based Global MDR methodology).
  • finally, to design a framework suitable for the methodology.
Related works: Bottom-up approach (1/2)
• The existing data integration approaches are classified into the top-down approach and the bottom-up approach.
• Bottom-up approach
  • is the most general approach.
  • The ontology-based methodology is representative.
[Figure: all factual databases (number of databases = n) are analyzed, and a guideline such as a global view is designed and created from the specified databases. When new databases (number = c) are added, the number of databases becomes n + c.]
Related works: Bottom-up approach (2/2)
• Advantages
  • can achieve perfect data integration, because it uses a global guideline created through analysis and design over all databases.
• Disadvantages
  • creating the global guideline costs much time and money.
  • is not suitable for very large scale data integration.
  • provides only a static integration management mechanism.
    • Whenever a new schema or a new database is added to the integrated database, the previous processes must be repeated.
    • This makes costs and time grow geometrically.
  • does not provide a standardized guideline.
    • i.e., it depends on its domain.
    • Each application domain defines and uses its own, different guidelines for integration.
Related works: Top-down approach (1/2)
• Top-down approach
  • aims to solve the problems of the bottom-up approach.
  • MDR (ISO/IEC 11179) is representative.
    • MDR is an international standard.
[Figure: all factual databases are analyzed, and a guideline such as a global view (metadata elements) is designed and created from the specified databases. The schemas of new databases are then defined according to the standardized guideline.]
Related works: Top-down approach (2/2)
• Advantages
  • reduces costs
    • because it does not require rebuilding the global guideline.
  • provides a standardized schema
    • All new databases can be built and managed consistently.
• Disadvantages
  • Like the bottom-up approach, it has a high initial cost
    • because it requires creating a global view through analysis of all legacy databases.
    • This is hard work for very large scale integration.
Overview on the MDR: Definition
• Definition of MDR
  • Metadata Registry
  • A system for registering, storing, and managing the specifications (metadata) of data elements
  • Evolution of ISO/IEC 11179
  • Metamodel of Data Registry: ANSI X3.285
• Purpose
  • Metadata registry for data standardization
  • Support for data search and data specification
  • Support for data sharing among systems or organizations
  • A supporting system for creating, registering, and managing data elements
  • Helps users understand the meaning, representation, and identification of data
Overview on the MDR: Basic concepts
• Data Element
  • The basic unit of data management
  • The unit that specifies the identification, context, and representation of data values
• Components of Data Element
  • Object Class: the data to be collected or stored
  • Property: the characteristics needed to identify and explain objects
  • Representation: the description of the representational form and value domain of each data element
[Figure: a Data Element Concept is composed of an Object Class and a Property; a Data Element is composed of an Object Class, a Property, and a Representation. The relationships are labeled 1:N and 1:1.]
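The composition on this slide can be sketched as a small data model. This is only an illustrative sketch, not the ISO/IEC 11179 metamodel itself: the class and field names (`prop`, `data_type`, `layout`) and the `ObjectClass_Property` naming rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    """The data to be collected or stored, e.g. 'Student'."""
    name: str


@dataclass(frozen=True)
class Property:
    """A characteristic used to identify and explain objects, e.g. 'ID'."""
    name: str


@dataclass(frozen=True)
class Representation:
    """Representational form and value domain (field names are assumed)."""
    data_type: str
    layout: str


@dataclass(frozen=True)
class DataElementConcept:
    """An object class combined with one of its properties."""
    object_class: ObjectClass
    prop: Property


@dataclass(frozen=True)
class DataElement:
    """A data element concept bound to one representation."""
    concept: DataElementConcept
    representation: Representation

    @property
    def name(self) -> str:
        # Assumed naming rule for this sketch: ObjectClass_Property.
        return f"{self.concept.object_class.name}_{self.concept.prop.name}"


student_id = DataElement(
    DataElementConcept(ObjectClass("Student"), Property("ID")),
    Representation(data_type="Numeric", layout="N(12)"),
)
print(student_id.name)  # Student_ID
```

Keeping the concept separate from the representation lets the same concept be registered with more than one representation.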
Overview on the MDR: Specification
• Specification of Data Element
  • Basic attributes for specifying a data element
Classification    Characteristics
Identification    Identification of the data element
Definition        Description of the meaning
Relation          Relations among data elements
Representation    Description of the data element representation
Administration    Description of the data element management
Overview on the MDR: An Example
• Definition of a metadata element
Identifying and Definition Attributes
  Data Element Name: Student_ID
  Identifier: 2002020177
  Version: 1
  Synonymous name: Student Number
  Context: Student’s ID
Definitional Attribute
  Definition: The unique number assigned to each student
Relational and Representational Attributes
  Type: Data Element
  Representation Category: Number
  Representation Form: Code
  Data Type: Numeric
  Min. size: 7
  Max. size: 12
  Representation Layout: N(12)
  Data Domain: reference of student ID classification
Administrative Attribute
  Registration Authority: KOREA UNIV.
  Registration Status: recorded
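To show how such a registered element could be used, the sketch below stores the Student_ID attributes as a record and checks raw values against its representation attributes. The record layout, the `RepresentationSpec` class, and the `conforms` rule are assumptions made for illustration; the slide does not define a validation procedure, and the test values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationSpec:
    """Representation attributes taken from the example (class name is assumed)."""
    data_type: str
    min_size: int
    max_size: int


# Attribute values copied from the Student_ID example.
STUDENT_ID = {
    "name": "Student_ID",
    "identifier": "2002020177",
    "version": 1,
    "synonymous_name": "Student Number",
    "registration_authority": "KOREA UNIV.",
    "registration_status": "recorded",
    "representation": RepresentationSpec("Numeric", min_size=7, max_size=12),
}


def conforms(value: str, spec: RepresentationSpec) -> bool:
    """Hypothetical check: numeric values must be ASCII digits within the size bounds."""
    if spec.data_type == "Numeric" and not (value.isascii() and value.isdigit()):
        return False
    return spec.min_size <= len(value) <= spec.max_size


spec = STUDENT_ID["representation"]
print(conforms("2002123", spec))  # True: 7 digits
print(conforms("12345", spec))    # False: shorter than Min. size
print(conforms("20A2123", spec))  # False: not numeric
```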
The scientific data properties
• Scientific data (knowledge) has the following properties:
  • The general data
    • Most people can understand and use it easily.
    • Most databases in the scientific fields have similar or identical data elements.
  • The specialized data
    • is more complicated and detailed.
    • General users cannot understand it.
    • Experts in a specific group are interested in the data and can use it.
※ Building the MDR for all data as a whole is not necessary.
User levels and the data property
• Classification of users
  • Users are classified into two groups according to the scientific data property:
    • the general users and the specialized users.
  • The general users
    • use the general data at a high level and across many fields.
  • The specialized users
    • are domain experts in a specific field.
    • use both the general data and the specialized data.
    • are further differentiated into more detailed fields.
  • i.e., the specialized users are distributed into several groups, and the experts in each group are independently interested in more specialized data.
Data visibility
• Data visibility
  • The quantity and the degree of specialization of the data are differentiated into several levels according to the knowledge property, and each level has an independent data set.
[Figure: the whole data set, containing set 1 to set 5, with three visibility levels: data used by all users (including general users), data used by specialized users, and data used in an independent expert domain group (detailed-specialized users 1 to n).]
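One way to read the visibility levels is as a union of data sets that grows with the user's degree of specialization. The sketch below is an assumption-laden illustration: the `User` type, the domain and group names, and the element "Subspecies Rank Code" are hypothetical, and the slides do not say how visibility is actually computed.

```python
from dataclasses import dataclass
from typing import Optional

# Data used by all users.
GENERAL = {"Name"}

# Data used by specialized users, keyed by a hypothetical domain name.
SPECIALIZED = {
    "biology": {"Biological Order Name"},
    "chemistry": {"Chemical Molecular Formula Code"},
}

# Data used in an independent expert group (hypothetical group and element).
DETAILED = {
    ("biology", "taxonomy"): {"Subspecies Rank Code"},
}


@dataclass(frozen=True)
class User:
    domain: Optional[str] = None  # None means a general user
    group: Optional[str] = None   # detailed expert group within the domain


def visible_elements(user: User) -> set[str]:
    """Union of the data sets visible at the user's level."""
    visible = set(GENERAL)
    if user.domain is not None:
        visible |= SPECIALIZED.get(user.domain, set())
        if user.group is not None:
            visible |= DETAILED.get((user.domain, user.group), set())
    return visible


print(sorted(visible_elements(User())))
print(sorted(visible_elements(User("biology"))))
print(sorted(visible_elements(User("biology", "taxonomy"))))
```

Under these assumptions a general user sees only the general set, while an expert sees the general set plus the sets of their own domain and group, and nothing from other domains.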
The conceptual relation diagram
[Figure: General Users 1 to n, Domain Experts 1 to n, the Global MDR, Local MDRs 1 to m (Local MDR 1 for Domain 1 through Local MDR m for Domain m), and the databases of each domain (DB 11 to DB 1n, DB 21 to DB 2n, ..., DB m1 to DB mn). The relations are labeled Localization, Globalization, Specialization, and Generalization.]
The conceptual model of the LoG
• The LoG methodology has four layers
  • Interface Layer
    • provides the user interface environments for all users.
  • Global MDR Layer
    • manages the global MDR for the most generalized and common data, which all users (general and specialized) access and use.
  • Local MDR Layer
    • manages the local MDRs for the specialized data that the experts use.
    • The local MDRs may form a hierarchical structure.
  • Factual Database Layer
    • manages the low-level, factual data.
[Figure: the four layers: User Interface Layer, Global MDR Layer (Generalized Layer), Local MDR Layer (Specialized Layer), and Factual Database Layer.]
A LoG Framework (1/2)
[Figure: the LoG framework across the User Interface Layer, Global MDR Layer, Local MDR Layer, and Factual DB Layer. Components shown: Global User Interface (General User Level Interface); Local User Interface (Expert Level Interface); General User Level Interface Agent; GMDR Agent (Registration, Classification); GMDR; GMeta Repository; Expert Level Interface Agent; LMDR Agent (Registration, Classification, Authorization); LMDRs (LMDR 1, LMDR 2, ..., LMDR n); LMeta Repository (sets of actual metadata); and the factual databases DB 11 to DB 1n (Domain 1), DB 21 to DB 2n (Domain 2), through DB m1 to DB mn (Domain m).]
A LoG Framework (2/2)
• Interface Layer
  • Global user interface and local user interface sub-layers
• Global MDR layer
  • GMDR agent
    • manages the GMDR (global MDR) and the GMeta (global metadata repository).
  • GMDR (global MDR)
    • a standardized guideline for general users and experts.
    • the set of metadata elements used commonly in all databases.
  • GMeta (global metadata repository)
    • the set of actual metadata.
• Local MDR layer
  • LMDR agent
    • manages the LMDRs and the LMeta.
  • LMDRs (local MDRs)
    • a standardized guideline for the specialized users.
    • a set of metadata elements that generalize the data in each field or detailed field.
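The agent roles described above can be sketched as one registry agent class used for both the global and the local layer. Everything here is an assumption for illustration: the `MDRAgent` and `LoGFramework` classes, their method names, and in particular the lookup order in `resolve` (local MDR first, then global MDR) are not specified in the slides, which also leave the prototype as future work.

```python
from typing import Optional


class MDRAgent:
    """Manages one registry (GMDR or LMDR) and its metadata repository (GMeta or LMeta)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: dict[str, dict] = {}          # element name -> definition
        self.repository: dict[str, list[dict]] = {}  # element name -> actual metadata

    def register(self, element: str, definition: dict) -> None:
        self.registry[element] = definition
        self.repository.setdefault(element, [])

    def add_metadata(self, element: str, record: dict) -> None:
        if element not in self.registry:
            raise KeyError(f"{element!r} is not registered in {self.name}")
        self.repository[element].append(record)


class LoGFramework:
    """One global agent plus one local agent per domain (hypothetical structure)."""

    def __init__(self) -> None:
        self.gmdr = MDRAgent("GMDR")
        self.lmdrs: dict[str, MDRAgent] = {}

    def local(self, domain: str) -> MDRAgent:
        if domain not in self.lmdrs:
            self.lmdrs[domain] = MDRAgent(f"LMDR[{domain}]")
        return self.lmdrs[domain]

    def resolve(self, element: str, domain: Optional[str] = None) -> Optional[str]:
        """Assumed lookup order: the expert's local MDR first, then the global MDR."""
        if domain is not None:
            lmdr = self.lmdrs.get(domain)
            if lmdr is not None and element in lmdr.registry:
                return lmdr.name
        if element in self.gmdr.registry:
            return self.gmdr.name
        return None


log = LoGFramework()
log.gmdr.register("Name", {"datatype": "character", "format": "character(20)"})
log.local("biology").register(
    "Biological Order Name", {"datatype": "character", "format": "character(50)"}
)

print(log.resolve("Name"))                              # GMDR
print(log.resolve("Biological Order Name", "biology"))  # LMDR[biology]
print(log.resolve("Biological Order Name"))             # None
```

A general user, who passes no domain, can only resolve elements registered in the GMDR, which matches the visibility idea described earlier in the slides.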
An Example
GMDR
  Name
    definition: the unique object name
    version: 1
    registration status: standard
    datatype: character
    format: character(20)
LMDRs
  Biological Order Name
    definition: The systematic name that represents the biological Species
    version: 1
    registration status: standard
    datatype: character
    format: character(50)
  Chemical Molecular Formula Code
    definition: The code that represents the number of atoms of each element in a molecule of a chemical substance
    version: 1
    registration status: standard
    datatype: character
    format: character(100)
[Figure: Name paired with Biological Order Name, and Name paired with Chemical Molecular Formula Code.]
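The pairing in this example can be sketched as a link from each local element to the global element it generalizes to. The `generalizes_to` field, the domain keys "biology" and "chemistry", and the `localize` and `globalize` helpers are assumptions for illustration; the definitions and formats are copied from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    definition: str
    fmt: str
    generalizes_to: Optional[str] = None  # assumed link from an LMDR element to a GMDR element


GMDR = {"Name": Element("Name", "the unique object name", "character(20)")}

LMDRS = {
    "biology": [
        Element(
            "Biological Order Name",
            "The systematic name that represents the biological Species",
            "character(50)",
            generalizes_to="Name",
        ),
    ],
    "chemistry": [
        Element(
            "Chemical Molecular Formula Code",
            "The code that represents the number of atoms of each element "
            "in a molecule of a chemical substance",
            "character(100)",
            generalizes_to="Name",
        ),
    ],
}


def localize(global_name: str) -> dict[str, list[str]]:
    """Per domain, the local elements that specialize the given global element."""
    if global_name not in GMDR:
        raise KeyError(global_name)
    result: dict[str, list[str]] = {}
    for domain, elements in LMDRS.items():
        names = [e.name for e in elements if e.generalizes_to == global_name]
        if names:
            result[domain] = names
    return result


def globalize(domain: str, local_name: str) -> Optional[str]:
    """The global element that a local element generalizes to, if any."""
    for e in LMDRS.get(domain, []):
        if e.name == local_name:
            return e.generalizes_to
    return None


print(localize("Name"))
print(globalize("chemistry", "Chemical Molecular Formula Code"))  # Name
```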
Conclusions and Future work
• Conclusions
  • We considered and defined the domain knowledge property.
  • The LoG methodology is proposed based on the knowledge property. It
    • partially provides a dynamic integration mechanism.
    • provides a standardization guideline based on ISO/IEC 11179, the international standard.
    • reduces the unnecessary cost of analyzing and designing over all databases to create a global view.
• Future work
  • to analyze and define the domain knowledge property in detail
  • to implement a prototype based on the framework we described
Q / A
Thanks!