LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and...

13
T.M.T. Sembok et al. (Eds.): ICADL 2003, LNCS 2911, pp. 193–205, 2003. © Springer-Verlag Berlin Heidelberg 2003 A Multimedia Digital Library System Based on MPEG-7 and XQuery * Mann-Ho Lee 1 , Ji-Hoon Kang 1 , Sung Hyon Myaeng 2 , Soon Joo Hyun 2 , Jeong-Mok Yoo 1 , Eun-Jung Ko 1 , Jae-Won Jang 1 , and Jae-Hyoung Lim 2 1 Dept. of Computer Science, Chungnam National University 220 Gung-Dong, Yuseong-Gu, Daejeon, 305-764, South Korea {mhlee,jhkang,jmyoo,brain08,jaeback}@cs.cnu.ac.kr 2 School of Engineering, Information and Communications University 58-4 Hwaam-Dong, Yuseong-Gu, Daejeon, 305-732, South Korea {myaeng,shyun,theagape}@icu.ac.kr Abstract. We designed and implemented a digital library system that supports content-based retrieval of multimedia objects based on MPEG-7 and XQuery. MPEG-7, a metadata standard for multimedia objects, can describe various features such as color histograms of images and content descriptors. As such, a multimedia digital library system built with MPEG-7 metadata can provide content-based search for multimedia objects. Since MPEG-7 is defined by XML schema, XQuery is a natural choice as a query language. By adopting the standards, MPEG-7 and XQuery, our system has benefited for interoperability. First, any MPEG-7 metadata instance can be accommodated into the system with no additional effort. Second, any system that can generate a query in XQuery can retrieve necessary information. Third, our system can be easily extended to support any XML documents. Currently we have developed a prototype system for retrieving images of Korean cultural heritage. 1 Introduction MPEG-7 is a metadata standard for multimedia objects [1]. It allows representing traditional metadata attributes such as title, authors, date, and so on. Moreover, MPEG-7 can also describe various physical features such as color histograms of images and content descriptors. Consider that a query image is give to find similar ones from the set of images with MPEG-7 metadata, especially with color histograms. If we extract the color histogram from the query image, we can search the MPEG-7 metadata set for similar images by comparing the color histograms. As such, a multimedia digital library system built with MPEG-7 metadata can provide content- based search for multimedia objects. The MPEG-7 metadata schema is defined by XML schema. Each instance of MPEG-7 is an XML data. In order to retrieve MPEG-7 metadata, we need to consider query languages for XML. There have been some XML query languages like as XQL[2], XML-QL[3], Quilt[4], XPath[5], and XQuery[6]. Among them, XQuery, * This research has been supported by Software Research Center, founded by KOSEF.

Transcript of LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and...

Page 1: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

T.M.T. Sembok et al. (Eds.): ICADL 2003, LNCS 2911, pp. 193–205, 2003. © Springer-Verlag Berlin Heidelberg 2003

A Multimedia Digital Library System Based on MPEG-7 and XQuery*

Mann-Ho Lee1, Ji-Hoon Kang1, Sung Hyon Myaeng2, Soon Joo Hyun2, Jeong-Mok Yoo1, Eun-Jung Ko1, Jae-Won Jang1, and Jae-Hyoung Lim2

1 Dept. of Computer Science, Chungnam National University 220 Gung-Dong, Yuseong-Gu, Daejeon, 305-764, South Korea

{mhlee,jhkang,jmyoo,brain08,jaeback}@cs.cnu.ac.kr 2 School of Engineering, Information and Communications University

58-4 Hwaam-Dong, Yuseong-Gu, Daejeon, 305-732, South Korea {myaeng,shyun,theagape}@icu.ac.kr

Abstract. We designed and implemented a digital library system that supports content-based retrieval of multimedia objects based on MPEG-7 and XQuery. MPEG-7, a metadata standard for multimedia objects, can describe various features such as color histograms of images and content descriptors. As such, a multimedia digital library system built with MPEG-7 metadata can provide content-based search for multimedia objects. Since MPEG-7 is defined by XML schema, XQuery is a natural choice as a query language. By adopting the standards, MPEG-7 and XQuery, our system has benefited for interoperability. First, any MPEG-7 metadata instance can be accommodated into the system with no additional effort. Second, any system that can generate a query in XQuery can retrieve necessary information. Third, our system can be easily extended to support any XML documents. Currently we have developed a prototype system for retrieving images of Korean cultural heritage.

1 Introduction

MPEG-7 is a metadata standard for multimedia objects [1]. It allows representing traditional metadata attributes such as title, authors, date, and so on. Moreover, MPEG-7 can also describe various physical features such as color histograms of images and content descriptors. Consider that a query image is give to find similar ones from the set of images with MPEG-7 metadata, especially with color histograms. If we extract the color histogram from the query image, we can search the MPEG-7 metadata set for similar images by comparing the color histograms. As such, a multimedia digital library system built with MPEG-7 metadata can provide content-based search for multimedia objects.

The MPEG-7 metadata schema is defined by XML schema. Each instance of MPEG-7 is an XML data. In order to retrieve MPEG-7 metadata, we need to consider query languages for XML. There have been some XML query languages like as XQL[2], XML-QL[3], Quilt[4], XPath[5], and XQuery[6]. Among them, XQuery,

* This research has been supported by Software Research Center, founded by KOSEF.

Page 2: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

194 M.-H. Lee et al.

which has been influenced from most of the previous XML query languages, is a forthcoming standard for querying XML data.

We designed and implemented a digital library system that supports content-based retrieval of multimedia objects based on two important standards, MPEG-7 and XQuery. MPEG-7 enables content-based search. Since MPEG-7 metadata is XML data, XQuery was a natural choice as a query language. Besides we believe that the use of both a standard metadata and a standard query language would be essential for interoperability among multimedia digital library systems.

A user can give a query in the form of an example image or a template filled with values to be used as search keys. The system searches MPEG-7 metadata and returns the metadata that matches the query to a varying extent. The architecture is depicted in Fig. 1. The system consists of five main modules: MPEG-7 Metadata Creator, User Interface, XQuery Engine, Information Retrieval(IR) Engine, and Repository Engine.

UserInterface

XQueryEngine IR Engine

RepositoryEngine

MPEG-7MetadataCreator

Index

MediaContents

MPEG-7Metadata

Media Contents

MPEG-7 Metadata

Object ID

Object

UserQuery

Answer

MultimediaObjects

Resultsin XML

ResultSet

POT

Object ID& Path Result Set

XQuery

Fig. 1. A Digital Library System based on MPEG-7 and XQuery

The MPEG-7 metadata Creator gets multimedia objects as input and creates their corresponding MPEG-7 metadata. The objects are stored in a multimedia server and their MPEG-7 metadata in a relational DB by the Repository Engine.

The User Interface gives an interactive, easy-to-use, and graphical interface. It translates a user query to its corresponding form in XQuery and passes it to the XQuery Engine. The result in XML from the XQuery Engine, which is an XML fragment of MPEG-7 metadata, is returned to the user.

The XQuery Engine parses the query in XQuery and generates its Primitive Operation Tree (POT), which is defined as a protocol to the IR Engine. The IR Engine takes a POT from the XQuery Engine and searches for the metadata specified by the POT using the pre-configured index. The IR Engine finally finds a list of pairs of an object ID and its path information for the metadata fragment, which is given to the Repository Engine. The engine generates an SQL query to find the XML fragments satisfying the list from the MPEG-7 metadata DB. It executes the query and obtains the result set, which is returned to the XQuery Engine through the IR Engine. The XQuery Engine transforms the result set into an XML form that satisfies the constraint given by the query.

By adapting the standards, MPEG-7 and XQuery, our system benefits for interoperability. First, any MPEG-7 metadata instance can be accommodated into the system with no additional effort. Second, any system that can generate a query in

Page 3: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 195

XQuery can retrieve necessary information. Third, our system can be easily extended to support any XML documents. Currently we have developed a prototype system for retrieving images of Korean cultural heritage.

2 Related Works

Most of the XQuery Engines assumes a storage system such as relational database systems and directly translates a query in XQuery into a storage-dependent query such as an SQL query [7, 8, 9]. Therefore they depend on the storage systems. In our approach, however, the XQuery Engine does not depend on the storage system and only generates POTs, which we consider is a more general data structure than any system-dependant storage.

Retrieving structured documents have been an important issue for information retrieval and database communities. Earlier work emphasized on the principles of ranking when texts are stored in a hierarchically structure document [10, 11]. More recently, researchers have been attempting to extend several query languages for XML data retrieval by inserting similarity-based operations [12]. There has been no approach to using XQuery to deal with MPEG-7 metadata that includes multi-media features such as color layout information in addition to text- and structure-based query constraints.

The MPEG-7 standard focuses on content-based multimedia retrieval and search. The content-based multimedia retrieval and search have power to improve services and systems in digital libraries, education, authoring multimedia, museums, art galleries, radio and TV broadcasting, e-commerce, surveillance, forensics, entertainment, so on [13]. There are many researches and developments of audio, video, and multimedia retrieval techniques and systems. Earlier researches used their own description metadata to describe media contents and to retrieve efficiently [14, 15, 16, 17]. As the MPEG-7 has been released as a multimedia metadata standard, researchers started to use it as the media content description tool [18, 19, 20, 21, 22, 23, 24]. However, currently, there are very few cases that a system has a complete framework based on MPEG-7 including media content metadata generation, metadata repository management, user query interface enabling content-based search, XQuery processing, and information retrieval technique.

3 System Architecture

3.1 MPEG-7 Metadata Creator

The MPEG-7 Metadata Creator is a graphic user interface. It receives an image and text annotation explaining the image as input and creates the MPEG-7 metadata instance for the image. The physical information of the image such a color histogram can be extracted from the image by using the feature extraction software [25]. Some information in the annotation data can be extracted by natural language processing, and automatically fill the corresponding fields of the MPEG-7 metadata. Other fields

Page 4: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

196 M.-H. Lee et al.

for semantic information should be filled manually. The MPEG-7 metadata and the image are passed to the Repository Engine so that it can store and manage them.

The MPEG-7 metadata instance is an XML document based on the XML schema. In order to create a metadata instance, users should know the structure of the MPEG-7 schema structure. However, the interface does not require the users to understand the schema in detail. The interface displays the metadata field entries and asks the users to fill the entries, and then it creates an MPEG-7 metadata from the information entered by the users. The interface displays the schema structure on the left hand side of the display for the MPEG-7 schema experts.

3.2 User Interface

The User Interface provides the users with an interactive, easy-to-use, and graphical interface. It translates a user query to its corresponding form in XQuery and passes it to the XQuery Engine. Moreover, it gets the retrieval result in XML passed from the XQuery Engine, converts it into the HTML format, and displays it to the users.

Fig. 2. The Graphic Interface to create a query in XQuery

XQuery consists of FLWR expressions that support iteration and binding of variables to intermediate results. Therefore, users must understand the XML schema and the FLWR expression to write user query in XQuery. It would be very difficult to write a query in this language XQuery correctly. The User Interface of the system allows the users to create a query in XQuery without knowing the XML schema and XQuery in detail.

The interface to create a query in XQuery is shown in Fig. 2. The window of the interface is divided into sub-windows, for for/let-clause, for where-clause (not shown in the figure), for return-clause, for XQuery-result. In the first three sub-windows,

Page 5: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 197

users fill in the retrieval conditions. The XQuery-result sub-window displays the result created so that users can confirm the query. While creating a query in XQuery, the Boolean expression for retrieval condition can be optimized. The result of optimization can be seen in another window. The query created is passed to the XQuery Engine for further processing of retrieval.

For example, let’s consider the following query:

Retrieve titles of images created in Daejeon(“ ”) and containing

cultural objects of stone(“ ”) or wooden(“ ”) artifacts.

We type “ ”, “ ”, “ ” in the corresponding fields in the for/let-clause, build a Boolean expression in where-clause sub-windows, and build the format of the result in the return-clause sub-window. Then a query is created as shown in Fig. 3.

<Results>{

for $doc in input()for $img_poc in $doc/Mpeg7/.../Namefor $cul_desc in $doc/Mpeg7/.../FreeTextAnnotationfor $image in $doc/Mpeg7/.../MediaUrifor $docId in $doc/Mpeg7/.../PublicIdentifierfor $cul_title in $doc/Mpeg7/.../Title[@type="main"]where

(contains(string($img_poc), " ")) and ((contains(string($cul_desc), " ")) or(contains(string($cul_desc), " ")))

ranklimit(0)return

<Record> <image> { $image } </image><DocId> { $docId } </DocId><Title> { $cul_title } </Title>

</Record> }</Results>

<Results>{

for $doc in input()for $img_poc in $doc/Mpeg7/.../Namefor $cul_desc in $doc/Mpeg7/.../FreeTextAnnotationfor $image in $doc/Mpeg7/.../MediaUrifor $docId in $doc/Mpeg7/.../PublicIdentifierfor $cul_title in $doc/Mpeg7/.../Title[@type="main"]where

(contains(string($img_poc), " ")) and ((contains(string($cul_desc), " ")) or(contains(string($cul_desc), " ")))

ranklimit(0)return

<Record> <image> { $image } </image><DocId> { $docId } </DocId><Title> { $cul_title } </Title>

</Record> }</Results>

Fig. 3. Example query in XQuery

Fig. 4. The Graphic Interface to display the result set

The query is passed to the XQuery Engine, and the User Interface gets the result set from XQuery Engine and displays the result set as shown in Fig. 4. In the result set, only thumbnail, title, and the object ID (not shown in the display) of the images are retrieved. Clicking a thumbnail image passes the object ID of the image to the Repository Engine. Then the Repository Engine searches the database for the metadata and the image with the object ID, and passes them to the User Interface. Now, users can get the detail information and the full image of the selected image. This method to display the final result set in two steps guarantees the fast service of the overall system.

Page 6: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

198 M.-H. Lee et al.

For the user’s convenience, the User Interface provides options to display the result set, in HTML format or in PDF format. Users can choose one of them as they want.

Moreover, it supports content-based retrieval query by example for images that uses features such as color histogram extracted from the query image by using the feature extraction software.

3.3 XQuery Engine

The basic function of the XQuery Engine consists of three sub-processes, (1) receiving a query in the form of XQuery from the User Interface, (2) getting an answer of the query from the IR Engine, and (3) returning it back to the User Interface.

Our XQuery Engine has the following useful aspects. First, any user interface that generates a query in XQuery is able to access any digital library system including our XQuery Engine. Second, we define a set of primitive operations for POTs so that they can become a standard interface between the XQuery Engine and an IR Engine. Third, some query optimizations over POTs can be done in the XQuery Engine so that better searching performance is expected.

list _Return (int, string)15

list _AndNot (list, list)14

list _And (list, list)13

list _Or (list, list)12

list _ContainStructuredAnnotaiton (path, definition, string)11

list _MatchStructuredAnnotation (path, definition, string)10

list _ContainTitle (path, type, string)9

list _MatchTitle (path, type, string)8

list _SimilarTo (path, string)7

list _Contain (path, string)6

list _GreaterThanEqualElement (path, value)5

list _GreaterThanElement (path, value)4

list _LessThanEqualElement (path, value)3

list _LessThanElement (path, value)2

list _MatchElement (path, string)1

Primitive OperationPO Number

list _Return (int, string)15

list _AndNot (list, list)14

list _And (list, list)13

list _Or (list, list)12

list _ContainStructuredAnnotaiton (path, definition, string)11

list _MatchStructuredAnnotation (path, definition, string)10

list _ContainTitle (path, type, string)9

list _MatchTitle (path, type, string)8

list _SimilarTo (path, string)7

list _Contain (path, string)6

list _GreaterThanEqualElement (path, value)5

list _GreaterThanElement (path, value)4

list _LessThanEqualElement (path, value)3

list _LessThanElement (path, value)2

list _MatchElement (path, string)1

Primitive OperationPO Number

Fig. 5. Primitive Operations

The portability is an important issue for the design of the XQuery Engine. Since we adopt a standard XML query, called XQuery, for user queries, any user interface that can generate XQuery can interface with the XQuery Engine. Another relationship between the XQuery Engine and the IR Engine also requires a standard information exchange method. For this purpose, we define 15 primitive operations as in Fig. 5. Each primitive operation is an atomic operation for IR Engine. Any query can be represented by a Primitive Operation Tree (POT) whose leaf nodes are primitive operations. POT is a more general data structure than any system-dependent storage representation. By using POT, our XQuery Engine can run on top of any IR Engine that can understand POT.

Page 7: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 199

User Interface

IR Engine

XMLConstructor

XQueryParser

Tag Info.Extractor

POTGenerator

SyntaxTree

Result Set

Tag Info.

Resultin XML

XQeury

POT

Fig. 6. XQuery Engine Architecture

Fig. 6 shows the architecture of the XQuery Engine. A user query is given to the XQuery Parser. The parser analyses the query and produces a syntax tree for the query. The syntax tree is used for two purposes. First, it is transformed into a POT by the POT Generator, which is passed to the IR Engine. Second, any answer to a query should be an XML instance, whose format is specified in the original XQuery and finally in the syntax tree. This tagging information is extracted from the syntax tree by the Tagging Information Extractor. Later after the XQuery Engine receives a result set from the IR Engine, the XML Constructor transforms the result set into an XML instance according to the tagging information. The result in XML is returned to the User Interface as an answer to the given user query.

POT root

_And

_Or

_Contain _Contain

_Contain

$img_doc

$cul_desc $cul_desc

_Return

20 $image#$doc_id#$cul_title

Fig. 7. The POT corresponding to the query in Fig. 3.

Fig. 7 is a POT for the query in Fig. 3. A POT is serialized as a string, which is actually passed to the IR Engine. Currently, XQuery is a working draft and does not fully support the functions for information retrieval. For our purpose, we slightly extend the XQuery syntax by adding the “ranklimit” phrase, which is used when a user wants to restrict the number of ranked results. An example of the ranklimit phrase is shown also in Fig. 3 (circled by an oval).

Page 8: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

200 M.-H. Lee et al.

3.4 Information Retrieval(IR) Engine

In traditional information retrieval systems, a document as a whole is the target for a query. With increasing interests in structured documents like XML documents, there is a growing need to build an IR system that can retrieve parts of documents, which satisfy not only content-based but also structured-based requirements [10]. The retrieval engine we developed in this work supports the functionality with the goal of handling the semantics of the XQuery for MPEG-7 data (i.e. metadata of images).

Like conventional retrieval systems, our retrieval engine consists of two modules: the indexer and the retrieval modules. The indexer receives MPEG-7 data in the form of XML from the Repository Engine and parses them to create an index whose basic structure is in the form of an inverted file. The indexer takes the parsed elements and attributes of MPEG-7 data and creates the index structure as in Fig. 8 so that the traditional content-based and structure-based retrieval can be done together.

Inverted File

DocID uIDpIDw aID

DocID : Document IDpID : Full Path ID uID : Unique IDaID : Attribute IDw : WeightDF : Document Frequency

. . .

Posting File

term N

.

.

.

3term 2

5term 1

OffsetDFTerm

term N

.

.

.

3term 2

5term 1

OffsetDFTerm

Fig. 8. Index structure for structure- and content-based retrieval

The retrieval module takes a POT from the XQuery Engine and searches the metadata (i.e. MPEG-7 metadata) using the preconfigured index. The IR Engine finally finds a list of documents represented by pairs of an object ID corresponding to an image and its path information to the element satisfying the term in a primitive operation, so that the structural requirement in the query can be examined. After the necessary operations combining the partial results (e.g. a Boolean operation), the results are given to the Repository Engine that fetches the actual image corresponding to the MPEG-7 data.

The dictionary part of the inverted file structure as depicted in Fig. 8 consists of terms, document frequencies (DF), and offset values. Terms are extracted from the elements of MPEG-7 metadata as if they are extracted from unstructured documents. DF is counted by considering the metadata for an image as a document. The offset part contains the starting location of the list of document information in the Posting File. Each segment of the list in the Posting File contains the document ID (DocID), weight, path ID (pID), unique ID (uID), and attribute ID (aID). DocID is the identifier of the document stored in the Repository Engine.

We use the well-known the tf*idf weight to calculate the weight of the term in a document. The next item, pID, is the identifier of the predefined absolute path from the root element to the leaf node containing the term in an XML document. Next to the pID is uID (unique ID) that uniquely identifies individual elements in the XML document. In order to locate a child or the parent of an element easily, we employ a K-ary tree by which the hierarchical structure of a tree (an XML document in this

Page 9: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 201

case) is well represented [26]. The relationship between a parent element and its child element can be directly obtained as follows:

[ ]1/)2()( +−= KiiParent (1)

where i is an arbitrary node in K-ary tree and K is a degree that each internal node has. The attribute ID, aID, serves as an identifier of an attribute value and is used for

retrieval based on an attribute value assigned to an element.

28 26 34 4 20 28 14 16 17 35Doc Nn

••••••

41 31 33 12 25 7 17 21 16 26 16 8Doc Ni+1

35 26 34 4 20 27 8 14 16 25 17 11Doc Ni

ColorDoc ID

28 26 34 4 20 28 14 16 17 35Doc Nn

••••••

41 31 33 12 25 7 17 21 16 26 16 8Doc Ni+1

35 26 34 4 20 27 8 14 16 25 17 11Doc Ni

ColorDoc ID

Fig. 9. Index structure for a color-based retrieval

Fig. 9 shows an index structure that is used for high-speed retrieval and browsing by using the color information of images. This index structure consists of DocID (Document ID), and a vector of “color layout” values which specify a spatial distribution of colors. Each number in a color vector represents the degree of the particular color axis. The similarity of a query and an image in terms of a color vector is calculated based on the following formula:

∑=

−=n

jijji dqDQSim

1

),( (2)

where Q & iD represent the color layout vector of a query & a vector for a

document i. The retrieval module ranks a set of documents by using the p-norm model, and uses structural constrains as a filter [27]. For a query containing both term-based and color-based constraints, we perform a merge operation of the retrieval results: one set coming from term-based retrieval and another from color-based retrieval.

A query in XQuery can be divided into several primitive operations that are structure-based or content-based or even a combination of them. Table 1 shows an example of a retrieval query that is divided into three sub-queries.

For structure-based retrieval that takes into account the logical structure of a document, we first retrieve a set of documents that are relevant to a given keyword

“ ” in the query, and filters out the documents that do not satisfy the structural constraints expressed with path information, “/MPEG7/Descripton/MultimediaContent/ Image/CreationInformation/Creation/CreationCoordinates/Location/Nam.”

Page 10: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

202 M.-H. Lee et al.

Table 1. An example of different types of queries

TYPE QUERY

Structure-based Contain (/MPEG7/Descripton/.../CreationCoordinates/Location/Name, “ ”)

Content-based Similar_To (/Mpeg7/Description/.../Image/VisualDescriptor,

"29 22 36 14 8 7 15 3 16 13 15 19")

combination Contain (/MPEG7/Descripton/.../CreationCoordinates/Location/Name, “ ”) AND Similar_To (/Mpeg7/Description/.../VisualDescriptor,

"29 22 36 14 8 7 15 3 16 13 15 19")

3.5 Repository Engine

The Repository Engine (RE) provides an effective storing and retrieval service of MPEG-7 metadata. The main functions of the RE are (1) extracting structural information, (2) storing XML documents and image (media) files, (3) retrieving XML data. The RE extracts the structural information of MPEG-7 metadata from metadata documents and stores the information in the database. Then RE stores metadata document instances sent by the MPEG-7 metadata creator according to MPEG-7 metadata structure. The RE stores media content files and provides them to the clients through HTTP. When the IR engine requests XML data specified with a document ID and an absolute path, the RE performs a retrieval task using the structure information and delivers the resulting XML metadata to the IR engine.

In the following subsections, we show the architecture of the RE and discuss how it carries out structure information extraction, XML document and image file management, and XML data retrieval.

Architecture of RE. The RE has three major components, the Structural Information Generator, the Storage Manager, and the Retrieval Manager as shown in Fig. 10.

MPEG-7 Metadata Creator

IR Engine

StructuralInformationGenerator

ImageLocator

ContentGenerator

Updated InformationProvider

Multimedia InformationFinder

MediaContent

Files

MPEG-7Metadata

StructuralInformation

MediaContentFile

MPEG-7XMLDocument

XSD orMPEG-7XML Document

MPEG-7XML

DocumentRetrieval

ResultRetrieval Request(Document IDand Element Path)

StorageManager

RetrievalManager

Fig. 10. Repository Engine Architecture

Page 11: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 203

The Structural Information Generator extracts the structural information of MPEG-7 XML metadata and stores it into a database. It creates tables that store the metadata instances based on the structure. The Storage Manager stores the MPEG-7 metadata instances and media files into the database. It consists of Content Generator and Image Locator which manipulate XML document and media files, respectively. The Retrieval Manager interacts with the IR engine. In response to the requests from the IR engine, it transfers XML documents to the IR engine for indexing. It consists of Updated Information Provider and Multimedia Information Finder. The Updated Information Provider sends XML documents to the IR engine for index update and the Multimedia Information Finder carries out retrieval service.

Automatic Extraction of Metadata Structural Information. Registering the structural information of MPEG-7 description metadata, the RE stores and retrieves metadata according to the structure.

To extract and store the structural information, there are two often used methods: deductive and inductive methods. The former lets the system deal every form of document instances by using schema definition (XSD: XML Schema Definition)[28] defined in the standard MPEG-7 specifications. This method has advantages that can include the whole MPEG-7 definition and can accept many changes between document instances, but has problem of high developing cost and execution time. The latter extracts the structural information from a document instance. For the inductive method, the input document is required to satisfy only the minimum number of the MPEG-7 descriptors used by a specific application.

The RE adopts the inductive method because each MPEG-7 based system uses only several descriptors, and thus supporting all descriptors defined by the MPEG-7 standard specifications causes unnecessary overhead in the course of development and run-time performance degrade.

Storing XML Document and Media Contents. When a media content (i.e. image) file and its MPEG-7 description metadata document are transferred in a pair from MPEG-7 XML metadata creator, the Storage Manager stores them separately and maintains a mapping table of Uniform Resource Identifiers (URI) of media files and document IDs [29]. The Image Locator deals with storing media content files and the Content Generator maintains MPEG-7 description metadata documents.

When the Storage Manager receives a new XML document, the document is not stored in file format, but broken down into elements and attributes. Then it stores them in the database according to the pre-identified structure information. This database approach allows faster metadata access and media file retrieval. We used Microsoft SQL Server 2000 as the underlying database management tool in the prototype development.

After storing an XML document instance and a media content file, the Storage Manager generates a document ID for the XML document and a URI for the media content file. It keeps the pair in a database table, and transfers the ID with the document to the IR engine where the metadata document is indexed for retrieval service.

Page 12: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

204 M.-H. Lee et al.

Metadata Retrieval. The Retrieval Manager receives two sets of arguments in a query from the IR engine: a set of MPEG-7 metadata document IDs and absolute paths of elements in the documents. From the absolute paths, the system finds the corresponding element IDs from the structural information table in the database. With the element IDs and other structural information in the database, such as sibling orders, a DOM tree of each element is constructed. The DOM trees are the result set of the query to be returned to the IR engine.

4 Discussion

Metadata has become an essential element for resource discovery and retrieval, especially with unstructured data such as text and multimedia. Among others, MPEG-7 is quite challenging and interesting to deal with because of the variety of features and the structural complexity that can be represented in its manifestation.

Our research is one of few efforts to handle MPEG-7 metadata using XQuery, a standard query language for XML, in a digital library setting. Our goal to this point has been to develop a system that consists of essential components that any digital library for MPEG-7 metadata should provide: a persistent storage and its manager for MPEG-7 metadata and the corresponding images, an indexing and retrieval engine that can handle various operations necessary for MPEG-7 metadata, an XQuery processing engine capable of handling queries targeted at MPEG-7 yet extensible for other purposes, and a query interface for users who are not an expert in XQuery nor in MPEG-7.

We have developed a system that can handle image data in Korean cultural heritage, with which the essential digital library functionality for MPEG-7 can be demonstrated. While we have addressed various technical issues in the course of designing and developing this system, we feel that this is still a test bed that can be used to enhance the technology from both theoretical and practical points of view and eventually build an operational digital library.

References

1. ISO,/IEC: Introduction to MPEG-7 (version 4.0). approved, N4675, March 2002 2. W3C: XML Query Language (XQL). Sep. 1998.

(http://www.w3.org/TandS/QL/QL98/pp/xql) 3. W3C: A Query Language for XML. Note, Aug. 1998.

(http://www.w3.org/TR/NOTE-xml-ql/) 4. IMB: An XML Query Language. May. 2000.

(http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html) 5. W3C: XML Path Language (XPath) 2.0. Working Draft, May. 2003.

(http://www.w3.org/TR/xpath20/) 6. W3C: XQuery 1.0: An XQuery Language. Working Draft, May. 2003.

(http://www.w3.org/TR/2003/WD-xquery-20030502) 7. Shanmugasundaram, J., Kiernan, J., Shekita, E., Fan, C., Funderburk, J.: Querying XML

Views of Relatinal Data. Proc. of 27th VLDB, Rome, Italy, Sep. 2001

Page 13: LNCS 2911 - A Multimedia Digital Library System Based on MPEG-7 and XQueryir.kaist.ac.kr/wp-content/media/anthology/2003.12-Lee.pdf · 2011. 7. 8. · 2 Related Works Most of the

A Multimedia Digital Library System Based on MPEG-7 and XQuery 205

8. Fan, C., Funderburk, J., Lam, H., Kiernan, J., Shekita, E., Shanmugasundaram, J.: XTABLES: Bridging Relational Technology and XML. IBM System Journal, 2002

9. DSRG: Rainbow: Relational Database Auto-tuning for Efficient XML Query Processing. 2002. (http://davis.wpi.edu/dsrg/rainbow)

10. Myaeng, S.H., Jang, D., Kim, M., Zhoo, Z.: A Flexible Model for Retrieval of SGML Documents. In Proc. of the 21st SIGIR, 1998, 138–145

11. Navarro, G., Baeza-Yates, R.: Proximal Nodes, A Model to Query Document Databases by Content and Structure. ACM Trans. on Information Systems, 15 (4), 1997, 400–435

12. Bremer, J.M., Gertz, M.: XQuery/IR: Integrating XML Document and Data Retrieval. In Proc. of the 5th Int. Workshop on Web and Databases, Madison, Wisconsin, Jun. 6–7, 2002, 1–6

13. Manjunath, B.S., Salembier, P., Sikora, T.: Introduction to MPEG-7: Multimedia Content Description Interface. ISBN0471486787, John Wiley&Sons, 2002, 335–361

14. Hu, M.J., Jian, Y.: Multimedia Description Framework (MDF) for Content Description of Audio/Video Documents. Proc. of the 4th ACM Conf. on Digital Libraries, Aug. 1999

15. Chung, T.S., Ruan, L.Q.: A Video Retrieval and Sequencing System. ACM Transactions on Information Systems (TOIS), Vol. 13, Issue 4, Oct. 1995

16. Li, W., Gauch, S., Gauch, J., Pua, K.M.: VISION: A Digital Video Library. Proc. of the 1st ACM Int. Conf. on Digital Libraries, Apr. 1996

17. Chang, S.F., Chen, W., Meng, H.J., Sundaram, H., Zhong, D.: VideoQ: An Automated Content Based Video Search System Using Visual Cues. Proc. of the 5th Int. Conf. on Multimedia, Nov. 1997

18. Graves, A., Lalmas, M.: Video Retrieval using an MPEG-7 based Inference Network. Proc. of the 25th annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Aug. 2002

19. Rehm, E.: Representing Internet Streaming Media Metadata using MPEG-7 Multimedia Description Schemes. Proc. of the 2000 ACM workshops on Multimedia, Nov. 2000

20. Martínez, J.M., González, C., Fernández, O., García, C., de Ramón, J.: Towards Universal Access to Content using MPEG-7. Proc. of the 10th ACM Int. Conf. on Multimedia, Dec. 2002

21. Charlesworth, J.P.A., Garner, P.N.: Spoken Content Metadata and MPEG-7. Proc. of the 2000 ACM Workshops on Multimedia, Nov. 2000

22. Döller, M., Kosch, H., Dörflinger, B., Bachlechner, A., Blaschke, G.: Demonstration of an MPEG-7 Multimedia Data Cartridge. Proc. of the 10th ACM Int. Conf. on Multimedia, Dec. 2002

23. van Setten, M., Oltmans, E.: Demonstration of a Distributed MPEG-7 Video Search and Retrieval Application in the Educational Domain. Proc. of the 9th ACM Int. Conf. on Multimedia, Oct. 2001

24. Löffler, J., Biatov, K., Eckes, C., Köhler, J.: IFINDER: an MPEG-7-based Retrieval System for Distributed Multimedia Content. Proc. of the 10th ACM Int. Conf. on Multimedia, Dec. 2002

25. Lee, H.K., Yoon, J.H., Lim, J.H., Jung, K.S., Jung, Y.J., Kang, K.O., Ro, Y.M.: Image Indexing and Retrieval System Using MPEG-7. IWAIT 2001, Feb. 2001, 63–68

26. Bentley, J.L.: Multidimensional Binary Search Trees Used for Associative Searching. Comm. of the ACM, 18(9), Sep. 1975

27. Salton, G., Fox, E., Wu, H.: Extended Boolean Information Retrieval. Comm. of the ACM, 26 (12), 1983, 1022–1036

28. David C. Fallside: XML Schema Part 0: Primer, W3C Recommendation. May 2 2001, http://www.w3.org/TR/xmlschema-0/

29. Berners-Lee, T., Fielding, R., Masinter, L., Uniform Resource Identifiers (URI): Generic Syntax (RFC2396). Aug. 1998, http://www.ietf.org/rfc/rfc2396.txt