MPEG-7 Overview

MPEG-7: Overview of MPEG-7 Description Tools, José M. Martínez, Copyright © 2002 IEEE. Reprinted from IEEE Computer Society, July-September 2002. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of MPEG's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.




1070-986X/02/$17.00 © 2002 IEEE 83

Standards Editor: Peiya Liu, Siemens Corporate Research

MPEG-7,1,2 which became the ISO/IEC 15938 standard in Fall 2001, is the standard for describing multimedia content. It provides the richest multimedia content description tools for applications ranging from content management, organization, and navigation to automated processing. The MPEG-7 standard defines a large library of core description tools, and a set of system tools provides the means for deploying the descriptions in specific storage and transport environments. MPEG-7 addresses many different applications in many different environments, which means it needs to provide a flexible framework for describing multimedia data, including extensibility (using the Description Definition Language) and restrictibility (via the MPEG-7 Profiles under specification).

Part one of this article provided a comprehensive overview of MPEG-7's motivation, objectives, scope, and components. In this issue, I'll give more details on the MPEG-7 description tools, which comprise all of MPEG-7's predefined descriptors and description schemes. We can also define additional description tools for specific applications using the MPEG-7 Description Definition Language (DDL), an extension of the XML Schema.3,4
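Because the DDL extends XML Schema, an application can define a new descriptor as an ordinary schema type derived from an MPEG-7 base type. The following is only a hedged sketch: the SceneMoodType name and its child element are invented for illustration, and only the mpeg7 namespace and the abstract base type follow the standard.

```xml
<!-- Hypothetical DDL definition of an application-specific descriptor.
     SceneMoodType and its Mood element are invented; the namespace and
     the abstract VisualDType base follow the MPEG-7 schema. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
        targetNamespace="urn:example:myapp:2002">
  <import namespace="urn:mpeg:mpeg7:schema:2001"/>
  <complexType name="SceneMoodType">
    <complexContent>
      <!-- Derive the new descriptor from the abstract Visual base type -->
      <extension base="mpeg7:VisualDType">
        <sequence>
          <element name="Mood" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```

An instance would then select the new type with xsi:type, in the same way the standard tools are selected in the descriptions shown later in this article.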

We can group these description tools in different classes according to their functionality (see Figure 1). These description tools are standardized by three of the six parts that make up the MPEG-7 standard:5–10

❚ Part 3 (Visual6) standardizes the descriptors related to visual features that apply to images and/or videos;

❚ Part 4 (Audio7) standardizes the description tools related to audio features, covering areas from speech to music; and

José M. Martínez, Polytechnic University of Madrid, Spain

MPEG-7: Overview of MPEG-7 Description Tools, Part 2

Figure 1. MPEG-7 description tools overview. (The figure groups the description tools into content organization (collection and classification, models); content management (media information, creation information, usage information); content description (spatio–temporal structure, semantic structure, partitions and decompositions, audio and visual features); navigation and access (summaries, variations); user interaction (user preferences, usage history); basic elements (datatypes and structures, link and media localization); and schema tools (root and top-level elements, packages).)



❚ Part 5 (Multimedia Description Schemes [MDS]8) standardizes the description tools related to features applying to audio, visual, and audio–visual content.

The last two parts of the standard, Conformance11 and Extraction and Use of MPEG-7 Descriptions,12 are still under development.

Schema tools and basic elements

To create MPEG-7 descriptions of any multimedia content, the first requirement is to build a wrapper for the description using the Schema Tools. The different description tools we can use for the description use generic basic elements, which are the MPEG-7 bricks that extend the ones provided by XML (built-in data types13).

Wrappers: Root and top-level elements

There are two different valid types of MPEG-7 descriptions: description units and complete descriptions. Each MPEG-7 description should start with the MPEG-7 root element (<Mpeg7>) including the description metadata header (<DescriptionMetadata>), which provides metadata about the description, and either a description unit (<DescriptionUnit>) or a complete description (<Description>). The description unit lets us create valid MPEG-7 descriptions containing any description tool instance as the component of the description. This lets us send only part of a whole description when an application sends a request for a specific component from a complete description. The complete description tag implies that the enclosed description's structure follows one of the MPEG-7 top-level elements, which are organized in three groups (see Figure 2):

❚ Content Management top-level elements describe management-related aspects of the content such as media, creation, and usage.

❚ Content Entity top-level elements describe multimedia entities such as image, video, audio, audio–visual, and multimedia content.

❚ Content Abstraction top-level elements describe abstractions of the content such as semantics, models, summaries, and variations.

These top-level elements are the starting points of any MPEG-7 complete description. Figure 3 presents a snapshot of an MPEG-7 description example.
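As a complement to the complete description shown in Figure 3, a description unit wrapping a single tool instance might look like the following hedged sketch; the enclosed creation information and its title are invented for the example.

```xml
<!-- A description unit carrying one description tool instance rather
     than a complete description. Element names follow the MDS; the
     title text is illustrative. -->
<Mpeg7>
  <DescriptionUnit xsi:type="CreationInformationType">
    <Creation>
      <Title>Concert recording</Title>
    </Creation>
  </DescriptionUnit>
</Mpeg7>
```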

Additionally, optional relationships elements let us describe the relationships among different top-level elements within the same MPEG-7 complete description.

Finally, package tools let us group description tools in any combination and rename them. These description tools are intended to let us customize description tools for predesigned applications (such as search engines).

Bricks: Basic elements

The basic elements are the generic entities that various description tools use as building blocks. They include Audio basic elements, Visual basic elements, basic numerical data types (such as numbers, matrices, and vectors), string data types (such as country, region, and currency codes following ISO standards), links and locators (such as time, media locators, and referencing description tools), and other basic description tools for people, places, textual annotations, controlled vocabularies, and so forth. Although MPEG-7 wasn't targeted at the standardization of description tools for supporting textual descriptions, some (see the "Textual Annotation and Controlled Vocabularies Description Tools" sidebar) were necessary to provide a complete set of description tools for audio–visual content.

Figure 2. Type hierarchy for top-level elements. (The abstract Complete Description type specializes into Content Description and Content Management. Content Description splits into Content Entity, whose abstract Multimedia Content type covers Image, Video, Audio, Audio–Visual, Multimedia Content, Multimedia Collection, Signal, Ink Content, and Analytic Edited Video, and Content Abstraction, which covers the Semantic, Model, Summary, View, and Variation Descriptions. Content Management covers the User, Media, Creation, Usage, and Classification Scheme Descriptions.)

Content Management top-level elements

The five top-level elements dealing with content management description are derived from the abstract Content Management Description Scheme. (In this article, we use abstract in the XML Schema sense; that is, abstract description tools are instantiated as description tools derived from them, and therefore, we need an xsi:type attribute to specialize the description tool instance [see Figure 3].) The content management description deals with information related to content, but the information is independent of what is in the content itself. It covers, among other things, classical archival and usage information.

❚ Media Description top-level element. The Media Description top-level element describes the content using the Media Information Description Scheme, which encapsulates the different description tools for describing the multimedia


<Mpeg7>
  <DescriptionMetadata>...</DescriptionMetadata>
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <MediaInformation>...</MediaInformation>
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <MediaTime>
              <MediaTimePoint>T00:00:00</MediaTimePoint>
              <MediaDuration>PT2M</MediaDuration>
            </MediaTime>
            <VisualDescriptor xsi:type="GoFGoPColorType"
                              aggregation="average">
              <ScalableColor numOfCoeff="8"
                             numOfBitplanesDiscarded="0">
                <Coeff>1 2 3 4 5 6 7 8</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
          <VideoSegment id="VS2">
            <MediaTime>
              <MediaTimePoint>T00:02:00</MediaTimePoint>
              <MediaDuration>PT2M</MediaDuration>
            </MediaTime>
            <VisualDescriptor xsi:type="GoFGoPColorType"
                              aggregation="average">
              <ScalableColor numOfCoeff="8"
                             numOfBitplanesDiscarded="0">
                <Coeff>8 7 6 5 4 3 2 1</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>

Figure 3. Snapshot of an MPEG-7 description example.

Textual Annotation and Controlled Vocabularies Description Tools

Several Textual Annotation description tools provide different ways of creating textual annotation, from keyword- to linguistic-oriented annotation, including a structured annotation based on the seven terms who, what action, what object, where, when, why, and how.

Controlled Vocabularies description tools let us have controlled terms for descriptors; that is, the descriptor values are selected from a controlled vocabulary or thesaurus. The description tool defining the controlled vocabularies is the Classification Scheme Description Scheme, which includes terms with associated multilingual labels and descriptions. We can group the terms in hierarchies or more complex relationships. Descriptors using classification schemes can be of Term Use type, which means that they can include both controlled and uncontrolled terms, or of Controlled Term Use type, which means the term must be from a classification scheme. Classification schemes in MPEG-7 must be used as defined, without modification. Nevertheless, MPEG-7 lets us use other thesauri or controlled vocabularies to generate descriptions, after they are instantiated as an MPEG-7 classification scheme. The establishment of a registration authority for MPEG-7 classification schemes is currently in progress.


content's coding aspects. The Media Information Description Scheme is composed of an optional Media Identification Description Scheme for identifying the multimedia content independently of the different available instances, and one or more Media Profile Description Schemes for describing a profile of the multimedia content. A media profile refers to the parameters of each variation of the content that is produced from a content entity (such as a recording of a reality, like a concert or sports event, or synthetic content generated by an authoring tool). The profile includes the description of the media format (file format and coding parameters) and transcoding hints and quality. For each profile, there's also a description of all the available instances with an identifier and a locator. Figure 4 depicts these concepts.

❚ Creation Description top-level element. The Creation Description top-level element describes the content using the Creation Information Description Scheme, which includes description tools for creation and production information (such as author, title, characters, and director), classification information (such as target audience, genre, and rating), and related materials.

❚ Usage Description top-level element. The Usage Description top-level element describes the content using the Usage Information Description Scheme, which provides the description tools for pointing to rights, usage information (such as availability and audience), and financial information (such as costs and prices).

❚ Classification Scheme Description top-level element. The Classification Scheme Description top-level element lets us describe a classification scheme (see the "Textual Annotation and Controlled Vocabularies Description Tools" sidebar) for a term-based descriptor used to describe the content.

❚ User Description top-level element. The User Description top-level element describes user-related information of the content, making use of the Agent Description Scheme (for the user description), the User Preferences Description Scheme (for enabling effective and personalized access such as filtering, searching, and browsing preferences), and the Usage History Description Scheme (for logging user actions that might help refine user preferences).
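A hedged sketch of the kind of filtering preference the User Preferences Description Scheme supports follows; the element names are from the MDS, while the genre scheme URN and term are invented.

```xml
<!-- User preference for classical music content; the classification
     scheme URN and term are invented, element names follow the MDS. -->
<UserPreferences>
  <FilteringAndSearchPreferences>
    <ClassificationPreferences>
      <Genre href="urn:example:cs:GenreCS:1.1">
        <Name>Classical</Name>
      </Genre>
    </ClassificationPreferences>
  </FilteringAndSearchPreferences>
</UserPreferences>
```

An agent on a set-top box could match such a preference against the classification information in incoming content descriptions.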

Content Description top-level elements

The different top-level elements dealing with


Figure 4. Relationships between the Content Management description tools and the content instances. (A reality, such as a concert, is recorded into a content entity. The master profile's original instance can be copied within the same profile or copied to different profiles, each copy being a media instance. The Media Information, Creation Information, and Usage Information descriptions describe the media profiles and their media instances.)
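The profile and instance concepts Figure 4 depicts might be instantiated as in the following hedged sketch; the element names follow the Media Information Description Scheme, but the format term, file size, identifier, and locator are illustrative.

```xml
<!-- One media profile (the master) with one available instance;
     element names follow the MDS, the values are illustrative. -->
<MediaInformation>
  <MediaProfile master="true">
    <MediaFormat>
      <FileFormat href="urn:example:cs:FileFormatCS:mpeg">
        <Name>mpeg</Name>
      </FileFormat>
      <FileSize>12345678</FileSize>
    </MediaFormat>
    <MediaInstance>
      <InstanceIdentifier>concert-master-01</InstanceIdentifier>
      <MediaLocator>
        <MediaUri>http://example.com/media/concert.mpg</MediaUri>
      </MediaLocator>
    </MediaInstance>
  </MediaProfile>
</MediaInformation>
```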


content description are derived from the abstract Content Description Description Scheme. Content description deals with entities (Content Entity top-level elements) and abstractions (Content Abstraction top-level elements).

Content Entity top-level element

The Content Entity top-level element describes multimedia content by instantiating one or several description schemes derived from the abstract Multimedia Content Description Scheme. As Table 1 shows, these derived description schemes use different description tools:

❚ Still Region Description Scheme. The Still Region Description Scheme extends the Segment Description Scheme for the description of an image or a 2D spatial region, not necessarily connected in space, because there are description tools for defining the regions and their connectivity. We can incorporate any Visual description tool (see the "Visual Description Tools" sidebar) into a Still Region Description Scheme-based description.

❚ Video Segment Description Scheme. The Video Segment Description Scheme extends the Segment Description Scheme for the description of a video or groups of frames, not necessarily connected in space or time, because there are description tools for defining regions, intervals, and their connectivity. We can incorporate any Visual description tool (see the "Visual Description Tools" sidebar) into a Video Segment Description Scheme-based description.

❚ Audio Segment Description Scheme. The Audio Segment Description Scheme extends the Segment Description Scheme for the description of an audio stream or groups of audio samples, not necessarily connected in time, because there are description tools for defining intervals and their connectivity. We can incorporate any Audio description tool (see the "Audio Description Tools" sidebar, next page) into an Audio Segment Description Scheme-based description.

❚ Audio–Visual Segment Description Scheme. The Audio–Visual Segment Description Scheme extends the Segment Description Scheme for the description of audio–visual content or segments of it, which correspond to the audio and video content in the same temporal


Visual Description Tools

ISO/IEC 15938-3, MPEG-7 Visual, standardizes the description tools we use to describe video and image content. The Visual descriptors (there are no Visual Description Schemes) are based on visual features that let us measure similarity in images or videos. Therefore, we can use the MPEG-7 Visual Descriptors to search and filter images and videos based on several visual features like color, texture, object shape, object motion, and camera motion.

We can classify the MPEG-7 Visual Descriptors into generic and high-level (application-specific) description tools. The generic Visual descriptors let us describe color, texture, shape, and motion features. The high-level descriptors provide description tools for face-recognition applications.

The generic Visual descriptors are grouped as follows:

❚ Basic Elements (used by the other Visual descriptors): grid layout, time series, 2D–3D multiple view, spatial 2D coordinates, and temporal interpolation.

❚ Color Descriptors: Color Space, Color Quantization, Scalable Color, Dominant Color, Color Layout, Color Structure, and Group-of-Frames/Group-of-Pictures Color.

❚ Texture Descriptors: Homogeneous Texture, Non-Homogeneous Texture (Edge Histogram), and Texture Browsing.

❚ Shape Descriptors: Region-Based, Contour-Based, and 3D Shape.

❚ Motion Descriptors (for video): Motion Activity, Camera Motion, Parametric Motion, and Motion Trajectory.

❚ Location Descriptors: Region Locator and Spatio–Temporal Locator.

Table 1. Multimedia content derived description tools and their components.

Content Description Type    Description Scheme
Image                       Still Region
Video                       Video Segment
Audio                       Audio Segment
Audio–visual                Audio–Visual Segment
Multimedia                  Multimedia Segment
Multimedia Collection       Collection or Structured Collection
Signal                      Still Region, Video Segment, or Audio Segment
Ink Content                 Ink Content
Analytic Edited Video       Analytic Edited Video


intervals. The segments don't need to be connected in space and/or time, because there are description tools for defining spatio–temporal segments and their connectivity. We can incorporate Visual and Audio description tools into the Audio–Visual Segment Description Scheme through the Audio Segment Description Scheme and Video Segment Description Scheme that can appear within the description's decomposition level.

❚ Multimedia Segment Description Scheme. The Multimedia Segment Description Scheme extends the Segment Description Scheme for the description of multimedia content or segments of it, including audio, video, and possibly other media. It uses the Media Source Decomposition Description Scheme, which describes the media source decomposition in one or more segments, including their segmentation criteria, their spatio–temporal segmentation relationships (gap and overlap), and the resulting segments using Segment Description Schemes or references.

❚ Collection Description Scheme. The Collection Description Scheme comprises different description tools that we use to describe collections, derived from the abstract Collection Description Scheme. These derived description tools are the Content Collection Description Scheme (for collections of multimedia content), Segment Collection Description Scheme (for collections of segments), Descriptor


Audio Description Tools

ISO/IEC 15938-4, MPEG-7 Audio, standardizes the description tools for describing audio content. Most Audio description tools are based on audio features that let us measure similarity in sounds (such as music and speech). Therefore, we can use these MPEG-7 Audio descriptors and description schemes to search and filter audio content based on several audio features like spectrum, harmony, timbre, and melody. Other Audio description tools let us describe spoken content and create a classification of sounds.

We can classify the MPEG-7 Audio description tools into generic and high-level description tools. The generic Audio description tools include a group of low-level descriptors for audio features, named the MPEG-7 Audio Framework (see Figure A), that let us describe an audio signal's spectral, parametric, and temporal features. The high-level group provides description tools for sound recognition and indexing, spoken content, and query-by-humming applications, among other things.

We can group the high-level audio description tools by the functionality they support as follows:

❚ Robust audio matching, supported by the Audio Signature Description Scheme, which describes the spectral flatness of sounds;

❚ Timbre matching (identification, search, and filtering), supported by the Harmonic Instrument Timbre and the Percussive Instrument Timbre Descriptors;

❚ Melodic search, supported by the Melody Contour Description Scheme (efficient melody description) and the Melody Sequence Description Scheme (complete melody description);

❚ Sound recognition and indexing, supported by the Sound Model Description Scheme, the Sound Classification Model Description Scheme, the Sound Model State Path Descriptor, and the Sound Model State Histogram Descriptor; and

❚ Spoken content, supported by the Spoken Content Lattice Description Scheme and the Spoken Content Header Descriptor.

Figure A. Overview of the MPEG-7 Audio Framework. (The low-level descriptors are grouped into Silence; Basic (Audio Waveform, Audio Power); Basic Spectral (Audio Spectrum Envelope, Audio Spectrum Centroid, Audio Spectrum Spread, Audio Spectrum Flatness); Spectral Basis (Audio Spectrum Basis, Audio Spectrum Projection); Signal Parameters (Audio Harmonicity, Audio Fundamental Frequency); Timbral Temporal (Log Attack Time, Temporal Centroid); and Timbral Spectral (Harmonic Spectral Centroid, Harmonic Spectral Deviation, Harmonic Spectral Spread, Harmonic Spectral Variation, Spectral Centroid).)


Collection Description Scheme (for collections of descriptors), Concept Collection Description Scheme (for collections of semantic concepts, like objects and events), and Mixed Collection Description Scheme (for collections including any of these components).

❚ Structured Collection Description Scheme. The Structured Collection Description Scheme lets us describe relationships among collections and models. The relationships can indicate various relations among the items, such as the similarity of collections in terms of features or the overlapping of semantic meaning. Besides a Graph Description Scheme for describing relationships, it includes description tools for describing the collections, models, and clusters.

❚ Ink Content Description Scheme. The Ink Content Description Scheme extends the Segment Description Scheme for the description of a segment of ink data, not necessarily connected in space or time, as there are description tools for defining spatio–temporal segments and their connectivity. We can incorporate any Visual description tool (see the "Visual Description Tools" sidebar) and description tools specific to ink data (such as ink media and creation, and handwriting recognition) into the Ink Segment Description Scheme-based description.

❚ Analytic Edited Video Description Scheme. The Analytic Edited Video Description Scheme extends the Video Segment Description Scheme for the description of an edited video segment (such as shots and transitions) from an analytic point of view; that is, the description is made (automatically, with supervision, or manually) after the editing of the video. The Analytic Edited Video Description Scheme adds, among others, description tools for the spatio–temporal decomposition from the viewpoint of video editing (shots and transitions), and editing-level location and reliability.

Most of these description tools are derived from the abstract Segment Description Scheme. This description scheme includes description tools that let us annotate the different spatio–temporal segments that can compose a multimedia content, instead of the content as a whole. These description tools include, among others, the Media Information, Creation Information, Usage Information, Text Annotation, and Semantic Description Schemes.
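For example, a still region can carry its own textual annotation and be decomposed further, as in the following hedged sketch; the element names follow the MDS, while the region ids and annotation text are invented.

```xml
<!-- A still region with segment-level annotation and a further spatial
     decomposition; ids and text are illustrative. -->
<StillRegion id="SR2">
  <TextAnnotation>
    <FreeTextAnnotation>Goalkeeper in front of the goal</FreeTextAnnotation>
  </TextAnnotation>
  <SpatialDecomposition gap="true" overlap="false">
    <StillRegion id="SR5">
      <TextAnnotation>
        <FreeTextAnnotation>Glove</FreeTextAnnotation>
      </TextAnnotation>
    </StillRegion>
  </SpatialDecomposition>
</StillRegion>
```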

Content Abstraction top-level elements

Content abstraction deals with the description of secondary content representations that are created from or are related to the multimedia content. Semantics depicted in the content, summaries of a video, or a model of audio–visual features are abstractions of multimedia content. The different top-level elements dealing with content abstraction are derived from the abstract Content Abstraction Description Scheme:

❚ Semantic Description top-level element. The Semantic Description top-level element uses the Semantic or Concept Collection Description Schemes. The Semantic Description Scheme lets us describe the reality or fiction (narrative world) depicted by, represented by, or related to the multimedia content. The Semantic Description Scheme includes specialized semantic description tools (derived from the Semantic Base Description Scheme) for describing objects (Object Description Scheme), events (Event Description Scheme), agents (Agent Object Description Scheme), concepts (Concept Description Scheme), and their relationships. Figure 5 (next page) illustrates a conceptual description using these description tools.

❚ Model Description top-level element. The Model Description top-level element uses the different description tools available for describing models. These description tools are derived from the abstract Model Description Scheme and let us describe different specialized model types, for example, probabilistic, analytic, and classification models.

❚ Summary Description top-level element. The Summary Description top-level element uses the Summarization Description Scheme, which lets us describe a set of summaries to enable rapid browsing, navigation, visualization, and sonification of multimedia content. Each summary is specified using the Summary Description Scheme, which lets us describe either hierarchical or sequential summaries.

❚ View Description top-level element. The View Description top-level element uses different



description tools for the description of content views, that is, a space and frequency partition of a multimedia content (signal). The abstract View Description Scheme is used to describe the Source and Target View image, video, or audio signal (including their locations), and it's specialized in the different description tools for describing the specified types of views (space, frequency, resolution, space resolution, and space frequency). We use the View Set Description Scheme to describe a set of views, indicating completeness and redundancy in the coverage of the space and/or frequency planes as well as the types of views in the set.

We use the abstract View Decomposition Description Scheme as the base to describe a decomposition of a multimedia content (signal) into views. Specific description tools for the description of view decompositions are the View Set, Space Tree, Frequency Tree, Space Frequency Graph, Video View Graph, and Multiresolution Pyramid Description Schemes.

❚ Variation Description top-level element. The Variation Description top-level element uses the two description tools for the description of content variations, including summarization, modality translation, reductions (such as the number of colors, spatial size, temporal duration, compression, and frame rate reduction), and so on. The Variation Description Scheme includes description tools, among others, for the description of the source, the type of variation, and the variation's fidelity with respect to the source. We use the Variation Set Description Scheme to describe a set of variations (using the Variation Description Scheme) of a source.
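A hedged sketch of a variation set along these lines follows; the VariationSet, Source, and Variation elements follow the MDS, but the locators, fidelity value, and relationship term are illustrative rather than taken from the standard's vocabulary.

```xml
<!-- A variation set with one spatially reduced variation of a source
     video; locators, fidelity, and the relationship term are
     illustrative. -->
<VariationSet>
  <Source xsi:type="VideoType">
    <Video>
      <MediaLocator>
        <MediaUri>http://example.com/v/full.mpg</MediaUri>
      </MediaLocator>
    </Video>
  </Source>
  <Variation fidelity="0.75">
    <Content xsi:type="VideoType">
      <Video>
        <MediaLocator>
          <MediaUri>http://example.com/v/small.mpg</MediaUri>
        </MediaLocator>
      </Video>
    </Content>
    <VariationRelationship>spatialReduction</VariationRelationship>
  </Variation>
</VariationSet>
```

A server could select among such variations to match a terminal's display size or bandwidth.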

Describing multimedia assets

The first and main use of MPEG-7 will be describing individual multimedia assets, creating MPEG-7 documents that user queries14 or filtering agents15 will access. Nevertheless, as I've already described, the possibility exists of using the MPEG-7 description tools to describe collections of multimedia assets and to annotate user preferences and usage history that applications running on the devices accessing the content (such as set-top boxes) will use.

When describing a multimedia asset, there are two main options. One is to describe it as a whole, that is, without describing parts of it. This option implies that the multimedia asset will have no structural decomposition, and therefore, it will be composed of a Content Management description and/or a Content Abstraction description. These descriptions may be wrapped by a Multimedia Content description, but without the full potential of a description organized by the content's spatio–temporal structure.


Table 2. Features associated with the main spatio–temporal region description tools.

Feature                   Video Segment   Still Region   Moving Region   Audio Segment
Time (MDS)                x                              x               x
Shape (Visual)                            x              x
Color (Visual)            x               x              x
Texture (Visual)                          x
Motion (Visual)           x                              x
Camera motion (Visual)    x
Audio features (Audio)    x                                              x

Figure 5. Example of conceptual aspects description. (In the narrative world, an Event Description Scheme describes the event "Play"; an Agent Object Description Scheme describes the musician "Tom Daniels", whose state the event changes; an Object Description Scheme describes the "piano"; a Semantic Time Description Scheme gives the time of the event, "7-8 p.m., 14 Oct. 1998"; and a Semantic Place Description Scheme gives its location, "Carnegie Hall". "Tom's tutor" is a nonperceivable abstraction, while the depicted performance is a perceivable interpretation.)
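The conceptual description Figure 5 illustrates might be instantiated as in the following hedged sketch; the element names follow the MDS, while the ids and label text are invented for the example.

```xml
<!-- Semantic description of an agent, an object, and an event located
     in place and time; ids and labels are illustrative. -->
<Semantic>
  <Label><Name>Tom Daniels plays the piano</Name></Label>
  <SemanticBase xsi:type="AgentObjectType" id="musician">
    <Label><Name>Musician</Name></Label>
    <Agent xsi:type="PersonType">
      <Name>
        <GivenName>Tom</GivenName>
        <FamilyName>Daniels</FamilyName>
      </Name>
    </Agent>
  </SemanticBase>
  <SemanticBase xsi:type="ObjectType" id="piano">
    <Label><Name>Piano</Name></Label>
  </SemanticBase>
  <SemanticBase xsi:type="EventType" id="play">
    <Label><Name>Play</Name></Label>
    <SemanticPlace>
      <Label><Name>Carnegie Hall</Name></Label>
    </SemanticPlace>
  </SemanticBase>
</Semantic>
```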


Spatio–temporal structured descriptions are one of MPEG-7's main, innovative, and powerful features. They provide the functionality for describing content at different levels of detail based on a spatial and temporal segmentation.

Table 2 lists the main description tools for describing spatio–temporal regions, together with the main audio–visual features (see the "Visual Description Tools" and "Audio Description Tools" sidebars) that we can include in each description. Besides these features, we can attach content management and semantic descriptions to each region in the decomposition.

Figure 6 depicts an example of decomposition- and region-based description. We can annotate the whole image with a Still Region covering the whole image, including a Content Management description, a Textual description, and Visual descriptions applied to the whole frame. After spatial segmentation at different levels, we can describe the different regions using different description tools (from the ones allowed with the Still Region Description Scheme), depending on the nature of the region. We can use Segment Relationship description tools to describe relationships among the different regions.

In the case of video and audio assets, MPEG-7 provides description tools for describing static spatial regions as well as temporal regions (segments) and moving regions (in the case of video, of course). Figure 7 shows a snapshot of a video with


Figure 6. Examples of image description with Still Regions. (SR1, the whole image, carries creation and usage metainformation, media description, textual annotation, color histogram, and texture. SR2, SR3, SR4, and SR6 carry shape, color histogram, and textual annotation. SR5 carries shape and textual annotation. SR7 and SR8 carry color histogram and textual annotation. The first decomposition levels have no gap and no overlap; the deeper ones have gaps but no overlap.)

Figure 7. Example of video segments and regions. (Video segment 1, a pass, and video segment 2, a kick and score, contain moving regions for the players, the goalkeeper, and the ball, and a still region for the goal.)


two temporal segments, each of them with several regions (still and moving). To describe the composition and positional relationship of regions within the same segment, and the (identity) relationship of the same region from one segment to another, we can use the Segment Relationship Graph description tool (see Figure 8).
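In serialized form, a segment relationship graph of this kind might look roughly as follows. This is a hypothetical sketch: the relation names and id references are illustrative placeholders, not taken from the standard’s normative classification schemes:

```xml
<!-- Illustrative sketch of a Segment Relationship Graph. The source and
     target ids are assumed to identify segments defined elsewhere in the
     description; relation names are placeholders. -->
<Graph>
  <!-- the ball in segment 1 is the same region as the ball in segment 2 -->
  <Relation type="sameAs" source="#ball-seg1" target="#ball-seg2"/>
  <!-- spatial relationships inside the "Kick and score" segment -->
  <Relation type="rightOf" source="#player2" target="#goalkeeper"/>
  <Relation type="inFrontOf" source="#goalkeeper" target="#goal"/>
  <!-- motion relationship: the ball moves toward the goal -->
  <Relation type="movesToward" source="#ball-seg2" target="#goal"/>
</Graph>
```

Because the graph only references segment ids, the same region description can participate in arbitrarily many spatial, temporal, and identity relations without duplication.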

Further reading and resources

For more information about MPEG-7, visit the MPEG homepage (http://mpeg.tilab.com/) and the MPEG-7 Alliance Web site (http://www.mpeg-industry.com). These Web pages contain links to a wealth of information about MPEG-7, many publicly available MPEG documents, several lists of Frequently Asked Questions, and links to other MPEG-7 Web pages. Complete MPEG-7 schemas and description examples are available at the MPEG-7 Schema page (http://pmedia.i2.ibm.com:8000/mpeg7/schema). You can validate MPEG-7 descriptions using the NIST MPEG-7 Validation Service (http://m7itb.nist.gov/M7Validation.html). MM

Acknowledgments

This article is mainly based on the MPEG public document Overview of the MPEG-7 Standard.16 I thank all the contributors to this living document.

References

1. B.S. Manjunath, P. Salembier, and T. Sikora, eds., Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley & Sons, New York, 2002.

2. S.F. Chang et al., special issue on MPEG-7, IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 6, June 2001.

3. XML Schema Part 0: Primer, World Wide Web Consortium Recommendation, May 2001, http://www.w3.org/TR/xmlschema-0.

4. XML Schema Part 1: Structures, World Wide Web Consortium Recommendation, May 2001, http://www.w3.org/TR/xmlschema-1.

5. ISO/MPEG N4285, Text of ISO/IEC Final Draft International Standard 15938-1 Information Technology - Multimedia Content Description Interface - Part 1 Systems, MPEG Systems Group, Sydney, July 2001.

6. ISO/MPEG N4288, Text of ISO/IEC Final Draft International Standard 15938-2 Information Technology - Multimedia Content Description Interface - Part 2 Description Definition Language, MPEG Systems Group, Sydney, July 2001.

7. ISO/MPEG N4358, Text of ISO/IEC Final Draft International Standard 15938-3 Information Technology - Multimedia Content Description Interface - Part 3 Visual, MPEG Video Group, Sydney, July 2001.

8. ISO/MPEG N4224, Text of ISO/IEC Final Draft International Standard 15938-4 Information Technology - Multimedia Content Description Interface - Part 4 Audio, MPEG Audio Group, Sydney, July 2001.

[Figure 8. Example of a Segment Relationship Graph for the video in Figure 7. The video segment “Pass” is composed of moving regions Player 1, Player 2, and Ball; the video segment “Kick and score” is composed of Ball, Goal keeper, Player 2, and Goal. Relations among regions include “is close to,” “right of,” “in front of,” and “moves toward,” and “same as” links identify regions that appear in both segments.]

9. ISO/MPEG N4242, Text of ISO/IEC Final Draft International Standard 15938-5 Information Technology - Multimedia Content Description Interface - Part 5 Multimedia Description Schemes, MPEG Multimedia Description Schemes Group, Sydney, July 2001.

10. ISO/MPEG N4206, Text of ISO/IEC Final Draft International Standard 15938-6 Information Technology - Multimedia Content Description Interface - Part 6 Reference Software, MPEG Implementation Studies Group, Sydney, July 2001.

11. ISO/MPEG N4633, Text of ISO/IEC Final Committee Draft International Standard 15938-7 Information Technology - Multimedia Content Description Interface - Part 7 Conformance, Jeju, March 2002.

12. ISO/MPEG N4579, Text of ISO/IEC Draft Technical Report 15938-8 Information Technology - Multimedia Content Description Interface - Part 8 Extraction and Use of MPEG-7 Descriptions, Jeju, March 2002.

13. XML Schema Part 2: Datatypes, World Wide Web Consortium Recommendation, May 2001, http://www.w3.org/TR/xmlschema-2.

14. P. Liu, A. Chakraborty, and L.H. Hsu, “A Predicate Logic Approach to MPEG-7 XML Document Queries,” J. Markup Languages: Theory and Practice, vol. 3, no. 3, 2002, pp. 1-17.

15. P. Salembier et al., “Description Schemes for Video Programs, Users, and Devices,” Signal Processing: Image Communication, vol. 16, nos. 1-2, Sept. 2000, pp. 211-234.

16. ISO/MPEG N4674, Overview of the MPEG-7 Standard, v. 6.0, J.M. Martínez, ed., MPEG Requirements Group, Jeju, Mar. 2002.

Readers may contact José M. Martínez at Grupo de Tratamiento de Imágenes, Depto. Señales, Sistemas y Radiocomunicaciones, E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, E-28040 Madrid, Spain, email [email protected].

Contact Standards editor Peiya Liu, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, email [email protected].
