Chinese Academy of Sciences, Beijing, China. Speech and Language Processing Techniques Report Document. Overview of MPEG-7. Dr Zhang Sen, Speech Group, INRIA-LORIA, Villers les Nancy, France; Chinese Academy of Sciences, Beijing, China. 06/23/22

Chinese Academy of Sciences, Beijing, China
Speech and Language Processing Techniques Report Document

Overview of MPEG-7

Dr Zhang Sen
Speech Group, INRIA-LORIA, Villers les Nancy, France
Chinese Academy of Sciences, Beijing, China

04/18/23


Outline of contents

• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information


Ozone WP2 architecture

[Figure: architecture diagram. Labels: Ozone application; Software Environment layer; Ozone Services; Situation Sensitivity; User Context; Ozone Context; Multi-modal widgets; Dialog management; smart agent; User Interface management; Perception; QoS; Security; Authentication; User-interaction module; speech recognition; gesture recognition; video browser; animated agent; ...]


From MPEG-1 to MPEG-7

[Figure: timeline with ticks 90, 92, 94, 98, 99, 01, ? and labels MPEG-1, MPEG-2, MPEG-4 (v1, v2), MPEG-7, MPEG-21.]

• MPEG-3 was once defined, but abandoned

• MPEG-5 and MPEG-6 were never defined


MPEG Family

MPEG-1 – Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92

MPEG-2 – Generic coding of moving pictures and audio information (DVD, Digital TV), 11/94

MPEG-4 – Coding of audiovisual objects for multimedia applications, Ver1 09/98, Ver2 11/99

MPEG-7 – Multimedia content description for AV material, 08/01

MPEG-21 – Digital AV framework: integration of multimedia technologies, 11/01


Why is MPEG-7 needed

• Digital audiovisual information is increasing
  – more and more available contents
  – all kinds of sources of information

• Use of the digital audiovisual information
  – description of the contents
  – fast search of the contents


Objective of MPEG-7

• Standardize content-based description for various types of audiovisual information
  – Enable fast and efficient content searching, filtering and identification
  – Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
  – Address a large range of applications

• Types of audiovisual information:
  – Audio, speech
  – Moving video, still pictures, graphics, 3D models
  – Information on how objects are combined in scenes


Scope of MPEG-7

• Description generation (feature extraction, indexing process, annotation and authoring tools, ...) and description consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7.

• The goal is to define the minimum that enables interoperability.

[Figure: Description generation, Description, Description consumption. The Description is marked "Scope of MPEG-7"; generation and consumption are each marked "Research and future competition".]


Scope of MPEG-7

[Figure: Feature Extraction, MPEG-7 Description (standardization), Search Engine.]

• Feature Extraction: content analysis (D, DS), feature extraction (D, DS), annotation tools (DS), authoring (DS)

• MPEG-7 Scope: Description Schemes (DSs), Descriptors (Ds), Language (DDL). Ref: MPEG-7 Concepts

• Search Engine: searching and filtering, classification, manipulation, summarization, indexing


Audio in MPEG-7

• Audio content description (yes)

• Sound retrieval and classifier (yes)

• Speech synthesis (no)

• Speech recognition (no)

• Probability Models (yes)


Parts of the MPEG-7 Standard

• ISO/IEC 15938-1: Systems
• ISO/IEC 15938-2: Description Definition Language
• ISO/IEC 15938-3: Visual
• ISO/IEC 15938-4: Audio
• ISO/IEC 15938-5: Multimedia Description Schemes
• ISO/IEC 15938-6: Reference Software



Main elements of MPEG-7

• Descriptors (D): representations of features that define the syntax and the semantics of each feature representation (low-level).

• Description Schemes (DS): specify the structure and semantics of the relationships between their components, which may be both Ds and DSs (high-level).

• Description Definition Language (DDL): based on XML Schema; allows the creation of new DSs and Ds, and the extension and modification of existing DSs.

• System tools: support multiplexing of descriptions, synchronization, transmission mechanisms, coded representations, and management and protection of intellectual property.


Relations of main elements

[Figure: diagram relating the DDL, Description Schemes (DS) and Descriptors (D); DS boxes contain further DS and D boxes.]


Description Definition Language

• The Description Definition Language (DDL) is a language that defines which descriptions are valid and allows the creation of new Description Schemes and Descriptors. It also allows the extension and modification of existing Description Schemes.

• The DDL is used to define a set of formal rules
  – ordering of the elements
  – occurrences of elements
  – ……...

• XML + MPEG-7 extensions


XML: Base for DDL

• Why choose XML as the base for the DDL?
  – The popularity of XML
  – The interoperability with other standards in the future

• Why should XML be extended for MPEG-7?
  – SGML > XML
  – Structural extensions
  – Datatype extensions


DDL parser

A DDL parser is software that checks whether a description is valid.

[Figure: a Description and a Schema are given to the Parser, which answers Yes or No.]
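The yes/no check can be illustrated with a toy validator. This is only a sketch: it is not a DDL or XML Schema parser, and the schema format, the `SCHEMA` table and the function names are assumptions made for illustration. It enforces just the two kinds of rules named for the DDL, the ordering of elements and their number of occurrences.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema: for each element name, the ordered list of
# allowed children as (child_name, min_occurs, max_occurs).
ROOT = "letter"
SCHEMA = {
    "letter": [("header", 1, 1), ("text", 1, 1)],
    "header": [("name", 1, 1), ("address", 0, 1)],
    "address": [("street", 1, 1), ("city", 1, 1)],
}


def is_valid(element, schema=SCHEMA):
    """Return True if the children of `element` respect order and occurrence rules."""
    rules = schema.get(element.tag)
    children = list(element)
    if rules is None:
        # Leaf element in this toy schema: it must not contain child elements.
        return not children
    pos = 0
    for name, min_occurs, max_occurs in rules:
        count = 0
        while pos < len(children) and children[pos].tag == name and count < max_occurs:
            if not is_valid(children[pos], schema):
                return False
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    # Any child left over is out of order, repeated too often, or unknown.
    return pos == len(children)


def parse_and_check(description_xml):
    """Mimic the parser's yes/no answer for a description given as a string."""
    try:
        root = ET.fromstring(description_xml)
    except ET.ParseError:
        return False  # not even well-formed XML
    return root.tag == ROOT and is_valid(root)
```

A description whose `text` element comes before `header` is rejected, as is one that is not well-formed XML. A real DDL parser would validate against an XML Schema with the MPEG-7 extensions instead of a hand-written table.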



Types of descriptions

• Low-level description (features, etc.)
  – Generic and flexible
  – Intelligent / efficient search engine

• High-level description (structures, concepts, etc.)
  – Efficient and powerful
  – Lack of flexibility


Low-level Description

• Information in the creation and production processes
  – director, title, short feature movie

• Information related to the usage of the content
  – copyright pointers, usage history, broadcast schedule

• Information on the storage features of the content
  – storage format, encoding

• Information about low-level features in the content
  – colors, textures, sound timbres, melody


High-level Description

• Structural description
  – video segments, frames, still and moving regions, audio segments
  – Segment DS (representing the spatial, temporal or spatio-temporal structure)

• Conceptual (semantic) description
  – objects, events, and notions
  – links of the two descriptions


Illustration of descriptions


Basic description

• Elements
  – Information containers
  – containing data and other elements
  – <city> …… </city>

• Attributes
  – Attribute-value pairs used to characterize elements
  – <city population="10000"> …… </city>


Structured descriptions

• Structured descriptions are trees
• Trees are suitable for retrieval and search

[Figure: a tree with a DS at the root, DSs and Ds below it, and Ds at the leaves.]


Description trees

<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, … </text>
</letter>

[Figure: the corresponding tree. letter has children header and text; header has children name and address; address has children street and city.]
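The correspondence between the XML text and its tree can be checked with a few lines of Python. This is a minimal sketch using the standard `xml.etree.ElementTree` module; the helper name `tree_lines` is made up for the example, and the elided letter body is kept as a placeholder.

```python
import xml.etree.ElementTree as ET

LETTER = """
<letter>
  <header>
    <name>Mr Sen</name>
    <address>
      <street>16 rue Laplace</street>
      <city>Nancy</city>
    </address>
  </header>
  <text>Dear Mr White, ...</text>
</letter>
"""


def tree_lines(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(LETTER)
outline = tree_lines(root)
print("\n".join(outline))

# Path-style lookup: walking from the root down to a leaf is what makes
# tree-shaped descriptions convenient for search and retrieval.
city = root.findtext("header/address/city")
print(city)
```

The printed outline reproduces the tree from the slide, and the path `header/address/city` reaches the leaf directly.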


Example: Audio description

<Mpeg7Main>
  <DescriptionMetadata>
    <Version>1.0</Version>
  </DescriptionMetadata>
  <ContentDescription>
    <AudioContent xsi:type="AudioType">
      <Audio>
        <CreationInformation>
          <Creation>
            <Title> The daily news </Title>
          </Creation>
        </CreationInformation>
      </Audio>
    </AudioContent>
  </ContentDescription>
</Mpeg7Main>
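A consumer of such a description only needs an XML parser to pull out the fields it cares about. The sketch below reads the version, the content type and the title. It assumes the type attribute on `AudioContent` is the standard XML Schema instance attribute (`xsi:type`), and it adds an `xmlns:xsi` declaration that the slide does not show, because a namespace-aware parser rejects an undeclared prefix.

```python
import xml.etree.ElementTree as ET

# The xsi namespace declaration is an assumption added so the snippet parses;
# the slide does not show any namespace declarations.
DESCRIPTION = """
<Mpeg7Main xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DescriptionMetadata>
    <Version>1.0</Version>
  </DescriptionMetadata>
  <ContentDescription>
    <AudioContent xsi:type="AudioType">
      <Audio>
        <CreationInformation>
          <Creation>
            <Title>The daily news</Title>
          </Creation>
        </CreationInformation>
      </Audio>
    </AudioContent>
  </ContentDescription>
</Mpeg7Main>
"""

# ElementTree exposes namespaced attributes under "{namespace-uri}localname".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

root = ET.fromstring(DESCRIPTION)
version = root.findtext("DescriptionMetadata/Version")
content = root.find("ContentDescription/AudioContent")
content_type = content.get(XSI_TYPE)
title = content.findtext("Audio/CreationInformation/Creation/Title")
print(version, content_type, title)
```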



Audio description

• Low-level Description
  – spectrum, parametric, and temporal features

• High-level Description
  – Audio signature Description Scheme
  – Instrument timbre Description Schemes
  – The melody Description Tools
  – Sound recognition and indexing Description Tools
  – Spoken Content Description Tools


Audio low-level descriptors

• Waveform
• Loudness
• Spectral basis
• Spectral envelope
• Spectral centroid
• Spectral spread
• Fundamental frequency
• Harmonicity
• Attack time


Audio descriptor: Basic

• Two basic audio Descriptors
  – AudioWaveform Descriptor
     • describes the audio waveform envelope (minimum and maximum)
  – AudioPower Descriptor
     • describes the temporally-smoothed instantaneous power
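The two basic descriptors can be approximated in a few lines. This is a rough sketch, not the normative extraction: the frame size, the non-overlapping framing and the function names are assumptions chosen for illustration. The envelope keeps the minimum and maximum sample of each frame, and the power is smoothed by averaging the squared samples over each frame.

```python
def frames(samples, frame_size):
    """Split a sample list into consecutive non-overlapping frames."""
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def waveform_envelope(samples, frame_size):
    """Per-frame (minimum, maximum) pairs, in the spirit of AudioWaveform."""
    return [(min(f), max(f)) for f in frames(samples, frame_size)]


def smoothed_power(samples, frame_size):
    """Per-frame mean of the squared samples, in the spirit of AudioPower."""
    return [sum(x * x for x in f) / len(f) for f in frames(samples, frame_size)]


# Hypothetical eight-sample signal, split into two frames of four samples.
signal = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, 0.0, -0.25]
envelope = waveform_envelope(signal, 4)
power = smoothed_power(signal, 4)
print(envelope)
print(power)
```

Both results are much shorter than the signal itself, which is what makes them usable as compact descriptions for display or coarse comparison.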


Audio descriptor: Basic Spectral

• AudioSpectrumEnvelope Descriptor
  – describes the short-term power spectrum

• AudioSpectrumCentroid Descriptor
  – describes the center of gravity of the log-frequency power spectrum

• AudioSpectrumSpread Descriptor
  – describes the second moment of the log-frequency power spectrum

• AudioSpectrumFlatness Descriptor
  – describes the flatness properties of the spectrum
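As an illustration of "center of gravity" and "second moment" on a log-frequency axis, the sketch below computes both from a naive DFT. It is a simplified reading, not the normative MPEG-7 procedure: the 1 kHz reference frequency, the skipping of the DC bin, the use of a single frame without windowing, and the choice to report the spread as the square root of the second central moment are all assumptions of this example.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT power for bins 1..N/2 (bin 0, the DC term, is skipped)."""
    n = len(samples)
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        powers.append(abs(coeff) ** 2)
    return powers


def centroid_and_spread(samples, sample_rate, ref_hz=1000.0):
    """Centroid and spread in octaves relative to ref_hz (an assumed reference)."""
    n = len(samples)
    powers = power_spectrum(samples)
    total = sum(powers)
    if total == 0:
        raise ValueError("signal has no energy outside the DC bin")
    # Log-frequency position of each bin, in octaves above or below ref_hz.
    octaves = [math.log2(k * sample_rate / n / ref_hz) for k in range(1, n // 2 + 1)]
    centroid = sum(o * p for o, p in zip(octaves, powers)) / total
    spread = math.sqrt(sum((o - centroid) ** 2 * p
                           for o, p in zip(octaves, powers)) / total)
    return centroid, spread


# A 1 kHz sine sampled at 8 kHz falls exactly on bin 8 of a 64-point DFT,
# so its centroid sits at the reference and its spread is essentially zero.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
centroid, spread = centroid_and_spread(tone, 8000)
```

A pure tone gives a centroid at its own frequency and almost no spread; a noisy or broadband sound would give a larger spread.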


Audio Signature Description

• The AudioSignature Description Scheme provides a unique content identifier for the purpose of robust automatic identification of audio signals

• Applications include
  – audio fingerprinting
  – identification of audio
  – locating metadata for legacy audio content


Instrument Timbre Description

• Timbre is defined as the perceptual features that make two sounds with the same pitch and loudness sound different.

• The Timbre Description describes these perceptual features with a reduced set of Descriptors
  – HarmonicInstrumentTimbre Descriptor
  – LogAttackTime Descriptor
  – PercussiveInstrumentTimbre Descriptor
  – Combination with Basic Spectral Descriptors


Melody Description Tools

The melody Description Tools are meant to facilitate efficient, robust, and expressive melodic similarity matching

• MelodyContour Description Scheme
  – 5-step contour representation
  – basic rhythmic information representation

• MelodySequence Description Scheme
  – supporting an expanded descriptor set and high precision of interval encoding
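A 5-step contour reduces each interval between consecutive notes to one of five levels, from -2 to +2. The sketch below shows the idea on MIDI note numbers. The thresholds used here (up to a whole tone counts as a small step, a minor third or more as a large step) and the function names are assumptions for illustration, not values quoted from the slides.

```python
def contour_step(semitones):
    """Quantize a pitch interval (in semitones) into one of five contour levels."""
    magnitude = abs(semitones)
    if magnitude == 0:
        level = 0
    elif magnitude <= 2:   # up to a whole tone: small step (assumed threshold)
        level = 1
    else:                  # minor third or more: large step (assumed threshold)
        level = 2
    return level if semitones >= 0 else -level


def melody_contour(midi_notes):
    """Contour of a note sequence given as integer MIDI note numbers."""
    return [contour_step(b - a) for a, b in zip(midi_notes, midi_notes[1:])]


# Hypothetical six-note phrase: C4, D4, D4, G4, E4, E-flat 4.
contour = melody_contour([60, 62, 62, 67, 64, 63])
print(contour)
```

Because the contour ignores absolute pitch and exact interval size, a query that is sung slightly out of tune or in another key can still match, which is the robustness the coarse representation is after.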


General Sound Recognition and Indexing Description Tools

• SoundModel (SM) DS
  – statistical model, such as HMM or GMM
  – SoundModelStatePath Descriptor
     • consists of a state sequence generated by an SM
  – SoundModelStateHistogram Descriptor
     • consists of a normalized histogram of the state sequence generated by an SM given an audio segment

• SoundClassificationModel DS
  – a trainable multi-way classifier based on SMs
     • speech vs music, male vs female, trumpet vs violin
     • genre classification, voice recognition
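The relation between a state path and its normalized histogram is easy to make concrete. In this sketch the state path is a hypothetical list of state indices (in practice it would come from decoding an audio segment with a trained model), and the distance function is just one simple way, assumed here, to compare two segments by their histograms.

```python
from collections import Counter


def state_histogram(state_path, num_states):
    """Normalized histogram: fraction of frames spent in each state."""
    if not state_path:
        raise ValueError("state path is empty")
    counts = Counter(state_path)
    return [counts.get(s, 0) / len(state_path) for s in range(num_states)]


def histogram_distance(h1, h2):
    """Sum of absolute differences between two histograms (an assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


path = [0, 1, 1, 2, 1]            # hypothetical state path for one segment
hist = state_histogram(path, 3)   # fractions for states 0, 1 and 2
print(hist)
```

The histogram discards the order of the states, so it is cheaper to store and compare than the full path, at the cost of losing temporal detail.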


Spoken content retrieval

• Output of ASR
  – phone lattice or word lattice
  – spoken content DS stores these lattices instead of plain text
  – lattices are good for retrieval
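Why a lattice helps retrieval can be shown with a toy example: the recognizer's single best transcript may contain an error, while the correct word survives as a lower-ranked alternative in the lattice. The structure below is hypothetical (a flat list of links with word labels and probabilities) and is not the MPEG-7 SpokenContentLattice schema.

```python
# Hypothetical lattice: each link goes from one node to a later node and
# carries a word hypothesis with a probability. The best path here reads
# "touch mahal drawing", which is a recognition error.
LINKS = [
    (0, 1, "touch", 0.6),
    (0, 1, "taj", 0.4),
    (1, 2, "mahal", 0.9),
    (1, 2, "my", 0.1),
    (2, 3, "drawing", 1.0),
]


def find_phrase(links, phrase):
    """Return (start_node, probability) for every path that spells `phrase`."""
    matches = []

    def extend(node, remaining, prob, start):
        if not remaining:
            matches.append((start, prob))
            return
        for src, dst, word, p in links:
            if src == node and word == remaining[0]:
                extend(dst, remaining[1:], prob * p, start)

    for start in sorted({src for src, _, _, _ in links}):
        extend(start, list(phrase), 1.0, start)
    return matches


hits = find_phrase(LINKS, ["taj", "mahal"])
print(hits)
```

A search over the plain best-path text would miss "taj mahal", whereas the lattice search finds it at node 0 with a probability that can be used for ranking.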


Spoken Content Description Tools

• SpokenContentLattice
  – representing the actual decoding produced by an ASR engine

• SpokenContentHeader
  – contains information about the speakers being recognized and the recognizer itself
  – WordLexicon Descriptor
  – PhoneLexicon Descriptor
  – SpeakerInfo Descriptor
  – ConfusionInfo Descriptor


Gaussian DS

<Gaussian>

<Mean>

4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 2645.27 2577.09

………………………………

</Mean>

<Variance>

1.6982e+007 5.21621e+007 14.3636 9749.09 3.65743e+006

………………………………

</Variance>

</Gaussian>
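A description like this carries everything needed to score a feature vector against the model. The sketch below parses a Gaussian element and evaluates a log-likelihood. It assumes that the Variance element holds the diagonal of the covariance matrix, and it uses small toy values rather than the (elided) numbers from the slide.

```python
import math
import xml.etree.ElementTree as ET

# Toy values chosen for illustration; they are not taken from the slide.
GAUSSIAN_XML = """
<Gaussian>
  <Mean>0.0 2.0</Mean>
  <Variance>1.0 4.0</Variance>
</Gaussian>
"""


def read_gaussian(xml_text):
    """Parse whitespace-separated Mean and Variance vectors."""
    root = ET.fromstring(xml_text)
    mean = [float(v) for v in root.findtext("Mean").split()]
    variance = [float(v) for v in root.findtext("Variance").split()]
    if len(mean) != len(variance):
        raise ValueError("Mean and Variance must have the same length")
    return mean, variance


def log_likelihood(x, mean, variance):
    """Log density of x under a Gaussian with diagonal covariance (assumed)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, variance))


mean, variance = read_gaussian(GAUSSIAN_XML)
score = log_likelihood([0.0, 2.0], mean, variance)
print(score)
```

Python's `float` also accepts the exponent notation used on the slide (for example `1.6982e+007`), so the same parsing would work on the real values.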


State-transition model DS

<StateTransitionModel>

<Transitions size1="20" size2="20">

0 0 0.210526 0.0526316 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

……………………………………

</Transitions>

<Initial size="20">

0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

</Initial>

<State label="0 players" confidence="1">

……………………………………

<State label="19 players" confidence="0.223607">

</StateTransitionModel>
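The flat list of numbers inside Transitions becomes a matrix once it is cut into rows of `size2` values. The sketch below does that reshaping and then scores a state sequence. The row-is-source, column-is-destination convention and the 2-state toy model are assumptions of this example; the slide does not state the layout.

```python
def to_matrix(flat_values, rows, cols):
    """Reshape a flat list (as in a Transitions element) into rows x cols."""
    if len(flat_values) != rows * cols:
        raise ValueError("value count does not match size1 * size2")
    return [flat_values[r * cols:(r + 1) * cols] for r in range(rows)]


def path_probability(initial, transitions, path):
    """Probability of a state sequence under a first-order Markov chain."""
    prob = initial[path[0]]
    for prev, nxt in zip(path, path[1:]):
        # Assumed layout: transitions[from_state][to_state].
        prob *= transitions[prev][nxt]
    return prob


# Hypothetical 2-state model, written the way the element text would be.
transitions = to_matrix([float(v) for v in "0.9 0.1 0.5 0.5".split()], 2, 2)
initial = [1.0, 0.0]
p = path_probability(initial, transitions, [0, 0, 1])
print(p)
```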


ProbabilityModelClassifier DS

<ProbabilityModelClassifier confidence="0.9" length="2">

<ProbabilityModelClass SemanticLabel="fish" Confidence="0.5"

DescriptorName="ColorHistogram">

<Gaussian>

<Mean>

4087.18 7173.73 1.36364 94.2727 1834.36 2359.55

………………………….

</Mean>

<Variance>

1.6982e+007 5.21621e+007 14.3636 9749.09

………………………….

</Variance>

</Gaussian>

</ProbabilityModelClass>


SpokenContentLattice DS

[Figure: a lattice structure for a hypothetical (combined phone and word) decoding of the expression “Taj Mahal drawing …”.]


[Figure: block diagram. Labels: AUDIO QUERY; SELECT; SPECTRUM PROJECTION (AudioSpectrumBasis); SoundRecognitionClassifier; SoundRecognitionModel (HMM AND BASES) for HMM 1, HMM 2, ..., HMM N-1, HMM N; SoundModelStatePath; MODEL REF + STATE PATH; Segmented Audio Description; MPEG-7 SOUND DATABASE.]

Extraction of sound indexes using a sound-recognition classifier. The model reference and state path are stored.


[Figure: block diagram. Labels: AUDIO QUERY; SELECT; SPECTRUM PROJECTION (AudioSpectrumBasis); SoundRecognitionClassifier; SoundRecognitionModel (HMM AND BASIS, ContinuousMarkovModel) for HMM 1, HMM 2, ..., HMM N-1, HMM N; SoundModelStatePath; MODEL REF + STATE PATH; MATCHING; Indexed Audio; MPEG-7 SOUND DATABASE; RESULT LIST.]

Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.


[Figure: block diagram. Labels: DDL QUERY; MODEL REF + STATE PATH; MATCHING; MPEG-7 SOUND DATABASE; RESULT LIST.]

An example search application utilizing a query in DDL format


Extraction of hidden Markov model and basis functions and storage in a DDL representation

[Figure: block diagram. Labels: AUDIO WAV FILES; FEATURE EXTRACT (SoundRecognitionFeatures); BASIS EXTRACT (AudioSpectrumBasis); HMM (ContinuousMarkovModel); HMM AND BASIS; SoundRecognitionModel.]


Scenarios for the spoken content Description Tools

• Recall of AV data by memorable spoken events
  – A film or video recording where a character or person spoke a particular word or sequence of words. The source media would be known, and the query would return a position in the media.

• Spoken Document Retrieval
  – There is a database consisting of separate spoken documents. The result of the query is the relevant documents, and optionally the position in those documents of the matched speech.

• Annotated Media Retrieval
  – Similar to spoken document retrieval. The result of the query is the media which is annotated with speech, and not the speech itself. An example is a photograph retrieved using a spoken annotation.



Multimedia DSs

Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content

• Basic Elements
• Content Management
• Content Description
• Content Organization
• Navigation and Access
• User Interaction


Organization of Multimedia DSs


Content Management

• Creation and production information
  – Creation information
     • title, textual annotation, creators, and dates
  – Classification information
     • genre, subject, purpose, language

• Media coding, storage and file formats
  – format, compression, and coding

• Content usage
  – usage rights, usage record


Navigation and Access

• Summaries
  – hierarchical summaries
  – sequential summaries

• Partitions and Decompositions
  – decompositions in space, time and frequency
  – used in multi-resolution access and progressive retrieval

• Variations
  – selection of the most suitable variation of an AV program
  – adapt to the different capabilities of terminal devices, network conditions or user preferences
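Selecting among variations amounts to filtering by what the terminal or network can handle and then ranking what remains. The sketch below is one hypothetical policy, not something prescribed by MPEG-7: the `Variation` fields, the fidelity scale and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variation:
    name: str
    bitrate_kbps: int
    fidelity: float  # 0.0 (poor) to 1.0 (identical to the source)


def select_variation(variations, max_bitrate_kbps):
    """Pick the highest-fidelity variation that fits the bandwidth, if any."""
    fitting = [v for v in variations if v.bitrate_kbps <= max_bitrate_kbps]
    return max(fitting, key=lambda v: v.fidelity) if fitting else None


# Hypothetical variations of one AV program.
VARIATIONS = [
    Variation("full video", 1500, 1.0),
    Variation("low-resolution video", 300, 0.6),
    Variation("audio only", 64, 0.4),
    Variation("key frames", 50, 0.3),
]

choice = select_variation(VARIATIONS, max_bitrate_kbps=100)
print(choice.name if choice else "no suitable variation")
```

With a 100 kbit/s limit the audio-only variation wins over the key frames; user preferences could be folded in by changing the ranking key.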


Hierarchical summary


Illustration of variations


Content Organization

• Collections
  – group the contents into clusters
  – describe statistics and models of the attribute values
  – describe relationships among collection clusters

• Models
  – model the attributes and features of AV content
  – Probability Model
     • specify statistical functions and structures
  – Analytic Model
     • specify semantic labels
     • specify the confidence
     • build classifiers


Collection Structure


User Interaction

• User Preference
  – context dependency in terms of time and place
  – relative importance of different preferences
  – privacy characteristics of the preferences
  – preferences update by agent or user

• Usage History
  – history of actions
  – used to determine the user's preferences


Page 58: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

eXperimentation Model (XM)

• Simulation platform for:
  – Ds, DSs, CSs, DDL (CS: Coding Scheme)
• XM applications:
  – the server (extraction) applications
  – the client (search, filtering and/or transcoding) applications

Page 59: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

The XM applications

• Extraction from Media
  – all low-level Ds or DSs should have an application class of this type
• Search & Retrieval Application
  – a client application
• Media Transcoding Application
  – a client application
• Description Filtering Application
  – a client application

Page 60: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Extraction from Media

Page 61: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Search and retrieval application

Page 62: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Media transcoding application

Page 63: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Description Filtering Application

Page 64: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Interface model for XM app

Page 65: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Real world application

MDB = media database, DDB = description database. First, two features are extracted from a media database. Then, based on the first feature, relevant media files are selected from the media database. Finally, the relevant media files are transcoded based on the second extracted feature.
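
The three steps described above (extract two features, select by the first, transcode by the second) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the in-memory dictionaries standing in for the MDB and DDB, the feature names `genre` and `width`, and the function names are hypothetical and are not the XM software's interfaces.

```python
# Hypothetical media database (MDB): file name -> raw properties.
mdb = {
    "a.mpg": {"genre": "sport", "width": 1920},
    "b.mpg": {"genre": "news", "width": 640},
    "c.mpg": {"genre": "sport", "width": 720},
}

def extract(mdb, feature):
    """Extraction step: build a description database (DDB) for one feature."""
    return {name: props[feature] for name, props in mdb.items()}

def search(ddb, wanted):
    """Search step: select media files whose description matches the query."""
    return sorted(name for name, value in ddb.items() if value == wanted)

def transcode(names, ddb, max_width):
    """Transcoding step: choose a target width per file from the second feature."""
    return {name: min(ddb[name], max_width) for name in names}

ddb_genre = extract(mdb, "genre")   # first feature
ddb_width = extract(mdb, "width")   # second feature
selected = search(ddb_genre, "sport")
plan = transcode(selected, ddb_width, max_width=1280)
```

Only the transcoding decision is computed here; actual media conversion is outside the scope of the sketch.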

Page 66: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

MPEG-7 application areas

• Storage and retrieval of audiovisual databases (image, film, radio archives)
• Broadcast media selection (radio, TV programs)
• Surveillance (traffic control, surface transportation, production chains)
• E-commerce and Tele-shopping (searching for clothes / patterns)
• Remote sensing (cartography, ecology, natural resources management)
• Entertainment (searching for a game, for a karaoke)
• Cultural services (museums, art galleries)
• Journalism (searching for events, persons)
• Personalized news service on Internet (push media filtering)
• Intelligent multimedia presentations
• Educational applications
• Bio-medical applications

Page 67: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Illustration of applications

Users

Page 68: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Information Flow

[Figure: information flow. Labels: Feature extraction (manual/automatic), AV Description, Encoding, Storage, Transmission, Decoding, Search/query, Browse, Filter, Pull, Push, Users]

Page 69: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Push and Pull applications

• Pull applications
  – Example: Search engines for the Internet and databases
  – Advantage: Many search engines work on standardized descriptions
• Push applications
  – Example: Broadcast of video, interactive TV
  – Advantage: Intelligent agents filter standardized descriptions
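
The two interaction styles on this slide, querying a stored set of standardized descriptions and filtering a stream of them as they arrive, can be contrasted in a short sketch. The description format (a title plus a keyword set) and both function names are hypothetical.

```python
# Hypothetical standardized descriptions: a title plus a set of keywords.
descriptions = [
    {"title": "Evening news", "keywords": {"news", "politics"}},
    {"title": "Cup final", "keywords": {"sport", "soccer"}},
]

def search_database(database, query_keywords):
    """The client asks a stored collection for matching descriptions."""
    return [d["title"] for d in database if query_keywords & d["keywords"]]

def filter_broadcast(stream, profile_keywords):
    """An agent inspects descriptions one by one as a broadcast delivers them."""
    for description in stream:
        if profile_keywords & description["keywords"]:
            yield description["title"]

searched = search_database(descriptions, {"soccer"})
filtered = list(filter_broadcast(iter(descriptions), {"news"}))
```

In both cases the matching logic is the same; what differs is who initiates the exchange and whether the descriptions are stored or streamed.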

Page 70: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Pull application

[Figure label: MPEG-7 Database]

Page 71: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Push application

Page 72: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: queries

• Text (keywords):
  – Find AV material with a subject corresponding to some keywords
• Semantic description:
  – Find AV material corresponding to specified semantics
• Image as an example:
  – Find an image with similar characteristics (global or local)
• A few notes of music:
  – Find corresponding musical pieces or movies
• Low level features (example: motion):
  – Find video with specific object motion trajectories
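
The "image as an example" query can be sketched as a nearest-neighbor search over low-level feature vectors. The four-bin vectors, file names, and the choice of Euclidean distance are assumptions for illustration; MPEG-7 standardizes the descriptions, not the matching method.

```python
import math

# Hypothetical feature vectors (for example, 4-bin color histograms).
library = {
    "sunset.jpg": [0.7, 0.2, 0.1, 0.0],
    "forest.jpg": [0.1, 0.7, 0.1, 0.1],
    "beach.jpg": [0.6, 0.1, 0.2, 0.1],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(example, library, top_k=2):
    """Return the names of the top_k items closest to the example vector."""
    ranked = sorted(library, key=lambda name: euclidean(example, library[name]))
    return ranked[:top_k]

results = query_by_example([0.68, 0.2, 0.1, 0.02], library)
```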

Page 73: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Integration of MPEG-7 into XML

<seq begin="20s" dur="10s">
  <img id="Image1" dur="5s">
    <MP7:annotation>
      <Who>Fernando Morientes</Who>
      <WhatAction>Spain vs. Sweden soccer match</WhatAction>
    </MP7:annotation>
  </img>
  <img id="Image2" dur="2s"/>
</seq>

Page 74: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Outline of contents

• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information

Page 75: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

MPEG-7 and other Standards

• MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.

• MPEG-1, -2, and -4 make content available, while MPEG-7 allows you to find the content you need.

Page 76: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Ultimate ambition of MPEG-7

• To make the web as searchable for multimedia content as it is searchable for text today

• To make computer systems as easy to use as possible

Page 77: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Beyond MPEG-7

• To mould computers around human requirements and not humans around computer requirements

• To enable content disclosure based on facts, rather than on human annotations

• To find information through rich spoken queries and hand-drawn images, addressing what most people expect computers to be able to do

Page 78: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

More Information on WWW

• Major MPEG-7 documents

http://www.cselt.it/mpeg/, semi-official website

http://www.mpeg-7.com, official website

• Others

http://www.elsevier.com/locate/image

Page 79: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Conclusion

[Figure: summary diagram. Labels: AV contents, Structures, Features, Ds, DSs, DDL (Ds, DSs), User]

Page 80: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Thanks

Page 81: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Page 82: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Low level AV descriptors

• Video segments
  – Color
  – Camera motion
  – Motion activity
  – Mosaic
• Moving regions
  – Color
  – Motion trajectory
  – Parametric motion
  – Spatio-temporal shape
• Still regions
  – Color
  – Shape
  – Position
  – Texture
• Audio segments
  – Spoken content
  – Spectral feature
  – Timbre

Page 83: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Face Recognition Descriptor

• Projection of a face vector onto a set of basis vectors (face patterns)
• Feature set is extracted from a normalized face image
• Normalized face image
  – 56 lines with 46 intensity values in each line
  – The centers of the two eyes are located on the 24th row and the 16th and 31st column for the right and left eye respectively
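
The projection step described above can be sketched in a few lines. Only the 56 x 46 image size comes from the slide; the mean-face subtraction, the toy basis vectors, and the function names are assumptions made for this example (real basis vectors are learned from training faces and are not reproduced here).

```python
ROWS, COLS = 56, 46  # normalized face image size stated on the slide

def flatten(image):
    """Turn a ROWS x COLS image (list of rows) into one face vector."""
    if len(image) != ROWS or any(len(row) != COLS for row in image):
        raise ValueError("expected a 56 x 46 normalized face image")
    return [value for row in image for value in row]

def project(face_vector, mean_face, basis):
    """Project the mean-centered face vector onto each basis vector."""
    centered = [f - m for f, m in zip(face_vector, mean_face)]
    return [sum(c * b for c, b in zip(centered, vector)) for vector in basis]

# Hypothetical toy data: a uniform gray face, a zero mean face, and two
# basis vectors that each pick out a single pixel.
image = [[128] * COLS for _ in range(ROWS)]
mean_face = [0] * (ROWS * COLS)
basis = [[0] * (ROWS * COLS) for _ in range(2)]
basis[0][0] = 1
basis[1][-1] = 1

features = project(flatten(image), mean_face, basis)
```

Each entry of `features` is one coefficient of the face in the chosen basis; the list as a whole plays the role of the extracted feature set.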

Page 84: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Segment Decomposition

Page 85: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

MPEG-7 Normative Interfaces

Page 86: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Content description

[Figure labels: MPEG-7 Database, Indexing / Feature extraction, Search / retrieval, High level process, Low level process]

Page 87: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Segment DS

The Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:

• Multimedia Segment DS
• AudioVisual Region DS
• AudioVisual Segment DS
• Audio Segment DS
• Still Region DS
• Still Region 3D DS
• Moving Region DS
• Video Segment DS
• Ink Segment DS

Page 88: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Examples: T/S segments

Page 89: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Segment trees

Page 90: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Illustration of conceptual description

[Figure labels: Semantic base DS, Semantic container DS, Semantic DS, Object DS, Event DS, Concept DS, Semantic state DS, Semantic place DS, Semantic time DS, AV content]

Page 91: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Visual description

• Basic structures
  – Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation
• Descriptors
  – Color, Texture, Shape, Motion, Localization

Page 92: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Color Descriptors

• Color space

• Color Quantization

• Dominant Colors

• Scalable Color

• Color Layout

• Color-Structure

• GoF/GoP Color

Page 93: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Example: Color space

• R,G,B

• Y,Cr,Cb

• H,S,V

• HMMD

• Linear transformation matrix with reference to R, G, B

• Monochrome
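
Several of the color spaces listed above are related to R, G, B by simple conversions, which the sketch below illustrates. The matrix uses the widely published ITU-R BT.601 full-range coefficients as an assumed example of a "linear transformation matrix with reference to R, G, B"; the slide itself does not give coefficients. H, S, V comes from Python's standard `colorsys` module, and HMMD is not covered.

```python
import colorsys

# Assumed coefficients: ITU-R BT.601 full-range RGB -> YCbCr.
RGB_TO_YCBCR = [
    [0.299, 0.587, 0.114],         # Y
    [-0.168736, -0.331264, 0.5],   # Cb
    [0.5, -0.418688, -0.081312],   # Cr
]

def linear_transform(matrix, rgb):
    """Apply a 3x3 matrix to an (R, G, B) triple with components in [0, 1]."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in matrix)

def monochrome(rgb):
    """Keep only the luminance (Y) component."""
    return linear_transform(RGB_TO_YCBCR, rgb)[0]

white = (1.0, 1.0, 1.0)
y, cb, cr = linear_transform(RGB_TO_YCBCR, white)
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # pure red
```

For white, the luminance is at its maximum and both chroma components vanish (up to floating-point rounding), which is a quick sanity check on any such matrix.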

Page 94: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Audio Framework

Page 95: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Descriptor

• Definition
  – A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
• Notes
  – A descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several descriptors representing a single feature.
• Examples
  – Possible descriptors include the color histogram (for the color feature), the average of the frequency components, the motion field, the text of the title, etc.

Page 96: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Descriptor Value

• Definition A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).

• Notes Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.
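
The distinction between a Descriptor and a Descriptor Value can be made concrete with a color histogram. In this sketch the function stands in for the Descriptor (it fixes how the color feature is represented) and its result for one data set stands in for the Descriptor Value. The bin layout, the pixel data, and the function name are hypothetical and do not follow the normative MPEG-7 color descriptors.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=2):
    """Descriptor: a coarse RGB histogram with normalized bin counts.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit components.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_index: n / total for bin_index, n in counts.items()}

# Descriptor Value: the descriptor instantiated for one hypothetical data set.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 200, 30)]
descriptor_value = color_histogram(pixels)
```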

Page 97: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Description Scheme

• Definition
  – A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.
• Examples
  – A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.
• Note
  – A D contains only basic data types and does not refer to other Ds or DSs.
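
The movie example can be sketched as nested data structures, with each class standing in for a Description Scheme and each plain field for a Descriptor value. The class and field names are hypothetical; real MPEG-7 Description Schemes are defined in the DDL, not in Python.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Shot:
    color: str    # stands in for a color descriptor value
    motion: str   # stands in for a motion descriptor value

@dataclass
class Scene:
    annotation: str   # textual descriptor at the scene level
    shots: List[Shot] = field(default_factory=list)

@dataclass
class Movie:
    title: str
    scenes: List[Scene] = field(default_factory=list)

movie = Movie(
    title="Example movie",
    scenes=[
        Scene("opening", [Shot("mostly blue", "slow pan")]),
        Scene("chase", [Shot("gray", "fast"), Shot("red", "fast")]),
    ],
)
shot_count = sum(len(scene.shots) for scene in movie.scenes)
```

Note how `Shot` holds only basic values, while `Scene` and `Movie` refer to other structures, mirroring the difference between Ds and DSs stated in the note.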

Page 98: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

DDL: XML Schema & Extensions

• XML Schema
  – Data types
  – Simple and Complex types
  – Elements
  – Inheritance, Abstract types
• MPEG-7 extensions
  – Array and Matrix datatypes
  – Enumerated datatypes for MimeType, CountryCode, RegionCode, CurrencyCode and CharacterSetCode
  – Typed references

Page 99: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Basic elements of DS

• Constructs for linking media files
• Localizing pieces of content
• Describing
  – time, places, persons, individuals, groups, organizations, and textual annotation, etc.
  – Who? What object? What action? Where? When? Why? and How?

Page 100: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

Content recognition tools

• No speech, face, or gesture recognition engines are included in MPEG-7
• Building content recognition tools is a task for industry, not for the standard
  – coding tools in MPEG-1, -2, -4 were for research purposes, not part of the standard
  – no tools were part of the MPEG standard

Page 101: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.
