MPEG-7 - Electronic Engineeringee502/MPEG-7.pdf · MPEG-7 Audio Amendment 2 will include extended...

Post on 25-Aug-2020

MPEG-7

• MPEG-7 overview
  – What is…
  – Why?
  – Objectives and scope
  – Main elements and organization

• MPEG-7 Audio
  – Low-level features
  – High-level tools

What is MPEG-7?

• "Multimedia Content Description Interface"
• ISO/IEC standard by MPEG (Moving Picture Experts Group)
• Provides metadata for multimedia
• MPEG-1, -2, -4: make content available; MPEG-7: makes content accessible, retrievable, filterable, manageable (via device / computer).
• Multiple degrees of interpretation of the information's meaning
• Supports as broad a range of applications as possible.
• A standard that is extensible and compatible with existing technology.

Why MPEG-7?

• "The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed."
• Past: scarcity of digital multimedia sources -> simple access mechanisms were enough
• Now: growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. "record only news about sport."

Why MPEG-7?

• For future multimedia services, content representation and description may have to be addressed jointly.
• Many services dealing with content representation will have to deal first with content description
  – "a non-described content may be useless"
• Need for access only to the content description:
  – New original services (e.g. optimizing personal time)
  – Adaptation to networks and terminal capabilities

Application domains

• Broadcast media selection (e.g., radio channel, TV channel).
• Digital libraries (e.g., film, video, audio and radio archives).
• E-Commerce (e.g., personalized advertising).
• Education (e.g., repositories of multimedia courses, multimedia search for support material).
• Home Entertainment (e.g., management of personal multimedia collections, including manipulation of content, e.g. karaoke).
• Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face).
• Multimedia directory services (e.g. yellow pages, G.I.S.).
• Surveillance and remote sensing.

MPEG-7 Objectives

Standardize content-based description for various types of audiovisual information

• Independent of media support (encoding and storage)
• Different granularity
  – Low-level features: shape, size, key, tempo changes
  – High-level semantic info: "scene with a barking brown dog on the left and with the sound of passing cars in the background."
• Meaningful in the context of the application
  – Same material -> different types of features and combinations, e.g. timbre vs. loudness

MPEG-7 Objectives

• Information about the content
  – The form: e.g. the coding format used
  – Conditions for accessing the material: e.g. intellectual property rights / price
  – Classification: e.g. parental rating
  – Links to other relevant materials
  – The context: e.g. "Olympic Games 1996, final of 200 meter hurdles, men"
• Information present in the content:
  – Combination of low-level and high-level descriptors

Scope of the Standard

processing chain:

An example of architecture

• Pull: (Client Queries -> Descriptions repository -> Matched Ds)
• Push: (Filter descriptions -> Programmed actions)
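
The two access modes can be sketched in a few lines. This is only an illustration: the dictionary fields (`id`, `genre`, `topic`) and the function names `pull` and `push` are hypothetical and are not MPEG-7 syntax. The filter reuses the "record only news about sport" example.

```python
# Hypothetical descriptions; real MPEG-7 descriptions are XML documents.
descriptions = [
    {"id": "clip-1", "genre": "news", "topic": "sport"},
    {"id": "clip-2", "genre": "news", "topic": "weather"},
    {"id": "clip-3", "genre": "film", "topic": "sport"},
]


def pull(repository, **criteria):
    """Pull: the client sends a query and gets the matching descriptions back."""
    return [d for d in repository
            if all(d.get(key) == value for key, value in criteria.items())]


def push(stream, predicate, action):
    """Push: every incoming description is filtered; matches trigger an action."""
    for description in stream:
        if predicate(description):
            action(description)


# Pull: query the repository.
matched = pull(descriptions, genre="news", topic="sport")

# Push: "record only news about sport".
recorded = []
push(descriptions,
     lambda d: d["genre"] == "news" and d["topic"] == "sport",
     lambda d: recorded.append(d["id"]))
```

In both cases only the descriptions are inspected, never the audiovisual material itself.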

Where are the descriptions from?

• Preservation of existing descriptive data (e.g. scripts) through production/delivery
• Generated automatically by capture devices (e.g. time or GPS location in a camera)
• Extracted automatically & semi-automatically (i.e. with some human assistance)
• Manually produced (e.g. for legacy material such as existing film archives)

Main Elements of MPEG-7

• Relationship among elements introduced above.

Descriptions

• MPEG-7 approaches the description of content from several viewpoints.

• A set of methods and tools for the different viewpoints of the description (not a monolithic system)

• Interrelated and can be combined in many ways.
• Associated with the content itself: (searching, filtering)
• Location: (document vs. stream)
  – physically located with the material
  – somewhere else on the globe (maybe not)
• Interoperability with other metadata standards: (XML)

Major Functionalities

• MPEG-7 Systems
• MPEG-7 Description Definition Language
• MPEG-7 Visual
• MPEG-7 Audio
• MPEG-7 Multimedia Description Schemes
• Reference Software: the eXperimentation Model (test)
• MPEG-7 Conformance (syntax checking)
• MPEG-7 Extraction and use of descriptions (technical report)

MPEG-7 Audio

• Audio provides structures, building upon some basic structures from the MDS (Multimedia Description Schemes), for describing audio content.
• Low-level Descriptors:
  – audio features that cut across many applications
• High-level Description Tools:
  – more specific to a set of applications.

Low-level Features

Low-level Features (details)

• Basic: (temporally sampled scalar values for general use)
  – AudioWaveform Descriptor
    • waveform envelope: (for display purposes).
  – AudioPower Descriptor
    • temporally-smoothed instantaneous power: (quick summary of a signal)
• Silence segment: (no significant sound)
  – aid further segmentation of the audio stream, or as a hint not to process a segment
  – Applicable to all kinds of signals
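
A minimal sketch of the idea behind AudioPower and silence detection, assuming non-overlapping frames and a fixed power threshold. The frame length (`hop`), the threshold value and the function names are illustrative choices, not values taken from the standard.

```python
def instantaneous_power(samples):
    """Instantaneous power: the square of each sample."""
    return [s * s for s in samples]


def smoothed_power(samples, hop):
    """Mean instantaneous power over consecutive, non-overlapping frames."""
    power = instantaneous_power(samples)
    frames = [power[i:i + hop] for i in range(0, len(power), hop)]
    return [sum(frame) / len(frame) for frame in frames]


def silent_frames(frame_power, threshold):
    """Flag frames whose mean power is below the (assumed) silence threshold."""
    return [p < threshold for p in frame_power]


signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5]
frames = smoothed_power(signal, hop=4)          # one value per 4-sample frame
flags = silent_frames(frames, threshold=1e-3)   # True where the frame is "silent"
```

The per-frame power series is the "quick summary" of the signal; the flags show how a silence hint could let later stages skip a segment.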

Low-level Features (details)

• Basic Spectral: (single time-frequency analysis of signal)
  – AudioSpectrumEnvelope: (Base class)
    • the short-term power spectrum: (display, synthesize, general-purpose search)
  – AudioSpectrumCentroid:
    • dominated by high or low frequencies?
  – AudioSpectrumSpread:
    • the power spectrum centered near the spectral centroid, or spread out over the spectrum?
    • pure-tone and noise-like sounds
  – AudioSpectrumFlatness: (the presence of tonal components)
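
Spectral flatness is commonly measured as the ratio of the geometric mean to the arithmetic mean of the power bins: close to 1 for noise-like spectra, close to 0 when a few tonal peaks dominate. The sketch below treats the whole spectrum as a single band for simplicity, whereas a flatness descriptor is usually evaluated per frequency band. The small `eps` offset is an assumption added to avoid taking the log of zero.

```python
import math


def spectral_flatness(power_bins, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power bins."""
    bins = [p + eps for p in power_bins]            # avoid log(0)
    log_mean = sum(math.log(p) for p in bins) / len(bins)
    arithmetic_mean = sum(bins) / len(bins)
    return math.exp(log_mean) / arithmetic_mean


noise_like = spectral_flatness([1.0, 1.0, 1.0, 1.0])   # flat spectrum
pure_tone = spectral_flatness([0.0, 4.0, 0.0, 0.0])    # one dominant bin
```

A value near 1 suggests no tonal components; a value near 0 signals their presence.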

Low-level Features (details)

• Signal Parameters: (periodic or quasi-periodic signals)
  – AudioFundamentalFrequency:
    • "confidence measure", replacing "pitch-tracking"
  – AudioHarmonicity:
    • distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum

Low-level Features (details)

• Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)
  – LogAttackTime
  – TemporalCentroid
    • where in time the energy of a signal is focused.
    • Useful when attack times are identical

[Figure: Illustration of log-attack time. Signal envelope(t) plotted against t, with T0 and T1 marked on the time axis.]
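
Both timbral temporal descriptors can be sketched from a sampled signal envelope. The log-attack time is taken here as log10(T1 - T0). How T0 and T1 are picked is an assumption of this sketch: T0 is the first sample reaching 2% of the peak (`start_fraction`) and T1 is the first sample at the peak. The temporal centroid is the envelope-weighted mean time.

```python
import math


def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the attack duration T1 - T0, in seconds (assumed thresholds)."""
    peak = max(envelope)
    t0 = next(i for i, v in enumerate(envelope) if v >= start_fraction * peak)
    t1 = envelope.index(peak)
    # Clamp to one sample period so that log10 is always defined.
    duration = max(t1 - t0, 1) / sample_rate
    return math.log10(duration)


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted average time, in seconds."""
    total = sum(envelope)
    if total == 0:
        return 0.0
    return sum(i * v for i, v in enumerate(envelope)) / (total * sample_rate)


envelope = [0.0, 0.5, 1.0, 0.5, 0.0]   # toy envelope sampled at 10 Hz
lat = log_attack_time(envelope, sample_rate=10)
centroid = temporal_centroid(envelope, sample_rate=10)
```

Two sounds with the same attack time can still differ in where their energy sits in time, which is where the temporal centroid helps.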

Low-level Features (details)

• Timbral Spectral: (spectral features in a linear-frequency space)
  – SpectralCentroid:
    • power-weighted average of the frequency of the bins in the linear power spectrum.
    • distinguishing musical instrument timbres
  – 4 Ds for harmonic regularly-spaced components of signals:
    • HarmonicSpectralCentroid
    • HarmonicSpectralDeviation
    • HarmonicSpectralSpread
    • HarmonicSpectralVariation
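
The SpectralCentroid definition above (power-weighted average of the bin frequencies on a linear scale) can be sketched directly. The assumption that bin k sits at k * sample_rate / fft_size, and the parameter names, are choices of this example.

```python
def spectral_centroid(power_bins, sample_rate, fft_size):
    """Power-weighted mean frequency (Hz) of a linear power spectrum."""
    total = sum(power_bins)
    if total == 0:
        return 0.0   # no energy: the centroid is undefined, return 0 by convention
    freqs = [k * sample_rate / fft_size for k in range(len(power_bins))]
    return sum(f * p for f, p in zip(freqs, power_bins)) / total


# Bins at 0, 1000, 2000, 3000, 4000 Hz with equal power at 1 kHz and 2 kHz.
centroid_hz = spectral_centroid([0.0, 1.0, 1.0, 0.0, 0.0],
                                sample_rate=8000, fft_size=8)
```

A brighter timbre moves power into higher bins and raises the centroid.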

Low-level Features (details)

• Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)
  – AudioSpectrumBasis:
    • a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum.
  – AudioSpectrumProjection:
    • low-dimensional features of a spectrum after projection upon a reduced-rank basis.
  – independent subspaces of a spectrum correlate strongly with different sound sources.
  – Provide more salience using less space.
• Used with the Sound Classification and Indexing Description Tools.
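
The relation between the basis and the projection can be written compactly. The symbols are assumptions of this sketch: X is a T x F matrix holding one normalized spectrum frame per row, and k is the number of basis functions kept. The optional step that makes the basis functions statistically independent is not shown.

```latex
X = U \Sigma V^{\mathsf{T}}, \qquad
V_k = \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix}, \qquad
Y = X V_k
```

Here the columns of V_k (the first k right singular vectors) play the role of the AudioSpectrumBasis, and the T x k matrix Y plays the role of the AudioSpectrumProjection: each frame is described by k numbers instead of F.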

High-level audio Description Tools (Ds and DSs)

• Exchange some generality for descriptive richness:
  – a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content

High-level audio Description Tools (details)

• Audio Signature Description Scheme
  – SpectralFlatness Ds
  – a unique content identifier for the purpose of robust automatic identification
  – e.g. audio fingerprinting

High-level audio Description Tools (details)

• Musical Instrument Timbre Description Tools
  – HarmonicInstrumentTimbre Ds:
    • LogAttackTime Descriptor
  – PercussiveInstrumentTimbre Ds:
    • SpectralCentroid Descriptor

High-level audio Description Tools (details)

• Melody Description Tools:
  – efficient, robust, and expressive melodic similarity matching.
  – MelodyContour Description Scheme:
    • terse, efficient melody contour / rhythm
  – MelodySequence Description Scheme:
    • verbose, complete, expressive melody / rhythm.
    • Interval encoding
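
A terse melody contour can be sketched by quantizing each interval between successive notes into a few levels. The five levels (-2 to 2) and the boundaries at 50 and 250 cents used below are assumptions of this sketch and should be checked against the standard. The function names are illustrative.

```python
import math


def interval_cents(f_prev, f_next):
    """Interval between two frequencies in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(f_next / f_prev)


def contour_step(cents):
    """Quantize an interval into five contour levels (assumed boundaries)."""
    if cents <= -250:
        return -2        # large step down
    if cents <= -50:
        return -1        # small step down
    if cents < 50:
        return 0         # roughly the same pitch
    if cents < 250:
        return 1         # small step up
    return 2             # large step up


def melody_contour(frequencies_hz):
    """One contour value per pair of successive notes."""
    return [contour_step(interval_cents(a, b))
            for a, b in zip(frequencies_hz, frequencies_hz[1:])]


notes = [261.63, 293.66, 329.63, 261.63, 261.63]   # C4 D4 E4 C4 C4
contour = melody_contour(notes)
```

The contour throws away exact intervals, which is what makes it terse and tolerant of imprecise queries. An interval encoding in the spirit of MelodySequence would keep the cents values instead.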

High-level audio Description Tools (details)

• General Sound Recognition and Indexing Description Tools:
  – SoundModel Description Scheme
  – SoundClassificationModel Description Scheme
    • a set of SoundModel DS -> multi-way classifier
  – SoundModelStatePath Descriptor
    • indices to states generated by a SoundModel of a segment
  – immediately applied to sound effects
  – automatically index and segment sound tracks.
  – Low -> mid -> high level analyses

High-level audio Description Tools (details)

• Spoken Content Description Tools:
  – detailed description of words spoken within an audio stream.
  – indexing into and retrieval of an audio stream
  – indexing of multimedia objects annotated with speech.
• Recall of audio/video data by memorable spoken events.
  – a character or person spoke a particular word
• Spoken Document Retrieval
  – separate spoken documents
• Annotated Media Retrieval
  – photograph retrieved using a spoken annotation

[Figure: Timbre Descriptor Estimation. Block diagram with the blocks Signal, Sliding Analysis Window, STFT, Power Spectrum, f0, Harmonic Peaks Detection, Signal envelope and z-1. Outputs: SpectralCentroid, LogAttackTime, Temporal Centroid, and the instantaneous HarmonicSpectralCentroid, HarmonicSpectralDeviation, HarmonicSpectralSpread and HarmonicSpectralVariation.]

MPEG-7 Audio Amendment 2

will include extended functionality of audio metadata that is complementary to the low-level audio descriptors in ISO/IEC 15938-4, providing high-level description tools like chord pattern and rhythm pattern, both of which support compact representation of timbre and rhythm.