MPEG-7 Audio Overview Beinan Li MUMT 611 Week 2 2005. 1. 20.


Page 1:

MPEG-7 Audio Overview

Beinan Li

MUMT 611 Week 2

2005. 1. 20

Page 2:

Content

MPEG-7 overview
  What is…
  Why?
  Objectives and scope
  Main elements and organization

MPEG-7 Audio
  Low-level features
  High-level tools

Page 3:

What is MPEG-7

"Multimedia Content Description Interface"

ISO/IEC standard by MPEG (Moving Picture Experts Group)

Providing metadata for multimedia
  MPEG-1, -2, -4: make content available
  MPEG-7: makes content accessible, retrievable, filterable, manageable (via device / computer)

Multiple degrees of interpretation of the information's meaning

Support as broad a range of applications as possible

A compatible (with existing technology) and extensible standard

Page 4:

Why MPEG-7

"The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed."

Past: poverty of digital multimedia sources -> simplicity of the access mechanisms

Now: growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. "record only news about sport."

Page 5:

Why MPEG-7

For future multimedia services, content representation and description may have to be addressed jointly.

Many services dealing with content representation will have to deal first with content description: "a non-described content may be useless"

Need for access only to the content description:
  New original services (e.g. optimizing personal time)
  Adaptation to networks and terminal capabilities

Page 6:

Application domains (incomplete)

Broadcast media selection (e.g. radio channel, TV channel)
Digital libraries (e.g. film, video, audio and radio archives)
E-commerce (e.g. personalized advertising)
Education (e.g. repositories of multimedia courses, multimedia search for support material)
Home entertainment (e.g. management of personal multimedia collections, including manipulation of content, e.g. karaoke)
Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face)
Multimedia directory services (e.g. yellow pages, GIS)
Surveillance and remote sensing

Page 7:

MPEG-7 Objectives

Standardize content-based description for various types of audiovisual information

Independent of media support (encoding and storage)

Different granularity
  Low-level features: shape, size, key, tempo changes
  High-level semantic information: "scene with a barking brown dog on the left and with the sound of passing cars in the background."

Meaningful in the context of the application
  Same material -> different types of features and combinations, e.g. timbre vs. loudness

Page 8:

MPEG-7 Objectives

Information about the content
  The form: e.g. the coding format used
  Conditions for accessing the material: e.g. intellectual property rights / price
  Classification: e.g. parental rating
  Links to other relevant materials
  The context: e.g. "Olympic Games 1996, final of 200 meter hurdles, men"

Information present in the content
  Combination of low-level and high-level descriptors

Page 9:

Scope of the Standard: processing chain

Page 10:

An example of architecture

Pull: (Client queries -> Descriptions repository -> Matched Ds)

Push: (Filter descriptions -> Programmed actions)
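
The pull and push paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Description` class, the keyword-style features and the function names are hypothetical stand-ins, not MPEG-7 structures or APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Description:
    """Toy stand-in for a description: a content id plus keyword features (hypothetical)."""
    content_id: str
    features: dict = field(default_factory=dict)

def pull(repository, query):
    """Pull: the client sends a query and receives the matching descriptions."""
    return [d for d in repository
            if all(d.features.get(k) == v for k, v in query.items())]

def push(stream, accept, action):
    """Push: descriptions arrive unrequested; a filter decides which trigger an action."""
    for d in stream:
        if accept(d):
            action(d)

repository = [
    Description("clip-1", {"genre": "news", "topic": "sport"}),
    Description("clip-2", {"genre": "news", "topic": "weather"}),
    Description("clip-3", {"genre": "film", "topic": "sport"}),
]

# Pull: "record only news about sport"
matches = pull(repository, {"genre": "news", "topic": "sport"})

# Push: every incoming description about sport triggers a programmed action
recorded = []
push(repository,
     lambda d: d.features.get("topic") == "sport",
     lambda d: recorded.append(d.content_id))
```

In both paths only the descriptions are inspected; the audiovisual content itself is never touched.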

Page 11:

Workplan

Page 12:

Where are the descriptions from?

Preservation of existing descriptive data (e.g. scripts) through the production/delivery

Generated automatically by capture devices (e.g. time or GPS location in a camera)

Extracted automatically & semi-automatically (i.e. with some human assistance)

Manually produced (e.g. for legacy material such as existing film archives)

Page 13:

Main Elements of MPEG-7

Description Tools: (textual / binary)
  Descriptors (D): define the syntax and the semantics of each feature (metadata element)
  Description Schemes (DS): relationships between components

Description Definition Language (DDL):
  Defines the syntax of the MPEG-7 Description Tools
  Creation, extension and modification of DSs

System tools:
  Storage and transmission, synchronization of descriptions with content, multiplexing of descriptions, etc.

Page 14:

Main Elements of MPEG-7

Relationship among the elements introduced above.

Page 15:

Description Tools

Creation and production processes: (director, title)
Usage: (broadcast schedule)
Storage features
Structural information: (spatial-temporal components)
  Segmentations
Low-level features: (sound timbres, melody description)
Conceptual information: (objects and events, interactions)
Navigation and access: (summaries, variations)
Collections of objects
User-content interactions: (user preferences, usage history)

Page 16:

Organization of Description Tools

Page 17:

Descriptions (further)

MPEG-7 approaches the description of content from several viewpoints.

A set of methods and tools for the different viewpoints of the description (not a monolithic system)

Interrelated and can be combined in many ways

Associated with the content itself: (searching, filtering)

Location: (document vs. stream)
  physically located with the material
  somewhere else on the globe (maybe not)

Interoperability with other metadata standards: (XML)

Page 18:

Use of Description Tools

The description tools are presented on the basis of the functionality they provide.

In practice, they are combined into meaningful sets of description units.

Furthermore, each application will have to select a subset of descriptors and DSs. Library of tools!

DDL can be used to handle specific needs of the application (like scripting in many current applications).

Page 19:

Major Functionalities

MPEG-7 Systems
MPEG-7 Description Definition Language
MPEG-7 Visual
MPEG-7 Audio
MPEG-7 Multimedia Description Schemes (D.T.)
Reference Software: the eXperimentation Model (test)
MPEG-7 Conformance (syntax checking)
MPEG-7 Extraction and use of descriptions (technical report)

Page 20:

MPEG-7 Audio

Audio provides structures, building upon some basic structures from the MDS (Multimedia Description Schemes), for describing audio content.

Low-level Descriptors: audio features that cut across many applications

High-level Description Tools: more specific to a set of applications

Page 21:

Low-level Features

"MPEG-7 Audio Framework": two low-level descriptor types (for sample and segment)
  Scalar: (e.g. power or fundamental frequency)
  Vector: (e.g. spectra)

Hierarchical, consistent interface
  Any descriptor inheriting from these types can be instantiated, describing a segment with a single summary value or a series of sampled values, as the application requires.

Scalable Series: (hierarchical re-sampling)
  Progressively down-sample the data contained in a series (application-oriented)
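
A rough sketch of the hierarchical re-sampling idea, assuming that each coarser level replaces groups of samples by their mean. The mean is only one possible summary, chosen here for illustration, and the function names are hypothetical.

```python
def downsample_mean(series, ratio):
    """Replace each group of `ratio` consecutive samples by its mean."""
    groups = [series[i:i + ratio] for i in range(0, len(series), ratio)]
    return [sum(g) / len(g) for g in groups]

def scalable_levels(series, ratio=2):
    """Return the series at progressively coarser resolutions, down to one summary value."""
    levels = [list(series)]
    while len(levels[-1]) > 1:
        levels.append(downsample_mean(levels[-1], ratio))
    return levels

levels = scalable_levels([1.0, 3.0, 2.0, 6.0, 4.0, 8.0, 5.0, 7.0])
# levels[0] is the full series; levels[-1] holds a single summary value
```

An application can then pick whichever level matches the resolution it needs, from the sampled series down to the single summary value.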

Page 22:

Low-level Features (types)

Basic
Basic Spectral
Signal Parameters
Timbral Temporal
Timbral Spectral
Spectral Basis
MPEG-7 Silence Descriptor

Page 23:

Low-level Features (graph)

Page 24:

Low-level Features (details)

Basic: (temporally sampled scalar values for general use)

AudioWaveform Descriptor
  waveform envelope (for display purposes)

AudioPower Descriptor
  temporally-smoothed instantaneous power (quick summary of a signal)

Applicable to all kinds of signals
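
As a minimal sketch of "temporally-smoothed instantaneous power", the squared samples can be averaged over successive frames. The non-overlapping framing and the frame length are assumptions made for this example, not the parameters fixed by the standard.

```python
import math

def audio_power(samples, frame_len):
    """Mean of the squared samples over successive non-overlapping frames."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

sr = 8000  # sampling rate in Hz (assumed)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
power = audio_power(tone, frame_len=800)  # one value per 0.1 s
```

For a sine of amplitude 0.5 every frame comes out close to 0.125 (half the squared amplitude), which is the kind of quick signal summary the slide refers to.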

Page 25:

Low-level Features (details)

Basic Spectral: (single time-frequency analysis of signal)

AudioSpectrumEnvelope: (base class)
  the short-term power spectrum (display, synthesis, general-purpose search)

AudioSpectrumCentroid:
  is the spectrum dominated by high or low frequencies?

AudioSpectrumSpread:
  is the power spectrum centered near the spectral centroid, or spread out over the spectrum? Distinguishes pure-tone and noise-like sounds.

AudioSpectrumFlatness: (the presence of tonal components)
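
The three summary measures can be sketched as follows. This is a simplified illustration: it assumes a log-frequency axis in octaves relative to 1 kHz and strictly positive power values, and it does not reproduce the standard's exact banding or normalization.

```python
import math

def spectrum_centroid(freqs_hz, power):
    """Power-weighted mean of log2(f / 1 kHz): a centre of gravity in octaves."""
    total = sum(power)
    return sum(math.log2(f / 1000.0) * p for f, p in zip(freqs_hz, power)) / total

def spectrum_spread(freqs_hz, power):
    """Power-weighted RMS deviation (in octaves) around the centroid."""
    c = spectrum_centroid(freqs_hz, power)
    total = sum(power)
    var = sum((math.log2(f / 1000.0) - c) ** 2 * p
              for f, p in zip(freqs_hz, power)) / total
    return math.sqrt(var)

def spectrum_flatness(power):
    """Geometric mean over arithmetic mean: near 1 for noise-like, near 0 for tonal."""
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return geometric / (sum(power) / len(power))

freqs_hz = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
noise_like = [1.0, 1.0, 1.0, 1.0, 1.0]       # equal power in every band
tonal = [1e-6, 1e-6, 1.0, 1e-6, 1e-6]        # almost all power at 1 kHz
```

The flat spectrum gives a flatness of 1 and a wide spread, while the single-peak spectrum gives a flatness near 0 and a spread near 0, matching the pure-tone versus noise-like distinction above.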

Page 26:

Low-level Features (details)

Signal Parameters: (periodic or quasi-periodic signals)

AudioFundamentalFrequency:
  carries a "confidence measure", since extraction by "pitch-tracking" is not perfectly accurate

AudioHarmonicity:
  distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum
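
One way to picture a fundamental frequency that comes with a confidence value is a normalized autocorrelation estimator. The sketch below is an assumption-laden illustration, not the extraction method of the standard; `estimate_f0`, the search range and the tie-breaking margin are all hypothetical choices.

```python
import math
import random

def estimate_f0(samples, sr, fmin=50.0, fmax=1000.0):
    """Return (f0_hz, confidence) from the best normalized autocorrelation lag."""
    best_lag, best_score = 0, 0.0
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        head = samples[:len(samples) - lag]   # x[n]
        tail = samples[lag:]                  # x[n + lag]
        norm = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        if norm == 0.0:
            continue
        score = sum(x * y for x, y in zip(head, tail)) / norm
        # The small margin keeps the shortest period when its multiples tie.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    if best_lag == 0:
        return 0.0, 0.0  # no periodicity found
    return sr / best_lag, best_score

sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]

f0_tone, conf_tone = estimate_f0(tone, sr)
f0_noise, conf_noise = estimate_f0(noise, sr)
```

The periodic tone yields a confidence close to 1, whereas the noise yields a low one, so a consumer of the description can tell how far to trust the reported frequency.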

Page 27:

Low-level Features (details)

Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)

LogAttackTime

TemporalCentroid
  where in time the energy of a signal is focused. Useful when attack times are identical.
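
A small sketch of both descriptors computed from an amplitude envelope. The 2 % and 90 % attack thresholds and the toy envelopes are assumptions for illustration; the standard's exact thresholds are not reproduced here.

```python
import math

def log_attack_time(envelope, sr, lo=0.02, hi=0.9):
    """log10 of the time taken to rise from lo*peak to hi*peak of the envelope."""
    peak = max(envelope)
    start = next(i for i, v in enumerate(envelope) if v >= lo * peak)
    stop = next(i for i, v in enumerate(envelope) if v >= hi * peak)
    return math.log10(max(stop - start, 1) / sr)

def temporal_centroid(envelope, sr):
    """Envelope-weighted mean time in seconds."""
    return sum(i / sr * v for i, v in enumerate(envelope)) / sum(envelope)

sr = 100  # envelope sampling rate in Hz (assumed)
attack = [i / 10 for i in range(11)]                       # identical attack for both sounds
plucked = attack + [1.0 - k / 10 for k in range(1, 11)]    # decays right after the peak
sustained = attack + [1.0] * 50                            # holds its level after the peak
```

Both envelopes share the same LogAttackTime, but the sustained one has a later TemporalCentroid, which is why the centroid helps when attack times are identical.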

Page 28:

Low-level Features (details)

Timbral Spectral: (spectral features in a linear-frequency space)

SpectralCentroid:
  power-weighted average of the frequency of the bins in the linear power spectrum. Distinguishes musical instrument timbres.

4 Ds for harmonic, regularly spaced components of signals:
  HarmonicSpectralCentroid
  HarmonicSpectralDeviation
  HarmonicSpectralSpread
  HarmonicSpectralVariation
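
The SpectralCentroid definition translates almost directly into code. The bin frequencies and power values below are made-up numbers for illustration only.

```python
def spectral_centroid(freqs_hz, power):
    """Power-weighted average of bin frequencies on a linear frequency axis."""
    return sum(f * p for f, p in zip(freqs_hz, power)) / sum(power)

freqs_hz = [100.0, 200.0, 300.0, 400.0]
dull = spectral_centroid(freqs_hz, [3.0, 1.0, 0.0, 0.0])    # power in the low bins
bright = spectral_centroid(freqs_hz, [0.0, 0.0, 1.0, 3.0])  # power in the high bins
```

A "brighter" timbre moves the centroid upward (here from 125 Hz to 375 Hz), which is what makes the value useful for telling instrument timbres apart.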

Page 29:

Low-level Features (details)

Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)

AudioSpectrumBasis:
  a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum

AudioSpectrumProjection:
  low-dimensional features of a spectrum after projection upon a reduced-rank basis

Independent subspaces of a spectrum correlate strongly with different sound sources.

Provide more salience using less space. Used with the Sound Classification and Indexing Description Tools.

Page 30:

Low-level Features (details)

Silence segment: (no significant sound)
  aids further segmentation of the audio stream, or serves as a hint not to process a segment
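
A minimal sketch of how silence segments might be located, assuming a simple frame-power threshold. The threshold, frame length and function name are hypothetical; the descriptor itself only marks a segment as silent and does not prescribe a detection method.

```python
def silent_segments(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose frame power stays below `threshold`."""
    segments, start = [], None
    n_frames = len(samples) // frame_len
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power < threshold:
            if start is None:
                start = k * frame_len          # a silent run begins
        elif start is not None:
            segments.append((start, k * frame_len))  # the run ends at this loud frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

signal = [0.5] * 40 + [0.0] * 60 + [0.5] * 20 + [0.001] * 30
segments = silent_segments(signal, frame_len=10, threshold=1e-4)
```

The returned ranges can be used as segment boundaries, or simply skipped by later, more expensive analysis.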

Page 31:

High-level audio Description Tools (Ds and DSs)

Exchange some generality for descriptive richness:
  a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge

Audio Signature (DS)
Musical Instrument Timbre
Melody
General Sound Recognition and Indexing
Spoken Content

Page 32:

High-level audio Description Tools (details)

Audio Signature Description Scheme
  SpectralFlatness Ds
  a unique content identifier for the purpose of robust automatic identification, e.g. audio fingerprinting

Page 33:

High-level audio Description Tools (details)

Musical Instrument Timbre Description Tools

HarmonicInstrumentTimbre Ds:
  LogAttackTime Descriptor

PercussiveInstrumentTimbre Ds:
  SpectralCentroid Descriptor

Page 34:

High-level audio Description Tools (details)

Melody Description Tools: efficient, robust, and expressive melodic similarity matching

MelodyContour Description Scheme:
  terse, efficient melody contour / rhythm

MelodySequence Description Scheme:
  verbose, complete, expressive melody / rhythm
  Interval encoding
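
The contrast between a terse contour and a fuller interval encoding can be sketched as below. The five-level quantization and its thresholds in cents are assumptions made for this example, and pitches are given as MIDI note numbers purely for convenience.

```python
def contour_step(cents):
    """Quantize a pitch interval in cents to five levels (thresholds assumed)."""
    if cents <= -250:
        return -2   # large step down
    if cents <= -50:
        return -1   # small step down
    if cents < 50:
        return 0    # repeated note
    if cents < 250:
        return 1    # small step up
    return 2        # large step up

def melody_contour(midi_pitches):
    """Terse encoding: one quantized step per pair of successive notes."""
    return [contour_step(100 * (b - a)) for a, b in zip(midi_pitches, midi_pitches[1:])]

def interval_sequence(midi_pitches):
    """Fuller encoding: the exact interval in semitones between successive notes."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

tune = [60, 60, 67, 67, 69, 69, 67]      # C C G G A A G
transposed = [p + 5 for p in tune]       # the same tune a fourth higher
```

Because both encodings store intervals rather than absolute pitches, the transposed tune yields the same contour and the same interval sequence, which is what makes them suitable for similarity matching.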

Page 35:

High-level audio Description Tools (details)

General Sound Recognition and Indexing Description Tools:

SoundModel Description Scheme

SoundClassificationModel Description Scheme
  a set of SoundModel DSs -> multi-way classifier

SoundModelStatePath Descriptor
  indices to states generated by a SoundModel of a segment

Immediately applicable to sound effects

Automatically index and segment sound tracks

Low -> mid -> high level analyses

Page 36:

High-level audio Description Tools (details)

Spoken Content Description Tools: detailed description of words spoken within an audio stream
  indexing into and retrieval of an audio stream
  indexing of multimedia objects annotated with speech

Recall of audio/video data by memorable spoken events
  a character or person spoke a particular word

Spoken Document Retrieval
  separate spoken documents

Annotated Media Retrieval
  photograph retrieved using a spoken annotation
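
The retrieval scenarios above can be pictured with a toy word index. This is a deliberately simplified sketch: the tuples of (document, time, word) and the `build_index` helper are hypothetical, and a real spoken-content description would carry recognizer output in a richer form than a flat word list.

```python
from collections import defaultdict

def build_index(recognized):
    """Map each word to the (document, time in seconds) positions where it was spoken."""
    index = defaultdict(list)
    for doc_id, start_sec, word in recognized:
        index[word.lower()].append((doc_id, start_sec))
    return index

recognized = [
    ("speech-1", 0.4, "Good"),
    ("speech-1", 0.9, "evening"),
    ("speech-2", 3.2, "Olympic"),
    ("speech-2", 3.8, "final"),
    ("photo-7", 0.0, "Olympic"),   # a photograph with a spoken annotation
]

index = build_index(recognized)
hits = index["olympic"]
```

A single lookup returns both a position inside a spoken document and an annotated photograph, covering the "Spoken Document Retrieval" and "Annotated Media Retrieval" cases with the same index.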

Page 37:

Development

Currently under development:
  MPEG-7 Audio COR.1 (currently at DCOR1)
  MPEG-7 Amendment 1 (currently at FPDAM1)

New Audio Description Tools specified (MPEG-7 version 2):
  Spoken Content
  Audio Signal Quality
  Audio Tempo

Currently proposed tools:
  Low Level Descriptor for Audio Intensity
  Low Level Descriptor for Audio Spectrum Envelope Evolution
  Generic mechanism for data representation based on 'modulation decomposition'
  MPEG-7 Audio-specific binary representation of descriptors

Page 38:

MPEG-7 version 1 Schedule

Call for Proposals: October 1998
Evaluation: February 1999
First version of Working Draft (WD): December 1999
Committee Draft (CD): October 2000
Final Committee Draft (FCD): February 2001
Final Draft International Standard (FDIS): July 2001
International Standard (IS): September 2001

Page 39:

MPEG-7 work plan:

See: Annex A of MPEG-7 Overview (version 9)

http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm

Page 40:

Annotated Link Page / References

http://www.music.mcgill.ca/~damonli/611/611_w2.htm

All pictures taken from: P. Salembier and O. Avaro, "MPEG-7: Multimedia Content Description Interface",

http://gps-tsc.upc.es/imatge/_Philippe/demo/MPEG21_MPEG7.pdf