MPEG-7 - Electronic ee502/MPEG-7.pdf¢  MPEG-7 Audio Amendment 2 will include extended...


  • MPEG-7

    • MPEG-7 overview – What is… – Why? – Objectives and scope – Main elements and organization.

    • MPEG-7 Audio – Low-level features – High-level tools

  • What is MPEG-7?

    • “Multimedia Content Description Interface”

    • ISO/IEC standard by MPEG (Moving Picture Experts Group)

    • Provides metadata for multimedia

    • MPEG-1, -2, -4: make content available; MPEG-7: makes content accessible, retrievable, filterable, manageable (via device / computer).

    • Multiple degrees of interpretation of the information’s meaning

    • Supports as broad a range of applications as possible.

    • A standard that is extensible and compatible with existing technology.

  • Why MPEG-7?

    • “The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.”

    • Past: poverty of the digital multimedia sources -> Simplicity of the access mechanisms

    • Now: growing amount of audiovisual information -> Identifying and managing it efficiently is becoming more difficult, e.g. a request such as “record only news about sport.”

  • Why MPEG-7?

    • For future multimedia services, content representation and description may have to be addressed jointly.

    • Many services dealing with content representation will have to deal first with content description
      – “a non-described content may be useless”

    • Need for access only to the content description:
      – New original services (e.g. optimizing personal time)
      – Adaptation to networks and terminal capabilities

  • Application domains

    • Broadcast media selection (e.g., radio channel, TV channel).

    • Digital libraries (e.g., film, video, audio and radio archives).

    • E-Commerce (e.g., personalized advertising).

    • Education (e.g., repositories of multimedia courses, multimedia search for support material).

    • Home Entertainment (e.g., management of personal multimedia collections, including manipulation of content, e.g. karaoke).

    • Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face).

    • Multimedia directory services (e.g. yellow pages, G.I.S).

    • Surveillance and remote sensing.

  • MPEG-7 Objectives

    • Standardize content-based description for various types of audiovisual information

    • Independent from media support (encoding and storage)

    • Different granularity
      – Low-level features: shape, size, key, tempo changes
      – High-level semantic info: “scene with a barking brown dog on the left and with the sound of passing cars in the background.”

    • Meaningful in the context of the application
      – Same material -> different types of features and combinations, e.g. timbre vs. loudness

  • MPEG-7 Objectives

    • Information about the content
      – The form: e.g. the coding format used
      – Conditions for accessing the material: e.g. intellectual property rights / price
      – Classification: e.g. parental rating
      – Links to other relevant materials
      – The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”

    • Information present in the content:
      – Combination of low-level and high-level descriptors

  • Scope of the Standard

    processing chain:

  • An example of architecture

    • Pull: client queries -> descriptions repository -> matched Ds

    • Push: filter descriptions -> programmed actions
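
The two access modes above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the `Description` record, its fields, and the `pull` / `push` helpers are illustrative names, not part of MPEG-7 or any library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Description:
    """Hypothetical stand-in for a description (not the real MPEG-7 schema)."""
    content_id: str
    genre: str


def pull(repository: Iterable[Description],
         query: Callable[[Description], bool]) -> List[Description]:
    """Pull: a client query runs against a repository and gets the matches back."""
    return [d for d in repository if query(d)]


def push(stream: Iterable[Description],
         wanted: Callable[[Description], bool],
         action: Callable[[Description], None]) -> None:
    """Push: incoming descriptions are filtered; an action fires on each match."""
    for d in stream:
        if wanted(d):
            action(d)


descriptions = [
    Description("clip-1", "sport news"),
    Description("clip-2", "weather"),
    Description("clip-3", "sport news"),
]


def is_sport(d: Description) -> bool:
    return d.genre == "sport news"


# Pull: ask the repository for everything that matches.
matched = pull(descriptions, is_sport)

# Push: "record only news about sport" as a programmed action.
recorded: List[str] = []
push(descriptions, is_sport, lambda d: recorded.append(d.content_id))
```

In both modes only the descriptions are inspected; the content itself is never opened.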

  • Where are the descriptions from?

    • Preservation of existing descriptive data (e.g. scripts) through production/delivery

    • Generated automatically by capture devices (e.g. time or GPS location in a camera)

    • Extracted automatically and semi-automatically (i.e. with some human assistance)

    • Manually produced (e.g. for legacy material such as existing film archives)

  • Main Elements of MPEG-7 • Relationship among elements introduced above.

  • Descriptions

    • MPEG-7 approaches the description of content from several viewpoints.

    • A set of methods and tools for the different viewpoints of the description (not a monolithic system)

    • Interrelated and can be combined in many ways.

    • Associated with the content itself (searching, filtering)

    • Location (document vs. stream):
      – physically located with the material
      – somewhere else on the globe (maybe not)

    • Interoperability with other metadata standards: (XML)

  • Major Functionalities

    • MPEG-7 Systems

    • MPEG-7 Description Definition Language

    • MPEG-7 Visual

    • MPEG-7 Audio

    • MPEG-7 Multimedia Description Schemes

    • Reference Software: the eXperimentation Model (test)

    • MPEG-7 Conformance (syntax checking)

    • MPEG-7 Extraction and use of descriptions (technical


  • MPEG-7 Audio

    • Audio provides structures, building upon some basic structures from the MDS, for describing audio content.

    • Low-level Descriptors:
      – audio features that cut across many applications

    • High-level Description Tools:
      – more specific to a set of applications.

  • Low-level Features

  • Low-level Features (details)

    • Basic: (temporally sampled scalar values for general use)
      – AudioWaveform Descriptor
        • waveform envelope (for display purposes).
      – AudioPower Descriptor
        • temporally-smoothed instantaneous power (quick summary of a signal)

    • Silence segment: (no significant sound)
      – aid further segmentation of the audio stream, or as a hint not to process a segment
      – Applicable to all kinds of signals
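
The Basic descriptors above can be illustrated with a minimal NumPy sketch. It is not the normative extraction: the non-overlapping 10 ms frames, the helper names, and the silence threshold are all assumptions made for the example.

```python
import numpy as np


def frame_signal(x: np.ndarray, hop: int) -> np.ndarray:
    """Split x into non-overlapping frames of `hop` samples (the tail is dropped)."""
    n_frames = len(x) // hop
    return x[: n_frames * hop].reshape(n_frames, hop)


def audio_waveform(x: np.ndarray, hop: int):
    """Per-frame (min, max) pairs: enough to draw a waveform envelope."""
    frames = frame_signal(x, hop)
    return frames.min(axis=1), frames.max(axis=1)


def audio_power(x: np.ndarray, hop: int) -> np.ndarray:
    """Mean of the squared samples in each frame (smoothed instantaneous power)."""
    frames = frame_signal(x, hop)
    return np.mean(frames ** 2, axis=1)


def silent_frames(power: np.ndarray, threshold: float = 1e-4) -> np.ndarray:
    """Flag frames whose power is below a threshold (the value is an assumption)."""
    return power < threshold


sr = 8000
hop = 80  # 10 ms per frame at 8 kHz
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 400 * t)   # 1 s of a 400 Hz tone
x = np.concatenate([tone, np.zeros(sr)])   # followed by 1 s of silence

env_min, env_max = audio_waveform(x, hop)
power = audio_power(x, hop)
silent = silent_frames(power)
```

The first 100 frames carry the tone and the last 100 are flagged silent, which is the kind of hint a later stage could use to skip a segment.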

  • Low-level Features (details)

    • Basic Spectral: (single time-frequency analysis of signal)
      – AudioSpectrumEnvelope: (Base class)
        • the short-term power spectrum (display, synthesize, general-purpose search)
      – AudioSpectrumCentroid:
        • dominated by high or low frequencies?
      – AudioSpectrumSpread:
        • is the power spectrum centered near the spectral centroid, or spread out over the spectrum?
        • distinguishes pure-tone and noise-like sounds
      – AudioSpectrumFlatness: (the presence of tonal components)
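
As a rough illustration of what the Basic Spectral descriptors measure, the sketch below computes a centroid, a spread, and a flatness value for a single frame. It is a simplification and not the normative extraction: the Hann window, the octave scale around 1 kHz, the whole-spectrum flatness, and all function names are assumptions of this example.

```python
import numpy as np


def power_spectrum(frame: np.ndarray, sr: int):
    """Power spectrum of one Hann-windowed frame, with bin frequencies in Hz."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return freqs, power


def centroid_and_spread(freqs: np.ndarray, power: np.ndarray, f_ref: float = 1000.0):
    """Power-weighted mean and RMS deviation on an octave scale around f_ref."""
    keep = freqs > 0  # log2 is undefined at DC
    octaves = np.log2(freqs[keep] / f_ref)
    p = power[keep]
    centroid = np.sum(octaves * p) / np.sum(p)
    spread = np.sqrt(np.sum((octaves - centroid) ** 2 * p) / np.sum(p))
    return float(centroid), float(spread)


def flatness(power: np.ndarray, eps: float = 1e-12) -> float:
    """Geometric mean over arithmetic mean: high for noise, near 0 for a pure tone."""
    p = power + eps
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))


sr, n = 16000, 1024
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 2000 * t)
noise = np.random.default_rng(0).standard_normal(n)

freqs, p_tone = power_spectrum(tone, sr)
_, p_noise = power_spectrum(noise, sr)
tone_centroid, tone_spread = centroid_and_spread(freqs, p_tone)
noise_centroid, noise_spread = centroid_and_spread(freqs, p_noise)
tone_flat, noise_flat = flatness(p_tone), flatness(p_noise)
```

The 2 kHz tone sits about one octave above the 1 kHz reference with a small spread and very low flatness, while the noise frame has a much larger spread and a much higher flatness.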

  • Low-level Features (details)

    • Signal Parameters: (periodic or quasi-periodic signals)
      – AudioFundamentalFrequency:
        • carries a “confidence measure”, since the extraction methods commonly called “pitch-tracking” are not perfectly accurate
      – AudioHarmonicity:
        • distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum
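
To make the idea of a fundamental frequency paired with a confidence value concrete, here is a minimal sketch based on the normalized autocorrelation. This is a generic textbook technique swapped in for illustration, not the extraction method of the standard; the search range and the use of the autocorrelation peak height as confidence are assumptions.

```python
import numpy as np


def estimate_f0(frame: np.ndarray, sr: int, f_min: float = 60.0, f_max: float = 1000.0):
    """Return (f0 in Hz, confidence in [0, 1]) from the normalized autocorrelation."""
    frame = frame - np.mean(frame)
    n = len(frame)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[n - 1:]
    if acf[0] <= 0:
        return 0.0, 0.0  # silent frame: no pitch, no confidence
    acf = acf / acf[0]
    lag_min = int(sr / f_max)
    lag_max = min(int(sr / f_min), n - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    # Peak height near 1 means strongly periodic; near 0 means noise-like.
    confidence = float(np.clip(acf[lag], 0.0, 1.0))
    return sr / lag, confidence


sr, n = 8000, 1000
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 200 * t)
noise = np.random.default_rng(0).standard_normal(n)

f0_tone, conf_tone = estimate_f0(tone, sr)
f0_noise, conf_noise = estimate_f0(noise, sr)
```

A periodic frame yields a high confidence, while a noise frame still returns some lag but with a low confidence, which is the reason for reporting the two values together.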

  • Low-level Features (details)

    • Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)
      – LogAttackTime
      – TemporalCentroid
        • where in time the energy of a signal is focused.
        • Useful when attack times are identical

    [Figure: illustration of log-attack time. Signal envelope(t) plotted against time t, with T0 and T1 marked.]
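
The two Timbral Temporal descriptors can be sketched as follows, taking LogAttackTime as log10(T1 - T0) and TemporalCentroid as the envelope-weighted mean time. How T0 and T1 are located is an assumption here: the sketch uses the points where the envelope first reaches 2% and 90% of its peak, which are illustrative thresholds only.

```python
import numpy as np


def log_attack_time(env: np.ndarray, sr: int,
                    start_frac: float = 0.02, stop_frac: float = 0.9) -> float:
    """log10 of the time between attack start (T0) and attack end (T1).

    T0 and T1 are the first samples where the envelope reaches the given
    fractions of its peak; both fractions are illustrative assumptions.
    """
    peak = env.max()
    t0 = int(np.argmax(env >= start_frac * peak))
    t1 = int(np.argmax(env >= stop_frac * peak))
    duration = max(t1 - t0, 1) / sr  # at least one sample, avoids log10(0)
    return float(np.log10(duration))


def temporal_centroid(env: np.ndarray, sr: int) -> float:
    """Envelope-weighted mean time, in seconds."""
    t = np.arange(len(env)) / sr
    return float(np.sum(t * env) / np.sum(env))


sr = 1000
attack = np.arange(100) / 100.0  # identical 0.1 s linear attack for both sounds
short = np.concatenate([attack, np.linspace(1.0, 0.0, 200)])  # fast decay
long_ = np.concatenate([attack, np.linspace(1.0, 0.0, 800)])  # slow decay

lat_short, lat_long = log_attack_time(short, sr), log_attack_time(long_, sr)
tc_short, tc_long = temporal_centroid(short, sr), temporal_centroid(long_, sr)
```

Both envelopes share the same attack, so their log-attack times are equal; the temporal centroid is what tells the short and the long decay apart.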

  • Low-level Features (details)

    • Timbral Spectral: (spectral features in a linear-frequency space)
      – SpectralCentroid:
        • power-weighted average of the frequency of the bins in the linear power spectrum.
        • distinguishing musical instrument timbres
      – 4 Ds for harmonic regularly-spaced components of signals:
        • HarmonicSpectralCentroid
        • HarmonicSpectralDeviation
        • HarmonicSpectralSpread
        • HarmonicSpectralVariation
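
The SpectralCentroid definition above (a power-weighted average of bin frequencies on a linear scale) translates almost directly into code. The sketch below is a single-frame illustration; the Hann window, the frame length, and the test tones are assumptions of the example.

```python
import numpy as np


def spectral_centroid(frame: np.ndarray, sr: int) -> float:
    """Power-weighted average of the bin frequencies (Hz) of the linear power spectrum."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * power) / np.sum(power))


sr, n = 16000, 1024
t = np.arange(n) / sr

# Same 500 Hz fundamental, different harmonic content.
dull = np.sin(2 * np.pi * 500 * t)
bright = dull + np.sin(2 * np.pi * 1500 * t) + np.sin(2 * np.pi * 2500 * t)

centroid_dull = spectral_centroid(dull, sr)
centroid_bright = spectral_centroid(bright, sr)
```

The two tones have the same pitch, but the one with strong upper partials has a centroid near 1500 Hz instead of 500 Hz, which is why the feature helps separate instrument timbres.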

  • Low-level Features (details)

    • Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)
      – AudioSpectrumBasis:
        • a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum.
      – AudioSpectrumProjection:
        • low-dimensional features of a spectrum after projection upon a reduced rank
      – independent subspaces of a spectrum correlate strongly with different sound sources.
      – Provide more salience using less space.

    • Used with the Sound Classification and Indexing Description Tools.
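
The basis and projection pair can be sketched with a plain singular value decomposition. This is a minimal illustration under stated assumptions: the dB conversion, the per-frame unit-norm scaling, and the function name are choices made for the example, and any further step toward statistically independent basis functions is left out.

```python
import numpy as np


def spectrum_basis(spectrogram: np.ndarray, rank: int):
    """Reduced-rank basis and projection of a (frames x bins) power spectrogram.

    Each frame is converted to dB and scaled to unit L2 norm before the SVD;
    this normalization is a simplification chosen for the sketch.
    """
    log_spec = 10.0 * np.log10(spectrogram + 1e-12)
    norms = np.linalg.norm(log_spec, axis=1, keepdims=True)
    normalized = log_spec / np.maximum(norms, 1e-12)
    # Rows of vt are orthonormal basis functions over the frequency bins.
    _, _, vt = np.linalg.svd(normalized, full_matrices=False)
    basis = vt[:rank].T              # (bins x rank)
    projection = normalized @ basis  # (frames x rank)
    return basis, projection


rng = np.random.default_rng(0)
spectrogram = rng.random((50, 64)) + 0.1  # stand-in data: 50 frames, 64 bins
basis, projection = spectrum_basis(spectrogram, rank=5)
```

Each frame shrinks from 64 spectral values to 5 projection coefficients, which is the "more salience using less space" trade-off in miniature.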

  • High-level audio Description Tools (Ds and DSs)

    • Exchange some generality for descriptive richness:
      – a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge.

    • Audio Signature (DS)

    • Musical Instrument Timbre

    • Melody

    • General Sound Recognition and Indexing

    • Spoken Content

  • High-level audio Description Tools (details)

    • Audio Signature Description Scheme
      – SpectralFlatness Ds
      – a unique content identifier for the purpose of robust automatic identification
      – e.g. audio fingerprinting

  • High-level audio Description Tools (details)

    • Musical Instrument Timbre Description Tools
      – HarmonicInstrumentTimbre Ds:
        • LogAttackTime Descriptor
      – PercussiveInstrumentTimbre Ds:
        • SpectralCentroid Descriptor

  • High-level audio Description Tools (details)

    • Melody Description Tools:
      – efficient, robust, and expressive melodic similarity matching.
      – MelodyContour Description Scheme:
        • terse, efficient melody contour / rhythm
      – MelodySequence Description Scheme:
        • verbose, complete, expressive melody / rhythm.
        • Interval encoding
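
A terse contour representation can be illustrated by quantizing the interval between adjacent notes into a few levels. The sketch below assumes a five-level contour (-2 to +2) with thresholds at 50 and 250 cents; the level count, the thresholds, and the function names are assumptions for illustration, not values taken from the slides.

```python
import math
from typing import List


def interval_cents(f_prev: float, f_next: float) -> float:
    """Signed interval between two note frequencies, in cents (100 per semitone)."""
    return 1200.0 * math.log2(f_next / f_prev)


def contour_step(cents: float, small: float = 50.0, large: float = 250.0) -> int:
    """Quantize an interval into one of five contour values from -2 to +2.

    The two thresholds are illustrative assumptions.
    """
    magnitude = abs(cents)
    if magnitude < small:
        level = 0
    elif magnitude < large:
        level = 1
    else:
        level = 2
    return level if cents >= 0 else -level


def melody_contour(note_freqs: List[float]) -> List[int]:
    """Contour of a note sequence: one value per pair of adjacent notes."""
    return [contour_step(interval_cents(a, b))
            for a, b in zip(note_freqs, note_freqs[1:])]


# C4, D4, G4, G4, E4 in equal temperament (A4 = 440 Hz)
notes = [261.63, 293.66, 392.00, 392.00, 329.63]
contour = melody_contour(notes)  # [1, 2, 0, -2]
```

The contour keeps only direction and rough size of each step, so it is compact and tolerant of imprecise input; a verbose representation would instead keep the exact intervals.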

  • High-level audio Description Tools (details)