The MPEG-7 Multimedia Content Description Interface
June 6, 2006
The MPEG-7
Multimedia Content Description Interface
Anastasia Bolovinou,
PhD candidate, Institute of Informatics and Telecommunications
NCSR Demokritos
2
Outline
• MPEG-7 motivation and scope
• Visual Descriptors (color, texture, shape)
• MPEG-7 retrieval evaluation criterion
• Similarity measures and MPEG-7 visual descriptors
• Building MPEG-7 Descriptors and Description Schemes with the Description Definition Language
• MPEG-7 VXM current state
• Towards an MPEG-7 Query Format Framework (queries and the visual descriptor tools employed by the queries)
• Summary
3
Proliferation of audio-visual content
MPEG-7 motivation and design scenarios (possible queries)
• Music/audio: play a few notes and return music with similar music/audio
• Images/graphics: draw a sketch and return images with similar graphics
• Text/keywords: find AV material with subject corresponding to a keyword
• Movement: describe movements and return video clips with the specified temporal and spatial relations
• Scenario: describe actions and return scenarios where similar actions take place
Standardize multimedia metadata descriptions (to facilitate content-based multimedia retrieval) for various types of audiovisual information:
Consumer content
news
sports
Scientific content
Digital art galleries
Recorded material
4
Scope of the Standard
Description Production (extraction) -> Standard Description (normative part of the MPEG-7 standard) -> Description Consumption
* MPEG-7 does not specify (non-normative parts of MPEG-7):
- How to extract descriptions (feature extraction, indexing process, annotation & authoring tools, ...)
- How to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...)
- The similarity between contents
-> The goal is to define the minimum that enables interoperability.
5
Information flow
6
Visual Descriptors
• Color Descriptors: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color
• Texture Descriptors: Homogeneous Texture, Texture Browsing, Edge Histogram
• Shape Descriptors: Region Shape, Contour Shape, 3D Shape
• Motion Descriptors for Video: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity
• Localization: Region Locator, Spatio-Temporal Locator
• Other: Face Recognition
(Normative, basic, for localization)
7
Color Descriptors
• Color Space: R, G, B; Y, Cr, Cb; H, S, V; Monochrome; Linear transformation of R, G, B; HMMD
• Constrained color spaces:
-> Scalable Color Descriptor uses HSV
-> Color Structure Descriptor uses HMMD
• Descriptors:
– Dominant Color
– Scalable Color: HSV space
– Color Structure: HMMD space
– Color Layout: YCbCr space
– GroupOfFrames/Pictures
8
Scalable Color Descriptor (SCD)
• A color histogram in HSV color space
• Encoded by Haar transform
• Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]}
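The Haar encoding of a histogram can be illustrated with a minimal sketch. It assumes an unnormalized integer Haar decomposition (pairwise sums and differences) over a power-of-two number of bins; the function name `haar_1d` and the toy histogram are illustrative, and the quantization steps of the actual Scalable Color Descriptor are left out.

```python
def haar_1d(values):
    """Full 1-D Haar decomposition using pairwise sums and differences."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = list(values)
    length = n
    while length > 1:
        half = length // 2
        sums = [coeffs[2 * i] + coeffs[2 * i + 1] for i in range(half)]
        diffs = [coeffs[2 * i] - coeffs[2 * i + 1] for i in range(half)]
        # Low-pass part goes first and is decomposed again in the next round.
        coeffs[:length] = sums + diffs
        length = half
    return coeffs


hist = [4, 2, 5, 5, 1, 0, 3, 3]   # toy 8-bin histogram (assumed data)
coeffs = haar_1d(hist)
print(coeffs)
```

Keeping only the first few coefficients gives a coarser histogram, which is what makes the representation scalable.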
9
SCD extraction
[Figure: SCD extraction pipeline. Stage labels: to 4 bits/bin, to 11 bits/bin, N bits/bin (#bin < 256)]
10
GoF/GoP Color Descriptor
• Extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in the SCD: Haar transform encoding)
• Extraction: histogram aggregation methods
– Average: sensitive to outliers (lighting changes, occlusion, text overlays)
– Median: increased computational complexity due to sorting
– Intersection: differs in that it gives a “least common” color trait viewpoint
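The three aggregation methods can be compared with a small sketch. It assumes each frame is already reduced to a histogram with the same number of bins; the function name `aggregate` and the sample frames are illustrative.

```python
from statistics import median


def aggregate(histograms, method="average"):
    """Combine per-frame histograms bin by bin."""
    bins = list(zip(*histograms))          # one tuple of values per bin
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [median(b) for b in bins]
    if method == "intersection":
        return [min(b) for b in bins]      # what every frame has in common
    raise ValueError(f"unknown method: {method}")


# Two similar frames and one outlier frame (assumed data).
frames = [[8, 2, 0], [6, 4, 0], [0, 2, 8]]
for name in ("average", "median", "intersection"):
    print(name, aggregate(frames, name))
```

The outlier frame shifts the average, barely affects the median, and drives the intersection towards the colors shared by all frames.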
12
Dominant Color Descriptor (DCD)
• Clustering colors into a small number of representative colors (salient colors)
• F = { {ci, pi, vi}, s}
• ci : Representative colors
• pi : Their percentages in the region
• vi : Color variances
• s : Spatial coherency
13
DCD Extraction (based on the generalized Lloyd algorithm)
• ci: centroid of cluster i
• x(n): color vector at pixel n
• v(n): perceptual weight for pixel n
• Human visual perception (H.V.P.) is more sensitive to changes in smooth regions
• Spatial coherency: average number of connecting pixels of a dominant color, using a 3x3 masking window
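A weighted Lloyd iteration over the symbols listed above can be sketched as follows. It assumes the usual weighted squared-error distortion, sum of v(n)·||x(n) − ci||² per cluster, which the slide does not spell out; the function name, the fixed iteration count and the toy pixels are illustrative, and the cluster splitting and merging of a full extractor are omitted.

```python
def weighted_lloyd(pixels, weights, centroids, iterations=10):
    """Alternate nearest-centroid assignment and weighted centroid update."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max(1, iterations)):
        clusters = [[] for _ in centroids]
        for x, v in zip(pixels, weights):
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
            )
            clusters[nearest].append((x, v))
        for i, members in enumerate(clusters):
            total = sum(v for _, v in members)
            if total > 0:  # leave an empty cluster where it is
                centroids[i] = tuple(
                    sum(v * x[d] for x, v in members) / total
                    for d in range(len(centroids[i]))
                )
    percentages = [len(m) / len(pixels) for m in clusters]
    return centroids, percentages


pixels = [(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]
weights = [1.0, 1.0, 1.0, 1.0]           # v(n); uniform here for simplicity
colors, shares = weighted_lloyd(pixels, weights, [(255, 0, 0), (0, 0, 255)])
print(colors, shares)
```

Pixels with a larger weight v(n) pull their centroid more strongly, which is how perceptual sensitivity enters the clustering.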
14
• http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/Dominant/index.html
15
Color Layout Descriptor (CLD)
• Partitioning the image into 64 (8x8) blocks
• Deriving the average color of each block (or using DCD)
• Applying an (8x8) DCT and encoding
• Efficient for
– Sketch-based image retrieval
– Content filtering using image indexing
16
CLD extraction
-> The derived average colors are transformed into a series of coefficients by performing a DCT (data in the time domain -> data in the frequency domain).
-> If the time-domain data is smooth (little variation), the DCT yields large low-frequency coefficients and small high-frequency coefficients.
-> A few low-frequency coefficients are selected using zigzag scanning and quantized to form the CLD (large quantization step for the AC coefficients, small quantization step for the DC coefficient).
-> The color space adopted for CLD is YCbCr.
F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}
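The extraction chain (block averages, DCT, zigzag selection) can be sketched for a single channel. This is a simplified illustration under stated assumptions: an orthonormal DCT-II, image dimensions that are multiples of 8, no quantization, and hypothetical function names; a full CLD would repeat this for Y, Cb and Cr.

```python
import math


def block_averages(channel, grid=8):
    """Average the channel over a grid x grid partition of the image."""
    h, w = len(channel), len(channel[0])
    bh, bw = h // grid, w // grid
    return [
        [
            sum(channel[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw)) / (bh * bw)
            for c in range(grid)
        ]
        for r in range(grid)
    ]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct, unoptimized)."""
    n = len(block)
    scale = lambda k: math.sqrt((1 if k == 0 else 2) / n)
    return [
        [
            scale(u) * scale(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for y in range(n) for x in range(n))
            for v in range(n)
        ]
        for u in range(n)
    ]


def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order, lowest frequencies first."""
    cells = [(y, x) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


flat = [[100] * 16 for _ in range(16)]          # perfectly smooth test image
spectrum = dct_2d(block_averages(flat))
coeffs = [spectrum[y][x] for y, x in zigzag_indices()[:6]]
print([round(c, 6) for c in coeffs])
```

For the flat image all energy lands in the DC coefficient, which matches the remark that smooth data concentrates in the low frequencies.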
17
Color Structure Descriptor (CSD)
• Scanning the image by an 8x8 structuring element
• Counting the number of blocks containing each color
• Generating a color histogram (HMMD/4CSQ operating points)
[Figure: an 8 x 8 structuring element placed on the image; color bins C0 to C7, where C1, C3 and C7 are each incremented by 1]
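The counting rule can be sketched as below. The sketch assumes the image is already quantized to color indices and slides the structuring element one pixel at a time without subsampling; the function name and the toy image are illustrative.

```python
def color_structure_histogram(indexed, num_colors, size=8):
    """Count, per color, the window positions in which that color occurs."""
    h, w = len(indexed), len(indexed[0])
    hist = [0] * num_colors
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            present = {
                indexed[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size)
            }
            for color in present:
                hist[color] += 1       # +1 per window, not per pixel
    return hist


# 8 rows x 9 columns: color 0 everywhere except one pixel of color 1.
image = [[0] * 9 for _ in range(8)]
image[0][8] = 1
hist = color_structure_histogram(image, num_colors=2)
print(hist)
```

A plain histogram of this image would be [71, 1]; the structure histogram instead records that color 1 appears in one of the two window positions.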
18
CSD extraction
If
Then sub sampling factor p is given by:
F = {colQuant, Values[m]}
19
CSD scaling
20
Texture Descriptors
• Homogeneous Texture Descriptor
• Non-Homogeneous Texture Descriptor (Edge Histogram)
• Texture Browsing
21
Homogeneous Texture Descriptor (HTD)
• Partitioning the frequency domain into 30 channels (modeled by a 2D-Gabor function)
• Computing the energy and energy deviation for each channel
• Computing mean and standard deviation of frequency coefficients
-> F = {fDC, fSD, e1,…, e30, d1,…, d30}
• An efficient implementation:
– Radon transform followed by Fourier transform
22
HTD Extraction: how to get a 2-D frequency layout following the HVS
2-D image f(x,y) -> Radon transform -> 1-D projection P(R, θ) -> 1-D Fourier transform F(P(R, θ))
Resulting sampling grid in polar coordinates
23
HTD Extraction: data sampling in feature channels
-> A 2D-Gabor function is deployed to define Gabor filter banks
• It is a Gaussian-weighted sinusoid
• It is used to model individual channels
• Each channel filters a specific type of texture
25
HTD properties
One can perform
• Rotation invariance matching
• Intensity-invariant matching (fDC removed from the feature vector)
• Scale-Invariant matching
F = {fDC, fSD, e1,…, e30, d1,…, d30}
26
Texture Browsing Descriptor
-> Same spatial filtering procedure as the HTD: scale- and orientation-selective band-pass filters
• Regularity (periodic to random)
• Coarseness (fine grain to coarse)
• Directionality (in steps of 30°)
-> The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to get a precise similarity match list among the candidate images.
e.g. look for textures that are very regular and oriented at 30°
27
Edge Histogram Descriptor (EHD)
• Represents the spatial distribution of five types of edges
– vertical, horizontal, 45°, 135°, and non-directional
• Dividing the image into 16 (4x4) blocks
• Generating a 5-bin histogram for each block
• It is scale invariant
• Strong edges are retained by thresholding the output of the Canny edge operator
• F = {BinCounts[k]}, k = 80
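How 16 blocks with 5 bins each yield 80 bin counts can be sketched as follows. Note the swap: instead of the Canny operator named on the slide, this sketch classifies 2x2 pixel blocks with five small directional masks of the kind commonly described for the EHD; the mask coefficients, the threshold value and all names are assumptions, and normalization and quantization of the counts are omitted.

```python
import math

R2 = math.sqrt(2)
# Assumed 2x2 masks, listed as (top-left, top-right, bottom-left, bottom-right).
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diag45": (R2, 0, 0, -R2),
    "diag135": (0, R2, -R2, 0),
    "nondirectional": (2, -2, -2, 2),
}
EDGE_TYPES = list(FILTERS)


def edge_histogram(gray, threshold=11):
    """Return 80 counts: 16 sub-images (4x4 grid) times 5 edge types."""
    h, w = len(gray), len(gray[0])
    sub_h, sub_w = h // 4, w // 4
    bins = [0] * 80
    for sr in range(4):
        for sc in range(4):
            for y in range(sr * sub_h, (sr + 1) * sub_h - 1, 2):
                for x in range(sc * sub_w, (sc + 1) * sub_w - 1, 2):
                    block = (gray[y][x], gray[y][x + 1],
                             gray[y + 1][x], gray[y + 1][x + 1])
                    strengths = [
                        abs(sum(p * f for p, f in zip(block, FILTERS[t])))
                        for t in EDGE_TYPES
                    ]
                    best = max(range(5), key=strengths.__getitem__)
                    if strengths[best] > threshold:   # keep strong edges only
                        bins[(sr * 4 + sc) * 5 + best] += 1
    return bins


# 16x16 image of one-pixel-wide vertical stripes.
stripes = [[255 if x % 2 else 0 for x in range(16)] for _ in range(16)]
bins = edge_histogram(stripes)
print(bins[:5], sum(bins))
```

Because the histogram is kept per sub-image, the descriptor records where edges occur as well as which types dominate.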
28
EHD extraction
• Basic histogram: 80 bins
• Extended histogram: 150 bins (basic + global + 13 clusters for semi-global)
• Edge map image obtained using the “Canny” edge operator
29
EHD evaluation
• Cannot be used for object-based image retrieval
• If Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval)
• The extended EHD achieves better results but does not exhibit rotation invariance
30
Shape Descriptors
• Region-based Descriptor
• Contour-based Shape Descriptor
• 2D/3D Shape Descriptor
• 3D Shape Descriptor
31
Region-based Descriptor (RBD)
• Expresses pixel distribution within a 2-D object region
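• Employs a complex 2-D Angular Radial Transformation (ART):
$$F_{nm} = \langle V_{nm}(\rho,\theta),\, f(\rho,\theta) \rangle = \int_{0}^{2\pi}\!\int_{0}^{1} V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta$$
$$V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho), \qquad A_m(\theta) = \frac{1}{2\pi}\exp(jm\theta)$$
$$R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases}$$
with m = 0, …, 11 (12 angular functions) and n = 0, …, 2 (3 radial functions)
• F = {MagnitudeOfART[k]}, k = n × m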
32
Region-based Descriptor (2)
• Applicable to figures (a) – (e)
• Distinguishes (i) from (g) and (h)
• (j), (k), and (l) are similar
Advantages:
• Describes complex shapes with disconnected regions
• Robust to segmentation noise
• Small size
• Fast extraction and matching
33
Contour-Based Descriptor (CBD)
• It is based on Curvature Scale-Space representation
34
Curvature Scale-Space
• Finds curvature zero crossing points of the shape’s contour (key points)
• Reduces the number of key points step by step, by applying Gaussian smoothing
• The positions of key points are expressed relative to the length of the contour curve
35
CBD Extraction
• xCSS: location of the curvature zero-crossing points
• yCSS: filtering pass
• Repetitive smoothing of the X and Y contour coordinates by the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex
• F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}
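The smoothing loop can be illustrated with a minimal sketch. It assumes a closed contour given as a short list of (x, y) points and detects curvature sign changes from the turn direction at each vertex; a real extractor would resample the contour to equidistant points and record peak positions, and all function names here are hypothetical.

```python
def smooth(points):
    """One pass of the (0.25, 0.5, 0.25) kernel over a closed contour."""
    n = len(points)
    return [
        tuple(0.25 * points[i - 1][d] + 0.5 * points[i][d]
              + 0.25 * points[(i + 1) % n][d] for d in (0, 1))
        for i in range(n)
    ]


def zero_crossings(points):
    """Number of curvature sign changes along the closed contour."""
    n = len(points)
    signs = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > 1e-12:          # ignore straight runs
            signs.append(cross > 0)
    return sum(signs[i] != signs[i - 1] for i in range(len(signs)))


def passes_until_convex(points, max_passes=10000):
    """Smooth repeatedly and report how many passes remove all crossings."""
    passes = 0
    while zero_crossings(points) > 0 and passes < max_passes:
        points = smooth(points)
        passes += 1
    return passes


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]   # one concave vertex
print(zero_crossings(square), zero_crossings(notched), passes_until_convex(notched))
```

The pass at which a pair of zero crossings disappears is what the filtering-pass coordinate of a CSS peak records.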
36
CBD Applicability
• Applicable to (a)
• Distinguishes differences in (b)
• Finds similarities in (c) - (e)
Advantages:
• Captures the shape very well
• Robust to noise, scale, and orientation
• It is fast and compact
37
Comparison (RB/CB descriptors)
• Blue: Similar shapes by Region-Based
• Yellow: Similar shapes by Contour-Based
38
How does MPEG-7 compare descriptors?
ANMRR (average normalized modified retrieval rank):
- a normalized measure was defined that takes into account the different sizes of the ground truth sets and the actual ranks obtained from the retrieval -> retrievals that miss items are assigned a penalty.
Traditional metric
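A sketch of the ANMRR computation follows. The formulas are not given on the slide; they are written here as commonly stated in the MPEG-7 evaluation literature (rank cutoff K = min(4·NG, 2·max NG), penalty 1.25·K for missed items), so treat the constants and the function names as assumptions.

```python
def nmrr(ranks, max_ng):
    """Normalized modified retrieval rank for one query.

    ranks: 1-based retrieval rank of each ground-truth item,
           or None if the item was not retrieved at all.
    max_ng: size of the largest ground-truth set over all queries.
    """
    ng = len(ranks)
    k = min(4 * ng, 2 * max_ng)
    penalized = [r if r is not None and r <= k else 1.25 * k for r in ranks]
    avr = sum(penalized) / ng                  # average rank
    mrr = avr - 0.5 - ng / 2                   # 0 for a perfect retrieval
    return mrr / (1.25 * k - 0.5 - ng / 2)     # scaled into [0, 1]


def anmrr(all_ranks):
    """Average NMRR over all queries."""
    max_ng = max(len(r) for r in all_ranks)
    return sum(nmrr(r, max_ng) for r in all_ranks) / len(all_ranks)


perfect = [1, 2, 3]
missed = [None, None, None]
mixed = [1, 2, 10]          # third item ranked beyond the cutoff K = 6
print(nmrr(perfect, 3), nmrr(missed, 3), round(nmrr(mixed, 3), 4))
print(round(anmrr([perfect, missed, mixed]), 4))
```

Lower values are better: 0 means every ground-truth item was found at the top, 1 means none was found within the cutoff.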
39
Similarity between features
• Typically descriptors: multidimensional vectors (of low-level features)
• Similarity of two images in the vector feature space:
– the range query: all the points within a hyperrectangle aligned with the coordinate axes
– the nearest-neighbour or within-distance (α-cut) query: a particular metric in the feature space
– dissimilarity between statistical distributions: the same metrics or specific measures
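The first two query types can be sketched over a toy feature database. The sketch assumes Euclidean distance as the metric and a plain dictionary as the index; the names and feature values are illustrative.

```python
import math


def range_query(db, low, high):
    """All items inside the axis-aligned hyperrectangle [low, high]."""
    return [name for name, vec in db.items()
            if all(l <= v <= h for v, l, h in zip(vec, low, high))]


def within_distance(db, query, alpha):
    """α-cut query: all items at Euclidean distance <= alpha from the query."""
    return sorted(name for name, vec in db.items()
                  if math.dist(vec, query) <= alpha)


def nearest_neighbours(db, query, k=1):
    """The k items closest to the query under Euclidean distance."""
    return sorted(db, key=lambda name: math.dist(db[name], query))[:k]


db = {"a": (0.1, 0.2), "b": (0.8, 0.9), "c": (0.15, 0.25)}
print(range_query(db, (0.0, 0.0), (0.5, 0.5)))
print(within_distance(db, (0.2, 0.3), alpha=0.2))
print(nearest_neighbours(db, (0.2, 0.3), k=1))
```

The range query needs no metric at all, while the other two depend entirely on the choice of distance.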
40
• http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html
An example of CBIR system using HTD performing range query and NN query
41
Criticism on MPEG-7 distance measures
• MPEG-7 adopts feature vector space distances based on geometric assumptions about the descriptor space, e.g.
.. but these quantitative measures (low-level information) do not fit ideally with human similarity perception
-> researchers from other areas have developed alternative predicate-based models (descriptors are assumed to contain just binary elements, as opposed to continuous data), which express the existence of properties and carry high-level information
See “Pattern difference”: $\frac{bc}{K^{2}}$
K: number of predicates in the data vectors Xi, Xj
b: property exists in Xi
c: property exists in Xj
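A predicate-based measure of this kind can be sketched on binary vectors. The sketch assumes the pattern difference is b·c / K², with b counting predicates present only in Xi and c counting predicates present only in Xj; that reading of b and c, and the function name, are assumptions.

```python
def pattern_difference(xi, xj):
    """b * c / K^2 for two equal-length binary predicate vectors."""
    if len(xi) != len(xj) or not xi:
        raise ValueError("vectors must be non-empty and of equal length")
    k = len(xi)
    b = sum(1 for p, q in zip(xi, xj) if p and not q)   # only in Xi
    c = sum(1 for p, q in zip(xi, xj) if q and not p)   # only in Xj
    return b * c / (k * k)


xi = [1, 1, 0, 0]
xj = [1, 0, 1, 0]
print(pattern_difference(xi, xj), pattern_difference(xi, xi))
```

Unlike a geometric distance, the measure is zero whenever one vector's properties are a subset of the other's.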
44
How to build and deploy an MPEG-7 Description
A Description = a Description Scheme (structure), written in the DDL + a set of Descriptor Values (instantiation of a Descriptor for a given data set)
MPEG-7 Description Tools are a library of standardized Descriptors and Description Schemes.
Adopting XML Schema as the basis for the MPEG-7 DDL, with the resulting XML-compliant instances (Descriptions in MPEG-7 textual format), eases interoperability by using a common, generic, powerful and extensible representation format.
45
How that works
Description Definition Language:
-> XML Schema (flexibility)
- XML Schema structural language components
- XML Schema datatype language components
- MPEG-7 specific extensions: support for vectors, matrices and typed references
+
-> Binary version (efficiency)
Formats: text format (XML), BiM format, or a mix
47
Descriptions enabled by the MPEG-7 tools
Perceptual Descriptions (Content description Tools):
- content’s spatio-temporal structure
- info on low-level features
- semantic info related to the reality captured by the content
Archival-oriented Descriptions (Content management Tools):
- content’s creation/production
- info on using the content
- info on storing and representing the content
Additional info for organizing, managing and accessing the content (Organization/Navigation/Access/User Interaction Tools):
- how objects are related and gathered in collections
- summaries/variations/transcoding to support efficient browsing
- user interaction info
48
Type hierarchy for top-level elements
49
<Mpeg7>
  <Description xsi:type="ContentEntity">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <MediaInformation>...</MediaInformation>
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <MediaTime>
              <MediaTimePoint>T00:00:00</MediaTimePoint>
              <MediaDuration>PT2M</MediaDuration>
            </MediaTime>
            <VisualDescriptor xsi:type="GoFGoPColorType" aggregation="average">
              <ScalableColor numOfCoef="8" numOfBitplanesDiscarded="0">
                <Coeff>1 2 3 4 5 6 7 8</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
          …
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
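Reading such a description back is ordinary XML processing. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the example; to make the snippet parse on its own it declares the `xsi` namespace explicitly and omits the MPEG-7 default namespace, both of which are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<Mpeg7 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Description xsi:type="ContentEntity">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <VisualDescriptor xsi:type="GoFGoPColorType" aggregation="average">
              <ScalableColor numOfCoef="8" numOfBitplanesDiscarded="0">
                <Coeff>1 2 3 4 5 6 7 8</Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""

root = ET.fromstring(DOC)
segments = {}
for segment in root.iter("VideoSegment"):
    for descriptor in segment.iter("VisualDescriptor"):
        kind = descriptor.get(f"{{{XSI}}}type")      # namespaced xsi:type
        coeffs = [int(t) for t in
                  descriptor.findtext("ScalableColor/Coeff").split()]
        segments[segment.get("id")] = (kind, descriptor.get("aggregation"), coeffs)

print(segments)
```

A search engine would compare the extracted coefficient vectors of different segments rather than the XML itself.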
50
Which DS to choose?
MPEG-7 provides DSs for describing the structure and semantics of AV content, plus content management.
Content management information can be attached to individual Segments.
51
Viewpoint of the structure: Segments
52
Structure description
[Figure: structure description. Node labels: Video Segment, Video Segments, Moving region, Moving regions; edge labels: Segment decomposition, relation link “above”; attribute lists: (Time, Color, Motion, Texture, Shape, Annotation) and (Time, Mosaic, Annotation)]
53
Segment Decomposition
time, connectivity
54
Content structural aspects (Segment DS tree)
• Annotate the whole image with StillRegion
• Spatial segmentation at different levels
• Among different regions we could use SegmentRelationship description tools
55
Content structural aspects
Temporal segments
(Segment Relationship DS graph)
57
Content Semantic aspects (SemanticGraph)
58
Example of Structure-Semantic Link DS
59
Content abstraction aspects (CoAbstr): hierarchical summary of a video
[Figure: hierarchy of key frames; labels: f0, f00, f01, f02]
-> enables rapid browsing and navigation (also sequential summary)
60
(CoAbstr)-Partitions and decompositions(ViewDecomposition DS)
Frequency-space graph
61
(CoAbstr) Content Variation
• Universal Multimedia Access: Adapt delivery to network and terminal characteristics
62
CoAbstr: a collection (CollectionStructure DS)
-> groups segments, events, or objects into collection clusters and specifies properties that are common to the elements:
• The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images.
• The CollectionStructure DS also describes relationships among collection clusters.
63
Reference Software: the XM
• XM implements
– MPEG-7 Descriptors (Ds)
– MPEG-7 Description Schemes (DSs)
– Coding Schemes
– DDL
• Application types: extraction, search and retrieval, transcoding, description filtering
64
Beyond MPEG-7 version 1 (D & DS in VXM)
ColorTemperature: This descriptor specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display preference control purposes (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate, and cool. Each category is used for browsing images based upon its perceptual meaning. It uses the dominant color descriptor.
Illumination Invariant Color: wraps the color descriptors. One or more color descriptors processed by the illumination invariant method can be included in this descriptor.
Shape Variation: describes shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation. The former corresponds to 35 quantized ART coefficients on a 2-dimensional histogram of a group of shape images, and the latter to the inverse of the histogram except the background.
Media-centric description schemes: Three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors to describe the characteristics of arbitrarily shaped still regions.
65
Visual CE current phase
• The Core Experiment (CE) explores new technologies for identifying original images and their modified versions (N-1 modified versions), focusing on the accuracy and robustness of identification
-> robustness is measured as the accuracy (HitRatio = k/N), calculated separately for each level of modification
Modifications: brightness; size reduction; color to monochrome; JPEG compression with varying quality factors; color reduction; crop; histogram equalization; blur; geometric transformation
66
Towards MPEG-7 Query Format
-> Although an interface for querying an MPEG-7 database is not yet supported, requirements have been drafted
[Figure: Client Application and MPEG-7 Database, connected through the Input Query Format, the Output Query Format and the Query Management Tools]
e.g.
- query by textual description
- combinations of query conditions
- specification of the structure of the result set
e.g. structure of the response containing the resulting set
e.g.
- specification of the exceptions
- relevance feedback
67
Basic search functionalities may include:
• Query by Description (the client application provides possible query criteria)