IMAGE RETRIEVAL Low-Intermediate level features€¦ · – Some colour spaces are perceptually...
Transcript of IMAGE RETRIEVAL Low-Intermediate level features€¦ · – Some colour spaces are perceptually...
IMAGE features Low-Intermediate level features
Image Features
• Features are local, meaningful, detectable parts of the image. Features are used for the
purpose of matching in image to image comparison and image retrieval, or to extract more
meaningful content for the purpose of image understanding, semantic annotation and
multimedia processing.
• Low-intermediate level features that can be used to describe contents of images are:
– Color
– Texture
– Shape
– Edge, Lines
– Gradient-based for local regions at keypoints
We shortly review here holistic representations of images, where a single descriptor is
given for the representation of the full image or large image regions and the MPEG7
standard for content representation.
We then discuss in greater detail features at local regions of interesting points, that
typically provide more useful information for the purpose of matching 2
Color
Colour Image Formation
Observer (Camera)
Incident light Reflected light
( )( ) ( )
• Colour image formation is determined by the relative radiant power distribution of the
incident light, the reflection of the materials and the characteristics of the observer.
• The appearance of an object is determined by its reflectance and the visible wavelenghts
of the light it is exposed with (and angle).
Observer/Sensor f
Eye Response Camera Response
• Reflected light spectrum is represented by a 3 element vector
( )
Camera spectral integration
dfEB
dfEG
dfER
BSkin
GSkin
RSkin
)()()(
)()()(
)()()(
• Having the light spectrum and the spectral reflectance curve of the object the appearance of
the object depends on the spectral sensitivity of the observer: considering Tristimulus RGB
values of camera = Colour * Tristimulus.
• If the spectrum of the light source changes then the colour of the reflected light also changes.
xA
i
xxifdxxf
/
1
)()(
Color spaces
• A color space is a three-dimensional definition of a color system. The attributes of the color system are mapped onto the coordinate axes of the color space.
• Different color spaces exist: each has advantages and disadvantages for colorselection and specification for different applications:
– Some colour spaces are perceptually linear, a change in stimulus will produce the same
change in perception wherever it is applied. Other colour spaces, e.g. computer graphics
color spaces, are not linear.
– Some colour spaces are intuitive to use, i.e. it is easy for the user creating desired colours
from space navigation. Other spaces require to manage parameters with abstract
relationships to the perceived colour.
– Some colour spaces are tied to specific equipments while others are equally valid on
whatever device they are used.
– …..Models Applications
CIE
Colorimetiric
XYZ Colorimetric calculations
Device-
oriented
Non-uniform
spaces
RGB, YIQ, YCC
Storage, processing, analysis,
coding, color TV
Uniform spaces
L* a* b*, L* u* v*
Color difference evaluation,
analysis, color management
systems
Device-
oriented and
User-oriented
HSI, HSV, HSL,
I1I2I3 ....
Human color perception,
computer graphics
Munsell Human visual system
• The color matching experiment was devised in 1920 to characterize the relationship between the physical spectra and the perceived color, measuring the mixtures of different spectral distributions that are required for human observers to match colors.
• In 1931 CIE standardized a set of spectral weighting functions that model the
human perception of color. They are referred as the x, y and z color matching
functions. CIE recommended them as monochromatic primaries.
They roughly correspond to colour sensations of red green and blue. Almost any spectral composition can be achieved by a suitably chosen mix of
these three monochromatic primaries.
•
•The mixture of the three CIE primaries may be specified by three numbers called tristimulus values.
blue
greenredx, y, z color matching
functions.
1.5
1
0.5
CIE primaries and color matching functions
• The CIE XYZ color model is based on the tristimulus values of the spectral matching functions of the three primaries identified by CIE.
• The Luminance Y of a source is obtained by integrating the source’s Spectral Power Distribution, weighted by the y color matching function. The two other components X and Z, are concerned with hue and saturation and are similarly
computed using the x and z color matching functions.
CIE XYZ color model
Luminance
Chrominance
By intersecting the XYZ space with plane X+Y+Z=1 and
projecting this intersection on the x-y plane we obtain the
CIE Chromaticity Diagram
RGB color model
• The simplest way to reproduce colors is to mix the beams from lights of three different colors.
• As a consequence of the principle of superposition, the color of an additive mixture is a strict
function of the colors of the primaries and of the fraction of each primary that is mixed. The
widest range of colors will be reproduced with red, green and blue lights.
• The RGB color model is represented as a cube.
RGB color spaces
Color Space Gamut White Point Primaries
• xR yR xG yG xB yB
• HDTV (ITU-R BT.709),
• sRGB CRT D65 0.64 0.33 0.30 0.60 0.15 0.06
• scRGB Unlimited (signed) D65 0.64 0.33 0.30 0.60 0.15 0.06
• ROMM RGB Wide D50 0.7347 0.2653 0.1596 0.8404 0.0366 0.0001
• Adobe RGB 98 CRT D65 0.64 0.33 0.21 0.71 0.15 0.06
• Apple RGB CRT D65 0.625 0.34 0.28 0.595 0.155 0.07
• ……….
White point
sRGB
gamut
• sRGB (created cooperatively by HP and Microsoft in 1996) uses the same primaries as the
ITU-R BT.709 primaries, standardized for studio monitors and HDTV. It is the reference
standard used for monitors, printers and on the Internet. LCDs, digital cameras, printers, and
scanners all follow the sRGB standard.
• For this reason, one can generally assume, in the absence of any other information, that
any 8-bit-per-channel image file or any 8-bit-per-channel image API or device interface can
be treated as being in the sRGB color space.
• The transformation between XYZ and sRGB and viceversa is obtained applying a linear
tranformation followed by a second transformation as below:
where: Rlinear, Glinear and Blinear for in-gamut colors are defined to be in the range [0,1]
Clinear is Rlinear, Glinear, or Blinear, and Csrgb is Rsrgb,Gsrgb or Bsrgb: .
sRGB space
where a = 0.055
RGB cameras
• Photographic digital cameras that use a CMOS or CCD often operate with a variation of the
RGB model in a Bayer filter arrangement: green is given twice as many detectors as red and
blue (ratio 1:2:1) in order to achieve higher luminance than chrominance resolution.
• The sensor has a grid of red, green, and blue detectors arranged so that the first row is
RGRGRGRG, the next is GBGBGBGB, and that sequence is repeated in subsequent rows. For
every channel, missing pixels are obtained by interpolation to build up the complete image.
• Due to the standardization of sRGB on the Internet, on computers, and on printers, many low-
to medium-end consumer digital cameras and scanners use sRGB as the default working color
space to convert the camera RGB measurements into a standard RGB color space
Truecolor is a method of representing and storing graphical image information. Truecolor defines 256 (28)
shades of red, green, and blue for each pixel of the digital picture, which results in 2563 or 16,777,216
(approximately 16.7 million) color variations for each pixel.
• The rgb (RGB normalized) colour space aims to separate the chromatic components
from the brightness components. It is used to eliminate the influence of varying illumination. The red, green and blue channel can be transformed to their normalised counterpart r,g,b according to:
BGR
B
BGR
G
BGR
R
b
g
r
rgb colour space
• One of these normalised channels is redundant since r+g+b = 1. Therefore the normalised
RGB space is sufficiently represented by two chromatic components (e.g. r,g) and a
brightness component (R+G+B).
HSB – HLS - HSV color models
• HSI (Hue, Saturation, intensity) , HLS (Hue, Saturation, Luminance) and HSV (Hue,
Saturation, Value)…. all specify colors using three values: hue (the color dominant
wavelenght), saturation (how much the color spectral distribution is around a certain
wavelenght) and luminance (the amount of gray), closely to human perception.
HSV Color Space
•
Saturation
Hue
Value
Red (0o)
Yellow (60o)Green (120o)
Cyan
(180o)
Blue (240o) Magenta
(300o)
Black
White
HSV values can be derived by the RGB values
V = 1/3 (R+G+B)
S = 1 – [3/(R+G+B)][min(R,G,B)]
H = 180 [0.5(R-G) + (R-B)] / [(R-G)2 + (R-B)(G-B)]1/2
H = undefined if S=0
H = 360 – H if B/V GT G/V
HSI colour space definitions
BB
GG
RR
HH SS
BGR
BGR
BGR
BRGR
BGarctg
I
S
H
),,(min1
)()(
)(3
HSI colour space
• Human perception combines (R,G,B) response of the eye in opponent colours.
Opponent colours can be expressed in RGB space :
l
l
l
Luminance
BlueYellow
PurpleGreen
R G B
1
2( R G )
1
4( 2 B R G )
Opponent color space
Achromatic
Red-Green
Blue-Yellow
Yellow++
+
++
+-
+-
Red cone
Rods
Green cone
Blue cone
+
Distances in color space
• Hardware oriented color models are not perceptually uniform: uniform quantization of
these spaces results into perceptually redundant bins and perceptual holes and a
distance function such as the Euclidean doesnot provide satisfactory results.
• A difference between green and greenish-yellow is relatively large, whereas the
distance distinguishing blue and red is quite small.
• The infinitesimal distance between two colors ij with
coordinates Xi and Xj = Xi+dX in the chromaticity diagram
can be written as dij2 = cij dXidXj where cij measures the
ability of humans to perceive small differences.
• If these quantities were constant, the space would be
Euclidean and the distance between two colors would be
proportional to the lenght of their connecting line. Instead
colors corresponding to points that have the same distance
from a certain point are not perceived as similar colors.
• Mac Adams ellipses account for this phenomenon. Ellipses
are such that colors inside them are not distinguishable from
the color in the center.
• CIE solved this problem in 1976 with the development of the L*a*b* perceptual color space. Other perceptual spaces are L*u*v* and L*c*h*. All are based on transformations that approximate the XYZ Riemann space into an Euclidean space. Therefore, in the L*a*b* and L*u*v* perceptual color spaces distances can be computed as Euclidean distances:
D(c1,c2) = [ (L*1 - L*2)2 + (a*1 - a*2)
2 + (b*1 – b*2)2 ] 1/2
D(c1,c2) = [ (L*1 - L*2)2 + (u*1 - u*2)
2 + (v*1 – v*2)2 ] 1/2
• However, mathematical approximations introduced cause deviations from this property in certain parts of the spaces:
– In L*u*v* Red is more represented than Green and Blue.
– In L*a*b* there is a greater sensibility to Green than to Red and Blue. Blue is however more represented than in L*u*v* .
Perceptual color
models
L*a*b* perceptual color space
L*
u*
v*
500 550 green 600 red700
450 blue a*
b*
L*
360
750600 red
550 green
450 blue
903,3 Y/Yn if Y/Yn < 0,008856
116 (Y/Yn)1/3 otherwise
u* = 13L* (u’ – u’n) u’ = 4X/(X+15Y+3Z)
v* = 13L* (v’ – v’n) v’ = 9Y/(X+15Y+3Z)
Yn un vn are the values of the reference white
L*u*v* and L*a*b* spaces
L* =
903,3 Y/Yn if Y/Yn < 0,008856
116 (Y/Yn)1/3 otherwise
a* = 500 [f(X/Xn) – f(Y/Yn ) ]
b* = 200 [f(Y/Yn) – f(Z/Zn ) ]
f(t) = (t)1/3 for t GT 0.008856
f(t) = 7.787 t + 16/116 otherwise
Xn Yn Zn are the XYZ values of the reference white
L* =
RGB (Red Green Blue) – additive color system based on tri-chromatic theory. – easy to implement but non-linear with visual perception. – device dependent with semi-intuitive specification of colours. – very common, used in virtually every computer system, television, video etc.
HSL (Hue Saturation and Lightness), HSI (Intensity), …– intuitive specification of color. – The claimed separation of the luminance component from chrominance (colour) is stated
to have advantages in applications such as image processing. – Obtained as linear transform from RGB and therefore device dependent and non-linear. – Appropriate only for moderate luminance levels. Real world environments are not
suitably represented in this space.
HSV (Value) – Invariant under the orientation of the object with respect to the illumination and camera
direction.
Opponent color axes– Advantage of isolating the brightness information on the third axis.– Invariant to changes in illumination intensity and shadows.
CIE L*u*v*
– attempt to linearise the perceptibility of unit vector color differences.
– non-linear color space, but the conversions are reversible.
– Colouring information is centered on the colour of the white point of the system,
subscript n
– The non-linear relationship for L* u* v* is intended to mimic the logarithmic
response of the eye.
CIE L*a*b*
– based directly on CIE XYZ.
– non-linear color spaces, and the conversions are reversible.
– Colouring information is referred to the colour of the white point of the system,
subscript n.
– The non-linear relationships for L* a* and b* are the same as for CIELUV and
are intended to mimic the logarithmic response of the eye.
– Suitable for real world scene color representation
Histogram representation of color distributions
Color histogram
• We can divide the color space into a small number of zones, each of which is clearly distinct with others for human eyes. Each of the zones is assigned a sequence number beginning from zero.
• Given one image and a tassellated color space the vector holding the count of pixels for each color zone is called color histogram. The color histogram of an image defines the image colour distribution. For gray level images we can built in a similar way the grey level histogram.
General problems with histogram representations
• General problems with histogram and histogram based representations are:
– dimensionality course: histograms must have a high number of bins (>64) for a meaningful representation. This requires high-dimensional indexes has a negative impact on the efficiency of indexing methods.
– Quadratic distance form: the quadratic distance between color histograms closely mimics the way in which human perception of color differences work. Nevertheless “cross-talk” requires a significant computational complexity. Multidimensional indexes implicitely assume no cross-talk.
– Global representation: color histograms provide a global representation of image content. Sometimes this is a too inaccurate representation of image color content. In that case representation must account for local properties and spatial configuration of image regions.
– Image: I
– Quantized colors:
– Distance between two pixels
– Pixel set with color c
– Given distance in consideration: k
• Correlogram:
Dimensionality m m d if number of different k is d
• Auto-correlogram:
Dimensionality m d
Practical consideration: use auto-correlogram with m <= 64, d = 1, k = 1
• Correlogram first proposed in a paper in CVPR 1997. The main idea is
describing the global distribution of local spatial correlation of colors.
• Correlogram is based on the estimation of the probability of finding a pixel of
color j at a distance k from a pixel of color i in an image
Color Correlogram
The colour circle: warm and cool colours
Warm coloursCool colours
Color combinations can be used to define a representation of image content at the
emotional level. According to Goethe blue color gives a feeling of coolness and
yellow color has a warming effect. He divided the colors in two groups:
from red to yellow (through orange) and from green to blue (through violet)
Colors in the first group provoke excitement / cheerfulness.
Those in the second provoke weakness / unsettled feelings.
Goethe’s theories were implemented by painters in late XIXth and XXth century.
Higher level representations: emotional color models
Itten color model
hue
luminance saturation
Representative colors can be arranged in a geometric model that translates properties of color combinations into geometric relationships.These representations can be used on top of color space representation to associate some semantic meaning to color distributions.
V. Kandinsky. and J. Itten developed empirical color models where positions of colors were associated to meanings and emotions. These representations depend on the culture and behavior of individuals. Relations between
emotions and colors were further analyzed in the late XXth by semiotics.
• The Itten Color model: uses 180 fundamental colors
arranged in a sphere (the Itten-Runge sphere).
– 12 pure hues, located along the equatorial circle
– 3 saturations along the radius
– 7 luminance levels including black and white
along the meridians.
• Can be interpreted as a simplified view of the
opponent color model, and can be derived from it,
by bucketing the color space.
• Itten’s layout allows objective identification of color peculiarities and provides means for characterizing color coupling in terms of contrast and accordance
• Contrast– Saturated colors – determines a sense of tense
– Dark-light colors – determines a sense of plasticity
– Warm-cold (at any hue) – warm colors (reddish) are perceived as closer; cold colors (greenish) as farther
• Accordance (three colors forming a equilateral triangle in one of the circular sections of the sphere)
– Complementary colors (their sum is white) – determine a sense of calmness. The absence of one color determines anxiety
– Harmonic colors (their sum is grey) – sense of equilibrium
Harmonic accordance
(sphere central)
Complementary accordance
(sphere top)
Individual colors can be associated with meanings according to the local culture.
Kobayashi, 1990,Kobayashi, 1990,
Color image scaleColor image scale
Color and semiotics
coldfeminine
calm
• In commercials semiotics studied by Floch distinguish four basic
categories that can be linked to the presence/absence of visual signs
Texture
Texture
• Texture is an innate property of all surfaces (clouds, trees, bricks, hair etc…). It refers to visual patterns of homogeneity and does not result from the presence of a single color.
• Texturedness of a surface depends on the scale at which the surface is observed. Textures at a certain scale are not textures at a coarser scale. Differently from color, texture is a property associated with some pixel neighbourhood, not with a single pixel.
• Widely accepted classifications of textures are based on psychology studies, that consider how humans perceive and classify textures
• Textures can be described according to spatial, frequency or perceptual properties. The most used approaches are:
– Statistics: statistical measures are in relation with aspect properties like contrast, correlation, entropy
– Stochastic models: stochastic models assume that a texture is the result of a stochastic process that has tunable parameters. Model parameters are therefore the texture descriptors
– Structure: structure measures assume that texture is a repetition of some atomic texture
• For the purpose of matching any model can be used. For the purpose of clustering or categorization perceptual features are most significant:
– Tamura’s features (coarseness, contrast, directionality, line-likeness, regularity and roughness)
– busyness, complexity and texture strength
– repetitiveness, orientation, and complexity
Co-occurrence matrix
• A basic measure for statistical description of textures is the gray level co-occurrence matrix. Given a texture, the image co-occurrence matrix meaures the frequency of adjacent pixels. Each element P(i,j) in the matrix indicates the relative frequency at which two pixels of grey level i and j occur:
Co-occurrence matrixOriginal image
Space based models
25
P ( i j )( p1 , p 2 ) I | ( p1 i ) ( p 2 j )
I
• Statistical measures that can be obtained from GLC:
• Statistics of co-occurrence probabilities can be used to characterize
properties of a textured region. Among them:
• Wavelet parameters can be used to model textures
Wavelet transform
• Coefficients of the wavelet transform can be used to represent frequency
properties of a texture pattern. Gabor wavelet decomposition has been used.
• Used in MPEG7
Frequency-based models
• Tamura’s features : Tamura’s features are based on psycophysical studies of the characterizing elements that are perceived in textures by humans :
– Contrast
– Directionality
– Coarseness
– Linelikeness
– Regularity
– Roughness
• They can be computed as follows
Perceptual models
• Contrast
measures the way in which gray levels q vary in I and to what extent
their distribution is biased to black or white
– where:
contrast
( 4 )n
2( q m )
2Pr( q | I )
4
1
4( q m )
4Pr( q | I )
q 0
qmax
is the variance
is the kurtosis
n = 0.25 recommended
• Directionality
takes into account the edge strenght and the directional angle. They are computed using pixelwise derivatives according to Sobel’s edge detector
x, y are the pixel differences in the x and y directions
-1 0 1
-1 0 1
-1 0 1
1 1 1
0 0 0
-1 -1 -1
yx
directionality angle arctanx
y 2
edge strenght 0 .5 ( | x ( x , y ) | | y ( x , y ) | )
• Coarseness
Measures the scale of a texture. For a fixed window size a texture with
a smaller number of texture elements is said more coarse than one
with a larger number.
– At each pixel p(x,y) compute six averages for the windows of size
k = 0,1,2,..5 around the pixel
P
20 21 22 23
………
– At each pixel compute the absolute differences Ek (x,y) between
pairs of nonoverlapping averages in the horizontal and vertical
directions
– At each pixel find the value of k that minimizes the difference
Ek (x,y) in either direction. The best window size S best is 2k.
– Compute coarseness by averaging S best over the entire image
coarseness1
mn ( 2kp ( x , y )
xy
)
• To deal with textures of multiple coarseness a histogram of the
distribution of the Sbest can be derived
• Linelikeness
it is defined as the average coincidence of edge directions that cooccur
at pixels separated by a distance d along the direction
• Regularity
it is defined as:
1 – r( coarseness + contrast + directionality + linelikeness )
Being r a normalising factor and the standard deviation of the feature
in each subimage of the texture
• Roughness
it is defined as: Coarseness + Contrast
Angle
Distance =
d