IMAGE RETRIEVAL Low-Intermediate level features€¦ · – Some colour spaces are perceptually...

IMAGE features Low-Intermediate level features

Image Features

• Features are local, meaningful, detectable parts of the image. Features are used for the

purpose of matching in image to image comparison and image retrieval, or to extract more

meaningful content for the purpose of image understanding, semantic annotation and

multimedia processing.

• Low-intermediate level features that can be used to describe contents of images are:

– Color

– Texture

– Shape

– Edge, Lines

– Gradient-based for local regions at keypoints

We shortly review here holistic representations of images, where a single descriptor is

given for the representation of the full image or large image regions and the MPEG7

standard for content representation.

We then discuss in greater detail features at local regions of interesting points, that

typically provide more useful information for the purpose of matching 2

Colour Image Formation

Observer (Camera)

Incident light Reflected light

( )( ) ( )

• Colour image formation is determined by the relative radiant power distribution of the

incident light, the reflection of the materials and the characteristics of the observer.

• The appearance of an object is determined by its reflectance and the visible wavelenghts

of the light it is exposed with (and angle).

Observer/Sensor f

Eye Response Camera Response

• Reflected light spectrum is represented by a 3 element vector

( )

Camera spectral integration

dfEB

dfEG

dfER

BSkin

GSkin

RSkin

)()()(

)()()(

)()()(

• Having the light spectrum and the spectral reflectance curve of the object the appearance of

the object depends on the spectral sensitivity of the observer: considering Tristimulus RGB

values of camera = Colour * Tristimulus.

• If the spectrum of the light source changes then the colour of the reflected light also changes.

xA

i

xxifdxxf

/

1

)()(

Color spaces

• A color space is a three-dimensional definition of a color system. The attributes of the color system are mapped onto the coordinate axes of the color space.

• Different color spaces exist: each has advantages and disadvantages for colorselection and specification for different applications:

– Some colour spaces are perceptually linear, a change in stimulus will produce the same

change in perception wherever it is applied. Other colour spaces, e.g. computer graphics

color spaces, are not linear.

– Some colour spaces are intuitive to use, i.e. it is easy for the user creating desired colours

from space navigation. Other spaces require to manage parameters with abstract

relationships to the perceived colour.

– Some colour spaces are tied to specific equipments while others are equally valid on

whatever device they are used.

– …..Models Applications

CIE

Colorimetiric

XYZ Colorimetric calculations

Device-

oriented

Non-uniform

spaces

RGB, YIQ, YCC

Storage, processing, analysis,

coding, color TV

Uniform spaces

L* a* b*, L* u* v*

Color difference evaluation,

analysis, color management

systems

Device-

oriented and

User-oriented

HSI, HSV, HSL,

I1I2I3 ....

Human color perception,

computer graphics

Munsell Human visual system

• The color matching experiment was devised in 1920 to characterize the relationship between the physical spectra and the perceived color, measuring the mixtures of different spectral distributions that are required for human observers to match colors.

• In 1931 CIE standardized a set of spectral weighting functions that model the

human perception of color. They are referred as the x, y and z color matching

functions. CIE recommended them as monochromatic primaries.

They roughly correspond to colour sensations of red green and blue. Almost any spectral composition can be achieved by a suitably chosen mix of

these three monochromatic primaries.

•

•The mixture of the three CIE primaries may be specified by three numbers called tristimulus values.

blue

greenredx, y, z color matching

functions.

1.5

1

0.5

CIE primaries and color matching functions

• The CIE XYZ color model is based on the tristimulus values of the spectral matching functions of the three primaries identified by CIE.

• The Luminance Y of a source is obtained by integrating the source’s Spectral Power Distribution, weighted by the y color matching function. The two other components X and Z, are concerned with hue and saturation and are similarly

computed using the x and z color matching functions.

CIE XYZ color model

Luminance

Chrominance

By intersecting the XYZ space with plane X+Y+Z=1 and

projecting this intersection on the x-y plane we obtain the

CIE Chromaticity Diagram

RGB color model

• The simplest way to reproduce colors is to mix the beams from lights of three different colors.

• As a consequence of the principle of superposition, the color of an additive mixture is a strict

function of the colors of the primaries and of the fraction of each primary that is mixed. The

widest range of colors will be reproduced with red, green and blue lights.

• The RGB color model is represented as a cube.

RGB color spaces

Color Space Gamut White Point Primaries

• xR yR xG yG xB yB

• HDTV (ITU-R BT.709),

• sRGB CRT D65 0.64 0.33 0.30 0.60 0.15 0.06

• scRGB Unlimited (signed) D65 0.64 0.33 0.30 0.60 0.15 0.06

• ROMM RGB Wide D50 0.7347 0.2653 0.1596 0.8404 0.0366 0.0001

• Adobe RGB 98 CRT D65 0.64 0.33 0.21 0.71 0.15 0.06

• Apple RGB CRT D65 0.625 0.34 0.28 0.595 0.155 0.07

• ……….

White point

sRGB

gamut

• sRGB (created cooperatively by HP and Microsoft in 1996) uses the same primaries as the

ITU-R BT.709 primaries, standardized for studio monitors and HDTV. It is the reference

standard used for monitors, printers and on the Internet. LCDs, digital cameras, printers, and

scanners all follow the sRGB standard.

• For this reason, one can generally assume, in the absence of any other information, that

any 8-bit-per-channel image file or any 8-bit-per-channel image API or device interface can

be treated as being in the sRGB color space.

• The transformation between XYZ and sRGB and viceversa is obtained applying a linear

tranformation followed by a second transformation as below:

where: Rlinear, Glinear and Blinear for in-gamut colors are defined to be in the range [0,1]

Clinear is Rlinear, Glinear, or Blinear, and Csrgb is Rsrgb,Gsrgb or Bsrgb: .

sRGB space

where a = 0.055

RGB cameras

• Photographic digital cameras that use a CMOS or CCD often operate with a variation of the

RGB model in a Bayer filter arrangement: green is given twice as many detectors as red and

blue (ratio 1:2:1) in order to achieve higher luminance than chrominance resolution.

• The sensor has a grid of red, green, and blue detectors arranged so that the first row is

RGRGRGRG, the next is GBGBGBGB, and that sequence is repeated in subsequent rows. For

every channel, missing pixels are obtained by interpolation to build up the complete image.

• Due to the standardization of sRGB on the Internet, on computers, and on printers, many low-

to medium-end consumer digital cameras and scanners use sRGB as the default working color

space to convert the camera RGB measurements into a standard RGB color space

Truecolor is a method of representing and storing graphical image information. Truecolor defines 256 (28)

shades of red, green, and blue for each pixel of the digital picture, which results in 2563 or 16,777,216

(approximately 16.7 million) color variations for each pixel.

• The rgb (RGB normalized) colour space aims to separate the chromatic components

from the brightness components. It is used to eliminate the influence of varying illumination. The red, green and blue channel can be transformed to their normalised counterpart r,g,b according to:

BGR

B

BGR

G

BGR

R

b

g

r

rgb colour space

• One of these normalised channels is redundant since r+g+b = 1. Therefore the normalised

RGB space is sufficiently represented by two chromatic components (e.g. r,g) and a

brightness component (R+G+B).

HSB – HLS - HSV color models

• HSI (Hue, Saturation, intensity) , HLS (Hue, Saturation, Luminance) and HSV (Hue,

Saturation, Value)…. all specify colors using three values: hue (the color dominant

wavelenght), saturation (how much the color spectral distribution is around a certain

wavelenght) and luminance (the amount of gray), closely to human perception.

HSV Color Space

•

Saturation

Hue

Value

Red (0o)

Yellow (60o)Green (120o)

Cyan

(180o)

Blue (240o) Magenta

(300o)

Black

White

HSV values can be derived by the RGB values

V = 1/3 (R+G+B)

S = 1 – [3/(R+G+B)][min(R,G,B)]

H = 180 [0.5(R-G) + (R-B)] / [(R-G)2 + (R-B)(G-B)]1/2

H = undefined if S=0

H = 360 – H if B/V GT G/V

HSI colour space definitions

BB

GG

RR

HH SS

BGR

BGR

BGR

BRGR

BGarctg

I

S

H

),,(min1

)()(

)(3

HSI colour space

• Human perception combines (R,G,B) response of the eye in opponent colours.

Opponent colours can be expressed in RGB space :

l

l

l

Luminance

BlueYellow

PurpleGreen

R G B

1

2( R G )

1

4( 2 B R G )

Opponent color space

Achromatic

Red-Green

Blue-Yellow

Yellow++

+

++

+-

+-

Red cone

Rods

Green cone

Blue cone

+

Distances in color space

• Hardware oriented color models are not perceptually uniform: uniform quantization of

these spaces results into perceptually redundant bins and perceptual holes and a

distance function such as the Euclidean doesnot provide satisfactory results.

• A difference between green and greenish-yellow is relatively large, whereas the

distance distinguishing blue and red is quite small.

• The infinitesimal distance between two colors ij with

coordinates Xi and Xj = Xi+dX in the chromaticity diagram

can be written as dij2 = cij dXidXj where cij measures the

ability of humans to perceive small differences.

• If these quantities were constant, the space would be

Euclidean and the distance between two colors would be

proportional to the lenght of their connecting line. Instead

colors corresponding to points that have the same distance

from a certain point are not perceived as similar colors.

• Mac Adams ellipses account for this phenomenon. Ellipses

are such that colors inside them are not distinguishable from

the color in the center.

• CIE solved this problem in 1976 with the development of the L*a*b* perceptual color space. Other perceptual spaces are L*u*v* and L*c*h*. All are based on transformations that approximate the XYZ Riemann space into an Euclidean space. Therefore, in the L*a*b* and L*u*v* perceptual color spaces distances can be computed as Euclidean distances:

D(c1,c2) = [ (L*1 - L*2)2 + (a*1 - a*2)

2 + (b*1 – b*2)2 ] 1/2

D(c1,c2) = [ (L*1 - L*2)2 + (u*1 - u*2)

2 + (v*1 – v*2)2 ] 1/2

• However, mathematical approximations introduced cause deviations from this property in certain parts of the spaces:

– In L*u*v* Red is more represented than Green and Blue.

– In L*a*b* there is a greater sensibility to Green than to Red and Blue. Blue is however more represented than in L*u*v* .

Perceptual color

models

L*a*b* perceptual color space

L*

u*

v*

500 550 green 600 red700

450 blue a*

b*

L*

360

750600 red

550 green

450 blue

903,3 Y/Yn if Y/Yn < 0,008856

116 (Y/Yn)1/3 otherwise

u* = 13L* (u’ – u’n) u’ = 4X/(X+15Y+3Z)

v* = 13L* (v’ – v’n) v’ = 9Y/(X+15Y+3Z)

Yn un vn are the values of the reference white

L*u*v* and L*a*b* spaces

L* =

903,3 Y/Yn if Y/Yn < 0,008856

116 (Y/Yn)1/3 otherwise

a* = 500 [f(X/Xn) – f(Y/Yn ) ]

b* = 200 [f(Y/Yn) – f(Z/Zn ) ]

f(t) = (t)1/3 for t GT 0.008856

f(t) = 7.787 t + 16/116 otherwise

Xn Yn Zn are the XYZ values of the reference white

L* =

RGB (Red Green Blue) – additive color system based on tri-chromatic theory. – easy to implement but non-linear with visual perception. – device dependent with semi-intuitive specification of colours. – very common, used in virtually every computer system, television, video etc.

HSL (Hue Saturation and Lightness), HSI (Intensity), …– intuitive specification of color. – The claimed separation of the luminance component from chrominance (colour) is stated

to have advantages in applications such as image processing. – Obtained as linear transform from RGB and therefore device dependent and non-linear. – Appropriate only for moderate luminance levels. Real world environments are not

suitably represented in this space.

HSV (Value) – Invariant under the orientation of the object with respect to the illumination and camera

direction.

Opponent color axes– Advantage of isolating the brightness information on the third axis.– Invariant to changes in illumination intensity and shadows.

CIE L*u*v*

– attempt to linearise the perceptibility of unit vector color differences.

– non-linear color space, but the conversions are reversible.

– Colouring information is centered on the colour of the white point of the system,

subscript n

– The non-linear relationship for L* u* v* is intended to mimic the logarithmic

response of the eye.

CIE L*a*b*

– based directly on CIE XYZ.

– non-linear color spaces, and the conversions are reversible.

– Colouring information is referred to the colour of the white point of the system,

subscript n.

– The non-linear relationships for L* a* and b* are the same as for CIELUV and

are intended to mimic the logarithmic response of the eye.

– Suitable for real world scene color representation

Histogram representation of color distributions

Color histogram

• We can divide the color space into a small number of zones, each of which is clearly distinct with others for human eyes. Each of the zones is assigned a sequence number beginning from zero.

• Given one image and a tassellated color space the vector holding the count of pixels for each color zone is called color histogram. The color histogram of an image defines the image colour distribution. For gray level images we can built in a similar way the grey level histogram.

General problems with histogram representations

• General problems with histogram and histogram based representations are:

– dimensionality course: histograms must have a high number of bins (>64) for a meaningful representation. This requires high-dimensional indexes has a negative impact on the efficiency of indexing methods.

– Quadratic distance form: the quadratic distance between color histograms closely mimics the way in which human perception of color differences work. Nevertheless “cross-talk” requires a significant computational complexity. Multidimensional indexes implicitely assume no cross-talk.

– Global representation: color histograms provide a global representation of image content. Sometimes this is a too inaccurate representation of image color content. In that case representation must account for local properties and spatial configuration of image regions.

– Image: I

– Quantized colors:

– Distance between two pixels

– Pixel set with color c

– Given distance in consideration: k

• Correlogram:

Dimensionality m m d if number of different k is d

• Auto-correlogram:

Dimensionality m d

Practical consideration: use auto-correlogram with m <= 64, d = 1, k = 1

• Correlogram first proposed in a paper in CVPR 1997. The main idea is

describing the global distribution of local spatial correlation of colors.

• Correlogram is based on the estimation of the probability of finding a pixel of

color j at a distance k from a pixel of color i in an image

Color Correlogram

The colour circle: warm and cool colours

Warm coloursCool colours

Color combinations can be used to define a representation of image content at the

emotional level. According to Goethe blue color gives a feeling of coolness and

yellow color has a warming effect. He divided the colors in two groups:

from red to yellow (through orange) and from green to blue (through violet)

Colors in the first group provoke excitement / cheerfulness.

Those in the second provoke weakness / unsettled feelings.

Goethe’s theories were implemented by painters in late XIXth and XXth century.

Higher level representations: emotional color models

Itten color model

hue

luminance saturation

Representative colors can be arranged in a geometric model that translates properties of color combinations into geometric relationships.These representations can be used on top of color space representation to associate some semantic meaning to color distributions.

V. Kandinsky. and J. Itten developed empirical color models where positions of colors were associated to meanings and emotions. These representations depend on the culture and behavior of individuals. Relations between

emotions and colors were further analyzed in the late XXth by semiotics.

• The Itten Color model: uses 180 fundamental colors

arranged in a sphere (the Itten-Runge sphere).

– 12 pure hues, located along the equatorial circle

– 3 saturations along the radius

– 7 luminance levels including black and white

along the meridians.

• Can be interpreted as a simplified view of the

opponent color model, and can be derived from it,

by bucketing the color space.

• Itten’s layout allows objective identification of color peculiarities and provides means for characterizing color coupling in terms of contrast and accordance

• Contrast– Saturated colors – determines a sense of tense

– Dark-light colors – determines a sense of plasticity

– Warm-cold (at any hue) – warm colors (reddish) are perceived as closer; cold colors (greenish) as farther

• Accordance (three colors forming a equilateral triangle in one of the circular sections of the sphere)

– Complementary colors (their sum is white) – determine a sense of calmness. The absence of one color determines anxiety

– Harmonic colors (their sum is grey) – sense of equilibrium

Harmonic accordance

(sphere central)

Complementary accordance

(sphere top)

Individual colors can be associated with meanings according to the local culture.

Kobayashi, 1990,Kobayashi, 1990,

Color image scaleColor image scale

Color and semiotics

coldfeminine

calm

• In commercials semiotics studied by Floch distinguish four basic

categories that can be linked to the presence/absence of visual signs

Texture

Texture

• Texture is an innate property of all surfaces (clouds, trees, bricks, hair etc…). It refers to visual patterns of homogeneity and does not result from the presence of a single color.

• Texturedness of a surface depends on the scale at which the surface is observed. Textures at a certain scale are not textures at a coarser scale. Differently from color, texture is a property associated with some pixel neighbourhood, not with a single pixel.

• Widely accepted classifications of textures are based on psychology studies, that consider how humans perceive and classify textures

• Textures can be described according to spatial, frequency or perceptual properties. The most used approaches are:

– Statistics: statistical measures are in relation with aspect properties like contrast, correlation, entropy

– Stochastic models: stochastic models assume that a texture is the result of a stochastic process that has tunable parameters. Model parameters are therefore the texture descriptors

– Structure: structure measures assume that texture is a repetition of some atomic texture

• For the purpose of matching any model can be used. For the purpose of clustering or categorization perceptual features are most significant:

– Tamura’s features (coarseness, contrast, directionality, line-likeness, regularity and roughness)

– busyness, complexity and texture strength

– repetitiveness, orientation, and complexity

Co-occurrence matrix

• A basic measure for statistical description of textures is the gray level co-occurrence matrix. Given a texture, the image co-occurrence matrix meaures the frequency of adjacent pixels. Each element P(i,j) in the matrix indicates the relative frequency at which two pixels of grey level i and j occur:

Co-occurrence matrixOriginal image

Space based models

25

P ( i j )( p1 , p 2 ) I | ( p1 i ) ( p 2 j )

I

• Statistical measures that can be obtained from GLC:

• Statistics of co-occurrence probabilities can be used to characterize

properties of a textured region. Among them:

• Wavelet parameters can be used to model textures

Wavelet transform

• Coefficients of the wavelet transform can be used to represent frequency

properties of a texture pattern. Gabor wavelet decomposition has been used.

• Used in MPEG7

Frequency-based models

• Tamura’s features : Tamura’s features are based on psycophysical studies of the characterizing elements that are perceived in textures by humans :

– Contrast

– Directionality

– Coarseness

– Linelikeness

– Regularity

– Roughness

• They can be computed as follows

Perceptual models

• Contrast

measures the way in which gray levels q vary in I and to what extent

their distribution is biased to black or white

– where:

contrast

( 4 )n

2( q m )

2Pr( q | I )

4

1

4( q m )

4Pr( q | I )

q 0

qmax

is the variance

is the kurtosis

n = 0.25 recommended

• Directionality

takes into account the edge strenght and the directional angle. They are computed using pixelwise derivatives according to Sobel’s edge detector

x, y are the pixel differences in the x and y directions

-1 0 1

-1 0 1

-1 0 1

1 1 1

0 0 0

-1 -1 -1

yx

directionality angle arctanx

y 2

edge strenght 0 .5 ( | x ( x , y ) | | y ( x , y ) | )

• Coarseness

Measures the scale of a texture. For a fixed window size a texture with

a smaller number of texture elements is said more coarse than one

with a larger number.

– At each pixel p(x,y) compute six averages for the windows of size

k = 0,1,2,..5 around the pixel

P

20 21 22 23

………

– At each pixel compute the absolute differences Ek (x,y) between

pairs of nonoverlapping averages in the horizontal and vertical

directions

– At each pixel find the value of k that minimizes the difference

Ek (x,y) in either direction. The best window size S best is 2k.

– Compute coarseness by averaging S best over the entire image

coarseness1

mn ( 2kp ( x , y )

xy

)

• To deal with textures of multiple coarseness a histogram of the

distribution of the Sbest can be derived

• Linelikeness

it is defined as the average coincidence of edge directions that cooccur

at pixels separated by a distance d along the direction

• Regularity

it is defined as:

1 – r( coarseness + contrast + directionality + linelikeness )

Being r a normalising factor and the standard deviation of the feature

in each subimage of the texture

• Roughness

it is defined as: Coarseness + Contrast

Angle

Distance =

d

IMAGE RETRIEVAL Low-Intermediate level features€¦ · – Some colour spaces are perceptually...

Documents

Transcript of IMAGE RETRIEVAL Low-Intermediate level features€¦ · – Some colour spaces are perceptually...