Image Understanding

Introduction

Computer vision
- Give machines the ability to see
- The goal is to duplicate the effect of human visual processing
- We live in a 3-D world, but camera sensors can only capture 2-D information.
- Computer vision is the “flip” side of computer graphics, but much harder!

Introduction

Computer vision is composed of:
- Image processing
- Image analysis
- Image understanding

Introduction

Image processing
- The goal is to present the image to the system in a useful form
- image capture and early processing
- remove noise
- detect luminance differences
- detect edges
- enhance image

Introduction

Image analysis
- The goal is to extract useful information from the processed image
- identify boundaries
- find connected components
- label regions
- segment parts of objects
- group parts together into whole objects

Introduction

Image understanding
- The goal is to make sense of the information.
- Draw qualitative, or semantic, conclusions from the quantitative information.
- make a decision about the quantitative information
- classify the parts
- recognize objects
- understand the objects’ usage and the meaning of the scene

Introduction

Image understanding uses techniques and methods from:
- Physics: models of the visual world
- Mathematics: statistics and differential calculus
- Spatial pattern recognition
- Artificial intelligence
- Psychophysics

Robot: Count all the Chairs

Source: Bülthoff, Max Planck Institute for Biological Cybernetics (MPIK), Tübingen, Germany
http://www.ercim.org/publication/Ercim_News/enw53/christensen.html

Robot: Which One is a Car?

The Importance of Context

Are these letters “A” or “H”?

The Importance of Context

Hwo nmya wrdso cna oyu erad ni ihts entsnece?

Low-level Representations

Low-level: little knowledge about the world
- The data that is manipulated usually resembles the data that is captured.
- For example, if the image is captured using a CCD camera (2-D), the representation can be described by an image function whose value is brightness depending on 2 parameters: the x-y coordinates of the location of the brightness value.
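As a minimal sketch of the image function described above, the Python snippet below stores a tiny grayscale image as a list of rows and looks up brightness by x-y coordinates. The pixel values, the 0-255 range, and the name `image_function` are assumptions made for illustration only.

```python
# A hypothetical 3x4 grayscale image: each number is a brightness value
# (a 0-255 range is assumed here).
IMAGE = [
    [10, 20, 30, 40],      # row y = 0
    [50, 60, 70, 80],      # row y = 1
    [90, 100, 110, 120],   # row y = 2
]

def image_function(x, y, image=IMAGE):
    """Return the brightness stored at column x, row y."""
    return image[y][x]

print(image_function(2, 1))  # brightness at x = 2, y = 1
```

At this level the representation is nothing more than the captured data, indexed by position.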

High-level Representations

High-level: incorporate knowledge about the world external to the image
- Image may be mapped to a formalized model of the world (model may change dynamically as new information becomes available)
- Data to be processed is dramatically reduced: instead of dealing with pixel values, deal with features such as shape, size, relationships, etc.
- Usually expressed in symbolic form
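The data reduction described above can be sketched in Python as follows. The binary image, the function name `describe_region`, and the choice of features (area, width, height, centroid) are all hypothetical; real systems use far richer descriptions.

```python
# Hypothetical binary image: 1 marks pixels that belong to an object,
# 0 marks background. That is 24 pixel values in total.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def describe_region(image):
    """Summarize the object pixels as a handful of named features."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "area": len(coords),
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
    }

features = describe_region(IMAGE)
print(features)
```

The feature dictionary is a small symbolic summary that later stages can reason about without touching the pixels again.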

Low-level Mechanisms

Low-level vision only takes us to the sophistication of a very expensive digital camera

High-level Mechanisms

High-level vision and perception require brain functions that we do not fully understand yet

High-level Mechanisms

Image from https://plus.google.com/107117483540235115863/posts/MBtyGRBvwkH

Bottom-up or Top-down?

[Figure: two diagrams of information flow, labeled “Top-Down?” and “Bottom-up?”.]

Visual Completion:

Top-down Control

Expectation and Learning

From Palmer (1999)

Occlusion Illusion

Which semi-circle appears larger?

The Human Visual System

Optical information from the eyes is transmitted to the primary visual cortex in the occipital lobe at the back of the head.


- 20 mm focal length lens
- iris controls amount of light entering eye by changing the size of the pupil

Light enters the eye through the cornea, aqueous humor, lens, and vitreous humor before striking the light-sensitive receptors of the retina.

After striking the retina, light is converted into electrochemical signals that are carried to the brain via the optic nerve.


image from www.photo.net/photo/edscott/vis00010.htm


From Palmer (1999)

The distribution of rods and cones across the retina is highly uneven

The fovea contains the highest concentration of cones for high visual acuity

How much do we really see?

+

How much do we really see?

+ If you can read this you must be cheating

Change Blindness

Lack of attention to an object causes failure to perceive it

People find it difficult to detect major changes in a scene if those changes occur in objects that are not the focus of attention

Our impression that our visual capabilities give us a rich, complete, and detailed representation of the world around us is a grand illusion!

Center-Surround Organization

The receptive field of a neuron in the retina can be described as having a center-surround organization. When light covers the receptive field uniformly, a random pattern of action potentials results. However, if light activates only the central part of the receptive field and not the surrounding area, the firing rate rises above this random baseline, and the neuron is said to have an on-center/off-surround organization. In this case, light activating only the inhibitory surround causes a significant decrease in the firing rate. A neuron exhibiting the opposite pattern of activation is said to have an off-center/on-surround organization.
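The behavior described above can be illustrated with a deliberately simple model. Everything in the sketch is an assumption: the 3x3 receptive field, the weights (+1 for the center, -1/8 for each surround pixel), the baseline rate of 10, and the treatment of the random baseline firing as a constant.

```python
BASELINE = 10.0  # hypothetical spontaneous firing rate (spikes per second)

def firing_rate(patch, on_center=True):
    """Firing rate of a model neuron for a 3x3 patch of light intensities.

    The center pixel has weight +1 and each of the 8 surround pixels has
    weight -1/8, so uniform light cancels out. An off-center cell uses the
    opposite signs.
    """
    center = patch[1][1]
    surround = sum(sum(row) for row in patch) - center
    drive = center - surround / 8.0
    if not on_center:
        drive = -drive
    return max(0.0, BASELINE + drive)  # a firing rate cannot be negative

uniform = [[4, 4, 4], [4, 4, 4], [4, 4, 4]]
center_only = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
surround_only = [[4, 4, 4], [4, 0, 4], [4, 4, 4]]

for name, patch in [("uniform", uniform),
                    ("center only", center_only),
                    ("surround only", surround_only)]:
    print(name, firing_rate(patch), firing_rate(patch, on_center=False))
```

In this model, uniform light leaves both cell types at baseline, while center-only and surround-only light push the on-center and off-center cells in opposite directions.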

Center-Surround Organization

[Figure: stimulus and response for an on-center/off-surround cell and for an off-center/on-surround cell.]

Center-Surround Organization and Contrast Sensitivity

[Figure: contrast (low to high) plotted against spatial frequency in cycles per degree (1, 10, 100).]

Lateral Inhibition

[Figure: input light level (axis ticks at 10 and 5) and the resulting output perception.]

Lateral Inhibition

A biological neural network in which neurons inhibit spatially neighboring neurons. Architecture of first few layers of retina.

Each receptor excites its own output cell (+1) and inhibits each neighboring output cell (-0.2).

Input light level    10         10         10         5         5         5
Output sum           10-2-2=6   10-2-2=6   10-2-1=7   5-2-1=2   5-1-1=3   5-1-1=3
Output perception    6          6          7          2         3         3
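The sums on this slide (10-2-2, 10-2-1, 5-2-1, 5-1-1) can be reproduced with a short Python sketch. The function name `lateral_inhibition` is hypothetical, and the handling of the two end cells (an outside neighbor with the same input) is an assumption chosen because it matches the 10-2-2 and 5-1-1 sums.

```python
def lateral_inhibition(inputs, inhibition=0.2):
    """Output = own input (+1) minus `inhibition` times each neighbor's input.

    Cells at either end are assumed to have an outside neighbor that
    receives the same input as they do.
    """
    padded = [inputs[0]] + list(inputs) + [inputs[-1]]
    return [
        padded[i] - inhibition * padded[i - 1] - inhibition * padded[i + 1]
        for i in range(1, len(padded) - 1)
    ]

outputs = lateral_inhibition([10, 10, 10, 5, 5, 5])
print([round(value, 6) for value in outputs])
```

The output rises just before the step from 10 to 5 and dips just after it, so the edge is exaggerated relative to the input.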

Lateral Inhibition

[Figure: two patterns of excitatory (+) and inhibitory (-) connections, labeled “Lots of inhibition” and “Not much inhibition”.]

Less lateral inhibition in the fovea as compared to the periphery?

Simultaneous Contrast

Two regions that have identical spectra result in different color (lightness) perceptions due to the spectra of the surrounding regions

Background color can visibly affect the perceived color of the target


Profile: light intensity (0 to 10) against horizontal position, for the left square and the right square.

                          left square                    right square
Light intensity           5  10  10   5   5  10  10      0   0   5   5   0   0   5
Excitation (+1)           5  10  10   5   5  10  10      0   0   5   5   0   0   5
Left inhibition (-0.2)   -1  -1  -2  -2  -1  -1  -2     -2   0   0  -1  -1   0   0
Right inhibition (-0.2)  -2  -2  -1  -1  -2  -2   0      0  -1  -1   0   0  -1  -1
Output (Sum)              2   7   7   2   2   7   8     -2  -1   4   4  -1  -1   4
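The rows of this profile can be recomputed one at a time. The sketch below is an illustration only; the variable names are hypothetical, and positions beyond either end are assumed to repeat the end value, which is what the first and last columns imply.

```python
INTENSITY = [5, 10, 10, 5, 5, 10, 10, 0, 0, 5, 5, 0, 0, 5]
WEIGHT = 0.2  # inhibition strength per neighbor, as in the profile

# Neighbors beyond either end are assumed to repeat the end value.
left_neighbors = [INTENSITY[0]] + INTENSITY[:-1]
right_neighbors = INTENSITY[1:] + [INTENSITY[-1]]

excitation = list(INTENSITY)
left_inhibition = [-round(WEIGHT * v) for v in left_neighbors]
right_inhibition = [-round(WEIGHT * v) for v in right_neighbors]
output = [e + l + r
          for e, l, r in zip(excitation, left_inhibition, right_inhibition)]

print(left_inhibition)
print(right_inhibition)
print(output)

# Index 3 is gray (5) on the light background of the left square;
# index 9 is the same gray on the dark background of the right square.
print(output[3], output[9])
```

The same gray input produces a lower output next to the light background than next to the dark one, which is the direction of the simultaneous contrast effect.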

Simultaneous Contrast?

According to simultaneous contrast theory, the gray cross on the left should appear lighter than the cross on the right, because it is surrounded by dark squares. Instead, it appears darker. Could it be because we prefer to see a gray square floating over a white (black) background, rather than a cross?

Lightness Constancy

Indoors - 100 units of light total. White paper reflects 90 units, and black ink reflects 10 units.

Outdoors - 10,000 units of light total. White paper reflects 9000 units, and black ink reflects 1000 units.

Why does the black ink outside (1000 units reflected) look darker than the white page does indoors (only 90 units reflected)?
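One common account of this puzzle (an interpretation added here, not stated on the slide) is that perceived lightness follows the proportion of light a surface reflects rather than the absolute amount. The Python sketch below computes that proportion from the numbers above; the function name `reflectance` is hypothetical.

```python
def reflectance(reflected, illumination):
    """Fraction of the incoming light that a surface reflects."""
    return reflected / illumination

indoors = {"paper": reflectance(90, 100), "ink": reflectance(10, 100)}
outdoors = {"paper": reflectance(9000, 10000), "ink": reflectance(1000, 10000)}

print(indoors)
print(outdoors)
# Ratio of paper to ink in each setting.
print(90 / 10, 9000 / 1000)
```

Under this reading, the paper and the ink keep the same fractions, and the same paper-to-ink ratio, in both settings even though the absolute amounts differ by a factor of 100.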

Color Vision

The objective description of color is that it is the visible portion of the electromagnetic spectrum.


Color Vision

“The rays to speak properly are not colored ... Colors in the object are nothing but a disposition to reflect this or that sort of rays more copiously than the rest.”

- Sir Isaac Newton, 1666

Color Vision

The physical description of color is that it is the spectral response of three types of cones.

[Figure: response curves of the S-cones, M-cones, and L-cones along a spectrum labeled violet, blue, green, yellow, orange, red.]

Color Vision

The psychological description of color is that it is a point in a three-dimensional color space.

[Figure: color space with axes hue, lightness, and saturation.]
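As a rough illustration of describing a color by hue, lightness, and saturation, the sketch below uses the HLS conversion in Python's standard `colorsys` module. HLS is a simple computer-graphics model and is used here only as a stand-in; it is an assumption of this example, not the perceptual color space the slide refers to. The helper name `describe` is hypothetical.

```python
import colorsys

def describe(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, lightness, saturation)."""
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    return round(hue * 360), round(lightness, 3), round(saturation, 3)

print(describe(1.0, 0.0, 0.0))  # pure red
print(describe(0.5, 0.5, 0.5))  # mid gray: no saturation
```

Each color becomes a single point with three coordinates, which is the sense in which the space is three-dimensional.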

Theories of Color Vision

- Trichromatic theory: Palmer (1777), Sir Thomas Young (1802), Maxwell (1855), Helmholtz (1867/1927)
- Opponent Process theory: Hering (1867/1964)
- Dual Process theory: von Kries (1905), Müller and Schrödinger (1920s), Hurvich and Jameson (1957)

Trichromatic Theory

The pattern of activation across the three receptor types determines the perceived color

Evidence in support of theory
- 3 colors are sufficient to match any color
- Explains color blindness

Opponent Process Theory

The three receptor types define a polarity between red/green, blue/yellow, and black/white

Evidence in support of theory
- Color experiences are always lost in certain pairs: red/green or blue/yellow
- Yellow seems to be a primary color, not a mixture of other colors

Opponent Process Theory

Dual Process Theory

2 stages: a trichromatic stage, followed by an opponent process stage

Evidence in support of theory
- The amount of “blueness” in any given light can be measured by mixing it with enough “yellow” light to neutralize the blueness (the resulting light looks neither blue nor yellow)

Reparameterizing Color Space

[Figure: three color spaces running from physical to psychological: photoreceptor responses (S, M, and L cones); color-opponent space (black/white, yellow/blue, red/green); and hue, saturation, lightness.]

Spatial Frequency Analysis

http://www.billcasselman.com/sinewave.gif

Period T = 1 Cycle

Frequency f = 1/T

Amplitude


Four Cycles of a Single Sinusoidal Wave in the Spatial Domain, Orientation = 0 degrees
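The relationship between period, frequency, and amplitude can be sketched in Python. The specific values (amplitude 1, period 0.25 over a unit length, 64 samples) are assumptions chosen so that the wave completes four cycles, as in the figure caption.

```python
import math

AMPLITUDE = 1.0
PERIOD = 0.25             # T: length of one cycle (hypothetical units)
FREQUENCY = 1.0 / PERIOD  # f = 1 / T

def sinusoid(x):
    """Value of the sine wave at spatial position x."""
    return AMPLITUDE * math.sin(2.0 * math.pi * FREQUENCY * x)

# Sample one unit of length at the midpoints of 64 equal steps.
N = 64
samples = [sinusoid((i + 0.5) / N) for i in range(N)]

# Each cycle crosses zero in the upward direction exactly once
# (the last sample wraps around to the first).
upward = sum(1 for a, b in zip(samples, samples[1:] + samples[:1]) if a < 0 <= b)
print(FREQUENCY, upward)
```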


An image consists of the summation of a very large number of sine waves of varying amplitude, frequency, orientation, and phase
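The idea that a signal is a sum of sinusoids can be checked numerically. The sketch below works on a single hypothetical row of pixel values rather than a full image, so orientation does not appear; it uses a plain discrete Fourier transform written out with the standard library. The function names and pixel values are assumptions for illustration.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier transform of a list of real values."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

def sum_of_sinusoids(spectrum):
    """Rebuild the signal by adding one sinusoid per frequency k."""
    n = len(spectrum)
    signal = [0.0] * n
    for k, coeff in enumerate(spectrum):
        amplitude = abs(coeff) / n    # how strong this sinusoid is
        phase = cmath.phase(coeff)    # where its cycle starts
        for x in range(n):
            signal[x] += amplitude * math.cos(2 * math.pi * k * x / n + phase)
    return signal

row = [10, 50, 200, 120, 30, 90, 60, 0]  # hypothetical brightness values
rebuilt = sum_of_sinusoids(dft(row))
print([round(value, 6) for value in rebuilt])
```

Each frequency contributes one sinusoid with its own amplitude and phase, and adding all of them returns the original row.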

Lena image from http://www.ece.rice.edu/~wakin/images/lenaTest3.jpg


Many Sinusoidal Waves in the Frequency Domain

[Figure: amplitude spectrum (amplitude against increasing frequency) and phase spectrum (phase against increasing frequency).]


[Figure: panels labeled Amplitude Lena, Amplitude Peppers, Phase Lena, and Phase Peppers.]

Phase Wins!

Image by: Thomas Kinsman, CIS
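A one-dimensional stand-in for this experiment can be sketched in Python. The two signals below are hypothetical (a wide bar on the left and a single bright spot on the right), and the helper names are assumptions; the sketch combines the amplitude spectrum of one with the phase spectrum of the other and checks where the result peaks.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform, keeping the real part of each value."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * x / n)
                for k in range(n)).real / n
            for x in range(n)]

def hybrid(amplitude_source, phase_source):
    """Signal with the amplitudes of one input and the phases of the other."""
    amplitudes = [abs(c) for c in dft(amplitude_source)]
    phases = [cmath.phase(c) for c in dft(phase_source)]
    return idft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])

bar_at_left = [1, 1, 1] + [0] * 13           # wide bar covering x = 0..2
spot_at_right = [0] * 11 + [1] + [0] * 4     # single bright spot at x = 11

result = hybrid(bar_at_left, spot_at_right)
peak = max(range(len(result)), key=lambda x: result[x])
print(peak)
```

In this toy case the brightest point of the hybrid sits where the phase source had its spot, not where the amplitude source had its bar, which is the sense in which phase carries the positional structure.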

Aliasing
- High-frequency information can be perceived as low-frequency information if the sampling rate is too low
- Applies to temporal as well as spatial frequencies
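The effect can be demonstrated with two cosines sampled at the same rate. The numbers are hypothetical: a sampling rate of 10 samples per unit length, a high frequency of 9 cycles per unit length, and a low frequency of 1.

```python
import math

SAMPLE_RATE = 10.0  # samples per unit length (hypothetical)
HIGH_FREQ = 9.0     # cycles per unit length, above SAMPLE_RATE / 2
LOW_FREQ = 1.0      # the low frequency it is mistaken for

def sample(freq, n_samples=20, rate=SAMPLE_RATE):
    """Sample a cosine of the given frequency at the given rate."""
    return [math.cos(2 * math.pi * freq * n / rate) for n in range(n_samples)]

high = sample(HIGH_FREQ)
low = sample(LOW_FREQ)
max_diff = max(abs(h - l) for h, l in zip(high, low))
print(max_diff)  # the two sets of samples agree up to rounding error
```

At this sampling rate the samples alone cannot tell the two waves apart.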

http://www.svi.nl/wikiimg/StFargeaux_kasteel_buiten1_aliased.jpg

Aliasing

A function in the spatial domain, f(x), is band-limited if it has a highest frequency s0 in the frequency domain, F(s).

[Figure: f(x) plotted against x in the spatial domain; F(s) plotted against s in the frequency (spectral) domain, extending from -s0 to s0.]

Aliasing

If we sample f(x) at equal intervals, Δx, we get multiple copies of the spectrum in the frequency domain:

Multiplying f(x) by a sampling (delta, or “spike”) function is equivalent to a convolution of F(s) with the Fourier transform of the sampling function. This is known as the Convolution Theorem.

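[Figure: g(x) plotted against x in the spatial domain; G(s) plotted against s, with axis marks at -s0, s0, -1/Δx, and 1/Δx, where Δx is the sampling interval.]

g(x) is the sampled function.
G(s) is the sampled function in the frequency domain.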

Aliasing

Can we recover the original function intact from the sample points? In other words, can we recover F(s) from G(s)?
- Yes, if we eliminate all of the replicas of F(s), except the central one.
- To do this, multiply G(s) by a window function, or convolve the sampled function g(x) with an interpolation function, sinc (x) = sin (x) / x
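Convolving the samples with a sinc function can be sketched directly in Python. The test signal (a cosine of frequency 1), the sampling interval of 0.125, and the 401 samples are assumptions for illustration. Because only a finite number of samples is used, the result is an approximation of the exact reconstruction.

```python
import math

def sinc(x):
    """sinc(x) = sin(x) / x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def reconstruct(samples, spacing, first_index, x):
    """Estimate f(x) from samples taken at n * spacing, n = first_index, ..."""
    return sum(
        value * sinc(math.pi * (x - (first_index + i) * spacing) / spacing)
        for i, value in enumerate(samples)
    )

FREQ = 1.0       # hypothetical highest frequency of the test signal
SPACING = 0.125  # sampling interval, well below 1 / (2 * FREQ) = 0.5

def f(x):
    return math.cos(2 * math.pi * FREQ * x)

samples = [f(n * SPACING) for n in range(-200, 201)]
x = 0.3  # a position that falls between two sample points
estimate = reconstruct(samples, SPACING, -200, x)
print(f(x), estimate)
```

The sinc argument is scaled by pi divided by the sampling interval so that each shifted sinc equals 1 at its own sample and 0 at every other sample.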

Aliasing

Restrictions:
- f(x) must be band-limited (have a highest frequency) at s0, and:
- The relationship between the sampling interval, Δx, and the band-limit, s0, must be: Δx ≤ 1 / (2 • s0)

A function sampled at a uniform spacing, Δx, can be completely recovered from the samples, provided that Δx ≤ 1 / (2 • s0), where the function is band-limited at s0.

Aliasing

Under-sampling
- Suppose that the sampling interval, Δx, is too large: Δx > 1 / (2 • s0)
- Then the replicas will overlap and sum together

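[Figure: g(x) plotted against x; G(s) plotted against s with overlapping replicas, with axis marks at -s0, s0, -1/Δx, and 1/Δx, where Δx is the sampling interval.]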

Energy above the frequency s0 is “folded back” below s0, making the high-frequency components appear to be low-frequency. This is known as aliasing.
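The folding can be expressed as a small calculation. The sketch below is an illustration under assumptions: the function name `apparent_frequency` is hypothetical, the input is a single sinusoid, and the fold is taken around half the sampling rate, which is the highest frequency a given sampling interval can represent.

```python
def apparent_frequency(true_freq, spacing):
    """Frequency a sinusoid of `true_freq` appears to have after sampling."""
    sample_rate = 1.0 / spacing
    half_rate = sample_rate / 2.0
    folded = true_freq % sample_rate  # position within one replica period
    return sample_rate - folded if folded > half_rate else folded

SPACING = 0.125  # hypothetical sampling interval: 8 samples per unit length

print(apparent_frequency(3.0, SPACING))  # below 4: unchanged
print(apparent_frequency(7.0, SPACING))  # above 4: folded back
```

With 8 samples per unit length, a frequency of 3 is preserved, while a frequency of 7 shows up as 1.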
