Image Understanding
Introduction
Computer vision
- Give machines the ability to see
- The goal is to duplicate the effect of human visual processing
- We live in a 3-D world, but camera sensors can only capture 2-D information.
- Computer vision is the “flip” side of computer graphics, but much harder!
Introduction
Computer vision is composed of:
- Image processing
- Image analysis
- Image understanding
Introduction
Image processing
- The goal is to present the image to the system in a useful form
  - image capture and early processing
  - remove noise
  - detect luminance differences
  - detect edges
  - enhance image
Introduction
Image analysis
- The goal is to extract useful information from the processed image
  - identify boundaries
  - find connected components
  - label regions
  - segment parts of objects
  - group parts together into whole objects
Introduction
Image understanding
- The goal is to make sense of the information.
- Draw qualitative, or semantic, conclusions from the quantitative information.
  - make a decision about the quantitative information
  - classify the parts
  - recognize objects
  - understand the objects’ usage and the meaning of the scene
Introduction
Image understanding uses techniques and methods from:
- Physics: models of the visual world
- Mathematics: statistics and differential calculus
- Spatial pattern recognition
- Artificial intelligence
- Psychophysics
Robot: Count all the Chairs
Source: Bülthoff, Max Planck Institute for Biological Cybernetics (MPIK), Tübingen, Germany
http://www.ercim.org/publication/Ercim_News/enw53/christensen.html
Robot: Which One is a Car?
The Importance of Context
Are these letters “A” or “H”?
The Importance of Context
Hwo nmya wrdso cna oyu erad ni ihts entsnece?
Low-level Representations
Low-level: little knowledge about the world
- The data that is manipulated usually resembles the data that is captured.
- For example, if the image is captured using a CCD camera (2-D), the representation can be described by an image function f(x, y) whose value is the brightness at the location given by its two parameters, the x-y coordinates.
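As a minimal sketch of such an image function, the snippet below stores a tiny grayscale image as a list of rows and looks up brightness by x-y coordinate. The pixel values and the names `IMAGE` and `brightness` are illustrative assumptions, not part of the original slides.

```python
# A tiny 4x3 grayscale "image": each inner list is one row (y),
# each entry one column (x). Values are made up (0 = black, 255 = white).
IMAGE = [
    [0,    0,  64, 255],
    [0,  128, 128, 255],
    [32, 128, 200, 255],
]

def brightness(image, x, y):
    """Image function f(x, y): brightness at column x, row y."""
    return image[y][x]

width, height = len(IMAGE[0]), len(IMAGE)
center_value = brightness(IMAGE, 2, 1)  # column 2 of row 1
```

At this level the representation is just the captured data: nothing in it says what the pixels depict.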
High-level Representations
High-level: incorporate knowledge about the world external to the image
- Image may be mapped to a formalized model of the world (model may change dynamically as new information becomes available)
- Data to be processed is dramatically reduced: instead of dealing with pixel values, deal with features such as shape, size, relationships, etc.
- Usually expressed in symbolic form
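The data reduction described above can be sketched as follows: a binary mask of pixels is collapsed into a handful of symbolic features. The mask, the feature names, and the function `describe_region` are hypothetical choices made for illustration, and the sketch assumes the mask contains at least one object pixel.

```python
# Binary image: 1 marks pixels of one object, 0 is background (made-up data).
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def describe_region(mask):
    """Reduce a binary mask to a small symbolic description of the object."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    if width > height:
        shape = "wide"
    elif height > width:
        shape = "tall"
    else:
        shape = "square"
    return {
        "area": len(pixels),                                  # size
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)), # extent
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),   # location
        "shape": shape,                                       # symbolic label
    }

features = describe_region(MASK)
```

Twenty-four pixel values become four features, one of them purely symbolic.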
Low-level Mechanisms
Low-level vision only takes us to the sophistication of a very expensive digital camera
High-level Mechanisms
High-level vision and perception require brain functions that we do not fully understand yet
High-level Mechanisms
Image from https://plus.google.com/107117483540235115863/posts/MBtyGRBvwkH
Bottom-up or Top-down?
[Figure: top-down and bottom-up processing, each with an arrow labeled “Information flow”]
Visual Completion:
Top-down Control
Expectation and Learning
From Palmer (1999)
Occlusion Illusion
Which semi-circle appears larger?
The Human Visual System
Optical information from the eyes is transmitted to the primary visual cortex in the occipital lobe at the back of the head.
The Human Visual System
- 20 mm focal length lens
- iris controls the amount of light entering the eye by changing the size of the pupil
Light enters the eye through the cornea, aqueous humor, lens, and vitreous humor before striking the light-sensitive receptors of the retina.
After striking the retina, light is converted into electrochemical signals that are carried to the brain via the optic nerve.
The Human Visual System
image from www.photo.net/photo/edscott/vis00010.htm
The Human Visual System
From Palmer (1999)
The distribution of rods and cones across the retina is highly uneven
The fovea contains the highest concentration of cones for high visual acuity
How much do we really see?
+
How much do we really see?
+ If you can read this you must be cheating
Change Blindness
Lack of attention to an object causes failure to perceive it
People find it difficult to detect major changes in a scene if those changes occur in objects that are not the focus of attention
Our impression that our visual capabilities give us a rich, complete, and detailed representation of the world around us is a grand illusion!
Center-Surround Organization
The receptive field of a neuron in the retina can be described as having a center-surround organization. When light covers the receptive field uniformly, a random pattern of action potentials results. However, if light activates only the central part of the receptive field and not the surrounding area, the firing rate rises above this random (baseline) response, and the neuron is said to have an on-center/off-surround organization. In this case, light activating only the inhibitory surround causes a significant decrease in the firing rate. A neuron exhibiting the opposite pattern of activation is said to have an off-center/on-surround organization.
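The three cases above (uniform light, center only, surround only) can be illustrated with a toy linear model. This is only a sketch: the function name `firing_rate`, the baseline of 10 spikes per second, and the gain of 20 are assumptions chosen for illustration, not measured values.

```python
def firing_rate(center_light, surround_light, on_center=True,
                baseline=10.0, gain=20.0):
    """Toy firing rate (spikes/s) of a center-surround cell.

    center_light and surround_light are mean light levels in [0, 1].
    baseline and gain are illustrative assumptions.
    """
    drive = center_light - surround_light     # excitation minus inhibition
    if not on_center:
        drive = -drive                        # off-center/on-surround flips the sign
    return max(0.0, baseline + gain * drive)  # a firing rate cannot be negative

uniform = firing_rate(1.0, 1.0)        # light everywhere: baseline firing
center_only = firing_rate(1.0, 0.0)    # elevated response
surround_only = firing_rate(0.0, 1.0)  # suppressed response
```

Passing `on_center=False` reverses the last two outcomes, which corresponds to the off-center/on-surround cell.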
Center-Surround Organization
[Figure: stimulus and response for on-center/off-surround and off-center/on-surround cells]
Center-Surround Organization and Contrast Sensitivity
[Figure: contrast (high to low) plotted against spatial frequency (1, 10, 100 cycles per degree)]
Lateral Inhibition
[Figure: input light level stepping from 10 to 5, with the corresponding output perception]
Lateral Inhibition
A biological neural network in which neurons inhibit spatially neighboring neurons. This is the architecture of the first few layers of the retina.
Each receptor excites its own output cell (+1) and inhibits the two neighboring output cells (-0.2 each).

Input light level    10       10       10       5        5        5
Computation          10-2-2   10-2-2   10-2-1   5-2-1    5-1-1    5-1-1
Output perception    6        6        7        2        3        3
Lateral Inhibition
[Figure: regions receiving lots of inhibition versus not much inhibition]
Less lateral inhibition in the fovea as compared to the periphery?
Simultaneous Contrast
Two regions that have identical spectra result in different color (lightness) perceptions due to the spectra of the surrounding regions
Background color can visibly affect the perceived color of the target
Simultaneous Contrast
[Figure: profile of light intensity (0, 5, 10) against horizontal position across the left square and the right square]
Simultaneous Contrast
Light intensity profile across the left square and the right square, with lateral inhibition applied:

Light intensity            5  10  10   5   5  10  10   0   0   5   5   0   0   5
Excitation (+1)            5  10  10   5   5  10  10   0   0   5   5   0   0   5
Left inhibition (-0.2)    -1  -1  -2  -2  -1  -1  -2  -2   0   0  -1  -1   0   0
Right inhibition (-0.2)   -2  -2  -1  -1  -2  -2   0   0  -1  -1   0   0  -1  -1
Output (Sum)               2   7   7   2   2   7   8  -2  -1   4   4  -1  -1   4
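The rows of this table can be recomputed with a few lines, which makes the source of each number explicit. The light level assumed just outside each end of the profile (5) is inferred from the first and last inhibition values on the slide; the variable names are illustrative.

```python
WEIGHT = 0.2   # inhibition strength per neighbor, as on the slide
OUTSIDE = 5    # level assumed beyond both ends (inferred from the end columns)

intensity = [5, 10, 10, 5, 5, 10, 10, 0, 0, 5, 5, 0, 0, 5]
padded = [OUTSIDE] + intensity + [OUTSIDE]

excitation = intensity[:]                              # +1 times own input
left_inhibition = [-WEIGHT * v for v in padded[:-2]]   # from the left neighbor
right_inhibition = [-WEIGHT * v for v in padded[2:]]   # from the right neighbor
output = [
    round(e + l + r, 6)
    for e, l, r in zip(excitation, left_inhibition, right_inhibition)
]
```

The same gray level of 5 produces an output of 2 where its neighbors are bright (10) and 4 where its neighbors are dark (0), which is the simultaneous contrast effect in numbers.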
Simultaneous Contrast?
According to simultaneous contrast theory, the gray cross on the left should appear lighter than the cross on the right, because it is surrounded by dark squares. Instead, it appears darker. Could it be because we prefer to see a gray square floating over a white (black) background, rather than a cross?
Lightness Constancy
Indoors - 100 units of light total. White paper reflects 90 units, and black ink reflects 10 units.
Outdoors - 10,000 units of light total. White paper reflects 9000 units, and black ink reflects 1000 units.
Why does the black ink outside (1000 units reflected) look darker than the white page does indoors (only 90 units reflected)?
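The slide leaves this question open. One reading, offered here as an assumption rather than as the author's stated answer, is that perceived lightness tracks the fraction of the available light a surface reflects, not the absolute amount. The sketch below computes that fraction from the numbers on the slide; the function name `reflectance` is illustrative.

```python
def reflectance(reflected, incident):
    """Fraction of the incoming light that a surface sends back."""
    return reflected / incident

indoor_white = reflectance(90, 100)         # white paper indoors
outdoor_black = reflectance(1000, 10_000)   # black ink outdoors
indoor_black = reflectance(10, 100)         # black ink indoors
```

Under this assumption the ink has the same value indoors and outdoors, and it is always far below the paper, even though the outdoor ink reflects more light in absolute terms.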
Color Vision
The objective description of color is that it is the visible portion of the electromagnetic spectrum.
image from www.photo.net/photo/edscott/vis00010.htm
Color Vision
“The rays to speak properly are not colored ... Colors in the object are nothing but a disposition to reflect this or that sort of rays more copiously than the rest.”
- Sir Isaac Newton, 1666
Color Vision
The physical description of color is that it is the spectral response of three types of cones: S-cones, M-cones, and L-cones.
[Figure: spectral sensitivity of the S-, M-, and L-cones across violet, blue, green, yellow, orange, and red]
image from www.photo.net/photo/edscott/vis00010.htm
Color Vision
The psychological description of color is that it is a point in a three-dimensional color space with axes hue, lightness, and saturation.
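A color as a point with hue, lightness, and saturation coordinates can be sketched with Python's standard `colorsys` module. Using the HLS transform of RGB values as a stand-in for the perceptual color space is an assumption made for illustration, and the wrapper `describe_color` is a hypothetical helper.

```python
import colorsys

def describe_color(r, g, b):
    """Map RGB components in [0, 1] to (hue in degrees, lightness, saturation)."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note the H, L, S return order
    return h * 360.0, l, s

pure_red = describe_color(1.0, 0.0, 0.0)
mid_gray = describe_color(0.5, 0.5, 0.5)
```

Pure red sits at full saturation and middle lightness, while mid gray has the same lightness but zero saturation, so its hue is arbitrary.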
Theories of Color Vision
Trichromatic theory
- Palmer (1777), Sir Thomas Young (1802), Maxwell (1855), Helmholtz (1867/1927)
Opponent Process theory
- Hering (1867/1964)
Dual Process theory
- von Kries (1905), Müller and Schrödinger (1920s), Hurvich and Jameson (1957)
Trichromatic Theory
The pattern of activation across the three receptor types determines the perceived color
Evidence in support of theory
- 3 colors are sufficient to match any color
- Explains color blindness
Opponent Process Theory
The three receptor types define a polarity between red/green, blue/yellow, and black/white
Evidence in support of theory
- Color experiences are always lost in certain pairs: red/green or blue/yellow
- Yellow seems to be a primary color, not a mixture of other colors
Dual Process Theory
2 stages: a trichromatic stage, followed by an opponent process stage
Evidence in support of theory
- The amount of “blueness” in any given light can be measured by mixing it with enough “yellow” light to neutralize the blueness (the resulting light looks neither blue nor yellow)
Reparameterizing Color Space
[Figure: three parameterizations of color space, running from physical to psychological]
- Photoreceptor responses: S cones, M cones, L cones
- Color-opponent space: Red/Green, Blue/Yellow, Black/White
- Hue, saturation, lightness
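The step from photoreceptor responses to a color-opponent space can be sketched as a simple recombination of the three cone signals. The equal weights below and the function name `to_opponent` are illustrative assumptions, not physiological values or the slide author's formulas.

```python
def to_opponent(l_cone, m_cone, s_cone):
    """Toy second stage: recombine cone responses into three opponent channels.

    The equal weights are illustrative assumptions.
    """
    red_green = l_cone - m_cone                     # > 0 reddish, < 0 greenish
    blue_yellow = s_cone - (l_cone + m_cone) / 2.0  # > 0 bluish, < 0 yellowish
    black_white = (l_cone + m_cone) / 2.0           # overall lightness
    return red_green, blue_yellow, black_white

yellowish = to_opponent(1.0, 1.0, 0.0)  # L and M active, S silent
neutral = to_opponent(1.0, 1.0, 1.0)    # adding S cancels the yellow signal
```

In this toy model, adding short-wavelength input to a yellowish light drives the blue/yellow channel to zero, the kind of cancellation that the dual process evidence describes.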
Spatial Frequency Analysis
http://www.billcasselman.com/sinewave.gif
Period T = 1 Cycle
Frequency f = 1/T
Amplitude
Spatial Frequency Analysis
Four Cycles of a Single Sinusoidal Wave in the Spatial Domain, Orientation = 0 degrees
Spatial Frequency Analysis
An image consists of the summation of a very large number of sine waves of varying amplitude, frequency, orientation, and phase
Lena image from http://www.ece.rice.edu/~wakin/images/lenaTest3.jpg
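For a single image row, the decomposition into sine waves can be sketched with a direct discrete Fourier transform. The example builds a row containing one sinusoid with four cycles and recovers its frequency, amplitude, and phase. The row length, amplitude, and phase values are illustrative assumptions, and the O(N^2) `dft` helper is written out only to keep the sketch free of external libraries.

```python
import cmath
import math

def dft(signal):
    """Direct discrete Fourier transform, O(N^2); fine for short signals."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]

N = 32                # pixels in the row (illustrative)
CYCLES = 4            # cycles across the row
AMPLITUDE = 3.0       # illustrative value
PHASE = math.pi / 3   # illustrative value

row = [AMPLITUDE * math.cos(2 * math.pi * CYCLES * x / N + PHASE) for x in range(N)]
spectrum = dft(row)

# The factor 2 accounts for the mirrored negative-frequency bin
# (it would be wrong for k = 0, which is zero for this row).
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
peak = max(range(N // 2), key=lambda k: amplitudes[k])
peak_phase = cmath.phase(spectrum[peak])
```

A full image needs the two-dimensional transform, which adds orientation as a fourth parameter alongside amplitude, frequency, and phase.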
Spatial Frequency Analysis
Many Sinusoidal Waves in the Frequency Domain
[Figure: amplitude spectrum; amplitude plotted against increasing frequency]
Spatial Frequency Analysis
Many Sinusoidal Waves in the Frequency Domain
[Figure: phase spectrum; phase plotted against increasing frequency]
Spatial Frequency Analysis
[Figure: grid of images combining the amplitude spectrum of Lena or Peppers with the phase spectrum of Lena or Peppers]
Phase Wins!
Image by: Thomas Kinsman, CIS
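A one-dimensional analogue gives some intuition for why phase dominates. In the sketch below, two signals contain the same bump at different positions, so their amplitude spectra are identical and only their phases differ. Combining the amplitudes of one with the phases of the other reproduces the signal that supplied the phase. This is a deliberately simple special case chosen because the result is exact; the images in the figure differ in both amplitude and phase. All names are illustrative.

```python
import cmath
import math

def dft(signal):
    """Direct discrete Fourier transform, O(N^2)."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * x / n)
                for k in range(n)).real / n
            for x in range(n)]

def combine(amplitude_from, phase_from):
    """Build a signal from one signal's amplitudes and another's phases."""
    a, p = dft(amplitude_from), dft(phase_from)
    hybrid_spectrum = [abs(ak) * cmath.exp(1j * cmath.phase(pk))
                       for ak, pk in zip(a, p)]
    return inverse_dft(hybrid_spectrum)

bump_left = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # bump near x = 2
bump_right = bump_left[-8:] + bump_left[:-8]                   # same bump near x = 10

hybrid = combine(amplitude_from=bump_left, phase_from=bump_right)
```

The hybrid places the bump where the phase donor had it, which suggests that phase carries the information about where structure is located.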
Aliasing
- High-frequency information can be perceived as low-frequency information if the sampling rate is too low
- Applies to temporal as well as spatial frequencies
http://www.svi.nl/wikiimg/StFargeaux_kasteel_buiten1_aliased.jpg
Aliasing
A function in the spatial domain, f(x), is band-limited if it has a highest frequency s0 in the frequency domain, F(s).
[Figure: f(x) in the spatial domain and its spectrum F(s), which lies between -s0 and s0, in the frequency (spectral) domain]
Aliasing
If we sample f(x) at equal intervals, Δ, we get multiple copies of the spectrum in the frequency domain, centered at multiples of 1/Δ.
Multiplying f(x) by a sampling (delta, or “spike”) function is equivalent to a convolution of F(s) with the Fourier transform of the sampling function. This is known as the Convolution Theorem.
[Figure: the sampled function g(x) in the spatial domain and its spectrum G(s), with copies of F(s) (each spanning -s0 to s0) centered at 0, -1/Δ, and 1/Δ]
g(x) is the sampled function
G(s) is the sampled function in the frequency domain
Aliasing
Can we recover the original function intact from the sample points? In other words, can we recover F(s) from G(s)?
- Yes, if we eliminate all of the replicas of F(s), except the central one.
- To do this, multiply G(s) by a window function, or convolve the sampled function g(x) with an interpolation function, sinc(x) = sin(x) / x
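Recovery by convolution with a sinc can be sketched for a finite set of samples. The code uses the normalized form sin(πt) / (πt) evaluated at t = x / interval - n, which is the slide's sin(x) / x with a rescaled argument. The test signal, the sampling interval, and the number of samples are illustrative assumptions, and because only finitely many samples are summed the reconstruction between samples is approximate.

```python
import math

def sinc(t):
    """Normalized sinc, sin(pi t) / (pi t), with the limit value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def reconstruct(samples, first_index, interval, x):
    """Estimate f(x) by summing one shifted sinc per sample."""
    return sum(
        value * sinc(x / interval - (first_index + i))
        for i, value in enumerate(samples)
    )

F0 = 1.0         # the only frequency in the test signal, so s0 = 1 (illustrative)
INTERVAL = 0.1   # sampling interval, well below 1 / (2 * s0) = 0.5
HALF = 200       # samples are taken at n = -200 .. 200

def f(x):
    return math.cos(2 * math.pi * F0 * x)

samples = [f(n * INTERVAL) for n in range(-HALF, HALF + 1)]
between = reconstruct(samples, -HALF, INTERVAL, 0.05)       # halfway between samples
at_sample = reconstruct(samples, -HALF, INTERVAL, 3 * INTERVAL)
```

At a sample location every sinc except one passes through zero, so the sample value is returned almost exactly; between samples, the remaining error comes from truncating the infinite sum.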
Aliasing
Restrictions:
- f(x) must be band-limited (have a highest frequency) at s0, and:
- The relationship between the sampling interval, Δ, and the band-limit, s0, must be: Δ <= 1 / (2 • s0)
A function sampled at a uniform spacing, Δ, can be completely recovered from the samples, provided that Δ <= 1 / (2 • s0), where the function is band-limited at s0.
Aliasing
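Under-sampling
- Suppose that Δ > 1 / (2 • s0), where Δ is the sampling interval
- Then the replicas will overlap and sum together
[Figure: the under-sampled function g(x) and its spectrum G(s), in which the copies of F(s) centered at 0, -1/Δ, and 1/Δ overlap]
Energy above the frequency 1 / (2 • Δ) is “folded back” below it, making the high-frequency components appear to be low-frequency. This is known as aliasing.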
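The folding can be checked numerically: a sinusoid above the limit produces exactly the same samples as a lower-frequency one. The sampling interval and frequencies below are illustrative assumptions, and the helper names `max_sampling_interval` and `apparent_frequency` are hypothetical.

```python
import math

def max_sampling_interval(s0):
    """Largest interval that still satisfies interval <= 1 / (2 * s0)."""
    return 1.0 / (2.0 * s0)

def apparent_frequency(frequency, interval):
    """Frequency that a sinusoid appears to have after sampling at `interval`."""
    rate = 1.0 / interval
    folded = frequency % rate          # spectrum copies repeat every `rate`
    return min(folded, rate - folded)  # fold into the range 0 .. rate / 2

INTERVAL = 0.1   # 10 samples per unit length (illustrative)
HIGH = 9.0       # cycles per unit length, above the limit of 5
LOW = apparent_frequency(HIGH, INTERVAL)

high_samples = [math.cos(2 * math.pi * HIGH * n * INTERVAL) for n in range(20)]
low_samples = [math.cos(2 * math.pi * LOW * n * INTERVAL) for n in range(20)]
```

Once sampled, the 9-cycle wave is indistinguishable from a 1-cycle wave, so no later processing can tell them apart; the only remedies are a smaller sampling interval or removing the high frequencies before sampling.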