Visual Cognition & Visual Routines by Shimon Ullman.


1. Introduction.

2. Requirements for the Analysis of Visual Stimulus.

3. Two Main Stages of Visual Information Processing.

4. Elemental Operations.

5. Conclusions.

• Figure 1. Two circles (A, B) with different diameters. The diameter of circle B is clearly larger than the diameter of circle A.

• Figure 2. The shapes comprising the image in figure (a) are easily perceived as a face. The same shapes are no longer recognized as a face when they are rearranged as in figure (b).

INTRODUCTION.

• The purpose of this paper is to present and examine several operations which may be responsible for determining abstract relations amongst shapes and objects.

• Determining spatial relations involves complex operations, yet our visual system determines them easily and with considerable efficiency; the process seems to occur immediately and without awareness.

• The visual analysis of shape properties and spatial relations is called “visual cognition”.

• Perception of shape properties and spatial relations is achieved by the application of “visual routines” to the early representations.

• Using a fixed set of basic operations, the visual system can assemble different routines and extract a variety of shape properties and spatial relations.

• An explanation of how we determine a particular relation such as “above”, “inside”, “longer-than”, or “touching” would require a specification of the visual routine used to extract the shape or property in question.


Perception of Inside/Outside Relations

• Figure 3. Inside / Outside Relationships.

• Inside/outside relationships allow one to determine whether some object lies within or outside some other object or shape.

• Algorithms developed to compute inside/outside relationships:

• Ray-Intersection / Ray-Tracing Method.

• The “Coloring” Method.


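• Ray-Intersection / Ray-Tracing Method: draw a ray from the target point out to “infinity” and count its intersections with the boundary of the shape.

• Even number of intersections: the target is outside the shape.

• Odd number of intersections: the target is inside the shape.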

• Breakdown of the Ray Intersection Method.
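The even/odd rule above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the paper: it assumes the shape is a simple polygon given as a list of vertices, casts a horizontal ray to the right, and the names `count_crossings` and `is_inside` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def count_crossings(target: Point, polygon: List[Point]) -> int:
    """Count the polygon edges crossed by a horizontal ray from target towards +x."""
    x, y = target
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # The edge straddles the ray's line; the half-open comparison
        # avoids counting a shared vertex twice.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the line through the target
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_at_y > x:
                crossings += 1
    return crossings


def is_inside(target: Point, polygon: List[Point]) -> bool:
    """Odd number of crossings: inside. Even number: outside."""
    return count_crossings(target, polygon) % 2 == 1


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(is_inside((2, 2), square))  # True
print(is_inside((5, 2), square))  # False
```

The sketch relies on the boundary being a single closed polygon, which is exactly the kind of assumption under which the ray method breaks down for more general figures.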

• Coloring.

• Although the coloring method overcomes some of the limitations of the ray-tracing method, it has its own shortcomings, and once again the human visual system is superior. For example, the coloring method fails when the shape is not closed.
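A rough sketch of the coloring idea, and of its failure on an open shape, is given below. It assumes a small character grid in which `#` is boundary and `.` is free space, treats the edge of the grid as a stand-in for "infinity", and assumes the starting cell is free; the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def color_escapes(grid: List[str], start: Tuple[int, int]) -> bool:
    """Spread 'color' from start over free cells; True if it reaches the grid edge."""
    rows, cols = len(grid), len(grid[0])
    colored = {start}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return True  # the color leaked out to the edge of the image
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if grid[nr][nc] == "." and (nr, nc) not in colored:
                colored.add((nr, nc))
                frontier.append((nr, nc))
    return False


def is_inside_by_coloring(grid: List[str], start: Tuple[int, int]) -> bool:
    """The target counts as inside when the colored region stays bounded."""
    return not color_escapes(grid, start)


closed = [
    ".......",
    ".#####.",
    ".#...#.",
    ".#...#.",
    ".#####.",
    ".......",
]
# Same shape with a gap in the right-hand wall
broken = closed[:2] + [".#....."] + closed[3:]

print(is_inside_by_coloring(closed, (2, 3)))  # True
print(is_inside_by_coloring(broken, (2, 3)))  # False: the color leaks through the gap
```

A human observer would still judge the target to be inside the broken shape, which is the shortcoming the slide points out.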

Requirements for the Analysis of Visual Stimulus.

• Abstractness:

• Intuitively, the concept of being “inside” is abstract because it does not refer to any particular shape, but can appear in many different forms.

• Consider a set of shapes S satisfying some property p (e.g., the set of all closed shapes). How can we determine whether a shape B belongs to S?

• Take advantage of any regularities the set S may contain.

• Regularities allow membership testing in S to be broken down into a set of operations that test whether B belongs to S more quickly and efficiently than template matching.

• Establishing abstract shape properties means determining properties and relations over some large set by exploiting any regularities it may contain. According to the author, the human visual system has this ability, which it realizes through a combination of processes that are in turn composed of several basic (“fundamental” or “elemental”) operations. The processes composed of these elemental operations are referred to as visual routines.

• Open-endedness:

• The processes (visual routines) used to determine some property (shape properties and spatial relations) share the same elemental operations; i.e., the elemental operations are not created by a process when needed but are available for any process to use.

• The open-endedness requirement requires that these operations be combined to perform new computations as required by the visual system.

• Complexity:

• Different processes draw on the same elemental operations which, as mentioned above, are not created by a process when needed.

• Furthermore, these elemental operations, which may be operating on different locations, share some of their "resources".

• The application of a visual routine at different spatial locations is restricted and requires sequencing of the elemental operations.

Open Problems so far:

• What are the elemental operations?

• How are the basic elemental operations integrated into meaningful routines?

• How are the visual routines selected and controlled?

• What triggers the execution of different routines during the performance of visual tasks?

• How are new routines generated to meet specific needs?

• How are they stored and modified with practice?

Visual Pathway

Rods and cones.

Disks in rods and cones are the site of the base representation. Disks in cones are pigmented and filter light at different wavelengths for red, green and blue. The rods handle black-and-white vision.

Visual pathways

Once axons leave the optic chiasm, a few fibres project to the midbrain, where they participate in the control of eye movement.

Most axons, however, project to the lateral geniculate body of the thalamus, where the optic fibres synapse onto neurones leading to the visual cortex.

This topographical organisation is maintained in the visual cortex, with the six layers of neurones grouped into vertical columns. Connections between the left and right cortices are made via the corpus callosum, and it is these that give rise to stereoscopic vision from the binocular input from the optic nerve.

Corpus callosum

Two Main Stages of Visual Information Processing

• Bottom-up Processing.

• Top-down Processing.

• Incremental Representations.

• Parallel Processing of Visual Information.

Bottom-up Processing.

• Initially, a base representation is created. This representation provides a local description of the visual input and includes information about objects such as depth, orientation, color, motion, etc., with respect to the viewer (i.e., this representation may change if the viewer's position / orientation changes).

• This representation depends on the visual input only: no higher-level processing is applied to the input, and it does not rely on memory or any cognitive processes.

• Furthermore, this representation remains fixed and is not modified. As a result, a change in the viewer's position / orientation would result in the creation of a new base representation.

Top-down Processing

• Visual routines are applied to the base representation in order to obtain properties and relations between the objects in the base representation.

• This step relies on more than just the visual input: it may use memory and other cognitive processes, and it is applied to only a portion of the base representation.

Top-down Processing

• Incremental Representations

• While the visual routines are performing computations, other "middle" representations are created and modified in order to hold the results the routines produce.

• These representations depend on the particular visual routines applied, so the same input may lead to different incremental representations on different occasions, whereas the same input will always lead to the same base representation.

• Other routines may then take advantage of the incremental representations, allowing them to operate faster and more efficiently.

• As an example of an incremental representation, consider determining whether some target is inside some polygon using, for example, area coloring. If the target is found to be inside the polygon, then it has also been determined that the polygon is closed. The fact that the polygon is closed is stored in the incremental representation, and any other routine that needs to know whether the polygon is closed will not have to apply any elemental operations to determine this.
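One way to picture the polygon example is as a small store of already-established facts that later routines consult before doing any work. The sketch below is only an analogy: the class `IncrementalRepresentation`, its methods, and the shape identifier `"polygon-1"` are all hypothetical, and the tracing routine is a stand-in that simply records that it was called.

```python
from typing import Callable, Dict, List, Tuple


class IncrementalRepresentation:
    """Holds facts established by earlier routines, keyed by (shape id, property)."""

    def __init__(self) -> None:
        self.facts: Dict[Tuple[str, str], bool] = {}

    def record(self, shape_id: str, prop: str, value: bool) -> None:
        """Store a result produced as a by-product of some routine."""
        self.facts[(shape_id, prop)] = value

    def get(self, shape_id: str, prop: str, compute: Callable[[], bool]) -> bool:
        """Return the stored fact, running the routine only if nothing is stored."""
        key = (shape_id, prop)
        if key not in self.facts:
            self.facts[key] = compute()
        return self.facts[key]


calls: List[str] = []


def trace_boundary_for_closure() -> bool:
    """Stand-in for a costly routine; it only logs that it ran."""
    calls.append("trace")
    return True


rep = IncrementalRepresentation()
# An inside test by coloring stayed bounded, so "closed" is recorded as a by-product.
rep.record("polygon-1", "closed", True)

print(rep.get("polygon-1", "closed", trace_boundary_for_closure))  # True
print(calls)  # []  (the tracing routine never had to run)
```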

Top-down Processing

• Parallel Processing of Visual Information

• Is the visual system capable of performing operations in parallel? It appears that it is, and the author agrees. Furthermore, the author defines three different types of parallel processing of which the visual system is capable.

• Spatial: the same operation applied simultaneously at different locations.

• Functional: different operations applied simultaneously at the same location.

• Temporal: the "simultaneous application of different processing stages to different inputs" (known in the field of computer architecture as pipelining).

• Visual routines are capable of exploiting all three kinds of parallelism; however, there appears to be only limited use of spatial parallelism!

Elemental Operations

• Shifting of Processing Focus.

• Indexing.

• Coloring.

• Boundary Tracing and Activation.

• Marking.

Shifting of Processing Focus

• This operation refers to the ability to control the location at which operations take place, or in other words, the ability to selectively apply our visual attention.

• Since visual attention is directed to some particular location, processing of visual information apparently does not occur simultaneously at all locations; this helps keep the amount of information being processed to a minimum.

• This shift of attention can be achieved partly through eye movements; however, it can also occur without any eye movements at all. Finally, many psychological and physiological studies indicate the ability to direct visual attention.

Indexing

• The Indexing operation itself consists of the following three subdivisions:

• The properties used for indexing are computed using the information available in the base representation. The "indexable" properties may include color, motion, curvature, and texture in addition to others, and are computed in parallel.

• Using the computed properties, odd-man-out locations are found by comparing regions of the base representation to their surroundings and computing a difference image.

• Finally, processing focus is shifted to the location corresponding to the strongest difference signal. After processing has been completed at this location, items similar to the indexed item or in close proximity to it are processed next. Studies also indicate that this final step may itself be further subdivided.
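The three steps above can be illustrated with a toy property map. In this sketch the "indexable" property is a single number per location (say, a color value), the surroundings of a location are assumed to be its immediate neighbours, and the difference signal is the absolute difference from the neighbours' mean; these choices and the function names are illustrative assumptions, not the author's formulation.

```python
from typing import List, Tuple


def difference_image(prop: List[List[float]]) -> List[List[float]]:
    """For each location, the absolute difference from the mean of its neighbours."""
    rows, cols = len(prop), len(prop[0])
    diff = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                prop[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            diff[r][c] = abs(prop[r][c] - sum(neighbours) / len(neighbours))
    return diff


def index_location(prop: List[List[float]]) -> Tuple[int, int]:
    """Shift the processing focus to the strongest difference signal."""
    diff = difference_image(prop)
    locations = ((r, c) for r in range(len(diff)) for c in range(len(diff[0])))
    return max(locations, key=lambda rc: diff[rc[0]][rc[1]])


# One "red" item (1) among "black" items (0)
color = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(index_location(color))  # (1, 2)
```

The loop here visits locations one at a time, whereas the slide describes the properties as being computed in parallel; the sketch only shows what is computed, not how the visual system schedules it.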

Indexing

• As described, we are capable of directing visual attention (or shifting the processing focus) to particular locations.

• How are these particular locations actually selected? According to the author, we select locations with the elemental operation referred to as indexing, which directs processing focus to locations that differ drastically from their surroundings (such locations are also referred to as "odd man out" locations).

• As an example of such an odd-man-out location, consider a single "A" which is red instead of black. This red "A" immediately "jumps out" and is easily noticeable. This "A" is referred to as an indexable location.

Indexable Location

Coloring

• Coloring refers to the same operation described earlier with respect to the inside / outside relations and consists of marking or activating some region surrounding a point. There are several problems associated with this operation, however. For example, referring to figure 9, the small red target circle is inside the larger circle. However, the boundary of the larger circle is actually a broken line and, as a result, the coloring procedure would spread out to "infinity" and it would be erroneously concluded that the point is outside the circle.

• Clearly the human visual system is capable of correctly determining that the target is inside the circle and does not have the shortcomings associated with the given coloring algorithm.

• Unfortunately, the exact coloring process carried out by the human system is not entirely known.

• Breakdown of the Coloring Method.

Boundary Tracing and Activation

• This operation allows for the tracing (following) of contours in the base representation and is a useful routine employed in a variety of tasks. For example, as shown in figure 10, it can be used to determine whether two points are on the same contour. This can be accomplished as follows:

• Start at one point and mark this point (marking is described below).

• Trace the contour until one of the following occurs: a) The other point is encountered - in this case, both points are on the same contour. b) The original starting point is encountered - in this case, the two points are not on the same contour.

• Determining whether a curve is closed can be accomplished in a similar manner: start at some point on the curve and trace the curve. If the original starting point is encountered, the curve is closed; otherwise it is not.

• Boundary Tracing and Activation Example.

• Although the tracing method appears simple, it is actually quite complex and does require certain restrictions. For example, how are incomplete boundaries, or boundaries with breaks in them such as the circle of figure 9, traced? How is tracing carried across intersections or branches, and which branch is followed?
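The same-contour test described above can be sketched under strong simplifying assumptions: each contour is given as a map from a point to the next point along it (the hypothetical `successor` dictionary), there are no breaks or branches, and tracing runs in one direction only. None of the open questions about gaps and intersections are addressed here.

```python
from typing import Dict


def on_same_contour(start: str, other: str, successor: Dict[str, str]) -> bool:
    """Trace from start; True if `other` is met before returning to start."""
    if other == start:
        return True
    current = start
    # Bounding the loop by the number of points keeps the trace finite.
    for _ in range(len(successor)):
        current = successor.get(current)
        if current is None or current == start:
            return False  # ran off an open end, or came back to the start
        if current == other:
            return True
    return False


# Two separate closed contours, each listed as point -> next point
successor = {
    "a0": "a1", "a1": "a2", "a2": "a3", "a3": "a0",
    "b0": "b1", "b1": "b2", "b2": "b0",
}
print(on_same_contour("a0", "a2", successor))  # True
print(on_same_contour("a0", "b1", successor))  # False
```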

Marking

• Allows for keeping track of ("remembering") locations already processed, thereby permitting routines to shift between locations without having to return and re-process previous locations.

• Marking may be used to determine whether some curve is closed: mark the starting point and trace the curve. If the marked point is encountered, the curve is closed; otherwise, if tracing continues to "infinity", the curve is not closed.
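The closed-curve test can be sketched with an explicit set of marked locations. As an assumption for illustration, the curve is again given as a hypothetical point-to-next-point map, and a step limit (`max_steps`) stands in for tracing "to infinity".

```python
from typing import Dict


def curve_is_closed(start: str, successor: Dict[str, str], max_steps: int = 1000) -> bool:
    """Mark the starting point, trace the curve, and report whether the mark is met again."""
    marked = {start}  # marking: remember where tracing began
    current = start
    for _ in range(max_steps):
        current = successor.get(current)
        if current is None:
            return False  # the curve ended without reaching the mark: open
        if current in marked:
            return True  # back at the marked location: closed
    return False  # gave up after max_steps, the stand-in for "infinity"


loop = {"p0": "p1", "p1": "p2", "p2": "p0"}
open_curve = {"q0": "q1", "q1": "q2"}

print(curve_is_closed("p0", loop))        # True
print(curve_is_closed("q0", open_curve))  # False
```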

Conclusions

• The ability to extract abstract shape properties is a fundamental requirement for visual perception.

• Visual processing is divided into two stages.

• In the first stage, a "base representation" is constructed using a bottom-up approach and relying on the visual input alone.

• In the second stage, top-down processing is applied to the base representation through the execution of visual routines, to determine properties of objects and the spatial relations amongst them.

• Visual routines themselves are composed of a set of elemental operations which can be combined to perform specific computations and achieve specific goals.

• Incremental representations store the results after applying visual routines to portions of the base representation and can be used by future routines, thereby allowing them to be more efficient.

• Several basic elemental operations are shifting of processing focus, indexing, coloring, boundary tracing and activation, and marking; however, the full set of operations employed by the visual system is currently unknown!

• Finally, perception of spatial relations appears to be very fast and simple for humans.

• We are constantly determining such relations without any conscious awareness and "without any thought".

• Furthermore, even with today’s best and most powerful computing technology, we are not even close to imitating the human visual system.

• Our visual system is clearly astonishing!

The Central Question of Visual Neurophysiology (V648)

David Hubel, Eye, Brain and Vision, p. 2:

"The questions that I will be addressing can be simply stated. When we look at the outside world, the primary event is that light is focused on an array of 125 million receptors in the retina of each eye. The receptors, called rods and cones, are nerve cells specialized to emit electrical signals when light hits them. The task of the retina and of the brain proper is to make sense of these signals, to extract information that is biologically useful to us. The result is the scene as we perceive it [consciously or sub-consciously], with all its intricacy of form, depth, movement, color, and texture. We want to know how the brain accomplishes this feat."

Sequential / Parallel Processing Model

• Transformations of the Neural Image

• The optic nerve has limited capacity (bottleneck).

• The neural image must be compressed before leaving the eye, plus topographically re-organized.

• The neural image is expanded into numerous forms by the central nervous system (CNS).

Advanced Visual System

Comments!! & Questions??

• Support or contest the following: Ullman claims that our low-level visual attention is attracted to odd-man-out locations, such as those that have a distinguishing color, or shape, or combination of shape and color.

• Support or contest the following: Ullman believes that the vision system interacts with the language system as if through a diode. That is, information flows from the visual system to the language system, but little, if any, flows back.

• Support or contest the following: If you are going to study vision, then you might as well study pigeons as people, because the questions answerable by the pigeon visual system are the same as the questions answerable by the human visual system. That is, the pigeon's view of the world is the same as ours.
