
Low Cost – FPGA based system for pothole detection on

Indian Roads

M. Tech Project Report – I stage

Submitted in partial fulfillment of the requirements for the degree of

Master of Technology

by

Shonil Vijay

Roll No: 05329001

under the guidance of

Dr. Kavi Arya

Kanwal Rekhi School of Information Technology

Indian Institute of Technology, Bombay

July 2006


Acknowledgments

I express my deep sense of gratitude to Dr. Kavi Arya for his invaluable help and guidance during the first stage of this project. I am highly indebted to him for constantly encouraging me with his critiques of my work, and I am grateful to him for his support and confidence.

Shonil Vijay
July 2006, Indian Institute of Technology, Bombay


Abstract

The objective of this project is to develop a low-cost, vision-based driver assistance system on an FPGA to detect and avoid potholes in the path of a vehicle on the road. The project uses an FPGA to implement image processing algorithms efficiently enough to produce output in real time.

Pothole avoidance may be considered similar to other obstacle avoidance, except that potholes are depressions rather than extrusions from the surface. A vision-based approach is used since potholes differ visually from the background surface. Detecting defects in the road surface is a prerequisite for locating a pothole. The system as a whole will be far more efficient than common methods such as manual inspection, and almost as accurate.


Contents

1. Introduction
   1.1 Motivation
   1.2 Problem Statement
2. Design Choices
   2.1 PC Digital Signal Processing Programs
   2.2 Application Specific Integrated Circuits
   2.3 Dedicated Digital Signal Processors
   2.4 Field Programmable Gate Arrays
3. Obstacle Detection Methods
   3.1 Off-Road Obstacle Detection
   3.2 Optical Flow
   3.3 Stereo Vision
   3.4 RADAR (Radio Detection and Ranging)
   3.5 LADAR (Laser Detection and Ranging)
   3.6 Other Vision-Based Detection Modalities
4. DSP (Digital Signal Processing) Algorithms
   4.1 Introduction to Windowing Operators
   4.2 Rank Order Filter
   4.3 Morphological Operators
   4.4 Convolution
5. Summary and Future Work
6. References


1. Introduction

1.1 Motivation

Interest in Collision Avoidance Systems stems from the problems caused by traffic congestion worldwide and from a synergy of new information technologies for simulation, real-time control and communications networks. Traffic congestion has been increasing worldwide as a result of increased motorization, urbanization, population growth and changes in population density. Congestion reduces the utilization of the transportation infrastructure and increases travel time, air pollution, fuel consumption and, most importantly, traffic accidents.

Worldwide, around 1.2 million people died as a result of road traffic accidents in 2002. This represents an average of 3,242 persons dying each day around the world. In addition to these deaths, around 50 million people globally are estimated to be injured or disabled every year. Projections indicate that these figures will increase by about 65% over the next 20 years. Road accidents are currently the world's eleventh leading cause of death, but by 2020 they are projected to become the third, behind deaths linked to heart disease and mental illness. In the United States alone, around 6.2 million traffic accidents occurred in 2004, accounting for 42,636 deaths and 2,788,000 nonfatal injuries. Of these, 31,693 occupants of passenger cars were killed and an additional 2,543,000 were injured, accounting for 85% of all fatalities and 95% of all injuries. In 2000, the economic cost of motor vehicle crashes was $230.6 billion.

While different factors contribute to vehicle crashes, driver behavior is considered to be the leading cause of around 95% of all accidents. More recent data has identified inattention, including distraction, "looked but did not see" and falling asleep at the wheel, as the primary cause of accidents, accounting for at least 25% of crashes. Driver behavior can be improved either by alerting the driver to a probable collision or by controlling the vehicle itself. Car companies are already introducing incremental improvements (although largely in luxury models) such as collision avoidance, GPS navigation, roadway monitoring and wireless communications. Lane Departure Warning Systems and other smart capabilities are in the works. The most visionary proposal is the "driverless vehicle" which, armed with GPS (Global Positioning System), anti-collision radar and the like, is driven by an on-board computer linked to a traffic control center and thus automatically takes the best route to the chosen destination.

At present, commercially available traffic-detection equipment includes loop detectors, pressure sensors, infrared, radar, ladar or ultrasound based sensors, and video cameras. Although loop detectors are cheap to manufacture, their installation and repair are very expensive because they involve digging up and re-surfacing the road, which is labor-intensive, time-consuming and disrupts traffic. Pressure-based traffic sensors have the same problem. Infrared, radar and ultrasound sensors, on the other hand, are more expensive to make. The use of these active devices in urban areas may have safety and other regulatory implications, and their functionality can be affected by bad weather. Video cameras have long been deployed at key traffic bottlenecks, but the effectiveness of this type of sensor can also be affected by bad weather.


Also, the images have to be processed further to obtain useful data from them. A number of ongoing research projects aim to extract objective traffic-condition indicators from video images, but the huge volume of data involved means that such systems require substantial computational power and, consequently, will not be low-cost devices for a considerable period of time. Laser-beam or radar based systems add $1,500 to $3,000 to the cost of a car, and are therefore used only in luxury cars. Finally, GPS-based systems have very high operational costs, as they need to obtain the precise positions of the vehicles every few milliseconds.

1.2 Problem Statement

An examination of automobile accident statistics in India reveals that, despite the declining number of fatalities, the number of accidents resulting in injury is steadily increasing. The major cause of traffic accidents is "improper driving" due to driver inattention and fatigue. Another major contributor to these mishaps is the increasing rate of defects on the roads. These defects are a result of the increasing amount of traffic and the climatic conditions, neither of which can be changed.

Fig 1: Example defects. From left: cracking, pop-outs, wear and polishing, and potholes.

The different types of defects can be briefly described as:

• Cracking – areas where the road surface has been split apart, classified as longitudinal, transverse, and crocodile or combination cracking.

• Pop-outs – areas where pressure from below has forced the surface of the road to be raised.

• Wear and Polishing – areas of smooth black tar occurring on asphalt surfaces.

• Potholes – areas where there is an absence of asphalt or a hole. Potholes usually emerge from areas where there has been severe cracking.

Of the above types of defects, the one of particular interest, and the problem at hand, is potholes. Early detection of cracks in road surfaces allows maintenance to be performed before cracks develop into more serious problems, such as potholes and pop-outs.

This research discusses a solution for the automatic detection and avoidance of such potholes in the path of an autonomous vehicle operating in an unstructured environment. Pothole avoidance may be considered similar to other obstacle avoidance except that the potholes are depressions rather than extrusions from a surface.

A vision approach is used since potholes are significantly different visually from the background surface. In terms of the electromagnetic spectrum, vision-based research today can be categorized into three classes according to the range of operating wavelength: the visible spectrum region (0.38~0.8 µm), the Near Infra-Red (NIR) region (0.8~1.2 µm), and the Far Infra-Red (FIR) region (7~14 µm).

However, humans cannot perceive wavelengths outside the visible spectrum; NIR and FIR images can only be seen through sensors capable of detecting energy radiated in those bands of the electromagnetic spectrum. For cost and performance reasons, a CCD camera operating in the visible spectrum band is taken as the sensing device for this system.

Recently, Field Programmable Gate Array (FPGA) technology has become a viable target for the implementation of algorithms suited to video image processing applications. The unique architecture of the FPGA has allowed the technology to be used in many such applications encompassing all aspects of video image processing.

As image sizes and bit depths grow larger, software has become less capable of keeping up in the video processing realm. Real-time systems such as the one targeted by this project require the high processing speeds that video demands. DSP systems are being employed to selectively reduce the amount of data to process, ensuring that only relevant data is passed on to a human user. Eventually, it is expected that most video processing can and will take place in DSP systems with little human interaction. This is advantageous, since human observers are not entirely accurate.

2. Design Choices

A designer has several different choices when implementing a DSP system of any sort. Hardware, of course, offers much greater speed than a software implementation, but one must consider the increase in development time inherent in creating a hardware design. Most software designers are familiar with C, but to develop a hardware system one must either learn a hardware description language such as VHDL or Verilog, or use a software-to-hardware conversion scheme, such as Streams-C, which converts C code to VHDL, or MATCH, which converts MATLAB code to VHDL. While the goals of such conversion schemes are admirable, they are still in development and not yet suited to high-speed applications such as video processing. Ptolemy is a system that allows modeling, design, and simulation of embedded systems, and provides software synthesis from models. While this type of system may be a dominant design platform in the future, it is still under heavy development, meaning that it may not be a viable design choice for some time. The viable options for DSP system design are discussed below.

2.1 PC Digital Signal Processing Programs

Signal processing programs used on a PC allow for rapid development of algorithms, as well as equally rapid debug and test capabilities. It is common for many hardware designers to use some sort of PC programming environment to implement a design to verify functionality prior to a lengthy hardware design.


MATLAB is such an environment. Although it was created for manipulating matrices in general, it is well suited to some image processing applications. MATLAB treats an image as a matrix, allowing a designer to develop optimized matrix operations implementing an algorithm. However, if the eventual goal is a hardware device, the algorithms are often instead written to operate similarly to the proposed hardware system, which results in an even slower algorithm.
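To make this concrete, the following sketch (Python standing in for MATLAB; the image and its values are invented for the example) shows a 3×3 mean filter written pixel-by-pixel with explicit loops, the way an algorithm is structured when it mirrors a proposed hardware window operator. A vectorized matrix formulation would avoid the inner loops and run far faster in an interpreted environment, which is exactly the trade-off described above.

```python
# A 3x3 mean filter written in the "hardware-mirror" style: explicit loops
# over every pixel and every window position, as a pipelined FPGA window
# operator would process the stream. In MATLAB/Python this style is slow;
# it exists only to model the eventual hardware.

def mean_filter_loops(img):
    """3x3 mean filter, pixel by pixel, leaving the 1-pixel border unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):        # the 3x3 window, as hardware
                for dx in (-1, 0, 1):    # taps would see it
                    acc += img[y + dy][x + dx]
            out[y][x] = acc // 9
    return out

img = [[10] * 5 for _ in range(5)]
img[2][2] = 100                     # one bright pixel on a flat background
smoothed = mean_filter_loops(img)   # the spike is averaged down to 20
```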

However, even specialized image processing programs running on PCs cannot adequately process large amounts of high-resolution streaming data, since PC processors are designed for general use. Further optimization must take place on a hardware device.

2.2 Application Specific Integrated Circuits

Application Specific Integrated Circuits (ASICs) represent a technology in which engineers create a fixed hardware design using a variety of tools. Once a design has been programmed onto an ASIC, it cannot be changed. Since these chips represent true, custom hardware, highly optimized, parallel algorithms are possible. However, except in high-volume commercial applications, ASICs are often considered too costly for many designs. In addition, if an error exists in the hardware design and is not discovered before product shipment, it cannot be corrected without a very costly product recall.

2.3 Dedicated Digital Signal Processors

Digital Signal Processors (DSPs) such as those available from Texas Instruments are a class of hardware devices that fall somewhere between an ASIC and a PC in terms of performance and design complexity. They can be programmed with either assembly code or the C programming language, which is one of the platform’s distinct advantages. Hardware design knowledge is still required, but the learning curve is significantly lower than some other design choices, since many engineers have knowledge of C prior to exposure to DSP systems. However, algorithms designed for a DSP cannot be highly parallel without using multiple DSPs. Algorithm performance is certainly higher than on a PC, but in some cases, ASIC or FPGA systems are the only choice for a design. Still, DSPs are a very common and efficient method of processing real-time data.

One area where DSPs are particularly useful is the design of floating-point systems. On ASICs and FPGAs, floating-point operations are rather difficult to implement. For the scope of this project this is not an issue, because all images consist of integer data only.

Recent advances in DSP technology have resulted in very high-speed algorithm implementations. While the advantages of ASICs and FPGAs still apply, this new generation of DSPs has made some engineers reconsider FPGA development. Still, as new DSPs arrive on the market, so do new FPGAs, and the two architectures are expected to show similar performance gains with each new generation of processors.


2.4 Field Programmable Gate Arrays

Field Programmable Gate Arrays (FPGAs) represent reconfigurable computing technology, which is in some ways ideally suited to video processing. Reconfigurable computers are processors which can be programmed with a design and then reprogrammed (or reconfigured) with virtually limitless designs as the designer's needs change. FPGAs generally consist of a system of logic blocks (usually look-up tables and flip-flops) and some amount of Random Access Memory (RAM), all wired together using a vast array of interconnects. All of the logic in an FPGA can be rewired, or reconfigured, with a different design as often as the designer likes. This type of architecture allows a large variety of logic designs (dependent on the processor's resources), which can be interchanged for a new design as soon as the device can be reprogrammed.

Today, FPGAs can be developed to implement the parallel design methodology that is not possible in dedicated DSP designs. ASIC design methods can be used for FPGA design, allowing the designer to implement designs at gate level. However, engineers usually use a hardware description language such as VHDL or Verilog, which allows a design methodology similar to software design. This software view of hardware design allows for lower overall support cost and design abstraction.

The goal of this project is real-time (30 frames per second) processing of grayscale image data, a goal which an FPGA system using parallel algorithms should have little difficulty achieving.
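As a back-of-envelope check on this goal, the arithmetic below works out the per-pixel time budget. The 30 frames-per-second target is from this report; the 640×480 frame size is an assumption made only to give the calculation concrete numbers.

```python
# Real-time budget: how much time is available per pixel at the target
# frame rate. Frame resolution is an assumed value, not from the report.

FRAME_W, FRAME_H = 640, 480   # assumed grayscale frame size
FPS = 30                      # real-time target stated in the report

pixels_per_frame = FRAME_W * FRAME_H            # 307,200
pixels_per_second = pixels_per_frame * FPS      # ~9.2 million
ns_per_pixel = 1e9 / pixels_per_second          # ~108 ns per pixel

print(f"{pixels_per_second / 1e6:.1f} Mpixel/s, {ns_per_pixel:.0f} ns/pixel")
```

At roughly a hundred nanoseconds per pixel, a sequential PC implementation has only a handful of instructions to spend on each pixel, which is why the parallel, pipelined structure of an FPGA is attractive here.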

3. Obstacle Detection Methods

There are a variety of competing methods that have been proposed for obstacle detection and many papers written on such methods. There have been few papers, however, which have offered more than a trivial comparison of the various methods, especially in any analytical fashion. This chapter seeks to fill that gap by examining the various obstacle detection methods for a highway environment from an analytic point of view. By examining each of the basic methods, I hope to illuminate which methods ought to work best for the specific problem of on-road obstacle detection in real time.

3.1 Off-Road Obstacle Detection

There has also been extensive work on the off-road, or cross-country, obstacle detection problem. Two natural questions to ask are: Can we learn anything from this body of work? Can we use the same methods? The answers are "yes" and "no." We may categorize the requirements for an obstacle detection system into two basic types: system requirements and detectability/sensor requirements. The system requirements abstract away from the specific sensor and deal with the response and throughput of the system. The detectability requirements deal with the accuracy and resolution of the sensor, and are often at odds with the system requirements. A quick comparison of the off-road and on-road problems shows that we cannot use the off-road detection methods without serious modification and still hope to succeed. However, much of the basic system analysis still holds.


Basic system analysis has been neglected in most obstacle detection papers, but it is necessary to examine both the detection requirements and the system requirements as a whole so that adequate trade-offs can be made to ensure good system performance. This chapter explores several basic methods used for on-road obstacle detection, including optical flow and stereo, and also briefly considers other detection modalities.

Despite considerable effort, there have been no satisfactory solutions so far to the problem of small static obstacle detection at highway speeds. Although good results have been published for vehicle detection, few results have been reported on the detection of small static obstacles. Moreover, these few results have generally been reported in vague terms which give the reader little ability to compare methods. Papers on cross-country obstacle detection systems are typically no better at reporting results in a standardized way, although this is probably more excusable, since rough terrain is difficult to describe analytically and an obstacle may be less well defined in cross-country applications.

There has been some success in cross-country obstacle detection, so it is worth briefly examining the problem. Typical speeds for cross-country applications are on the order of 5 to 10 m.p.h. Typical highway speeds, however, are many times this. Since stopping distance is proportional to the square of the speed, the stopping distance for on-road applications is much larger, perhaps by 2 orders of magnitude. In addition, a cross-country vehicle moving at relatively slow speeds can climb larger objects without damage to the vehicle than the typical passenger car traveling on the highway. For the highway scenario, we must be able to detect any objects larger than a few inches. Coupling these facts together, we see that the resolution necessary for on-road obstacle detection may be 2 to 3 orders of magnitude greater if we use standard techniques from cross-country with similar field-of-view. Given higher speeds, we must also examine a greater amount of terrain. Although the effect of latencies on look-ahead distance is generally dwarfed by the stopping distance for highway applications, it is still important to have small latencies in the processing system. High latency systems can cause problems in vehicle control.
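The square-law scaling claimed above can be checked numerically. The speeds below match the scenario in the text (5 to 10 mph cross-country versus highway speed); the friction coefficient is an assumed value used only to make the simple physics model d = v²/(2µg) concrete.

```python
# Stopping distance grows with the square of speed: d = v^2 / (2 * mu * g)
# for a simple braking-friction model. mu is an assumed tyre-road friction
# coefficient; the speeds follow the cross-country vs. highway comparison
# in the text.

MU, G = 0.7, 9.81        # assumed friction coefficient; gravity (m/s^2)
MPH_TO_MS = 0.44704

def stopping_distance_m(speed_mph):
    v = speed_mph * MPH_TO_MS
    return v * v / (2 * MU * G)

d_offroad = stopping_distance_m(8)    # within the 5-10 mph range
d_highway = stopping_distance_m(65)   # a typical highway speed
ratio = d_highway / d_offroad         # (65/8)^2, about 66x
```

Even this modest speed ratio multiplies stopping distance by nearly two orders of magnitude, which is the source of the resolution and look-ahead requirements discussed above.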

These difficulties might make the on-road problem seem almost intractable, given that the off-road obstacle detection problem is still far from solved. Fortunately, however, roads are reasonably planar. Given the previous analysis, it should be clear that we need to exploit this fact in order to win in this scenario. Assuming planarity of the road improves our signal-to-noise ratio greatly and simplifies a number of things: we do not need to build maps, estimate heights of objects, or check for rollover conditions. We only need to find areas which violate the planarity constraint and avoid them.

For many of the computer vision methods I describe, there are both dense and discrete variants. Dense methods provide some estimate (range, optical flow, etc.) at every image pixel. Discrete methods first use some feature detection method to pick out candidate potential obstacles and then attempt to calculate range, optical flow, or some other measure in order to determine whether a feature (or group of features) actually belongs to an obstacle.

In the near future, it is likely that only discrete methods will be able to achieve the low processing latencies necessary for highway obstacle detection without special-purpose hardware. For this reason, this chapter focuses on discrete methods, although I do not ignore the dense methods, since these will become feasible as computers become faster.

I name the process of finding obstacles, once given a set of image features or patches, obstacle determination, and I focus on that problem in this chapter. Because an image is a mapping of 3-D space to 2-D space, a single image cannot determine whether an image patch belongs to the ground plane or not (Fig 2). Additional information, in the form of more images, certain assumptions, or a priori knowledge about potential obstacles, must be used to perform obstacle determination. If an image feature or patch belongs to the ground plane, it is not a threat; if it does not, it should be considered an obstacle and avoided. I refer to the obstacle determination problem throughout the chapter.

Fig 2: A single image cannot distinguish whether image patch C is caused by A (an obstacle) or B (a ground plane feature, i.e. no obstacle). We call the problem of using additional information to disambiguate the two obstacle determination.

3.2 Optical Flow

One method of performing obstacle determination is optical flow. Optical flow uses the motion of either the objects or the camera to determine the structure of the environment, using two or more images taken at different times. The basic concept of using optical flow for obstacle determination is illustrated below (Fig 3). A point in the image, y0, can be mapped to either obstacle point P or ground point Q (or any point in between) at time t0. At time t1, the camera takes another image after the vehicle has moved forward by distance d. A matching technique or flow calculation is employed to map y0 to a point in the new image. If this point is close to yobs, it likely corresponds to an obstacle; if it is closer to ygnd, it most likely corresponds to a ground-plane point. By examining the flow (in the y-direction in the illustrated case), i.e. the movement of y0 at t0 to a new point in the second image, the algorithm can in principle detect whether the point belongs to an obstacle or to the ground plane.

Fig 3: A point y0 on the image plane at time t0 can be mapped to either point P or Q in the world (or any point in between). Another image taken at time t1, after the camera has moved a distance d, can be used to disambiguate between P and Q depending on whether y0 is mapped to yobs or ygnd.
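The Fig. 3 decision step can be sketched numerically: given where the tracked point y0 reappears in the second image, compare it with the positions predicted for a ground-plane point (ygnd) and for an obstacle point (yobs), and label it by whichever prediction it lands closer to. All coordinate values below are made up for illustration; a real system would derive the predictions from camera geometry and vehicle motion.

```python
# Obstacle determination from flow, as in Fig 3: classify a tracked point
# by whether its new position is nearer the obstacle prediction (yobs) or
# the ground-plane prediction (ygnd). Positions are hypothetical pixel rows.

def classify_flow(y_observed, y_gnd, y_obs):
    """Return 'obstacle' or 'ground' for one tracked image point."""
    if abs(y_observed - y_obs) < abs(y_observed - y_gnd):
        return "obstacle"
    return "ground"

y_gnd, y_obs = 120.0, 134.0   # assumed predicted rows after moving d
label = classify_flow(131.5, y_gnd, y_obs)   # near yobs -> 'obstacle'
```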

A number of optical flow or similar techniques have been proposed in the obstacle detection literature. Although some of these techniques have had success in detecting large static obstacles, optical flow methods are best at detecting moving objects whose motion fields differ significantly from the background. Another vehicle, for example, can be detected and tracked using optical flow techniques even if it is moving with low relative velocity with respect to our own vehicle, because it is still moving with respect to the background. This may be sensed by comparing the direction of the calculated flow vector with the direction of the model flow vector. I briefly describe several methods for calculating optical flow, and then illustrate why optical flow is too insensitive to be used effectively for the basic static obstacle determination problem for small obstacles at large distances. Edge-based methods have been used with success for detecting vehicles and tracking them over time. Typically, these methods involve an edge extraction step followed by a dynamic programming method to match edges between images. The algorithms tend to run faster than analytical methods since they compute one flow vector for every edge rather than every pixel. Reliable edges can be matched fairly robustly in these schemes because of the length and orientation constraints that can be used in matching.


One problem with these methods is that the flow component along the edge is generally unreliable, insofar as the edge detectors themselves often find broken lines. Broken lines can also cause mismatches that make the flow component normal to the edge wrong, but this happens less frequently.

Pixel-based discrete methods, on the other hand, compute the optical flow only at points where the flow is fully constrained, without resorting to smoothness constraints. Typically, the algorithm chooses local intensity minima or maxima, or uses an interest operator (e.g. the Moravec interest operator) to select points in an image, and then matches points using correlation similar to that in stereo matching. Using the observation that, under a planar ground assumption, an obstacle changes only the magnitude and not the direction of the flow vector, it is possible to constrain the search to a single dimension along the direction of the model flow vector (with the drawback that this may not allow the detection of moving objects). The flow vector for the pixel is then just the vector from the original pixel position to the location of the maximum correlation in the new image.

3.3 Stereo Vision

Binocular stereo vision has been a popular approach for obstacle detection in both cross-country and on-road scenarios. Stereo can be calculated in either a dense or a discrete (point- or edge-based) manner, but its calculation is the same in all cases. Discrete stereo vision is very similar to discrete optical flow, except that a spatial baseline is used rather than a temporal one. For point-based discrete stereo methods, an identical approach can be used to choose the features for correspondence as in optical flow, i.e. an operator which finds local minima, maxima, or corners. A correlation approach is then used to establish correspondence between the features in the first image and (hopefully the same) features in the second image. In edge-based methods, the same edges that are matched in optical flow calculations can be used for matching in stereo algorithms.

As in optical flow, the search space for a corresponding point in the second image can be reduced to one dimension. In the case of stereo, the match must fall along the epipolar line. Given a single image, the same features are found and essentially the same matching procedure is run to match features to the second image, although the search is made along different directions. Stereo has the advantage, however, that if the cameras are situated with the optical axes perpendicular to the line connecting the optical centers, the search space is constrained to the same image row. This makes stereo simpler to implement and faster than optical flow. The conceptual difference is that one set of images is taken with a spatial baseline, and one with a temporal baseline. I show that in these highway scenarios, the spatial baseline provides better detectability.


Fig 4: Stereo camera setup. The distance of the point from the cameras is proportional to the baseline and inversely proportional to the disparity.

3.4 RADAR (Radio Detection and Ranging)

Radar is an excellent means of detecting other vehicles because it works at long ranges and is relatively unaffected by rain or snow. The radar was capable of detecting vehicles at distances of up to 200 meters with a range resolution of approximately 0.1 meters. The sensor had a 3° vertical field of view and a 12° horizontal field of view. Bearing to a target could be estimated via wavefront reconstruction, and when combined with geometric information about the road, potential obstacles could be mapped to an individual lane. Since radars provide direct range and may also provide a Doppler velocity measurement, they will most likely be a standard sensor for automated vehicles.

Unfortunately, current radars are not able to reliably detect small objects at ample distances. Metal surfaces are good radar reflectors, and hence make vehicle detection fairly easy. The ease with which an object may be detected at a given range is related to its radar cross section. Vehicles have a much larger radar cross section (10 m²) than people (0.2 to 2 m²), and most road debris will have an even smaller radar cross section, making it undetectable.

3.5 LADAR (Laser Detection and Ranging)

Although we are unaware of any systems that use laser reflectance for obstacle detection, many systems have used laser range measurements for this purpose. Range measurements can be either indirect or direct.

Laser striping is an indirect ranging method that has been used on a number of robots (especially indoor robots). Laser striping uses a fixed CCD camera and a laser stripe (visible to the CCD) that is swept across a scene. Triangulating between the known direction of the laser and the direction provided by the laser stripe position on the CCD provides a range estimate at each image row. Laser striping can provide 3-D geometry fairly quickly, since the laser only needs to be scanned in one direction and the computation is simple. There are a couple of problems with laser striping. First, the laser must be easily detectable within the image (significantly stronger than ambient light). Second, as with any triangulation system, range accuracy improves as the distance between the laser and CCD is increased; however, as this distance is increased, the problem of “shadowing” worsens. Shadowing occurs when points visible to the laser are not visible to the camera and vice versa. For the simplest systems (a beam diverged by a cylindrical lens), laser striping requires more laser power than direct methods and hence is most useful indoors or at short distances.

Laser rangefinders avoid the shadowing problem by keeping the transmitted and received beams approximately co-axial, and by measuring range directly. A 3-D laser scanner operates by sweeping a laser across the scene in two dimensions. At each pixel, the instrument measures the time that it takes for a laser beam to leave the sensor, strike a surface, and return. There are several methods for measuring this time. Many sensors also provide an intensity measurement at each pixel by measuring the energy of the returned laser signal. Thus, a full sweep in two dimensions can provide both a depth map and an intensity image.
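The time-of-flight principle described above is a one-line calculation: range is half the round-trip distance travelled at the speed of light. A minimal sketch:

```python
# Time-of-flight ranging: the measured range is half the round-trip
# distance travelled by the laser pulse at the speed of light.

C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_range_m(round_trip_s):
    """Return the range (metres) for a measured round-trip time (seconds)."""
    return C * round_trip_s / 2.0

# A 400 ns round trip corresponds to roughly 60 m of range.
print(tof_range_m(400e-9))
```

The nanosecond scale of these round trips is why direct time-of-flight scanners need fast, expensive timing electronics, consistent with the cost concern raised later in this section.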

Once the laser provides a depth map for the scene, the data is typically transformed into an elevation map based on the known position of the sensor. Terrain-typing algorithms may then be run on the resulting elevation map, and discrete obstacles may be detected by looking for discontinuities in the elevation map. However, there are a number of problems with this method. First, it is computationally expensive. Second, the discrete grid cannot represent vertical surfaces, and hence may miss some potentially dangerous obstacles.

Most ranging software systems have been image-based, meaning that the system waited for an entire laser image to be acquired and processed before obstacles were reported to the path generation module. Hebert proposed a system that processes range data pixel by pixel to reduce the latencies involved and improve system efficiency. A pixel-by-pixel method also reduces the dependency on a particular range sensor, since methods which use entire images are tuned to the specific field of view and geometry of the sensor. Reported latencies were reduced to under 100 ms. With the addition of a planning module, the system was demonstrated in multi-kilometer traverses through unknown terrain.

Despite problems with previous laser range methods, laser ranging would likely provide adequate obstacle detection capability for highway environments given adequate laser power. The range difference between an obstacle and the road surface behind it is the same as in the stereo vision case: a laser sensor mounted at a height of 1.5 meters looking at the road at 60 meters would see an obstacle of 20 cm at a range of 52 meters. Given adequate power, this difference should be easily detectable by a laser rangefinder. Our current laser scanner is not powerful enough to provide reliable range estimates from the road at these distances. The real problem with laser rangefinders at present is that they are too expensive to serve as an obstacle detection solution for consumer vehicles.
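The figures quoted above follow from similar triangles along the line of sight: a sensor at height h aimed at the road D metres ahead crosses an obstacle of height o at range x = D(h − o)/h. A quick check of the 1.5 m / 60 m / 20 cm example:

```python
# Range at which a sensor's line of sight meets an obstacle of a given
# height: x = D * (h - o) / h, by similar triangles along the sight line.

def obstacle_range_m(sensor_height_m, look_distance_m, obstacle_height_m):
    return look_distance_m * (sensor_height_m - obstacle_height_m) / sensor_height_m

# A sensor at 1.5 m, aimed at the road 60 m ahead, meets a 0.20 m obstacle
# at 60 * 1.3 / 1.5 = 52 m, matching the figure quoted in the text.
print(obstacle_range_m(1.5, 60.0, 0.20))
```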


3.6 Other Vision-Based Detection Modalities

Although the most promising obstacle detection methods have been discussed, there is the potential for new methods based on numerous visual cues. At least 12 depth cues are available to humans, only two of which are binocular. There are also other cues, unrelated to depth, that are used for obstacle detection (shape, color, luminance, etc.). Unfortunately, many of the depth cues operate at distances shorter or longer than the distances with which we are concerned (on the order of 50 meters). Most are also difficult to encode in a computer program because they require a priori knowledge, such as the typical size of an obstacle, and it would be impractical to teach a system what every potential obstacle might look like. Nevertheless, some of these other cues may be useful to future obstacle detection systems. One depth cue that can easily be used on highways to estimate the distance to a suspected obstacle is linear perspective: the distance between two points subtends a smaller angle at larger depths. Since we know approximately the width of the road, we can use linear perspective effects to estimate the distance to a given image row. Relative height may also be used as a depth cue because, given level ground, objects which are closer to the horizon are farther away. Once calibrated using linear perspective effects, the vertical location of an object in the image could be used to estimate its distance.
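Under a pinhole camera model, the linear-perspective cue described above reduces to D = W·f/w for a road of known physical width W subtending w pixels at focal length f (in pixels). A sketch with illustrative values; the road width and focal length here are assumptions, not figures from this report:

```python
# Linear perspective: an object of known physical width W that spans
# w pixels at focal length f (pixels) lies at distance D = W * f / w.

def distance_from_road_width(road_width_m, focal_px, width_px):
    """Estimate the distance (metres) to an image row spanning the road."""
    return road_width_m * focal_px / width_px

# A 7 m wide road spanning 100 pixels at an assumed f = 700 px lies
# about 49 m away; rows where the road appears narrower are farther.
print(distance_from_road_width(7.0, 700.0, 100.0))
```

Calibrating once against such rows gives the relative-height cue mentioned above: the vertical image position of an object maps directly to an estimated distance.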

4. DSP (Digital Signal Processing) Algorithms Based on the computational and communication characteristics, computer vision tasks can be divided into a three-level hierarchy, namely low-level, intermediate-level, and high-level.

• Low-level vision tasks consist of pixel-based operations such as filtering and edge detection. The tasks at this level are characterized by a large amount of data (pixels), small neighborhood operators, and relatively simple operations (e.g., multiply and add).

• The pixel grouping operations, such as segmentation and region labeling, are intermediate-level vision tasks. These tasks are again characterized by local data access but more complex pixel operations.

• High-level vision tasks are more decision oriented, such as point matching, tree matching, and graph matching. These tasks are characterized by non-local data access and nondeterministic, complex algorithms. Communication requirements tend to be unstructured in intermediate- and high-level vision tasks.

Several examples of vision tasks belonging to this three-level hierarchy are shown in Table 1.

This project will focus on developing hardware implementations of image processing algorithms for use in an FPGA-based video processing system. This section discusses these algorithms.


Task level     Computational characteristics                         Examples

Low            Small neighborhood data access, simple                Edge detection, filtering,
               operations, large amount of data                      image morphology

Intermediate   Small neighborhood data access, more                  Hough transform, connected
               complex operations                                    component labeling, relaxation

High           Nonlocal data access, nonpolynomial                   Point matching, tree matching,
               and complex operations                                graph matching

Table 1: Examples of the three-level vision task hierarchy

4.1 Introduction to Windowing Operators

In image processing, several algorithms belong to a category called windowing operators. Windowing operators use a window, or neighborhood of pixels, to calculate their output. For example, a windowing operator may perform an operation like finding the average of all pixels in the neighborhood of a pixel. The pixel around which the window is centered is called the origin. Fig. 5 shows a 3 by 3 pixel window and the corresponding origin.

Fig 5: Pixel Window and Origin

The work in this project is based on image processing algorithms that use such pixel windows to calculate their output. Although a pixel window may be of any size and shape, a square 3x3 window was chosen for this application because it is large enough to work properly and small enough to implement efficiently in hardware.

4.2 Rank Order Filter

The rank order filter is a particularly common algorithm in image processing systems. It is a nonlinear filter, so although it is easy to develop, its properties are harder to analyze than those of a linear filter. It offers several useful effects, such as smoothing and noise removal. The median filter, which is a rank order filter, is especially useful in noise removal.

4.2.1 Algorithm

This filter works by analyzing a neighborhood of pixels around an origin pixel, for every valid pixel in an image. Often a 3x3 area, or window, of pixels is used to calculate the output. For every pixel in the image, the window of neighboring pixels is found and the pixel values are sorted in ascending, or rank, order. Next, the pixel in the output image corresponding to the origin pixel in the input image is replaced with the value specified by the filter order. Fig. 6 shows an example of this algorithm for a median filter (order 5), a filter that is quite useful in noise filtering.

Fig. 6: Graphic Depiction of Rank Order Filter Operation

As is evident in the above figure, it is possible to use any order up to the number of pixels in the window: a rank order filter using a 3x3 window has 9 possible orders, and one using a 5x5 window has 25. No matter what window size is used, choosing the middle value in the sorted list always results in a median filter. Similarly, choosing the maximum and minimum values in the sorted list results in the flat dilation and erosion of the image, respectively. These two operations belong to the morphological operations discussed in the next sub-section.
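The rank order filter described above can be sketched in a few lines. This is an illustrative software reference, not the FPGA design; borders are left unchanged for brevity, and the order is 1-based so that order 5 on a 3x3 window is the median:

```python
# Rank order filter over 3x3 windows on a grayscale image stored as a
# list of lists. Border pixels are copied through unchanged for brevity.

def rank_order_filter(img, order=5):
    """Replace each interior pixel with the order-th smallest value
    in its 3x3 neighborhood (order=5 gives the median)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[order - 1]
    return out

# A single salt-noise pixel (255) is removed by the median (order 5).
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(rank_order_filter(noisy)[1][1])  # -> 10
```

Setting order=1 or order=9 in the same routine yields the flat erosion and dilation mentioned above, which is exactly the equivalence exploited in the morphology section.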

4.3 Morphological Operators

The term morphological image processing refers to a class of algorithms concerned with the geometric structure of an image. Morphology can be used on binary and grayscale images, and is useful in many areas of image processing, such as skeletonization, edge detection, restoration, and texture analysis.

A morphological operator uses a structuring element to process an image. We usually think of a structuring element as a window passing over an image, similar to the pixel window used in the rank order filter. The structuring element can likewise be of any size, but 3x3 and 5x5 sizes are common. As the structuring element passes over an element in the image, it either fits or does not fit. At the places where the structuring element fits, we obtain a resultant image that represents the structure of the image. Fig. 7 demonstrates the concept of a structuring element fitting and not fitting inside an image object.

Page 19: Low Cost – FPGA based system for pothole …...Low Cost – FPGA based system for pothole detection on Indian Roads M. Tech Project Report – I stage Submitted in partial fulfillment

- 15 -

Fig. 7: Concept of Structuring Element Fitting and Not Fitting

4.3.1 Algorithm

There are two fundamental operations in morphology: erosion and dilation. It is common to think of erosion as shrinking (eroding) an object in an image, while dilation does the opposite: it grows the image object. Both of these concepts depend on the structuring element and how it fits within the object. For example, if a binary image is eroded, the resultant image has a foreground pixel at every origin position where the surrounding structuring element fits entirely within the object. The output of a dilation operation has a foreground pixel at every point covered by the structuring element when its origin lies within an image object.

Grayscale morphology is more powerful and more difficult to understand. The concepts are the same, but instead of the structuring element fitting inside a two-dimensional object, it is thought to either fit or not fit within a three-dimensional object. Grayscale morphology also allows the use of grayscale structuring elements; binary structuring elements are termed flat structuring elements in grayscale morphology. The combination of grayscale images and grayscale structuring elements can be quite powerful.

One of the strongest features of morphological image processing stems from the fact that the basic operators, performed in different orders, can yield many different, useful results. For example, if the output of an erosion operation is dilated, the resulting operation is called an opening. The dual of opening, called closing, is a dilation followed by an erosion. These two secondary morphological operations can be useful in image restoration, and their iterative use can yield further interesting results, such as skeletonization and granulometries of an input image.

Grayscale erosion and dilation can also be achieved using a rank order filter. Erosion corresponds to a rank order filter of minimum order, and dilation corresponds to a rank order filter of maximum order. The reason is that the result of a minimum-order rank order filter is the minimum value in the pixel neighborhood, which is exactly what an erosion operation computes; the same holds for a maximum-order rank order filter and a dilation operation. However, the rank order filter only works as a morphological operation with a flat structuring element, because the rank order filter window acts as a structuring element consisting of all ones. Still, this is a powerful feature, since grayscale morphology using flat structuring elements accounts for the most common usage of morphology.
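The equivalence stated above is easy to demonstrate: with a flat 3x3 structuring element, grayscale erosion is simply a minimum filter and dilation a maximum filter. A small sketch (borders left unchanged for brevity, as an illustration rather than the hardware design):

```python
# Grayscale erosion and dilation with a flat 3x3 structuring element,
# implemented as minimum and maximum filters over each neighborhood.

def _filter3x3(img, reduce_fn):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels copied through unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = reduce_fn(img[y + dy][x + dx]
                                  for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out

def erode(img):
    return _filter3x3(img, min)   # rank order filter of minimum order

def dilate(img):
    return _filter3x3(img, max)   # rank order filter of maximum order

# Erosion removes an isolated bright pixel; dilation spreads it.
tile = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
print(erode(tile)[1][1], dilate(tile)[1][1])  # -> 0 9
```

Composing these two routines (erode then dilate, or the reverse) gives the opening and closing operations described earlier.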

In a grayscale image, erosion tends to grow darker areas, and dilation tends to grow lighter areas. Opening and closing each tend to emphasize certain features in the image while de-emphasizing others. Used iteratively, the morphological operations can pick out specific features in an image, such as horizontal or vertical lines.

4.4 Convolution

Convolution is another commonly used algorithm in DSP systems. It belongs to a class of algorithms called spatial filters. Spatial filters use a wide variety of masks, also known as kernels, to calculate different results, depending on the function desired. For example, certain masks yield smoothing or low-pass filtering, while others yield edge detection.

4.4.1 Algorithm

The convolution algorithm can be calculated in the following manner. For each input pixel window, the values in that window are multiplied by the convolution mask. Next, those results are added together and divided by the number of pixels in the window. This value is the output for the origin pixel of the output image for that position.

The input pixel window is always the same size as the convolution mask, and the output pixel is rounded to the nearest integer. As an example, Fig. 9 shows an input pixel window, the convolution mask, and the resulting output. The convolution mask in this example is often used as a noise-cleaning filter.

Fig. 9: Convolution Algorithm Example

Convolution output = (50*1 + 10*1 + 20*1 + 30*1 + 70*2 + 90*1 + 40*1 + 60*1 + 80*1)/9 = 57.7778 ≈ 58
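The worked example above can be reproduced directly from the algorithm as stated: multiply the window by the mask element-wise, sum, divide by the number of pixels, and round to the nearest integer:

```python
# Single-window convolution as described in the text: element-wise
# multiply by the mask, sum, divide by the pixel count, and round.

def convolve_window(window, mask):
    n = len(window)
    total = sum(w * m for w, m in zip(window, mask))
    return round(total / n)

# The Fig. 9 example: sum = 520, 520 / 9 = 57.7778, rounded to 58.
window = [50, 10, 20, 30, 70, 90, 40, 60, 80]
mask   = [1, 1, 1, 1, 2, 1, 1, 1, 1]
print(convolve_window(window, mask))  # -> 58
```

Swapping in a different mask list changes the filter's behavior without touching the routine, which is the flexibility the next paragraph highlights.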

Carried over an entire input image, this algorithm produces an output image with reduced salt-and-pepper noise. An important aspect of the convolution algorithm is that it supports a virtually infinite variety of masks, each with its own effect. This flexibility allows for many powerful uses.


5. Summary and Future Work

Many of the key imaging functions break down into highly repetitive tasks that are well suited to modern FPGAs, especially those with built-in hardware multipliers and on-chip RAM. The remaining tasks require hardware better suited to control flow and decision making, such as DSPs. To gain the full benefit of both approaches, systems are required that can effectively combine FPGAs and DSPs. With the addition of standard imaging functions written in either VHDL or C, all of the key building blocks are available for building an image processing system. Using the DSP functions provided with the camera, we can implement the hardware for the project.

This research has been an excellent learning experience in various engineering disciplines. The digital signal processing algorithms will now be implemented in MATLAB so that the results can be compared and simulations run to tune for optimum performance.

6. References

[1] Hussain, Z.: “Digital Image Processing – Practical Applications of Parallel Processing Techniques,” Ellis Horwood, West Sussex, UK, 1991.
[2] Pratt, W.: “Digital Image Processing,” Wiley, New York, NY, 1978.
[3] Ratha, N. K. & Jain, A. K.: “Computer Vision Algorithms on Reconfigurable Logic Arrays,” IEEE, 1999.
[4] Evans, T.: “Semi-Automated Detection of Defects in Road Surfaces,” Thesis, Monash University, 2004.
[5] Ethans, P. M.: “Real-Time Image Processing on a Custom Computing Platform,” IEEE, 1995.
[6] Kruger, W. & Enkelman, W.: “Real-Time Estimation and Tracking of Optical Flow Vectors for Obstacle Detection,” Germany.
[7] Nakai, H. & Takeda, N.: “A Practical Stereo Scheme for Obstacle Detection in Automotive Use,” IEEE, 2004.
[8] Wang, C. & Huang, S.: “Driver Assistance System for Lane Detection and Vehicle Recognition with Night Vision,” National Taiwan University, Taipei, Taiwan.
[9] Hancock, J.: “Laser Intensity-Based Obstacle Detection and Tracking,” Thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.