
Pattern Recognition Letters 13 (1992) 41-56 North-Holland

January 1992

Range-intensity histogram for segmenting LADAR images

Anil Jain, Tim Newman and Michael Goulish

Department of Computer Science, Michigan State University, East Lansing, MI 48824, USA

Received 11 February 1991; revised 20 August 1991

Abstract

Jain, A., T. Newman and M. Goulish, Range-intensity histogram for segmenting LADAR images, Pattern Recognition Letters 13 (1992) 41-56.

We discuss a segmentation and preliminary recognition scheme for integrated sensory data. We perform pixel-level fusion of range and reflectance (intensity) data from a laser radar (ladar) sensor by constructing range-based intensity histograms containing the sum of intensity values of all pixels at regular range intervals. The ranges at which peaks occur in the histogram are hypothesized as ranges at which an object of interest might be present in the input scene. Likely objects are then segmented and subsequently identified using a nearest neighbor classifier.

Keywords. Laser radar, range image, segmentation, sensor fusion, target recognition.

1. Introduction

The goal of this project was the identification of certain objects of interest through the integration of laser radar (ladar) range and reflectance (intensity) images. Our images were perfectly registered, solving the correspondence problem and easing the task of integration immensely. Identification of the objects was more difficult, however, because distant 3D objects needed to be recognized from essentially 2½-D data. Factors such as partial occlusions, noise, and object articulations further complicate the identification task.

Sensor integration is desirable for several reasons. Perhaps the most fundamental of these is that sensor fusion can allow better feature extraction than is possible with reliance on a single sensor. Additionally, the sensor fusion model seems to be favored by natural vision systems. Mitiche and Aggarwal have written that nearly all higher life forms use a variety of sensors to solve basic survival problems [8].

Research supported by a grant from the Advanced Technology Lab, Northrop Electromechanical Division, Anaheim, CA.

Seventy-nine 128 × 128 ladar images of military vehicles were gathered at various poses and ranges between 350 m and 1.24 km. The images were gathered using the United Technologies Research Center/Air Force Armament Laboratory's CO2 ladar sensor operating at 87 kHz. The sensor has a 2° × 2° scan pattern with a range precision of 0.3 m. Range values are derived from the measured time delay between pulse transmission and the arrival of the pulse after reflecting off an object. Reflectance (intensity) is measured from the amplitude of the returned pulse. (For more details about the sensor, see [11].) The military vehicles to be identified were a jeep, tank, truck, and armored personnel carrier (APC). Some of the scenes also contained calibration plates, so



it was necessary to identify them as well. The images were gathered in natural environments which usually contained many trees and bushes. In some of the images, portions of the vehicles were occluded by branches and brush.

Several of the typical range and reflectance images from the database are shown in Figures 1 through 5. For each figure, (a) is the range image and (b) is the intensity image of the scene. In each of these images, the output has been scaled to 64 gray levels. The range images are displayed such that darker gray shades indicate greater distance

Figure 1. Images and range-intensity histogram for scene with truck at 350 m. (a) Range image. (b) Intensity image. (c) Range-intensity histogram.



from the sensor to the scene point. In the reflectance imagery stronger reflections have been displayed as brighter whites while negligible reflections are coded in darkest grays and blacks. Figures 1(a) and (b) are the range and intensity images, respectively, from a scene of a truck at a distance of 350 m from the sensor. Figures 2(a) and (b) contain the range and intensity images from a scene of an APC at a range of 500 m. Figures 3(a) and (b) are the range and intensity images from a jeep scene. The vehicle is at a distance of 500 m from the sensor. A tank is present in the

Figure 2. Images and range-intensity histogram for scene with APC at 500 m. (a) Range image. (b) Intensity image. (c) Range-intensity histogram.



scenes of Figures 4 and 5. The tank is at a distance of 500 m in Figure 4 and 1.24 km in Figure 5. Figure 5(b) has been enhanced by histogram equalization for display purposes. It is very typical of the quality of the intensity imagery in our database.

In the remainder of this section we briefly review some of the work which uses integrated range and intensity imagery and/or which pertains to our work. Our review is not meant to be exhaustive.

The presentations of Nitzan et al. [9] and of Duda et al. [4] are important precursors to our

Figure 3. Images and range-intensity histogram for scene with jeep at 500 m. (a) Range image. (b) Intensity image. (c) Range-intensity histogram.



work. Their method, which integrates range and intensity data, is useful for segmenting images of scenes which are in close proximity to the sensor, as in an industrial environment. Horizontal, vertical, and then arbitrarily-oriented planes are sequentially identified in range imagery. Horizontal planes were detected by construction of a histogram of range values. The peaks in this histogram were hypothesized to be central planes. Those regions whose range was approximately the same as the hypothesized central plane were referred to as 'sandwich regions' around the central plane.

Figure 4. Images and range-intensity histogram for scene with tank at 500 m. (a) Range image. (b) Intensity image. (c) Range-intensity histogram.



The 'sandwich region' around the central plane was assumed to contain a horizontal surface of significant size. Next, vertical planes were found by projecting all the image points onto a horizontal plane and then using a Hough technique to detect lines in this plane. The peaks in the Hough space correspond to large vertical surfaces in the image.

In [4], 'arbitrary', or rather, non-horizontal, non-vertical planes were then found by constructing intensity histograms of the connected components which remained in the image after the removal of the detected horizontal and vertical planar surfaces. A plane was fit to each set of pixels whose intensities were at or near a histogram peak. This method assumed that surface

Figure 5. Images and range-intensity histogram for scene with tank at 1.24 km. (a) Range image. (b) Intensity image. (c) Range-intensity histogram.



reflectance was essentially constant in significant portions of a plane.

Verly et al. [12] have recognized military vehicles in identically registered range and intensity images produced from infrared-radar range imagery. Their recognition scheme matched object silhouettes against appearance models (AM) of known objects. An AM is not a 3D model; instead, it represents how the object would appear in a 2D image produced by the sensor. AMs describe objects in terms of their parts and the interrelationships between parts. Two approaches to recognition were used, one that was contour-based (with primitives such as corners and arcs) and another that was region-based (which used primitives such as subregions and the spatial relations between subregions).

Chu, Nandhakumar, and Aggarwal [3] have presented a segmentation scheme for man-made vehicles in ladar imagery. Their method first segments range and intensity image pairs separately and then integrates the segmentations.

Chen and Jain [2] have presented a segmentation procedure for extracting military vehicles from ladar imagery by background removal. Our paper uses the same image database as Chen and Jain but is more straightforward.

Magee et al. [7] used intensity information to obtain clues of 'interesting' feature points. They then proposed selectively sensing range at those points. Object recognition was performed through graph matching; graph representations of the polyhedral vertices and circles in the images were matched to the corresponding graph representations of database objects.

More recently, Jain and Hoffman [6] have recognized objects in range images by first grouping object pixels to form surface patches and then merging the patches using a priori knowledge of database objects. Salient information about the patches was used to support or reject hypothesized object identities. Bolles and Horaud [1] have performed object recognition in range images by employing a hypothesize-and-verify scheme which matches a key, or focus, feature and adds compatible features one-by-one.

We discuss our method of pixel-level sensor integration in Section 2 of this paper. We present the segmentation technique in Section 3. Our preliminary efforts at object recognition are then presented in Section 4. Section 5 concludes the paper.

2. Integration of sensory data

The data used in our experiment are pairs of images of outdoor scenes taken with an imaging laser range finder. The ladar sensor consists of a laser, a device for rapidly aiming the laser beam, a timer, and a detector. To form the range and reflectance image pair, the laser beam is fired in rapid pulses. The pulses are directed to form a square grid pattern 128 pulses on a side. The total pattern subtends about 2 degrees of arc.

Each pulse leaves the laser, passes through the atmosphere, and reflects off the first opaque object that it contacts. Some of the reflected radiation returns to the sensor and is detected. When the return pulse is detected, measurements of the time for the pulse to return to the sensor (in nanoseconds) and of the amplitude of the pulse are taken. The time required for the round-trip flight of the laser pulse is directly proportional to the distance from the laser emitter to the reflecting object. The returned amplitude provides the reflectance component for each pulse.
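As a concrete illustration of this time-of-flight relation, consider the short Python sketch below. It is our own illustration; the constant, the function name, and the sample delay are not from the paper.

import math  # not strictly needed; shown for a self-contained script

# Our illustration of the time-of-flight relation r = c * t / 2.
C = 3.0e8  # approximate speed of light, m/s

def range_from_delay(delay_ns: float) -> float:
    """Convert a round-trip pulse delay in nanoseconds to range in meters."""
    return C * (delay_ns * 1e-9) / 2.0

# The sensor's 0.3 m range precision corresponds to a round-trip timing
# resolution of 2 * 0.3 / C = 2 ns.
print(range_from_delay(2333.3))  # about 350 m, the range of the truck scene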

The quality of the range imagery is quite good, while that of the intensity imagery is generally poor. Typically, objects are apparent in the range images, but even metallic objects are often not distinct in the intensity imagery.

The range-intensity image pairs produced by the ladar sensor are identically registered. That is, pixel (i, j) in a particular range image corresponds to pixel (i, j) in the corresponding intensity image. There are many applications in machine vision, such as stereopsis, in which the identification of corresponding pixels in the two images of the scene is extremely difficult. The availability of registered range and intensity information motivated our usage of pixel-level fusion. We wish to contrast pixel-level fusion with feature-level fusion, which is often used in machine vision research.

An image feature is a group of pixels extracted from an image which corresponds to some



identifiable attribute of the object to be recognized. This could be a straight edge or a flat surface, for example. Feature-level fusion is the process of extracting these features separately from different sensory inputs, and then combining knowledge of the different features to build higher-level abstractions. Normally, feature-level fusion would save effort and not lose very much information.

Since the multiple sensor information is so tightly coupled for our data, however, pixel-level fusion is worthwhile. A feature-level fusion approach, pursued in parallel to our pixel-level fusion approach, might offer better results with only a small increase in time complexity. At an even higher level, decision-level fusion would also be possible; e.g., several identification schemes could hypothesize in parallel.

We achieve sensor integration by constructing a histogram of the intensity values at regular range intervals. Since the amplitude of the reflected radiation is stronger from metallic objects than from non-metallic objects, intensity is an important consideration for segmentation of metallic objects. For this reason, the bins of the histogram are generated by summing the intensity values for all points in the image which are within the bin's range interval. The highest peaks of this histogram, which we call a range-intensity histogram (RIH), are hypothesized as the most likely ranges at which specular or metallic objects could exist. Figures 1(c), 2(c), 3(c), 4(c), and 5(c) are histograms for the images in parts (a) and (b) of Figures 1-5, respectively. Note that each of these histograms contains two very strong spikes. These spikes correspond to probable metallic objects. (Most of the scenes contain calibration plates and a metallic vehicle.)

2.1. Histogram construction

For an N × N range and intensity image pair, where Range_Image[i,j] is the entry in the i-th row and j-th column of the range image and Intensity_Image[i,j] is the entry in the i-th row and j-th column of the intensity image, a range-intensity histogram (RIH) is constructed using the following algorithm. Note that we have used a bin size (Bin_Size) of 16 and a Max_Range of 8192 (since it seems that only the lower 13 bits of the 16 bits of range information are reliable).

Range-Intensity Histogram Construction Algorithm

RIH[1..(Max_Range / Bin_Size)] ← 0;
For i = 1 to N
    For j = 1 to N
        Range_Bin ← Range_Image[i,j] / Bin_Size;
        Current_Intensity ← Intensity_Image[i,j];
        RIH[Range_Bin] ← RIH[Range_Bin] + Current_Intensity;
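For readers who want an executable rendering, the following NumPy sketch (ours, not the authors' code) implements the histogram above; names mirror the pseudocode. It also extracts the K highest local maxima used as candidate ranges in the next paragraph.

import numpy as np

BIN_SIZE = 16
MAX_RANGE = 8192  # only the lower 13 bits of range are taken as reliable

def range_intensity_histogram(range_image, intensity_image):
    """Sum intensity values into bins indexed by quantized range."""
    bins = (range_image % MAX_RANGE) // BIN_SIZE  # keep the lower 13 bits
    rih = np.zeros(MAX_RANGE // BIN_SIZE)
    # np.add.at accumulates correctly when bin indices repeat
    np.add.at(rih, bins.ravel(), intensity_image.ravel().astype(float))
    return rih

def candidate_ranges(rih, k=3):
    """Bin-center ranges of the K highest local maxima in the RIH."""
    interior = rih[1:-1]
    is_peak = (interior > rih[:-2]) & (interior >= rih[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    top = peak_bins[np.argsort(rih[peak_bins])[-k:]][::-1]
    return (top + 0.5) * BIN_SIZE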

In the range-intensity histogram, the presence of highly reflective objects is indicated by large spikes. The metallic vehicles in our database all strongly reflect the ladar pulses. Thus, we select the K highest spikes in the range-intensity histogram as candidate ranges where objects of interest might be present in the input imagery. Our experiments have shown that K = 3 seems to be a reasonable number of peaks to consider.

This pixel-level fusion approach utilizes the fact that the corresponding range value is known for every pixel in the intensity image. Hence, it is easy to determine the total intensity that is being reflected to the sensor from any particular range. The intensity data thus guides range image segmentation by suggesting ranges at which there is a high likelihood of an object of interest. The range-intensity histogram integrates data at the pixel level but still allows rapid generation of only a few hypotheses, each of which is very likely to be a range which actually contains an object of interest.

3. Segmentation

The peaks of the range-intensity histogram are very useful for the isolation of objects of interest from the background. The K highest peaks serve as cues of ranges which likely contain metallic objects. We consider K 'range slices', such that the i-th slice consists of only those rangels (range pixels) which are contained within a window or 'sandwich region' about the i-th highest local maximum in the histogram. In each of these range slice



images, a segmentation is attempted to find an object of interest.

3.1. Range slice generation

Range slice images are analogous to Duda's 'sandwich regions' [4]. A range slice image consists of all those pixels in the range image which are at the same range or within a small range interval centered around the histogram spike. The tolerance interval used in this study is the maximum dimension of the five objects of interest (the maximum horizontal dimension of the tank is about eight meters). An algorithm to accomplish this is presented below, where R is the range at which the spike was observed, RSI is the N × N range slice

Figure 6. Range slice images and segmentations for scene with truck at 350 m (Figure 1). (a) First slice. (b) Initial segmentation of slice 1. (c) Final segmentation of slice 1. (d) Second slice. (e) Initial segmentation of slice 2. (f) Final segmentation of slice 2. (g) Third slice. (h) Initial segmentation of slice 3. (i) Final segmentation of slice 3.



Figure 7. Range slice and its segmentation for scene with APC at 500 m (Figure 2). (a) Range slice from highest peak. (b) Final segmentation.



image and Max_Dimension is the maximum dimension of the five objects of interest:

Range Slice Generation Algorithm

For i = 1 to N
    For j = 1 to N
        If |R - Range_Image[i,j]| < Max_Dimension
            RSI[i,j] ← Range_Image[i,j];
        Else
            RSI[i,j] ← 0;

Thus, each range slice image is created by applying a two-way threshold to the original range image. The threshold captures all pixels which fall in the 'sandwich region'.
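A vectorized NumPy equivalent of this two-way threshold is sketched below (our illustration; argument names follow the pseudocode, and max_dimension is the tolerance interval expressed in range units).

import numpy as np

def range_slice(range_image, r, max_dimension):
    """Keep rangels within max_dimension of the spike range r, zero the
    rest: the two-way threshold described above."""
    keep = np.abs(range_image.astype(float) - r) < max_dimension
    return np.where(keep, range_image, 0)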

If a vehicle appears in the input range image as well as in the intensity image, then it is captured in one of the range slice images. However, the range-intensity histogramming technique would not be very useful if it were necessary to examine many range slice images, since too much time would be spent examining and rejecting useless range slice images.

One measure of the effectiveness of our approach to segmentation is the number of peaks in the range-intensity histogram which must be converted into range slice images before a slice is found which contains a vehicle. For our current database of 79 ladar images, 43 of the scenes contain vehicles which appear in the first three range slice images. Twenty-eight of the other ladar images consist of either very poor quality intensity or range images, or sometimes both. We believe that this is caused by improper calibration of the sensor. Unfortunately, since generation of a reliable range-intensity histogram depends on both inputs, our segmentation method fails when either of the images is of poor quality. For the remaining eight images, the vehicle-induced histogram peaks appear somewhere between the fourth and thirty-second highest peaks in the histogram. Therefore in the current implementation of our algorithm, we use K = 3 peaks.

Figures 6(a), (d), and (g) show the range slice images for the three highest histogram peaks for the ladar images in Figure 1. The slice image corresponding to the global maximum in the histogram is shown in Figure 6(a). The slice image corresponding to the second largest maximum is shown in Figure 6(d) and the slice image corresponding to the third highest spike is shown in Figure 6(g). Figure 6(a) contains calibration plates while the truck is in Figure 6(d), the range slice image from the second highest histogram peak.

Figure 7(a) shows the range slice image corresponding to the global maximum from the histogram of Figure 2(c). Figure 8(a) is the slice at the range of the highest peak of the histogram in Figure 3(c). It contains the calibration plates and ground scatter. The vehicle appears in the second range slice image, shown in Figure 8(c). Similarly, Figure 9(a) contains the range slice associated with the highest peak of the histogram from Figure 4(c). Figure 9(c) contains the second range slice for that scene. Figure 10(a) shows the second range slice extracted from the scene imaged in Figure 5. The range slice image associated with the largest peak of the histogram for Figure 5's imagery contains only a few scattered noise points.

Each ladar image pair has at this point been reduced to three range slice images, each one of which contains only those points in the narrow band of range values making up the 'sandwich region'. Slice images which contain vehicles also typically contain a strip of ground on which the object of interest rests. This strip often extends across nearly the entire width of the image. Bushes or trees at the same range as the primary object of interest also appear in the slice images. Figures 8(a) and 9(a) exemplify this. Naturally, some slices contain only trees and/or bushes. Usually trees and bushes appear no earlier than the third slice, as in Figure 6(g). Usually, some salt and pepper noise is also present.

3.2. Segmentation of objects of interest

It might be necessary to perform a detailed

Figure 8. Range slice images and segmentations for scene with jeep at 500 m (Figure 3). (a) Range slice from highest peak. (c) Range slice from second peak. (b), (d) Final segmentations.



Figure 9. Range slice images and segmentations for scene with tank at 500 m (Figure 4). (a) Range slice from highest peak. (c) Range slice from second peak. (b), (d) Final segmentations.



shape analysis for each slice to perfectly distinguish objects of interest from the background clutter. To avoid this time-consuming process, we employ a simple strategy using a priori knowledge of vehicle size to quickly place reasonable bounds on the lateral location of the vehicle in the slice image. This helps us quickly isolate vehicles from ground scatter in the image.

Since the sizes of the expected vehicles are known beforehand, it is a simple matter to calculate the maximum object dimension, L. A sliding N × L window is moved across the horizon of the N × N image. We count the number of nonzero pixels in the sliding window and find the window position which contains the maximum count. We hypothesize that this position is the most likely lateral position of the metallic object. Those pixels falling outside the window containing the hypothesized object are deleted from the range slice image. Next we remove all isolated pixels from the range slice image. The elimination of extraneous ground and bushes from the range slice image eases the segmentation of the vehicles and helps reduce false identification. Figures 6(b), (e), and (h) show the results of the ground scatter reduction operation.
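The sliding-window search can be sketched as follows (our illustration; window_width is the maximum vehicle dimension L converted to pixels, a conversion the paper does not spell out).

import numpy as np

def best_lateral_window(slice_image, window_width):
    """Slide an N x window_width window laterally and return the column
    offset whose window holds the most nonzero pixels."""
    nonzero_per_col = (slice_image != 0).sum(axis=0)
    # each entry of counts is the nonzero total over window_width columns
    counts = np.convolve(nonzero_per_col, np.ones(window_width, int), 'valid')
    return int(np.argmax(counts))

def isolate_vehicle(slice_image, window_width):
    """Zero every pixel outside the best window, as described above."""
    start = best_lateral_window(slice_image, window_width)
    out = np.zeros_like(slice_image)
    out[:, start:start + window_width] = slice_image[:, start:start + window_width]
    return out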

Many of the range slice images are generated because of clutter (e.g., trees and bushes), noise, and extraneous object parts. A simple area threshold eliminates these range slice images from further consideration. The area, A, of the Feret box which bounds the maximum-sized vehicle (the tank) is calculated. We require that every candidate slice image contain an object whose area is greater than aA, where a is 0.05. We chose this small value of a to compensate for potential occlusions. It is important to note that none of the slice images which were eliminated by this area threshold contained any vehicles.
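A minimal sketch of this area test follows (ours; the Feret box area A of the tank would be precomputed from its known dimensions, which the paper does not tabulate).

ALPHA = 0.05  # the paper's value of a

def passes_area_threshold(slice_image, tank_feret_box_area):
    """Reject slices whose object area does not exceed a * A."""
    object_area = int((slice_image != 0).sum())
    return object_area > ALPHA * tank_feret_box_area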

Next, we find the outlines of connected components in the slice image. The range slice image is searched in raster scan order for the first nonzero pixel and a boundary-following algorithm [10] is used to find the connected component. A connected component is considered a potential object of interest only if its boundary contains more than ten pixels. Components which have small contours are due to either noise or background clutter.
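The sketch below illustrates this filtering step under stated assumptions: it extracts 4-connected components by breadth-first search and, in place of the boundary-following algorithm of [10], approximates contour length by counting component pixels that touch the background.

from collections import deque
import numpy as np

def boundary_length(mask):
    """Count component pixels adjacent to the background: a simple
    stand-in for the contour length found by boundary following [10]."""
    padded = np.pad(mask, 1)
    inner = padded[1:-1, 1:-1]
    touches_bg = (~padded[:-2, 1:-1] | ~padded[2:, 1:-1] |
                  ~padded[1:-1, :-2] | ~padded[1:-1, 2:])
    return int((inner & touches_bg).sum())

def candidate_components(slice_image, min_boundary=10):
    """Yield masks of 4-connected components whose boundary exceeds
    min_boundary pixels; smaller contours are treated as clutter."""
    todo = slice_image != 0
    h, w = todo.shape
    for si in range(h):
        for sj in range(w):
            if not todo[si, sj]:
                continue
            mask = np.zeros_like(todo)
            queue = deque([(si, sj)])
            todo[si, sj] = False
            while queue:
                i, j = queue.popleft()
                mask[i, j] = True
                for ni, nj in ((i+1, j), (i-1, j), (i, j+1), (i, j-1)):
                    if 0 <= ni < h and 0 <= nj < w and todo[ni, nj]:
                        todo[ni, nj] = False
                        queue.append((ni, nj))
            if boundary_length(mask) > min_boundary:
                yield mask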

Once a candidate object has been selected, further ground strip reduction is necessary. To do this, we use knowledge about the typical orientation of vehicles with respect to the ground. The vehicles' longitudinal axis is parallel to the horizon, so those pixels which form the 'ground' around the base of the vehicle can be removed from the range slice image. This is accomplished by 'shrinking' the object first from its leftmost and then from its rightmost point in the image. Columns of pixels are removed from a side until a column is encountered whose highest nonzero element is at least cH, where H is the height of the segmented object. The threshold, c, was 0.35. This reduction removed most of the remaining extraneous ground strips while preserving important object features.
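A sketch of this column-shrinking step follows (our reading of the rule: a column 'counts' when its nonzero vertical extent reaches cH; the paper does not spell out its exact definition of column height).

import numpy as np

C = 0.35  # the paper's threshold c

def column_height(col):
    """Vertical extent of the nonzero pixels in one image column."""
    rows = np.flatnonzero(col)
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

def shrink_ground_strip(mask):
    """Strip columns from the left, then the right, until a column at
    least C * H tall is met, where H is the object height."""
    out = mask.copy()
    rows = np.flatnonzero(out.any(axis=1))
    if rows.size == 0:
        return out
    H = rows[-1] - rows[0] + 1
    cols = np.flatnonzero(out.any(axis=0))
    for j in cols:                      # shrink from the leftmost point
        if column_height(out[:, j]) >= C * H:
            break
        out[:, j] = 0
    for j in cols[::-1]:                # then from the rightmost point
        if column_height(out[:, j]) >= C * H:
            break
        out[:, j] = 0
    return out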

The results of this operation are shown in Figures 6(c), (f), and (i), and in Figures 7(b), 8(b) and (d), 9(b) and (d), and 10(b).

Our segmentation method took between 50 and 90 seconds to completely process a range and intensity image pair on a SUN 3/60.

4. Object identification

Object identification is done in two stages. The preprocessing stage deletes clutter based on five 2D shape features. These features are: height (H), width (W), ratio of height to width (HW), ratio of contour perimeter to area (PA), and ratio of object area to bounding (or Feret) box area (AB). If two or more of the five measured features fall outside of prescribed intervals, the object is rejected (i.e., classified as clutter). The largest and smallest vehicles (the tank and jeep, respectively) were used to construct the bounds on feature values. This stage eliminates small objects, prunes large

Figure 10. Range slice image and segmentation for scene with tank at 1.24 km (Figure 5). (a) Range slice from second highest peak. (b) Final segmentation.



Table 1
Confusion matrix showing nearest neighbor classifier performance using H and W features

Object    Decision
          Tank   APC   Jeep   Truck   Plate
Tank        10     0      0       1       0
APC          0     8      0       1       0
Jeep         0     1      8       0       2
Truck        0     0      0      12       0
Plate        0     0      0       0      30

objects, and rejects unidentifiable objects. The original set of 129 objects in the 43 scenes (one object from each of the three slices per scene) is reduced to 73 objects of interest. These 73 objects include 30 calibration plates, eleven tanks, nine APCs, eleven jeeps, and twelve trucks. It is important to note that none of the rejected objects were vehicles.
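The rejection rule can be sketched as follows (our illustration; the feature intervals derived from the tank and jeep are not published, so the BOUNDS table below is a placeholder, and perimeter is assumed to come from the contour-extraction step).

import numpy as np

# Hypothetical feature intervals; the paper does not publish them.
BOUNDS = {'H': (0, 1e9), 'W': (0, 1e9), 'HW': (0, 1e9),
          'PA': (0, 1e9), 'AB': (0, 1e9)}

def shape_features(mask, perimeter):
    """The five 2D shape features named above for one object mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    h = int(rows[-1] - rows[0] + 1)
    w = int(cols[-1] - cols[0] + 1)
    area = int(mask.sum())
    return {'H': h, 'W': w, 'HW': h / w,
            'PA': perimeter / area, 'AB': area / (h * w)}

def is_clutter(features):
    """Reject when two or more features fall outside their intervals."""
    out_of_bounds = sum(not (lo <= features[k] <= hi)
                        for k, (lo, hi) in BOUNDS.items())
    return out_of_bounds >= 2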

The object identification scheme uses a nearest neighbor classifier under a leave-one-out methodology. Leave-one-out refers to treating each pattern as a test pattern while all the others serve as training patterns. We did not wish to make any assumptions about the distribution of the features, so we chose this nonparametric approach.
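A minimal sketch of the leave-one-out nearest neighbor error follows, assuming the patterns are rows of a feature matrix X with labels y (our names); it is reused by the feature-selection sketch below.

import numpy as np

def loo_nn_error(X, y, feature_idx):
    """Leave-one-out nearest neighbor error: classify each pattern by its
    nearest neighbor among all the others, using only the selected
    feature columns; return the number of misclassified patterns."""
    Xf = X[:, list(feature_idx)]
    d = np.linalg.norm(Xf[:, None, :] - Xf[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)   # a pattern may not match itself
    nearest = d.argmin(axis=1)
    return int((y[nearest] != y).sum())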

We modified Whitney's feature selection method [13] to determine the discriminability of features. Whitney's method of forward feature selection generates a set of features by successively joining the next best feature to the set of already selected features. The next best feature is the one which, when joined with the existing set, misclassifies the fewest patterns. Our modification uses sequential backward feature selection. This method iteratively removes the feature whose removal causes the fewest patterns to be misclassified by the remaining features.
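Sequential backward selection can then be sketched as follows (ours; it reuses loo_nn_error from the previous sketch and stops at a requested feature count rather than any stopping rule stated in the paper).

def backward_select(X, y, n_keep):
    """Repeatedly drop the feature whose removal leaves the fewest
    leave-one-out misclassifications, as described above."""
    selected = list(range(X.shape[1]))
    while len(selected) > n_keep:
        errors = [(loo_nn_error(X, y, [f for f in selected if f != drop]),
                   drop)
                  for drop in selected]
        _, worst = min(errors)
        selected.remove(worst)
    return selected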

The leave-one-out nearest neighbor classifier identifies the objects using the features: height (H), width (W), ratio of height to width (HW), ratio of contour perimeter to area (PA), and ratio of object area to bounding (or Feret) box area (AB).

Sixty-seven of the 73 objects were correctly classified using all five of the features. Removal of the HW feature allowed 68 of the objects to be

properly identified. Using the H, W, and PA features allowed 67 of the objects to be correctly classified.

The two simplest features, height (H) and width (W), when used together classify 68 of the 73 objects correctly (see the confusion matrix in Table 1). The 'curse of dimensionality' [5] is the probable cause of this phenomenon; using more features can increase the probability of misclassification due to small sample size. The single best feature, object width (W), classifies only 48 of the 73 objects correctly, however.

A separate test was also performed, again using the nearest neighbor classifier under the leave-one-out method with backward feature selection, on only the non-calibration-plate objects. Thirty-eight of the 43 vehicles were correctly classified using just the H and W features. Using all five features together resulted in a correct classification of 34 of the 43 objects.

Therefore, use of just the two best features, H and W, resulted in correct classification of 88% of the objects of interest under both of the nearest neighbor classification tests. These two features capture the shape information necessary for discrimination between the object types in this small database.

5. Results and conclusions

We have presented a new segmentation technique which finds objects in registered range and intensity images. Our method merges the data using a range-intensity histogram and hypothesizes the existence of objects at the ranges corresponding to the highest histogram peaks. A nearest neighbor classification scheme is used for identification after the set of potential objects has been pruned by removal of clutter.

Several problems do exist with the strategy we have implemented here. The major problem is that the generation of the range-intensity histogram is impossible unless good range and good intensity images exist. If the ladar sensor is not correctly calibrated, as apparently was the case in several of our images, the histogramming process will fail. When attempts to fuse such data fail using our


Figure 11. Plot of angular deflection.

strategy, it would be desirable to process each image separately. This implies that several concurrent or parallel segmentation or identification algorithms should be implemented. The sensor used also suffers from difficulties at long range. At a vehicle range of 800 m, for instance, only ten of twenty vehicles were successfully segmented. Still, it is encouraging to note the success of our scheme when images of a reasonable quality were available.

We are experimenting with several improvements to our system. Estimation of object pose, for instance, could improve our recognition scheme somewhat. We believe that vehicle orientation can be estimated using the direction of object surface normals projected onto the x-z plane. (We define x to be along the horizon and z to be along the sensor axis.) Currently, we are able to accurately determine the orientation of vehicles which have only one face visible to the sensor.

Figure 11, for example, shows the angular deflection from the z-axis of the surface normals for all the interior points of the outlined truck from Figure 6(f). We compute surface normals over 3 × 3 pixel neighborhoods around each pixel in the object by transforming each neighborhood to the coordinate-system origin and finding the eigenvector associated with the minimum eigenvalue of the scatter matrix of the nine points. Since the object in Figure 6(f) is pointed toward the sensor, we expected to find the surface normals clustered around a 0° deflection. It would be interesting to test this technique's robustness for vehicles in arbitrary poses with multiple faces exposed. Larger neighborhoods might provide better estimates of orientation in such scenes.
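A sketch of this normal-estimation step follows (ours; it takes the nine 3D points of a neighborhood as rows of an array, uses the minimum-eigenvalue eigenvector of their scatter matrix, and orients the normal toward the sensor, a sign convention the paper does not state).

import numpy as np

def surface_normal_deflection(points):
    """Estimate the surface normal of a 3 x 3 neighborhood (nine 3D
    points, one per row) and return its angular deflection from the
    z-axis, measured in the x-z plane, in degrees."""
    centered = points - points.mean(axis=0)    # move to the origin
    scatter = centered.T @ centered            # 3 x 3 scatter matrix
    eigvals, eigvecs = np.linalg.eigh(scatter) # eigenvalues ascending
    normal = eigvecs[:, 0]                     # min-eigenvalue eigenvector
    if normal[2] < 0:
        normal = -normal                       # orient toward the sensor
    # signed angle between the normal and the sensor (z) axis in x-z
    return float(np.degrees(np.arctan2(normal[0], normal[2])))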

References

[1] Bolles, R.C. and P. Horaud (1986). 3DPO: A three-dimensional part orientation system. Int. J. Robotics Research 5 (3), 3-26.

[2] Chen, S.-W. and A.K. Jain (1991). Object extraction from laser radar imagery. Pattern Recognition 24, 587-600.

[3] Chu, C.-C., N. Nandhakumar, and J.K. Aggarwal (1990). Image segmentation using laser radar data. Pattern Recognition 23, 569-581.

[4] Duda, R.O., D. Nitzan, and P. Barrett (1979). Use of range and reflectance data to find planar surface regions. IEEE Trans. Pattern Anal. Machine Intell. 1, 259-271.

[5] Jain, A.K. and B. Chandrasekaran (1982). Dimensionality and sample size considerations in pattern recognition practice. In: P.R. Krishnaiah and L.N. Kanal, eds., Handbook of Statistics, Vol. 2. North-Holland, Amsterdam, 835-855.

[6] Jain, A.K. and R. Hoffman (1988). Evidence-based recognition of 3-D objects. IEEE Trans. Pattern Anal. Machine Intell. 10, 783-802.

[7] Magee, M.J., B.A. Boyter, C.-H. Chien, and J.K. Aggarwal (1985). Experiments in intensity guided range sensing recognition of three-dimensional objects. IEEE Trans. Pattern Anal. Machine Intell. 7, 629-637.

[8] Mitiche, A. and J.K. Aggarwal (1986). Multiple sensor integration/fusion through image processing: a review. Optical Engineering 25, 380-386.

[9] Nitzan, D., A.E. Brain, and R.O. Duda (1977). The measurement and use of registered reflectance and range data in scene analysis. Proc. IEEE 65, 206-220.

[10] Pavlidis, T. (1982). Algorithms for Graphics and Image Processing. Computer Science Press, Rockville, MD.

[11] Riggins, J., M.J. Lankford, and W.J. Green, Jr. (1985). Absolute ranging CO2 ladar sensor: design and performance. Air Force Armament Laboratory/DLMI, Eglin Air Force Base, Florida and United Technologies Research Center, East Hartford, Connecticut, October 20, 1985.

[12] Verly, J.G., R.L. Delanoy, and D.E. Dudgeon (1989). Machine intelligence technology for automatic target recognition. The Lincoln Laboratory J. 2, 227-307.

[13] Whitney, A.W. (1971). A direct method of nonparametric measurement selection. IEEE Trans. Computers 20, 1100-1103.
