
Verifying Model-Based Alignments in the Presence of Uncertainty

T. D. Alter & W. E. L. Grimson MIT Artificial Intelligence Laboratory

545 Technology Square Cambridge, MA 02139

Abstract

This paper introduces a unified approach to the problem of verifying Alignment hypotheses in the presence of substantial amounts of uncertainty in the predicted locations of projected model features. Our approach is independent of whether the uncertainty is distributed or bounded, and, moreover, incorporates information about the domain in a formally correct manner. Information which can be incorporated includes the error model, the distribution of background features, and the positions of the data features near each predicted model feature. Experiments are described that demonstrate the improvement over previously used methods. Furthermore, our method is efficient in that the number of operations is on the order of the number of image features that lie nearby the predicted model features.

1 Introduction

The basic problem of model-based recognition is to find correspondences between features of a model and an image. Given a 3-D model composed of geometric features and a cluttered image, the image is run through a feature detector to get a set of 2-D features that are comparable to the 3-D model features. Model-based systems commonly perform recognition by searching the space of all correspondences between model and image features. The Alignment approach first hypothesizes a small set of correspondences (the “hypothesis”) and then projects the model into the image for verification. The verification step consists of looking near each projected model feature for confirming image features, and then deciding whether the model is present based on some measure of similarity between the predicted and confirming features.

In this article, we propose a method for deciding whether to accept or reject a hypothesis that is robust to the locational errors in the image features. The proposed method is efficient, in the sense that the number of operations is a small constant times the number of confirming features. For robustness, the method is formally grounded in an error model (a model of the distribution of possible locations of a sensed feature). Furthermore, our method allows for both bounded and distributed (e.g., Gaussian) models of error, and our method avoids pre-determined thresholds. To avoid pre-set thresholds, the key idea is to use the “likelihood-ratio test” (e.g., [20]) to decide on-line whether to accept or reject a hypothesis. A second key idea of this work is to incorporate as much information as possible into the accept/reject decision. In fact, the likelihood-ratio test serves as part of a framework for incorporating any available information. The information we show how to incorporate includes, for example, knowledge of the particular error model, knowledge of the distribution of background features, and knowledge of the number of data features near each model feature. Overall, our main result is a robust formula for evaluating how well a set of projected model features is aligned with a set of data features.

This paper closes the loop on our group’s investigation of error characteristics of Alignment methods (summarized in Table 1). Our previous analyses studied how error propagates from matched image features to predicted model features, and these analyses were formally grounded in an error model. To then evaluate the hypothesis, however, a fixed threshold was used on the number of confirming features. In contrast, the evaluation criterion that we derive here is again formally grounded in the error model, and is optimal in that the new method is guaranteed to perform better than other Alignment methods under the specified assumptions. This method adds to an important set of optimally robust techniques for model-based recognition, which currently provides for Backtracking Search [4, 5], Geometric Hashing [18, 8], and Pose-Space Search [7].

The literature on robust methods for object recognition has grown considerably over the last few years. The robust methods can be divided into those that incorporate a Gaussian error model [19, 8, 6, 21, 18] and those that incorporate a bounded error model [4, 15, 5, 7, 12, 2]. For Alignment, Sarachik and Grimson [19] used a Gaussian error assumption and proposed a robust Alignment method for recognizing two-dimensional objects composed of point features. This paper also studies the effects of Gaussian error, but our approach leads to a method that is much simpler and that is optimal for Alignment according to the Gaussian error model. Even so, the most basic differences with our method are that we handle three-dimensional objects and that we use line features for verification.

Also assuming Gaussian error, Wells [21] cast the recognition problem as maximizing an objective function over the spaces of poses and correspondences. Beveridge et al. [6] also used a robust method to evaluate model poses. When the pose is overdetermined by given correspondences, Kumar and Hanson [16] and Hel-Or and Werman [13] have analyzed the effect that errors in image features have on the accuracy of a pose estimate.

For Geometric Hashing, Costa et al. [8] and Rigoutsos and Hummel [18] examined how uniform and Gaussian



                GH91                  GHJ92           GHA92           SG93            Here
imaging model   2-D rigid, 3-D rigid  2-D affine      3-D projection  2-D affine      3-D projection
features        line segments         planar, points  points          planar, points  points and line segments
error model     bounded               bounded         bounded         Gaussian        bounded and Gaussian

Table 1: Summary of error analyses of Alignment by the MIT AI Lab vision group. The model and data features are either points or line segments. A 2-D rigid imaging model implies that the model and data features are 2-D and exist in R², and similarly for a 3-D rigid imaging model. A 2-D affine imaging model implies that the model and data are 2-D, but the model exists in R³, because a 2-D affine transformation of a 2-D (i.e., planar) model is equivalent to a scaled orthographic projection of that model. Lastly, 3-D projection implies that the model is in R³ and is 3-D (not necessarily planar), and the data is 2-D.

error in the image points affects the affine-invariant parameters used for hashing. In addition, Lamdan and Wolfson [17] studied the problem of determining when three image points provide an unstable basis for Geometric Hashing. Jacobs [15] determined exactly how bounded error affects hashing indices. Grimson et al. [12] and Sarachik and Grimson further developed this result and used it to analyze Geometric Hashing algorithms.

Other approaches have studied how to robustly match models and images in the presence of bounded error. Baird [4] derived linear constraints on model poses by assuming a linear projection model. Following Baird, Cass [7] also used linear constraints, and he showed that finding the model pose that aligns the most image and model features to within the error bounds is inherently a polynomial-time problem. Breuel [5] modified Baird’s approach to produce a tree-search algorithm that in the worst case runs in polynomial time.

2 Our Approach

Given an initial set of correspondences, the fundamental question for an Alignment system is how to determine whether that hypothesis is correct. In making this determination, the basic steps of Alignment are:

1. to select a hypothesized set of model points and matching image points,

2. to use the matched points to compute a transformation that brings the model into the image coordinate system and then project the model into the image,

3. to decide what region in the image to search about each projected model feature to find candidate image features, and

4. to use the candidate image features to accept or reject the hypothesis.

In this paper, we focus on step 4, and this section discusses our basic approach. In Sec. 4, we will perform experiments in which we consider in detail an approach that uses minimal sets of points for generating hypotheses (as in [9, 14]). In this case, hypotheses contain three point pairs, since three is the minimal number needed to compute a pose. Further, we will model how a 3-D object is brought into the image as a rigid transformation followed by “weak perspective” projection (orthographic projection plus a scaling).
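As a concrete illustration of this imaging model, the following minimal sketch (the function and variable names are ours, not the paper's) applies a rigid transform followed by weak-perspective projection; in step 2 above, the pose R, t, s would be computed from the three hypothesized point pairs.

```python
import numpy as np

def weak_perspective_project(model_pts, R, t, s):
    """Project 3-D model points: rigid transform, drop depth, then scale.

    model_pts: (n, 3) array of 3-D model features
    R: (3, 3) rotation matrix; t: (3,) translation; s: positive scale
    Returns an (n, 2) array of predicted image positions.
    """
    cam = model_pts @ R.T + t   # rigid transform into the camera frame
    return s * cam[:, :2]       # orthographic projection plus scaling

# Example: the identity pose at scale 2 maps (1, 2, 5) to (2, 4).
pts = np.array([[1.0, 2.0, 5.0]])
print(weak_perspective_project(pts, np.eye(3), np.zeros(3), 2.0))
```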

Figure 1: Region to search for candidate line segments and two possible candidates. One of the possible candidates can be extended to the left to intersect the smaller circle; so this segment is a candidate match for the model line segment. The other possible candidate cannot be extended to intersect both circles, and consequently would be discarded.

2.1 Uncertainty regions

Given a hypothesis, errors in the locations of the matched image features propagate to an uncertainty region in the predicted position of any unmatched model feature. For the step of finding candidate image features for each predicted model feature, [14] found candidates by looking in a small region of fixed size and shape about each predicted feature. In contrast, our approach looks to compute a search region that includes almost all image features that could correspond to the predicted feature.

When minimal sets of points are used for computing poses along with a weak-perspective imaging model, [2] has demonstrated that the correct search regions for predicted model points tend to be discs centered at the nominal locations of the predicted points and that the radii of the discs can vary considerably in size. More recently, Alter and Jacobs [3] explained why the regions tend to be circular and provided an efficient analytic approximation for the radii.

As discussed in [2], this result can be used to bound the uncertainty in predicted line segments. For each model line segment we calculate the uncertainty circles for its endpoints. Then an overestimate of the set of image line segments that could match a model line segment is given by all line segments connecting pairs of points in the two circles. To allow for some fragmentation and partial occlusion, we would also accept any sub-segment of one of these line segments (Fig. 1).
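The Fig. 1 test can be approximated by checking that a candidate segment's supporting line passes through both endpoint uncertainty circles, so that some extension of the segment could connect the two discs. A hypothetical sketch of this check (a necessary condition only; the names are ours, and a full test might add ordering and length constraints):

```python
import numpy as np

def line_point_distance(p0, p1, c):
    """Perpendicular distance from point c to the infinite line through p0, p1."""
    d = p1 - p0
    # magnitude of the 2-D cross product, divided by the segment length
    return abs(d[0] * (c[1] - p0[1]) - d[1] * (c[0] - p0[0])) / np.linalg.norm(d)

def is_candidate_segment(seg, circle1, circle2):
    """seg: (p0, p1) endpoints of an image segment.
    circle1, circle2: (center, radius) endpoint-uncertainty discs.
    Keep the segment if its supporting line intersects both discs."""
    p0, p1 = seg
    return all(line_point_distance(p0, p1, c) <= r for c, r in (circle1, circle2))

# Example in the spirit of Fig. 1: a segment collinear with both discs is kept.
c1 = (np.array([0.0, 0.0]), 3.0)
c2 = (np.array([20.0, 0.0]), 5.0)
print(is_candidate_segment((np.array([6.0, 1.0]), np.array([12.0, 1.0])), c1, c2))  # True
print(is_candidate_segment((np.array([6.0, 9.0]), np.array([12.0, 1.0])), c1, c2))  # False
```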

If the initial hypothesis contains more than three points, then Alter and Jacobs’ analytic solution for the fourth point uncertainty region does not apply. In this situation, Alter and Jacobs showed how to compute the uncertainty regions by using linear programming. If the initial hypothesis is not restricted to point matches (e.g.,



contains matched line segments), then there is currently no solution for the uncertainty regions of the predicted model features.

2.2 Likelihood-ratio criterion

Once the correct search regions have been computed, the next step is to use the candidate matches to evaluate the hypothesis. [9] accepted or rejected hypotheses using a fixed threshold on the total number of image features in the candidate matches, and [14] used a fixed threshold on the fraction of model features for which candidate image features were found. In general, we would prefer a formal method for picking out those hypotheses that are more likely. This formal method should make the best possible use of the information that is available. For instance, if an image feature arises inside a larger uncertainty region, then the feature is more likely to have come from some other object. As another example, another source of information for judging the likelihood of a hypothesis is the total number of image features; the more spurious features there are in the image, the less distinguishing is the set of projected model features. In addition, the more features there are in the model, the more distinguishing are the projected model features.

Yet another piece of information which we may have available is the distribution of error in the image features. Bounded error is a “no knowledge” assumption, and could be used if one suspects that the true distributions may be significantly skewed from Gaussian. But if the error is believed to be Gaussian, then each candidate image feature could be ranked according to how likely the model feature was to produce it.

To incorporate the available information, we make use of the likelihood-ratio test (e.g., [20]). Given a hypothesized pairing of p model and image features, we must decide whether to accept this hypothesis. Let H be the event that the p-feature match is correct. Given the image, which we call event I, we choose H if Pr[H|I] > Pr[H̄|I], and otherwise we choose H̄. Using Bayes’ rule, we have

\[ \frac{\Pr[I \mid H]}{\Pr[I \mid \bar H]} > \frac{\Pr[\bar H]}{\Pr[H]}. \tag{1} \]

The left side of this rule is the likelihood ratio. The test minimizes the probability of making an error, and consequently treats false positives (choosing H when H̄ is true) and false negatives (choosing H̄ when H is true) equally. To penalize one more than the other, we could associate costs with correct and incorrect decisions.
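A minimal sketch of this decision rule (the names are ours); the likelihood ratio Pr[I|H]/Pr[I|H̄] is the quantity computed in Sec. 3:

```python
def accept_hypothesis(likelihood_ratio, prior_h):
    """Eq. 1: accept H when Pr[I|H] / Pr[I|H-bar] > Pr[H-bar] / Pr[H]."""
    return likelihood_ratio > (1.0 - prior_h) / prior_h

# With an uninformative prior Pr[H] = 0.5, the threshold is 1:
print(accept_hypothesis(3.2, prior_h=0.5))  # True
print(accept_hypothesis(0.4, prior_h=0.5))  # False
```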

2.3 Domain information

We are searching for an instance of a model in the image, and all image features that do not come from the model are spurious. To evaluate Pr[I|H]/Pr[I|H̄] and Pr[H], we need to define more precisely what we mean by events I and H. In general we will suppose that there are two processes by which image features are produced: If the hypothesis is correct, we know that a “model process” produced an image feature for each unmatched model feature. By this process, image features are produced by projecting 3-D model features and then corrupting them according to a noise distribution. The noise distribution allows for errors in feature detection, which results in uncertainty in feature position and, for line features, in orientation and length (which includes the effects of fragmentation and occlusion).

All image features that do not come from the model process are produced by a “random process.” This process produces image features by dropping spurious features uniformly over the image. When the hypothesis is correct, spurious features arise from events such as object edges that are not represented in the model, other objects in the scene, shadows, and specularities. When the hypothesis is incorrect, spurious features also arise from the model. So we are assuming that the features these events introduce effectively occur according to a uniform distribution. This assumption has been made before for analyzing the verification stage of model-based recognition and has yielded accurate results [10, 19]. In particular, [19] found that the assumption was effective when applied locally to the area of the image around where the model projects. Local uniformity in the image features is sufficient for our work as well.

It is important to understand that the event I is dependent on the recognition algorithm. For instance, for the Alignment algorithm of [14], I is the event that a particular fraction of predicted model points lie within ε of some image point. The Alignment algorithms examined in [10, 12, 11, 2] augment the verification method in [14] by instead using, for each predicted feature, the minimal search region that is guaranteed to include its matching image point if it exists. To use all possible information, we would define I to be the event that the image features arose at their precise locations. Computing the probability of this event, however, requires enforcing all the constraints between model features. The idea of Alignment, however, is that we can ignore the constraints between the unmatched model features. This assumes that their projected image locations can be treated independently. In the next section, we make a choice of I that is consistent with this Alignment philosophy and uses more information than [14, 10, 12, 11, 2].

3 Formula for the Probability Ratio

This section computes Pr[I|H]/Pr[I|H̄], for both bounded and distributed uncertainty in the predicted model features. Our analysis supposes that the search regions do not intersect; when in fact the regions do intersect, our confidence in the hypothesis is overestimated. The analysis further supposes that the model noise process does not produce image features outside the uncertainty regions, which in practice means that the uncertainty regions should be guaranteed to contain the features up to some high probability.

To deal with occlusion, we assume that, if the hypothesis is correct, then any model feature whose search region contains at least one image feature is not in fact occluded. This approximation is most appropriate for situations in which the uncertainty regions tend to be relatively small. In these situations, if an object occludes a predicted model feature, then it is likely that the object also occludes the entire uncertainty region. This view is consistent with the decision criteria in [14, 10, 12, 19].



Below we first compute the ratio of the image probabilities for the case of no occlusion, and then extend the formula to account for occlusion as well.

Allowing for distributed error, we will take into account the actual positions of the image features within the uncertainty regions. In so doing, our method will be consistent with the Alignment approach, where we do not have to enforce global consistency. In particular, we define I to be the event that the image features arose at their particular locations and that the model process produced features in the regions independently of one another. As in Sec. 2, H is the event that a p-point match is correct. Let m be the number of unmatched model features, and let r equal the number of unmatched image features. Further, let A_I be the area of the image, and let V_I be the volume of translations and rotations that place a line segment of known length inside the image (a formula for V_I is given in [2]). Then let p_I be the probability that a feature dropped randomly into the image lands in a particular position: p_I equals 1/A_I for points and 1/V_I for lines. Then the probability of the r image features landing in their particular positions is p_I^r. Since there are r! possible orderings of the image features, the probability of r spurious features occurring in the r positions is Pr[I|H̄] = r! p_I^r.

Now suppose that the hypothesis is correct. Then for each search region in which there is at least one image feature, one image feature came from the model process and the rest occurred randomly. Let R_i be the search regions with at least one feature, for i = 1, 2, ..., m, and let r_i be the number of features in the i-th search region. We number the features in region R_i from t_i = 1 to t_i = r_i, for i = 1...m. Also, let M_i represent the i-th model feature, and let D_i(t_i) be the propagated distribution for the uncertainty region R_i, which we assume is known: D_i(t_i) = Pr[M_i = t_i]. We consider the probability of a particular assignment to the model features: {M_i = t_i, for i = 1...m}. For the remaining features, there are always (r − m)! ways for them to have landed in the image at random. Then the probability that the model process produced the particular assignment, {M_i = t_i, for i = 1...m}, and that the random process produced the rest is Pr[M_1 = t_1 ∧ M_2 = t_2 ∧ ... ∧ M_m = t_m] (r − m)! p_I^{r−m}, which equals D_1(t_1) D_2(t_2) ... D_m(t_m) (r − m)! p_I^{r−m}, where independence comes from our choice of I. For each region, the image feature that was produced by the model is free to be any of the r_i candidates, where we assume again that the regions are disjoint. Summing over all of the possibilities for the correct image features,

\[ \Pr[I \mid H] = \sum_{t_1=1}^{r_1} \cdots \sum_{t_m=1}^{r_m} D_1(t_1) \cdots D_m(t_m)\,(r-m)!\,p_I^{\,r-m} = (r-m)!\,p_I^{\,r-m} \Bigl(\sum_{t_1=1}^{r_1} D_1(t_1)\Bigr) \cdots \Bigl(\sum_{t_m=1}^{r_m} D_m(t_m)\Bigr). \]

Then

\[ \frac{\Pr[I \mid H]}{\Pr[I \mid \bar H]} = \frac{(r-m)!}{r!\,p_I^{\,m}} \prod_{i=1}^{m} \sum_{t_i=1}^{r_i} D_i(t_i). \tag{2} \]

This formula is our main result. The number of operations approximately equals the number of nearby data features plus three times the number of predicted model features, showing that we can both quickly evaluate an Alignment hypothesis and formally account for the uncertainty that is inherent in the image data.

Next we consider Gaussian uncertainty distributions, because when hypotheses consist of triples of point correspondences, Gaussian error in the image points propagates to Gaussian uncertainty in predicted model points [3]. Let

\[ D_i = G_i = \frac{1}{2\pi\sigma_i^2} \exp\!\left(-\frac{d^2}{2\sigma_i^2}\right), \]

where d is the distance of the image point to the nominal image location of the unmatched model point, and where σ_i is given in [3]. For line features, we unfortunately do not have an analytic solution for how Gaussian error in the image points propagates. One possible approach, for future investigation, would be to try to fit an analytic function to a numerically sampled distribution.
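As an illustrative sketch (the names are ours, not the paper's), Eq. 2 can be evaluated for point features with Gaussian D_i as follows, assuming disjoint, non-empty search regions (the no-occlusion case):

```python
import math

def gaussian_d(dist, sigma):
    """Propagated 2-D Gaussian D_i evaluated at a candidate point lying
    `dist` pixels from the nominal predicted location."""
    return math.exp(-dist**2 / (2 * sigma**2)) / (2 * math.pi * sigma**2)

def likelihood_ratio_eq2(region_sums, r, p_I):
    """Eq. 2 for m = len(region_sums) disjoint, non-empty search regions.

    region_sums: one entry per region, the sum of D_i over its candidates
    r: number of unmatched image features
    p_I: probability of a random feature landing at one position (1/A_I)
    """
    m = len(region_sums)
    ratio = math.factorial(r - m) / (math.factorial(r) * p_I**m)
    for s in region_sums:
        ratio *= s
    return ratio

# Toy example: two regions in a 500 x 500 image, r = 10 unmatched features.
sums = [gaussian_d(1.0, 2.0) + gaussian_d(4.0, 2.0),  # region 1: two candidates
        gaussian_d(0.5, 2.0)]                         # region 2: one candidate
print(likelihood_ratio_eq2(sums, r=10, p_I=1.0 / 500**2))
```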

For bounded error, we note that only knowing that the uncertainty regions are bounded is the same as having uniform distributions within the uncertainty regions, since then the actual positions of the image features in the regions are irrelevant. If H is true, then in each region one feature occurred due to the model and the rest arose randomly. Let p_i be the probability that a feature dropped randomly into region R_i lands at a particular location, for i = 1...m. For points, p_i = 1/A_i, and for lines, p_i = 1/V_i. Since the uncertainty is uniform, Eq. 2 becomes

\[ \frac{\Pr[I \mid H]}{\Pr[I \mid \bar H]} = \frac{(r-m)!\, r_1 r_2 \cdots r_m}{r!} \cdot \frac{p_1 p_2 \cdots p_m}{p_I^{\,m}}. \]

To allow for occlusion, let c_i be the probability that a given model feature is occluded, for i = 1, ..., m. If H is true, every empty search region was occluded with probability c_i, and for each search region with one feature, the feature was not occluded with probability 1 − c_i. In addition, for the non-empty regions the likelihood ratio in Eq. 2 does not change (see [1] for details).
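A hypothetical sketch of this bounded (uniform) ratio with the occlusion extension, under the stated assumption of disjoint regions (names and argument layout are ours):

```python
import math

def bounded_ratio(region_counts, region_ps, r, p_I, occ_probs):
    """Uniform (bounded-error) Eq. 2 with the occlusion extension:
    an empty region contributes a factor c_i, and a non-empty region
    contributes (1 - c_i) times its uniform term r_i * p_i / p_I.

    region_counts: r_i, number of candidates in each of the m regions
    region_ps: p_i = 1/A_i for points (1/V_i for lines), per region
    r: number of unmatched image features
    p_I: 1/A_I for points (1/V_I for lines)
    occ_probs: per-region occlusion probabilities c_i
    """
    m_nonempty = sum(1 for ri in region_counts if ri > 0)
    ratio = math.factorial(r - m_nonempty) / math.factorial(r)
    for ri, pi, ci in zip(region_counts, region_ps, occ_probs):
        ratio *= ci if ri == 0 else (1.0 - ci) * ri * pi / p_I
    return ratio

# Two non-empty regions and one empty (possibly occluded) region:
print(bounded_ratio([3, 1, 0], [1e-3, 1e-3, 1e-3], r=10,
                    p_I=1.0 / 500**2, occ_probs=[0.25, 0.25, 0.25]))
```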

4 Experiments

We next perform experiments that compare the distributed and bounded verification measures that we derived in the last section to measures used in previous Alignment work. We focus on this particular set of algorithms in order to understand the degree of improvement from using our results, for situations in which Alignment is the desired approach. In general, when looking for a robust method to solve a given recognition problem, robust versions of other recognition paradigms should be considered as well [4, 8, 18, 15, 5, 21, 7]. To compare the effectiveness of the different measures, we empirically estimate for each the expected number of mistakes that would be made by an Alignment algorithm for varying numbers of correct and incorrect hypotheses. This measure of performance is important for recognition, which is our main concern in these experiments. If the Alignment algorithm were instead to be used as a coarse filter to a more expensive verification process, then other measures, such as the probability of the algorithm finding any correct hypothesis, or the expected percentage of incorrect hypotheses, may be more appropriate.

In terms of the basic Alignment steps listed at the beginning of Sec. 2, the verification measures are applied in step 4, and each measure intrinsically makes an assumption about the search region used in step 3. In the experiments, we consider four measures. The first two measures take as input Pr[H], the a priori probability



of a hypothesis, and c_i, the probabilities of occlusion. Let m be the number of predicted model features and r be the number of data features. The measures and the methods used to compute them are:

Distributed Probability Ratio. To verify a hypothesis, this method first computes the uncertainty region for every predicted model feature. Next, for each predicted feature, it evaluates the propagated distribution D_i at the candidate image features in the region, computes the likelihood ratio of Eq. 2 (extended for occlusion as in Sec. 3), and accepts the hypothesis if the ratio exceeds Pr[H̄]/Pr[H].

Bounded Probability Ratio. This method also begins by computing the uncertainty region for every predicted model feature, but it uses only the number of candidate features in each region, accepting the hypothesis if the uniform-distribution version of Eq. 2 exceeds Pr[H̄]/Pr[H].

GHJ92, AG93. Here we implement the method from [12, 2]. Their method computes the uncertainty regions for predicted model features, and then determines whether each uncertainty region contains at least one data feature. If the fraction of predicted features whose uncertainty regions are non-empty is at least a pre-set threshold, then the hypothesis is accepted, and otherwise it is rejected. For the threshold, as in [12] we use the fraction of the model that is expected to be occluded, c. Our experiments also consider an optimal version of this method. In an optimal version, we try all m distinct thresholds and report the lowest error rate. This gives an upper bound on the performance of these methods.

HU90. Huttenlocher and Ullman’s verification method simply searches about each predicted feature inside some small, fixed-sized region. For point features, we search in an ε disc about each predicted point (as in [14]). For line segments, we construct a search region based on the segment’s endpoints as in Fig. 1. The hypothesis is accepted if the fraction of predicted features that are accounted for is at least a pre-set threshold, and we use their same threshold, .5. As above, we also consider a version where the threshold is chosen optimally.

In the following experiment, we consider models that are composed of point features and the situation where Pr[H] = .5, that is, where as far as one knows the hypothesis is equally likely to be correct or incorrect. We began by uniformly dropping r image points over a 500 x 500 image. Then we uniformly dropped a set of m uncertainty circles into the image, each with radius 17, the expected radius of an uncertainty circle from the data given in [2]. At each of a series of trials, we generated one image where all the points were random, and one image where m of the image points came from the model. For image points that came from the model, each such point was sampled from a truncated Gaussian distribution, with truncation at 5 pixels from the center, and a standard deviation of 2.04 (this gave 95% of the Gaussian being within the truncation radius). However, any model region could be occluded with a fixed probability (which we set to 0 or .25), in which case the occluded uncertainty region contained no image features (see discussion in Sec. 3). We repeated this process for 10000 trials (20000 hypotheses), over which we computed the fraction of trials in which a false positive occurred (p_fp) and the fraction in which a false negative occurred (p_fn). The total error rate is the average of these two quantities:

\[ \Pr[\text{error}] = p_{fp} \Pr[\bar H] + p_{fn} \Pr[H] = \tfrac{1}{2}\,(p_{fp} + p_{fn}). \]
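For concreteness, a condensed sketch of how one such trial might be generated (the helper names are assumed, and for simplicity the model points are appended to the clutter rather than replacing m of the r points):

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE, RADIUS, SIGMA, TRUNC = 500.0, 17.0, 2.04, 5.0

def truncated_gaussian_offset():
    """2-D Gaussian offset (std. dev. 2.04), rejected beyond 5 pixels."""
    while True:
        d = rng.normal(0.0, SIGMA, size=2)
        if np.hypot(d[0], d[1]) <= TRUNC:
            return d

def make_trial(m, r, occlusion, correct):
    """One synthetic image: r uniform background points plus, if `correct`,
    one model point near each unoccluded uncertainty-circle center."""
    points = rng.uniform(0.0, SIZE, size=(r, 2))            # clutter
    centers = rng.uniform(RADIUS, SIZE - RADIUS, size=(m, 2))
    if correct:
        for c in centers:
            if rng.uniform() >= occlusion:                  # feature visible
                points = np.vstack([points, c + truncated_gaussian_offset()])
    return points, centers

# Each trial yields one correct-hypothesis image and one all-random image:
model_img = make_trial(m=10, r=250, occlusion=0.25, correct=True)
random_img = make_trial(m=10, r=250, occlusion=0.25, correct=False)
```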

Table 2 shows the resulting error rates. In all cases the distributed formula gave an error rate that is at least as small as the bounded formula. As well, the distributed and bounded formulas gave error rates that are at least as small as the previous methods. The amount of improvement in the error rates depends significantly on the numbers of model and data features, and in particular the difference in error rates tends to increase as the number of data features increases. For instance, consider m = 30 model points. For r = 250 data points, there is almost no difference in the error rates between the distributed probability-ratio method and the method from [12] and [2], whereas for r = 1000 data points, there is about a factor of ten difference.

For r = 250 and no occlusion the error rates are the same for the bounded probability ratio method and the method from [2] and [12]. This is expected for small numbers of image features, in which case it is unlikely for image features to arise in the regions at random.

The improvement with our approach is most pronounced when compared to (optimal) HU90, that is, using simple ε discs for search regions. In fact, the difference in the error rate from the distributed ratio method can be several orders of magnitude. Note that for the non-optimal version the error rate gets worse as the number of model points increases. We have observed that this happens because the chance of a false negative increases to near one. The error rate for [14] improves as the number of model points increases when the threshold is chosen optimally, with the greatest improvement for larger numbers of model features.

[1] also considers the more realistic situation in which the uncertainty regions are not truncated, but instead they are chosen such that they are very likely to contain the correct data feature (with a 95% probability). As expected, the error rates tend to be greater for the untruncated case, but only a little greater. In addition, [1] experiments with Pr[H] < 1/2 for situations where the vast majority of hypotheses are expected to be incorrect. It is demonstrated that the correct solution is to use the proper uncertainty region and to explicitly take into account the expected rate of incorrect hypotheses via the term Pr[H], as opposed to just using very small search regions as in [14]. Further, [1] experiments with using line segments for verification. The error rates for line segments tend to be smaller than for points, even for the distributed ratio method. Consequently, it appears to be more important to use extended features such as line segments for verification than to take advantage of the particular distribution of uncertainty.

5 Conclusion

We have considered how to make Alignment robust to locational errors in image features. We gave a new scheme for rapidly evaluating Alignment hypotheses that is formally grounded in a model of error, and our experiments demonstrated a substantial improvement in performance.


[Table 2: error rates of the verification measures of Sec. 4, for m = 10 and m = 30 model features and r = 250, 500, and 1000 data features; the numeric entries are not recoverable from this transcript.]

Table 2: Error rates for Pr[H] = .5. The left data set is for the case of no occlusion, and the right data set is for when half of the model features are occluded. The data is for different numbers of model features m and data features r.

Using the likelihood-ratio test, our decision criterion incorporates the available information about the shapes and sizes of the uncertainty regions, the total numbers of model and data features, and the numbers of data features that arise in the individual regions. Lastly, the method applies equally well to bounded or Gaussian error, or any other error distribution for which the propagated distribution of uncertainty can be computed.

Acknowledgements

We thank David Jacobs for his insightful comments. Support for this work done at the MIT Artificial Intelligence Laboratory was provided in part by the Advanced Research Projects Agency of the Dept. of Defense under ONR contract N00014-91-J-4038. We also thank the NEC Research Institute for the use of its facilities.

References

[1] Alter, T. D., “The Role of Saliency and Error Propagation in Visual Object Recognition,” MIT Ph.D. Thesis, 1995.

[2] Alter, T. D., and W. E. L. Grimson, “Fast and Robust 3D Recognition by Alignment,” Proc. Fourth Inter. Conf. Computer Vision, pp. 113-120, May 1993.

[3] Alter, T. D., and D. W. Jacobs, “Error Propagation in Full 3D-from-2D Object Recognition,” CVPR, June 1994.

[4] Baird, H., Model-Based Image Matching Using Location, MIT Press, Cambridge, 1985.

[5] Breuel, T., “Model Based Recognition using Pruned Correspondence Search,” Comp. Vis. Pat. Rec., 1991.

[6] Beveridge, R., R. Weiss, and E. Riseman, “Combinatorial Optimization Applied to Variable Scale 2D Model Matching,” Comp. Vis. Pat. Rec., pp. 18-23, 1990.

[7] Cass, T., “Polynomial Time Object Recognition in the Presence of Clutter, Occlusion and Uncertainty,” Second European Conf. on Computer Vision, pp. 834-842, 1992.

[8] Costa, M., R. M. Haralick, and L. G. Shapiro, “Optimal Affine-Invariant Point Matching,” 6th Israeli Conf. AI, 1990.

[9] Fischler, M. A., and R. C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Communications Assoc. of Computing Machinery, 24(6):381-395, 1981.

[10] Grimson, W. E. L., and D. P. Huttenlocher, “On the Verification of Hypothesized Matches in Model-Based Recognition,” IEEE Trans. Pat. Anal. Mach. Intell., 13(12), Dec. 1991.

[11] Grimson, W. E. L., D. P. Huttenlocher, and T. D. Alter, “Recognizing 3D Objects from 2D Images: An Error Analysis,” IEEE Conf. Comp. Vis. Pat. Rec., pp. 316-321, 1992.

[12] Grimson, W. E. L., D. P. Huttenlocher, and D. W. Jacobs, “A Study of Affine Matching with Bounded Sensor Error,” Second European Conf. Comp. Vis., pp. 291-306, May 1992.

[13] Hel-Or, Y., and M. Werman, “Absolute Orientation from Uncertain Point Data: A Unified Approach,” Proc. IEEE Conf. Comp. Vis. Pat. Rec., pp. 77-82, 1992.

[14] Huttenlocher, D. P., and S. Ullman, “Recognizing Solid Objects by Alignment with an Image,” Inter. J. Comp. Vis., 5(2):195-212, 1990.

[15] Jacobs, D. W., “Optimal Matching of Planar Models in 3D Scenes,” Comp. Vis. Pat. Rec., pp. 269-274, 1991.

[16] Kumar, R., and A. Hanson, “Robust Estimation of Camera Location and Orientation from Noisy Data having Outliers,” IEEE Workshop on Interp. of 3D Scenes, pp. 52-60, 1989.

[17] Lamdan, Y., and H. J. Wolfson, “On the Error Analysis of ‘Geometric Hashing’,” Comp. Vis. Pat. Rec., pp. 22-27, 1991.

[18] Rigoutsos, I., and R. Hummel, “Robust Similarity Invariant Matching in the Presence of Noise,” Eighth Israeli Conf. on Artif. Intell. and Computer Vision, Tel Aviv, 1991.

[19] Sarachik, K. B., and W. E. L. Grimson, “Gaussian Error Models for Object Recognition,” Comp. Vis. Pat. Rec., 1993.

[20] Therrien, C. W., Decision, Estimation, and Classification, Wiley & Sons, 1989.

[21] Wells, W., “MAP Model Matching,” CVPR, 1991.
