
Panoramic stereo vision and depth-assisted edge detection

Aris Economopoulos, Drakoulis Martakos (National and Kapodistrian University of Athens, Greece)
Peter Peer, Franc Solina (University of Ljubljana, Slovenia)

{pathway, martakos}@di.uoa.gr {peter.peer, franc.solina}@fri.uni-lj.si

Abstract

This article deals with the fusion of two initially independent systems. The first system uses panoramic images and rotational geometry to create stereo image pairs and recover depth information with a single camera. The second system uses dense depth estimates to drive a process that improves edge detection by factoring in the distance (scale) from the camera at each point. Distance data is fed to a smoothing pre-processor, which splits the image into layers based on pixel distance and smooths each layer independently, using a different sigma. This variation of sigma across image depth ensures that noise is eliminated with minimal loss of overall detail. Significant improvements in the results of subsequent edge detection operations are observed for images with great depth variations. The methods used by both systems are analysed and their benefits and drawbacks are exposed. The process of integration is also detailed, outlining the problems encountered and the solutions that were adopted. The applications and constraints of the system are also discussed.

1. Introduction

This paper deals with two distinct subjects. Two separate systems are presented, one dealing with panoramic stereo vision and the other with stereo-assisted edge detection. Both systems were developed independently.

1.1. Motivation

Edge detection is one of the most broadly used operations in the analysis of images. A plethora of algorithms has been presented within this context, and several different approaches to the subject have been well documented [4, 6, 10, 13, 21].

In the vast majority of cases, edge detection has been considered an early step in the process of analysis. This has allowed researchers to effectively use edge data to guide higher-level operations. In addition to its use in other areas, edge detection has also been employed in several applications of stereoscopy in order to improve stereo correlation algorithms and to aid in overall system robustness [9, 11, 21].

There is very little research, however, that looks at the relationship between stereopsis and edge detection from a different standpoint. Specifically, we submit that in humans, whose vision system our discipline is after all trying to model, the presence of robust and efficient stereoscopy may actually be implicitly aiding the process of edge detection, rather than the other way around. Modern medicine and neuroscience are as yet unable to provide a precise blueprint of the algorithms inherent in human visual perception. We have precious little information on how the brain processes visual stimuli, and the order in which operators are applied and the possible interdependencies they exhibit are not among it. To explore this hypothesis, and to establish whether it could hold true as well as whether it could be applied to computer vision problems, we decided to analyze the workings of an existing edge detection operator and enhance it so as to make use of depth data.

The assumption made here, of course, is that a robust stereo vision system that does not depend on edge detection techniques is available. Such a system is presented in [23]. An alternative approach that is able to produce dense depth maps, and hence drive the aforementioned process, is panoramic stereo imaging. A standard camera has a limited field of view, which is usually smaller than the human field of vision. Because of that, people have always tried to generate images with a wider field of view, up to a full 360-degree panorama [17]. Points and lines should be visible in all scene images. This is a property of panoramic cameras and represents our fundamental motivation for generating depth images with this technique. If we tried to build two panoramic images simultaneously using two standard cameras, we would have problems, since the scene would not be static.

In this article, we attempt to show how knowledge of a scene's stereo geometry and corresponding disparities can be used to assist in the task of edge detection. To test our theories, we chose to work with the Canny algorithm for edge detection [2], an operator that has received much attention, and rightly so, for it far outperforms most of its competitors. However, the Canny edge detector, like many others, has a serious drawback, if not a limitation. To achieve usable results, the process of edge detection must be preceded by the application of a Gaussian smoothing filter, whose standard deviation σ severely affects the quality of the output image.

More specifically, it is known that eliminating noise without also eliminating crucial image detail is a delicate process. The more an image is smoothed, the more its finer details blend into the background and fail to be picked up by the edge detector. Thus, a balance must be struck; traditionally, this balance has been achieved by varying σ until the results of the operation are satisfactory.
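To make this trade-off concrete, the minimal sketch below runs a Canny detector over the same image with increasing amounts of Gaussian pre-smoothing. It uses scikit-image's Canny implementation and a stock test image purely for illustration; the paper's own implementation is component-based and is described in [5].

```python
from skimage import data, feature

# Stock test image as a stand-in for the scenes used in the paper.
image = data.camera().astype(float) / 255.0

# Run Canny with increasing amounts of Gaussian pre-smoothing.
# Small sigma: noise and fine texture survive as spurious edges.
# Large sigma: noise is suppressed, but small or distant detail is
# smoothed away and no longer detected.
for sigma in (1.0, 2.0, 3.0, 4.0):
    edges = feature.canny(image, sigma=sigma)
    print(f"sigma={sigma}: {edges.sum()} edge pixels")
```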

1.2. Structure of the article

In the next section, we present our mosaic-based panoramic stereo system. The geometry of the system, the epipolar geometry involved, and the procedure of stereo reconstruction are explained in detail. Furthermore, we describe the contribution of our work and offer an overview of related work. The focus of the section is on analyzing the system's capabilities.

Section 3 gives a detailed description of our method for depth-assisted edge detection via layered scale-based smoothing. This method improves the results of an existing edge detection operator by incorporating into the process the depth data offered by the stereo system presented in the previous section.

Section 4 gives experimental results for both systems and describes their integration.

The final section of the article summarizes the main conclusions.

2. Panoramic stereo system

2.1. System hardware

Figure 1 shows the hardware part of our system: a rotational arm rotates around the vertical axis; a pole fixed on the rotational arm enables the offset of the optical center of the camera from the system's rotational center; on it, we have fixed one standard color camera looking outwards from the system's rotational center. Panoramic images are generated by moving the rotational arm by an angle corresponding to one column of the captured image.

2.2. Related work

One of the best known commercial packages for creating mosaiced panoramic images is QTVR (QuickTime Virtual Reality). It works on the principle of sewing together a number of standard images captured while rotating the camera [3]. Peleg et al. [16] introduced a method for creating mosaiced panoramic images from standard images captured with a handheld video camera. A similar method was suggested by Szeliski and Shum [20], who also do not strictly constrain the camera path but assume that no great motion parallax effect is present. All the methods mentioned so far are used only for visualization purposes, since the authors did not try to reconstruct the scene.

Ishiguro et al. [9] suggested a method which enables the reconstruction of the scene. They used a standard camera rotating on a circle. The scene is reconstructed by mosaicing together panoramic images from the central column of the captured images, moving the system to another location, and repeating the mosaicing. The two panoramas created in this way are then used as input to a stereo reconstruction procedure. The depth of an object was first estimated using projections in two images captured at different locations of the camera on its path. But since their primary goal was to create a global map of the room, they preferred to move the system, mounted on a robot, about the room.

Peleg and Ben-Ezra [14, 15] introduced a method for the creation of stereo panoramic images. Stereo panoramas are created without actually computing the 3D structure — the depth effect is created in the viewer's brain.

In [19], Shum and Szeliski described two methods for creating panoramic depth images which use standard procedures for stereo reconstruction. Both methods are based on moving the camera on a circular path. Panoramas are built by taking one column out of each captured image and mosaicing the columns. They call such panoramas multiperspective panoramas. The crucial property of two or more multiperspective panoramas is that they capture the information about the motion parallax effect, since the columns forming the panoramas are captured from different perspectives. The authors use such panoramas as the input to the stereo reconstruction procedure.

However, multiperspective panoramas are not something entirely unknown to the vision community [19]. They are a special case of the multiperspective panoramas for cel animation [22], and are very similar to images generated by the following procedures: multiple-center-of-projection [18], manifold projection [16] and circular projection [14, 15]. The principle of constructing multiperspective panoramas is also very similar to the linear pushbroom camera principle for creating panoramas [7].


In the articles closest to our work [9, 19], we missed two things: an analysis of system capabilities and a search for corresponding points using the standard correlation technique. This is why we focus on these two issues in this article. While in [9] the authors searched for corresponding points by tracking a feature from the column building the first panorama to the column building the second panorama, the authors in [19] used an upgraded plane sweep stereo procedure.

2.3. System geometry

The geometry of our system for creating multiperspective panoramic images is shown in Figure 1. When created, they are used as an input to create panoramic depth images. Point C denotes the system's rotational center around which the camera is rotated. The offset of the camera's optical center from the rotational center C is denoted by r, describing the radius of the circular path of the camera. The camera is looking outwards from the rotational center. The optical center of the camera is marked with O. The selected column of pixels that will be sewn into the panoramic image contains the projection of point P on the scene. The distance from point P to point C is the depth l, and the distance from point P to point O is denoted by d. θ defines the angle between the line through points C and O and the line through points C and P. In the panoramic image, θ gives the horizontal axis describing the path of the camera. By ϕ we denote the angle between the line through point O and the middle column of pixels of the captured image, and the line through point O and the selected column of pixels that will be mosaiced into the panoramic image. Angle ϕ can be thought of as a reduction of the camera's horizontal view angle α.

Figure 1. System hardware and geometry for the construction of a multiperspective panorama.

2.4. Epipolar geometry

It can be shown that the epipolar geometry is very simple if we are doing the reconstruction based on a symmetric pair of stereo panoramic images. We get a symmetric pair of stereo panoramic images when we take symmetric columns on the left and on the right side of the captured image's center column.

Since we are limited by the length of the article, we will end this section with the following statement: the epipolar lines of a symmetric pair of panoramic images are image rows. This statement is true for our system geometry. For a proof see [8].

2.5. Stereo reconstruction

Let us go back to Figure 1. Using trigonometric relations evident from the sketch, and the basic law of sines for triangles, we can write the equation for the depth estimate l of point P on the scene as:

l = r · sin(180° − ϕ) / sin(ϕ − θ) = r · sin ϕ / sin(ϕ − θ).   (1)

From eq. (1) it follows that we can estimate the depth l only if we know three parameters: r, ϕ and θ. r is given. Angle ϕ can be calculated from the camera's horizontal view angle α as:

2ϕ = (α / W) · W_2ϕ,   (2)

where W is the width of the captured image in pixels and W_2ϕ is the width of the captured image between the columns forming the symmetric pair of panoramic images, also given in pixels. To calculate the angle θ we have to find corresponding points in the panoramic images. Our system works by moving the camera by the angle corresponding to one column of the captured image. If we denote this angle by θ0, we can write angle θ as:

θ = dx · θ0 / 2,   (3)

where dx is the absolute value of the difference between the corresponding points' image coordinates on the horizontal axis x of the panoramic images.
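As a concrete illustration, the short sketch below evaluates eqs. (1) and (3) for a given disparity. The numerical values follow the examples used later in this section (r = 30 cm, θ0 = 0.2°, 2ϕ = 29.9625°); the chosen disparity dx = 100 is only a hypothetical input.

```python
import math

def depth_l(dx, r, phi_deg, theta0_deg):
    """Depth estimate l of a scene point (eqs. (1) and (3)).

    dx         -- absolute difference of the corresponding points' x
                  coordinates in the two panoramas (in pixels)
    r          -- radius of the camera's circular path
    phi_deg    -- angle phi for the chosen column (half of 2*phi from eq. (2))
    theta0_deg -- angle the camera is shifted for one image column
    """
    theta_deg = dx * theta0_deg / 2.0                     # eq. (3)
    phi = math.radians(phi_deg)
    theta = math.radians(theta_deg)
    return r * math.sin(phi) / math.sin(phi - theta)      # eq. (1)

# Values from the paper's examples: r = 30 cm, theta0 = 0.2 deg, 2*phi = 29.9625 deg.
# dx = 100 is a hypothetical disparity; the result is the depth l in cm.
print(depth_l(dx=100, r=30.0, phi_deg=29.9625 / 2, theta0_deg=0.2))
```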

We use a procedure called “normalized correlation” to search for corresponding points. To increase the confidence in the estimated depth we use a procedure called “back-correlation” [6]. With the back-correlation we also address the problem of occlusions.
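The following sketch shows one possible form of this matching step: normalized correlation of a small window along the epipolar row, followed by a back-correlation consistency check. It is an illustration under simplifying assumptions (grayscale panoramas as NumPy arrays, windows fully inside the images, a search over n_max candidate shifts as derived in Section 2.6.1), not the paper's exact implementation.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else -1.0

def match_point(left, right, y, x, n_max, half=5):
    """Disparity dx of pixel (y, x), searched along the epipolar row y.

    left, right -- grayscale panoramas as 2D float arrays
    n_max       -- number of candidate shifts to test (Section 2.6.1)
    Assumes all windows stay inside the images. Returns None when the
    back-correlation check fails (occlusion or ambiguous match)."""
    patch = left[y - half:y + half + 1, x - half:x + half + 1]
    scores = [ncc(patch, right[y - half:y + half + 1,
                               x - dx - half:x - dx + half + 1])
              for dx in range(n_max)]
    dx = int(np.argmax(scores))
    # Back-correlation: take the winning right-panorama window and match it
    # back along the same row of the left panorama; accept the disparity only
    # if the back-match lands on the original column.
    back = right[y - half:y + half + 1, x - dx - half:x - dx + half + 1]
    back_scores = [ncc(back, left[y - half:y + half + 1,
                                  x - dx + d - half:x - dx + d + half + 1])
                   for d in range(n_max)]
    return dx if int(np.argmax(back_scores)) == dx else None
```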

2.6. System capabilities analysis

2.6.1. Constraining the search space on the epipolar line

Knowing that the width of the panoramic image is much greater than the width of the captured image, we would have to search for corresponding points along a very long epipolar line. That is why we would really like to constrain the search space on the epipolar line as much as possible. A side effect is also an increased confidence in the estimated depth and a faster execution of the stereo reconstruction procedure.

Deriving from eq. (1), we can ascertain two things which nicely constrain the search space: (1.) If θ0 denotes the angle by which the camera is shifted, then 2θmin = θ0. This means that we have to make at least one basic shift of the camera for a scene point projected in the right column of the captured image forming the left eye panorama to be seen in the left column of the captured image forming the right eye panorama. (2.) Theoretically, the estimation of depth is not constrained upwards, but from eq. (1) it is evident that the denominator must be non-zero. We can write this fact as: θmax = n · θ0/2, where n = ϕ div (θ0/2) and ϕ mod (θ0/2) ≠ 0. In the following sections we will show that we cannot trust the depth estimates near the last point of the epipolar line search space, but we have shown that we can effectively constrain the search space.

To illustrate the use of these constraints on real data, consider the following example describing the working process of our system: while the width of the panorama is 1501 pixels, we have to check only n = 149 pixels when 2ϕ = 29.9625° and only n = 18 pixels when 2ϕ = 3.6125° when searching for a corresponding point.
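A minimal check of these numbers, assuming θ0 = 0.2° as in the rest of the paper:

```python
# theta_max = n * theta0 / 2 with n = phi div (theta0 / 2); reproduce the
# two values quoted above, assuming theta0 = 0.2 degrees.
theta0 = 0.2
for two_phi in (29.9625, 3.6125):
    n = int((two_phi / 2) // (theta0 / 2))
    print(f"2*phi = {two_phi} deg -> check n = {n} pixels")
# Expected output: n = 149 and n = 18.
```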

From the last paragraph we could conclude that the stereo reconstruction procedure is much faster for a smaller angle ϕ. But we will show in the next section that a smaller angle ϕ does, unfortunately, also have a negative property.

2.6.2. The meaning of the error for a pixel in the estimation of angle θ

Figure 2. Graphs showing the dependence of the depth function l on the angle θ for radius r = 30 cm and different values of angle ϕ: a) 2ϕ = 29.9625°, b) 2ϕ = 3.6125°. To ease the comparison of the error for a pixel in the estimation of angle θ, the interval of width θ0/2 = 0.1° is shown between the vertical lines around the third point.

Before we illustrate the meaning of the error for a pixel in the estimation of angle θ, let us take a look at the graphs in Figure 2. These graphs show the dependence of the depth function l on the angle θ for different values of angle ϕ. From the graphs it is evident that the depth function l rises more slowly for a bigger angle ϕ. This property decreases the error in the depth estimate l when using a bigger angle ϕ, and the decrease becomes even more evident once we note that the horizontal axis is discrete, with intervals θ0/2 degrees wide (see Figure 2). If we compare the width of this interval on both graphs with respect to the width of the interval on which θ is defined (θ ∈ [0, ϕ]), we can see that the interval of width θ0/2 degrees is much smaller when using a bigger angle ϕ. This subsequently means that the error for a pixel in the estimation of angle θ is much smaller when using a bigger angle ϕ, since a shift by angle θ0 corresponds to a shift of the camera by one column of pixels.

Because the horizontal axis θ is discrete (Figure 2), with intervals θ0/2 degrees wide (in our case θ0 = 0.2°), the number of possible depth estimation values is proportional to the angle ϕ: we can calculate (ϕ div θ0/2 =) 149 different depth values when using the angle 2ϕ = 29.9625° and only 18 different depth values when using the angle 2ϕ = 3.6125°. This is the negative property of using a smaller angle ϕ.

From the graphs shown in Figure 2 we can conclude that the error is much bigger in the case of a smaller angle ϕ than in the case of a bigger angle ϕ. The second conclusion is that the error gets bigger as the value of the angle θ gets closer to the value of the angle ϕ. This is true regardless of the value of the angle ϕ. The speed of the reconstruction process is thus inversely proportional to its accuracy.

By varying the parameters θ0 and r we change the size of the error.

2.6.3. Definition of maximal depth in which we trust

In Section 2.6.1 we defined the minimal possible depth estimate lmin and the maximal possible depth estimate lmax, but we did not say anything about the meaning of the error for a pixel in the estimation of angle θ for these two estimated depths. Let us examine the size of the error ∆l for these two estimated depths. We calculate ∆lmin as the absolute value of the difference between the depth lmin and the depth l for which the angle θ is bigger than the angle θmin by θ0/2:

∆lmin = |lmin(θmin) − l(θmin + θ0/2)| = |lmin(θ0/2) − l(θ0)|.

Similarly, we calculate the error ∆lmax as the absolute value of the difference between the depth lmax and the depth l for which the angle θ is smaller than the angle θmax by θ0/2:

∆lmax = |lmax(θmax) − l(θmax − θ0/2)| = |lmax(n · θ0/2) − l((n − 1) · θ0/2)|,

where by the variable n we denote the positive number given by n = ϕ div (θ0/2).

         2ϕ = 29.9625°    2ϕ = 3.6125°
∆lmin    2 mm             19 mm
∆lmax    30172 mm         81587 mm

Table 1. The meaning of the error (∆l) for one pixel in the estimation of angle θ for lmin and lmax, with respect to the angle ϕ.

In Table 1 we gather the error sizes for different values of angle ϕ. The results confirm the statements in Section 2.6.2. We can also add two conclusions: (1.) The value of the error ∆lmax is unacceptably high, regardless of the value of angle ϕ. This is why we have to sensibly decrease the maximal possible depth estimate lmax. In practice, this leads us to define an upper bound on the allowed error (∆l) for one pixel in the estimation of angle θ, and with it, the maximal depth in which we trust. (2.) Angle ϕ always depends upon the horizontal view angle α of the camera (equation (2)). And while α is limited to around 40° for standard cameras, our system is limited by α when estimating depth, since in the best case we have ϕmax = α/2. Thus our system can really be used only for the reconstruction of small rooms.
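The sketch below evaluates these error terms directly from eq. (1), using the paper's values r = 30 cm and θ0 = 0.2°; it should reproduce the order of magnitude of the values in Table 1 (small differences may come from rounding).

```python
import math

def l(theta_deg, r, phi_deg):
    """Depth from eq. (1)."""
    phi = math.radians(phi_deg)
    theta = math.radians(theta_deg)
    return r * math.sin(phi) / math.sin(phi - theta)

r_mm, theta0 = 300.0, 0.2          # r = 30 cm, basic camera shift of 0.2 deg
for two_phi in (29.9625, 3.6125):
    phi = two_phi / 2
    n = int(phi // (theta0 / 2))
    dl_min = abs(l(theta0 / 2, r_mm, phi) - l(theta0, r_mm, phi))
    dl_max = abs(l(n * theta0 / 2, r_mm, phi) - l((n - 1) * theta0 / 2, r_mm, phi))
    print(f"2*phi = {two_phi} deg: dl_min = {dl_min:.0f} mm, dl_max = {dl_max:.0f} mm")
```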

3. Depth-assisted edge detection

3.1. Scale, size and loss of detail

Theoretically, it could be argued that the application of a two-dimensional filter like the Gaussian to an image representing a three-dimensional scene brings implicit problems. When a three-dimensional scene is captured with a camera, it is logical to assume that the available detail that the resulting image can yield at any point depends partially on that point's original distance from the lens. The farther the feature, the smaller its representation in the image will be (scale). This difference in scale makes it less likely that a distant feature we are interested in will retain all of its detail after being sampled and quantized.

One can then see that the application of a uniform Gaussian filter with a constant standard deviation σ will cause a disproportionate loss of detail across the image, essentially resulting in low operator performance in the areas of the picture farthest from the lens. This is most evident in outdoor and synthetic scenes, where depth can vary greatly from point to point. This loss of quality can be countered if the filter's scale is varied according to its point of application in the image.

An intuitive solution, then, would be to establish a relation between the filter's scale (controlled by σ) and the real-world size of the object being smoothed. In order to achieve this kind of adjustment at the level of processing that precedes edge detection, one would need to somehow be aware of the relative size of objects at every part of the image. It is known, however, that there is a direct relationship between the size (in pixels) of an object's representation in an image and that object's original, real-world distance from the camera's lens.

Assuming, therefore, the availability of a robust stereo system, we can compute the distance at any point in the image, thus establishing a variable that allows us to adjust the Gaussian's σ accordingly.

3.2. Algorithm overview

The algorithm that deals with the problems described above uses pre-computed pixel distance information in order to segment the image into several different layers. In this capacity, the algorithm can be considered depth-adaptive. The layers are smoothed independently and recombined before being fed to an edge detector. The entire process consists of: (1.) Layering — the image is segmented into discrete layers based on pixel distance data and a user-supplied parameter that denotes the desired number of layers. (2.) Piecewise smoothing — the resulting sub-images are smoothed independently, using a filter scale appropriate to the relevant depth of the layer in the image. (3.) Compositing — the smoothed sub-images are combined into one image again; that image will be the input for the edge detector. (4.) Edge detection — the composited image is fed into an edge detector.

3.3. Layering

The first step in the realization of this algorithm is to split a given image into multiple layers. Each layer corresponds to a certain range of distances and consists only of image pixels with the relevant distance characteristics. The pixels that make up each layer are subsequently supplemented by artificial background-valued pixels in order to form a sub-image that is ready for smoothing. The algorithm does not currently impose any continuity constraints, and it is not uncommon for a layer to contain non-neighboring pixels.

Note that most trial runs were performed using dense depth maps. To accommodate different measurement schemes, the algorithm detects the minimum and maximum depth values and normalizes all pixel distances to a relative percentage. The pixel(s) with the smallest distance to the camera are given a value of 0 percent, while those farthest from the camera are given a value of 100 percent.
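A minimal sketch of this normalization step, assuming the depth map is available as a NumPy array (the function name is ours):

```python
import numpy as np

def normalize_depth(depth):
    """Normalize a depth map to relative percentages: 0 for the nearest
    pixel(s), 100 for the farthest."""
    d_near, d_far = depth.min(), depth.max()
    return (depth - d_near) / (d_far - d_near) * 100.0
```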


A user-supplied parameter controls the number of layers that will be produced. Steps are being taken to optimize and automate this process; meanwhile, tests have shown that five layers are often a good choice for synthetic scenes with noticeable depth differences. The process of splitting the original image into layers is illustrated in Figure 3.

Figure 3. Depiction of layers in a 3D scene.

In the case depicted above, the user has asked for five layers to be produced. The first layer (closest to the camera) will include all pixels with normalized distances of 0 to 20 percent. The second layer will include pixels with distances of 20 to 40 percent, the third 40 to 60 percent, the fourth 60 to 80 percent, and the last layer (farthest from the camera) will include pixels with normalized distances of 80 to 100 percent. Note that these percentages are unrelated to the size of a layer. The maximum and minimum distances that define a layer are termed dmax and dmin respectively.

Thus, given an image I of height h and width w, the algorithm will produce a set of sub-images

Is = {I1, I2, . . . , In},

where n is the number of layers as specified by the user-supplied parameter. Each sub-image retains the height h and width w and consists of a set of pixels Pn such that, for every pixel p ∈ Pn,

Vp = VI  if dmin < Dp < dmax,   Vp = B  otherwise,

where p is any pixel in the sub-image, Vp is the value of that pixel in the sub-image, VI is the value of the same pixel in the original image, Dp is the normalized distance of the pixel in the original image, and B is the background gray value. Hence, a sub-image In consists of a number of pixels whose gray values are inherited from the original image, and of a number of "artificial" pixels, which are used to make the process of smoothing easier and to avoid the creation of false edges through intensity variations.
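A sketch of the layering step under the same conventions (normalized distances in percent, a single background gray value B); the function name and the equal-width layer boundaries are our assumptions:

```python
import numpy as np

def split_into_layers(image, depth_pct, n_layers, background=128):
    """Split `image` into n_layers sub-images I_1..I_n based on the
    normalized pixel distances `depth_pct` (0..100 percent). Pixels outside
    a layer's [d_min, d_max) range get the background gray value B."""
    sub_images = []
    for k in range(n_layers):
        d_min = 100.0 * k / n_layers
        d_max = 100.0 * (k + 1) / n_layers
        mask = (depth_pct >= d_min) & (depth_pct < d_max)
        if k == n_layers - 1:            # include the farthest pixels (100%)
            mask |= depth_pct >= d_max
        layer = np.full_like(image, background)
        layer[mask] = image[mask]
        sub_images.append(layer)
    return sub_images
```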

3.4. Piecewise smoothing

At this point, the algorithm is ready to apply a Gaussian filter to each of the sub-images created during the previous step. Each sub-image should be smoothed using a different σ. In addition, σ should be greatest in the first layer (closest to the camera) and smallest in the last layer (farthest from the camera). Currently, the values of σ for each layer are chosen manually. The algorithm suggests default values after σ has been specified for the first layer, but user intervention is still occasionally required in order to strike a good balance between the elimination of noise and the preservation of detail.

This process should ensure that all sub-images are convolved using a scale appropriate to the pixel-wise size of their contents.
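Sketched with SciPy's Gaussian filter (our choice of library; the σ values are supplied by hand, as described above):

```python
from scipy.ndimage import gaussian_filter

def smooth_layers(sub_images, sigmas):
    """Smooth each sub-image with its own Gaussian sigma; `sigmas` runs from
    the nearest layer (largest sigma) to the farthest (smallest sigma)."""
    return [gaussian_filter(layer.astype(float), sigma=s)
            for layer, s in zip(sub_images, sigmas)]
```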

3.5. Compositing

Once the smoothing procedure is finished, the final step is to re-combine the various sub-images into one solid image once more. This resultant image will be the input to the edge detector. It is essentially a composite of all the sub-images produced earlier by the layering procedure.

The composition process results in a final image IC, in which all original pixels are substituted by the pixels of the various sub-images, according to the following algorithm: (1.) Create a new empty image IC with height h and width w. (2.) From the set of pre-smoothed sub-images, choose the one with the greatest maximum pixel distance dmax (i.e. the layer farthest from the camera). (3.) Substitute the corresponding pixels of IC with all non-artificial pixels of that sub-image. (4.) Discard the current working sub-image and repeat steps 2 to 4 until no more sub-images remain.

The above process ensures that IC is made up of a number of regions which have been independently smoothed. The filter scale used in the smoothing of each region is one appropriate to the pixel-wise size of its contents. The visible changes in the amount of smoothing from region to region often occur along the depth discontinuities in the image.
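A sketch of the compositing step, together with the overall pre-processing pipeline, reusing the helper functions sketched in the previous subsections; the σ values and the use of scikit-image's Canny at the end are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from skimage import feature

def composite(smoothed, depth_pct, n_layers):
    """Recombine the smoothed sub-images into one image I_C, writing layers
    from the farthest to the nearest so every pixel keeps the value from the
    layer it belongs to."""
    out = np.zeros_like(smoothed[0])
    layer_index = np.clip((depth_pct / 100.0 * n_layers).astype(int),
                          0, n_layers - 1)
    for k in reversed(range(n_layers)):          # farthest layer first
        mask = layer_index == k
        out[mask] = smoothed[k][mask]
    return out

def depth_assisted_canny(image, depth, n_layers=5,
                         sigmas=(4.0, 3.0, 2.0, 1.5, 1.0)):
    """Layering -> piecewise smoothing -> compositing -> Canny."""
    depth_pct = normalize_depth(depth)
    layers = split_into_layers(image, depth_pct, n_layers)
    smoothed = smooth_layers(layers, sigmas)
    composited = composite(smoothed, depth_pct, n_layers)
    # Only a little residual smoothing here, since each layer has already
    # been smoothed with its own sigma.
    return feature.canny(composited, sigma=0.5)
```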

4. Integration and experimental results

4.1. Integration

Although the two systems were developed independently, the integration process was not especially problematic. Essentially, it consisted merely of using real-world data (captured images and depth maps) generated by the panoramic stereo system to drive the depth-assisted edge detection process.

There were two main problems encountered. The most serious was the limitation of the panoramic stereo system regarding the distances that it can accurately measure. As mentioned above, the depth-assisted edge detection technique is especially useful in images with large depth variations.



Figure 4. Some results of stereo reconstruction when creating the depth image for the left eye with angle 2ϕ = 29.9625°: a) left eye panorama, b) dense depth image / using back-correlation / reconstruction time: 6:42:20 (h:m:s), c) the information about the confidence in the estimated depth, d) dense depth image after the weighting / not using back-correlation / reconstruction time: 3:21:56, e) sparse depth image / not using back-correlation / reconstruction time: 0:0:38.

The images and associated depth maps produced by the panoramic stereo system do not exhibit great depth variations. This, of course, somewhat limits the obvious benefits of the smoothing pre-processor.

The second problem encountered was the non-continuity of the depth map. Due to both noise and the layout of the room, the depth distribution in the test data was extremely hectic and discontinuous. This was in strong contrast to the synthetic scenes, where the depth varied smoothly. This problem was expected, of course, and although the enhancement to the edge detection process was visible, it showcased the need for additional refinement.

Experimental results for both systems are presented below, along with commentary.

4.2. Generation of real world depth images

Figure 5. On top is a ground-plan showing the results of the reconstruction process based on the 68th row of the depth image. We used back-correlation and weighting for angle 2ϕ = 29.9625°. The corresponding depth image is shown in the middle picture. For orientation, the reconstructed row and the features on the scene for which we measured the actual depth by hand are shown in the bottom picture. The features on the scene marked with big dots and associated numbers are not necessarily visible in this row.

Figure 4 shows some results of our system. In the case denoted by b), we constructed a dense panoramic image, which means that we tried to find a corresponding point in the right eye panorama for every point in the left eye panorama. Black marks the points on the scene with no associated depth estimate. Otherwise, the nearer a scene point is to the rotational center of the system, the lighter it appears in the depth image.

In the case denoted by d), we used the information about the confidence in the estimated depth (case c)), which we get from the normalized correlation estimates. In this way, we eliminate from the dense depth image all depth estimates that do not have a high enough associated confidence.


In the case marked by e), we created a sparse depth image by searching for corresponding points only for features in the input panoramas. The features can be represented as vertical edges on the scene, which we can derive very quickly by filtering the panoramas with a Sobel filter for vertical edges [6, 9]. If we used a smaller value for angle ϕ, the reconstruction times would be up to eight times shorter than the presented ones.

Since it is hard to evaluate the quality of the generated depth images in Figure 4, we present a reconstruction of the room from a generated depth image. From it we can evaluate the quality of the generated depth image and, consequently, the quality of the system.

The result of the (3D) reconstruction process is a ground-plan of the scene. The following properties can be observed in Figure 5: (1.) Big dots denote features on the scene for which we measured the actual depth by hand. (2.) The big dot near the center of the reconstruction marks the center of our system. (3.) Small black dots are reconstructed points on the scene. (4.) Lines between black dots denote links between two successively reconstructed points.

With respect to the presented reconstruction times, we could conclude that the reconstruction procedure could work in nearly real time if we worked with 8-bit grayscale images at a lower resolution and/or created a sparse depth image of only part of the scene. This could be used for robot navigation [9].

4.3. Edge detection on a synthetic image

The algorithm for depth-assisted edge detection was implemented using the architecture described in [5]. A set of sample images is presented below, documenting a trial run of the algorithm on a synthetic image. The entire algorithm has been coded using components. Pixel distances for the synthetic scene presented in this section were calculated using a custom 3D engine. All pictures and visualizations were produced in bitmap format using Windows GDI functions.

Figure 6. a) the synthetic 3D scene used for the tests, b) depth map visualization.

The synthetic scene shown in Figure 6a) was created to illustrate how the algorithm presented in this section could be applied in order to preserve necessary detail. There are two "Some Edges" labels in this picture. One is very close to the camera, while the other is farther away from it. These labels represent data that we are interested in. On the other hand, four "Unwanted Edges" labels figure in the image as well. These labels represent noise and unwanted edge data.

The goal of the pre-processor here would be to preserve the details we are interested in (namely the "Some Edges" labels), while attempting to get rid of any noise and unwanted information in the image (namely the "Unwanted Edges" labels).

Figure 7. From left to right and top to bottom — Canny with σ=1, σ=2, σ=3 and σ=4.

The standard implementation of the Canny edge detector does not fare too well in this task. If σ is too small, the unwanted information persists. As soon as σ becomes large enough to partly or fully eliminate the unwanted information, the edge detector fails to detect relevant detail, which the smoothing filter has obscured (Figure 7).

Our algorithm, on the other hand, begins by reading a z-buffer dump of the synthetic scene. The values read represent the distance of each pixel from the camera. After normalization, the system produces a visualization of the depth map for convenience (Figure 6b)). 256 gray values are used. Pixels closest to the camera are dark, while those farther away are light. Notice the barely visible polyhedron in the middle of the far white wall.

The parameter supplied to the layering part of the algorithm is 5. The layers produced are transformed into sub-images and smoothed independently. The composition of these sub-images into a single image follows, and generates the image presented in Figure 8a).

Notice how the blurring effect of the filter varies according to the depth of the picture. The parts of the picture that are closest to the camera are blurred the most, but the important details are not affected due to their large scale.


Figure 8. a) composited image, b) edge image after scale-based smoothing.

On the other hand, the points that are far away from the camera are smoothed less, and detail is preserved. Depending on the distribution of depth in the image, one could achieve a near-perfect balance between noise reduction and detail preservation at every level of depth in the image.

The above is then fed into the edge-detecting module, which produces the edge image presented in Figure 8b).

While the unwanted detail has not been eradicated completely, detail has been preserved properly where the depth is greatest. Notice that the "Some Edges" labels have changed only slightly. Notice also how the squares inside the pyramids on the left- and right-hand sides are well defined. The polyhedron and the cubes that lie against the far wall also appear quite clearly. Contrast this with the images Canny produced above with a σ of 3 and 4. Should we wish to do so, we could use a disproportionately large σ in the first layer in order to completely wipe out the "Unwanted Edges" labels. In addition, note that the labels were only used to illustrate a point. In practice, noise would never have the kind of coherency that these letters retain.

4.4. Edge detection on a real world image

Using the same methods outlined above, we applied the enhanced Canny edge detector to real data produced by the panoramic stereo system (Figure 9).

Figure 10 shows the depth map visualization generated from the distance data provided by the panoramic stereo system.

Running an unmodified version of Canny on the image, we can see in Figure 11 that the edge detector performs fairly well. There are essentially two main problems with the image. The first is that the pattern appearing on the monitor on the left-hand side has created some edges that we are probably not interested in. The second is that the edge detector has picked up some false edges around what seem to be bright spots. Again, as we vary σ, we see that these imperfections can be culled, but only at the cost of detail in the rest of the image.

As mentioned above, the depth variations in the image were both small and hectic. Thus, it was decided that only two layers would be used, and that the closest layer would be heavily smoothed.

Figure 9. Real data.

Figure 10. Depth map visualization.

Figure 11. From top to bottom — Canny with σ=1, σ=2, and σ=3.

Figure 12. Edge image after scale-based smoothing.

Figure 12 shows the result of the Canny edge detector after the image was piped through the smoothing pre-processor. You can see that most of the aforementioned imperfections are gone, while image detail has been preserved.

5. Conclusions

In this article, we presented an analysis of the system for the construction of depth panoramic images using only one standard camera. We showed the following: the procedure for creating panoramic images is very long and cannot be executed in real time under any circumstances (using only one camera); the epipolar lines of a symmetric pair of panoramic images are image rows; based on the equation for the depth estimate l, we can constrain the search space on the epipolar line; the confidence in the estimated depth varies: the bigger the slope of the l curve, the smaller the confidence in the estimated depth; and, judging from the reconstruction time alone, the creation of dense panoramic depth images is very expensive. The essential conclusion is that such systems can be used for the 3D reconstruction of small rooms.

A further time reduction in panorama building can be achieved: instead of building the panorama from only one column of the captured image, we could build the panorama from a wider stripe of the captured image [17]. Thus, we would increase the speed of the building process. If we used this idea in our system, the angle ϕ would vary within the stripe; the open question is how this would influence the reconstruction procedure.

The hypothesis that depth information provided by stereopsis (or other mechanisms) may be utilized in order to improve the efficiency of edge detection operators was also presented. In support of this hypothesis, a non-optimal algorithm that alters the pre-processing stage of the Canny edge detector in order to take advantage of pixel distance data was detailed. Through the distance-based segmentation of the input image into different layers, the presented algorithm was able to apply custom-fitted smoothing filters to each layer. This allowed the algorithm to eliminate noise from the image while keeping the overall detail at acceptable, constant levels over the entire image. While the algorithm has performed very well on synthetic images with large and continuous depth variations, it still needs a lot of work and refinement for proper operation on real-world images.

While the integration of this algorithm with the aforementioned panoramic stereo system was relatively straightforward, functional limitations specific to the panoramic approach limit this algorithm's usefulness in that context.

The optimization, improvement and possible redesign of this algorithm are currently subjects of further research. It is hoped that this research will contribute an alternative approach to edge detection — and possibly other low-level image operations — by exposing how stereo data may be used to assist such endeavors.

The next step in pursuing this path of research is to optimize the layering process so that human intervention is not required. Statistical depth distributions are being considered for this purpose. In addition, much work can be done on automatically determining the proper σ for each layer.

References

[1] I. E. Abdou and W. K. Pratt. Quantitative design and evaluation of enhancement/thresholding edge detectors. In Proceedings of the IEEE, volume 67, pages 753–763, 1979.

[2] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[3] S. Chen. QuickTime VR — an image-based approach to virtual environment navigation. In ACM SIGGRAPH, pages 29–38, Los Angeles, USA, 1995.

[4] E. R. Davies. Machine Vision — Theory, Algorithms, Practicalities. Academic Press, USA, 1997.

[5] A. Economopoulos and D. Martakos. Component-based architectures for computer vision. In Proceedings of WSCG: International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, number 2, pages 117–125, 2001.

[6] O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, Massachusetts, London, England, 1993.

[7] R. Gupta and R. I. Hartley. Linear pushbroom cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(9):963–975, September 1997.

[8] F. Huang and T. Pajdla. Epipolar geometry in concentric panoramas. Technical Report CTU-CMP-2000-07, Center for Machine Perception, Czech Technical University, Prague, Czech Republic, March 2000.

[9] H. Ishiguro, M. Yamamoto, and S. Tsuji. Omni-directional stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):257–262, February 1992.

[10] R. Jain, R. Kasturi, and B. G. Schunck. Machine Vision. McGraw-Hill, Singapore, 1995.

[11] R. Klette, K. Schluens, and A. Koschan. Computer Vision — Three-dimensional Data from Images. Springer, Singapore, 1998.

[12] T. Lindeberg. Edge detection and ridge detection with automatic scale selection. International Journal of Computer Vision, 30(2):117–154, 1998.

[13] J. R. Parker. Algorithms for Image Processing and Computer Vision. Wiley Computer Publishing, USA, 1997.

[14] S. Peleg and M. Ben-Ezra. Stereo panorama with a single camera. In IEEE Conference on Computer Vision and Pattern Recognition, pages 395–401, Fort Collins, USA, 1999.

[15] S. Peleg, Y. Pritch, and M. Ben-Ezra. Cameras for stereo panoramic imaging. In IEEE Conference on Computer Vision and Pattern Recognition, pages 208–214, Hilton Head Island, USA, 2000.

[16] S. Peleg, B. Rousso, A. Rav-Acha, and A. Zomet. Mosaicing on adaptive manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1144–1154, October 2000.

[17] B. Prihavec and F. Solina. User interface for video observation over the internet. Journal of Network and Computer Applications, 21:219–237, 1998.

[18] P. Rademacher and G. Bishop. Multiple-center-of-projection images. In Computer Graphics (ACM SIGGRAPH), pages 199–206, Orlando, USA, 1998.

[19] H. Y. Shum and R. Szeliski. Stereo reconstruction from multiperspective panoramas. In IEEE 7th International Conference on Computer Vision, pages 14–21, Kerkyra, Greece, 1999.

[20] R. Szeliski and H. Y. Shum. Creating full view panoramic image mosaics and texture-mapped models. In Computer Graphics (ACM SIGGRAPH), pages 251–258, Los Angeles, USA, 1997.

[21] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice-Hall, USA, 1998.

[22] D. Wood, A. Finkelstein, J. Hughes, C. Thayer, and D. Salesin. Multiperspective panoramas for cel animation. In Computer Graphics (ACM SIGGRAPH), pages 243–250, Los Angeles, USA, 1997.

[23] C. L. Zitnick and T. Kanade. A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 2000.