DIFFUSE-SPECULAR SEPARATION OF MULTI-VIEW IMAGES UNDER VARYING ILLUMINATION

Kouki Takechi, Takahiro Okabe

Department of Artificial Intelligence, Kyushu Institute of Technology

ABSTRACT

Separating diffuse and specular reflection components is important for preprocessing of various computer vision techniques such as photometric stereo. In this paper, we address diffuse-specular separation for photometric stereo based on light fields. Specifically, we reveal the low-rank structure of the multi-view images under varying light source directions, and then formulate the diffuse-specular separation as a low-rank approximation of the 3rd order tensor. Through a number of experiments using real images, we show that our proposed method, which integrates the complementary clues based on varying light source directions and varying viewing directions, works better than existing techniques.

Index Terms— photometric stereo, reflectance separation, light field, higher-order SVD

1. INTRODUCTION

Photometric stereo is a technique for estimating surface normals from the shading observed on an object surface under varying illumination. In contrast to multi-view stereo, which estimates the depths of possibly sparse corresponding points, photometric stereo can estimate pixel-wise surface normals, i.e. densely recover the shape of an object.

In general, the reflected light observed on an object surface consists of a diffuse reflection component and a specular reflection component. The conventional photometric stereo [1] assumes the Lambert model and known light sources, and then specular reflection components are outliers to be removed. On the other hand, in the context of uncalibrated photometric stereo [2] assuming the Lambert model and unknown light sources, specular reflection components are an important clue for uniquely recovering the shape of an object [4, 5] by resolving the Generalized Bas-Relief (GBR) ambiguity [3]. Therefore, separating diffuse and specular reflection components of the images of an object taken under varying light source directions is an important topic to be addressed for photometric stereo.

In this paper, we address diffuse-specular separation for photometric stereo based on light fields. Since light field cameras [6] can capture multi-view images in a single shot, we study diffuse-specular separation of multi-view images under varying illumination, and propose a robust method by integrating a clue based on varying light source directions and a clue based on varying viewing directions.

Specifically, we represent the multi-view images of an object under varying light source directions as the 3rd order tensor whose axes are the position on the object surface, the light source direction, and the viewing direction. Then, we reveal the low-rank structure of the tensor on the basis of the facts that the brightness of a diffuse reflection component is independent of viewing directions and that it is given by the inner product between a light source direction and a normal, i.e. bilinear with respect to them. Based on the above theoretical insight, our proposed method separates diffuse and specular reflection components via low-rank tensor approximation using higher-order Singular Value Decomposition (SVD) [7]. In addition, SVD with missing data [8] is incorporated into our method in order to take outliers such as specular reflection components and shadows into consideration.

To demonstrate the effectiveness of our proposed method, we conducted a number of experiments using real images. We show that our method, which integrates the complementary clues based on varying light source directions and varying viewing directions, performs better than existing techniques.

2. RELATED WORK

Existing techniques for diffuse-specular separation can be classified into the following four approaches.

Varying viewing directions: The reflected light observed on an object surface consists of a viewpoint-invariant diffuse reflection component and a viewpoint-dependent specular reflection component. Then, we can consider the darkest intensity at a point on the surface seen from varying viewing directions as the diffuse reflection component [9, 10]. In contrast to this approach, our proposed method makes use of varying light source directions as well as varying viewing directions.

Varying light source directions: An image of a Lambertian object under an arbitrary directional light source is represented by a linear combination of three basis images of the object [11]. Then, the approach based on varying light source directions is often used for photometric stereo, and a number of techniques such as low-rank matrix recovery [11, 12] have been proposed. In contrast to this approach, our method makes use of not only varying light source directions but also varying viewing directions.




Fig. 1. A microlens-based light field camera: the rays emitted from a point on an object surface are focused on a microlens.

Fig. 2. The closeup of a raw light field image.


Polarizations: When we observe the reflected light from an object surface illuminated by polarized light, specular reflection components are polarized whereas diffuse reflection components are unpolarized [13]. Then, the difference of the polarization states can be used for diffuse-specular separation. The polarization-based approach requires a set of images taken by placing linear polarizing filters in front of a light source and a camera, and rotating one of them.

Colors: The color-based approach makes use of the difference between the colors of a specular reflection component and a diffuse reflection component. According to the dichromatic reflection model [14], the former is the same as the light source color whereas the latter depends on the spectral reflectance of a surface. This approach does not work well for objects with low-saturation reflectance and for narrow-band light sources such as LEDs, because in those cases the color of a specular reflection component is almost the same as the color of a diffuse one.

3. PROPOSED METHOD

3.1. Raw Light Field Image

In this study, we assume microlens-based light field cameras in which an array of microlenses is placed in front of an imaging sensor. As shown in Fig. 1, our proposed method assumes that a camera focuses on an object of interest¹, i.e. the rays emitted from a point on the object surface are focused on a microlens. Therefore, the pixels just below a microlens capture the intensities of the same point on the object surface seen from varying viewing directions. We can observe the silhouette of an array of microlenses in the closeup of a raw light field image in Fig. 2.

¹We consider the target objects and/or imaging conditions such that parallaxes are negligible, e.g. near-planar objects.

Fig. 3. Unfolding the 3rd order tensor with respect to each axis: M = 4, L = 2, and V = 3 for display purposes.

3.2. Structure of 3rd Order Tensor

Let us consider a set of images of a static object taken by a static light field camera but under varying light source directions. We denote the pixel value of the v-th (v = 1, 2, 3, ..., V) pixel, i.e. observed from the v-th viewing direction, in the m-th (m = 1, 2, 3, ..., M) microlens, i.e. observed at the m-th position on the object surface, under the l-th (l = 1, 2, 3, ..., L) light source direction by i_mlv². Our proposed method represents those pixel values as the 3rd order tensor I whose axes are the position on the object surface m, the light source direction l, and the viewing direction v.

As shown in Fig. 3, we unfold the 3rd order tensor I with respect to each axis, and then consider three matrices I_(mlens), I_(light), and I_(view) whose sizes are M × (LV), L × (VM), and V × (ML) respectively. Hereafter, we show that those matrices are low-rank when all the pixel values are described by the Lambert model.
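
To make the arrangement concrete, the following NumPy sketch (our illustration; the variable names and sizes are not from the paper) builds the three unfoldings of a tensor I of shape (M, L, V) with the block layouts of Fig. 3.

```python
import numpy as np

def unfold_mlens(I):
    """M x (L*V) unfolding: V blocks of size M x L, as at the top of Fig. 3."""
    M, L, V = I.shape
    return I.transpose(0, 2, 1).reshape(M, V * L)

def unfold_light(I):
    """L x (V*M) unfolding: M blocks of size L x V, as in the middle of Fig. 3."""
    M, L, V = I.shape
    return I.transpose(1, 0, 2).reshape(L, M * V)

def unfold_view(I):
    """V x (M*L) unfolding, as at the bottom of Fig. 3."""
    M, L, V = I.shape
    return I.transpose(2, 0, 1).reshape(V, M * L)

# Toy tensor with the display sizes of Fig. 3: M = 4, L = 2, V = 3.
I = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)   # I[m, l, v] corresponds to i_mlv
print(unfold_mlens(I).shape, unfold_light(I).shape, unfold_view(I).shape)
# -> (4, 6) (2, 12) (3, 8)
```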

First, we consider the matrix I_(mlens) at the top of Fig. 3. This matrix consists of V blocks with the size of M × L. According to the Lambert model, the pixel values of the l-th column in the first block are given by

$$\begin{bmatrix} \rho_1 \mathbf{n}_1^\top \\ \vdots \\ \rho_M \mathbf{n}_M^\top \end{bmatrix} \mathbf{s}_l. \qquad (1)$$

²We assume that M, L, and V are significantly larger than 3.



Here, ρ_m and n_m are the albedo and normal at the position on the object surface corresponding to the m-th microlens, and s_l is the light source vector whose norm and direction are the intensity and direction of the l-th light source respectively. Since the degree of freedom of the light source vector is 3 in general³, the rank of the first block is 3. This means that an image of a Lambertian object under an arbitrary directional light source is represented by a linear combination of three basis images of the object [11]. In addition, since the brightness of a diffuse reflection component is independent of viewing directions, the other blocks are the same as the first block, and therefore the rank of the matrix I_(mlens) is also 3.

³We assume that the light source vectors are not degenerate.

Second, let us consider the matrix I_(light) at the middle of Fig. 3. This matrix consists of M blocks with the size of L × V. Since the brightness of a diffuse reflection component is viewpoint-invariant, the rank of each block is 1. According to the Lambert model, the pixel values of an arbitrary column in the m-th block are given by

$$\begin{bmatrix} \mathbf{s}_1^\top \\ \vdots \\ \mathbf{s}_L^\top \end{bmatrix} \rho_m \mathbf{n}_m. \qquad (2)$$

Since the degree of freedom of the vector ρ_m n_m is 3 in general⁴, the rank of the matrix I_(light) is also 3.

⁴The exception includes degenerate shapes such as a plane and a cylinder.

Third, we consider the matrix I_(view) at the bottom of Fig. 3. It is obvious that the rank of the matrix I_(view) is 1, since the brightness of a diffuse reflection component does not depend on viewing directions. Hence, the structure of the 3rd order tensor I is revealed; the matrices given by unfolding the tensor with respect to each axis are low-rank.
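
The rank claims above can be checked numerically on synthetic Lambertian data; the sketch below (our own toy example with random albedos, unit normals, and light source vectors, not part of the paper) reports ranks 3, 3, and 1 for the three unfoldings.

```python
import numpy as np

def unfold(I, mode):
    # Generic mode-n matricization; the column order does not affect the rank.
    return np.moveaxis(I, mode, 0).reshape(I.shape[mode], -1)

rng = np.random.default_rng(0)
M, L, V = 50, 12, 9
rho = rng.uniform(0.2, 1.0, size=M)                       # albedos
n = rng.normal(size=(M, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)             # unit surface normals
s = rng.normal(size=(L, 3))                               # light source vectors

# Lambertian brightness rho_m * n_m^T s_l, identical for every viewing direction.
# (Negative values are kept, i.e. attached shadows are ignored, as in the rank argument.)
diffuse = rho[:, None] * (n @ s.T)                        # M x L
I = np.repeat(diffuse[:, :, None], V, axis=2)             # M x L x V

print([int(np.linalg.matrix_rank(unfold(I, k))) for k in range(3)])
# -> [3, 3, 1]
```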

3.3. Low-Rank Tensor Approximation

Based on the low-rank structure of the 3rd order tensor, our method separates diffuse and specular reflection components via low-rank tensor approximation using higher-order SVD [7]. First, we decompose the matrix I_(mlens) as

$$I_{(\mathrm{mlens})} = U_{(\mathrm{mlens})} \, \Sigma_{(\mathrm{mlens})} \, V_{(\mathrm{mlens})}^\top \qquad (3)$$

by using SVD. Since the rank of the matrix I_(mlens) is 3, we obtain the M × 3 matrix Û_(mlens) by keeping the eigenvectors in the matrix U_(mlens) corresponding to the 3 largest eigenvalues. In a similar manner, we obtain the L × 3 matrix Û_(light) and the V × 1 matrix Û_(view).

By using those matrices, we compute the core tensor Z as

$$Z = I \times_1 \hat{U}_{(\mathrm{mlens})}^\top \times_2 \hat{U}_{(\mathrm{light})}^\top \times_3 \hat{U}_{(\mathrm{view})}^\top \qquad (4)$$

from the tensor I. Here, ×_n stands for the n-th mode product. We compute the low-rank approximation of the tensor I as

$$\hat{I} = Z \times_1 \hat{U}_{(\mathrm{mlens})} \times_2 \hat{U}_{(\mathrm{light})} \times_3 \hat{U}_{(\mathrm{view})}, \qquad (5)$$

and consider it as diffuse reflection components.
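
A minimal NumPy sketch of the truncated higher-order SVD of Eqs. (3)–(5) follows; the helper names are ours, and it uses the plain complete-data SVD, i.e. the variant before the missing-data extension discussed below.

```python
import numpy as np

def unfold(I, mode):
    """Mode-n matricization of a 3rd order tensor."""
    return np.moveaxis(I, mode, 0).reshape(I.shape[mode], -1)

def mode_product(T, U, mode):
    """n-th mode product T x_n U of a 3rd order tensor T with a matrix U."""
    T = np.moveaxis(T, mode, 0)
    out = np.tensordot(U, T, axes=([1], [0]))
    return np.moveaxis(out, 0, mode)

def hosvd_lowrank(I, ranks=(3, 3, 1)):
    """Low-rank approximation of I (M x L x V) following Eqs. (3)-(5)."""
    U_hats = []
    for mode, r in enumerate(ranks):
        # Eq. (3): SVD of the mode-n unfolding; keep the r leading left singular vectors.
        U, _, _ = np.linalg.svd(unfold(I, mode), full_matrices=False)
        U_hats.append(U[:, :r])
    # Eq. (4): core tensor Z = I x_1 U_mlens^T x_2 U_light^T x_3 U_view^T.
    Z = I
    for mode, U_hat in enumerate(U_hats):
        Z = mode_product(Z, U_hat.T, mode)
    # Eq. (5): reconstruction, taken as the diffuse reflection components.
    I_hat = Z
    for mode, U_hat in enumerate(U_hats):
        I_hat = mode_product(I_hat, U_hat, mode)
    return I_hat, Z, U_hats
```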


Fig. 4. The experimental results (ceramic dwarf): (a) an input raw image, (b) our result raw image, (c) an input image, (d) our result image, (e) our result image without outlier removal, and the result images of existing techniques: (f) varying viewing directions, (g) varying light source directions, and (h) polarizations.

In general, actual images consist of not only diffuse reflection components but also outliers such as specular reflection components and shadows. Therefore, the low-rank matrices Û_(mlens), Û_(light), and Û_(view) computed via SVD are contaminated by those outliers. Accordingly, our proposed method makes use of SVD with missing data [8] instead of the conventional SVD for computing the low-rank matrices without contaminations due to those outliers⁵.
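
The paper does not spell out the missing-data iteration beyond footnote 5, but one common way to realize SVD with missing data in the spirit of [8] is an EM-style re-imputation loop applied to an unfolding; the sketch below is our own illustration, where mask marks the entries regarded as reliable (e.g. not saturated highlights or shadowed pixels).

```python
import numpy as np

def svd_missing_data(A, mask, rank=3, n_iter=50):
    """Rank-r approximation of A, treating entries with mask == False as missing.

    Missing entries are iteratively re-imputed from the current low-rank
    estimate; observed entries are kept fixed (an EM-style scheme)."""
    X = np.where(mask, A, A[mask].mean())        # crude initialization of missing entries
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, A, low_rank)          # keep observed values, update the rest
    return low_rank, U[:, :rank]
```

In the proposed method, such a loop would be applied to each unfolding so that Û_(mlens), Û_(light), and Û_(view) are estimated without being contaminated by highlights and shadows.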

4. EXPERIMENTS

4.1. Experimental Results

To confirm the effectiveness of our proposed method, we conducted a number of experiments using real images. We used 10 input images (L = 10) for each object taken by a static light field camera, LYTRO ILLUM, but under varying light source directions. The center coordinates of the microlenses are calibrated by using the existing library [15].

In Fig. 4, we show (a) an input raw image of a ceramic dwarf and (b) our result raw image⁶ under a certain light source direction. We can see that the highlights due to specular reflection components are observed at saturated pixels and their neighbors in (a) the input image, but they are successfully removed from (b) our result image.

Fig. 4 (c) and (d) are the images seen from the central viewing direction synthesized from the raw images (a) and (b) respectively. Those images also show that our proposed method works well. In addition, (d) our result image is very smooth, because noise, e.g. zero-mean Gaussian noise, is canceled out by applying SVD to the pixel values observed at a point on an object surface from varying viewing directions. Those results qualitatively show that our method works well.

⁵Actually, we iteratively update Û_(mlens), Û_(light), Û_(view), and Z by taking outliers into consideration.

⁶We handle the pixels within the radius of 6 pixels from the centers of microlenses, and black out other pixels.




Fig. 5. The experimental results (wood bread): (a) an input image, (b) our result image, (c) our result image without outlier removal, and the result images of existing techniques: (d) varying viewing directions, (e) varying light source directions, and (f) polarizations.


4.2. Comparison with Existing Techniques

We compared the performance of our proposed method with those of four closely related methods. In Fig. 4, we show the result images of (e) our method without outlier removal, (f) the method based on varying viewing directions, (g) the method based on varying light source directions, and (h) the polarization-based method.

First, Fig. 4 (e) shows that our method without outlier removal cannot remove specular reflection components; in particular, the result image is contaminated by the highlights observed under all the light source directions. This result shows the effectiveness of outlier removal, i.e. the use of SVD with missing data instead of the conventional SVD.

Second, in Fig. 4 (f), we can see that the method based on varying viewing directions cannot remove specular reflection components when they are observed from all the viewing directions. In addition, the method is sensitive to noise and the result image is rough, because it considers the darkest intensity as the diffuse reflection component.

Third, Fig. 4 (g) shows that the method based on varying light source directions can remove most specular reflection components, although some of them still remain. The method is rather sensitive to noise and the result image is a little rough, because a set of images seen from a single viewing direction instead of multiple ones is used.

Fourth, in Fig. 4 (h), we can see that the polarization-based method⁷ can remove specular reflection components well. Unfortunately, however, the method is sensitive to noise and the result image is rough, because a set of images seen from a single viewing direction and under a single light source direction is used.

⁷The result image is darker than those of the other methods due to the use of polarization filters. Then, the result image is scaled so that its average pixel value is the same as that of our method.


Fig. 6. The shapes of the dwarf (a) with and (b) without the GBR ambiguity.


Fig. 5 shows that the experimental results for a wood bread are similar to those for the ceramic dwarf. Here, the noise in (f) the polarization-based method is removed by averaging 64 images so that it can be considered as the ground truth. The root-mean-square errors of pixel values in 8-bit images are 3.67, 7.49, 9.24, and 4.91 for (b) our method, (c) our method without outlier removal, (d) varying viewing directions, and (e) varying light source directions respectively. This result quantitatively shows that our proposed method performs better than those existing techniques.
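
For reference, the quoted error metric is the root-mean-square difference between an 8-bit result image and the ground-truth image; a minimal sketch of the computation (our own helper, assuming two arrays of equal shape):

```python
import numpy as np

def rmse_8bit(result, ground_truth):
    """Root-mean-square error between two 8-bit images of the same shape."""
    diff = result.astype(np.float64) - ground_truth.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```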

4.3. Application to Uncalibrated Photometric Stereo

We can use the result of our proposed method for uncalibrated photometric stereo. Fig. 6 (a) shows the 3D shape of the ceramic dwarf recovered from the separated diffuse reflection components up to the GBR ambiguity by imposing the integrability constraint [16]. The separated specular reflection components are further useful for resolving the GBR ambiguity [5] as shown in Fig. 6 (b), where the convex/concave ambiguity is resolved manually. Note that the depth clue for resolving the GBR ambiguity is not available when parallaxes are negligible.
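
To indicate how the separated diffuse components feed into uncalibrated photometric stereo, the sketch below (our own illustration, not the authors' implementation) performs the standard rank-3 factorization of the diffuse image matrix into pseudo light sources and pseudo normals, which differ from the true ones by an unknown invertible 3 × 3 matrix; imposing the integrability constraint [16] reduces this ambiguity to the GBR transformation, and the specular clue [5] can resolve it further.

```python
import numpy as np

def factorize_diffuse(D):
    """Rank-3 factorization of the diffuse image matrix D (L images x P pixels).

    Returns pseudo light sources S_hat (L x 3) and pseudo albedo-scaled normals
    B_hat (3 x P) with D ~= S_hat @ B_hat; the true quantities satisfy
    S = S_hat @ A and B = inv(A) @ B_hat for some unknown invertible 3 x 3 A."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_s = np.sqrt(s[:3])
    S_hat = U[:, :3] * sqrt_s              # L x 3
    B_hat = sqrt_s[:, None] * Vt[:3]       # 3 x P
    return S_hat, B_hat
```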

5. CONCLUSION AND FUTURE WORK

In this paper, we addressed diffuse-specular separation for photometric stereo based on light fields. Specifically, we revealed the low-rank structure of the multi-view images under varying light source directions, and then formulated the diffuse-specular separation as a low-rank approximation of the 3rd order tensor. Through a number of experiments using real images, we confirmed that our proposed method, which integrates the complementary clues based on varying light source directions and varying viewing directions, works better than the existing techniques. The extension to scenes with non-negligible parallaxes is one of the future directions of this study.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Numbers JP16H01676 and JP17H01766.



6. REFERENCES

[1] R. Woodham, "Photometric method for determining surface orientation from multiple images," Optical Engineering, vol. 19, no. 1, pp. 139–144, 1980.

[2] H. Hayakawa, "Photometric stereo under a light source with arbitrary motion," JOSA A, vol. 11, no. 11, pp. 3079–3089, 1994.

[3] P. Belhumeur, D. Kriegman, and A. Yuille, "The bas-relief ambiguity," in Proc. IEEE CVPR 1997, 1997, pp. 1060–1066.

[4] A. Georghiades, "Incorporating the Torrance and Sparrow model of reflectance in uncalibrated photometric stereo," in Proc. IEEE ICCV 2003, 2003, pp. 816–823.

[5] O. Drbohlav and M. Chantler, "Can two specular pixels calibrate photometric stereo?," in Proc. IEEE ICCV 2005, 2005, pp. II-1850–1857.

[6] R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, and P. Hanrahan, "Light field photography with a hand-held plenoptic camera," Tech. Rep., Stanford CTSR 2005-02, 2005.

[7] M. Vasilescu and D. Terzopoulos, "Multilinear analysis of image ensembles: TensorFaces," in Proc. ECCV 2002 (LNCS 2350), 2002, pp. 447–460.

[8] H.-Y. Shum, K. Ikeuchi, and R. Reddy, "Principal component analysis with missing data and its application to polyhedral object modeling," IEEE Trans. PAMI, vol. 17, no. 9, pp. 854–867, 1995.

[9] K. Nishino, Z. Zhang, and K. Ikeuchi, "Determining reflectance parameters and illumination distribution from a sparse set of images for view-dependent image synthesis," in Proc. IEEE ICCV 2001, 2001, pp. 599–606.

[10] M. Tao, J.-C. Su, T.-C. Wang, J. Malik, and R. Ramamoorthi, "Depth estimation and specular removal for glossy surfaces using point and line consistency with light-field cameras," IEEE Trans. PAMI, vol. 38, no. 6, pp. 1155–1169, 2016.

[11] A. Shashua, "On photometric issues in 3D visual recognition from a single 2D image," IJCV, vol. 21, no. 1/2, pp. 99–122, 1997.

[12] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma, "Robust photometric stereo via low-rank matrix completion and recovery," in Proc. ACCV 2010 (LNCS 6494), 2010, pp. 703–717.

[13] L. Wolff and T. Boult, "Constraining object features using a polarization reflectance model," IEEE Trans. PAMI, vol. 13, no. 7, pp. 635–657, 1991.

[14] S. Shafer, "Using color to separate reflection components," Color Research & Application, vol. 10, no. 4, pp. 210–218, 1985.

[15] Y. Bok, H.-G. Jeon, and I. S. Kweon, "Geometric calibration of micro-lens-based light-field cameras using line features," in Proc. ECCV 2014 (LNCS 8694), 2014, pp. 47–61.

[16] A. Yuille and D. Snow, "Shape and albedo from multiple images using integrability," in Proc. IEEE CVPR 1997, 1997, pp. 158–164.
