Multiview Rectification of Folded Documents

8
IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1 Multiview Rectification of Folded Documents Shaodi You, Member, IEEE, Yasuyuki Matsushita, Member, IEEE, Sudipta Sinha, Member, IEEE Yusuke Bou, Member, IEEE, Katsushi Ikeuchi, Fellow, IEEE, Abstract—Digitally unwrapping images of paper sheets is crucial for accurate document scanning and text recognition. This paper presents a method for automatically rectifying curved or folded paper sheets from a few images captured from multiple viewpoints. Prior methods either need expensive 3D scanners or model deformable surfaces using over-simplified parametric representations. In contrast, our method uses regular images and is based on general developable surface models that can represent a wide variety of paper deformations. Our main contribution is a new robust rectification method based on ridge-aware 3D reconstruction of a paper sheet and unwrapping the reconstructed surface using properties of developable surfaces via 1 conformal mapping. We present results on several examples including book pages, folded letters and shopping receipts. Index Terms—Robust digitally unwarpping, ridge-aware surface reconstruction, mobile phone friendly algorithms 1 I NTRODUCTION Digitally scanning paper documents for sharing and editing is becoming a common daily task. Such pa- per sheets are often curved or folded, and proper rectification is important for high-fidelity digitization and text recognition. Flatbed scanners allow physical rectification of such documents but are not suitable for hardcover books. For a wider applicability of document scanning, it is wanted a flexible technique for digitally rectifying folded documents. There are two major challenges in document image rectification. First, for a proper rectification, the 3D shape of curved and folded paper sheets must be estimated. Second, the estimated surface must be flat- tened without introducing distortions. Prior methods for 3D reconstruction of curved paper sheets either use specialized hardware [1], [2], [3] or assume sim- plified parametric shapes [4], [5], [6], [7], [8], [9], [2], such as generalized cylinders (Fig. 2a). However, these methods are difficult to use due to bulky hardware or make restrictive assumptions about the deformations of the paper sheet. In this paper, we present a convenient method for digitally rectifying heavily curved and folded paper sheets from a few uncalibrated images captured with a hand-held camera from multiple viewpoint. Our method uses structure from motion (SfM) to recover an initial sparse 3D point cloud from the uncalibrated images. To accurately recover the dense 3D shape of paper sheet without losing high-frequency struc- S. You is with National ICT Australia (NICTA), and Australian National University, Australia. E-mail: [email protected] Y. Matsushita is with Osaka University, Japan. E-mail: [email protected] S. Sinha is with Microsoft Research Redmond, USA. E-mail: [email protected] Y. Bou is with Microsoft, Japan. E-mail: [email protected] K. Ikeuchi is with Microsoft Research Asia, China. E-mail: [email protected] tures such as folds and creases, we develop a ridge- aware surface reconstruction method. Furthermore, to achieve robustness to outliers present in the sparse SfM 3D point cloud caused by repetitive document textures, we pose the surface reconstruction task as a robust Poisson surface reconstruction based on 1 optimization. Next, to unwrap the reconstructed sur- face, we propose a robust conformal mapping method by incorporating ridge-awareness priors and 1 opti- mization technique. See Fig. 1 for an overview. The contributions of our work are threefold. First, we show how ridge-aware regularization can be used for both 3D surface reconstruction and flattening (con- formal mapping) to improve accuracy. Our ridge- aware reconstruction method preserves the sharp structure of folds and creases. Ridge-awareness pri- ors act as non-local regularizers that reduce global distortions during the surface flattening step. Second, we extend the Poisson surface reconstruction [10] and least-squares conformal mapping (LSCM) [11] algorithms by explicitly dealing with outliers using 1 optimization. Finally, we describe a practical system for rectifying curved and folded documents that can be used with ordinary digital cameras. 2 RELATED WORK The topic of digital rectification of curved and folded documents has been actively studied in both the com- puter vision and document processing communities. It is common to model paper sheets as developable surfaces which have underlying rulers corresponding to lines with zero Gaussian curvature. Many exist- ing methods assume generalized cylindrical surfaces where the paper is curved only in one direction and thus can be parameterized using a 1D smooth function. Such surfaces do not require an explicit parameterization of the rulers. See Fig. 2a for an example. A variety of existing techniques recover arXiv:1606.00166v1 [cs.CV] 1 Jun 2016

Transcript of Multiview Rectification of Folded Documents

Page 1: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1

Multiview Rectification of Folded DocumentsShaodi You, Member, IEEE, Yasuyuki Matsushita, Member, IEEE, Sudipta Sinha, Member, IEEE

Yusuke Bou, Member, IEEE, Katsushi Ikeuchi, Fellow, IEEE,

Abstract—Digitally unwrapping images of paper sheets is crucial for accurate document scanning and text recognition. Thispaper presents a method for automatically rectifying curved or folded paper sheets from a few images captured from multipleviewpoints. Prior methods either need expensive 3D scanners or model deformable surfaces using over-simplified parametricrepresentations. In contrast, our method uses regular images and is based on general developable surface models that canrepresent a wide variety of paper deformations. Our main contribution is a new robust rectification method based on ridge-aware3D reconstruction of a paper sheet and unwrapping the reconstructed surface using properties of developable surfaces via `1conformal mapping. We present results on several examples including book pages, folded letters and shopping receipts.

Index Terms—Robust digitally unwarpping, ridge-aware surface reconstruction, mobile phone friendly algorithms

F

1 INTRODUCTION

Digitally scanning paper documents for sharing andediting is becoming a common daily task. Such pa-per sheets are often curved or folded, and properrectification is important for high-fidelity digitizationand text recognition. Flatbed scanners allow physicalrectification of such documents but are not suitablefor hardcover books. For a wider applicability ofdocument scanning, it is wanted a flexible techniquefor digitally rectifying folded documents.

There are two major challenges in document imagerectification. First, for a proper rectification, the 3Dshape of curved and folded paper sheets must beestimated. Second, the estimated surface must be flat-tened without introducing distortions. Prior methodsfor 3D reconstruction of curved paper sheets eitheruse specialized hardware [1], [2], [3] or assume sim-plified parametric shapes [4], [5], [6], [7], [8], [9], [2],such as generalized cylinders (Fig. 2a). However, thesemethods are difficult to use due to bulky hardware ormake restrictive assumptions about the deformationsof the paper sheet.

In this paper, we present a convenient method fordigitally rectifying heavily curved and folded papersheets from a few uncalibrated images captured witha hand-held camera from multiple viewpoint. Ourmethod uses structure from motion (SfM) to recoveran initial sparse 3D point cloud from the uncalibratedimages. To accurately recover the dense 3D shapeof paper sheet without losing high-frequency struc-

• S. You is with National ICT Australia (NICTA), and AustralianNational University, Australia. E-mail: [email protected]

• Y. Matsushita is with Osaka University, Japan.E-mail: [email protected]

• S. Sinha is with Microsoft Research Redmond, USA.E-mail: [email protected]

• Y. Bou is with Microsoft, Japan. E-mail: [email protected]• K. Ikeuchi is with Microsoft Research Asia, China.

E-mail: [email protected]

tures such as folds and creases, we develop a ridge-aware surface reconstruction method. Furthermore, toachieve robustness to outliers present in the sparseSfM 3D point cloud caused by repetitive documenttextures, we pose the surface reconstruction task asa robust Poisson surface reconstruction based on `1optimization. Next, to unwrap the reconstructed sur-face, we propose a robust conformal mapping methodby incorporating ridge-awareness priors and `1 opti-mization technique. See Fig. 1 for an overview.

The contributions of our work are threefold. First,we show how ridge-aware regularization can be usedfor both 3D surface reconstruction and flattening (con-formal mapping) to improve accuracy. Our ridge-aware reconstruction method preserves the sharpstructure of folds and creases. Ridge-awareness pri-ors act as non-local regularizers that reduce globaldistortions during the surface flattening step. Second,we extend the Poisson surface reconstruction [10]and least-squares conformal mapping (LSCM) [11]algorithms by explicitly dealing with outliers using `1optimization. Finally, we describe a practical systemfor rectifying curved and folded documents that canbe used with ordinary digital cameras.

2 RELATED WORK

The topic of digital rectification of curved and foldeddocuments has been actively studied in both the com-puter vision and document processing communities.It is common to model paper sheets as developablesurfaces which have underlying rulers correspondingto lines with zero Gaussian curvature. Many exist-ing methods assume generalized cylindrical surfaceswhere the paper is curved only in one directionand thus can be parameterized using a 1D smoothfunction. Such surfaces do not require an explicitparameterization of the rulers. See Fig. 2a for anexample. A variety of existing techniques recover

arX

iv:1

606.

0016

6v1

[cs

.CV

] 1

Jun

201

6

Page 2: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2

(a) Multi-view images (e) Dewarp

Dewarp

(b) Structure from motion (c) Ridge-aware reconstruction (d) Robust unwrapping

Fig. 1. Our technique recovers a ridge-aware 3D reconstruction of the document surface from a sparse 3D point cloud.The final rectified image is then obtained via robust conformal mapping.

(a) Generalized

cylindrical surface

(b) Cylinder-like

developable surface

(c) General

developable surface

Fig. 2. Developable surfaces with underlying rulers (lineswith zero Gaussian curvature) and fold lines (ridges) shownas dotted and solids lines respectively. Examples of (a)smooth parallel rulers, (b) smooth rulers not parallel to eachother and (c) rulers and ridges in arbitrary directions.

surface geometry using this assumption. Shape fromshading methods were first used by Wada et al. [4],Tan et al. [12], [13], Courteille et al. [14] and Zhang etal. [5] whereas shape from boundary methods wereexplored by Tsoi et al. [15], [6]. Binocular stereo match-ing with calibrated cameras was used by Yamashita etal. [16], Koo et al. [7] and Tsoi et al. [6]. Shape from textlines is another popular method for reconstructingthe document surface geometry [17], [12], [18], [19],[20], [21], [8], [22], [9], [23], [24], [25]. However, thesemethods assume that the document contains well-formatted printed characters.

Some recent methods relax the parallel ruler as-sumption (see Fig. 2b). However, the numerous pa-rameters in these models makes the optimizationquite challenging. Liang et al. [26] and Tian et al. [27]use text lines. Although these methods can handle asingle input image, the strong assumptions on surfacegeometry, contents and illumination limit the applica-bility. Meng et al.designed a special calibrated activestructural light device to retrieve the two parallel1D curvatures [2], the surface can be parameterizedby assuming appropriate boundary conditions andconstraints on ruler orientations. Perriollat et al.[28]use sparse SfM points but assume they are reasonablydense and well distributed. Their parameterization issensitive to noise and can be unreliable when the 3dpoint cloud is sparse or has varying density.

For rectification of documents with arbitrary distor-tion and content (Fig. 2c), other methods require spe-cialized devices and use non-parametric approaches.

Brown et al. [3] use a calibrated mirror system toobtain 3D geometry using multi-view stereo. Theyunwrap the reconstructed surface using constraintson elastic energy, gravity and collision. The model isnot ideal for paper documents because developablesurfaces are not elastic. Later, they propose usingdense 3D range data [29] after which they flatten thesurface using least square conformal mapping [11].Zhang et al. [30] also use dense range scans and userigid constraints instead of elastic constraints withthe method proposed in [3]. Pilu [1] assumes thata dense 3D mesh is available and minimizes theglobal bending potential energy to flatten the surface.None of these existing methods are as practical andconvenient as our method that only requires a hand-held camera and a few images.

3 PROPOSED METHODOur method has two main steps – 3D documentsurface reconstruction and unwrapping of the recon-structed surface. For now, we assume that a set ofsparse 3D points on the surface are available. Next, wedescribe our new algorithms for ridge-aware surfacereconstruction and robust surface unwrapping.

3.1 Ridge-aware surface reconstructionDense methods are favored for 3D scanning of foldedand curved documents [31], [32]. This is because exist-ing methods for surface reconstruction from sparse 3Dpoints tend to produce excessive smoothing and failto preserve sharp creases and folds ie. ridges on thesurface. Such methods are typically also inadequatefor dealing with noisy 3D points caused by repeti-tive textures present in documents. We address theseissues by developing a robust ridge-aware surfacereconstruction method for sparse 3D points. Specif-ically, we extend the Poisson surface reconstructionmethod [10] by incorporating ridge constraints andby adding robustness to outliers.Robust Poisson surface reconstruction. We denotea set of N sparse 3D points obtained from SfMas xn, yn, zn, n = 1, 2, · · · , N , where only 3D pointstriangulated from at least three images are retained.For our input images, N typically lies between 700 to2000. For a selected reference image (and viewpoint),we use a depth map parameterization z(x, y) for thedocument surface. We aim to estimate depth at the

Page 3: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 3

(a) Input (b) Robust Poisson reconstruction

(c) Ridge candidates (d) Ridge-aware reconstruction

Fig. 3. Example of ridge-aware 3D surface reconstruction.

mesh grid vertices zi(xi, yi), where i is the meshgrid index, 1 ≤ i ≤ I . Our method computes theoptimal depth values z∗ = [z1, . . . , zI ]

> by solving thefollowing optimization problem.

z∗ = argminz

Ed(z) + λEs(z). (1)

Here, Ed and Es are the data and smoothness termsrespectively and λ is a parameter to balance thetwo terms. The original Poisson surface reconstructionmethod uses the squared `2-norm for both terms.Instead, we propose using the `1-norm for the dataterm Ed to deal with outliers.

Ed(z) =∑n

‖zn − zi‖1. (2)

This encourages zi to be consistent with the observeddepth zn without requiring explicit knowledge aboutwhich observations are outliers. We rewrite Eq. (2) invector form.

Ed(z) = ‖z− PΩz‖1, (3)where PΩ is a permutation matrix that selects andaligns observed entries Ω by ensuring correspondencebetween zn and zi. The smoothness term Es is definedusing the squared Frobenius norm of the gradient ofdepth vector z along x and y in camera coordinates.

Es(z) = ‖∇2z‖2F =

∥∥∥∥[∂2z

∂x2,∂2z

∂y2

]∥∥∥∥2

F

. (4)

By preparing a sparse derivative matrix D thatreplaces the Laplace operator ∇2 in a linear form,

D =

di,j =

2 if i = j

−1 if zj is left/right to zi0 otherwise

di+I,j =

2 if i = j

−1 zj is above/below zi0 otherwise

2I×I

, (5)

we have a special form of the Lasso problem [33].z∗ = argmin

z‖z− PΩz‖1 + λ‖Dz‖22. (6)

While this problem (Eq. (6)) does not have aclosed form solution, we use a variant of iterativelyreweighted least squares (IRLS) [34] for deriving thesolution. By rewriting the data terms in Eq. (6) as aweighted `2 norm using a diagonal matrix W withpositive values on the diagonal, we have

z∗=argminz

(z− PΩz)>W>W (z− PΩz) + λz>D>Dz. (7)

In contrast to Equation (6), the data term now uses`2 norm instead of `1 norm. We solve this problem(Equation (7)) using alternation as described next.

Step 1: Update zEq. (7) can be rewritten as z∗ = argminz ‖ Az− b ‖22,

where A =

[WPΩ√λD

]and b =

[Wz02I×1

]and 02I×1 is

a zero vector of length 2I . This is a squared `2 sparselinear system. It has the closed form solution

z∗ = [A>A + αI]−1A>b, (8)

where I is the identity matrix, α is a regularizationparameter (we use α = 1.0e-8).

Step 2: Update WWe initialize W to the identity matrix. During eachiteration, each diagonal element wi of W is updatedgiven the residual r = WPΩz

∗ −Wb, as follows.

wi =1

|ri|+ ε, (9)

Here, ri is the i-th element of r and ε is a small positivevalue (we use ε = 1.0e-8). These steps are repeateduntil convergence; namely, until the estimate at t-thiteration z∗(t) becomes similar to the previous estimatez∗(t−1), i.e., ‖z∗(t)−z∗(t−1)‖2 < 1.0e-8. Figure 3.c showsan example of the reconstructed mesh.

Ridge-aware reconstruction. Developable surfacesare ruled [35], i.e., contain straight lines on the surfaceas shown in Fig. 2. Our method exploits this geo-metric property as described in this section. Unlikeexisting parameterization-based methods which onlyhandle smooth rulers, [26], [27], [2], [28], extractingarbitrary creases and ridges is more difficult whenonly sparse 3D points are available. We propose asequential approach by first detecting ridges on themesh z∗ that was obtained using our robust Pois-son reconstruction method. After selecting the ridgecandidates, we instantiate additional linear ridge con-straints and incorporate them into the linear systemthat was solved earlier. This sequential approach isquite general and avoids overfitting. It also avoidsspurious ridge candidates arising due to noise.

For each point z(x, y) on the mesh z∗, we computethe Hessian K as follows.

K(z) =

[∂2z∂x2

∂2z∂x∂y

∂2z∂x∂y

∂2z∂y2

]. (10)

Based on the following Eigen decomposition of K(z),

Page 4: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 4

K(z) = [p1,p2]

[κ1 00 κ2

][p1,p2]

>, (11)

we obtain principal curvatures κ1 and κ2 (|κ1| ≤ |κ2|)and the corresponding eigenvectors p1 and p2.

The value of κ1 is equal to zero at all points on adevelopable surface. Thus, at any point zi, a straightline along direction p1 must lie on the surface. Asdiscussed earlier and shown in Fig. 2, the curvaturealong the ridge is zero while the curvature orthogonalto the ridge reaches a local extremum. We use thisobservation to select ridge candidates using the valueof |κ2|. Mesh points zi(xi, yi) with |κ2(i)| greater thanthe threshold κth are selected as ridge candidates. (seeFigure 3d for an example). The associated smoothnessconstraints in Eq. (4) are adjusted as follows.

di,j = ϕ(〈p1, e1〉)di,jdi+I,j = ϕ(〈p1, e2〉)di+1,j ,

(12)

where 〈 , 〉 is the inner product and e1 = [1, 0]>, e2 =[0, 1]> are orthonormal bases. ϕ(·) is a convex mono-tonic function defined as ϕ(x) = βx2

−1β−1 , which places

a greater weight β 1 along the ridge and smallerweight orthogonal to it. We also consider two moredirectional smoothness constraints similar to thosestated in Eq. (12), for the two diagonal directionse3 = [

√2

2 ,√

22 ]> and e4 = [

√2

2 ,−√

22 ]>.

Finally, we modify Es(z) defined in Eq. (4) byadding these ridge constraints and solve a new sparselinear system (similar to the earlier one) to obtain thefinal reconstruction. Figure 3e shows that this methodcan preserve accurate folds and creases.

3.2 Surface UnwrappingGiven the 3D surface reconstruction, our next stepis to unwrap the surface. We take a conformal map-ping approach to this problem, amongst which, LeastSquares Conformal Mapping (LSCM) [11], [29] is asuitable choice. However, it is not resilient to the pres-ence of outliers and susceptible to global distortionwhich can occur due to the absence of long-rangeconstraints. We address both these issues and extendLSCM by incorporating an appropriate robustifier aswell as ridge constraints to reduce global drift.Conformal Mapping. For our mesh topology, each 3Dpoint zi(xi, yi), i = 1, . . . , I , on the grid on z formstwo triangles, one with its upper and left neighbor,the other with its lower and right neighbor on thegrid. The triangulated 3D mesh is denoted as T , z.A conformal map will produce an associated 2Dmesh with the same connectivity but with 2D vertexpositions such that the angle of all the triangles arebest preserved. We denote the 2D mesh as T ,u,where u = (ui, vi).

For a particular 3D triangle t with vertices at(x1, y1, z1), (x2, y2, z2), and (x3, y3, z3), we seek itsassociated 2D vertex positions ((u1, v1), (u2, v2) and(u3, v3)) under the conformal map. Using a local 2D

Obtain local basis:𝐀 = 𝑥3, 𝑦3, 𝑧3 − 𝑥1, 𝑦1, 𝑧1𝐁 = 𝑥2, 𝑦2, 𝑧2 − 𝑥1, 𝑦1, 𝑧1𝐍 = 𝐀 × 𝐁/ 𝐀 × 𝐁𝐗 = 𝐀/ 𝐀𝐘 = 𝐍 × 𝐗

𝑥1, 𝑦1, 𝑧1

𝐗

𝐘𝐍

𝐀

𝐁

Coordinates in local basis(𝑋1, 𝑌1) = (0, 0)(𝑋2, 𝑌2) = (𝐁 ⋅ 𝐗, 𝐁 ⋅ 𝐘)(𝑋3, 𝑌3) = (𝐀 ⋅ 𝐗, 0)

𝑥3, 𝑦3, 𝑧3

𝑥2, 𝑦2, 𝑧2

Fig. 4. Vertices of a triangle in a local coordinate basis.

coordinate basis for triangle t, the conformality con-straint is captured by the following linear equations.

1

S

[∆X1 ∆X2 ∆X3 −∆Y1 −∆Y2 −∆Y3∆Y1 ∆Y2 ∆Y3 ∆X1 ∆X2 ∆X3

]ut = 0, (13)

Here, ut = [u1, u2, u3, v1, v2, v3]>, S is the areaof t, ∆X1 = (X3 −X2), ∆X2 = (X1 −X3) and∆X3 = (X2 −X1) (∆Y is similarly defined). Notethat variables (X1, Y1), (X2, Y2), and (X3, Y3) wereobtained from t’s vertex coordinates (see Fig. 4).Putting together the constraints for all the triangles,we have the following sparse linear system.

Cu = 0. (14)

Using indices i and j to index the I vertices andJ triangles respectively, the 2J × 2I matrix C inEquation (14) has the following non-zero entries.

cj,i = ∆XSj, cj,i+I = −∆Y

Sj

cj+J,i = ∆YSj, cj+J,i+I = ∆X

Sj

(15)

Ridge constraints. Notice that the original confor-mal mapping has only local constraints, which willresult in global distortion, Fig. 5.f. To reduce globaldistortions during unwrapping, we add ridge andboundary constraints to constrain the solution further.

We take into consideration two facts. First, theridge lines remain straight after flattening but shouldessentially become invisible on the flattened surface.Second, the conformal mapping constraint Eq. (13)applies to beyond triangles. In particular, it is true forthree collinear points. Therefore, we propose using thecollinearity property to derive non-local constraintsduring flattening. and add it to our conformal mapestimation problem. Referring to Fig. 4, and imaginethe collinear case, that is when point (x2, y2, z2) is alsolying on the X axis; in such case, Y2 = Y1 = Y3 = 0.In addition, the area of the triangle T is zero. Hence,the ridge constraints can be written in a form similarto Eq. (13).[

∆X1 ∆X2 ∆X3 0 0 00 0 0 ∆X1 ∆X2 ∆X3

]uR = 0. (16)

where uR = [u1, u2, u3, v1, v2, v3]> are the targeted 2Dcoordinates similarly defined as ut. We select ridge

Page 5: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 5

(a) RA (ours) + L1 (ours) (b) RA (ours) + Geo (c) RA (ours) + L2

(d) Po + L1(ours) (e) Po + Geo (f) Po + L2

Fig. 5. Rectification results from combination of meth-ods. Acronyms RA and Po denote our ridge-aware methodand Poisson reconstruction respectively. L1 denotes our `1conformal mapping method with non-local constraints; L2indicates LSCM [29] and Geo indicates geodesic unwrap-ping [30]. Po + L2 produced gross failures and is thus notcompared.

candidates in the same way as we did earlier duringreconstruction. However, this step is now more accu-rate because the surface is well reconstructed. For eachridge candidate (vertex), we find two farthest ridgecandidates along the ridge line in opposite directionsand instantiate the above mentioned constraint forthe three points. We assume that the boundary ofthe flattened 2D document image has straight linesegments (they need not be straight lines on the 3Dsurface). These boundary constraints can be expressedin a form similar to Eq. (16). We incorporate allridge and boundary constraints into a system of linearequations.

Ru = 0. (17)Robust conformal mapping. We propose using an`1 norm instead of the standard squared `2 norm tomake conformal mapping robust to outliers. Puttingtogether Eqs. (14) and (17) in the `1 sense, we have

u∗ = argminu

‖ Cu ‖1 +γ ‖ Ru ‖1, (18)

where γ balances the ridge and boundary constraints.To avoid the trivial solution u = 0, we fix twopoints of u to (ui, vi) = (0, 0) and (uj , vj) = (0, 1).Equation (18) is then rewritten asu∗ = argmin

u‖ Cu ‖1 +γ ‖ Ru ‖1 +θ ‖ Efix ‖22, (19)

where Efix is the energy function for the two fixedpoints. We solve the objective function using theiterative reweighted least squares method [34]. Fig-ure 5 shows a result from the conventional LSCM (`2method) and our proposed `1 method.

3.3 Implementation detailsSparse 3D reconstruction. We recover the initialsparse 3D point cloud using SfM. While any exist-ing SfM method is applicable, we use the popularincremental SfM technique et al. [36] in our system.We typically capture five to ten still images for eachdocument from different viewpoints. Capturing theseimages or equivalently a set of burst photos or a shortvideo clip only takes a few seconds. Figure 1a showsan input example and the corresponding reconstruc-tion.Image warping. After recovering the flattened meshgrid u = ui, vi, we unwrap the input image withthe maximum document area in pixels. To obtaincorrespondence between the input image and ui, vi,we project the 3D mesh points zi(xi, yi) into theimage to obtain image coordinates xi, yi using thecamera pose estimated using SfM. We then warpthe image according to the correspondence betweenxi, yi and ui, vi with bilinear interpolation.

4 EXPERIMENTS

We perform qualitative and quantitative evaluationon a wide variety of input documents. The first setof experiments show that our method can handledifferent paper types, document content and varioustypes of folds and creases. Next, we report a quanti-tative evaluation based on known ground truth usinglocal and global metrics where we demonstrate thesuperior performance and advantages of our methodover existing methods [29], [30], [28]. In all the experi-ments, we set parameters as follows: λ = 1e-5, β = 40,γ = 1e3, θ = 1e2 and κth = 0.006. Our methodis insensitive to these parameters. Varying λ, γ, θ byfactors of 0.1 – 1.0 or varying β or κth by 50% fromthese settings did not change the result significantly.

4.1 Test DataThe first six out of the 12 test sequences (I – VI) con-tain documents with no fold lines, one fold line, twoto three parallel fold lines, and two to three crossingfold lines respectively. The other six sequences (VII– XII) contain documents with an increasing numberof fold lines. Irregular fold lines were intentionallyadded to make the rectification more challenging.All documents were either placed on a planar orcurved background surface. Sequence VII contains ashopping receipt on a paper roll whereas II and VIIIcontain pages from a book. Sequences III, IV, IX andX contain letters folded within envelopes. Sequence V,VI, XI and XII contain examples of documents foldedinside a purse or notebook. The input images as wellas the results from our method are shown in Fig. 6.Our method does not rely on the content, formatting,layout or color of the document. Thus, it is generallyapplicable as long as a sufficient number of sparsekeypoints in the input images are available for SfM.

Page 6: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 6

I II III IV V VI

VII VIII IX X XI XII

Fig. 6. [OUR RESULTS] Original images are shown in rows 1 and 3 and our rectification results are shown in rows 2 and 4.

4.2 Quantitative Evaluation Metrics

We quantitatively evaluate the global and local dis-tortion between the ground truth digital image andour rectified result using local and global metrics. Thedigital version of six out of the 12 test documents wereavailable. We treat those images as ground truth andresize them by setting their height to 1000 pixels.

Global distortion metric. We first register the rectifiedimage to the ground truth by estimating a globalaffine transform T estimated using SIFT keypointcorrespondences in these images [37].

T =

[a1 a2 t1a3 a4 t20 0 s

], (20)

This is achieved by minimizing the squared error.

T∗ = argminT

‖ Tp− p ‖22 . (21)

where, p and p denote corresponding 2D keypoint po-sitions using homogenous coordinates. We computethe global distortion metric G as follows.

G = (a1a4 − a2a3)/s2

G = max (G, 1/G).(22)

A perfect result has G = 1; and larger values indicatemore distortion (see comparative results in Fig. 7).Local distortion metric. We compute the local metricby performing dense image registration using SIFT-flow [38] between the rectified image and the groundtruth image. The frequency distribution of local dis-placements are shown in lower figure in Fig. 7 andcompared with existing methods. We found denseregistration to be more useful for an unbiased evalua-tion than sparse SIFT keypoint-based registration be-cause sparse methods are more likely to ignore manymatches if the result contains large deformations.

Page 7: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 7

1

1.02

1.04

1.06

1.08

1.1

1.12

1.14

RA + L1 (proposed) RA + Geo RA + L2 Po + L1 Po + Geo

Data I Data III Data IV Data VI Data X Data XII

Distortion

Global distortion evaluation metric

00.10.20.30.40.50.60.70.8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

RA + L1 (proposed) RA + L2 RA + Geo Po + L1 Po + Geo

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

Data I

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

Data III

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

Data IV

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

0

0.2

0.4

0.6

0.8

1

1 4 7 10 13 16 19 22

Data VI Data X Data XII

Frequency

Error

Frequency

Error

Frequency

Error

Frequency

Error

Frequency

Error

Frequency

Error

Local distortion evaluation metricFig. 7. Distortion metrics for datasets shown in Fig. 6.Abbreviations are consistent with Fig. 5 and the text.

4.3 Comparison with existing methods

We first compare with three methods [28], [29], [30]on various real images and then use syntheticallygenerated data to further compare with the methodsdesigned for dense 3D point data [29], [30].

Perriollat et al. [28]. Their method explicitly parame-terizes smooth rulers but cannot handle our documentimages with creases and folds. Our method works fineon their dataset and produces a more accurate resultthan the one obtained by running their code 1 (seeFig. 8). Although our result has minor artifacts dueto self-occlusion and fore-shortening, the flatteningresult is quite accurate.

Brown et al. [29], Zhang et al. [30]. We compareto both methods using our sequences where groundtruth is available (Fig. 6). Since they require 3D rangedata, we use our reconstructed surface as their in-put and compare the surface flattening quality. Wealso compare our ridge-aware reconstruction to thestandard Poisson reconstruction method. As shownin Fig. 7, the global and local distortion metrics intro-duced earlier are used in the evaluation. Our methodhas higher accuracy in terms of both metrics. Resultsfrom various methods have been compared in Fig. 5.

Evaluation on synthetic data. We compared ourmethod with [29], [30] on synthetically generateddense 3D points because these methods require dense3D points. We vary the point cloud size from 2K to

1. Their result shown here was generated by the original codeprovided by the authors. These result do not agree with the resultsin their paper. This is probably due to a difference in initialization.

Input Mesh (Perriollat)

Dewarp (Perriollat)

Mesh (proposed)

Dewarp (proposed) Fig. 8. Visual comparison with cylindrical like surfacebased method which uses SfM as input [28].

300K (common in 3D range data) and inject varyinglevels of Gaussian noise. The results from the threemethods are compared in Figure 9. These experimentsshow that with low noise and high point density, allthree methods are comparable in accuracy. However,when the points are sparser or when the noise levelis higher, our method is more accurate than priormethods [29], [30].

5 CONCLUSION AND FUTURE WORK

In this paper, we propose a method for automati-cally rectifying curved or folded paper sheets froma small number of images captured from differentviewpoints. We use SfM to obtain sparse 3D pointsfrom images and propose ridge-aware surface recon-struction method which utilizes the geometric prop-erty of developable surface for accurate and dense3D reconstruction of paper sheets. We also robustifythe algorithms using `1 optimization techniques. Afterrecovering surface geometry, we unwrap the surfaceby adopting conformal mapping with both local andnon-local constraints in a robust estimation frame-work. In the future we will address the correctionof photometric inconsistencies in the document imagecaused by shading under scene illumination.

REFERENCES[1] M. Pilu, “Undoing paper curl distortion using applicable sur-

faces,” in CVPR. IEEE, 2001.[2] G. Meng, Y. Wang, S. Qu, S. Xiang, and C. Pan, “Active flatten-

ing of curved document images via two structured beams,” inCVPR. IEEE, 2014.

[3] M. S. Brown and W. B. Seales, “Image restoration of arbitrarilywarped documents,” TPAMI, vol. 26, no. 10, pp. 1295–1306,2004.

[4] T. Wada, H. Ukida, and T. Matsuyama, “Shape from shadingwith interreflections under a proximal light source: Distortion-free copying of an unfolded book,” IJCV, vol. 24, no. 2, pp.125–135, 1997.

[5] L. Zhang, A. M. Yip, M. S. Brown, and C. L. Tan, “A uni-fied framework for document restoration using inpainting andshape-from-shading,” Pattern Recognition, vol. 42, no. 11, pp.2961–2978, 2009.

[6] Y.-C. Tsoi and M. S. Brown, “Multi-view document rectificationusing boundary,” in CVPR. IEEE, 2007.

Page 8: Multiview Rectification of Folded Documents

IEEE TRANSACTION ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 8

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

0 5 100

0.2

0.4

0.6

0.8

1

1.007 274.5 N/A N/A N/A N/A

1.019 22.50 1187 N/A N/A N/A

1.029 2.497 21.05 265.0 1230 3823

1.034 1.333 2.783 8.738 29.50 85.32

1.007 1.042 1.167 1.371 1.635 2.070

1.003 1.016 1.062 1.133 1.231 1.342

1.001 1.005 1.017 1.035 1.061 1.096

1.000 1.001 1.004 1.009 1.015 1.023

PointsZhang et al.

1.030 1.031 1.036 1.034 1.028 1.029

1.007 1.017 1.020 1.015 1.017 1.004

1.011 1.012 1.012 1.014 1.013 1.012

1.013 1.014 1.015 1.015 1.014 1.017

Noise Level2

K0 1 2 3 4

9K

36

K1

50

K

Proposed Brown et al.

1

1.1

1.2

Noise Level

2K

9K

36K

150K

0 1 2 3

Proposed

Error

Frequency

Frequency

Frequency

Frequency

Error

Error

Error

(a) Global distortion evaluation metric (b) Local distortion evaluation metric

Points

1

1.1

1.2

Noise Level

2K

9K

36K

150K

0 1 2 3

Zhang et al.

Points

1

1.1

1.2

Noise Level

2K

9K

36K

150K

0 1 2 3

Brown et al.

Po

ints

Fig. 9. Comparison of the global distortion metric between our method (top) and Zhang et al. [30] and Brown et al.[29]with varying point density and noise. Here lower values indicate higher accuracy. (b) Frequency distribution of local distortionmetrics for the associated experiments. Our method is more accurate when input point are sparser or have more noise.

[7] H. I. Koo, J. Kim, and N. I. Cho, “Composition of a dewarpedand enhanced document image from two view images,” TIP,vol. 18, no. 7, pp. 1551–1562, 2009.

[8] N. Stamatopoulos, B. Gatos, I. Pratikakis, and S. J. Peran-tonis, “Goal-oriented rectification of camera-based documentimages,” TIP, vol. 20, no. 4, pp. 910–920, 2011.

[9] Z. Zhang, X. Liang, and Y. Ma, “Unwrapping low-rank textureson generalized cylindrical surfaces,” in ICCV. IEEE, 2011, pp.1347–1354.

[10] M. Kazhdan, M. Bolitho, and H. Hoppe, “Poisson surfacereconstruction,” in Fourth Eurographics symposium on Geometryprocessing, 2006.

[11] B. Levy, S. Petitjean, N. Ray, and J. Maillot, “Least squaresconformal maps for automatic texture atlas generation,” inACM TOG, vol. 21. ACM, 2002, pp. 362–371.

[12] Z. Zhang, C. Lim, and L. Fan, “Estimation of 3d shape ofwarped document surface for image restoration,” in ICPR, 2004.

[13] C. L. Tan, L. Zhang, Z. Zhang, and T. Xia, “Restoring warpeddocument images through 3d shape modeling,” TPAMI, vol. 28,no. 2, pp. 195–208, 2006.

[14] F. Courteille, A. Crouzil, J.-D. Durou, and P. Gurdjos, “Shapefrom shading for the digitization of curved documents,” Ma-chine Vision and Applications, vol. 18, no. 5, pp. 301–316, 2007.

[15] Y.-C. Tsoi and M. S. Brown, “Geometric and shading correctionfor images of printed materials: a unified approach usingboundary,” in CVPR. IEEE Computer Society, 2004.

[16] A. Yamashita, A. Kawarago, T. Kaneko, and K. T. Miura,“Shape reconstruction and image restoration for non-flat sur-faces of documents with a stereo vision system,” in ICPR. IEEE,2004.

[17] H. Cao, X. Ding, and C. Liu, “A cylindrical surface model torectify the bound document image,” in ICCV, 2003.

[18] H. Ezaki, S. Uchida, A. Asano, and H. Sakoe, “Dewarping ofdocument image by global optimization,” in ICDAR. IEEE,2005.

[19] A. Ulges, C. H. Lampert, and T. M. Breuel, “Document imagedewarping using robust estimation of curled text lines,” inICDAR. IEEE, 2005.

[20] S. Lu, B. M. Chen, and C. C. Ko, “A partition approach for therestoration of camera images of planar and curled document,”Image and Vision Computing, vol. 24, no. 8, pp. 837–848, 2006.

[21] B. Fu, M. Wu, R. Li, W. Li, Z. Xu, and C. Yang, “A model-basedbook dewarping method using text line detection,” in 2nd Int.Workshop on Camera Based Document Analysis and Recognitionl,2007.

[22] G. Meng, C. Pan, S. Xiang, J. Duan, and N. Zheng, “Metricrectification of curved document images,” TPAMI, vol. 34, no. 4,pp. 707–722, 2012.

[23] C. Liu, Y. Zhang, B. Wang, and X. Ding, “Restoring camera-captured distorted document images,” IJDAR, vol. 18, no. 2, pp.111–124, 2014.

[24] B. S. Kim, H. I. Koo, and N. I. Cho, “Document dewarpingvia text-line based optimization,” Pattern Recognition, 2015.

[25] D. Salvi, K. Zheng, Y. Zhou, and S. Wang, “Distance transformbased active contour approach for document image rectifica-tion,” in WACV. IEEE, 2015.

[26] J. Liang, D. DeMenthon, and D. Doermann, “Geometric recti-fication of camera-captured document images,” TPAMI, vol. 30,no. 4, pp. 591–605, 2008.

[27] Y. Tian and S. G. Narasimhan, “Rectification and 3d recon-struction of curved document images,” in CVPR. IEEE, 2011.

[28] M. Perriollat and A. Bartoli, “A computational model ofbounded developable surfaces with application to image-basedthree-dimensional reconstruction,” Computer Animation and Vir-tual Worlds, vol. 24, no. 5, pp. 459–476, 2013.

[29] M. S. Brown, M. Sun, R. Yang, L. Yun, and W. B. Seales,“Restoring 2d content from distorted documents,” TPAMI,vol. 29, no. 11, pp. 1904–1916, 2007.

[30] L. Zhang, Y. Zhang, and C. L. Tan, “An improved physically-based method for geometric restoration of distorted documentimages,” TPAMI, vol. 30, no. 4, pp. 728–734, 2008.

[31] H. Avron, A. Sharf, C. Greif, and D. Cohen-Or, “ 1-sparsereconstruction of sharp point set surfaces,” ACM TOG, 2010.

[32] A. C. Oztireli, G. Guennebaud, and M. Gross, “Feature pre-serving point set surfaces based on non-linear kernel regres-sion,” in Computer Graphics Forum, vol. 28, no. 2. Wiley OnlineLibrary, 2009, pp. 493–501.

[33] R. Tibshirani, “Regression shrinkage and selection via thelasso,” Journal of the Royal Statistical Society. Series B (Method-ological), pp. 267–288, 1996.

[34] E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsityby reweighted `1 minimization,” Journal of Fourier analysis andapplications, vol. 14, no. 5-6, pp. 877–905, 2008.

[35] E. Portnoy, “Developable surfaces in hyperbolic space,” PacificJournal of Mathematics, vol. 57, no. 1, pp. 281–288, 1975.

[36] R. I. Hartley and A. Zisserman, Multiple View Geometry inComputer Vision, 2nd ed. Cambridge University Press, ISBN:0521540518, 2004.

[37] D. G. Lowe, “Distinctive image features from scale-invariantkeypoints,” IJCV, vol. 60, no. 2, pp. 91–110, 2004.

[38] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, “Siftflow: Dense correspondence across different scenes,” in ECCV,2008.