
Learning to Clean: A GAN Perspective

Monika Sharma, Abhishek Verma, and Lovekesh Vig

TCS Research, New Delhi, India
{monika.sharma1 | verma.abhishek7 | lovekesh.vig}@tcs.com

Abstract. In the big data era, the impetus to digitize the vast reservoirs of data trapped in unstructured scanned documents, such as invoices, bank documents, courier receipts, and contracts, has gained fresh momentum. The scanning process often introduces artifacts such as salt-and-pepper / background noise, blur due to camera motion or shake, watermarks, coffee stains, wrinkles, or faded text. These artifacts pose many readability challenges to current text recognition algorithms and significantly degrade their performance. Existing learning-based denoising techniques require a dataset of noisy documents paired with cleaned versions of the same documents, from which a model can be trained to generate clean documents from noisy ones. In the real world, however, such a paired dataset is very often unavailable, and all we have for training a denoising model are unpaired sets of noisy and clean images. This paper explores the use of Generative Adversarial Networks (GANs) to generate denoised versions of noisy documents. In particular, where paired information is available, we formulate the problem as an image-to-image translation task, i.e., translating a document from a noisy domain (e.g., with background noise, blur, fading, or watermarks) to a target clean document using GANs. In the absence of paired training images, we employ CycleGAN, which learns a mapping from the distribution of noisy images to that of denoised images from unpaired data, to achieve image-to-image translation for cleaning noisy documents. We compare the performance of CycleGAN trained on unpaired images for document cleaning tasks with that of a Conditional GAN trained on paired data from the same dataset. Experiments were performed on a public document dataset on which different types of noise were artificially induced; the results demonstrate that CycleGAN learns a more robust mapping from the space of noisy documents to that of clean documents.

Keywords: Document Cleaning Suite · CycleGAN · Unpaired Data · Deblurring · Denoising · Defading · Watermark Removal.

arXiv:1901.11382v1 [cs.CV] 28 Jan 2019

1 Introduction

The advent of Industry 4.0 calls for the digitization of every aspect of industry, including the automation of business processes, business analytics, and the phasing out of manually driven processes. Although businesses now store large volumes of scanned digital copies of paper documents, the information in many of these documents still needs to be extracted via text recognition techniques. When these images are captured with a camera or scanner, artifacts such as background noise, blur, and faded text tend to creep in. In some scenarios, companies insert a watermark in their documents, which causes readability issues after scanning. Text recognition engines often suffer from the low quality of scanned documents: they cannot read the documents properly and hence fail to digitize the information they contain correctly. In this paper, we attempt to denoise documents before they are sent to a text recognition network, and propose a document cleaning suite based on generative adversarial training. The suite is trained for background noise removal, deblurring, watermark removal, and defading, and learns a mapping from the distribution of noisy documents to the distribution of clean documents.

Background noise removal is the process of removing background noise, such as uneven contrast, see-through effects, interfering strokes, and background spots, from documents. Background noise degrades OCR performance because it makes text difficult to distinguish from the background [3], [5], [14], [9]. Deblurring is the process of removing blur from an image. Blur is a distortion caused by factors such as camera shake or improper focus; it decreases the readability of the text in a document image and hence deteriorates OCR performance. Recent work on deblurring has focused on estimating blur kernels using techniques such as GANs [6], CNNs [10], dictionary-based priors [12], sparsity-inducing priors [15], and hybrid non-convex regularizers [24]. Watermark removal aims to remove the watermark from an image while preserving its text. Watermarks are low-intensity images printed on photographs and books to prevent copying of the material; after scanning, however, they hinder reading of the text of interest. In the literature, inpainting techniques [20], [23] are used to recover the original image after the watermark has been detected statistically. Defading is the process of recovering text that has lightened or faded over time, which usually happens in old books and documents and is also detrimental to OCR performance. To remove all these artifacts that degrade document quality and hinder readability, we formulate document cleaning as an image-to-image translation task, at which Generative Adversarial Networks (GANs) [6] are known to perform very well.

However, because paired data (i.e., noisy documents and their cleaned counterparts) is scarce, we propose to train CycleGAN [26] on unpaired datasets of noisy documents. We train CycleGAN for denoising / background noise removal, deblurring, watermark removal, and defading. CycleGAN eliminates the need for a one-to-one mapping between images of the source and target domains through a two-step transformation of the source image: the source image is first mapped to an image in the target domain and then mapped back to the source domain. We evaluate our document cleaning suite on synthetic and publicly available datasets and compare it against a Conditional GAN baseline [17]. We use Kaggle's document dataset for denoising / background noise removal [4] and the BMVC document deblurring dataset [7], both of which are publicly available online. Because, to our knowledge, no public document dataset exists for watermark removal or defading, we synthetically generated document datasets for these two tasks and have made them public for the benefit of the research community. Overall, our contributions in this paper are as follows:


– We propose a Document Cleaning Suite capable of cleaning documents via denoising / background noise removal, deblurring, watermark removal, and defading to improve readability.

– We propose applying CycleGAN [26] to translate a document from a noisy document distribution (e.g., with background noise, blur, watermarks, or fading) to a clean document distribution in situations where paired data is scarce.

– We synthetically create document datasets for watermark removal and defading by, respectively, inserting logos as watermarks into and applying fading techniques to documents from the Google News dataset [1].

– We evaluate CycleGAN for background noise removal, deblurring, watermark removal, and defading on the publicly available Kaggle document dataset [4], the BMVC document deblurring dataset [7], and our synthetically created watermark removal and defading datasets, respectively.

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces CycleGAN and explains its architecture. Section 4 provides details of the datasets, training procedure, and evaluation metric, and discusses the experimental results and comparisons that demonstrate the effectiveness of CycleGAN for cleaning noisy documents. Section 5 concludes the paper.

2 Related Work

Generative Adversarial Networks (GANs) [6] have taken deep learning by storm. They employ adversarial training, which pits two neural networks against each other: a generator, which aims to produce data indistinguishable from real data, and a discriminator, which tries to distinguish real data from generated (fake) data. The process eventually yields a generator capable of performing a wide range of tasks, such as image-to-image translation. Other notable applications where GANs have established their supremacy include representation learning, image editing, art generation, and music generation [2], [19], [22], [13], [21].
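For reference, the adversarial game described above is formalized in [6] as the following minimax objective. The notation here is the conventional one rather than this paper's: x is a real sample, z a noise vector, G the generator, and D(·) the discriminator's estimate that its input is real.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator increases V by scoring real samples close to 1 and generated samples close to 0, while the generator decreases it by pushing D(G(z)) towards 1.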

Image-to-image translation is the task of mapping images in a source domain to images in a target domain, such as converting sketches into photographs or grayscale images into color images. The aim is to generate samples of the target distribution given samples of the source distribution. Prior GAN-based work such as the Conditional GAN [17] conditions the image produced by the generator on the input, which makes it well suited to translation tasks. However, such GANs require a one-to-one mapping of images between the source and target domains, i.e., a paired dataset. For documents, a cleaned version is not always available for each noisy document. This motivated us to explore unpaired image-to-image translation methods, e.g., DualGAN [25], which uses dual learning, and CycleGAN [26], which uses a cycle-consistency loss to achieve unpaired image-to-image translation.

In this paper, we propose to apply CycleGAN to the document cleaning task. CycleGAN has two pairs of generators and discriminators: one pair converts the source domain to the target domain, while the other converts the target domain back to the source domain. This bidirectional conversion enables a cycle-consistency loss, which ensures that an image can be effectively converted from source to target and then back to source again. The transitivity property of the cycle-consistency loss allows CycleGAN to perform well on unpaired image-to-image translation.

Existing methods for removing background noise from document images include binarization and thresholding techniques, fuzzy logic, and histogram-, morphology-, and genetic-algorithm-based methods [3], [5]. Liu et al. [14] proposed an automatic method for estimating color noise from a single image using a Noise Level Function (NLF), together with a Gaussian Conditional Random Field (GCRF) based removal technique for producing a clean image from a noisy input. Javed et al. [9] employ a technique for removing background and punch-hole noise from handwritten Urdu text. To our knowledge, deep learning has not yet been applied in the literature to removing noise from document images.

There is a large body of work on image deblurring. For example, DeblurGAN [11] uses conditional GANs to deblur images, and [18] uses a multi-scale CNN to build an end-to-end deblurring system. Ljubenovic et al. proposed a class-adapted dictionary-based prior for the image [16]. Another approach places a sparsity-inducing prior on the blurring filter, which allows deblurring of images containing different classes of content, such as faces and text, when they co-occur in a document [15]. Yao et al. [24] developed a non-convex regularization method that leverages non-convex sparsity constraints on image gradients and blur kernels to improve kernel estimation accuracy. Finally, [10] uses a CNN to classify the image into one of several degradation subspaces, and the corresponding blur kernel is then used for deblurring.

Very few attempts have been made in the past to remove watermarks from images. The authors of [20] proposed using image inpainting to recover the original image. Similarly, the method developed by Xu et al. [23] detects the watermark using statistical methods and subsequently removes it using image inpainting. To the best of our knowledge, there is no prior work on defading of images.

3 CycleGAN

Fig. 1. Overview of CycleGAN. It consists of two generators, G_B and G_A, which map noisy images to clean images and clean images to noisy images, respectively, and are trained with a cycle-consistency loss [26]. It also contains two discriminators, D_A and D_B, which act as adversaries and try to reject the images produced by the generators.

CycleGAN [26] has proven its worth in scenarios where paired data, i.e., an image in the source domain together with its corresponding image in the target domain, is scarce. Because CycleGAN can learn image-to-image translations without a one-to-one mapping between the input and target domains, we use it for our document cleaning suite, where clean documents corresponding to noisy ones are rarely available. To learn meaningful transformations from an unpaired dataset, CycleGAN uses a cycle-consistency loss: if an image is transformed from the source distribution to the target distribution and back again, the result should be a sample from the source distribution. This loss is incorporated into CycleGAN by using two generators and two discriminators, as shown in Figure 1. The first generator, G_B, maps an image I_A from the noisy domain A to an output image O_B in the clean target domain B. To ensure a meaningful relation between I_A and O_B, the generators must learn features that allow O_B to be mapped back to the original noisy input domain. This reverse transformation is carried out by the second generator, G_A, which takes O_B as input and converts it back into an image C_A in the noisy domain. A similar transformation is carried out for converting images in the clean domain B to the noisy domain A. As Figure 1 shows, each discriminator takes two inputs: a real image from its domain and an image produced by a generator. The discriminator's task is to distinguish between them, i.e., to defeat the generator by rejecting the images it produces. In competing against the discriminator, the generator learns to produce images that are indistinguishable from real images of the target domain.
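The cycle-consistency idea above can be sketched in a few lines. This is a toy illustration under stated assumptions: images are flat lists of intensities in [0, 1], the L1 distance follows [26], and the two "generators" are hypothetical hand-written functions standing in for the trained networks G_B and G_A.

```python
# Toy sketch of CycleGAN's cycle-consistency (L1) loss, A = noisy, B = clean.
# The generators here are hypothetical stand-ins, not trained networks.
from typing import Callable, List

Image = List[float]
Generator = Callable[[Image], Image]


def l1(x: Image, y: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    assert len(x) == len(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cycle_consistency_loss(g_b: Generator, g_a: Generator,
                           real_a: Image, real_b: Image) -> float:
    """Forward cycle A -> B -> A plus backward cycle B -> A -> B."""
    o_b = g_b(real_a)  # noisy I_A translated to the clean domain (O_B)
    c_a = g_a(o_b)     # ... and mapped back to the noisy domain (C_A)
    o_a = g_a(real_b)  # clean image translated to the noisy domain
    c_b = g_b(o_a)     # ... and mapped back to the clean domain
    return l1(c_a, real_a) + l1(c_b, real_b)


def invert(img: Image) -> Image:
    """Hypothetical generator: intensity inversion (its own inverse)."""
    return [1.0 - p for p in img]


def identity(img: Image) -> Image:
    """Hypothetical generator that leaves the image unchanged."""
    return list(img)


noisy, clean = [0.25, 0.75], [0.0, 1.0]
print(cycle_consistency_loss(invert, invert, noisy, clean))    # 0.0: G_A undoes G_B
print(cycle_consistency_loss(invert, identity, noisy, clean))  # 1.5: cycles do not close
```

In [26], this term is added to the two adversarial losses with a weight λ (λ = 10 in that paper), so each generator must both fool its discriminator and remain invertible by the other generator.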

We use the same CycleGAN network architecture as proposed in [26]. The generator network consists of two convolutional layers with stride 2, several residual blocks, and two fractionally strided (transposed) convolutional layers with stride 1/2. The discriminator network is a 70 × 70 PatchGAN [8], which classifies overlapping 70 × 70 image patches as real or fake.
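The 70 × 70 patch size is the receptive field of one discriminator output unit. The sketch below computes it under an assumed layer stack (five 4 × 4 convolutions with strides 2, 2, 2, 1, 1, as in the pix2pix/CycleGAN reference discriminator); this paper does not itself list the discriminator layers.

```python
# Sketch: receptive field of a PatchGAN-style discriminator.
# The (kernel, stride) stack below is an assumption taken from the common
# pix2pix/CycleGAN discriminator, not from this paper.
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of one output unit of a conv stack.

    `layers` lists (kernel_size, stride) from the first layer to the last.
    Walking backwards from a single output unit: r_in = r_out * stride + (kernel - stride).
    """
    r = 1
    for kernel, stride in reversed(layers):
        r = r * stride + (kernel - stride)
    return r


PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(PATCHGAN_70))  # 70
```

Because the discriminator is fully convolutional, it outputs a grid of such real/fake scores, one per overlapping patch, rather than a single score for the whole image.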

4 Experimental Results and Discussion

This section is organized as follows. Section 4.1 describes the datasets used for the document cleaning suite, Section 4.2 the training details of our experiments, and Section 4.3 the performance evaluation metric. Section 4.4 then discusses the experimental results and compares them with the baseline model, a Conditional GAN [17].

Table 1. Performance comparison of Conditional GAN and CycleGAN based on PSNR (in dB).

Task               | Conditional GAN | CycleGAN
Background removal | 27.624          | 31.774
Deblurring         | 19.195          | 30.293
Watermark removal  | 29.736          | 34.404
Defading           | 28.157          | 34.403

Fig. 2. Plot comparing the PSNR of images produced by CycleGAN [26] and Conditional GAN [17] on the test set of the deblurring document dataset [7]. The test set consists of 16 sets of 100 documents each, where each set is blurred with one of the 16 different blur kernels used to create the training dataset.

4.1 Dataset Details

We used four separate document datasets, one each for background noise removal, deblurring, watermark removal, and defading. Their details are given below:

– Kaggle Document Denoising Dataset: This document denoising dataset, hosted by Kaggle [4], consists of noisy documents with various forms of noise, such as coffee stains, faded sun spots, dog-eared pages, and many wrinkles. We use this dataset to train and evaluate CycleGAN for removing background noise from document images. We trained CycleGAN on a training set of 144 noisy documents and tested the trained network on a test set of 72 document images.


– Document Deblurring Dataset: We used the artificial document deblurring dataset [7], available online, to train CycleGAN to deblur blurred documents. This dataset was created from documents in the CiteSeerX repository, which were processed with various geometric transformations and two types of blur, i.e., motion blur and defocus blur, to make the noise look more realistic. We used only a subset of this dataset, 2000 randomly sampled documents, to train CycleGAN. For evaluation, the dataset provides a test set of 16 sets of 100 documents, each set blurred with one of the 16 different blur kernels used to create the training dataset.

– Watermark Removal Dataset: As no publicly available dataset of watermarked document images exists, we generated our own synthetic watermark removal document dataset. To create it, we first obtained text documents from the Google News dataset [1] and approximately 21 logos from the Internet to use as watermarks. We then pasted the logos onto the documents, making them transparent with varying values of the alpha channel. We varied the position, size, and transparency of the logos to introduce randomness into the watermarked documents and make them realistic. A training set of 2000 images and a test set of 200 images from this synthetic dataset were used in our experiments.

– Document Defading Dataset: Similar to the watermark removal dataset, we artificially generated faded documents from the Google News dataset [1] by applying various dilation operations to document images. Here again, the training and test sets consist of 2000 and 200 document images, respectively, used to train and evaluate CycleGAN for defading.
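The two synthetic degradations described in the list above can be sketched as follows. This is a hypothetical illustration, not the authors' generation script: the function names, the ranges for logo scale and alpha, and the 3 × 3 window are assumptions, and pages are grayscale lists of rows with white paper (255) and dark ink. For defading, grayscale dilation takes the local maximum, so the bright background grows into the dark strokes.

```python
# Hypothetical sketch of the synthetic watermark and fading degradations.
# Pages are grayscale lists of rows with values in [0, 255] (255 = white paper).
# Function names and parameter ranges are illustrative assumptions.
import random
from typing import List

Gray = List[List[int]]


def paste_watermark(doc: Gray, logo: Gray, alpha: float, top: int, left: int) -> Gray:
    """Alpha-blend `logo` onto a copy of `doc` with its top-left corner at (top, left)."""
    out = [row[:] for row in doc]
    for i, logo_row in enumerate(logo):
        for j, value in enumerate(logo_row):
            y, x = top + i, left + j
            if 0 <= y < len(out) and 0 <= x < len(out[0]):  # clip at page borders
                out[y][x] = round(alpha * value + (1 - alpha) * out[y][x])
    return out


def resize_nearest(img: Gray, scale: float) -> Gray:
    """Nearest-neighbour rescaling, used to vary the logo size."""
    h = max(1, int(len(img) * scale))
    w = max(1, int(len(img[0]) * scale))
    return [[img[min(len(img) - 1, int(y / scale))][min(len(img[0]) - 1, int(x / scale))]
             for x in range(w)] for y in range(h)]


def random_watermark(doc: Gray, logo: Gray, rng: random.Random) -> Gray:
    """Randomize the logo's size, position and transparency, then paste it."""
    scaled = resize_nearest(logo, rng.uniform(0.5, 1.5))
    alpha = rng.uniform(0.1, 0.4)  # low alpha keeps the mark faint
    top = rng.randint(0, max(0, len(doc) - len(scaled)))
    left = rng.randint(0, max(0, len(doc[0]) - len(scaled[0])))
    return paste_watermark(doc, scaled, alpha, top, left)


def fade(doc: Gray, size: int = 3) -> Gray:
    """Grayscale dilation: local maximum over a size x size window.

    With dark ink on white paper, the background spreads into the strokes,
    so strokes get thinner and narrow ones disappear; larger windows fade more.
    """
    h, w, r = len(doc), len(doc[0]), size // 2
    return [[max(doc[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]
```

Applying `fade` repeatedly or with different `size` values is one way to obtain the "various dilation operations" mentioned above; real pipelines would typically use an image library's dilation routine instead of nested lists.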

4.2 Training Details

We use the same training procedure as adopted for CycleGAN in [26]. A least-squares loss is used to train the network, as it is more stable and produces higher-quality images. To reduce model oscillation, we update the discriminators using a history of generated images rather than only the images produced by the latest generator. We use the Adam optimizer with a learning rate of 0.0002 and a momentum term (β1) of 0.5 to train CycleGAN on noisy images of size 200 × 200. The network is trained for 12, 30, 12, and 8 epochs for background noise removal, deblurring, watermark removal, and defading, respectively.
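Two of the training details above, the least-squares adversarial loss and the history of generated images, can be sketched as follows. This is an illustrative sketch, not the suite's code: discriminator outputs are treated as scalars (e.g., an averaged PatchGAN score map), and the pool size of 50, the 0.5 swap probability, and the halving of the discriminator loss follow the CycleGAN paper [26] and its reference implementation rather than anything stated here.

```python
# Sketch of least-squares (LSGAN) losses and a history buffer ("image pool")
# of generated images. Sizes and probabilities are assumptions taken from [26].
import random
from typing import Any, List, Optional


def lsgan_d_loss(d_real: float, d_fake: float) -> float:
    """Discriminator: scores on real images -> 1, on generated images -> 0.

    Halved, as in [26], to slow the discriminator relative to the generator.
    """
    return 0.5 * ((d_real - 1.0) ** 2 + d_fake ** 2)


def lsgan_g_loss(d_fake: float) -> float:
    """Generator: make the discriminator score its outputs as real (1)."""
    return (d_fake - 1.0) ** 2


class ImagePool:
    """Stores up to `size` previously generated images for discriminator updates."""

    def __init__(self, size: int = 50, rng: Optional[random.Random] = None) -> None:
        self.size = size
        self.images: List[Any] = []
        self.rng = rng if rng is not None else random.Random()

    def query(self, image: Any) -> Any:
        """Return the generated image to show the discriminator at this step."""
        if self.size == 0:                # pool disabled: always use the newest image
            return image
        if len(self.images) < self.size:  # still filling: store and pass through
            self.images.append(image)
            return image
        if self.rng.random() < 0.5:       # swap: return an older image instead
            idx = self.rng.randrange(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image                      # otherwise use the newest image


pool = ImagePool(size=2, rng=random.Random(0))
shown = [pool.query(name) for name in ["g1", "g2", "g3", "g4"]]
print(shown)  # first two are always "g1", "g2"; later ones may be older images
```

Showing the discriminator a mix of current and older fakes keeps it from overfitting to the generator's latest outputs, which is the oscillation the text refers to.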

For the Conditional GAN [17], we use a kernel size of 3 × 3 with stride 1 and zero-padding of 1 for all convolutional and deconvolutional layers of the generator network. In the discriminator network, the first three layers use 4 × 4 kernels with stride 2 and zero-padding of 1, while the last two layers use 4 × 4 kernels with stride 1. The network is trained on input images of size 200 × 200 using the Adam optimizer with a learning rate of 2 × 10⁻³. We use weights of 6.6 × 10⁻³ and 1 for the adversarial loss and the perceptual loss, respectively. The network is trained for 5 epochs for each of the document cleaning tasks, i.e., background noise removal, deblurring, watermark removal, and defading.
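As a quick check of the Conditional GAN settings above, the sketch below applies the standard convolution output-size formula to a 200 × 200 input and writes out the weighted generator objective. The padding of the last two discriminator layers is not stated in the text and is assumed to be 1; the adversarial and perceptual loss values are taken as given scalars, and the function names are hypothetical.

```python
# Sketch: feature-map sizes through the Conditional GAN layers described above,
# plus the weighted generator objective. Padding of the last two discriminator
# layers is an assumption (1); function names are hypothetical.
from typing import List


def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output height/width of a convolution over an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def discriminator_sizes(n: int = 200) -> List[int]:
    sizes = [n]
    for _ in range(3):  # first three layers: 4x4 kernels, stride 2, padding 1
        sizes.append(conv_out(sizes[-1], kernel=4, stride=2, padding=1))
    for _ in range(2):  # last two layers: 4x4 kernels, stride 1 (padding assumed 1)
        sizes.append(conv_out(sizes[-1], kernel=4, stride=1, padding=1))
    return sizes


def cgan_generator_loss(adversarial: float, perceptual: float,
                        w_adv: float = 6.6e-3, w_perc: float = 1.0) -> float:
    """Weighted sum using the loss weights given in the text."""
    return w_adv * adversarial + w_perc * perceptual


print(conv_out(200, kernel=3, stride=1, padding=1))  # 200: generator layers keep the size
print(discriminator_sizes())                          # [200, 100, 50, 25, 24, 23]
```

The small adversarial weight means the perceptual term dominates the generator's objective, with the adversarial term acting as a mild push towards realistic texture.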

4.3 Evaluation Metric

We evaluate the performance of CycleGAN using the Peak Signal-to-Noise Ratio (PSNR)¹ as an image quality metric. PSNR is defined as the ratio between the maximum possible power of a signal and the power of the distorting noise that deteriorates the quality of its representation, and is usually expressed in terms of the mean squared error (MSE). Given a denoised image (D) of size m × n and its corresponding noisy image (I) of the same size, the MSE and PSNR are given as follows:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \big( D(i,j) - I(i,j) \big)^2$$

$$\mathrm{PSNR} = 20 \times \log_{10}\left( \frac{\mathrm{Max}_D}{\sqrt{\mathrm{MSE}}} \right) \qquad (1)$$

where Max_D represents the maximum pixel intensity value of image D. The higher the PSNR value, the better the image quality.
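Equation (1) translates directly into code. The sketch below is a minimal pure-Python version that first computes the mean squared error over all m × n pixels. It assumes 8-bit images and therefore uses 255 (the largest possible intensity) for Max_D, which is one common reading of the definition; the function names are illustrative.

```python
# Minimal PSNR sketch following Eq. (1). Images are equally sized grayscale
# lists of rows; max_value=255 assumes 8-bit images.
import math
from typing import List

Gray = List[List[float]]


def mse(d: Gray, i: Gray) -> float:
    """Mean squared error between two m x n images."""
    m, n = len(d), len(d[0])
    assert len(i) == m and all(len(row) == n for row in i)
    return sum((d[y][x] - i[y][x]) ** 2 for y in range(m) for x in range(n)) / (m * n)


def psnr(d: Gray, i: Gray, max_value: float = 255.0) -> float:
    """PSNR in dB: 20 * log10(Max_D / sqrt(MSE)); infinite for identical images."""
    err = mse(d, i)
    if err == 0:
        return math.inf
    return 20 * math.log10(max_value / math.sqrt(err))


a = [[100, 100], [100, 100]]
b = [[110, 110], [110, 110]]
print(round(psnr(a, b), 2))  # 28.13: MSE = 100, so 20 * log10(255 / 10)
```

The same value can be written as 10 · log10(Max_D² / MSE); the two forms are algebraically identical.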

Fig. 3. Examples of noisy images (top row) from the Kaggle Document Denoising Dataset [4] and the corresponding images cleaned by CycleGAN (bottom row).

4.4 Results

We now present the results obtained by CycleGAN on the document datasets. Table 1 compares Conditional GAN and CycleGAN on the denoising, deblurring, watermark removal, and defading tasks; CycleGAN outperforms Conditional GAN on all four. Row 1 of Table 1 gives the mean PSNR of images cleaned of background noise by Conditional GAN and CycleGAN: on the Kaggle Document Denoising dataset [4], CycleGAN obtains a PSNR of 31.774 dB, compared with 27.624 dB for Conditional GAN. Similarly, on the deblurring dataset [7], CycleGAN's PSNR (30.293 dB) is substantially higher than Conditional GAN's (19.195 dB). Figure 2 plots the PSNR comparison on the deblurring test set and again shows CycleGAN outperforming Conditional GAN. Rows 3 and 4 give the PSNR values for the watermark removal and defading tasks, where CycleGAN again yields better image quality (34.404 dB vs. 29.736 dB and 34.403 dB vs. 28.157 dB, respectively).

Fig. 4. Results of CycleGAN on the deblurring document dataset [7]. The top row shows the blurred images and the bottom row shows the corresponding deblurred images.

Fig. 5. Samples of watermarked images (first row) and the corresponding cleaned images (second row) produced by CycleGAN.

¹ Peak Signal-to-Noise Ratio: http://www.ni.com/white-paper/13306/en/

We also show sample clean images produced by CycleGAN for all four tasks (background noise removal, deblurring, watermark removal, and defading) in Figures 3, 4, 5, and 6, respectively.

5 Conclusion

In this paper, we proposed and developed a Document Cleaning Suite based on CycleGAN that performs various document cleaning tasks, such as background noise removal, deblurring, watermark removal, and defading. It is often difficult to obtain clean images corresponding to a noisy image, and simulating noise to train image-to-image translators does not generalize adequately to the real world. Instead, we trained a model to learn the mapping from an input distribution of images to an output distribution while preserving the essence of each image. We used CycleGAN because it has been shown to give good results in such domain adaptation scenarios, where paired datasets (i.e., noisy and corresponding cleaned images) are scarce. We demonstrated the effectiveness of CycleGAN on publicly available and synthetic document datasets, and the results show that it can effectively clean up a variety of noise.

Fig. 6. Examples of faded images (top row) and the corresponding defaded images (bottom row), with text recovered by CycleGAN.

References

1. EMNLP 2011: Google News dataset. EMNLP 2011 Sixth Workshop on Statistical Machine Translation (2011), http://www.statmt.org/wmt11/translation-task.html#download

2. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. CoRR abs/1606.03657 (2016), http://arxiv.org/abs/1606.03657

3. Farahmand, A., Sarrafzadeh, A., Shanbehzadeh, J.: Document image noises and removal methods. In: Proceedings of the International MultiConference of Engineers and Computer Scientists 2013, vol. 1 (2013), http://www.iaeng.org/publication/IMECS2013/IMECS2013_pp436-440.pdf

4. Frank, A.: UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science (2010), http://archive.ics.uci.edu/ml

5. Ganbold, G.: History document image background noise and removal methods. International Journal of Knowledge Content Development and Technology 5 (2015), http://ijkcdt.net/xml/05531/05531.pdf

6. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. ArXiv e-prints (Jun 2014)

7. Hradiš, M., Kotera, J., Zemčík, P., Šroubek, F.: Convolutional neural networks for direct text deblurring. In: Proceedings of BMVC 2015. The British Machine Vision Association and Society for Pattern Recognition (2015), http://www.fit.vutbr.cz/research/view_pub.php?id=10922

8. Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. CoRR abs/1611.07004 (2016), http://arxiv.org/abs/1611.07004

9. Javed, S.T., Fasihi, M.M., Khan, A., Ashraf, U.: Background and punch-hole noise removal from handwritten Urdu text. In: 2017 International Multi-topic Conference (INMIC). pp. 1–6 (Nov 2017). https://doi.org/10.1109/INMIC.2017.8289451

10. Jiao, J., Sun, J., Satoshi, N.: A convolutional neural network based two-stage document deblurring. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 01, pp. 703–707 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.120

11. Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: DeblurGAN: Blind motion deblurring using conditional adversarial networks. CoRR abs/1711.07064 (2017), http://arxiv.org/abs/1711.07064

12. Li, H., Zhang, Y., Zhang, H., Zhu, Y., Sun, J.: Blind image deblurring based on sparse prior of dictionary pair. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). pp. 3054–3057 (Nov 2012)

13. Lin, D., Fu, K., Wang, Y., Xu, G., Sun, X.: MARTA GANs: Unsupervised representation learning for remote sensing image classification. IEEE Geoscience and Remote Sensing Letters 14(11), 2092–2096 (Nov 2017). https://doi.org/10.1109/LGRS.2017.2752750

14. Liu, C., Szeliski, R., Kang, S.B., Zitnick, C.L., Freeman, W.T.: Automatic estimation and removal of noise from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence (2006), http://people.csail.mit.edu/celiu/denoise/denoise_pami.pdf

15. Liu, R.W., Li, Y., Liu, Y., Duan, J., Xu, T., Liu, J.: Single-image blind deblurring with hybrid sparsity regularization. In: 2017 20th International Conference on Information Fusion (Fusion). pp. 1–8 (July 2017). https://doi.org/10.23919/ICIF.2017.8009659

16. Ljubenovic, M., Zhuang, L., Figueiredo, M.A.T.: Class-adapted blind deblurring of document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 01, pp. 721–726 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.123

17. Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014), http://arxiv.org/abs/1411.1784

18. Nah, S., Kim, T.H., Lee, K.M.: Deep multi-scale convolutional neural network for dynamic scene deblurring. CoRR abs/1612.02177 (2016), http://arxiv.org/abs/1612.02177

19. Peng, Y., Qi, J., Yuan, Y.: CM-GANs: Cross-modal generative adversarial networks for common representation learning. CoRR abs/1710.05106 (2017), http://arxiv.org/abs/1710.05106

20. Qin, C., He, Z., Yao, H., Cao, F., Gao, L.: Visible watermark removal scheme based on reversible data hiding and image inpainting. Signal Processing: Image Communication 60, 160–172 (2018). https://doi.org/10.1016/j.image.2017.10.003, http://www.sciencedirect.com/science/article/pii/S0923596517301868

21. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015), http://arxiv.org/abs/1511.06434

22. Wang, L., Gao, C., Yang, L., Zhao, Y., Zuo, W., Meng, D.: PM-GANs: Discriminative representation learning for action recognition using partial-modalities. CoRR abs/1804.06248 (2018), http://arxiv.org/abs/1804.06248

23. Xu, C., Lu, Y., Zhou, Y.: An automatic visible watermark removal technique using image inpainting algorithms. In: 2017 4th International Conference on Systems and Informatics (ICSAI). pp. 1152–1157 (Nov 2017). https://doi.org/10.1109/ICSAI.2017.8248459

24. Yao, Q., Kwok, J.T.: Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity. ArXiv e-prints (Jun 2016)

25. Yi, Z., Zhang, H., Tan, P., Gong, M.: DualGAN: Unsupervised dual learning for image-to-image translation. CoRR abs/1704.02510 (2017), http://arxiv.org/abs/1704.02510

26. Zhu, J., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR abs/1703.10593 (2017), http://arxiv.org/abs/1703.10593