Single Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective

Hongyuan Zhu1, Vijay Chandrasekhar1, Liyuan Li1, Joo-Hwee Lim1

1 Institute for Infocomm Research, A*STAR, Singapore, {zhuh, vijay, lyli, joo-hwee}@i2r.a-star.edu.sg

Abstract

Single image rain-streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. Previous works solve this problem using various hand-designed priors or by explicitly mapping synthetic rain to paired clean images in a supervised way. In practice, however, the pre-defined priors are easily violated and the paired training data are hard to collect. To overcome these limitations, in this work we propose RainRemoval-GAN (RR-GAN), the first end-to-end adversarial model that generates realistic rain-free images using only unpaired supervision. Our approach alleviates the paired-training constraint by introducing a physical model which explicitly learns a recovered image and the corresponding rain streaks from a differentiable programming perspective. The proposed network consists of a novel multiscale attention memory generator and a novel multiscale deeply supervised discriminator. The multiscale attention memory generator uses a memory with an attention mechanism to capture the latent rain-streak context at different stages to recover the clean image. The deeply supervised multiscale discriminator imposes constraints on the recovered output, in terms of local details and global appearance, against the clean image set. Together with the learned rain streaks, a reconstruction constraint is employed to keep the appearance consistent with the input image. Experimental results on a public benchmark demonstrate our promising performance compared with nine state-of-the-art methods in terms of PSNR, SSIM, visual quality and running time.

Introduction

Rain, snow, and fog are common visual artifacts which affect many vision-based applications (Zhu et al. 2017a; 2016a; 2018a; 2015; 2016b), such as drone-based video surveillance and self-driving cars. The performance of many computer vision systems exhibits a significant drop when they are presented with images that contain these artifacts. Hence, it is highly desirable to develop automatic artifact removal methods (Li et al. 2017; Zhang et al. 2017c). In this paper, we focus on the problem of rain streak removal from a single image.

A rain image I can be modeled as an image composition problem (Luo et al. 2015):

I = X + R        (1)


Figure 1: A visual illustration of single image deraining. The target is to recover a clean image from the input rain image. Our method produces a recovered image with faithful color and structure without using any paired training data. (a) Input; (b) ground truth; (c) our result.

where X and R denote the desired clean image and the rain streaks, respectively. Single image rain-streak removal aims to estimate the hidden X and R from a given I, which is an unconstrained problem: we need to estimate two unknown variables, and there are infinitely many solutions if no further regularization is imposed.
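To make the ill-posedness of Eq. 1 concrete, the following toy sketch (shapes and streak pattern are our own, not the paper's code) composes a rain image from a clean image and a streak layer, and shows that recovering the clean image amounts to estimating one unknown from their sum:

import torch

# Toy illustration of the composition model I = X + R (Eq. 1):
# a rain image is the sum of a clean image X and a rain-streak layer R.
X = torch.rand(3, 256, 256)      # hypothetical clean image
R = torch.zeros(3, 256, 256)
R[:, :, ::8] = 0.4               # crude synthetic "streaks"
I = (X + R).clamp(0, 1)          # observed rain image

# Any estimate of the streak layer induces an estimate of the clean image;
# without further regularization the split is not unique.
R_hat = torch.zeros_like(R)      # placeholder for a learned estimate
X_hat = I - R_hat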

To make the rain streak removal problem tractable, existing methods can be roughly grouped into two categories: prior-based and data-driven. The prior-based methods estimate the rain streaks based on various priors or assumptions. Typical priors include, but are not limited to, the sparse-coding prior (Luo et al. 2015), the low-rank prior (Chang et al. 2017) and the Gaussian prior (Li et al. 2016). Despite the remarkable progress achieved by adopting these priors, they are easily violated in practice: the rain streaks do not strictly follow a Gaussian or sparse distribution, and the background scenes are cluttered and contain complex illumination.

Recently, interest has shifted to data-driven approaches which utilize labeled data with paired rain images and rain streaks to learn a deep neural network (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018). Given an input rain image, the rain streaks or the clean image can be regressed directly. More recently, inspired by the huge success of generative adversarial networks (GAN) in image-to-image translation tasks (Goodfellow et al. 2014; Isola et al. 2017), (Zhang et al. 2017b) propose a GAN-based image deraining method which employs a generator to map the input image to the ground-truth clean image. One major limitation of this method is the necessity of paired training data.

However, collecting paired training data is extremely challenging given the changing environment. One popular solution is simulating rain in controlled environments, but most existing simulations are too simplified to depict the complexity of the real world. Another common solution is using photo editing tools to add streaks with various rain-density levels, orientations and scales to clean natural images, but this approach limits the scale of the training data.

In fact, one can observe that it is easy to collect a large number of rain and rain-free images from the Internet, even though these images are not paired. Thus, it is highly desirable to develop a deraining model which can utilize unpaired supervision. Note that CycleGAN (Zhu et al. 2017b) has recently become popular for learning cross-style image translation by using bidirectional constraints with adversarial learning. Although CycleGAN has achieved impressive performance in style translation, it is designed for the style translation problem and may not preserve the appearance consistency in the translated result.

Based on the above observations, we propose a novel single image deraining method called RainRemoval-GAN (RR-GAN), which is specifically designed around the rain image composition model in Eq. 1. The proposed RR-GAN differs significantly from CycleGAN and its variants in structure and application. More specifically, we propose a novel generator which uses an attention memory to capture the latent rain-streak contexts in a recurrent fashion to recover the clean image X. Moreover, we propose a novel deeply-supervised multiscale discriminator to regularize the recovered image to look as realistic as possible compared with the rain-free images. Furthermore, these latent factors are composited together as in Eq. 1 to reconstruct the original rain image and preserve faithful color and structure, thus avoiding CycleGAN's inefficient bi-directional consistency training paradigm.

The major contributions of this work are summarized as follows:

• To the best of our knowledge, this is one of the first works to marry CycleGAN-style unpaired adversarial learning with single image deraining, making rain-removal training with unpaired data possible. The proposed GAN consists of a novel generator and discriminator specifically designed by incorporating the rain image generation model with unpaired training information.

• A novel multiscale attention memory generator is proposed, which uses an attention memory to fuse the contexts from coarse-scale and fine-scale densely connected networks and recurrently learns the rain streaks to recover the clean image using unpaired training data.

• A novel multiscale deeply-supervised discriminator is proposed to regularize the generated recovered image to be as realistic as possible compared with the target images, in terms of both low-level details and high-level structures.

• Extensive experiments on a public benchmark demonstrate our method's efficiency and effectiveness in terms of quantitative and qualitative performance.

Related Work

In this section, we briefly review recent related works on single image deraining and generative adversarial networks.

Single Image De-raining

Image deraining has been a classic image restoration problem for years. Some early methods (Zhang et al. 2006; Garg and Nayar 2007; Santhaseelan and Asari 2015; Tripathi and Mukhopadhyay 2014) exploit photometric consistency and temporal dynamics for rain removal. Although these methods achieve promising performance, they are not applicable to the single image setting.

Unlike video-based methods, prior-based methods handle the single image deraining problem by assuming the rain streaks follow certain distributions. Many hand-crafted priors have been proposed, e.g. the sparse coding-based prior (Luo et al. 2015), the low-rank prior (Chang et al. 2017) and the Gaussian prior (Li et al. 2016), just to name a few. One major limitation of these methods is that the priors are easily violated, which results in over-/under-estimation of the rain streaks (Zhang and Patel 2018).

Recently, deep learning has become popular in both high-level and low-level vision tasks (Cai et al. 2016; Ren et al. 2016; Yang et al. 2017a), and several CNN-based methods have been proposed for image deraining (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018). The idea of these methods is to learn a mapping between input rainy images and their corresponding rain streaks using a CNN. (Yang et al. 2017b) design a deep dilated network to jointly detect and remove rain streaks. (Li et al. 2018) propose to incorporate a recurrent neural network into the dilated network of Yang et al. to preserve multi-stage contexts. (Zhang and Patel 2018) propose a multi-scale densely connected network trained with an additional rain-density classifier to predict the rain streaks. In summary, these methods aim to learn a mapping using synthesized rain/clean image pairs, which are hard to collect at large scale.

Our methods’s generator is similar to (Zhang and Pa-tel 2018) as we combines coarse- and fine-scale networkfor feature extraction, however we introduce a novel at-tention memory network to learn the rain-streaks automati-cally from data in a recurrent fashion. The attention memorynetwork is inspired by recent work in language translation(Gehring et al. 2017) which shows that the performance ofmachine translation models could be significantly improvedby solely using an attention model instead of using addi-tional gate in recurrent networks as in (Li et al. 2018), whichsignificantly improve the inference speed. Moreover, ourgenerator can explicitly learns two disentangled latent pa-rameters which corresponds to clean image and rain streaksfrom data without paired information, hence significantly re-duce the labeling efforts.

Generative Adversarial Networks

Recently, generative adversarial networks (Goodfellow et al. 2014; Arjovsky et al. 2017; Zhao et al. 2017) have achieved significant progress.

Figure 2: The pipeline of our method. Our RR-GAN consists of a multiscale attention memory generator and a multiscale deeply-supervised discriminator. The generator uses unpaired rain and clean images to train an attention memory that aggregates the latent rain-streak context from different stages. The rain image, together with the latent rain-streak masks, is used as input to the U-Net to generate the recovered image. The recovered image is regularized by a deeply-supervised multiscale discriminator so that its appearance is as close as possible to the clean images used for training, in terms of both low-level details and global structures.

The basic idea of GANs is to transform white noise (or another specified prior) through a parametric model to generate candidate samples, with the help of a discriminator and a generator. By optimizing a minimax two-player game, the generator aims to learn the training data distribution, and the discriminator aims to judge whether a sample comes from the training data or from the generator. A conditional generator can translate an image into an output, with various applications such as image super-resolution (Ledig et al. 2017), text-to-image synthesis (Zhang et al. 2017a) and image-to-image translation (Yi et al. 2017). However, conditional GANs require paired training images, whose ground truth can be very difficult to obtain.

This paper is inspired by the recently popular CycleGAN (Zhu et al. 2017b), which can translate images from one domain to another without paired information. However, our work is remarkably different from CycleGAN. In brief, our architecture is specifically designed for image deraining so that the original image's color and structure are well preserved. To the best of our knowledge, this could be one of the first works to develop a dedicated single image deraining model that incorporates unpaired adversarial learning. Our network is distinct from CycleGAN in the following aspects. First, we propose a novel attention memory generator which uses multiscale densely connected networks and an attention memory in a recurrent fashion to learn discriminative rain features and latent rain streaks to recover the clean image. Second, our network contains a novel deeply-supervised multiscale discriminator training regime, so that everything from local details to global image appearance is enforced to look realistic with respect to the clean image set. Furthermore, these latent factors are composited together as in Eq. 1 to reconstruct the original rain image and preserve faithful color and structure, thus avoiding CycleGAN's inefficient bi-directional consistency training paradigm.

Differentiable Programming

Our work belongs to the family of differentiable programming, which treats a program as a neural network so that the program can be parametrized, automatically differentiated and optimized. The first well-known work in differentiable programming is Learned ISTA (LISTA) (Gregor and LeCun 2010), which unfolds the popular l1 solver ISTA into a simple RNN such that the number of layers corresponds to the number of iterations and the weights correspond to the dictionary. LISTA's RNN paradigm has been applied to a wide range of tasks, e.g. hashing (Wang et al. 2016a), classification (Wang et al. 2016b), sparse coding (Zhou et al. 2018) and image dehazing (Zhu et al. 2018b).

Different from conventional differentiable programming using RNNs with difficult-to-interpret variables, our method reformulates the image degradation model as a feed-forward convolutional neural network with prior knowledge. Therefore, our formulation is more interpretable and efficient than conventional RNN-based differentiable programming solvers.
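To make the unfolding idea concrete, below is a minimal PyTorch sketch of LISTA (Gregor and LeCun 2010) under common assumptions; the dimensions, layer count and parameter sharing are illustrative, not taken from the paper. The soft-thresholded ISTA iteration becomes a fixed number of layers whose weights We, S and thresholds theta are learned end-to-end:

import torch
import torch.nn as nn

class LISTA(nn.Module):
    """ISTA unrolled into a fixed number of layers with learned weights."""
    def __init__(self, input_dim=128, code_dim=256, num_layers=5, lam=0.1):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)  # input transform
        self.S = nn.Linear(code_dim, code_dim, bias=False)    # recurrent weight
        self.theta = nn.Parameter(torch.full((code_dim,), lam))  # thresholds
        self.num_layers = num_layers

    def shrink(self, x):
        # soft thresholding: sign(x) * max(|x| - theta, 0)
        return torch.sign(x) * torch.relu(x.abs() - self.theta)

    def forward(self, y):
        b = self.We(y)                 # computed once, reused at every layer
        z = self.shrink(b)
        for _ in range(self.num_layers - 1):
            z = self.shrink(b + self.S(z))  # one unrolled ISTA iteration
        return z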

The Method

As shown in Fig. 2, our RR-GAN consists of two networks, namely a multiscale attention memory generator (MAMG) and a multiscale deeply-supervised discriminator (MDSD), which are specifically designed for single image deraining. In brief, MAMG uses an attention memory to recurrently attend to the rain-streak regions in order to recover the clean image. MDSD employs deep supervision on the recovered image from the generator, enforcing it to look as realistic as possible compared with the clean image set.

Multiscale Attention Memory Network

The multiscale attention memory network consists of an attention memory network and a U-Net autoencoder with skip connections. The former uses a multiscale feature extractor Gf to extract features of rain-relevant regions. To achieve better performance, Gf combines complementary coarse-scale and fine-scale context using the recently proposed dense connections (Huang et al. 2017). The fine-scale dense network applies densely connected modules with small kernels to capture fine-scale structures relevant to rain regions, with a structure of C(1, 3)-C(3, 3)-C(5, 3)-C(7, 3), where C(k1, k2) denotes a convolution with a filter of size k1 × k1, an output channel number of k2 and a stride of 1, followed by a ReLU (Krizhevsky et al. 2012). Each layer is densely connected with the other layers. The coarse-scale branch applies a similar architecture but with larger kernels to capture longer-range context, with a structure of C(7, 3)-C(9, 3)-C(11, 3)-C(1, 3). The feature maps Xc and Xf from the coarse-scale and fine-scale networks, respectively, are concatenated and fed into the attention memory network to learn the rain regions.
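As a concrete illustration, here is a minimal PyTorch sketch of the two-branch extractor Gf described above. The stated C(k, 3) layer specifications are followed; the dense connectivity pattern (each layer consuming the concatenation of all earlier maps) and the input channel count are our assumptions:

import torch
import torch.nn as nn

class DenseBranch(nn.Module):
    """A small densely connected stack of C(k, 3) convolutions."""
    def __init__(self, in_ch, kernel_sizes):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for k in kernel_sizes:
            self.convs.append(nn.Conv2d(ch, 3, k, stride=1, padding=k // 2))
            ch += 3  # dense connectivity: next layer sees all previous maps

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            out = torch.relu(conv(torch.cat(feats, dim=1)))
            feats.append(out)
        return feats[-1]

class MultiscaleExtractor(nn.Module):
    """Gf: concatenates the coarse-scale and fine-scale branch outputs."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.fine = DenseBranch(in_ch, [1, 3, 5, 7])     # fine-scale branch
        self.coarse = DenseBranch(in_ch, [7, 9, 11, 1])  # coarse-scale branch

    def forward(self, img):
        xf, xc = self.fine(img), self.coarse(img)
        return torch.cat([xc, xf], dim=1)  # fed to the attention memory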

Visual attention models are adopted to localize relevant regions in an image so that task-specific features are obtained. In this paper, we introduce visual attention to learn where the rain regions are, i.e., where the network should focus. As shown in Fig. 2, we present an attention memory network inspired by recent convolutional sequence-to-sequence learning in machine translation (Gehring et al. 2017). Our attention memory network uses attention in place of recurrent networks, and the attention map is iteratively refined given the accumulated context. At each time step t, the attention memory network concatenates X_c, X_f, M_{t-1} and S_{t-1}, where X_c is the input from the coarse-scale network, X_f is from the fine-scale network, M_{t-1} is the rain-streak mask at the previous stage t − 1, and S_{t-1} denotes the state memory at the previous stage. Note that the initial M_0 and S_0 are set to zero. The detailed updating rule is as follows:

a_t = σ(W_1 ∗ [X_c, X_f, M_{t-1}, S_{t-1}] + b_1)
S_t = a_t · S_{t-1}
M_t = tanh(W_2 ∗ S_t + b_2)        (2)

where W_1 and b_1 are the weight and bias of a convolutional layer with 3 × 3 kernels and a single output neuron, and σ denotes the sigmoid function σ(x) = 1 / (1 + exp(−x)). W_2 and b_2 are the weight and bias of a deconvolutional layer with 3 × 3 kernels and a single output neuron, which outputs the rain-streak mask. In Fig. 2, one can observe that the rain mask is recurrently improved using the contexts from the previous stages. The input image, concatenated with the final attention map, is fed into the U-Net autoencoder to generate the rain-free image. Our deep autoencoder includes 16 convolutional (ReLU) blocks and adopts skip connections to prevent blurred outputs. To show the effectiveness of our design, Fig. 3 compares the visual results of different generators; the one combining the attention memory and the skip-connected U-Net autoencoder yields the best visual result.
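A literal PyTorch transcription of the update rule in Eq. 2 is sketched below. Note that with S_0 = 0 the multiplicative state update, exactly as printed, keeps S_t at zero; a practical implementation presumably also writes the gated features into the state, but since the text does not specify that term we transcribe the equations as given. Channel counts and the unroll depth are our assumptions:

import torch
import torch.nn as nn

class AttentionMemoryCell(nn.Module):
    """One stage of the attention memory update in Eq. 2."""
    def __init__(self, feat_ch):
        super().__init__()
        # W1, b1: 3x3 conv over [Xc, Xf, M_{t-1}, S_{t-1}] -> gate a_t
        self.w1 = nn.Conv2d(feat_ch + 2, 1, kernel_size=3, padding=1)
        # W2, b2: 3x3 deconvolution producing the rain-streak mask M_t
        self.w2 = nn.ConvTranspose2d(1, 1, kernel_size=3, padding=1)

    def forward(self, xc, xf, m_prev, s_prev):
        a = torch.sigmoid(self.w1(torch.cat([xc, xf, m_prev, s_prev], 1)))
        s = a * s_prev              # S_t = a_t . S_{t-1}, exactly as in Eq. 2
        m = torch.tanh(self.w2(s))  # M_t = tanh(W2 * S_t + b2)
        return m, s

def run_memory(cell, xc, xf, steps=4):
    """Unroll over several stages with M_0 = S_0 = 0, as stated in the text."""
    n, _, h, w = xc.shape
    m = torch.zeros(n, 1, h, w)
    s = torch.zeros(n, 1, h, w)
    for _ in range(steps):
        m, s = cell(xc, xf, m, s)
    return m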

Figure 3: A visual comparison of the effectiveness of different generators. (a) Input; (b) the result of using the context autoencoder alone; (c) the result of our generator without the attention memory; (d) the result of our multiscale attention memory generator.

Multiscale Deeply-Supervised Discriminator

The function of the discriminator is to regularize the output of the generator so that the generated image looks as realistic as possible compared with the clean images of the target set. Recently, shallowly-supervised discriminators with supervision only at the last layer have become popular in GAN architectures (e.g. CycleGAN (Zhu et al. 2017b) and conditional GAN (Isola et al. 2017)) and achieve good image translation performance. However, such shallowly-supervised discriminators overlook the low-level details and global structures which are useful for image enhancement tasks.

To overcome this disadvantage, we propose a novel deeply-supervised discriminator consisting of four convolutional layers with the structure C(3, 64)-C(3, 128)-C(3, 256)-C(3, 512), where each convolutional layer has a stride of two. Each convolutional feature map is passed through an instance normalization layer with Leaky-ReLU activations and then fed into the next convolutional layer. Moreover, each side output is followed by an additional C(1, 1) convolutional layer with a sigmoid to output the probability that each patch belongs to the clean images. Hence we obtain four predictions which provide multi-level supervision, regularizing the generated image to be as realistic as the ground-truth images in terms of low-level details and high-level structures.
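The following PyTorch sketch illustrates this design. The C(3, 64)-C(3, 128)-C(3, 256)-C(3, 512) stride-2 backbone, instance normalization, Leaky-ReLU and per-scale C(1, 1)+sigmoid heads follow the description above; the input channel count, padding and Leaky-ReLU slope are assumptions:

import torch
import torch.nn as nn

class DeepSupervisedDiscriminator(nn.Module):
    """Four stride-2 blocks; every block emits a patch-level prediction."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.heads = nn.ModuleList()
        prev = in_ch
        for ch in (64, 128, 256, 512):  # C(3,64)-C(3,128)-C(3,256)-C(3,512)
            self.blocks.append(nn.Sequential(
                nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1),
                nn.InstanceNorm2d(ch),
                nn.LeakyReLU(0.2)))
            # side output: C(1,1) conv + sigmoid -> per-patch realness
            self.heads.append(nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid()))
            prev = ch

    def forward(self, x):
        preds = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            preds.append(head(x))  # one supervision signal per scale
        return preds               # K = 4 deeply supervised outputs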

Objective Function

To stabilize the training of the discriminator D, we minimize an objective function L_d that consists of two terms, L_{D_real} and L_{D_fake}, as in (Mao et al. 2017):

L_d(G, D) = L_{D_real} + L_{D_fake}
          = (1 / 2K) Σ_{i=1}^{K} { E_y[(1 − D_i(y))^2] + E_x[(D_i(G(x)))^2] }        (3)

which trains the discriminators D_i to differentiate the fake examples G(x) produced by the generator G from the real clean images y, where E_x denotes the expectation over x and D_{i ∈ {1, 2, ..., K}} is the classifier at the i-th side output.

On the other hand, when training the generator G, we optimize an objective function L_g which consists of an MSE loss L_r (Ren et al. 2016) and an adversarial loss L_adv:

L_g = L_r + γ L_adv        (4)

The two terms are designed to minimize the reconstruction error and improve the perceptual quality, respectively. γ is a trade-off factor set by cross-validation.

The MSE loss L_r encourages the network to enforce appearance consistency between the recovered image X and the input image I_l, using the estimated rain mask R, by minimizing the discrepancy between the composite image I_t = X + R and the input image I_l:

L_r = (1 / (C × W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{c=1}^{C} ‖ I_t^{i,j,c} − I_l^{i,j,c} ‖^2        (5)

where W, H and C are the width, height and channel number of the input image I_t.

The adversarial loss L_adv encourages the generator G to make the recovered image G(x) as realistic as the ground-truth images y by assigning the real label 1 to the recovered examples G(x):

L_adv(G, D) = (1 / K) Σ_{i=1}^{K} E_x[(1 − D_i(G(x)))^2]        (6)

where K is the number of supervised layers of the discriminator and D_i is the i-th level classifier deciding whether an image comes from the clean image set or from the generator. This term penalizes details of the recovered image that look different from the clean images.
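Putting Eqs. 3-6 together, a sketch of both objectives in PyTorch might look as follows, assuming the deeply-supervised discriminator above (which returns a list of K side-output probability maps); the function names are ours:

import torch

def discriminator_loss(disc, clean_img, recovered):
    """Eq. 3: least-squares loss over the K side outputs (Mao et al. 2017)."""
    real_preds = disc(clean_img)
    fake_preds = disc(recovered.detach())  # do not backprop into G here
    loss = sum(((1 - dr) ** 2).mean() + (df ** 2).mean()
               for dr, df in zip(real_preds, fake_preds))
    return loss / (2 * len(real_preds))

def generator_loss(disc, rain_img, recovered, rain_mask, gamma=1.0):
    """Eq. 4 = Eq. 5 (reconstruction) + gamma * Eq. 6 (adversarial)."""
    # Eq. 5: the composite X + R must match the input rain image
    l_r = ((recovered + rain_mask - rain_img) ** 2).mean()
    # Eq. 6: push every side output for the recovered image toward "real"
    fake_preds = disc(recovered)
    l_adv = sum(((1 - df) ** 2).mean() for df in fake_preds) / len(fake_preds)
    return l_r + gamma * l_adv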

Training Details

During training, a 256 × 256 patch is randomly cropped from the input image (or its horizontal flip) of size 286 × 286. Adam is used as the optimization algorithm with a mini-batch size of 1. The learning rate starts from 0.001. The models are trained for up to 10 epochs to ensure convergence. We use a weight decay of 0.0001 and a momentum of 0.9. The entire network is trained using the PyTorch framework. During training, we set γ = 1. All hyperparameters are chosen by cross-validation on the validation set.
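For completeness, a minimal training-loop sketch with the stated settings could look like this; the model classes and unpaired data loaders are placeholders, and we read the "momentum of 0.9" as Adam's beta1 (an assumption):

import torch

def train(generator, discriminator, rain_loader, clean_loader, epochs=10):
    # Stated settings: Adam, lr 0.001, weight decay 1e-4, batch size 1.
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-4)
    for _ in range(epochs):
        for rain, clean in zip(rain_loader, clean_loader):  # unpaired batches
            recovered, mask = generator(rain)
            opt_d.zero_grad()  # update D with Eq. 3
            discriminator_loss(discriminator, clean, recovered).backward()
            opt_d.step()
            opt_g.zero_grad()  # update G with Eqs. 4-6
            generator_loss(discriminator, rain, recovered, mask).backward()
            opt_g.step()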

Experiments

We evaluate our method on a public benchmark. We quantitatively evaluate the rain-removal performance using two commonly used metrics: peak signal-to-noise ratio (PSNR) (Huynh-Thu and Ghanbari 2008) and the structural similarity index (SSIM) (Wang et al. 2004). We also provide an ablation study of each proposed component to demonstrate its effectiveness.

We use Rain800 (Zhang and Patel 2018) for benchmarking. The Rain800 dataset contains 700 synthesized images for training and 100 images for testing, built from randomly sampled outdoor images.

Comparing Methods We compare our proposed approach with seven state-of-the-art methods, including image decomposition (ID) (Kang et al. 2012), discriminative sparse coding (DSC) (Luo et al. 2015), layer priors (LP) (Li et al. 2016), DetailsNet (Fu et al. 2017), joint rain detection and removal (JORDER) (Yang et al. 2017b) and RESCAN (Li et al. 2018).

Results on Synthetic Datasets Table 1 shows quantitative results on Rain800. One can observe that our RR-GAN achieves promising results in terms of both PSNR and SSIM.

From Table 1, the data-driven methods, especially the deep-learning-based ones (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018), significantly outperform the prior-based methods (Kang et al. 2012; Luo et al. 2015; Li et al. 2016). This comparison demonstrates that the features learned by deep neural networks are very helpful for the rain-removal task.

Moreover, the proposed RR-GAN is trained with unpaired training data, yet it outperforms the JORDER method on the Rain800 dataset. Note that JORDER is trained using paired supervised data; in other words, it utilizes stronger supervision than our method does. This result demonstrates that it is feasible and promising to develop deraining models using unpaired data, which can be easily obtained in practice.

In terms of running time, our method is also much faster than the other deep learning methods, achieving a running time of 0.03s per image, which demonstrates our method's real-time performance.

Analysis of RR-GAN To demonstrate the effectiveness of the generator used in RR-GAN, we compare it with other network architectures.

Figure 4: Qualitative comparison with the state-of-the-art methods on two randomly sampled Rain800 images. Panels show the input, RainCNN, DetailNet, DID-MDN, our result, and the ground truth.

Method      PSNR    SSIM     Time (s)
ID          18.88   0.5832   120
DSC         18.56   0.5996   189.3
LP          20.46   0.7297   674.8
DetailNet   21.16   0.7320   0.3
JORDER      22.24   0.7763   1.5
RESCAN      24.09   0.8410   0.4
RR-GAN      23.51   0.7566   0.03

Table 1: Quantitative experiments on Rain800. Best results are marked in bold and the second best results are underlined.

The first baseline concerns the dense connections. Specifically, we replace the dense blocks with residual blocks (ResNet) and keep the other blocks of our model fixed. The performance drops significantly from 23.51/0.7566 (PSNR/SSIM) to 22.82/0.6991.

Besides the ablation study on the generator, we also test the effectiveness of our deeply supervised multiscale discriminator. To this end, we adopt a new discriminator which only enforces supervision at the last layer (a shallowly-supervised discriminator). The performance drops to 21.43/0.7254.

Component       Baseline              PSNR    SSIM
Generator       ResNet                22.82   0.6991
Discriminator   Shallow supervision   21.43   0.7254
Our Method      RR-GAN                23.51   0.7566

Table 2: Ablation study on RR-GAN.

Analysis of Training Paradigm We analyze how the presence of corresponding rain/rain-free scenes in the training samples affects model training. We first randomly split the Rain100H dataset into two halves (split 1 and split 2). We use different combinations of the images for model training and then test the models on the rain images of split 2. Specifically, we train under two settings: the rain and clean images are paired and taken from split 1 (setting 1), or the clean images are randomly drawn from split 1 (setting 2). The testing performance on split 2's rain images is shown in the table below.

From the table, one can observe that the best deraining results are obtained when corresponding rain and rain-free images are used for training. However, the other setting still achieves comparable performance. This implies that it is not necessary to have paired rain and rain-free scenes during training (see setting 2).

Setting   Paired (P) / Unpaired (U)   PSNR    SSIM
1         P                           23.79   0.7951
2         U                           23.51   0.7566

Analysis of Loss Function To better demonstrate the effectiveness of our objective function, we conduct an ablation study over combinations of the proposed MSE loss L_r and the adversarial loss L_adv. Figure 5 and Table 3 show qualitative and quantitative results on a sample image, respectively. One can observe that when the MSE loss is used alone, without regularization from the adversarial loss, the quality of the recovered image is very poor, because there is not enough information for the generator to learn what a clean image looks like. Using the adversarial loss alone significantly improves the visual quality, as the information from the clean image set helps the generator learn cues for recovering the clean image; however, the tone of the recovered image is a bit yellowish because there is no consistency constraint between the recovered image and the input image. By combining the two terms, our method produces recovered images with realistic color and structure.

Figure 5: Qualitative studies on the different losses. (a) Input; (b) L_r; (c) L_adv; (d) L_r + L_adv; (e) ground truth.

Table 3: Quantitative studies on different losses.

Metric   L_r      L_adv    L_r + L_adv
PSNR     17.08    21.33    23.51
SSIM     0.3760   0.7328   0.7566

Conclusion

This paper proposed a novel RR-GAN for end-to-end single image deraining without paired training data. The proposed method includes a novel multiscale attention memory network and a novel deeply supervised multiscale discriminator. The attention memory network uses a state memory to learn a latent rain-streak mask in a recurrent fashion, aggregating the rain context information from different stages. Together with the input image, the generator generates the recovered image using unpaired rain and clean training images. The proposed deeply supervised multiscale discriminator effectively regularizes the output of the generator to look as realistic as possible compared with the clean image set, in terms of local details and global structures. Extensive experiments have demonstrated the promising performance of our method in terms of PSNR, SSIM, running time and visual quality.

Acknowledgement

This work was funded by the Deep Learning 2.0 program at I2R, A*STAR. The work of X. Peng was supported by the Fundamental Research Funds for the Central Universities under Grant YJ201748, by NSFC under Grants 61806135, 61432012 and U1435213, and by the Fund of Sichuan University Tomorrow Advancing Life.

References

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. In ICML, 2017.
B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11):5187–5198, 2016.
Yi Chang, Luxin Yan, and Sheng Zhong. Transformed low-rank model for line pattern noise removal. In ICCV, 2017.
Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. Removing rain from single images via a deep detail network. In CVPR, 2017.
Kshitiz Garg and Shree K. Nayar. Vision and rain. IJCV, 75(1):3–27, 2007.
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. Convolutional sequence to sequence learning. In ICML, 2017.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In ICML, pages 399–406, 2010.
Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
Q. Huynh-Thu and M. Ghanbari. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13):800–801, 2008.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 5967–5976, 2017.
Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. Automatic single-image-based rain streaks removal via image decomposition. IEEE TIP, 21(4):1742–1755, 2012.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pages 105–114, 2017.
Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown. Rain streak removal using layer priors. In CVPR, 2016.
Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. An all-in-one network for dehazing and beyond. CoRR, abs/1707.06543, 2017.
Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, 2018.
Yu Luo, Yong Xu, and Hui Ji. Removing rain from a single image via discriminative sparse coding. In ICCV, 2015.
Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In ICCV, 2017.
Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In ECCV, 2016.
Varun Santhaseelan and Vijayan K. Asari. Utilizing local phase information to remove rain from video. IJCV, 112(1):71–89, 2015.
Abhishek Kumar Tripathi and Sudipta Mukhopadhyay. Removal of rain from videos: a review. Signal, Image and Video Processing, 8(8):1421–1430, 2014.
Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
Zhangyang Wang, Qing Ling, and Thomas S. Huang. Learning deep l0 encoders. In AAAI, pages 2194–2200, 2016.
Zhangyang Wang, Yingzhen Yang, Shiyu Chang, Qing Ling, and Thomas S. Huang. Learning a deep l∞ encoder for hashing. In IJCAI, pages 2174–2180, 2016.
Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, and Yu-Wing Tai. Image dehazing using bilinear composition loss function. arXiv preprint arXiv:1710.00279, 2017.
Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection and removal from a single image. In CVPR, 2017.
Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV, pages 502–510, 2017.
He Zhang and Vishal M. Patel. Density-aware single image de-raining using a multi-stream dense network. In CVPR, 2018.
Xiaopeng Zhang, Hao Li, Yingyi Qi, Wee Kheng Leow, and Teck Khim Ng. Rain removal in video by combining temporal and chromatic properties. In ICME, 2006.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, and Dimitris N. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
He Zhang, Vishwanath Sindagi, and Vishal M. Patel. Image de-raining using a conditional generative adversarial network. CoRR, abs/1701.05957, 2017.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. In ICLR, 2017.
Joey Tianyi Zhou, Kai Di, Jiawei Du, Xi Peng, Hao Yang, Sinno Jialin Pan, Ivor Tsang, Yong Liu, Zheng Qin, and Rick Siow Mong Goh. SC2Net: Sparse LSTMs for sparse coding. In AAAI, 2018.
Hongyuan Zhu, Shijian Lu, Jianfei Cai, and Guangqing Lee. Diagnosing state-of-the-art object proposal methods. In BMVC, pages 11.1–11.12, 2015.
Hongyuan Zhu, Jiangbo Lu, Jianfei Cai, Jianmin Zheng, Shijian Lu, and Nadia Magnenat-Thalmann. Multiple human identification and cosegmentation: A human-oriented CRF approach with poselets. IEEE Transactions on Multimedia, 18(8):1516–1530, 2016.
Hongyuan Zhu, Jean-Baptiste Weibel, and Shijian Lu. Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In CVPR, pages 2969–2976, 2016.
Hongyuan Zhu, Romain Vial, and Shijian Lu. TORNADO: A spatio-temporal convolutional regression network for video action proposal. In ICCV, pages 5814–5822, 2017.
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
H. Zhu, R. Vial, S. Lu, X. Peng, H. Fu, Y. Tian, and X. Cao. YoTube: Searching action proposal via recurrent and static regression networks. IEEE Transactions on Image Processing, 27(6):2609–2622, 2018.
Hongyuan Zhu, Xi Peng, Vijay Chandrasekhar, Liyuan Li, and Joo-Hwee Lim. DehazeGAN: When image dehazing meets differential programming. In IJCAI, pages 1234–1240, 2018.