
Going Deep in Medical Image Analysis: Concepts, Methods, Challenges and Future Directions

Fouzia Altaf, Syed M. S. Islam, Naveed Akhtar, Naeem K. Janjua

Abstract—Medical Image Analysis is currently experiencing a paradigm shift due to Deep Learning. This technology has recently attracted so much interest of the Medical Imaging community that it led to a specialized conference, 'Medical Imaging with Deep Learning', in the year 2018. This article surveys the recent developments in this direction, and provides a critical review of the related major aspects. We organize the reviewed literature according to the underlying Pattern Recognition tasks, and further sub-categorize it following a taxonomy based on human anatomy. This article does not assume prior knowledge of Deep Learning and makes a significant contribution in explaining the core Deep Learning concepts to the non-experts in the Medical community. Unique to this study is the Computer Vision/Machine Learning perspective taken on the advances of Deep Learning in Medical Imaging. This enables us to single out 'lack of appropriately annotated large-scale datasets' as the core challenge (among other challenges) in this research direction. We draw on the insights from the sister research fields of Computer Vision, Pattern Recognition and Machine Learning, where the techniques of dealing with such challenges have already matured, to provide promising directions for the Medical Imaging community to fully harness Deep Learning in the future.

Index Terms—Deep Learning, Medical Imaging, Artificial Neural Networks, Survey, Tutorial, Data sets.


1 INTRODUCTION

Deep Learning (DL) [1] is a major contributor of the contemporary rise of Artificial Intelligence in nearly all walks of life. This is a direct consequence of the recent breakthroughs resulting from its application across a wide variety of scientific fields, including Computer Vision [2], Natural Language Processing [3], Particle Physics [4], DNA analysis [5], brain circuit studies [6], and chemical structure analysis [7]. Very recently, it has also attracted a notable interest of researchers in Medical Imaging, holding great promise for the future of this field.

The DL framework allows machines to learn very complex mathematical models for data representation, that can subsequently be used to perform accurate data analysis. These models hierarchically compute non-linear and/or linear functions of the input data that is weighted by the model parameters. Treating these functions as data processing 'layers', the hierarchical use of a large number of such layers also inspires the name 'Deep' Learning. The common goal of DL methods is to iteratively learn the parameters of the computational model using a training data set such that the model gradually gets better in performing a desired task, e.g. classification, over that data under a specified metric. The computational model itself generally takes the form of an Artificial Neural Network (ANN) [8] that consists of multiple layers of neurons/perceptrons [9] - basic computational blocks, whereas its parameters (a.k.a. network weights) specify the strength of the connections between the neurons of different layers. We depict a deep neural network in Fig. 1 for illustration.

Once trained for a particular task, the DL models are also able to perform the same task accurately using a variety of previously unseen data (i.e. testing data). This strong generalization ability of DL currently makes it stand out of the other Machine Learning techniques. Learning of the parameters of a deep model is carried out with the help of the back-propagation strategy [10] that enables some form of the popular Gradient Descent technique [11], [12] to iteratively arrive at the desired parameter values. Updating the model parameters using the complete training data once is known as a single epoch of network/model training. Contemporary DL models are normally trained for hundreds of epochs before they can be deployed.

Although the origins of Deep Learning can be traced back to the 1940s [13], the sudden recent rise in its utilization for solving complex problems of the modern era results from three major phenomena. (1) Availability of large amounts of training data: With ubiquitous digitization of information in recent times, very large amounts of data are available to train complex computational models. Deep Learning has an intrinsic ability to model complex functions by simply stacking multiple layers of its basic computational blocks. Hence, it is a convenient choice for dealing with hard problems. Interestingly, this ability of deep models has been known for over a few decades now [14]. However, the bottleneck of relatively smaller training data sets had restricted the utility of Deep Learning until recently. (2) Availability of powerful computational resources: Learning complex functions over large amounts of data results in immense computational requirements. Related research communities have been able to fulfill such requirements only recently. (3) Availability of public libraries implementing DL algorithms: There is a growing recent trend in different research communities to publish the source codes on public platforms. Easy public access to DL algorithm implementations has exploded the use of this technique in many application domains.



Fig. 1: Illustration of a deep neural network: The network consists of multiple layers of neurons/perceptrons that are connected in an inter-layer fashion. The neurons compute non-linear/linear transformations of their inputs. Each feature of the input signal is weighted by the network parameters and processed by the neuron layers hierarchically. A network connection performs the weighting operation. The strengths of the connections (i.e. network parameter values) are learned using training data. The network layers not seen by the input and output signals are often termed 'hidden' layers.

The field of Medical Imaging has been exploiting Machine Learning since the 1960s [15], [16]. However, the first notable contributions that relate to modern Deep Learning techniques appeared in the Medical Imaging literature in the 1990s [17], [18], [19], [20], [21]. The relatedness of these methods to contemporary DL comes in the form of using ANNs to accomplish Medical Imaging tasks. Nevertheless, restricted by the amount of training data and computational resources, these works trained networks that were only two to three layers deep. This is no longer considered 'deep' in the modern era. The number of layers in the contemporary DL models generally ranges from a dozen to over one hundred [22]. In the context of image analysis, such models have mostly originated in the Computer Vision literature [2].

The field of Computer Vision closely relates to Medical Imaging in analyzing digital images. Medical Imaging has a long tradition of profiting from the findings in Computer Vision. In 2012, DL provided a major breakthrough in Computer Vision by performing a very hard image classification task with remarkable accuracy [23]. Since then, the Computer Vision community has gradually shifted its main focus to DL. Consequently, Medical Imaging literature also started witnessing methods exploiting deep neural networks in around 2013, and now such methods are appearing at an ever increasing rate. Sahiner et al. [24] noted that the peer-reviewed publications that employed DL for radiological images tripled from 2016 (∼100) to 2017 (∼300), whereas the first quarter of 2018 alone saw over 100 such publications. Similarly, the mainstream Medical Imaging conference, i.e. the International Conference on 'Medical Image Computing and Computer Assisted Intervention' (MICCAI), published over 120 papers in its main proceedings in 2018 that employed Deep Learning for Medical Image Analysis tasks.

The large inflow of contributions exploiting DL in Medical Imaging has also given rise to a specialized venue in the form of the International Conference on 'Medical Imaging with Deep Learning' (MIDL) in 2018¹, which published 82 contributions in this direction. Among other published papers in 2018, our literature review also includes notable contributions from MICCAI and MIDL 2018.

1. https://midl.amsterdam/

We note that a few review articles [24], [25], [26] for closely related research directions also exist at the time of this publication. Whereas those articles collectively provide a comprehensive overview of the methods exploiting deep neural networks for medical tasks until the year 2017, none of them sees the existing Medical Imaging literature through the lens of Computer Vision and Machine Learning. Consequently, they also fall short in elaborating on the root causes of the challenges faced by Deep Learning in Medical Imaging. Moreover, limited by their narrower perspective, they also do not provide insights into leveraging the findings in other fields for addressing these challenges.

In this paper, we provide a comprehensive review of the recent DL techniques in Medical Imaging, focusing mainly on the very recent methods published in 2018. We categorize these techniques under different pattern recognition tasks and further sub-categorize them following a taxonomy based on human anatomy. Analyzing the reviewed literature, we establish 'lack of appropriately annotated large-scale datasets' for Medical Imaging tasks as the fundamental challenge (among other challenges) in fully exploiting Deep Learning for those tasks. We then draw on the literature of Computer Vision, Pattern Recognition and Machine Learning in general to provide guidelines for dealing with this and other challenges in Medical Image Analysis using Deep Learning. This review also touches upon the available public datasets to train DL models for medical imaging tasks. Considering the lack of in-depth comprehension of the Deep Learning framework by the broader Medical community, this article also provides an understanding of the core technical concepts related to DL at an appropriate level. This is an intentional contribution of this work.

The remaining article is organized as follows. In Section 2, we present the core Deep Learning concepts for the Medical community in an intuitive manner. The main body of the reviewed literature is presented in Section 3. We touch upon the public data repositories for Medical Imaging in Section 4. In Section 5, we highlight the major challenges faced by Deep Learning in Medical Image Analysis. Recommendations for dealing with these challenges are discussed in Section 6 as future directions. The article concludes in Section 7.


2 BACKGROUND CONCEPTS

In this Section, we first briefly introduce the broader types of Machine Learning techniques and then focus on the Deep Learning framework. Machine Learning methods are broadly categorized as supervised or unsupervised based on the training data used to learn the computational models. Deep Learning based methods can fall in any of these categories.

2.0.0.1 Supervised Learning: In supervised learning, it is assumed that the training data is available in the form of pairs (x, y), where x ∈ R^m is a training example, and y is its label. The training examples generally belong to different, say C, classes of data. In that case, y is often represented as a binary vector living in R^C, such that its cth coefficient is '1' if x belongs to the cth class, whereas all other coefficients are zero. A typical task for supervised learning is to find a computational model M with the help of training data such that it is also able to correctly predict the labels of data samples that it had not seen previously during training. The unseen data samples are termed test/testing samples in the Machine Learning parlance. To learn a model that can perform successful classification of the test samples, we can formulate our learning problem as estimation of the parameters Θ of our model that minimize a specific loss L(y, ŷ), where ŷ is the label vector predicted by the model for a given test sample. As can be guessed, the loss is defined such that it has a small value only if the learned parametric model is able to predict the correct label of the data sample. Whereas the scope of the model loss is limited to only a single data sample, we define a cost for the complete training data. The cost of a model is simply the expected value of the losses computed for the individual data samples. Deep Learning allows us to learn model parameters Θ that are able to achieve very low cost over very large data sets.
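To make the loss/cost distinction concrete, the following illustrative Python sketch (the function and variable names are ours, not taken from any particular library) computes a cross-entropy loss for a single one-hot labeled sample and the cost as the mean loss over a toy training set:

```python
import numpy as np

def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Loss L(y, y_hat) for a single sample: y is a one-hot label vector
    in R^C, y_hat is the model's predicted probability vector."""
    return -np.sum(y * np.log(y_hat + eps))

def cost(Y, Y_hat):
    """Cost over the whole training set: the expected (here, mean)
    value of the per-sample losses."""
    return np.mean([cross_entropy_loss(y, y_hat) for y, y_hat in zip(Y, Y_hat)])

# Toy example with C = 3 classes and 2 samples.
Y = np.array([[1, 0, 0], [0, 0, 1]])                   # one-hot labels
Y_hat = np.array([[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]])   # model predictions
print(cost(Y, Y_hat))
```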

Whereas classification generally aims at learning computational models that map input signals to discrete output values (i.e. class labels), it is also possible to learn models that map training examples to continuous output values. In that case, y is typically a real-valued scalar or vector. An example of such a task is to learn a model that can predict the probability of a tumor being benign or malignant. In Machine Learning, such a task is seen as a regression problem. Similar to the classification problems, Deep Learning has been able to demonstrate excellent performance in learning computational models for regression problems.

2.0.0.2 Unsupervised Learning: Whereas supervised learning assumes that the training data also provides labels of the samples, unsupervised learning assumes that sample labels are not available. In that case, the typical task of a computational model is to cluster the data samples into different groups based on the similarities of their intrinsic characteristics - e.g. clustering pixels of color images based on their RGB values. Similar to the supervised learning tasks, models for unsupervised learning tasks can also take advantage of minimizing a loss function. In the context of Deep Learning, this loss function is normally designed such that the model learns an accurate mapping of an input signal to itself. Once the mapping is learned, the model is used to compute compact representations of data samples that cluster well. The Deep Learning framework has also been found very effective for unsupervised learning.

Along with supervised and unsupervised learning, other machine learning types include semi-supervised learning and reinforcement learning. Informally, semi-supervised learning computes models using training data that provides labels only for a smaller subset of the samples. On the other hand, reinforcement learning provides 'a kind of' supervision for the learning problem in terms of rewards or punishments to the algorithm. Due to their remote relevance to the tasks in Medical Imaging, we do not provide further discussion on these categories. Interested readers are directed to [27] for semi-supervised learning, and to [28] for reinforcement learning.

2.1 Standard Artificial Neural Networks

An Artificial Neural Network (ANN) is a hierarchical composition of basic computational elements known as neurons (or perceptrons [9]). Multiple neurons exist at a single level of the hierarchy, forming a single layer of the network. Using many layers in an ANN makes it deep, see Fig. 1. A neuron performs the following simple computation:

a = f(wᵀx + b), (1)

where x ∈ R^m is the input signal, w ∈ R^m contains the neuron's weights, and b ∈ R is a bias term. The symbol f(.) denotes an activation function, and the computed a ∈ R is the neuron's activation signal or simply its activation. Generally, f(.) is kept non-linear to allow an ANN to induce complex non-linear computational models. The classic choices for f(.) are the well-known 'sigmoid' and 'hyperbolic tangent' functions. We depict a single neuron/perceptron in Fig. 2.

A neural network must learn the weight and bias terms in Eq. (1). The strategy used to learn these parameters (i.e. back-propagation [10]) requires f(.) to be a differentiable function of its inputs. In the modern Deep Learning era, the Rectified Linear Unit (ReLU) [23] is widely used for this function, especially for non-standard ANNs such as CNNs (see Section 2.2). ReLU is defined as a = max(0, wᵀx + b). Leaving aside details beyond the scope of this discussion, ReLU allows more efficient and generally more effective learning of complex models as compared to the classic sigmoid and hyperbolic tangent activation functions.
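As an illustration of Eq. (1), the following minimal Python/NumPy sketch (variable names are hypothetical) computes the activation of a single neuron with sigmoid and ReLU choices for f(.):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b, f=relu):
    """Single neuron of Eq. (1): a = f(w^T x + b)."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # input signal in R^m
w = np.array([0.1, 0.4, -0.2])   # neuron weights
b = 0.05                          # bias term
print(neuron(x, w, b, f=sigmoid), neuron(x, w, b, f=relu))
```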

It is possible to compactly represent the weights associated with all the neurons in a single layer of an ANN as a matrix W ∈ R^{p×m}, where 'p' is the total number of neurons in that layer. This allows us to compute the activations of all the neurons in the layer at once as follows:

a = f(Wx + b), (2)

where a ∈ R^p now stores the activation values of all the neurons in the layer under consideration. Noting the hierarchical nature of ANNs, it is easy to see that the functional form of a model induced by an L-layer network can be given as:

M(x, Θ) = fL(WL fL−1(WL−1 fL−2( ... (W1x + b1) ... ) + bL−1) + bL), (3)

where the subscripts denote the layer numbers. We collectively denote the parameters Wi, bi ∀i ∈ {1, ..., L} as Θ.
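The hierarchical computation of Eqs. (2) and (3) can be sketched in a few lines of NumPy; the parameter layout below (a list of (W, b) pairs standing in for Θ) and the toy layer sizes are only illustrative choices:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b, f=relu):
    """One layer of Eq. (2): a = f(Wx + b), with W in R^{p x m}."""
    return f(W @ x + b)

def mlp_forward(x, params, f=relu):
    """L-layer model of Eq. (3): params = [(W_1, b_1), ..., (W_L, b_L)]."""
    a = x
    for W, b in params:
        a = layer(a, W, b, f)
    return a

# Toy 3-layer network: 4 -> 5 -> 3 -> 2 with random parameters.
rng = np.random.default_rng(0)
dims = [4, 5, 3, 2]
Theta = [(rng.standard_normal((p, m)), rng.standard_normal(p))
         for m, p in zip(dims[:-1], dims[1:])]
print(mlp_forward(rng.standard_normal(4), Theta))
```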

A neural network model can be composed by using different numbers of layers, different numbers of neurons in each layer, and even different activation functions for different layers.


Fig. 2: Illustration of a single neuron/perceptron in a standard ANN. Each feature/coefficient 'xi ∀i ∈ {1, ..., m}' of the input signal 'x' gets weighted by a corresponding weight 'wi'. A bias term 'b' is added to the weighted sum 'wᵀx' and a non-linear/linear function f(.) is applied to compute the activation 'a'.

Combined, these choices determine the architecture of a neural network. The design variables of the architecture and those of the learning algorithm are termed the hyper-parameters of the network. Whereas the model parameters (i.e. Θ) are learned automatically, finding the most suitable values of the hyper-parameters is usually a manual iterative process. Standard ANNs are also commonly known as Multi-Layer Perceptrons (MLPs), as their layers are generally composed of standard neurons/perceptrons. One notable exception to this layer composition is encountered at the very last layer, i.e. the softmax layer used in classification. In contrast to 'independent' activation computation by each neuron in a standard perceptron layer, softmax neurons compute activations that are normalized across all the activation values of that layer. Mathematically, the ith neuron of a softmax layer computes the activation value as:

ai = exp(wiᵀaL−1 + bi) / ∑_{j=1}^{p} exp(wjᵀaL−1 + bj), (4)

The benefit of normalizing the activation signal is that the output of the softmax layer can be interpreted as a probability vector that encodes the confidence of the network that a given sample belongs to a particular class. This interpretation of softmax layer outputs is a widely used concept in the related literature.
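A minimal NumPy sketch of the softmax computation in Eq. (4) is given below; subtracting the maximum before exponentiation is a standard numerical-stability trick, and the layer sizes are arbitrary:

```python
import numpy as np

def softmax_layer(a_prev, W, b):
    """Softmax layer of Eq. (4): activations normalized across the layer."""
    z = W @ a_prev + b
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / np.sum(e)

rng = np.random.default_rng(0)
a_prev = rng.standard_normal(8)                     # activations of layer L-1
W, b = rng.standard_normal((3, 8)), rng.standard_normal(3)
p = softmax_layer(a_prev, W, b)
print(p, p.sum())                                   # probability vector summing to 1
```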

2.2 Convolutional Neural Networks

In the context of DL techniques for image analysis, Convolutional Neural Networks (CNNs) [23], [29] are of primary importance. Similar to the standard ANNs, CNNs consist of multiple layers. However, instead of simple perceptron layers, we encounter three different kinds of layers in these networks: (a) Convolutional layers, (b) Pooling layers, and (c) Fully connected layers (often termed fc-layers). We describe these layers below, focusing mainly on the Convolutional layers that are the main source of strength for CNNs.

Fig. 3: Illustration of the convolution operation for 2D-grids (left) and 3D-volumes (right). Complete steps of the moving window are shown for the 2D case whereas only step-1 is shown for 3D. In 3D, the convolved channels are combined by simple addition, still resulting in 2D feature maps.

2.2.0.1 Convolutional layers: The aim of convolutional layers is to learn weights of the so-called² convolutional kernels/filters that can perform convolution operations on images. Traditional image analysis has a long history of using such filters to highlight/extract different image features, e.g. the Sobel filter for detecting edges in images [30]. However, before CNNs, these filters needed to be designed by manually setting the weights of the kernel in a careful manner. The breakthrough that CNNs provided is in the automatic learning of these weights under the neural network settings.

We illustrate the convolution operation in Fig. 3. In 2D settings (e.g. grey-scale images), this operation involves moving a small window (i.e. kernel) over a 2D grid (i.e. image). In each moving step, the corresponding elements of the two grids get multiplied and summed up to compute a scalar value. Concluding the operation results in another 2D-grid, referred to as the feature/activation map in the CNN literature. In 3D settings, the same steps are performed for the individual pairs of the corresponding channels of the 3D volumes, and the resulting feature maps are simply added to compute a 2D map as the final output. Since color images have multiple channels, convolutions in 3D settings are more relevant for the modern CNNs. However, for the sake of better understanding, we often discuss the relevant concepts using the 2D grids. These concepts are readily transferable to the 3D volumes.
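The sliding-window operation described above can be sketched directly in NumPy. The example below applies a hand-designed Sobel kernel purely for illustration; in a CNN the kernel entries would instead be learned:

```python
import numpy as np

def conv2d(image, kernel):
    """2D 'convolution' as used in CNNs (i.e. cross-correlation):
    slide the kernel over the image, multiply element-wise and sum."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 grey-scale image
sobel_x = np.array([[-1., 0., 1.],                  # hand-designed edge filter
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
print(conv2d(image, sobel_x))                       # 3x3 feature map
```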

A convolutional layer of a CNN forces the elements of kernels to become the network weights, see Fig. 4 for illustration. In the figure, we can directly compute the activation a1 (2D grids) using Eq. (1), where w, x ∈ R^9 are the vectors formed by arranging w1, ..., w9 and x1, ..., x9 in the figure, respectively. The figure does not show the bias term, which is generally ignored in the convolutional layers. It is easy to see that under this setting, we can make use of the same tools to learn the convolutional kernel weights that we use to learn the weights of a standard ANN.

2. Strictly speaking, the kernels in CNNs compute cross-correlations. However, they are always referred to as 'convolutional' by convention. Our subsequent explanation of the 'convolution' operation is in line with the definition of this operation used in the context of CNNs.


Fig. 4: Working of a convolutional layer. CNNs force kernel weights to become network parameters. (Left) In 2D grids, a single kernel moves over the image/input signal. (Right) A volume of multiple kernels moves over the input volume to result in an output volume.

The same concept applies to the 3D volumes, with the difference that we must use 'multiple' kernels to get a volume (instead of a 2D grid) at the output. Each feature map resulting from a kernel then acts as a separate channel of the output volume of a convolutional layer. It is a common practice in the CNN literature to simplify the illustrations in 3D settings by only showing the input and output volumes for different layers, as we do in Fig. 4.

From the perspective presented above, a convolutional layer may look very similar to a standard perceptron layer, discussed in Section 2.1. However, there are two major differences between the two. (1) Every input feature gets connected to its activation signal through the same kernel (i.e. weights). This implies that all input features share the kernel weights - called parameter sharing. Consequently, the kernels try to adjust their weights such that they resonate well with the basic building patterns of the whole input signal, e.g. edges for an image. (2) Since the same kernel connects all input features to output features/activations, convolutional layers have very few parameters to learn in the form of kernel weights. This sparsity of connections allows very efficient learning despite the high dimensionality of data - a feat not possible with standard densely connected perceptron layers.

2.2.0.2 Pooling layers: The main objective of a pooling layer is to reduce the width and height of the activation maps in CNNs. The basic concept is to compute a single output value 'v' for a small np × np grid in the activation map, where 'v' is simply the maximum or average value of that grid in the activation map. Based on the operation used, this layer is often referred to as a max-pooling or average-pooling layer. Interestingly, there are no learnable parameters associated with a pooling layer. Hence, this layer is sometimes seen as a part of the Convolutional layer. For instance, the popular VGG-16 network [31] does not count the pooling layer as a separate layer, hence the name VGG-16. On the other hand, other works, e.g. [32], that use VGG-16 often count more than 16 layers in this network by treating the pooling layer as a regular network layer.
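The following short NumPy sketch illustrates non-overlapping max-pooling on a toy activation map (the grid size np = 2 is an arbitrary choice):

```python
import numpy as np

def max_pool(feature_map, n_p=2):
    """Non-overlapping n_p x n_p max-pooling: each grid cell of the
    activation map is replaced by its maximum value."""
    H, W = feature_map.shape
    H, W = H - H % n_p, W - W % n_p              # crop to a multiple of n_p
    fm = feature_map[:H, :W].reshape(H // n_p, n_p, W // n_p, n_p)
    return fm.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)    # toy activation map
print(max_pool(fm))                              # 2x2 output; halves width and height
```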

2.2.0.3 Fully connected layers: These layers are the same as the perceptron layers encountered in the standard ANNs.

Fig. 5: Illustration of an unfolded RNN [1]. A state s is shown with a single neuron for simplicity. At the tth time stamp, the network updates its state st based on the input xt and the previous state. It optionally outputs a signal ot. U, V and W are network parameters.

The use of multiple Convolutional and Pooling layers in CNNs gradually reduces the size of the resulting activation maps. Finally, the activation maps from a deeper layer are re-arranged into a vector which is then fed to the fully connected (fc) layers. It is common knowledge now that the activation vectors of fc-layers often serve as very good compact representations of the input signals (e.g. images).

Other than the above mentioned three layers, Batch Normalization (BN) [33] is another layer that is nowadays encountered more often in CNNs than in the standard ANNs. The main objective of this layer (with learnable parameters) is to control the mean and variance of the activation values of different network layers such that the induction of the overall model becomes more efficient. This idea is inspired by the long known fact that induction of ANN models generally becomes easier if the inputs are normalized to have zero mean and unit variance. The BN layer essentially applies a similar principle to the activations of deep neural networks.
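A simplified training-time sketch of the BN computation is shown below; the learnable scale and shift parameters (commonly denoted gamma and beta) are set to ones and zeros here purely for illustration:

```python
import numpy as np

def batch_norm(activations, gamma, beta, eps=1e-5):
    """Training-time Batch Normalization sketch: normalize each feature
    across the mini-batch to zero mean / unit variance, then apply the
    learnable scale (gamma) and shift (beta) parameters."""
    mu = activations.mean(axis=0)
    var = activations.var(axis=0)
    normalized = (activations - mu) / np.sqrt(var + eps)
    return gamma * normalized + beta

batch = np.random.default_rng(0).standard_normal((32, 10))  # 32 samples, 10 features
out = batch_norm(batch, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))   # ~0 mean, ~1 std
```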

2.3 Recurrent Neural Networks

Standard neural networks assume that input signals are independent of each other. However, often this is not the case. For instance, a word appearing in a sentence generally depends on the sequence of the words preceding it. Recurrent Neural Networks (RNNs) are designed to model such sequences. An RNN can be thought to maintain a 'memory' of the sequence with the help of its internal states. In Fig. 5, we show a typical RNN that is unfolded - the complete network is shown for the sequence. If the RNN has three layers, it can model e.g. sentences that are three words long. In the figure, xt is the input at the tth time stamp. For instance, xt can be some quantitative representation of the tth word in a sentence. The memory of the network is maintained by the state st that is computed as:

st = f(Uxt + Wst−1), (5)

where f(.) is typically a non-linear activation function, e.g. ReLU. The output at a given time stamp, ot, is a function of a weighted version of the network state at that time. For instance, predicting the probability of the next word in a sentence can assume the output form ot = softmax(V st), where 'softmax' is the same operation discussed in Section 2.1.

One aspect to notice in the above equations is that we use the same weight matrices U, V, W at all time stamps.


Thus, we are recursively performing the same operations over an input sequence at multiple time stamps. This fact also inspires the name 'Recurrent' NN. It also has the significant implication that for an RNN we need a special kind of back-propagation algorithm, known as back-propagation through time (BPTT). As compared to the regular back-propagation, BPTT must propagate the error recursively back to the previous time stamps. This becomes problematic for long sequences that involve too many time stamps. A phenomenon known as vanishing/exploding gradients is the root cause of this problem. This has led RNN researchers to focus on designing networks that can handle longer sequences. The Long Short-Term Memory (LSTM) network [34] is currently a popular type of RNN that is found to be reasonably effective for dealing with long sequences.

LSTM networks have the same fundamental architecture as an RNN, however their hidden states are computed differently. The hidden units are commonly known as cells in the context of LSTMs. Informally, a cell takes in the previous state and the input at a given time stamp and decides what to remember and what to erase from its memory. The previous state, the current input and the memory are then combined for the next time stamp.
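A toy NumPy sketch of the unfolded computation of Eq. (5) is given below; tanh is used as the activation and the parameters, dimensions and sequence length are arbitrary, chosen purely for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, U, W, V, s0):
    """Unfolded vanilla RNN of Eq. (5): s_t = f(U x_t + W s_{t-1}),
    with an output o_t = softmax(V s_t) at every time stamp.
    The same U, W, V are reused at all time stamps."""
    s, outputs = s0, []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
        outputs.append(softmax(V @ s))
    return outputs, s

rng = np.random.default_rng(0)
d_in, d_state, d_out, T = 4, 6, 3, 5          # toy dimensions and sequence length
U = rng.standard_normal((d_state, d_in))
W = rng.standard_normal((d_state, d_state))
V = rng.standard_normal((d_out, d_state))
xs = rng.standard_normal((T, d_in))            # e.g. word representations
outs, final_state = rnn_forward(xs, U, W, V, np.zeros(d_state))
print(len(outs), outs[-1])
```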

2.4 Using Neural Networks for Unsupervised Learning

Whereas we assumed the availability of a label for each data sample while discussing the basic concepts of neural networks in the preceding subsections, those concepts can also be readily applied to construct neural networks that model data without labels. Here, we briefly discuss the mainstream frameworks that allow one to do so. It is noteworthy that this article intentionally does not present neural networks as 'supervised vs unsupervised'. This is because the core concepts of neural networks are generally better understood in supervised settings. Unsupervised use of neural networks simply requires employing the same ideas under different overall frameworks.

2.4.1 Autoencoders

The main idea behind autoencoders is to map an input signal (e.g. image, feature vector) to itself using a neural network. In this process, we aim to learn a latent representation of the data that is more powerful for a particular task than the raw data itself. For instance, the learned representation could cluster better than the original data. Theoretically, it is possible to use in autoencoders any kind of network layers that are used for supervised neural networks. The uniqueness of autoencoders comes in the output layer, where the target signal is the same as the input signal to the network instead of e.g. a label vector in a classification task.

Mapping a signal to itself can result in trivial models (learning the identity mapping). Several techniques have been adopted in the literature to preclude this possibility, leading to different kinds of autoencoders. For instance, undercomplete autoencoders ensure that the dimensionality of the latent representation is much smaller than the data dimension. In MLP settings, this can be done by using a small number (as compared to the input signal's dimension) of neurons in a hidden layer of the network, and using the activations of that layer as the latent representation.

Regularized autoencoders also impose sparsity on neuron connections [35] or reconstruction of the original signal from its noisy version [36] to ensure learning of a useful latent representation instead of the identity mapping. Variational autoencoders [37] and contractive autoencoders [38] are also among the other popular types of autoencoders.
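As an illustrative sketch (assuming a TensorFlow/Keras installation and stand-in random data; the layer sizes are hypothetical), the following code builds an undercomplete autoencoder whose training target is the input itself, and exposes the hidden-layer activations as the latent representation:

```python
import numpy as np
from tensorflow import keras

# Hypothetical sizes: 784-dimensional inputs, 32-dimensional latent code.
input_dim, latent_dim = 784, 32

inputs = keras.Input(shape=(input_dim,))
code = keras.layers.Dense(latent_dim, activation="relu")(inputs)      # encoder
outputs = keras.layers.Dense(input_dim, activation="sigmoid")(code)   # decoder

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)        # exposes the latent representation
autoencoder.compile(optimizer="adam", loss="mse")   # target = the input itself

X = np.random.rand(256, input_dim).astype("float32")  # stand-in, unlabeled data
autoencoder.fit(X, X, epochs=2, batch_size=32, verbose=0)
latent = encoder.predict(X, verbose=0)     # compact codes, e.g. for clustering
print(latent.shape)
```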

2.4.2 Generative Adversarial Networks

Recent years have seen an extensive use of Generative Adversarial Networks (GANs) [39] in natural image analysis. GANs can be considered a variation of autoencoders that aims at mimicking the distribution generating the data. GANs are composed of two parts that are neural networks. The first part, termed the generator, has the ability to generate a sample, whereas the other, called the discriminator, can classify the sample as real or fake. Here, a 'real' sample means that it is actually coming from the training data. The two networks essentially play a game where the generator tries to fool the discriminator by generating more and more realistic samples. In the process, the generator keeps updating its parameters to produce better samples. The adversarial objective of the generator to fool the discriminator also inspires the name of GANs. In natural image analysis, GANs have been successfully applied to many tasks, e.g. inducing realism in synthetic images [40], domain adaptation [41] and data completion [42]. Such successful applications of GANs to image processing tasks also open new directions for medical image analysis tasks.

2.5 Best practices in using CNNs for image analysis

Convolutional Neural Networks (CNNs) form the backbone of the recent breakthroughs in image analysis. To solve different problems in this area, CNN based models are normally used in three different ways. (1) A network architecture is chosen and trained from scratch using the available training dataset in an end-to-end manner. (2) A CNN model pre-trained on some large-scale dataset is fine-tuned by further training the model for a few epochs using the data available for the problem at hand. This approach is more suitable when limited training data is available for the problem under consideration. It is often termed transfer learning in the literature. (3) A model is used as a feature extractor for the available images. In this case, training/testing images are passed through the network and the activations of a specific layer (or a combination of layers) are considered as image features. Further analysis is performed using those features.
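The sketch below illustrates options (2) and (3) under the assumption that TensorFlow/Keras and its ImageNet-pretrained VGG-16 weights are available; the two-class head and the stand-in images are hypothetical:

```python
import numpy as np
from tensorflow import keras

# (2) Transfer learning: start from an ImageNet-pretrained VGG-16 and
# fine-tune a new classification head on the (small) medical dataset.
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                pooling="avg", input_shape=(224, 224, 3))
base.trainable = False                       # freeze the pretrained kernels
head = keras.layers.Dense(2, activation="softmax")(base.output)  # e.g. benign/malignant
model = keras.Model(base.input, head)
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(train_images, train_labels, epochs=5)   # few epochs of fine-tuning

# (3) Feature extraction: treat the pretrained network as a fixed encoder
# and use its activations as image features for further analysis.
images = np.random.rand(4, 224, 224, 3).astype("float32")   # stand-in images
features = base.predict(keras.applications.vgg16.preprocess_input(images * 255),
                        verbose=0)
print(features.shape)                        # (4, 512) feature vectors
```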

The Computer Vision literature provides extensive studies that reflect on the best practices of exploiting CNNs in any of the aforementioned three manners. We can summarize the crux of these practices as follows. One should only consider training a model from scratch if the available training data size is very large, e.g. 50K images or more. If this is not the case, use transfer learning. If the training data is even smaller, e.g. a few hundred images, it may be better to use the CNN only as a feature extractor. No matter which approach is adopted, it is better that the underlying CNN is inspired by a model that has already proven its effectiveness for a similar task. This is especially true for the 'training from scratch' approach. We refer to the most successful recent CNN models in the Computer Vision literature in the paragraphs to follow.


For transfer learning, it is better to use a model that is pre-trained on data/problems as similar as possible to the data/problem at hand. In the case of using a CNN as a feature extractor, one should prefer a network with more representation power. Normally, deeper networks that are trained on very large datasets have this property. Due to their discriminative abilities, features extracted from such models are especially useful for classification tasks.

Starting from AlexNet in 2012 [23], many complex CNN models have been developed in the last seven years. Whereas still useful, AlexNet is no longer considered a state-of-the-art network. A network still applied frequently is VGG-16 [31], proposed in 2014 by the Visual Geometry Group (VGG) of Oxford University. A later version of VGG-16 is VGG-19, which uses 19 instead of 16 layers of learnable parameters. Normally, the representation power of both versions is considered similar. Another popular network is GoogLeNet [43], which is also commonly known as the 'Inception' network. This network uses a unique type of layer called the inception layer/block from which it derives its main strength. To date, four different versions of Inception [44], [45] have been introduced by the original authors, with each subsequent version having slightly better representation power (under a certain perspective) than its predecessor. ResNet [22] is another popular network that enables deep learning with models having more than a hundred layers. It is based on a concept known as 'residual learning', which is currently highly favored by the Pattern Recognition community because it enables very deep networks. DenseNet [46] also exploits the insights of residual learning to achieve representation power similar to ResNet, but with a more compact network.

Whereas the above-mentioned CNNs are mainly trained for image classification tasks, Fully Convolutional Networks (FCNs) [47] and U-Net [48] are among the most popular networks for the task of image segmentation. Analyzing the architectures and hyperparameter settings of these networks can often reveal useful insights for developing new networks. In fact, some of these networks (e.g. Inception-v4/ResNet [45]) already rely on the insights from others (e.g. ResNet [22]). The same practice can yield popular networks in the future as well. We draw further on the best practices of using CNNs for image analysis in Section 6.

2.6 Deep Learning Programming Frameworks

The rise of Deep Learning has been partially enabled by public access to programming frameworks that implement the core techniques in this area in high-level programming languages. Currently, many of these frameworks are being continuously maintained by software developers and most of the new findings are incorporated in them rapidly. Whereas the availability of an appropriate Graphical Processing Unit (GPU) is desired to fully exploit these modern frameworks, CPU support is also provided by most of them to train and test small models. The frameworks allow their users to directly test different network architectures and their hyperparameter settings etc. without the need of actually implementing the operations performed by the layers and the algorithms that train them. The layers and related algorithms come pre-implemented in the libraries of the frameworks.

Below, we list the popular Deep Learning frameworks in use nowadays. We order them based on their current popularity in the Computer Vision and Pattern Recognition community for the problems of image analysis, starting from the most popular.

• Tensorflow [49], originally developed by Google Brain, is fast becoming the most popular deep learning framework due to its continuous development. It provides Python and C++ interfaces.

• PyTorch [50] is a Python based library supported by Facebook's AI Research. It is currently receiving significant attention due to its ability to implement dynamic graphs.

• Caffe2 [51] builds on Caffe (see below) and provides C++ and Python interfaces.

• Keras [52] can be seen as a high level programming interface that can run on top of Tensorflow and Theano [53]. Although not as flexible as other frameworks, Keras is particularly popular for quickly developing and testing networks using common network layers and algorithms. It is often seen as a gateway to deep learning for new users.

• MatConvNet [54] is the most commonly used public deep learning library for Matlab.

• Caffe [51] was originally developed by UC Berkeley, providing C++ and Python interfaces. Whereas Caffe2 is now fast replacing Caffe, this framework is still in use because public implementations of many popular networks are available in Caffe.

• Theano [53] is a Python library for implementing deep learning techniques, developed and supported by MILA, University of Montreal.

• Torch [55] is a library and scripting language based on Lua. Due to its early release in 2011, many first-generation deep learning models were implemented using Torch.

The above is not an exhaustive list of frameworks for Deep Learning. However, it covers those frameworks that are currently widely used in image analysis. It should be noted that, whereas we order the above list in terms of the 'current' trend in popularity of the frameworks, public implementations of many networks proposed in 2012 - 2015 are originally found in e.g. Torch, Theano or Caffe. This also makes those frameworks equally important. However, it is often possible to find public implementations of legacy networks in e.g. Tensorflow, albeit not by the original authors.
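As a small illustration of how such frameworks are used (assuming a TensorFlow/Keras installation; the architecture and input size are arbitrary), a CNN can be specified and compiled in a few lines, with all layers and the training algorithm provided by the library:

```python
from tensorflow import keras

# All layers and the training algorithm come pre-implemented; the user
# only specifies the architecture and its hyper-parameters.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),                   # e.g. 64x64 grey-scale patches
    keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation="softmax"),      # e.g. normal vs. abnormal
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_patches, train_labels, epochs=10)   # training loop is provided
```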

3 DEEP LEARNING METHODS IN MEDICAL IMAGE ANALYSIS

In this Section, we review the recent contributions in Medical Image Analysis that exploit the Deep Learning technology. We mainly focus on the research papers published after December 2017, while briefly mentioning the more influential contributions from the earlier years. For a comprehensive review of the literature earlier than the year 2018, we collectively recommend the articles [24], [25], [26].


Taking more of a Computer Vision/Machine Learning perspective, we first categorize the existing literature under 'Pattern Recognition' tasks. The literature pertaining to each task is then further sub-categorized based on the human anatomical regions. The taxonomy of our literature review is depicted in Fig. 6.

3.1 Detection/Localization

The main aim of detection is to identify a particular region of interest in an image and draw a bounding box around it, e.g. a brain tumor in MRI scans. Hence, localization is another term used for the detection task. In Medical Image Analysis, detection is more commonly referred to as Computer Aided Detection (CAD). CAD systems are aimed at detecting the earliest signs of abnormality in patients. Lung and breast cancer detection can be considered common applications of CAD.

3.1.1 Brain

For the anatomic region of the brain, Jyoti et al. [56] employed a CNN for the detection of Alzheimer's Disease (AD) using the MRI images of the OASIS data set [57]. The authors built on two baseline CNN networks, namely Inception-v4 [45] and ResNet [22], to categorize four classes of AD. These classes include moderate, mild, very mild and non-demented patients. The accuracies reported by the authors for these classes are 33%, 62%, 75%, and 99%, respectively. It is claimed that the proposed method does not only perform well on the used dataset, but also has the potential to generalize to the ADNI dataset [58]. Chen et al. [59] proposed an unsupervised learning approach using an Auto-Encoder (AE). The authors investigated lesion detection using a Variational Auto-Encoder (VAE) [37] and an Adversarial Auto-Encoder (AAE) [60]. The analysis is carried out on the BRATS 2015 dataset, demonstrating good results for the Area Under the Curve (AUC) metric.

Alaverdyan et al. [61] used a deep neural network for epilepsy lesion detection in multiparametric MRI images. They stacked convolution layers in an auto-encoder fashion and trained their network using patches of the original images. Their model was trained using the data from 75 healthy subjects in an unsupervised manner. For automated brain tumor detection in MR images, Panda et al. [62] used a discriminative clustering method to segregate the vital regions of the brain such as Cerebro Spinal Fluid (CSF), White Matter (WM) and Gray Matter (GM). In another study of automatic detection in MR images [63], Laukamp et al. used a multi-parametric deep learning model for the detection of meningiomas in the brain.

3.1.2 Breast

In assessing cancer spread, histopathological analysis of Sentinel Lymph Nodes (SLNs) becomes important for the task of cancer staging. Bejnordi et al. [64] analyzed deep learning techniques for metastases detection in hematoxylin and eosin stained tissue sections of lymph nodes of subjects with cancer. The computational results are compared with human pathologist diagnoses. Interestingly, out of the 32 techniques analysed, the top 5 deep learning algorithms arguably out-performed eleven pathologists.

Chiang et al. [65] developed a CAD technique based on a 3D CNN for breast cancer detection using the Automated whole Breast Ultrasound (ABUS) imaging modality. In their approach, they first extracted Volumes of Interest (VOIs) through a sliding window technique, then a 3D CNN was applied and tumor candidates were selected based on the probability resulting from the application of the 3D CNN to the VOIs. In the experiments, 171 tumors were used for testing, achieving sensitivities of up to 95%. Dalmics et al. [66] proposed a CNN based CAD system to detect breast cancer in MRI images. They used 365 MRI scans for training and testing, out of which 161 were malignant lesions. They claimed the sensitivity achieved by their technique to be better than that of the existing CAD systems. For the detection of breast masses in mammography images, Zhang et al. [67] developed a Fully Convolutional Network (FCN) based end-to-end heatmap regression technique. They demonstrated that mammography data could be used for digital breast tomosynthesis (DBT) to improve the detection model. They used transfer learning by fine-tuning an FCN model on mammography images. The approach is tested on tomosynthesis data of 40 subjects, demonstrating better performance as compared to a model trained from scratch on the same data.

3.1.3 Eye

For the anatomical region of the eye, Li et al. [68] recently employed a deep transfer learning approach which fine-tunes the VGG-16 model [31] pretrained on the ImageNet [69] dataset. To detect and classify Age-related Macular Degeneration (AMD) and Diabetic Macular Edema (DME) diseases in the eye, they used 207,130 retinal Optical Coherence Tomography (OCT) images. The proposed method achieved 98.6% detection accuracy on the retinal images. Abramoff et al. [70] used a CNN based technique to detect Diabetic Retinopathy (DR) in fundus images. They assessed the device IDx-DR X 2.1 in their study using a public dataset [71] and achieved an AUC score of 0.98. Schlegl et al. [72] employed deep learning for the detection and quantification of Intraretinal Cystoid Fluid (IRC) and Subretinal Fluid (SRF) in retinal images. They employed an auto encoder-decoder formation of CNNs, and used 1,200 OCT retinal images for the experiments, achieving an AUC of 0.92 for SRF and an AUC of 0.94 for IRC.

Deep learning is also being increasingly used for diagnosing retinal diseases [73], [74]. Li et al. [75] trained a deep learning model based on the Inception architecture [43] for the identification of Glaucomatous Optic Neuropathy (GON) in retinal images. Their model achieved an AUC of 0.986 for distinguishing healthy from GON eyes. Recently, Christopher et al. [76] also used transfer learning with the VGG16, Inception-v3, and ResNet50 models for the identification of GON. They used models pre-trained on ImageNet. For their experiments, they used 14,822 Optic Nerve Head (ONH) fundus images of GON or healthy eyes. The best achieved performance for identifying moderate-to-severe GON in the eyes was reported to be an AUC value of 0.97 with 90% sensitivity and 93% specificity. Khojasteh et al. [77] used a pre-trained ResNet-50 on the DIARETDB1 [78] and e-Ophtha [79] datasets for the detection of exudates in retinal images.


Fig. 6: Taxonomy of the literature review: The contributions exploiting Deep Learning technology are first categorized according to the underlying Pattern Recognition tasks. Each category is then sub-divided based on the human anatomical region studied in the papers.

They reported an accuracy of 98% with 99% sensitivity of detection on the used data.

3.1.4 Chest

For pulmonary nodule detection in lungs in Computed Tomography (CT) images, Zhu et al. [80] proposed a deep network called DeepEM. This network uses a 3D CNN architecture that is augmented with an Expectation-Maximization (EM) technique for the noisily labeled data of Electronic Medical Records (EMRs). They used the EM technique to train their model in an end-to-end manner. Three datasets were used in their study, including the LUNA16 dataset [81] - the largest publicly available dataset for supervised nodule detection, the NCI NLST dataset³ for weakly supervised detection, and the Tianchi Lung Nodule Detection dataset.

For the detection of artefacts in Cardiac Magnetic Resonance (CMR) imaging, Oksuz et al. [82] also proposed a CNN based technique. Before training the model, they performed image pre-processing by normalization and region of interest (ROI) extraction. The authors used a CNN architecture with 6 convolutional layers (ReLU activations) followed by 4 pooling layers, 2 fc layers and a softmax layer to estimate the motion artefact labels. They showed good performance for the classification of motion artefacts in videos. The authors essentially built on the insights of [83], in which video classification is done using a spatio-temporal 3D CNN.

Zhe et al. [84] proposed a technique for the localization and identification of thoracic diseases using the public NIH X-ray database⁴ that comprises 112,120 frontal view X-ray images with 14 labels. Their model performs the tasks of localization and identification simultaneously. They used the popular ResNet [22] architecture to build the computational model. In their model, an input image is passed through the CNN for feature map extraction, then a max pooling or bi-linear interpolation layer is used for resizing the input image by a patch slicing layer. Afterwards, fully convolutional layers are used to eventually perform the recognition. For training, the authors exploit the framework of Multi-Instance Learning (MIL), and in the testing phase the model predicts both labels and class-specific localization details.

3. https://biometry.nci.nih.gov/cdas/datasets/nlst/
4. https://www.kaggle.com/nih-chest-xrays/data

Yi et al. [85] presented a scale recurrent network for the detection of catheters in X-ray images. Their network architecture is organised in an auto encoder-decoder manner. In another study [86], Masood et al. proposed a deep network, termed DFCNet, for automatic computer aided detection of pulmonary lung cancer.

Gonzalez et al. [87] proposed a deep network for the detection of Chronic Obstructive Pulmonary Disease (COPD) and Acute Respiratory Disease (ARD) prediction in CT images of smokers. They trained a CNN using 7,983 COPDGene cases and used logistic regression for COPD detection and ARD prediction. In another study [88], the same group of researchers used deep learning for weakly supervised lesion localization. Recently, Marsiya et al. [89] used the NLST and LIDC/IDRI [90] datasets for lung nodule detection in CT images. They proposed a 3D Group-equivariant Convolutional Neural Network (G-CNN) technique for that purpose. The proposed method was exploited for false positive reduction in pulmonary lung nodule detection. The authors claim their method performs on par with standard CNNs while being trained using ten times less data.

3.1.5 Abdomen

Alensary et al. [91] proposed a deep reinforcement learning technique for the detection of multiple landmarks with ROIs in 3D fetal head scans. Ferlaino et al. [92] worked on placental histology using deep learning. They classified five different classes with an accuracy of 89%. Their model also learns deep embeddings encoding phenotypic knowledge that classify five different cell populations and learn inter-class variances of phenotype. Ghesu et al. [93] used a large dataset of 1,487 3D CT scans for the detection of anatomic sites, exploiting multi-scale deep reinforcement learning.

Katzmann et al. [105] proposed a deep learning based technique for the estimation of Colorectal Cancer (CRC) in CT tumor images for early treatment. Their model achieved high accuracies in growth and survival prediction. Meng et al. [106] formulated an automatic shadow detection technique for 2D ultrasound images using weakly supervised annotations. Their method highlights the shadow regions



TABLE 1: Summary of highly influential papers that appeared in 2016 and 2017 (based on Google Scholar citation index in January 2019) and exploit deep learning for Detection/Localization in Medical Image Analysis.

Reference | Anatomic Site | Image Modality | Network type | Data | Citations
Shin et al. [94] (2016) | Lung | CT | CNN | - | 783
Sirinukunwattana et al. [95] (2016) | Abdomen | Histopathology | CNN | 20,000+ images | 252
Setio et al. [96] (2016) | Lung | CT | CNN | 888 images | 247
Xu et al. [97] (2016) | Breast | Histopathology | AE | 500 images | 220
Wang et al. [98] (2016) | Breast | Histopathology | CNN | 400 images | 182
Kooi et al. [99] (2017) | Breast | Mammography | CNN | 45,000 images | 162
Rajpurkar et al. [100] (2017) | Chest | X-ray | CNN | 100,000+ images | 139
Liu et al. [101] (2017) | Breast | Histopathology | CNN | 400 slides | 98
Ghafoorian et al. [102] (2017) | Brain | MRI | CNN | LUNA16 ISBI 2016 | 90
Dou et al. [103] (2017) | Lung | CT | 3D CNN | 1,075 images | 40
Zhang et al. [104] (2017) | Brain | MRI | FCN | 700 subjects | 32

which is particularly useful for the segmentation task. Horie et al. [107] recently applied a CNN technique for esophageal cancer detection. They used 8,428 WGD images and attained 98% sensitivity. Yasaka et al. [108] used a deep CNN architecture for the diagnosis of three different phases (non-contrast-agent enhanced, arterial, and delayed) of liver masses in dynamic CT images.

3.1.6 Miscellaneous

Zhang et al. [109] achieved 98.51% accuracy and a localization error of 2.45 mm for the detection of the inner ear in CT images. They used 3D U-Net [110], which consists of multiple convolution-pooling layers that convert the raw input image into low resolution and highly abstracted feature maps, to map the whole 3D image. They applied a false positive suppression technique in the training process and used a shape based constraint during training. Rajpurkar et al. [111] recently released a dataset, MURA, which consists of 40,561 images from 14,863 musculoskeletal studies labeled by radiologists as either normal or abnormal. The authors used a 169-layer CNN for the detection of normality and abnormality in each image study. Li et al. [112] proposed a Neural Conditional Random Field (NCRF) technique for metastatic cancer detection in whole slide images. Their model was trained end-to-end using back-propagation and obtained a successful FROC score of 0.8096 in testing on the Camelyon16 dataset [113]. Codella et al. [114] recently organized a challenge at the International Symposium on Biomedical Imaging (ISBI), 2017 for skin lesion analysis for melanoma detection. The challenge provided 2,000 training images, 150 validation images, and 600 images for testing, and eventually published the results of 46 submissions. We refer to [114] for further details on the challenge itself and the submissions. We also mention a few techniques related to the task of detection/localization in Table 1. These methods appeared in the literature in the years 2016-17. Based on Google Scholar's citation index, these methods are among the highly influential techniques in the current related literature. This article also provides similar summaries of the highly influential papers from the years 2016-17 for each pattern recognition task considered in the Sections to follow.

3.2 Segmentation

In Medical Image Analysis, deep learning is being extensively used for image segmentation with different

modalities, including Computed Tomography (CT), X-ray, Positron Emission Tomography (PET), Ultrasound, Magnetic Resonance Imaging (MRI) and Optical Coherence Tomography (OCT). Segmentation is the process of partitioning an image into different meaningful segments (that share similar characteristics) through automatic or semi-automatic outlining of the boundaries within the image. In medical imaging, these segments usually correspond to different tissue classes, pathologies, organs or some other biological structure [115].
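
Many of the segmentation works reviewed below report their performance with overlap measures such as the Dice score. As a minimal illustration (not tied to any particular paper), the Dice overlap between a predicted and a reference binary mask can be computed as follows:

    import numpy as np

    def dice_score(pred, target, eps=1e-7):
        # Dice overlap between two binary masks of the same shape.
        pred = pred.astype(bool)
        target = target.astype(bool)
        intersection = np.logical_and(pred, target).sum()
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    # Example: a mask compared against itself gives a Dice score of 1.0
    mask = np.zeros((4, 4), dtype=np.uint8)
    mask[1:3, 1:3] = 1
    print(dice_score(mask, mask))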

3.2.1 Brain

Related to the anatomical region of the brain, Dey et al. [116] trained a complementary segmentation network, termed CompNet, for skull stripping in MRI scans of normal and pathological brain images. The OASIS dataset [117] was used for training. In their approach, the features used for segmentation are learned by an encoder-decoder network trained on the images of brain tissue and its complementary part outside the brain. The approach is compared with a plain U-Net and a dense U-Net [118]. The accuracy achieved by CompNet is 98.27% for the normal images and 97.62% for the pathological images. These results are better than those achieved by [118].

Zhao et al. [119] proposed a deep learning technique for brain tumor segmentation by integrating Fully Convolutional Networks (FCNs) and Conditional Random Fields (CRFs) in a combined framework to achieve segmentation with appearance and spatial consistency. They trained 3 segmentation models using 2D image patches and slices. First, the training is performed for the FCN using image patches; then a CRF is trained with a Recurrent Neural Network (CRF-RNN) using image slices, during which the parameters of the FCN are fixed. Afterwards, the FCN and CRF-RNN parameters are jointly fine-tuned using the image slices. The authors used the MRI image data provided by the Multimodal Brain Tumor Image Segmentation Challenge (BRATS) 2013, BRATS 2015 and BRATS 2016. In their work, Nair et al. [120] used a 3D CNN approach for the segmentation and detection of Multiple Sclerosis (MS) lesions in MRI sequences. Roy et al. [121] used a voxel-wise Bayesian FCN for whole brain segmentation by using Monte Carlo sampling. They demonstrated high accuracies on four datasets, namely MALC, ADNI-29, CANDI-13, and IBSR-18. Robinson et al. [122] also proposed a real-time deep learning approach for cardiovascular MR segmentation.



3.2.2 Breast

In their study, Singh et al. [123] described a conditional Generative Adversarial Network (cGAN) model for breast mass segmentation in mammography. Experiments were conducted on the public Digital Database for Screening Mammography (DDSM) and a private dataset of mammograms from Hospital Universitari Sant Joan de Reus, Spain. They additionally used a simpler CNN to classify the segmented tumor area into different shapes (irregular, lobular, oval and round) and achieved an accuracy of 72% on the DDSM dataset. Zhang et al. [124] exploited deep learning for image intensity normalization in the breast segmentation task. They used 460 subjects from Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI). Each subject contained one T1-weighted pre-contrast and three T1-weighted post-contrast images. Men et al. [125] trained a deep dilated ResNet for segmentation of the Clinical Target Volume (CTV) in breasts. Lee et al. [126] proposed an automated segmentation technique based on a fully convolutional network for breast density estimation in mammograms. For the evaluation of their approach, they used full-field digital screening mammograms of 604 subjects. They fine-tuned a pre-trained network for breast density segmentation and estimation. The Percent Density (PD) estimation by their approach showed similarities with BI-RADS density assessment by radiologists and outperformed the then state-of-the-art computational approaches.

3.2.3 Eye

Retinal blood vessel segmentation is considered an important and challenging task in retinopathology. Zhang et al. [127] used a deep neural network for this purpose by exploiting the U-Net architecture [48] with residual connections. They showed their results on three public datasets, STARE [128], CHASEDB1 [129] and DRIVE [130], and achieved an AUC value of 97.99% on the DRIVE dataset. De et al. [131] applied deep learning to retinal tissue segmentation. They used 14,884 three dimensional OCT images for training their network. Their approach is claimed to be device independent - it maintains segmentation accuracy while using data from different devices. In another study of retinal blood vessels, Jebaseeli et al. [132] proposed a method to enhance the quality of retinal vessel segmentation. They analysed the severity level of diabetic retinopathy. Their proposed method, Deep Learning Based SVM (DLBSVM), uses the DRIVE, STARE, REVIEW, HRF, and DRIONS databases for training. Liu et al. [133] proposed semi-supervised learning for retinal layer and fluid region segmentation in retinal OCT B-scans. An adversarial technique was exploited for the unlabeled data. Their network resembles the U-Net fully convolutional architecture.

3.2.4 Chest

Duan et al. [134] proposed a Deep Nested Level Set (DNLS) technique for the multi-region segmentation of cardiac MR images in patients with Pulmonary Hypertension (PH). They compared their approach with a CNN method [135] and the conditional random field (CRF) based CRF-CNN approach [136]. DNLS is shown to outperform those techniques for all anatomical structures, specifically for the myocardium. However, it requires more computations than

[135], which is the fastest method among the tested approaches. Bai et al. [137] used an FCN and an RNN for the pixel-wise segmentation of aortic sequences in MR images. They trained their model in an end-to-end manner from sparse annotations by using a weighted loss function. The proposed method consists of two parts: the first extracts features with an FCN using a U-Net architecture [48]; the second feeds these features to an RNN for segmentation. Of the 500 aortic MR image sequences used (provided by the UK Biobank), the study used a random selection of 400 for training, and the rest were used for testing the models. Another recent study on semi-supervised myocardial segmentation has been conducted by Chartsias et al. [138], which was presented as an oral paper at MICCAI 2018. Their proposed network, called Spatial Decomposition Network (SDNet), models 2D input images in two representations, namely a spatial representation of the myocardium as a binary mask and a latent representation of the remaining features in a vector form. While not being a fully supervised technique, their method still achieves remarkable results for the segmentation task.
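
The weighted loss idea used for training from sparse annotations can be illustrated with a small, hypothetical PyTorch snippet: a class-weighted cross-entropy that additionally ignores unannotated pixels. The weight values and the "unlabeled" marker below are placeholders of ours, not values from [137]:

    import torch
    import torch.nn as nn

    # Hypothetical sketch: class-weighted cross-entropy that also ignores
    # unannotated pixels (label value 255 is an assumed "unlabeled" marker).
    class_weights = torch.tensor([0.2, 1.0, 1.5])        # background and two structures (placeholder weights)
    criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)

    logits = torch.randn(2, 3, 64, 64)                   # (batch, classes, H, W) network output
    labels = torch.randint(0, 3, (2, 64, 64))            # sparse ground-truth label map
    labels[:, ::2, :] = 255                              # pretend every other row is unannotated
    loss = criterion(logits, labels)                     # loss is computed only on annotated pixels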

Kervadec et al. [139] proposed a CNN based ENet [140] with a constrained loss function for the segmentation of weakly supervised cardiac images. They achieved 90% accuracy on the public dataset of the 2017 ACDC challenge5. Their approach closes the gap between weakly and fully supervised segmentation in semantic medical imaging. In another study of cardiac CT and MRI images for multi-class image segmentation, Joyce et al. [141] proposed an adversarial approach consisting of a shallow U-Net like method. They also demonstrated improved segmentation with an unsupervised cost.

LaLonde et al. [142] introduced a CNN based technique, termed SegCaps, by exploiting capsule networks [143] for object segmentation. The authors exploited the LUNA16 subset of the LIDC-IDRI database and demonstrated the effectiveness of their method for analysing CT lung scans. It is shown that the method achieves better segmentation performance as compared to the popular U-Net. The SegCaps are able to handle large images of size 512 × 512. In another study [144] of lung cancer segmentation using the LUNA16 dataset, Nam et al. proposed a CNN model using 24 convolutional layers, 2 pooling layers, 2 deconvolutional layers and one fully connected layer. Similarly, Burlutskiy et al. [145] developed a deep learning framework for lung cancer segmentation. They trained their model using the scans of 712 patients and tested on the scans of 178 patients of fully annotated Tissue Micro-Arrays (TMAs). Their model is aimed at finding high potential cancer areas in TMA cores.

3.2.5 Abdomen

Roth et al. [146] built a 3D FCN model for automatic semantic segmentation of 3D images. The model is trained on clinical Computed Tomography (CT) data, and it is shown to perform automated multi-organ segmentation of abdominal CT with a 90% average Dice score across all targeted organs. A CNN method, termed Kid-Net, is proposed by Taha et al. [147] for the segmentation of kidney vessels: artery, vein and collecting system (ureter). Their model is trained in

5. https://www.creatis.insa-lyon.fr/Challenge/acdc/



an end-to-end fashion using 3D CT-volume patches. One promising claim made by the authors is that their method reduces kidney vessel segmentation time from hours to minutes. Their approach uses feature down-sampling and up-sampling to achieve higher classification and localization accuracies. Their network training methodology also handles unbalanced data and focuses on reducing false positives. It is also claimed that the proposed method enables high-resolution segmentation with a limited memory budget. The authors exploit the findings in [148] for that purpose.

Oktay et al. [149] recently presented an 'attention gate' model to automatically find the target anatomy of different shapes and sizes. They essentially extended the U-Net model to an attention U-Net model for pancreas segmentation. Their model can be utilized for organ localization and detection tasks. They used 120 CT images for training their model, and 30 images for testing. Overall, the algorithm achieves good performance with a 2 to 3% increase in Dice score as compared to the existing methods. Related research on pancreas segmentation had been conducted previously using dense connections by Gibson et al. [150] and sparse convolutions by Heinrich et al. [151], [152], [153]. For multi-organ segmentation (i.e. lung, heart, liver, bone) in unlabeled X-ray images, Zhang et al. [154] proposed an automated Task Driven Generative Adversarial Network (TD-GAN) technique. This is an unsupervised end-to-end method for medical image segmentation. They fine-tuned a dense image-to-image network (DI2I) [46], [155] on synthetic Digitally Reconstructed Radiographs (DRRs) and X-ray images. In another study of multi-organ segmentation, Tong et al. [156] proposed an FCN with a shape representation model. Their experiments were carried out on H&N datasets of volumetric CT scans.
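
To give a flavour of the 'attention gate' mechanism mentioned above, the following is a schematic additive attention gate in PyTorch, in the spirit of attention U-Net: a gating signal from a coarser decoder stage re-weights encoder skip features before they are reused. The channel sizes and the assumption that the gating signal has already been upsampled to the skip-feature resolution are ours; this is not the authors' implementation.

    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        # Schematic additive attention gate: 1x1 projections of the encoder
        # features and the gating signal are summed, passed through ReLU and a
        # 1x1 convolution, and squashed to a spatial attention map in [0, 1].
        def __init__(self, enc_ch, gate_ch, inter_ch):
            super().__init__()
            self.theta = nn.Conv2d(enc_ch, inter_ch, kernel_size=1)
            self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
            self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

        def forward(self, enc_feat, gate):
            # gate is assumed to be upsampled to the spatial size of enc_feat
            attn = torch.sigmoid(self.psi(torch.relu(self.theta(enc_feat) + self.phi(gate))))
            return enc_feat * attn          # attended skip features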

Yang et al. [157] used a conditional Generative Adversarial Network (cGAN) to segment the human liver in 3D CT images. Lessmann et al. [158] proposed an FCN based technique for automatic vertebra segmentation in CT images. The underlying architecture of their network is inspired by U-Net. Their model is able to process a patch size of 128 × 128 × 128 voxels. It achieves 95.8% accuracy for classification and 92.1% for segmentation on the spinal images used by the authors. Jin et al. [159] proposed a 3D cGAN to learn lung nodules conditioned on a Volume Of Interest (VOI) with an erased central region in 3D CT images. They trained their model on 1,000 nodules taken from the LIDC dataset. The proposed cGAN was further used to generate a dataset for a Progressive Holistically Nested Network (P-HNN) model [160], which demonstrates improved segmentation performance.

3.2.6 Miscellaneous

For memory and computational efficiency, Xu et al. [161] applied a quantization mechanism to FCNs for segmentation tasks in Medical Image Analysis. They also used quantization to mitigate the overfitting issue for better performance. The effectiveness of the developed method is demonstrated on the 2015 MICCAI Gland Challenge dataset [162]. As compared to [163], their method improves the results by up to 1% with a 6.4× reduction in memory usage. Recently, Zhao et al. [119] proposed a deep learning

technique for 3D image instance segmentation. Their model is trainable with weak annotations that need only 3D bounding boxes for all instances and full voxel annotations for a small fraction of the instances. Liu et al. [164] employed a novel deep reinforcement learning approach for the segmentation and classification of surgical gestures. Their approach performs well on the JIGSAW dataset in terms of edit score as compared to previous similar works. Arif et al. [165] presented a deep FCN model called SPNet, as a shape predictor for object segmentation. The X-ray images used in their study are of the cervical vertebrae. The dataset used in their experiments included 124 training and 172 test images. Their SPNet was trained for 30 epochs with a batch size of 50 images.

Sarker et al. [174] analyzed skin lesion segmentation with deep learning. They used encoder and decoder networks for feature extraction. The loss function in their work combines negative log likelihood and end-point error for the segmentation of melanoma regions with sharp edges. They evaluated their method, SLSDeep, on the ISBI datasets [114], [175] for skin lesion detection, achieving encouraging segmentation results. In another related study of skin cancer, Mirikharaji et al. [176] also developed a deep FCN for skin lesion segmentation. They presented good results on the ISBI 2017 dataset of dermoscopy images. They used two fully convolutional networks based on U-Net [48] and ResNet-DUC [22] in their technique. Yuan et al. [177] proposed a deep fully convolutional-deconvolutional neural network (CDNN) for automatic skin lesion segmentation, and acquired a Jaccard index of 0.784 on the ISBI validation set. Ambellan et al. [178] proposed a CNN based on 3D Statistical Shape Models (SSMs) for the segmentation of knee and cartilage in MRI images. In Table 2, we also summarize a few popular methods in medical image segmentation that appeared prior to the year 2018.

3.3 Registration

Image registration is a common task in medical image analysis that allows spatial alignment of images to a common anatomical space [179]. It aims at aligning a source image with a target image through transformations. Image registration is one of the mainstream tasks in medical image analysis that has received ample attention even before the deep learning era [180], [181], [182], [183], [184], [185], [186]. The advent of deep learning has also caused neural networks to penetrate into medical image registration [187], [188], [189], [190].
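
Before the learning based methods reviewed below, registration was typically posed as an optimization over transformation parameters that minimize a dissimilarity between the warped source and the target. The toy NumPy/SciPy sketch below (our illustration only) recovers an integer 2D translation by exhaustively minimizing the mean squared error:

    import numpy as np
    from scipy.ndimage import shift as nd_shift

    def best_translation(source, target, max_shift=6):
        # Exhaustively search integer (dy, dx) shifts that best align source to target (MSE).
        best_err, best_t = np.inf, (0, 0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                err = np.mean((nd_shift(source, (dy, dx), order=1) - target) ** 2)
                if err < best_err:
                    best_err, best_t = err, (dy, dx)
        return best_t

    # Synthetic example: a square shifted by (3, 5) pixels.
    source = np.zeros((32, 32))
    source[10:20, 8:18] = 1.0
    target = np.roll(source, (3, 5), axis=(0, 1))
    print(best_translation(source, target))   # expected to be close to (3, 5)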

3.3.1 Brain

Van et al. [191] proposed a stacked bidirectional convolutional LSTM (C-LSTM) network for the reconstruction of 3D images from 4D spatio-temporal data. Previously, [192], [193] used CNN techniques for the reconstruction of 3D CT and MRI images using four 3D convolutional layers. Lonning et al. [194] presented a deep learning method using Recurrent Inference Machines (RIM) for the reconstruction of MRI. Deep learning based deformable image registration has also been recently performed by Sheikh et al. [195]. They used deep FCNs to generate spatial transformations through feed-forward networks. In their experiments, they used cardiac MRI images from the ACDC 2017 dataset



TABLE 2: Summary of influential papers that appeared in 2016 and 2017 (based on Google Scholar citation index in January 2019) and exploit deep learning for Segmentation tasks in Medical Image Analysis.

Reference | Anatomic Site | Image Modality | Network type | Data | Citations
Milletari et al. [148] (2016) | Brain | MRI | FCN | Private data | 550
Çiçek et al. [110] (2016) | Kidney | CT | 3D U-Net | Private data | 460
Pereira et al. [166] (2016) | Brain | MRI | CNN | BRATS 2013, 2015 | 411
Moeskops et al. [167] (2016) | Brain | MRI | CNN | 5 datasets | 242
Liskowski et al. [168] (2016) | Eye | Ophthalmology | DNN | 400 images, DRIVE, STARE, CHASE | 177
Ibragimov et al. [169] (2016) | Breast | CT | AE | 1,400 images | 174
Havaei et al. [170] (2017) | Brain | MRI | CNN | BRATS 2013 | 596
Kamnitsas et al. [171] (2017) | Brain | MRI | 11-layer 3D CNN | BRATS 2015 and ISLES 2015 | 517
Fang et al. [172] (2017) | Eye | OCT | CNN | 60 volumes | 86
Ibragimov et al. [173] (2017) | Liver | CT | CNN | 50 images | 56

and showed promising results in comparison to a moving mesh registration technique. Hou et al. [196] also proposed a learning based image registration technique using CNNs. They used 2D image slice transformations to construct 3D images using a canonical coordinate system. First, they simulated their approach on fetal MRI images and then used real fetal brain MRI images for the experiments. Their work is also claimed to be promising in terms of computational efficiency. In another study of image based registration [197], the same group of authors evaluated their technique on CT and MRI datasets for different loss functions using SE(3) as a benchmark. They trained a CNN directly on SE(3) and proposed a Riemannian manifold based formulation for the pose estimation problem. The registration accuracy with their approach increased from 2D to 3D image based registration as compared to previous methods. The authors further showed in [198] that CNNs can reliably reconstruct 3D images using 2D image slices. Recently, Balakrishnan et al. [199] worked on 3D pairwise MR brain image registration. They proposed an unsupervised learning technique named VoxelMorph CNN. They used a pair of 3D images as input, with dimensions 160 × 192 × 224, and learned shared parameters in their network for the convolution layers. They demonstrated their method on 8 publicly available datasets of brain MRI images. On the ABIDE dataset, their model achieved a 1.5% improvement in Dice score. It is claimed that their method is also computationally more efficient than the existing techniques for this problem.
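
A much simplified 2D sketch of the idea behind such unsupervised learning based registration is given below (our illustration, not the VoxelMorph code): a small CNN predicts a displacement field from the concatenated image pair, a differentiable warp applies it, and the training loss combines image similarity with a smoothness penalty on the field. The network, image sizes and loss weights are all assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDeformableReg2D(nn.Module):
        # A small CNN maps the concatenated (moving, fixed) pair to a 2-channel
        # displacement field (dx, dy) in pixels, applied with grid_sample.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 2, 3, padding=1),
            )

        def forward(self, moving, fixed):
            flow = self.net(torch.cat([moving, fixed], dim=1))        # (B, 2, H, W)
            B, _, H, W = flow.shape
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
            grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1).to(flow)
            # convert pixel displacements to normalized coordinates and warp
            disp = torch.stack([flow[:, 0] * 2 / (W - 1), flow[:, 1] * 2 / (H - 1)], dim=-1)
            warped = F.grid_sample(moving, grid + disp, align_corners=True)
            return warped, flow

    def unsupervised_loss(warped, fixed, flow, smooth_weight=0.01):
        # Image similarity plus a smoothness penalty on the displacement field.
        similarity = F.mse_loss(warped, fixed)
        smoothness = (flow[..., 1:, :] - flow[..., :-1, :]).abs().mean() + \
                     (flow[..., :, 1:] - flow[..., :, :-1]).abs().mean()
        return similarity + smooth_weight * smoothness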

3.3.2 Eye

Costa et al. [200] used adversarial autoencoders for the synthesis of retinal color images. They trained a generative model to generate synthetic images and another model to classify its output as real or synthetic. The model results in an end-to-end retinal image synthesis system and can generate as many images as required by its users. It is demonstrated that the image space learned by the model has an arguably well defined semantic structure. The synthesized images were shown to be visually and quantitatively different from the images used for training the model, while reflecting good visual quality. Mahapatra et al. [201] proposed an end-to-end deep learning method using generative adversarial networks for multi-modal image registration. They used retinal and cardiac images for registration. Tang et al. [202] demonstrated a robust image registration approach based on a mixture feature and structure preservation (MFSP) non-rigid point matching

method. In their method, they first extract feature points with a Speeded-Up Robust Features (SURF) detector and a Partial Intensity Invariant Feature Descriptor (PIIFD) from the model and target retinal images. They then use MFSP for feature map detection.

Pan et al. [203] developed a deep learning technique to remove eye position differences in longitudinal 3D retinal OCT images. In their method, they first perform pre-processing on the projection image; then, to detect vessel shadows, they apply enhancement filters. The SURF algorithm [204] is then used to extract the feature points, whereas RANSAC [205] is applied for cleaning the outliers. Mahapatra et al. [206] also proposed an end-to-end deep learning technique for image registration. They used GANs which register images in a single pass with a deformation field. They used the ADAM optimizer [11] for minimizing the network cost and trained the model using the Mean Square Error (MSE) loss.

3.3.3 Chest

Relating to the anatomical region of the chest, Eppenhof et al. [207] proposed a 3D FCN based technique for the registration of CT lung inspiration-expiration image pairs. They validated the performance of their method using two datasets, namely DIRLAB [208] and CREATIS [209]. In general, there is a growing perception in the Medical Imaging community that deep learning is a promising tool for 2D and 3D image registration of the chest region. De et al. [210] also trained a CNN model for affine and deformable image registration. Their technique allows registration of pairs of unseen images in a single pass. They applied their technique to cardiac cine MRI and chest CT images for registration. Zheng et al. [211] trained a CNN model for the 2D/3D image registration problem under a Pairwise Domain Adaptation (PDA) technique that uses synthetic data. It is claimed that their method can learn effective representations for image registration with only a limited number of training images. They demonstrated the generalization and flexibility of their method for clinical applications. Their PDA method can be especially suitable where only small training data is available.

3.3.4 Abdomen

Lv et al. [212] proposed a CNN based technique for 3D MRI abdominal image registration. They trained their model for the spatial transformation analysis of different images. To demonstrate the effectiveness of their technique,



they compared their method with three other approaches and claimed a reduction in the reconstruction time from 1 hour to 1 minute. In another related study, Lv et al. [213] proposed a deep learning framework based on the popular U-Net architecture. To evaluate the performance of their technique, they used 8 ROIs from the cortex and medulla of segmented kidneys. It is demonstrated by the authors that during free breathing measurements, their normalized root-mean-square error (NRMSE) values for the cortex and medulla were significantly lower after registration.

3.3.5 Miscellaneous

Yan et al. [214] presented an Adversarial Image Registration (AIR) method for multi-modal MR-TRUS image registration [215]. They trained two deep networks concurrently, one for the generator component of the adversarial framework, and the other for the discriminator component. In their work, the authors learned not only an image registration network but also a so-called metric network which computes the quality of image registration. The data used in their experimentation consists of 763 sets of 3D TRUS volumes and 2D MR volumes with 512 × 512 × 26 voxels. The developed AIR network is also evaluated on clinical datasets acquired through image-fusion guided prostate biopsy procedures. For the visualization of 3D medical image data, Zhao et al. [216] recently proposed a deep learning based technique, named Respond-weighted Class Activation Mapping (Respond-CAM). They claim better performance as compared to Grad-CAM [217]. Elss et al. [218] also employed convolutional networks for single phase motion estimation in cardiac CT 2D/3D images. They trained a regression network to successfully learn 2D motion estimation vectors. We also summarize a few worth noting contributions from the years 2016 and 2017 in Table 3.

3.4 Classification

Classification of images is a long standing problem in Medical Image Analysis and other related fields, e.g. Computer Vision. In the context of medical imaging, classification becomes a fundamental task for Computer Aided Diagnosis (CAD). Hence, it is no surprise that many researchers have recently tried to exploit the advances of deep learning for this task in medical imaging.

3.4.1 Brain

Relating to the anatomical region of the brain, Li et al. [228] used deep learning to detect Autism Spectrum Disorder (ASD) in functional Magnetic Resonance Imaging (fMRI). They developed a 2-stage neural network method. For the first stage, they trained a CNN (2CC3D) with 6 convolutional layers, 4 max-pooling layers and 2 fully connected layers. Their network uses a sigmoid output layer. For the second stage, in order to detect biomarkers for ASD, they took advantage of the anatomical structure of brain fMRI and developed a frequency normalized sampling method for that purpose. Their method is evaluated using multiple databases, showing robust results for the neurological function of biomarkers.

In their work, Hosseini-Asl et al. [229] employed an auto-encoder architecture for diagnosing Alzheimer’s Disease

(AD) patients. They reported up to 99% accuracy on the ADNI dataset. They exploited transfer learning to handle the data scarcity issue, and used a model that is pre-trained with the help of the CADDementia dataset. Their network architecture is based on 3D convolutional kernels that model generic brain features from sMRI data. The overall classification process in their technique first spatially normalizes the brain sMRI data, then learns the 3D CNN model using the normalized data. The model is eventually fine-tuned on the target domain, where the fine-tuning is performed in a supervised manner.

Recently, Yan et al. [230] proposed a deep chronectome learning framework for the classification of MCI in the brain using Full Bidirectional Long Short-Term Memory (Full-BiLSTM) networks. Their method can be divided into two parts: first, a Full-LSTM is used to gather the time varying information in the brain from which MCI can be diagnosed; second, to mine the contextual information hidden in dynamic functional connectivity (dFC), they applied a BiLSTM to access long range context in both directions. They reported the performance of their model on the public ADNI-2 dataset, achieving 73.6% accuracy. Heinsfeld et al. [231] also proposed a deep learning algorithm for Autism Spectrum Disorder (ASD) classification in rs-fMRI images on the multi-site ABIDE database. They used denoising autoencoders for unsupervised pre-training. The classification accuracy achieved by their algorithm on the said dataset is 70%.

In the context of classification related to the anatomical region of the brain, Soussia et al. [232] provided a review of 28 papers published in MICCAI from 2010 to 2016. They reviewed neuroimaging-based technical methods developed for the Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) classification tasks. The majority of the papers used MRI for dementia classification and a few worked on predicting MCI conversion to AD at later observations. We refer to [232] for detailed discussions on the contributions reviewed by that article. Gutierrez et al. [233] proposed a deep neural network, termed Multi-structure point network (MSPNet), for shape analysis on multiple structures. This network is inspired by PointNet [234], which can directly process point clouds. MSPNet achieves good classification accuracy for AD and MCI on the ADNI database.

3.4.2 Breast

Awan et al. [235] proposed to use more context information for breast image classification. They used features of a CNN that is pre-trained on the ICIAR 2018 dataset of histological images [236], and classified breast cancer as benign, carcinoma in situ (CIS) or breast invasive carcinoma (BIC). Their technique performs patch based and context aware image classification. They used the ResNet50 architecture and overlapping patches of size 512 × 512. The extracted features are classified using a Support Vector Machine in their approach. Due to the unavailability of large-scale data, they used random rotation and flipping data augmentation techniques during the training process. It is claimed that their trained model can also be applied to other tasks where contextual information and high resolution are required for optimal prediction.
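
The patch based pipeline described above, a pre-trained CNN used as a fixed feature extractor followed by an SVM, can be sketched as follows. We use torchvision's ImageNet pre-trained ResNet50 as a stand-in for the histology pre-trained network of [235], and random placeholder patches and labels:

    import torch
    import torch.nn as nn
    from torchvision import models
    from sklearn.svm import SVC

    # Sketch: pre-trained ResNet50 as a fixed feature extractor for 512x512
    # patches, followed by an SVM on the pooled 2048-d features.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Identity()                 # drop the classification head
    backbone.eval()

    patches = torch.rand(8, 3, 512, 512)        # dummy batch of image patches
    labels = [0, 1, 2, 0, 1, 2, 0, 1]           # benign / CIS / BIC (placeholder labels)

    with torch.no_grad():
        feats = backbone(patches).numpy()       # (8, 2048) patch descriptors

    clf = SVC(kernel="linear").fit(feats, labels)
    print(clf.predict(feats[:2]))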

When only weak annotations are available for images, such as in heterogeneous images, it is often useful to turn to Multiple Instance Learning (MIL).



TABLE 3: Summary of influential papers that appeared in 2016 and 2017 (based on Google Scholar citation index in January 2019) and exploit deep learning for the Registration task in Medical Image Analysis.

Reference | Anatomic Site | Imaging Modality | Network type | Data | Citations
Miao et al. [219] (2016) | Chest | X-ray | CNN regression | Synthetic | 101
Wu et al. [220] (2016) | Brain | MRI | CNN | LONI, ADNI databases | 58
Simonovsky et al. [221] (2016) | - | Multiple | CNN | IXI | 57
Yang et al. [222] (2016) | Brain | MRI | Encoder-decoder | OASIS dataset | 40
Bahrami et al. [193] (2016) | Brain | MRI | CNN | 15 subjects | 36
Zhang et al. [223] (2016) | Head, Abdomen, Chest | CT | CNN | Private | 30
Nie et al. [224] (2017) | Multi task | MRI, CT | FCN | Private | 110
Kang et al. [225] (2017) | Abdomen | CT | CNN | CT low-dose Grand Challenge | 98
Yang et al. [190] (2017) | Brain | MRI | Encoder-decoder | 373 OASIS and 375 IBIS images | 64
De et al. [226] (2017) | Brain | MRI | CNN | Sunnybrook Cardiac Data [227] | 50

to multiple instance learning (MIL). Courture et al. [237]described a CNN using quantile function for the classi-fication of 5 types of breast tumor histology. They fine-tuned AlexNet [23]. The data used in their study consistsof 1,713 images from the Carolina Breast Cancer Study,Phase 3 [238]. They improved the classification accuracyon this dataset from 68.6 to 85.6 for estrogen receptor (ER)task in breast images. Recently, MIL has also been usedfor breast cancer classification in [239] and [240] that per-form patch based classification of histopathology images.Antropova et al. [241] used 690 cases with 5 fold cross-validation of MRI maximum intensity projection for breastlession classification. They used a pre-trained VGGNet [31]for feature extraction, followed by an SVM classifier. Ribliet al. [242] applied a CNN based on VGG16 for lessionclassification in mammograms. They trained their modelusing DDSM dataset and tested it on INbreast [243]. Theyachieved the second best score for Mammography DREAMChallenge, with AUC of 0.95. Zheng et al. [244] proposed aCAD technique for breast cancer classification using CNNbased on pre-trained VGG-19 model. They evaluated theirtechnique’s performance on digital mammograms of pairsof 69 cancerous and 27 healthy subjects. They achieved thevalues of 0.928 for sensitivity and 0.991 for specificity ofclassification.

3.4.3 Eye

Pertaining to the region of the eye, Gargeya et al. [71] took a data driven approach using deep learning to classify Diabetic Retinopathy (DR) in color fundus images. The authors used the public databases MESSIDOR 2 and E-ophtha to train and test their models and achieved AUC scores of 0.94 and 0.95 respectively on the test partitions of these datasets. A convolutional network is also employed by Pratt et al. [245] for diagnosing and classifying the severity of DR in color fundus images. Their model is trained using the Kaggle dataset, and it achieved 75% accuracy for DR severity. Similarly, Ayhan et al. [246] also exploited the deep CNN architecture of ResNet50 [247] for fundus image classification. Mateen et al. [248] proposed a DR classification system based on VGG-19. They also performed evaluation using the Kaggle dataset of 35,126 fundus images. It is claimed that their model outperforms the more conventional techniques, e.g. SIFT, as well as earlier deep networks, e.g. AlexNet, in terms of accuracy for the same task.

3.4.4 Chest

Dey et al. [249] studied 3D CNNs for the diagnostic classification of lung cancer between benign and malignant in CT images. Four networks were analyzed for the classification task, namely a basic 3D CNN, a multi-output CNN, a 3D DenseNet, and an augmented 3D DenseNet with multi-outputs. They employed the public LIDC-IDRI dataset with 1,010 CT images and a private dataset of 47 CT images containing both malignant and benign cases in this study. The best results are achieved by the 3D multi-output DenseNet (MoDenseNet) on both datasets, with an accuracy of 90.40% as compared to the previously reported accuracy of 89.90% [250]. Gao et al. [251] proposed a deep CNN for the classification of Interstitial Lung Disease (ILD) patterns in CT images. Previously, patch based algorithms were being used for this purpose [252], [253]. In contrast, Gao et al. performed holistic classification using the complete image as network input. Their experiments used a public dataset [254] on which the classification accuracy improved to 87.9% from the previous result of 86.1% [253]. For the holistic image classification, an overall accuracy of 68.8% was achieved. In another work, Shin et al. [94] also analyzed three CNN architectures, namely CifarNet, AlexNet, and GoogLeNet, for interstitial lung disease classification.

Biffi et al. [255] proposed a 3D convolutional generative model for the classification of cardiac diseases. They achieved impressive performance for the classification of healthy and hypertrophic cardiomyopathy MR images. For the ACDC MICCAI 2017 dataset, they were able to achieve 90% accuracy for the classification of healthy subjects. Chen et al. [256] proposed a CNN based technique, RadBot-CXR, to categorize focal lung opacities, diffuse lung opacity, cardiomegaly, and abnormal hilar prominence in chest X-ray images. They claim that their algorithm showed radiologist-level performance for this task. Wang et al. [257] used deep learning to analyze histopathology images for whole slide lung cancer classification. Coudray et al. [258] used the Inception v3 CNN model to analyze whole slide images and classify lung cancer between adenocarcinoma (LUAD), squamous cell carcinoma (LUSC) or normal tissue. Moreover, they also trained their model for the prediction of the ten most commonly mutated genes in LUAD and achieved good accuracies. Masood et al. [86] proposed a deep learning approach, DFCNet, based on FCN, which is used to classify the four stages of detected pulmonary lung cancer nodules.



3.4.5 Abdomen

Relating to the abdomen, Tomczak et al. [259] employed a deep Multiple Instance Learning (MIL) framework [260] for the classification of esophageal cancer in histopathology images. In another contribution, Frid et al. [261] used GANs for synthetic medical image data generation. They made classification of CT liver images their test bed and performed classification of 182 lesions. The authors demonstrated that by using data augmented with the GAN framework, up to 7% improvement is possible in classification accuracy. For automatic classification of ultrasound abdominal images, Xu et al. [262] proposed a multi-task learning framework based on CNNs. For the experiments, they used 187,219 ultrasound images and claimed better classification accuracy than human clinical experts.

3.4.6 Miscellaneous

Esteva et al. [263] presented a CNN model to classify skin cancer lesions. Their model is trained in an end-to-end manner directly from the images, taking pixels and disease labels as inputs. They used a dataset of 129,450 clinical images covering 2,032 different diseases to train the CNN. They classified the two most common skin cancers, keratinocyte carcinoma versus benign seborrheic keratosis, and the deadliest skin cancer, malignant melanoma versus benign nevi. For skin cancer sun exposure classification, Combalia et al. [264] also applied a Monte Carlo method to highlight only the most relevant regions. Antony et al. [265] employed a deep CNN model for automatic quantification of the severity of knee osteoarthritis (OA). They used Kellgren and Lawrence (KL) grades to assess the severity of the knee. In their work, using a deep CNN pre-trained on ImageNet and fine-tuned on knee OA images resulted in good classification performance. Paserin et al. [266] recently worked on the diagnosis and classification of developmental dysplasia of the hip (DDH). They proposed a CNN-RNN technique for 3D ultrasound volumes for DDH. Their model consists of convolutional layers for feature learning followed by recurrent layers modeling the spatial relationship of their responses. Inspired by the VGG network [31], they used a CNN with 5 convolutional layers for feature extraction with ReLU activations, and 2 × 2 max-pooling with a stride of 2. Finally, they used an LSTM network with 256 units. They achieved 82% accuracy with an AUC of 0.83 for 20 test volumes. A few notable contributions from the years 2016 and 2017 are also summarized in Table 4.

4 DATASETS

Good quality data has always remained the primary requirement for learning reliable computational models. This is also true for deep models, which have the additional requirement of consuming a large amount of training data. Recently, many public datasets for medical imaging tasks have started to emerge. There is also a growing trend in the research community to compile lists of these datasets. For instance, we can find a few useful compilations of public dataset lists at GitHub repositories and other webpages. A few medical image analysis products are also helping in providing public datasets. Whereas a detailed discussion on the currently available public datasets for medical imaging tasks is outside the scope of this article, we provide

typical examples of the datasets commonly used in medical imaging by deep learning approaches in Table 5. The Table is not intended to provide an exhaustive list; we recommend that readers search the internet for that purpose, as a brief search can result in a long compilation of medical imaging datasets. However, we summarize a few examples of contemporary datasets in Table 5 to make an important point regarding deep learning research in the context of Medical Image Analysis. With the exception of a few datasets, the public datasets currently available for medical imaging tasks are small in terms of the number of samples and patients. As compared to the datasets for general Computer Vision problems, which typically range from a few hundred thousand to millions of annotated images, the dataset sizes for medical imaging tasks are too small. On the other hand, we can see the emerging trend in the Medical Imaging community of adopting the practices of the broader Pattern Recognition community, and aiming at learning deep models in an end-to-end fashion. However, the broader community has generally adopted such practices based on the availability of large-scale annotated datasets, which is an important requirement for inducing reliable deep models. Hence, it remains to be seen how effectively end-to-end trained models can really perform medical image analysis tasks without over-fitting to the training datasets.

5 CHALLENGES IN GOING DEEP

In this Section, we discuss the major challenges faced in fully exploiting the power of Deep Learning in Medical Image Analysis. Instead of describing the issues encountered in specific tasks, we focus more on the fundamental challenges and explain the root causes of these problems for the Medical Imaging community, which can also help in understanding the task-specific challenges. Dealing with these challenges is the topic of discussion in Section 6.

5.0.0.1 Lack of appropriately annotated data: It can be argued that the single aspect of Deep Learning that sets it apart from the rest of Machine Learning techniques is its ability to model extremely complex mathematical functions. Generally, we introduce more layers to learn more complex models - i.e. go deep. However, a deeper network must also learn more model parameters. A model with a large number of parameters can only generalize well if we correspondingly use a very large amount of data to infer the parameter values. This phenomenon is fundamental to any Machine Learning technique. A complex model inferred using a limited amount of data normally over-fits to the used data and performs poorly on any other data. Such modeling is highly undesirable because it gives a false impression of learning the actual data distribution whereas the model is only learning the peculiarities of the used training data.

Learning deep models is inherently unsuitable for domains where only a limited amount of training data is available. Unfortunately, Medical Imaging is one such domain. For most of the problems in Medical Image Analysis, there is only a limited amount of data that is annotated in a manner that is suitable to learn powerful deep models. We encounter the problem of 'lack of appropriately annotated data' so frequently in the current Deep Learning related Medical Imaging literature that it is not difficult to single out



TABLE 4: Summary of notable contributions appearing in 2016 and 2017 that exploit deep learning for the Classification task in Medical Image Analysis. The citation index is based on Google Scholar (January 2019).

Reference | Anatomic Site | Image Modality | Network type | Data | Citations
Anthimopoulos et al. [267] (2016) | Lung | CT | CNN | Uni. Hospital of Geneva [254] & Inselspital | 253
Kallenberg et al. [268] (2016) | Breast | Mammography | CNN | 3 different databases | 111
Huynh et al. [269] (2016) | Breast | Mammography | CNN | Private data (219 lesions) | 76
Yan et al. [270] (2016) | 12 regions | CT | CNN | Synthetic & Private | 73
Zhang et al. [223] (2016) | Breast | Elastography | MLP | Private data (227 images) | 50
Esteva et al. [263] (2017) | Skin | Histopathology | CNN | Private large-scale | 1386
Sun et al. [271] (2017) | Breast | Mammography | CNN | Private data | 40
Christodoulidis et al. [272] (2017) | Lung | CT | CNN | Texture data as transfer learning source | 36
Lekadir et al. [273] (2017) | Heart | US | CNN | Private data | 26
Nibali et al. [250] (2017) | Lung | CT | CNN | LIDC/IDRI dataset | 22

TABLE 5: Examples of popular databases used by Medical Image Analysis techniques that exploit Deep Learning.

Sr. | Database | Anatomic site | Image modality | Main task | Patients/Images
1 | ILD [254] | Lung | CT | Classification | 120 patients
2 | LIDC-IDRI [90] | Lung | CT | Detection | 1,018 patients
3 | ADNI [58] | Brain | MRI | Classification | >800 patients
4 | MURA [111] | Musculoskeletal | X-ray | Detection | 40,561 images
5 | TCIA | Multiple | Multiple | Multiple | >35,000 patients
6 | BRATS [274] | Brain | MRI | Segmentation | -
7 | DDSM [275] | Breast | Mammography | Segmentation | 2,620 patients
8 | MESSIDOR-2 [276], [277] | Eye | OCT | Classification | 1,748 images
9 | ChestX-ray14 [278] | Chest | X-ray | Multiple | >100,000 images
10 | ACDC 2017 | Heart | MRI | Classification | 150 patients
11 | 2015 MICCAI Gland Challenge | Glands | Histopathology | Segmentation | 165 images
12 | OAI | Knee | X-ray, MRI | Multiple | 4,796 patients
13 | DRIVE [128], [279] | Eye | SLO | Segmentation | 400 patients
14 | STARE [130] | Eye | SLO | Segmentation | 400 images
15 | CHASEDB1 [129] | Eye | SLO | Segmentation | 28 images
16 | OASIS-3 [57], [280], [281], [282], [283], [284], [117] | Brain | MRI, PET | Segmentation | 1,098 patients
17 | MIAS [285] | Breast | Mammography | Classification | 322 patients
18 | ISLES 2018 | Brain | MRI | Segmentation | 103 patients
19 | HVSMR 2018 [286] | Heart | CMR | Segmentation | 4 patients
20 | CAMELYON17 [113] | Breast | WSI | Segmentation | 899 images
21 | ISIC 2018 | Skin | JPEG | Detection | 2,600 images
22 | OpenNeuro | Brain | Multiple | Classification | 4,718 patients
23 | ABIDE | Brain | MRI | Disease Diagnosis | 1,114 patients
24 | INbreast [243] | Breast | Mammography | Detection/Classification | 410 images

this problem as 'the fundamental challenge' that the Medical Imaging community is currently facing in fully exploiting the advances in Deep Learning.

The Computer Vision community has been able to take full advantage of Deep Learning because data annotation is relatively straightforward in that domain. Simple crowd sourcing can yield millions of accurately annotated images. This is not possible for medical images, which require a high level of specific expertise for annotation. Moreover, the stakes are also very high due to the nature of medical applications, requiring extra care in annotation. Although we can also find a large number of images in the medical domain via systems like PACS and OIS, using them to train deep models is still not easy because they lack the appropriate level of annotation that is generally required for training useful deep models.

With only a few exceptions, e.g. [278], the public datasets available in the Medical Imaging domain are not large-scale - a requirement for training effective deep models. In addition to the issues of protecting patient privacy, one major problem in forming large-scale public datasets is that the concrete labels required for computational modeling can often not be easily inferred from medical reports. This is problematic if inter-observers are used to create large-scale datasets.

Moreover, the annotations required for deep models often do not align perfectly with general medical routines. This makes it an additional problem even for experts to provide noise-free annotations.

Due to the primary importance of large-scale training datasets in Deep Learning, there is an obvious need to develop such public datasets for Medical Imaging tasks. However, considering the practical challenges in accomplishing this goal, it is also imperative to simultaneously develop techniques for exploiting Deep Learning with smaller amounts of data. We provide a discussion on future directions along both of these dimensions in Section 6.

5.0.0.2 Imbalanced data: One problem that occurs much more commonly in Medical Imaging tasks as compared to general Computer Vision tasks is the imbalance of samples in datasets. For instance, a dataset to train a model for detecting breast cancer in mammograms may contain only a limited number of positive samples but a very large number of negative samples. Training deep networks with imbalanced data can induce models that are biased. Considering the low frequency of occurrence of positive samples for many Medical Imaging tasks, balancing out the original data can become as hard as developing a large-scale dataset. Hence, extra care must be taken in inducing deep models for Medical Imaging tasks.
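
Two common, generic mitigations, not tied to any specific paper reviewed here, are re-weighting the loss by inverse class frequency and oversampling the minority class. A small PyTorch sketch with a toy 90/10 imbalanced dataset illustrates both:

    import torch
    from torch.utils.data import TensorDataset, WeightedRandomSampler, DataLoader

    # Toy imbalanced dataset: 90 negative and 10 positive samples.
    x = torch.randn(100, 16)
    y = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
    dataset = TensorDataset(x, y)

    # Option 1: re-weight the loss inversely to class frequency.
    counts = torch.bincount(y).float()
    loss_fn = torch.nn.CrossEntropyLoss(weight=counts.sum() / (2 * counts))

    # Option 2: oversample the minority class so batches are roughly balanced.
    sample_weights = (1.0 / counts)[y]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)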



5.0.0.3 Lack of confidence interval: Whereas the Deep Learning literature often refers to the output of a model as a 'prediction confidence', the output signal of a neuron can only be interpreted as a single probability value. The lack of a confidence interval around a predicted value is generally not desirable in Medical Imaging tasks. Litjens et al. [25] have noted that an increasing number of deep learning methods in Medical Imaging are striving to learn deep models in an end-to-end manner. Whereas end-to-end learning is the epitome of Deep Learning, it is not certain if this is the right way to exploit this technology in Medical Imaging. To an extent, this conundrum is also hindering the widespread use of Deep Learning in Medical Imaging.

6 FUTURE DIRECTIONS

With the recent increasing trend of exploiting Deep Learning in Medical Imaging tasks, we are likely to see a large influx of papers in this area in the near future. Here, we provide guidelines and directions to help those works in dealing with the inherent challenges faced by Deep Learning in Medical Image Analysis. We draw our insights from the reviewed literature and the literature in the sister fields of Computer Vision, Pattern Recognition and Machine Learning. Due to the earlier use of Deep Learning in those fields, the techniques for dealing with the related challenges have considerably matured in those areas. Hence, Medical Image Analysis can readily benefit from those findings in setting fruitful future directions.

Our discussion in this Section is primarily aimed at providing guiding principles for the Medical Imaging community. Therefore, we limit it to the fundamental issues in Deep Learning. Based on the challenges mentioned in the preceding Section, and the insights from the parallel scientific fields, we present our discussion along three directions, addressing the following major questions. (1) How can Medical Image Analysis still benefit from Deep Learning in the absence of large-scale annotated datasets? (2) What can be done to develop large-scale Medical Imaging datasets? (3) What should be the broader outlook of this research direction to catapult it into taking full advantage of the advances in Deep Learning?

6.1 Dealing with smaller data size

6.1.1 Disentangling Medical Task Transfer Learning

Considering the obvious lack of large-scale annotated datasets, the Medical Imaging community has already started exploiting 'transfer learning' [287], [288], [289]. In transfer learning, one can learn a complex model using data from a source domain where large-scale annotated images are available (e.g. natural images). Then, the model is further fine-tuned with data of the target domain where only a small number of annotated images are available (e.g. medical images). It is clear from the literature that transfer learning is proving advantageous for Medical Image Analysis. Nevertheless, one promising recent development in transfer learning [290] in the Computer Vision literature remains completely unexplored for Medical Image Analysis.

Zamir et al. [290] recently showed that the performance of transfer learning can be improved by carefully selecting the source and target domains/tasks. By organizing different tasks so that deep models transfer well between them, they developed a so-called 'taskonomy' to guide the use of transfer learning for natural images. This concept has received significant appreciation in the Computer Vision community, resulting in the 'best paper' award for the authors at the prestigious IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. A similar concept is worth exploring for the data deprived Medical Imaging tasks. Disentangling medical tasks for transfer learning may prove very beneficial. Another related research direction that can help in dealing with smaller data sizes is to quantify the suitability of transfer learning between medical imaging and natural imaging tasks. A definitive understanding of the knowledge transfer abilities of the existing natural image models to medical tasks can have a huge impact on Medical Image Analysis using Deep Learning.

6.1.2 Wrapping Deep Features for Medical Imaging Tasks

The existing literature shows an increasing trend of training deep models for medical tasks in an 'end-to-end' manner. For Deep Learning, end-to-end modeling is generally more promising for domains where large-scale annotated data is available. Exploiting existing deep models as feature extractors, and then performing further learning on those features, is a much more promising direction in the absence of large-scale training datasets. There is considerable evidence in the Pattern Recognition literature that activation signals of deeper layers in neural networks often form highly expressive image features. For natural images, Akhtar et al. [291] demonstrated that features extracted from deep models can be used to learn further effective higher level features using techniques that require fewer training samples. They used the Dictionary Learning framework [292] to further wrap the deep features before using them with a classifier.

We note that the Medical Image Analysis literature has already seen reasonably successful attempts at using existing natural image deep models as feature extractors, e.g. [293], [294], [295]. However, such attempts generally feed the features extracted from the pre-trained models directly to a classifier. The direction we point towards entails post-processing of the deep features to better suit the requirements of the underlying Medical Image Analysis task.
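
As a rough illustration of this direction, the following scikit-learn sketch 'wraps' pre-extracted deep features with a learned dictionary and trains a conventional classifier on the resulting codes. The feature dimensionality, dictionary size and classifier are our assumptions, random placeholders stand in for real deep features, and this is not the method of [291], [292] itself:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning
    from sklearn.linear_model import LogisticRegression

    # Placeholder "deep features" (e.g. pooled CNN activations) and labels.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((200, 512))
    labels = rng.integers(0, 2, 200)

    # Wrap the deep features: learn a small dictionary, re-encode each sample as
    # sparse codes, then train a conventional classifier on the codes.
    dico = DictionaryLearning(n_components=64, alpha=1.0, max_iter=100, random_state=0)
    codes = dico.fit_transform(features)
    clf = LogisticRegression(max_iter=1000).fit(codes, labels)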

6.1.3 Training Partially Frozen Deep Networks

As a general principle in Machine Learning, a larger amount of training data is required to train more complex computational models. In Deep Learning, the network depth normally governs the model complexity, whereas deeper networks also have more parameters that require large-scale datasets for training. It is known that the layers of CNNs - the most relevant neural networks for image analysis - systematically break down images into their features, from a lower level of abstraction to a higher level of abstraction [296]. It is also known that the initial layers of CNNs learn very similar filters for a variety of natural images. These observations point towards the possibility of reducing the number



of learnable parameters in a network by freezing a few of its layers to parameter values that are likely to be similar for a variety of images. Those parameter values can be directly borrowed from other networks trained on similar tasks. The remainder of the network - which now has fewer parameters but the same complexity - can then be trained for the target task as normal. Training partially frozen networks for Medical Imaging tasks can mitigate the issues caused by the lack of large-scale annotated datasets.
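
A minimal PyTorch sketch of this idea is given below: the early layers of a network pre-trained on a related task are frozen, and only the remaining layers (and a new task-specific head) are optimized. The choice of ResNet18 and of the split point is ours, purely for illustration:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Borrow early-layer parameters from a pre-trained network and freeze them,
    # so only the remaining (task-specific) layers are learned from the smaller
    # medical dataset. The backbone and split point are assumptions.
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for name, param in net.named_parameters():
        if name.startswith(("conv1", "bn1", "layer1", "layer2")):
            param.requires_grad = False          # freeze generic low-level filters
    net.fc = nn.Linear(net.fc.in_features, 2)    # new head for the target medical task

    # Only the unfrozen parameters are passed to the optimizer.
    optimizer = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=1e-4)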

6.1.4 Using GANs for Synthetic Data Generation

Generative Adversarial Networks (GANs) [39] are currently receiving tremendous attention from the Computer Vision community for their ability to mimic the distributions from which images are sampled. Among other uses, the GAN framework can generate realistic synthetic images for any domain. These images can then be used to train deeper models for that domain, which generally outperform models trained with only the (limited) original data. This property of GANs is of particular interest for Medical Image Analysis, and we can therefore expect a large number of future contributions in Medical Imaging that exploit GANs. In fact, our literature review already found a few initial applications of GANs in medical image analysis [297], [298], [200], [299]. However, extra care should be taken while exploiting the GAN framework. It should be noted that GANs do not actually learn the original distribution of images, rather they only mimic it. Hence, the synthetic images generated by GANs can still be very different from the original images. Therefore, instead of stopping at a final model trained with data that includes GAN-generated samples, it is often better to finally fine-tune such a model with only the original images.
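For readers unfamiliar with the adversarial training loop underlying this idea, the sketch below shows one generator/discriminator update on flattened images. The tiny fully connected architectures, image size and helper name gan_step are illustrative assumptions, not taken from the surveyed works.

# Minimal GAN training step sketch (assumed PyTorch API).
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 128 * 128          # flattened single-channel images (assumption)

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_batch):
    """One adversarial update: discriminator first, then generator."""
    b = real_batch.size(0)
    real_lbl, fake_lbl = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: distinguish real medical images from generated ones.
    fake = generator(torch.randn(b, latent_dim))
    loss_d = bce(discriminator(real_batch), real_lbl) + \
             bce(discriminator(fake.detach()), fake_lbl)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: try to fool the updated discriminator.
    loss_g = bce(discriminator(fake), real_lbl)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# After training, generator(torch.randn(n, latent_dim)) yields synthetic samples;
# as noted above, a model trained with them is best fine-tuned on the original data.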

6.1.5 Miscellaneous Data Augmentation Techniques

In general, the Computer Vision and Pattern Recognition literature has also developed a few elementary data augmentation techniques that have shown improvement in the performance of deep models. Whereas these techniques are generally not as effective as sophisticated methods, such as using GANs to increase data samples, they are still worth taking advantage of. We list the most successful techniques below, with a brief code sketch after the list. Again, we note that some of these methods have already proven their effectiveness in the context of Medical Image Analysis:

• Image flipping: A simple sideways flip of images doubles the number of training samples, which often results in a better model. For medical images, a top-down flip is also a possibility due to the nature of the images.

• Image cropping: Cropping different areas of a larger image into smaller images and treating each of the cropped versions as an original image also benefits deep models. Taking five crops of equal dimensions from an image is a popular strategy in the Computer Vision literature; the crops are made using the four corners and the central region of the image.

• Adversarial training: Very recently, it has been discovered that we can ‘fool’ deep models using adversarial images [2]. These images are carefully computed such that they appear the same as the original images to humans, yet a deep model is not able to recognize them. Whereas developing such images is a different research direction, one finding from that direction is that including these images in the training data can improve the performance of deep models [300]. Since adversarial examples are generated from the original images, they provide a useful data augmentation method that can be harnessed for Medical Imaging tasks.

• Rotation and random noise addition: In the context of 3D data, rotating the 3D scans and adding a small amount of random noise (emulating jitters) are also considered useful data augmentation strategies [234].
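The sketch referred to above combines the flip, crop, rotation and noise operations, assuming the torchvision transforms API; equivalent functionality exists in most imaging toolkits. Adversarial-example generation is omitted because it additionally requires a trained model.

# Minimal sketch of the elementary augmentations listed above
# (assumed torchvision API; parameters are illustrative).
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # sideways flip
    transforms.RandomVerticalFlip(p=0.5),     # top-down flip, acceptable for many medical images
    transforms.RandomRotation(degrees=10),    # small rotations
    transforms.FiveCrop(size=200),            # four corners + central crop
])

# FiveCrop returns a tuple of five crops; each is treated as a training image in its own right.
to_tensor = transforms.ToTensor()
add_noise = lambda t: t + 0.01 * torch.randn_like(t)   # emulate small jitters

# Example usage on a PIL image `img` (placeholder):
# crops = augment(img)
# tensors = [add_noise(to_tensor(c)) for c in crops]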

6.2 Enhancing dataset sizes

Whereas the techniques discussed in Section 6.1 can alleviate the issues caused by smaller training datasets, the root cause of those problems can only be eliminated by acquiring Deep Learning compatible large-scale annotated datasets for Medical Image Analysis tasks. Considering that Deep Learning has started outperforming human experts in Medical Image Analysis tasks [301], there is a strong need to implement protocols that make medical reports readily convertible to formats useful for training computational models, especially Deep Learning models. In this context, techniques from the fields of Document Analysis [302] and Natural Language Processing (NLP) [303] can be used to alleviate the extra burden falling on medical experts due to the implementation of such protocols.

Besides generating new data at scales useful for learning computational models, it is also important to take advantage of the existing medical records for exploiting the current advances in Deep Learning. To handle the large volume of unorganized data (in terms of compatibility with Machine Learning), data mining with humans-in-the-loop [304] and active learning [27] can prove beneficial. Advances in Document Analysis and NLP can also be exploited for this task.
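As one concrete form of the human-in-the-loop idea, the sketch below shows uncertainty-based active learning: the model's least confident predictions on unlabelled data are routed to an expert for annotation. The function name, budget and selection criterion are illustrative assumptions rather than a method from the surveyed works.

# Minimal uncertainty-sampling sketch for active learning (assumed PyTorch API).
import torch

def select_for_annotation(model, unlabelled_loader, budget=100):
    """Return indices of the `budget` samples the model is least confident about.
    Assumes the loader iterates the unlabelled pool in a fixed order (no shuffling)."""
    model.eval()
    scores = []
    with torch.no_grad():
        for batch_idx, (images, _) in enumerate(unlabelled_loader):
            probs = torch.softmax(model(images), dim=1)
            confidence, _ = probs.max(dim=1)           # top-class probability per sample
            for j, c in enumerate(confidence):
                scores.append((c.item(), batch_idx * unlabelled_loader.batch_size + j))
    scores.sort()                                      # lowest confidence first
    return [sample_idx for _, sample_idx in scores[:budget]]

# The returned indices would be annotated by an expert (human-in-the-loop), moved
# into the training set, and the model retrained before the loop repeats.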

6.3 Broader outlook

We can make one important observation regarding Deep Learning research by exploring the literature of different research fields: the advancement of Deep Learning research has often experienced a quantum leap under breakthroughs provided by different sister fields. For example, the ‘residual learning’ concept [22] that enabled very deep networks was first introduced in the Computer Vision literature. This idea (along with other breakthroughs in core Machine Learning research) eventually enabled the tabula rasa algorithm of AlphaGo Zero [305]. Following up on this observation, we can argue that significant advances can be made in Deep Learning research in the context of Medical Image Analysis if researchers from the sister fields of Computer Vision and Machine Learning are able to better understand Medical Image Analysis tasks.

Indeed, the Medical Imaging community already involves experts from other related fields. However, this involvement is at a smaller scale. For the involvement of the broader Machine Learning and Computer Vision communities, a major hindrance is the jargon of the Medical Imaging literature; medical literature is not easily understood by experts from other fields. One effective method to mitigate this problem can be the regular organization of Medical Imaging workshops and tutorials at the reputed Computer Vision and Machine Learning conferences, e.g. IEEE CVPR, ICCV, NeurIPS and ICML. These events should particularly focus on translating Medical Imaging problems for other communities in terms of their topics of interest.

Another effective strategy to take advantage of Deep Learning advances is to outsource Medical Imaging problems by organizing online challenges, e.g. Kaggle competitions. The authors are already aware of a few Kaggle competitions related to Medical Imaging, e.g. Histopathologic cancer detection. However, we can easily notice that Medical Imaging competitions normally attract fewer teams compared to other competitions - currently 361 for Histopathologic cancer detection. Generally, the number of teams is orders of magnitude lower for Medical Imaging competitions than for typical imaging competitions. In the authors' opinion, the strict Medical parlance adopted in organizing such competitions is the source of this problem. Explaining Medical Imaging tasks using terms more common among the Computer Vision and Machine Learning communities can greatly help in improving the popularity of Medical Imaging in those communities.

In short, one of the key strategies to fully exploit Deep Learning advances in Medical Imaging is to involve experts from other fields, especially Computer Vision and Machine Learning, in solving Medical Imaging tasks. To that end, the Medical Imaging community must put extra effort into making its literature, online competitions and the overall outlook of the field more understandable to experts from other fields. Deep Learning is being dubbed the ‘modern electricity’ by experts. In the future, its ubiquitous nature will benefit most those fields that are better understood by the wider communities.

7 CONCLUSION

This article presented a review of the recent literature on Deep Learning for Medical Imaging. It contributed along three major directions. First, we presented an instructive introduction to the core concepts of Deep Learning. Keeping in view the general lack of understanding of the Deep Learning framework among Medical Imaging researchers, we kept our discussion intuitive. This part of the paper can be understood as a tutorial on the Deep Learning concepts commonly used in Medical Imaging. The second part of the paper presented a comprehensive overview of the approaches in Medical Imaging that employ Deep Learning. Due to the availability of other review articles covering work until the year 2017, we mainly focused on the literature published in the year 2018. The third major part of the article discussed the major challenges faced by Deep Learning in Medical Image Analysis, and the future directions to address those challenges.

Besides focusing on the very recent literature, this article also differs from the existing related surveys in that it provides a Computer Vision/Machine Learning perspective on the use of Deep Learning in Medical Image Analysis. Using that perspective, we are not only able to provide an intuitive understanding of the core Deep Learning concepts for the Medical community, we also highlight the root cause of the challenges faced in this direction and recommend fruitful future directions by drawing on insights from multiple scientific fields.

From the reviewed literature, we can single out the ‘lack of large-scale annotated datasets’ as the major problem for Deep Learning in Medical Image Analysis. We have discussed and recommended multiple strategies for the Medical Imaging community that have been adopted to address similar problems in the sister scientific fields. We conclude that Medical Imaging can benefit significantly more from Deep Learning by encouraging collaborative research with the Computer Vision and Machine Learning research communities.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.

[2] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning in computer vision: A survey,” IEEE Access, vol. 6, pp. 14 410–14 430, 2018.

[3] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequencelearning with neural networks,” in Advances in neural informationprocessing systems, 2014, pp. 3104–3112.

[4] T. Ciodaro, D. Deva, J. De Seixas, and D. Damazio, “Onlineparticle detection with neural networks based on topologicalcalorimetry information,” in Journal of physics: conference series,vol. 368, no. 1. IOP Publishing, 2012, p. 012030.

[5] H. Y. Xiong, B. Alipanahi, L. J. Lee, H. Bretschneider, D. Merico,R. K. Yuen, Y. Hua, S. Gueroussov, H. S. Najafabadi, T. R. Hugheset al., “The human splicing code reveals new insights into thegenetic determinants of disease,” Science, vol. 347, no. 6218, p.1254806, 2015.

[6] M. Helmstaedter, K. L. Briggman, S. C. Turaga, V. Jain, H. S.Seung, and W. Denk, “Connectomic reconstruction of the innerplexiform layer in the mouse retina,” Nature, vol. 500, no. 7461,p. 168, 2013.

[7] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik,“Deep neural nets as a method for quantitative structure–activityrelationships,” Journal of chemical information and modeling, vol. 55,no. 2, pp. 263–274, 2015.

[8] R. J. Schalkoff, Artificial neural networks, vol. 1.

[9] F. Rosenblatt, “Principles of neurodynamics. Perceptrons and the theory of brain mechanisms,” Cornell Aeronautical Lab Inc., Buffalo, NY, Tech. Rep., 1961.

[10] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learningrepresentations by back-propagating errors,” nature, vol. 323, no.6088, p. 533, 1986.

[11] D. P. Kingma and J. Ba, “Adam: A method for stochastic opti-mization,” arXiv preprint arXiv:1412.6980, 2014.

[12] N. Qian, “On the momentum term in gradient descent learningalgorithms,” Neural networks, vol. 12, no. 1, pp. 145–151, 1999.

[13] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deeplearning. MIT press Cambridge, 2016, vol. 1.

[14] R. Beale and T. Jackson, Neural Computing-an introduction. CRCPress, 1990.

[15] H. Becker, W. Nettleton, P. Meyers, J. Sweeney, and C. Nice,“Digital computer determination of a medical diagnostic indexdirectly from chest x-ray images,” IEEE Transactions on BiomedicalEngineering, no. 3, pp. 67–72, 1964.

[16] G. S. Lodwick, T. E. Keats, and J. P. Dorst, “The coding of roent-gen images for computer analysis as applied to lung cancer,”Radiology, vol. 81, no. 2, pp. 185–200, 1963.

[17] Y. Wu, M. L. Giger, K. Doi, C. J. Vyborny, R. A. Schmidt, and C. E.Metz, “Artificial neural networks in mammography: applicationto decision making in the diagnosis of breast cancer.” Radiology,vol. 187, no. 1, pp. 81–87, 1993.

[18] S.-C. Lo, S.-L. Lou, J.-S. Lin, M. T. Freedman, M. V. Chien, andS. K. Mun, “Artificial convolution neural network techniquesand applications for lung nodule detection,” IEEE Transactionson Medical Imaging, vol. 14, no. 4, pp. 711–718, 1995.


[19] B. Sahiner, H.-P. Chan, N. Petrick, D. Wei, M. A. Helvie, D. D.Adler, and M. M. Goodsitt, “Classification of mass and normalbreast tissue: a convolution neural network classifier with spatialdomain and texture images,” IEEE transactions on Medical Imag-ing, vol. 15, no. 5, pp. 598–610, 1996.

[20] H.-P. Chan, S.-C. B. Lo, B. Sahiner, K. L. Lam, and M. A.Helvie, “Computer-aided detection of mammographic microcal-cifications: Pattern recognition with an artificial neural network,”Medical Physics, vol. 22, no. 10, pp. 1555–1567, 1995.

[21] W. Zhang, K. Doi, M. L. Giger, R. M. Nishikawa, and R. A.Schmidt, “An improved shift-invariant artificial neural networkfor computerized detection of clustered microcalcifications indigital mammograms,” Medical Physics, vol. 23, no. 4, pp. 595–601, 1996.

[22] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learningfor image recognition,” in Proceedings of the IEEE conference oncomputer vision and pattern recognition, 2016, pp. 770–778.

[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classifi-cation with deep convolutional neural networks,” in Advances inneural information processing systems, 2012, pp. 1097–1105.

[24] B. Sahiner, A. Pezeshk, L. M. Hadjiiski, X. Wang, K. Drukker,K. H. Cha, R. M. Summers, and M. L. Giger, “Deep learning inmedical imaging and radiation therapy,” Medical physics, 2018.

[25] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi,M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I.Sanchez, “A survey on deep learning in medical image analysis,”Medical image analysis, vol. 42, pp. 60–88, 2017.

[26] J. Ker, L. Wang, J. Rao, and T. Lim, “Deep learning applications inmedical image analysis,” IEEE Access, vol. 6, pp. 9375–9389, 2018.

[27] X. Zhu, “Semi-supervised learning literature survey,” ComputerScience, University of Wisconsin-Madison, vol. 2, no. 3, p. 4, 2006.

[28] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcementlearning: A survey,” Journal of artificial intelligence research, vol. 4,pp. 237–285, 1996.

[29] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-basedlearning applied to document recognition,” Proceedings of theIEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[30] I. Sobel and G. Feldman, “A 3x3 isotropic gradient operator forimage processing,” a talk at the Stanford Artificial Project in, pp.271–272, 1968.

[31] K. Simonyan and A. Zisserman, “Very deep convolutionalnetworks for large-scale image recognition,” arXiv preprintarXiv:1409.1556, 2014.

[32] S. Sun, N. Akhtar, H. Song, A. Mian, and M. Shah, “Deepaffinity network for multiple object tracking,” arXiv preprintarXiv:1810.11780, 2018.

[33] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deepnetwork training by reducing internal covariate shift,” arXivpreprint arXiv:1502.03167, 2015.

[34] S. Hochreiter and J. Schmidhuber, “Long short-term memory,”Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[35] C. Poultney, S. Chopra, Y. L. Cun et al., “Efficient learning ofsparse representations with an energy-based model,” in Advancesin neural information processing systems, 2007, pp. 1137–1144.

[36] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol,“Stacked denoising autoencoders: Learning useful representa-tions in a deep network with a local denoising criterion,” Journalof machine learning research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[37] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,”arXiv preprint arXiv:1312.6114, 2013.

[38] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contrac-tive auto-encoders: Explicit invariance during feature extraction,”in Proceedings of the 28th International Conference on InternationalConference on Machine Learning. Omnipress, 2011, pp. 833–840.

[39] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adver-sarial nets,” in Advances in neural information processing systems,2014, pp. 2672–2680.

[40] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, andR. Webb, “Learning from simulated and unsupervised imagesthrough adversarial training.” in CVPR, vol. 2, no. 4, 2017, p. 5.

[41] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krish-nan, “Unsupervised pixel-level domain adaptation with genera-tive adversarial networks,” in The IEEE Conference on ComputerVision and Pattern Recognition (CVPR), vol. 1, no. 2, 2017, p. 7.

[42] R. A. Yeh, C. Chen, T.-Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do, “Semantic image inpainting with deepgenerative models.” in CVPR, vol. 2, no. 3, 2017, p. 4.

[43] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov,D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper withconvolutions,” in Proceedings of the IEEE conference on computervision and pattern recognition, 2015, pp. 1–9.

[44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna,“Rethinking the inception architecture for computer vision,” inProceedings of the IEEE conference on computer vision and patternrecognition, 2016, pp. 2818–2826.

[45] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections onlearning.” in AAAI, vol. 4, 2017, p. 12.

[46] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger,“Densely connected convolutional networks.” in CVPR, vol. 1,no. 2, 2017, p. 3.

[47] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional net-works for semantic segmentation,” in Proceedings of the IEEEconference on computer vision and pattern recognition, 2015, pp.3431–3440.

[48] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutionalnetworks for biomedical image segmentation,” in InternationalConference on Medical image computing and computer-assisted inter-vention. Springer, 2015, pp. 234–241.

[49] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean,M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: asystem for large-scale machine learning.” in OSDI, vol. 16, 2016,pp. 265–283.

[50] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito,Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automaticdifferentiation in pytorch,” 2017.

[51] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecturefor fast feature embedding,” in Proceedings of the 22nd ACMinternational conference on Multimedia. ACM, 2014, pp. 675–678.

[52] F. Chollet et al., “Keras,” https://keras.io, 2015.

[53] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688

[54] A. Vedaldi and K. Lenc, “Matconvnet: Convolutional neuralnetworks for matlab,” in Proceedings of the 23rd ACM internationalconference on Multimedia. ACM, 2015, pp. 689–692.

[55] R. Collobert, S. Bengio, and J. Mariethoz, “Torch: a modularmachine learning software library,” Idiap, Tech. Rep., 2002.

[56] J. Islam and Y. Zhang, “Early diagnosis of alzheimers disease:A neuroimaging study with deep learning architectures,” inProceedings of the IEEE Conference on Computer Vision and PatternRecognition Workshops, 2018, pp. 1881–1883.

[57] D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, andR. L. Buckner, “Open access series of imaging studies: longi-tudinal mri data in nondemented and demented older adults,”Journal of cognitive neuroscience, vol. 22, no. 12, pp. 2677–2684,2010.

[58] R. C. Petersen, P. Aisen, L. A. Beckett, M. Donohue, A. Gamst,D. J. Harvey, C. Jack, W. Jagust, L. Shaw, A. Toga et al.,“Alzheimer’s disease neuroimaging initiative (adni): clinicalcharacterization,” Neurology, vol. 74, no. 3, pp. 201–209, 2010.

[59] X. Chen and E. Konukoglu, “Unsupervised detection of lesionsin brain mri using constrained adversarial auto-encoders,” arXivpreprint arXiv:1806.04972, 2018.

[60] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Ad-versarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015.

[61] Z. Alaverdyan, J. Jung, R. Bouet, and C. Lartizien, “Regularizedsiamese neural network for unsupervised outlier detection onbrain multiparametric magnetic resonance imaging: applicationto epilepsy lesion screening,” 2018.

[62] A. Panda, T. K. Mishra, and V. G. Phaniharam, “Automatedbrain tumor detection using discriminative clustering based mrisegmentation,” in Smart Innovations in Communication and Compu-tational Sciences. Springer, 2019, pp. 117–126.

[63] K. R. Laukamp, F. Thiele, G. Shakirin, D. Zopfs, A. Faymonville,M. Timmer, D. Maintz, M. Perkuhn, and J. Borggrefe, “Fullyautomated detection and segmentation of meningiomas usingdeep learning on routine multiparametric mri,” European radi-ology, vol. 29, no. 1, pp. 124–132, 2019.


[64] B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken,N. Karssemeijer, G. Litjens, J. A. Van Der Laak, M. Hermsen,Q. F. Manson, M. Balkenhol et al., “Diagnostic assessment of deeplearning algorithms for detection of lymph node metastases inwomen with breast cancer,” Jama, vol. 318, no. 22, pp. 2199–2210,2017.

[65] T.-C. Chiang, Y.-S. Huang, R.-T. Chen, C.-S. Huang, and R.-F.Chang, “Tumor detection in automated breast ultrasound using3-d cnn and prioritized candidate aggregation,” IEEE Transactionson Medical Imaging, vol. 38, no. 1, pp. 240–249, 2019.

[66] M. U. Dalmıs, S. Vreemann, T. Kooi, R. M. Mann, N. Karssemeijer,and A. Gubern-Merida, “Fully automated detection of breastcancer in screening mri using convolutional neural networks,”Journal of Medical Imaging, vol. 5, no. 1, p. 014502, 2018.

[67] J. Zhang, E. H. Cain, A. Saha, Z. Zhu, and M. A. Mazurowski,“Breast mass detection in mammography and tomosynthesisvia fully convolutional network-based heatmap regression,” inMedical Imaging 2018: Computer-Aided Diagnosis, vol. 10575. In-ternational Society for Optics and Photonics, 2018, p. 1057525.

[68] F. Li, H. Chen, Z. Liu, X. Zhang, and Z. Wu, “Fully automateddetection of retinal disorders by image-based deep learning,”Graefe’s Archive for Clinical and Experimental Ophthalmology, pp.1–11, 2019.

[69] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Im-agenet: A large-scale hierarchical image database,” in ComputerVision and Pattern Recognition, 2009. CVPR 2009. IEEE Conferenceon. Ieee, 2009, pp. 248–255.

[70] M. D. Abramoff, Y. Lou, A. Erginay, W. Clarida, R. Amelon,J. C. Folk, and M. Niemeijer, “Improved automated detectionof diabetic retinopathy on a publicly available dataset throughintegration of deep learning,” Investigative ophthalmology & visualscience, vol. 57, no. 13, pp. 5200–5206, 2016.

[71] R. Gargeya and T. Leng, “Automated identification of diabeticretinopathy using deep learning,” Ophthalmology, vol. 124, no. 7,pp. 962–969, 2017.

[72] T. Schlegl, S. M. Waldstein, H. Bogunovic, F. Endstraßer,A. Sadeghipour, A.-M. Philip, D. Podkowinski, B. S. Gerendas,G. Langs, and U. Schmidt-Erfurth, “Fully automated detectionand quantification of macular fluid in oct using deep learning,”Ophthalmology, vol. 125, no. 4, pp. 549–558, 2018.

[73] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang,S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., “Identi-fying medical diagnoses and treatable diseases by image-baseddeep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.

[74] U. Food, D. Administration et al., “Fda permits marketing of ar-tificial intelligence-based device to detect certain diabetes-relatedeye problems,” News Release, April, 2018.

[75] Z. Li, Y. He, S. Keel, W. Meng, R. T. Chang, and M. He, “Efficacyof a deep learning system for detecting glaucomatous opticneuropathy based on color fundus photographs,” Ophthalmology,2018.

[76] M. Christopher, A. Belghith, C. Bowd, J. A. Proudfoot, M. H.Goldbaum, R. N. Weinreb, C. A. Girkin, J. M. Liebmann, andL. M. Zangwill, “Performance of deep learning architectures andtransfer learning for detecting glaucomatous optic neuropathyin fundus photographs,” Scientific reports, vol. 8, no. 1, p. 16685,2018.

[77] P. Khojasteh, L. A. P. Junior, T. Carvalho, E. Rezende, B. Aliah-mad, J. P. Papa, and D. K. Kumar, “Exudate detection in fundusimages using deeply-learnable features,” Computers in biology andmedicine, vol. 104, pp. 62–69, 2019.

[78] R. Kalviainen and H. Uusitalo, “Diaretdb1 diabetic retinopathydatabase and evaluation protocol,” in Medical Image Understand-ing and Analysis, vol. 2007. Citeseer, 2007, p. 61.

[79] E. Decenciere, G. Cazuguel, X. Zhang, G. Thibault, J.-C. Klein,F. Meyer, B. Marcotegui, G. Quellec, M. Lamard, R. Danno et al.,“Teleophta: Machine learning and image processing methods forteleophthalmology,” Irbm, vol. 34, no. 2, pp. 196–203, 2013.

[80] W. Zhu, Y. S. Vang, Y. Huang, and X. Xie, “Deepem: Deep3d convnets with em for weakly supervised pulmonary noduledetection,” arXiv preprint arXiv:1805.05373, 2018.

[81] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van denBogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurtset al., “Validation, comparison, and combination of algorithmsfor automatic detection of pulmonary nodules in computed to-mography images: the luna16 challenge,” Medical image analysis,vol. 42, pp. 1–13, 2017.

[82] I. Oksuz, B. Ruijsink, E. Puyol-Anton, A. Bustin, G. Cruz, C. Pri-eto, D. Rueckert, J. A. Schnabel, and A. P. King, “Deep learningusing k-space based data augmentation for automated cardiac mrmotion artefact detection,” in International Conference on MedicalImage Computing and Computer-Assisted Intervention. Springer,2018, pp. 250–258.

[83] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri,“Learning spatiotemporal features with 3d convolutional net-works,” in Proceedings of the IEEE international conference on com-puter vision, 2015, pp. 4489–4497.

[84] Z. Li, C. Wang, M. Han, Y. Xue, W. Wei, L.-J. Li, and F.-F. Li,“Thoracic disease identification and localization with limitedsupervision,” arXiv preprint arXiv:1711.06373, 2017.

[85] X. Yi, S. Adams, P. Babyn, and A. Elnajmi, “Automatic catheterdetection in pediatric x-ray images using a scale-recurrent net-work and synthetic data,” arXiv preprint arXiv:1806.00921, 2018.

[86] A. Masood, B. Sheng, P. Li, X. Hou, X. Wei, J. Qin, and D. Feng,“Computer-assisted decision support system in pulmonary can-cer detection and stage classification on ct images,” Journal ofbiomedical informatics, vol. 79, pp. 117–128, 2018.

[87] G. Gonzalez, S. Y. Ash, G. Vegas-Sanchez-Ferrero,J. Onieva Onieva, F. N. Rahaghi, J. C. Ross, A. Dıaz, R. SanJose Estepar, and G. R. Washko, “Disease staging and prognosisin smokers using deep learning in chest computed tomography,”American journal of respiratory and critical care medicine, vol. 197,no. 2, pp. 193–203, 2018.

[88] C. Gonzalez-Gonzalo, B. Liefers, B. van Ginneken, and C. I.Sanchez, “Improving weakly-supervised lesion localization withiterative saliency map refinement,” 2018.

[89] M. Winkels and T. S. Cohen, “3d g-cnns for pulmonary noduledetection,” arXiv preprint arXiv:1804.04656, 2018.

[90] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R.Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A.Hoffman et al., “The lung image database consortium (lidc) andimage database resource initiative (idri): a completed referencedatabase of lung nodules on ct scans,” Medical physics, vol. 38,no. 2, pp. 915–931, 2011.

[91] A. Alansary, O. Oktay, Y. Li, L. Le Folgoc, B. Hou, G. Vaillant,B. Glocker, B. Kainz, and D. Rueckert, “Evaluating reinforcementlearning agents for anatomical landmark detection,” 2018.

[92] M. Ferlaino, C. A. Glastonbury, C. Motta-Mejia, M. Vatish,I. Granne, S. Kennedy, C. M. Lindgren, and C. Nellaker, “Towardsdeep cellular phenotyping in placental histology,” arXiv preprintarXiv:1804.03270, 2018.

[93] F.-C. Ghesu, B. Georgescu, Y. Zheng, S. Grbic, A. Maier, J. Horneg-ger, and D. Comaniciu, “Multi-scale deep reinforcement learningfor real-time 3d-landmark detection in ct scans,” IEEE transactionson pattern analysis and machine intelligence, vol. 41, no. 1, pp. 176–189, 2019.

[94] S. Hoo-Chang, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues,J. Yao, D. Mollura, and R. M. Summers, “Deep convolutionalneural networks for computer-aided detection: Cnn architectures,dataset characteristics and transfer learning,” IEEE transactions onmedical imaging, vol. 35, no. 5, p. 1285, 2016.

[95] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. Snead,I. A. Cree, and N. M. Rajpoot, “Locality sensitive deep learningfor detection and classification of nuclei in routine colon cancerhistology images,” IEEE transactions on medical imaging, vol. 35,no. 5, pp. 1196–1206, 2016.

[96] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J.Van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sanchez, andB. van Ginneken, “Pulmonary nodule detection in ct images: falsepositive reduction using multi-view convolutional networks,”IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1160–1169,2016.

[97] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madab-hushi, “Stacked sparse autoencoder (ssae) for nuclei detection onbreast cancer histopathology images,” IEEE transactions on medicalimaging, vol. 35, no. 1, pp. 119–130, 2016.

[98] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck,“Deep learning for identifying metastatic breast cancer,” arXivpreprint arXiv:1606.05718, 2016.

[99] T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Merida, C. I.Sanchez, R. Mann, A. den Heeten, and N. Karssemeijer, “Largescale deep learning for computer aided detection of mammo-graphic lesions,” Medical image analysis, vol. 35, pp. 303–312, 2017.


[100] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan,D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., “Chexnet:Radiologist-level pneumonia detection on chest x-rays with deeplearning,” arXiv preprint arXiv:1711.05225, 2017.

[101] Y. Liu, K. Gadepalli, M. Norouzi, G. E. Dahl, T. Kohlberger,A. Boyko, S. Venugopalan, A. Timofeev, P. Q. Nelson, G. S. Cor-rado et al., “Detecting cancer metastases on gigapixel pathologyimages,” arXiv preprint arXiv:1703.02442, 2017.

[102] M. Ghafoorian, N. Karssemeijer, T. Heskes, M. Bergkamp,J. Wissink, J. Obels, K. Keizer, F.-E. de Leeuw, B. van Ginneken,E. Marchiori et al., “Deep multi-scale location-aware 3d convo-lutional neural networks for automated detection of lacunes ofpresumed vascular origin,” NeuroImage: Clinical, vol. 14, pp. 391–399, 2017.

[103] Q. Dou, H. Chen, L. Yu, J. Qin, and P.-A. Heng, “Multilevelcontextual 3-d cnns for false positive reduction in pulmonarynodule detection,” IEEE Transactions on Biomedical Engineering,vol. 64, no. 7, pp. 1558–1567, 2017.

[104] J. Zhang, M. Liu, and D. Shen, “Detecting anatomical landmarksfrom limited medical imaging data using two-stage task-orienteddeep neural networks,” IEEE Transactions on Image Processing,vol. 26, no. 10, pp. 4753–4764, 2017.

[105] A. Katzmann, A. Muehlberg, M. Suehling, D. Noerenberg, J. W.Holch, V. Heinemann, and H.-M. Gross, “Predicting lesiongrowth and patient survival in colorectal cancer patients usingdeep neural networks,” 2018.

[106] Q. Meng, C. Baumgartner, M. Sinclair, J. Housden, M. Rajchl,A. Gomez, B. Hou, N. Toussaint, V. Zimmer, J. Tan et al., “Auto-matic shadow detection in 2d ultrasound images,” in Data DrivenTreatment Response Assessment and Preterm, Perinatal, and PaediatricImage Analysis. Springer, 2018, pp. 66–75.

[107] Y. Horie, T. Yoshio, K. Aoyama, S. Yoshimizu, Y. Horiuchi,A. Ishiyama, T. Hirasawa, T. Tsuchida, T. Ozawa, S. Ishiharaet al., “Diagnostic outcomes of esophageal cancer by artificial in-telligence using convolutional neural networks,” Gastrointestinalendoscopy, vol. 89, no. 1, pp. 25–32, 2019.

[108] K. Yasaka, H. Akai, O. Abe, and S. Kiryu, “Deep learning withconvolutional neural network for differentiation of liver massesat dynamic contrast-enhanced ct: a preliminary study,” Radiology,vol. 286, no. 3, pp. 887–896, 2017.

[109] D. Zhang, J. Wang, J. H. Noble, and B. M. Dawant, “Accuratedetection of inner ears in head cts using a deep volume-to-volume regression network with false positive suppression anda shape-based constraint,” arXiv preprint arXiv:1806.04725, 2018.

[110] O. Cicek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ron-neberger, “3d u-net: learning dense volumetric segmentationfrom sparse annotation,” in International Conference on MedicalImage Computing and Computer-Assisted Intervention. Springer,2016, pp. 424–432.

[111] P. Rajpurkar, J. Irvin, A. Bagul, D. Ding, T. Duan, H. Mehta,B. Yang, K. Zhu, D. Laird, R. L. Ball et al., “Mura dataset: To-wards radiologist-level abnormality detection in musculoskeletalradiographs,” arXiv preprint arXiv:1712.06957, 2017.

[112] Y. Li and W. Ping, “Cancer metastasis detection with neuralconditional random field,” arXiv preprint arXiv:1806.07064, 2018.

[113] P. Bandi, O. Geessink, Q. Manson, M. van Dijk, M. Balkenhol,M. Hermsen, B. E. Bejnordi, B. Lee, K. Paeng, A. Zhong et al.,“From detection of individual metastases to classification oflymph node status at the patient level: the camelyon17 chal-lenge,” IEEE Transactions on Medical Imaging, 2018.

[114] N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A.Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kit-tler et al., “Skin lesion analysis toward melanoma detection: Achallenge at the 2017 international symposium on biomedicalimaging (isbi), hosted by the international skin imaging collab-oration (isic),” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15thInternational Symposium on. IEEE, 2018, pp. 168–172.

[115] M. Forouzanfar, N. Forghani, and M. Teshnehlab, “Parameteroptimization of improved fuzzy c-means clustering algorithmfor brain mr image segmentation,” Engineering Applications ofArtificial Intelligence, vol. 23, no. 2, pp. 160–168, 2010.

[116] R. Dey and Y. Hong, “Compnet: Complementary segmentationnetwork for brain mri extraction,” arXiv preprint arXiv:1804.00521,2018.

[117] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner, “Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults,” Journal of cognitive neuroscience, vol. 19, no. 9, pp. 1498–1507, 2007.

[118] J. Kleesiek, G. Urban, A. Hubert, D. Schwarz, K. Maier-Hein,M. Bendszus, and A. Biller, “Deep mri brain extraction: a 3dconvolutional neural network for skull stripping,” NeuroImage,vol. 129, pp. 460–469, 2016.

[119] X. Zhao, Y. Wu, G. Song, Z. Li, Y. Zhang, and Y. Fan, “Adeep learning model integrating fcnns and crfs for brain tumorsegmentation,” Medical image analysis, vol. 43, pp. 98–111, 2018.

[120] T. Nair, D. Precup, D. L. Arnold, and T. Arbel, “Exploringuncertainty measures in deep networks for multiple sclerosislesion detection and segmentation,” in International Conferenceon Medical Image Computing and Computer-Assisted Intervention.Springer, 2018, pp. 655–663.

[121] A. G. Roy, S. Conjeti, N. Navab, and C. Wachinger, “Inherentbrain segmentation quality control from fully convnet montecarlo sampling,” arXiv preprint arXiv:1804.07046, 2018.

[122] R. Robinson, O. Oktay, W. Bai, V. V. Valindria, M. M. Sanghvi,N. Aung, J. M. Paiva, F. Zemrak, K. Fung, E. Lukaschuk et al.,“Subject-level prediction of segmentation failure using real-timeconvolutional neural nets,” 2018.

[123] V. K. Singh, S. Romani, H. A. Rashwan, F. Akram, N. Pandey,M. Sarker, M. Kamal, J. T. Barrena, A. Saleh, M. Arenas et al.,“Conditional generative adversarial and convolutional networksfor x-ray breast mass segmentation and shape classification,”arXiv preprint arXiv:1805.10207, 2018.

[124] J. Zhang, A. Saha, B. J. Soher, and M. A. Mazurowski, “Au-tomatic deep learning-based normalization of breast dynamiccontrast-enhanced magnetic resonance images,” arXiv preprintarXiv:1807.02152, 2018.

[125] K. Men, T. Zhang, X. Chen, B. Chen, Y. Tang, S. Wang, Y. Li, andJ. Dai, “Fully automatic and robust segmentation of the clinicaltarget volume for radiotherapy of breast cancer using big dataand deep learning,” Physica Medica, vol. 50, pp. 13–19, 2018.

[126] J. Lee and R. M. Nishikawa, “Automated mammographic breastdensity estimation using a fully convolutional network,” Medicalphysics, vol. 45, no. 3, pp. 1178–1190, 2018.

[127] Y. Zhang and A. Chung, “Deep supervision with additionallabels for retinal vessel segmentation task,” arXiv preprintarXiv:1806.02132, 2018.

[128] J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, andB. Van Ginneken, “Ridge-based vessel segmentation in colorimages of the retina,” IEEE transactions on medical imaging, vol. 23,no. 4, pp. 501–509, 2004.

[129] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R.Rudnicka, C. G. Owen, and S. A. Barman, “An ensembleclassification-based approach applied to retinal blood vessel seg-mentation,” IEEE Transactions on Biomedical Engineering, vol. 59,no. 9, pp. 2538–2548, 2012.

[130] A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating bloodvessels in retinal images by piecewise threshold probing of amatched filter response,” IEEE Transactions on Medical imaging,vol. 19, no. 3, pp. 203–210, 2000.

[131] J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov,N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. ODonoghue,D. Visentin et al., “Clinically applicable deep learning for diagno-sis and referral in retinal disease,” Nature medicine, vol. 24, no. 9,p. 1342, 2018.

[132] T. J. Jebaseeli, C. A. D. Durai, and J. D. Peter, “Segmentation ofretinal blood vessels from ophthalmologic diabetic retinopathyimages,” Computers & Electrical Engineering, vol. 73, pp. 245–258,2019.

[133] X. Liu, J. Cao, T. Fu, Z. Pan, W. Hu, K. Zhang, and J. Liu, “Semi-supervised automatic segmentation of layer and fluid region inretinal optical coherence tomography images using adversariallearning,” IEEE Access, vol. 7, pp. 3046–3061, 2019.

[134] J. Duan, J. Schlemper, W. Bai, T. J. Dawes, G. Bello, G. Doumou,A. De Marvao, D. P. ORegan, and D. Rueckert, “Deep nested levelsets: Fully automated segmentation of cardiac mr images in pa-tients with pulmonary hypertension,” in International Conferenceon Medical Image Computing and Computer-Assisted Intervention.Springer, 2018, pp. 595–603.

[135] W. Bai, M. Sinclair, G. Tarroni, O. Oktay, M. Rajchl, G. Vail-lant, A. M. Lee, N. Aung, E. Lukaschuk, M. M. Sanghvi et al.,“Human-level cmr image analysis with deep fully convolutionalnetworks,” arXiv preprint arXiv:1710.09289, 2017.


[136] P. Krahenbuhl and V. Koltun, “Efficient inference in fully con-nected crfs with gaussian edge potentials,” in Advances in neuralinformation processing systems, 2011, pp. 109–117.

[137] W. Bai, H. Suzuki, C. Qin, G. Tarroni, O. Oktay, P. M. Matthews,and D. Rueckert, “Recurrent neural networks for aortic imagesequence segmentation with sparse annotations,” in InternationalConference on Medical Image Computing and Computer-Assisted In-tervention. Springer, 2018, pp. 586–594.

[138] A. Chartsias, T. Joyce, G. Papanastasiou, S. Semple, M. Williams,D. Newby, R. Dharmakumar, and S. A. Tsaftaris, “Factorisedspatial representation learning: application in semi-supervisedmyocardial segmentation,” arXiv preprint arXiv:1803.07031, 2018.

[139] H. Kervadec, J. Dolz, M. Tang, E. Granger, Y. Boykov, and I. B.Ayed, “Constrained-cnn losses forweakly supervised segmenta-tion,” arXiv preprint arXiv:1805.04628.

[140] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “Enet: Adeep neural network architecture for real-time semantic segmen-tation,” arXiv preprint arXiv:1606.02147, 2016.

[141] T. Joyce, A. Chartsias, and S. A. Tsaftaris, “Deep multi-classsegmentation without ground-truth labels,” 2018.

[142] R. LaLonde and U. Bagci, “Capsules for object segmentation,”arXiv preprint arXiv:1804.04241, 2018.

[143] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing betweencapsules,” in Advances in Neural Information Processing Systems,2017, pp. 3856–3866.

[144] C.-M. Nam, J. Kim, and K. J. Lee, “Lung nodule segmentationwith convolutional neural network trained by simple diameterinformation,” 2018.

[145] N. Burlutskiy, F. Gu, L. K. Wilen, M. Backman, and P. Micke,“A deep learning framework for automatic diagnosis in lungcancer,” arXiv preprint arXiv:1807.10466, 2018.

[146] H. R. Roth, C. Shen, H. Oda, M. Oda, Y. Hayashi, K. Misawa, andK. Mori, “Deep learning and its application to medical imagesegmentation,” Medical Imaging Technology, vol. 36, no. 2, pp. 63–71, 2018.

[147] A. Taha, P. Lo, J. Li, and T. Zhao, “Kid-net: Convolution networksfor kidney vessels segmentation from ct-volumes,” arXiv preprintarXiv:1806.06769, 2018.

[148] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolu-tional neural networks for volumetric medical image segmenta-tion,” in 3D Vision (3DV), 2016 Fourth International Conference on.IEEE, 2016, pp. 565–571.

[149] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Mi-sawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al.,“Attention u-net: Learning where to look for the pancreas,” arXivpreprint arXiv:1804.03999, 2018.

[150] E. Gibson, F. Giganti, Y. Hu, E. Bonmati, S. Bandula, K. Gu-rusamy, B. R. Davidson, S. P. Pereira, M. J. Clarkson, and D. C.Barratt, “Towards image-guided pancreas and biliary endoscopy:automatic multi-organ segmentation on abdominal ct with densedilated networks,” in International Conference on Medical ImageComputing and Computer-Assisted Intervention. Springer, 2017,pp. 728–736.

[151] M. P. Heinrich, O. Oktay, and N. Bouteldja, “Obelisk-one kernelto solve nearly everything: Unified 3d binary convolutions forimage analysis,” 2018.

[152] M. P. Heinrich and O. Oktay, “Briefnet: deep pancreas segmenta-tion using binary sparse convolutions,” in International Conferenceon Medical Image Computing and Computer-Assisted Intervention.Springer, 2017, pp. 329–337.

[153] M. P. Heinrich, M. Blendowski, and O. Oktay, “Ternarynet: fasterdeep model inference without gpus for medical 3d segmentationusing sparse and binary convolutions,” International journal ofcomputer assisted radiology and surgery, pp. 1–10, 2018.

[154] Y. Zhang, S. Miao, T. Mansi, and R. Liao, “Task driven generativemodeling for unsupervised domain adaptation: Application tox-ray image segmentation,” arXiv preprint arXiv:1806.07201, 2018.

[155] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,”arXiv preprint, 2017.

[156] N. Tong, S. Gou, S. Yang, D. Ruan, and K. Sheng, “Fully automaticmulti-organ segmentation for head and neck cancer radiotherapyusing shape representation model constrained fully convolu-tional neural networks,” Medical physics, vol. 45, no. 10, pp. 4558–4567, 2018.

[157] D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, and D. Comaniciu, “Automatic liver segmentation using an adversarial image-to-image network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 507–515.

[158] N. Lessmann, B. van Ginneken, P. A. de Jong, and I. Isgum, “Iter-ative fully convolutional neural networks for automatic vertebrasegmentation,” arXiv preprint arXiv:1804.04383, 2018.

[159] D. Jin, Z. Xu, Y. Tang, A. P. Harrison, and D. J. Mollura, “Ct-realistic lung nodule simulation from 3d conditional genera-tive adversarial networks for robust lung segmentation,” arXivpreprint arXiv:1806.04051, 2018.

[160] A. P. Harrison, Z. Xu, K. George, L. Lu, R. M. Summers, and D. J.Mollura, “Progressive and multi-path holistically nested neuralnetworks for pathological lung segmentation from ct images,” inInternational Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 621–629.

[161] X. Xu, Q. Lu, L. Yang, S. Hu, D. Chen, Y. Hu, and Y. Shi, “Quan-tization of fully convolutional networks for accurate biomedicalimage segmentation,” Preprint at https://arxiv. org/abs/1803.04907,2018.

[162] K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng,Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchezet al., “Gland segmentation in colon histology images: The glaschallenge contest,” Medical image analysis, vol. 35, pp. 489–502,2017.

[163] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestiveannotation: A deep active learning framework for biomedicalimage segmentation,” in International Conference on Medical ImageComputing and Computer-Assisted Intervention. Springer, 2017, pp.399–407.

[164] D. Liu and T. Jiang, “Deep reinforcement learning for sur-gical gesture segmentation and classification,” arXiv preprintarXiv:1806.08089, 2018.

[165] S. M. R. Al Arif, K. Knapp, and G. Slabaugh, “Spnet: Shapeprediction using a fully convolutional neural network,” in In-ternational Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 430–439.

[166] S. Pereira, A. Pinto, V. Alves, and C. A. Silva, “Brain tumor seg-mentation using convolutional neural networks in mri images,”IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1240–1251,2016.

[167] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries,M. J. Benders, and I. Isgum, “Automatic segmentation of mr brainimages with a convolutional neural network,” IEEE transactionson medical imaging, vol. 35, no. 5, pp. 1252–1261, 2016.

[168] P. Liskowski and K. Krawiec, “Segmenting retinal blood vesselswith deep neural networks,” IEEE transactions on medical imaging,vol. 35, no. 11, pp. 2369–2380, 2016.

[169] J.-Z. Cheng, D. Ni, Y.-H. Chou, J. Qin, C.-M. Tiu, Y.-C. Chang, C.-S. Huang, D. Shen, and C.-M. Chen, “Computer-aided diagnosiswith deep learning architecture: applications to breast lesions inus images and pulmonary nodules in ct scans,” Scientific reports,vol. 6, p. 24454, 2016.

[170] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville,Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, “Brain tumorsegmentation with deep neural networks,” Medical image analysis,vol. 35, pp. 18–31, 2017.

[171] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D.Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesionsegmentation,” Medical image analysis, vol. 36, pp. 61–78, 2017.

[172] L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu,“Automatic segmentation of nine retinal layer boundaries in octimages of non-exudative amd patients using deep learning andgraph search,” Biomedical optics express, vol. 8, no. 5, pp. 2732–2744, 2017.

[173] B. Ibragimov and L. Xing, “Segmentation of organs-at-risks inhead and neck ct images using convolutional neural networks,”Medical physics, vol. 44, no. 2, pp. 547–557, 2017.

[174] M. Sarker, M. Kamal, H. A. Rashwan, S. F. Banu, A. Saleh,V. K. Singh, F. U. Chowdhury, S. Abdulwahab, S. Romani,P. Radeva et al., “Slsdeep: Skin lesion segmentation based ondilated residual and pyramid pooling networks,” arXiv preprintarXiv:1805.10241, 2018.

[175] D. Gutman, N. C. Codella, E. Celebi, B. Helba, M. Marchetti, N. Mishra, and A. Halpern, “Skin lesion analysis toward melanoma detection: A challenge at the international symposium on biomedical imaging (isbi) 2016, hosted by the international skin imaging collaboration (isic),” arXiv preprint arXiv:1605.01397, 2016.

[176] Z. Mirikharaji and G. Hamarneh, “Star shape prior in fullyconvolutional networks for skin lesion segmentation,” in Inter-national Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 737–745.

[177] Y. Yuan, “Automatic skin lesion segmentation with fullyconvolutional-deconvolutional networks,” arXiv preprintarXiv:1703.05165, 2017.

[178] F. Ambellan, A. Tack, M. Ehlke, and S. Zachow, “Automatedsegmentation of knee bone and cartilage combining statisticalshape knowledge and convolutional neural networks: Data fromthe osteoarthritis initiative,” Medical Image Analysis, 2018.

[179] A. Klein, J. Andersson, B. A. Ardekani, J. Ashburner, B. Avants,M.-C. Chiang, G. E. Christensen, D. L. Collins, J. Gee, P. Hellieret al., “Evaluation of 14 nonlinear deformation algorithms appliedto human brain mri registration,” Neuroimage, vol. 46, no. 3, pp.786–802, 2009.

[180] R. Bajcsy and S. Kovacic, “Multiresolution elastic matching,”Computer vision, graphics, and image processing, vol. 46, no. 1, pp.1–21, 1989.

[181] J.-P. Thirion, “Image matching as a diffusion process: an analogywith maxwell’s demons,” Medical image analysis, vol. 2, no. 3, pp.243–260, 1998.

[182] M. F. Beg, M. I. Miller, A. Trouve, and L. Younes, “Computinglarge deformation metric mappings via geodesic flows of diffeo-morphisms,” International journal of computer vision, vol. 61, no. 2,pp. 139–157, 2005.

[183] J. Ashburner, “A fast diffeomorphic image registration algo-rithm,” Neuroimage, vol. 38, no. 1, pp. 95–113, 2007.

[184] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee, “Sym-metric diffeomorphic image registration with cross-correlation:evaluating automated labeling of elderly and neurodegenerativebrain,” Medical image analysis, vol. 12, no. 1, pp. 26–41, 2008.

[185] B. Glocker, N. Komodakis, G. Tziritas, N. Navab, and N. Para-gios, “Dense image registration through mrfs and efficient linearprogramming,” Medical image analysis, vol. 12, no. 6, pp. 731–741,2008.

[186] A. V. Dalca, A. Bobu, N. S. Rost, and P. Golland, “Patch-baseddiscrete registration of clinical brain images,” in InternationalWorkshop on Patch-based Techniques in Medical Imaging. Springer,2016, pp. 60–67.

[187] J. Krebs, T. Mansi, H. Delingette, L. Zhang, F. C. Ghesu, S. Miao,A. K. Maier, N. Ayache, R. Liao, and A. Kamen, “Robust non-rigid registration through agent-based action learning,” in In-ternational Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 344–352.

[188] M.-M. Rohe, M. Datar, T. Heimann, M. Sermesant, and X. Pennec,“Svf-net: Learning deformable image registration using shapematching,” in International Conference on Medical Image Computingand Computer-Assisted Intervention. Springer, 2017, pp. 266–274.

[189] H. Sokooti, B. de Vos, F. Berendsen, B. P. Lelieveldt, I. Isgum,and M. Staring, “Nonrigid image registration using multi-scale3d convolutional neural networks,” in International Conferenceon Medical Image Computing and Computer-Assisted Intervention.Springer, 2017, pp. 232–239.

[190] X. Yang, R. Kwitt, M. Styner, and M. Niethammer, “Quicksilver:Fast predictive image registration–a deep learning approach,”NeuroImage, vol. 158, pp. 378–396, 2017.

[191] S. C. van de Leemput, M. Prokop, B. van Ginneken, and R. Man-niesing, “Stacked bidirectional convolutional lstms for 3d non-contrast ct reconstruction from spatiotemporal 4d ct,” 2018.

[192] D. Nie, X. Cao, Y. Gao, L. Wang, and D. Shen, “Estimating ctimage from mri data using 3d fully convolutional networks,” inDeep Learning and Data Labeling for Medical Applications. Springer,2016, pp. 170–178.

[193] K. Bahrami, F. Shi, I. Rekik, and D. Shen, “Convolutional neuralnetwork for reconstruction of 7t-like images from 3t mri usingappearance and anatomical features,” in Deep Learning and DataLabeling for Medical Applications. Springer, 2016, pp. 39–47.

[194] K. Lønning, P. Putzky, M. W. Caan, and M. Welling, “Recurrentinference machines for accelerated mri reconstruction,” 2018.

[195] A. Sheikhjafari, M. Noga, K. Punithakumar, and N. Ray, “Un-supervised deformable image registration with fully connectedgenerative neural network,” 2018.

[196] B. Hou, N. Miolane, B. Khanal, M. Lee, A. Alansary, S. McDon-agh, J. Hajnal, D. Rueckert, B. Glocker, and B. Kainz, “Deep poseestimation for image-based registration,” 2018.

[197] B. Hou, B. Khanal, A. Alansary, S. McDonagh, A. Davidson,M. Rutherford, J. V. Hajnal, D. Rueckert, B. Glocker, and B. Kainz,“Image-based registration in canonical atlas space,” 2018.

[198] ——, “3d reconstruction in canonical co-ordinate space fromarbitrarily oriented 2d images,” IEEE Transactions on MedicalImaging, 2018.

[199] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V.Dalca, “An unsupervised learning model for deformable medi-cal image registration,” in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 2018, pp. 9252–9260.

[200] P. Costa, A. Galdran, M. I. Meyer, M. Niemeijer, M. Abramoff,A. M. Mendonca, and A. Campilho, “End-to-end adversarialretinal image synthesis,” IEEE transactions on medical imaging,vol. 37, no. 3, pp. 781–791, 2018.

[201] D. Mahapatra, B. Antony, S. Sedai, and R. Garnavi, “Deformablemedical image registration using generative adversarial net-works,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th Inter-national Symposium on. IEEE, 2018, pp. 1449–1453.

[202] H. Tang, A. Pan, Y. Yang, K. Yang, Y. Luo, S. Zhang, andS. H. Ong, “Retinal image registration based on robust non-rigidpoint matching method,” Journal of Medical Imaging and HealthInformatics, vol. 8, no. 2, pp. 240–249, 2018.

[203] L. Pan, F. Shi, W. Zhu, B. Nie, L. Guan, and X. Chen, “Detectionand registration of vessels for longitudinal 3d retinal oct imagesusing surf,” in Medical Imaging 2018: Biomedical Applications inMolecular, Structural, and Functional Imaging, vol. 10578. Interna-tional Society for Optics and Photonics, 2018, p. 105782P.

[204] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robustfeatures,” in European conference on computer vision. Springer,2006, pp. 404–417.

[205] M. A. Fischler and R. C. Bolles, “Random sample consensus: aparadigm for model fitting with applications to image analysisand automated cartography,” Communications of the ACM, vol. 24,no. 6, pp. 381–395, 1981.

[206] D. Mahapatra, “Elastic registration of medical images with gans,”arXiv preprint arXiv:1805.02369, 2018.

[207] K. A. Eppenhof, M. W. Lafarge, P. Moeskops, M. Veta, and J. P. Pluim, “Deformable image registration using convolutional neural networks,” in Medical Imaging 2018: Image Processing, vol. 10574. International Society for Optics and Photonics, 2018, p. 105740S.

[208] E. Castillo, R. Castillo, J. Martinez, M. Shenoy, and T. Guerrero, “Four-dimensional deformable image registration using trajectory modeling,” Physics in Medicine & Biology, vol. 55, no. 1, p. 305, 2009.

[209] J. Vandemeulebroucke, O. Bernard, S. Rit, J. Kybic, P. Clarysse, and D. Sarrut, “Automated segmentation of a motion mask to preserve sliding motion in deformable registration of thoracic ct,” Medical physics, vol. 39, no. 2, pp. 1006–1015, 2012.

[210] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Isgum, “A deep learning framework for unsupervised affine and deformable image registration,” Medical image analysis, vol. 52, pp. 128–143, 2019.

[211] J. Zheng, S. Miao, Z. J. Wang, and R. Liao, “Pairwise domain adaptation module for cnn-based 2-d/3-d registration,” Journal of Medical Imaging, vol. 5, no. 2, p. 021204, 2018.

[212] J. Lv, M. Yang, J. Zhang, and X. Wang, “Respiratory motion correction for free-breathing 3d abdominal mri using cnn-based image registration: a feasibility study,” The British journal of radiology, vol. 91, no. xxxx, p. 20170788, 2018.

[213] J. Lv, W. Huang, J. Zhang, and X. Wang, “Performance of u-net based pyramidal lucas-kanade registration on free-breathing multi-b-value diffusion mri of the kidney,” The British journal of radiology, vol. 91, no. 1086, p. 20170813, 2018.

[214] P. Yan, S. Xu, A. R. Rastinehad, and B. J. Wood, “Adversarial image registration with application for mr and trus image fusion,” arXiv preprint arXiv:1804.11024, 2018.

[215] Y. Hu, M. Modat, E. Gibson, N. Ghavami, E. Bonmati, C. M. Moore, M. Emberton, J. A. Noble, D. C. Barratt, and T. Vercauteren, “Label-driven weakly-supervised learning for multimodal deformable image registration,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 1070–1074.


[216] G. Zhao, B. Zhou, K. Wang, R. Jiang, and M. Xu, “Respond-cam: Analyzing deep models for 3d imaging data by visualizations,” arXiv preprint arXiv:1806.00102, 2018.

[217] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra et al., “Grad-cam: Visual explanations from deep networks via gradient-based localization.” in ICCV, 2017, pp. 618–626.

[218] T. Elss, H. Nickisch, T. Wissel, R. Bippus, M. Morlock, and M. Grass, “Motion estimation in coronary ct angiography images using convolutional neural networks,” 2018.

[219] S. Miao, Z. J. Wang, and R. Liao, “A cnn regression approach for real-time 2d/3d registration,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1352–1363, 2016.

[220] G. Wu, M. Kim, Q. Wang, B. C. Munsell, and D. Shen, “Scalable high-performance image registration framework by unsupervised deep feature representations learning,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1505–1516, 2016.

[221] M. Simonovsky, B. Gutierrez-Becker, D. Mateus, N. Navab, and N. Komodakis, “A deep metric for multimodal registration,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 10–18.

[222] X. Yang, R. Kwitt, and M. Niethammer, “Fast predictive image registration,” in Deep Learning and Data Labeling for Medical Applications. Springer, 2016, pp. 48–57.

[223] Q. Zhang, Y. Xiao, W. Dai, J. Suo, C. Wang, J. Shi, and H. Zheng, “Deep learning based classification of breast tumors with shear-wave elastography,” Ultrasonics, vol. 72, pp. 150–157, 2016.

[224] D. Nie, R. Trullo, J. Lian, C. Petitjean, S. Ruan, Q. Wang, and D. Shen, “Medical image synthesis with context-aware generative adversarial networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 417–425.

[225] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose x-ray ct reconstruction,” Medical physics, vol. 44, no. 10, 2017.

[226] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Isgum, “End-to-end unsupervised deformable image registration with a convolutional neural network,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2017, pp. 204–212.

[227] P. Radau, Y. Lu, K. Connelly, G. Paul, A. Dick, and G. Wright, “Evaluation framework for algorithms segmenting short axis cardiac mri,” The MIDAS Journal-Cardiac MR Left Ventricle Segmentation Challenge, vol. 49, 2009.

[228] X. Li, N. C. Dvornek, J. Zhuang, P. Ventola, and J. S. Duncan, “Brain biomarker interpretation in asd using deep learning and fmri,” arXiv preprint arXiv:1808.08296, 2018.

[229] E. H. Asl, M. Ghazal, A. Mahmoud, A. Aslantas, A. Shalaby, M. Casanova, G. Barnes, G. Gimelfarb, R. Keynton, and A. El Baz, “Alzheimer’s disease diagnostics by a 3d deeply supervised adaptable convolutional network.”

[230] W. Yan, H. Zhang, J. Sui, and D. Shen, “Deep chronnectome learning via full bidirectional long short-term memory networks for mci diagnosis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 249–257.

[231] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi, “Identification of autism spectrum disorder using deep learning and the abide dataset,” NeuroImage: Clinical, vol. 17, pp. 16–23, 2018.

[232] M. Soussia and I. Rekik, “A review on image- and network-based brain data analysis techniques for alzheimer’s disease diagnosis reveals a gap in developing predictive methods for prognosis,” arXiv preprint arXiv:1808.01951, 2018.

[233] B. Gutierrez-Becker and C. Wachinger, “Deep multi-structural shape analysis: Application to neuroanatomy,” arXiv preprint arXiv:1806.01069, 2018.

[234] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, vol. 1, no. 2, p. 4, 2017.

[235] R. Awan, N. A. Koohbanani, M. Shaban, A. Lisowska, and N. Rajpoot, “Context-aware learning using transferable features for classification of breast cancer histology images,” in International Conference Image Analysis and Recognition. Springer, 2018, pp. 788–795.

[236] T. Araujo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polonia, and A. Campilho, “Classification of breast cancer histology images using convolutional neural networks,” PloS one, vol. 12, no. 6, p. e0177544, 2017.

[237] H. D. Couture, J. Marron, C. M. Perou, M. A. Troester, and M. Niethammer, “Multiple instance learning for heterogeneous images: Training a cnn for histopathology,” arXiv preprint arXiv:1806.05083, 2018.

[238] M. A. Troester, X. Sun, E. H. Allott, J. Geradts, S. M. Cohen, C.-K. Tse, E. L. Kirk, L. B. Thorne, M. Mathews, Y. Li et al., “Racial differences in pam50 subtypes in the carolina breast cancer study,” JNCI: Journal of the National Cancer Institute, vol. 110, no. 2, 2018.

[239] P. Sudharshan, C. Petitjean, F. Spanhol, L. E. Oliveira, L. Heutte, and P. Honeine, “Multiple instance learning for histopathological breast cancer image classification,” Expert Systems with Applications, vol. 117, pp. 103–111, 2019.

[240] K. Roy, D. Banik, D. Bhattacharjee, and M. Nasipuri, “Patch-based system for classification of breast histology images using deep learning,” Computerized Medical Imaging and Graphics, vol. 71, pp. 90–103, 2019.

[241] N. Antropova, H. Abe, and M. L. Giger, “Use of clinical mri maximum intensity projections for improved breast lesion classification with deep convolutional neural networks,” Journal of Medical Imaging, vol. 5, no. 1, p. 014503, 2018.

[242] D. Ribli, A. Horvath, Z. Unger, P. Pollner, and I. Csabai, “Detecting and classifying lesions in mammograms with deep learning,” Scientific reports, vol. 8, no. 1, p. 4165, 2018.

[243] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso, “Inbreast: toward a full-field digital mammographic database,” Academic radiology, vol. 19, no. 2, pp. 236–248, 2012.

[244] Y. Zheng, C. Yang, and A. Merkulov, “Breast cancer screening using convolutional neural network and follow-up digital mammography,” in Computational Imaging III, vol. 10669. International Society for Optics and Photonics, 2018, p. 1066905.

[245] H. Pratt, F. Coenen, D. M. Broadbent, S. P. Harding, and Y. Zheng, “Convolutional neural networks for diabetic retinopathy,” Procedia Computer Science, vol. 90, pp. 200–205, 2016.

[246] M. S. Ayhan and P. Berens, “Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks,” 2018.

[247] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision. Springer, 2016, pp. 630–645.

[248] M. Mateen, J. Wen, S. Song, Z. Huang et al., “Fundus image classification using vgg-19 architecture with pca and svd,” Symmetry, vol. 11, no. 1, p. 1, 2019.

[249] R. Dey, Z. Lu, and Y. Hong, “Diagnostic classification of lung nodules using 3d neural networks,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 774–778.

[250] A. Nibali, Z. He, and D. Wollersheim, “Pulmonary nodule classification with deep residual networks,” International journal of computer assisted radiology and surgery, vol. 12, no. 10, pp. 1799–1808, 2017.

[251] M. Gao, U. Bagci, L. Lu, A. Wu, M. Buty, H.-C. Shin, H. Roth, G. Z. Papadakis, A. Depeursinge, R. M. Summers et al., “Holistic classification of ct attenuation patterns for interstitial lung diseases via deep convolutional neural networks,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 6, no. 1, pp. 1–6, 2018.

[252] Y. Song, W. Cai, H. Huang, Y. Zhou, D. D. Feng, Y. Wang, M. J. Fulham, and M. Chen, “Large margin local estimate with applications to medical image classification,” IEEE transactions on medical imaging, vol. 34, no. 6, pp. 1362–1377, 2015.

[253] Y. Song, W. Cai, Y. Zhou, and D. D. Feng, “Feature-based image patch approximation for lung tissue classification,” IEEE Trans. Med. Imaging, vol. 32, no. 4, pp. 797–808, 2013.

[254] A. Depeursinge, A. Vargas, A. Platon, A. Geissbuhler, P.-A. Poletti, and H. Muller, “Building a reference multimedia database for interstitial lung diseases,” Computerized medical imaging and graphics, vol. 36, no. 3, pp. 227–238, 2012.

[255] C. Biffi, O. Oktay, G. Tarroni, W. Bai, A. De Marvao, G. Doumou, M. Rajchl, R. Bedair, S. Prasad, S. Cook et al., “Learning interpretable anatomical features through deep generative models: Application to cardiac remodeling,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 464–471.

[256] C. Brestel, R. Shadmi, I. Tamir, M. Cohen-Sfaty, and E. Elnekave, “Radbot-cxr: Classification of four clinical finding categories in chest x-ray using deep learning,” 2018.

[257] X. Wang, H. Chen, C. Gan, H. Lin, Q. Dou, Q. Huang, M. Cai, and P.-A. Heng, “Weakly supervised learning for whole slide lung cancer image classification.”

[258] N. Coudray, P. S. Ocampo, T. Sakellaropoulos, N. Narula, M. Snuderl, D. Fenyo, A. L. Moreira, N. Razavian, and A. Tsirigos, “Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning,” Nature medicine, vol. 24, no. 10, p. 1559, 2018.

[259] J. M. Tomczak, M. Ilse, M. Welling, M. Jansen, H. G. Coleman, M. Lucas, K. de Laat, M. de Bruin, H. Marquering, M. J. van der Wel et al., “Histopathological classification of precursor lesions of esophageal adenocarcinoma: A deep multiple instance learning approach,” 2018.

[260] M. Ilse, J. M. Tomczak, and M. Welling, “Attention-based deep multiple instance learning,” arXiv preprint arXiv:1802.04712, 2018.

[261] M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “Synthetic data augmentation using gan for improved liver lesion classification,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 289–293.

[262] Z. Xu, Y. Huo, J. Park, B. Landman, A. Milkowski, S. Grbic, and S. Zhou, “Less is more: Simultaneous view classification and landmark detection for abdominal ultrasound images,” arXiv preprint arXiv:1805.10376, 2018.

[263] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.

[264] M. Combalia and V. Vilaplana, “Monte-carlo sampling applied to multiple instance learning for whole slide image classification,” 2018.

[265] J. Antony, K. McGuinness, N. E. O’Connor, and K. Moran, “Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks,” in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016, pp. 1195–1200.

[266] O. Paserin, K. Mulpuri, A. Cooper, A. J. Hodgson, and R. Garbi, “Real time rnn based 3d ultrasound scan adequacy for developmental dysplasia of the hip,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 365–373.

[267] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe, and S. Mougiakakou, “Lung pattern classification for interstitial lung diseases using a deep convolutional neural network,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1207–1216, 2016.

[268] M. Kallenberg, K. Petersen, M. Nielsen, A. Y. Ng, P. Diao, C. Igel, C. M. Vachon, K. Holland, R. R. Winkel, N. Karssemeijer et al., “Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1322–1331, 2016.

[269] B. Q. Huynh, H. Li, and M. L. Giger, “Digital mammographic tumor classification using transfer learning from deep convolutional neural networks,” Journal of Medical Imaging, vol. 3, no. 3, p. 034501, 2016.

[270] Z. Yan, Y. Zhan, Z. Peng, S. Liao, Y. Shinagawa, S. Zhang, D. N. Metaxas, and X. S. Zhou, “Multi-instance deep learning: Discover discriminative local anatomies for bodypart recognition,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1332–1343, 2016.

[271] W. Sun, T.-L. B. Tseng, J. Zhang, and W. Qian, “Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data,” Computerized Medical Imaging and Graphics, vol. 57, pp. 4–9, 2017.

[272] S. Christodoulidis, M. Anthimopoulos, L. Ebner, A. Christe, and S. Mougiakakou, “Multi-source transfer learning with convolutional neural networks for lung pattern analysis,” arXiv preprint arXiv:1612.02589, 2016.

[273] K. Lekadir, A. Galimzianova, A. Betriu, M. del Mar Vila, L. Igual, D. L. Rubin, E. Fernandez, P. Radeva, and S. Napel, “A convolutional neural network for automatic characterization of plaque composition in carotid ultrasound.” IEEE J. Biomedical and Health Informatics, vol. 21, no. 1, pp. 48–55, 2017.

[274] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest et al., “The multimodal brain tumor image segmentation benchmark (brats),” IEEE transactions on medical imaging, vol. 34, no. 10, p. 1993, 2015.

[275] M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer, “The digital database for screening mammography,” in Proceedings of the 5th international workshop on digital mammography. Medical Physics Publishing, 2000, pp. 212–218.

[276] G. Quellec, M. Lamard, P. M. Josselin, G. Cazuguel, B. Cochener, and C. Roux, “Optimal wavelet transform for the detection of microaneurysms in retina photographs.” IEEE Transactions on Medical Imaging, vol. 27, no. 9, pp. 1230–41, 2008.

[277] E. Decenciere, X. Zhang, G. Cazuguel, B. Lay, B. Cochener, C. Trone, P. Gain, R. Ordonez, P. Massin, A. Erginay et al., “Feedback on a publicly distributed image database: the messidor database,” Image Analysis & Stereology, vol. 33, no. 3, pp. 231–234, 2014.

[278] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017, pp. 3462–3471.

[279] M. Niemeijer, J. Staal, B. van Ginneken, M. Loog, and M. D. Abramoff, “Comparative study of retinal vessel segmentation methods on a new publicly available database,” in Medical Imaging 2004: Image Processing, vol. 5370. International Society for Optics and Photonics, 2004, pp. 648–657.

[280] A. F. Fotenos, A. Snyder, L. Girton, J. Morris, and R. Buckner, “Normative estimates of cross-sectional and longitudinal brain volume decline in aging and ad,” Neurology, vol. 64, no. 6, pp. 1032–1039, 2005.

[281] J. C. Morris, “The clinical dementia rating (cdr): current version and scoring rules.” Neurology, 1993.

[282] R. L. Buckner, D. Head, J. Parker, A. F. Fotenos, D. Marcus, J. C. Morris, and A. Z. Snyder, “A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume,” Neuroimage, vol. 23, no. 2, pp. 724–738, 2004.

[283] E. H. Rubin, M. Storandt, J. P. Miller, D. A. Kinscherf, E. A. Grant, J. C. Morris, and L. Berg, “A prospective study of cognitive function and onset of dementia in cognitively healthy elders,” Archives of neurology, vol. 55, no. 3, pp. 395–401, 1998.

[284] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain mr images through a hidden markov random field model and the expectation-maximization algorithm,” IEEE transactions on medical imaging, vol. 20, no. 1, pp. 45–57, 2001.

[285] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok et al., “Mammographic image analysis society (mias) database v1.21,” 2015.

[286] D. F. Pace, A. V. Dalca, T. Geva, A. J. Powell, M. H. Moghari, and P. Golland, “Interactive whole-heart segmentation in congenital heart disease,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 80–88.

[287] S. Azizi, P. Mousavi, P. Yan, A. Tahmasebi, J. T. Kwak, S. Xu, B. Turkbey, P. Choyke, P. Pinto, B. Wood et al., “Transfer learning from rf to b-mode temporal enhanced ultrasound features for prostate cancer detection,” International journal of computer assisted radiology and surgery, vol. 12, no. 7, pp. 1111–1121, 2017.

[288] T. Kooi, B. van Ginneken, N. Karssemeijer, and A. den Heeten, “Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network,” Medical physics, vol. 44, no. 3, pp. 1017–1027, 2017.

[289] R. K. Samala, H.-P. Chan, L. M. Hadjiiski, M. A. Helvie, K. H. Cha, and C. D. Richter, “Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms,” Physics in Medicine & Biology, vol. 62, no. 23, p. 8894, 2017.

[290] A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese, “Taskonomy: Disentangling task transfer learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3712–3722.

[291] N. Akhtar, A. S. Mian, and F. Porikli, “Joint discriminative bayesian dictionary and classifier learning.” in CVPR, vol. 4, 2017, p. 7.

[292] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th annual international conference on machine learning. ACM, 2009, pp. 689–696.

[293] Y. Bar, I. Diamant, L. Wolf, and H. Greenspan, “Deep learning with non-medical training used for chest pathology identification,” in Medical Imaging 2015: Computer-Aided Diagnosis, vol. 9414. International Society for Optics and Photonics, 2015, p. 94140V.

[294] F. Ciompi, B. de Hoop, S. J. van Riel, K. Chung, E. T. Scholten, M. Oudkerk, P. A. de Jong, M. Prokop, and B. van Ginneken, “Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2d views and a convolutional neural network out-of-the-box,” Medical image analysis, vol. 26, no. 1, pp. 195–202, 2015.

[295] U. Lopes and J. F. Valiati, “Pre-trained convolutional neural networks as feature extractors for tuberculosis detection,” Computers in biology and medicine, vol. 89, pp. 135–143, 2017.

[296] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, “Understanding neural networks through deep visualization,” arXiv preprint arXiv:1506.06579, 2015.

[297] L. Zhang, A. Gooya, and A. F. Frangi, “Semi-supervised assessment of incomplete lv coverage in cardiac mri using generative adversarial nets,” in International Workshop on Simulation and Synthesis in Medical Imaging. Springer, 2017, pp. 61–68.

[298] F. Calimeri, A. Marzullo, C. Stamile, and G. Terracina, “Biomedical data augmentation using generative adversarial neural networks,” in International Conference on Artificial Neural Networks. Springer, 2017, pp. 626–634.

[299] A. Lahiri, K. Ayush, P. K. Biswas, and P. Mitra, “Generative adversarial learning for reducing manual annotation in semantic segmentation on large scale microscopy images: Automated vessel segmentation in retinal fundus image as test case,” in Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 42–48.

[300] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.

[301] H. Haenssle, C. Fink, R. Schneiderbauer, F. Toberer, T. Buhl, A. Blum, A. Kalloo, A. B. H. Hassen, L. Thomas, A. Enk et al., “Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists,” Annals of Oncology, vol. 29, no. 8, pp. 1836–1842, 2018.

[302] G. A. Bowen, “Document analysis as a qualitative research method,” Qualitative research journal, vol. 9, no. 2, pp. 27–40, 2009.

[303] G. G. Chowdhury, “Natural language processing,” Annual review of information science and technology, vol. 37, no. 1, pp. 51–89, 2003.

[304] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365, 2015.

[305] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017.

Fouzia Altaf is currently pursuing her PhD at the School of Science, Edith Cowan University (ECU), Australia. Previously, she completed her Master of Philosophy and Master of Science degrees at the University of the Punjab, Pakistan. She is the recipient of a Research Training Program scholarship at ECU. Her current research pertains to the areas of Deep Learning, Medical Image Analysis and Computer Vision.

Syed Islam completed his PhD with Distinction in Computer Engineering from the University of Western Australia (UWA) in 2011. He received his MSc in Computer Engineering from King Fahd University of Petroleum and Minerals in 2005 and his BSc in Electrical and Electronic Engineering from Islamic Institute of Technology in 2000. He was a Research Assistant Professor at UWA from 2011 to 2015, a Research Fellow at Curtin University from 2013 to 2015 and a Lecturer at UWA from 2015 to 2016. Since 2016, he has been working as a Lecturer in Computer Science at Edith Cowan University. He has published around 53 research articles and received nine public media releases. He has obtained 14 competitive research grants for his work in the areas of Image Processing, Computer Vision and Medical Imaging. He has co-supervised seven honours and postgraduate students to completion and is currently supervising one MS and seven PhD students. He serves as an Associate Editor of 13 international journals, a Technical Committee Member of 24 conferences and a regular reviewer for 26 journals. He is also a member of seven professional bodies, including IEEE and the Australian Computer Society (Senior Member).

Naveed Akhtar received his PhD in Computer Vision from The University of Western Australia (UWA) and his Master’s degree in Computer Science from Hochschule Bonn-Rhein-Sieg, Germany (HBRS). His research in Computer Vision and Pattern Recognition is regularly published in the prestigious venues of the field, including IEEE CVPR and IEEE TPAMI. He has also served as a reviewer for these, and twelve other research venues in the broader field of Machine Learning. He is currently also serving as an Associate Editor of IEEE Access. During his PhD, he was a recipient of multiple scholarships, and winner of the Canon Extreme Imaging Competition in 2015. Currently, he is a Research Fellow at UWA. Previously, he also served in the same position at the Australian National University. His current research interests include Medical Imaging, Adversarial Machine Learning, Hyperspectral Image Analysis, and 3D Point Cloud Analysis.

Naeem Khalid Janjua is a Lecturer at the School of Science, Edith Cowan University, Perth. He is an Associate Editor for the International Journal of Computer System Science and Engineering (IJCSSE) and the International Journal of Intelligent Systems (IJEIS). He has published an authored book, a book chapter, and various articles in international journals and refereed conference proceedings. His areas of active research are defeasible reasoning, argumentation, ontologies, data modelling, cloud computing, machine learning and data mining. He works actively in the domain of business intelligence and Web-based intelligent decision support systems.