
Introduction to Deep Learning for Biomedical Engineering

After a presentation made by: Evan Shelhamer, Jeff Donahue, Jon Long

caffe.berkeleyvision.org | github.com/BVLC/caffe

Prof. Bart ter Haar Romeny

What is Deep Learning?


A typical Deep Convolutional Neural Network


ImageNet – Fei-Fei Li

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

AlexNet


Litjens, Geert, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM van der Laak, Bram van Ginneken, and Clara I. Sánchez. "A survey on deep learning in medical image analysis." arXiv preprint arXiv:1702.05747 (Feb 2017).


Power of heatmaps – Train on image level, visualize on pixel level.

Samaneh Abbasi, Bart ter Haar Romeny et al. (TU/e): Recurrent Convolutional Neural Networks, MICCAI 2017, Quebec City


Samaneh Abbasi et al. (TU/e): Recurrent Convolutional Neural Networks, MICCAI 2017, Quebec City


For diabetic retinopathy, the best detection performance is by Quellec et al.: Az = 0.954 on Kaggle's dataset and Az = 0.949 on e-Ophtha.


Deep Learning for Vision: outline
- Why Deep Learning? Applications; The Challenge of Recognition
- Dive into Deep Learning: What is DL? Why Now?
- Learning & Optimization
- Network Tour
- Transfer Learning
- Caffe First Sip

Why Deep Learning? End-to-End Learning for Many Tasks: vision, speech, text, control


Presenter
Presentation Notes
Deep learning has proven useful for many purposes, not just one single task. This is the point of learning end-to-end, that is, learning the whole problem from input to output: the same toolkit can work for different domains, whether vision, speech, text, or control and robotics. We'll focus on vision; next we'll look at core visual recognition tasks and the standard benchmarks for each problem. Deep learning approaches have delivered dramatic improvements across these and many other tasks.

Some examples

Demo: Google Translate on a smartphone (speech + images)

Demo: https://www.imageidentify.com/

How does this work?

Биомедицинская инженерия ("Biomedical Engineering"): today you can read this Russian text with your smartphone.

Kaggle: Diabetic Retinopathy Challenge (blog)

Google Photos


Other examples:

Robot vision and recognition: harvest robot for peppers.

Wageningen University, the Netherlands

Vision for self-driving cars


Aalsmeer, Netherlands, largest flower auction in the world


Quick facts and figures about the Dutch Horticulture industry

The Dutch horticulture sector is a global trendsetter and the undisputed international market leader in flowers, plants, bulbs and propagation material.

Did you know?
• Holland has a 44% share of the worldwide trade in floricultural products, making it the dominant global supplier of flowers and flower products. Some 77% of all flower bulbs traded worldwide come from the Netherlands, the majority of which are tulips. 40% of the trade in 2015 was cut flowers and flower buds.
• The sector is the number 1 exporter to the world of live trees, plants, bulbs, roots and cut flowers.
• The sector is the number 3 exporter of nutritional horticulture products.
• Of the approximately 1,800 new plant varieties that enter the European market each year, 65% originate in the Netherlands. In addition, Dutch breeders account for more than 35% of all applications for community plant variety rights.
• The Dutch are among the world's largest exporters of seeds: seed exports amounted to €3.1 billion in 2014.
• In 2014 the Netherlands was the world's second largest exporter (in value) of fresh vegetables, exporting vegetables with a market value of €7 billion.


From Wikipedia:

Deep learning is a class of machine learning algorithms that

• use a deep cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised; applications include pattern analysis (unsupervised) and classification (supervised).

• are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.

Deep Learning

So we have to learn:

1. Overview in depth → Introduction, Caffe example
2. What are filters? → Convolution and convolutional networks
3. What is learned? → Invariant geometric features
4. How can kernels be learned? → Principal Component Analysis
5. How does the visual system do this? → Front-end vision, visual cortex
6. How can we use this? → Software developments in Deep Learning
7. Questions → and answers

Deep Learning is a very hot area of machine learning research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, and Dogs vs. Cats image recognition with 98.9% accuracy.

Many winning entries in recent Kaggle Data Science competitions have used Deep Learning.

The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks.

http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html

Yann LeCun, who did postdoctoral work with Geoff Hinton, also developed a very effective deep learning architecture, the convolutional network (ConvNet), which was successfully used in the late '80s and early '90s for automatic reading of amounts on bank checks.

In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading machine learning and deep learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley, setting up an AI and deep learning race with Google (which hired Geoffrey Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab).


Human vision and convolutional neural networks:

A cascade of increasing complexity

• Hierarchical network
• Use of context


Wikipedia: Gestalt psychology or gestaltism (German: Gestalt "shape, form") is a philosophy of mind of the Berlin School of experimental psychology. Gestalt psychology is an attempt to understand the laws behind the ability to acquire and maintain meaningful perceptions in an apparently chaotic world. The central principle of gestalt psychology is that the mind forms a global whole with self-organizing tendencies. The assumed physiological mechanisms on which Gestalt theory rests are poorly defined and support for their existence is lacking. It is known as ‘perceptual grouping’.

AlexNet – PDF

Vision: the highest bandwidth input channel


Machines are useful mainly to the extent that they interact with the physical world. Visual information is the richest source of information about the real world.

Vision is the highest-bandwidth mode for machines to obtain real-world info

Embedded vision enables our things to be:
- More responsive
- More personal and secure
- Safer, more autonomous
- Easier to use

subaru.com


Top papers on arXiv (https://arxiv.org/):
http://www.kdnuggets.com/2017/02/top-arxiv-papers-january-convnets-wide-adversarial.html


Performance evaluation: http://www.robots.ox.ac.uk/~vgg/research/deep_eval/

VOC: Visual Object Classes

Why Now?
1. Data: ImageNet et al., millions of labeled (crowdsourced) images
2. Compute: GPUs, with terabytes/s of memory bandwidth and teraflops of compute
3. Technique: new optimization know-how, new variants on old architectures, new tools for rapid experimentation


Presenter
Presentation Notes
note the importance of memory bandwidth: it determines how fast you can look at all that data

Why Now? Data

For example:
- >14 million labeled images
- >1 million with bounding boxes
- >300,000 images with labeled and segmented objects


Why Now? GPUs

Parallel processors for parallel models:
- Inherent parallelism: same op, different data
- Bandwidth: lots of data in and out
- Tuned primitives: cuDNN for deep nets, cuBLAS for matrices

Nvidia News URL

Presenter
Presentation Notes
Mention ILSVRC in particular as the standard contest. Mention/include industrial data: e.g. Facebook and YouTube have much, much more data than represented here. The data is valuable!

GPU – Graphics Processing Unit


- Thousands of parallel cores
- Fully programmable, e.g. in CUDA
- Very affordable
- Large shared memory (e.g. 12 GB)
- Deployed in large server banks
- Can be rented from Amazon, Baidu, Alibaba, etc.

Titan Xp GPU


Why Now? Technique

Non-convex and high-dimensional learning is okay with the right design choices, e.g. non-saturating non-linearities: max(0, x) instead of the saturating sigmoid.

Learning by Stochastic Gradient Descent (SGD) with momentum and other variants (more later!)


Examples from NVIDIA: https://developer.nvidia.com/deep-learning


Deep Break

Presenter
Presentation Notes
mention the traditional picture of getting stuck in local minima, and how this is not a problem in practice

What is Deep Learning?

Compositional Models Learned End-to-End

Hierarchy of Representations (concrete → abstract):
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word

[Figure: layered network: input → layer1 (θ1) → layer2 (θ2) → output (θ3) → loss against the truth]


Back-propagation jointly learns all of the model parameters to optimize the output for the task (more on this later!)

What is Deep Learning?

Compositional Models Learned End-to-End

Shallow Learning

[slide credit K. Cho]

Separation of hand engineering and machine learning


Presenter
Presentation Notes
note that representations are learned, and don’t correspond exactly to examples given

Hand-Engineered Features

Features from years of vision expertise by the whole community are now surpassed by learned representations, and these transfer across tasks.

[figure credit R. Fergus]

Presenter
Presentation Notes
note that deep learning does not have to be backprop

Deep Learning

[slide credit K. Cho]

Presenter
Presentation Notes
shallow learning: logistic regression, SVM, decision tree; codebook → quantization → classification pipeline


End-to-End Learning Representations

The visual world is too vast and varied to fully describe by hand.

Learn the representation from data: local appearance, parts and texture, objects and semantics.

[figure credit H. Lee]

Presenter
Presentation Notes
all the data → learning; learning → all the tasks

Hierarchical growth of complexity


End-to-End Learning Tasks

The visual world is too vast and varied to fully describe by hand.

Learn the task from data

Presenter
Presentation Notes
layers: compositionality, feature sharing; learning: better task performance, other data, computation time

Types of Learning

Vast space of models!

[figure credit Marc’aurelio Ranzato, CVPR 2014 tutorial]

Deep Network

Recurrent Network

Convolutional Network


Example: TensorFlow (URL)


The Neural Network Zoo: http://www.asimovinstitute.org/neural-network-zoo/


Neural Network Graphs: http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/


History

Is deep learning 4, 20, or 50 years old? What’s changed?

2000s: sparse, probabilistic, and layer-wise models (Hinton, Bengio, Ng)
2012: DL popularized in vision by contest victory (Krizhevsky et al. 2012)

Rosenblatt's Perceptron

Radial Basis Function

Convolutional Networks: 1989

LeNet: a layered model composed of convolution and subsampling layers, followed by a holistic representation and ultimately a classifier for handwritten digits [LeNet]


Note: the channel dimension goes up as the spatial dimension goes down... still a common pattern today

AlexNet: a layered model composed of convolution, subsampling, and further operations, followed by a holistic representation and all-in-all a landmark classifier on ILSVRC12 [AlexNet]

+ data
+ GPU
+ non-saturating non-linearity
+ regularization

Convolutional Networks: 2012

Presenter
Presentation Notes
Gloss: connected the dots; exploration of model structure, optimization know-how, computation + data.


Convnet Design Patterns

AlexNet layer stack (input at the bottom):
Conv 11x11s4, 96 / ReLU → Local Response Norm → Max Pool 3x3s2 →
Conv 5x5s1, 256 / ReLU → Local Response Norm → Max Pool 3x3s2 →
Conv 3x3s1, 384 / ReLU → Conv 3x3s1, 384 / ReLU → Conv 3x3s1, 256 / ReLU → Max Pool 3x3s2 →
FC 4096 / ReLU → FC 4096 / ReLU → FC 1000

Conv-ReLU: all convs are followed by a non-linearity, in this case ReLU.
Conv-Pool: one or more convs are followed by pooling to subsample; spatial size shrinks, receptive field grows.
FC-ReLU: stacked at the end of the net to learn the output; the majority of the learned parameters.

Convnet Computation: 2012 & 2014

AlexNet inference for a single image (3x227x227 input):
- 725M FLOPs
- 60M parameters (60,965,224 to be exact)
- 408 MB GPU memory in Caffe; <12 GB for a batch size of 1,500
- <1 ms / image on a Titan X with cuDNN v4 for batch size >= 256


Compare GoogLeNet (ILSVRC14 winner):
- 2x the FLOPs
- 0.1x the parameters
- 14% more accurate

Architecture matters! But the computational primitives are the same.

AlexNet per-layer params and FLOPs (top of the net first; pool and norm layers have no parameters):

    layer                      params    FLOPs
    FC 1000                    4M        4M
    FC 4096 / ReLU             16M       16M
    FC 4096 / ReLU             37M       37M
    Conv 3x3s1, 256 / ReLU     442K      74M
    Conv 3x3s1, 384 / ReLU     1.3M      112M
    Conv 3x3s1, 384 / ReLU     884K      149M
    Conv 5x5s1, 256 / ReLU     307K      223M
    Conv 11x11s4, 96 / ReLU    35K       105M
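As a sanity check on these figures (standard parameter/FLOP arithmetic, not from the slides): a conv layer has C_out × (C_in × k × k) + C_out parameters, so conv1 has 96 × (3 × 11 × 11) + 96 = 34,944 ≈ 35K; its FLOPs are about one multiply per filter weight per output position, (3 × 11 × 11) × (96 × 55 × 55) = 363 × 290,400 ≈ 105M, matching the table.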

Convolutional Nets: 2014

GoogLeNet, ILSVRC14 winner: ~6.6% top-5 error
- a composition of multi-scale, dimension-reduced "Inception" modules
- no FC layers and only 5 million parameters

+ depth
+ auxiliary classifiers
+ dimensionality reduction

[Szegedy15]

1x1 Convolution

- reduce the channel dimension to control (1) parameter count and (2) computation
- stack with a non-linearity for a deeper net
- found in many of the latest nets

Each filter has size 64x1x1 and does a 64-dim dot product; here, a 1x1 conv with 32 filters.

[figure credit A. Karpathy]
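To make this concrete, a minimal NumPy sketch (64 input channels and 32 filters as in the figure; the spatial size is an arbitrary choice) showing that a 1x1 convolution is a per-pixel dot product across channels:

    import numpy as np

    x = np.random.randn(64, 56, 56)      # feature map: 64 channels
    w = np.random.randn(32, 64)          # 32 filters, each of size 64x1x1
    b = np.random.randn(32)

    # flatten space, apply the 32x64 matrix at every pixel, add bias
    y = (w @ x.reshape(64, -1) + b[:, None]).reshape(32, 56, 56)
    print(y.shape)                       # (32, 56, 56): 64 -> 32 channels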

Presenter
Presentation Notes
comment on inference v. training (this is the time for inference on a single image; a training iteration is roughly 2-3x the computation and is iterated many times) go through each AlexNet, then gloss over GoogLeNet FLOPS: 725,066,088 for all conv + fc w/ biases 000,659,272 for ReLU 000,027,000 for pooling 000,020,000 for LRN layer, weight ops, bias ops conv1 105415200 290400 conv2 223948800 186624 conv3 149520384 64896 conv4 112140288 64896 conv5 74760192 43264 fc6 37748736 4096 fc7 16777216 4096 fc8 4096000 1000 conv2 has 256 * (96 / 2) * 5^2 = 307,200 params

Convolutional Nets: 2014

VGG16, ILSVRC14 runner-up: ~7.3% top-5 error
- 13 layers of 3x3 convolution interleaved with max pooling, + 3 fully-connected layers
- simple architecture, good for transfer learning
- 155 million params, and more expensive to compute

+ depth
+ fine-tuning deeper and deeper
+ stacking small filters


VGG16 layer stack (input at the bottom):
Conv 3x3s1, 64 / ReLU (x2) → Max Pool 2x2s2 →
Conv 3x3s1, 128 / ReLU (x2) → Max Pool 2x2s2 →
Conv 3x3s1, 256 / ReLU (x3) → Max Pool 2x2s2 →
Conv 3x3s1, 512 / ReLU (x3) → Max Pool 2x2s2 →
Conv 3x3s1, 512 / ReLU (x3) → Max Pool 2x2s2 →
FC 4096 / ReLU → FC 4096 / ReLU → FC 1000

Stack two 3x3 convs for a 5x5 receptive field (see the worked arithmetic below).

[figure credit A. Karpathy]

[Simonyan15]
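Worked out (standard receptive-field arithmetic): one 3x3 conv sees a 3-pixel extent; a second stacked 3x3 conv (stride 1) extends this by 3 − 1 = 2, giving 3 + 2 = 5, i.e. a 5x5 receptive field, with 2 × 3² = 18 weights per channel pair instead of 5² = 25, plus an extra non-linearity in between.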

ILSVRC15 and COCO15 winner: MSRA ResNet, for classification, detection, and segmentation

Convolutional Nets: 2015

Learn residual mapping w.r.t. identity

- very deep 100+ layer nets

- skip connections across layers

- batch normalization


Kaiming He et al., "Deep Residual Learning for Image Recognition," arXiv:1512.03385, Dec. 2015.

[He15]

Convolutional Nets: 2015

MSRA ResNet

(~5x the layers shown here)

ILSVRC15 winner with 3.5% top-5 error, and COCO15 winner with a >10% lead for detection and segmentation
- MSRA Residual Net (ResNet): 101- and 152-layer networks
- skip and sum layers to form residuals
- batch normalization (an optimization trick)
[He15]


Presenter
Presentation Notes
http://arxiv.org/abs/1512.03385


Why Now? Deep Learning Frameworks

- frontend: a language for any network, any task
- internal representation: the network
- layer library: fast implementations of common functions and gradients
- backend: dispatch compute for learning and inference
- tools: visualization, profiling, debugging, etc.

Deep Learning Frameworks

All open source; we like to brew our networks with Caffe.

- Caffe (Berkeley / BVLC): C++ / CUDA, Python, MATLAB
- Torch (Facebook + NYU): Lua (C++)
- Theano (U. Montreal): Python
- TensorFlow (Google): Python (C++)

Not So “Neural”

These models are not how the brain works. We don’t know how the brain works!
- This isn’t a problem (except for neuroscientists)
- Be wary of neural-realism hype, or “it just works because it’s like the brain”
- Say network, not neural network; unit, not neuron

Visual Recognition Tasks

Classification
- what kind of image?
- which kind(s) of objects?

Challenges
- appearance varies by lighting, pose, context, ...
- clutter
- fine-grained categorization (horse, or the exact species)

❏ dog ❏ car ❏ horse ❏ bike ❏ cat ❏ bottle ❏ person



Image Classification: ILSVRC 2010-2015

[graph credit K. He; y-axis: top-5 error]

ImageNet Large Scale Visual Recognition Challenge

Website

AlexNet – PDF

Visual Recognition Tasks


car person horse

Detection
- what objects are there?
- where are the objects?

Challenges
- localization
- multiple instances
- small objects

Presenter
Presentation Notes
The "for-real" edition of "it works because": more data/supervision; more of the model is made learnable.

Detection: PASCAL VOC

[graph credit R. Girshick; y-axis: detection accuracy]

R-CNN: regions + convnets; state-of-the-art, in Caffe

Visual Object Classes

Presenter
Presentation Notes
Classification is the fundamental visual task of recognizing what is in an image or what type of image it is. For example, the kinds of objects in the image shown are car, horse, and person, but we could also consider tasks like whether this is a daytime or nighttime image. Classification is challenging because of the many differences in appearance seen in the visual world, like lighting, pose, style, and so on. Clutter or noise can obscure the information to be extracted from the image. Fine-grained categorization is a further difficulty when we want to recognize not just any horse but an exact species.

Visual Recognition Tasks

Semantic Segmentation
- what kind of thing is each pixel part of?
- what kind of stuff is each pixel?

Challenges
- tension between recognition and localization
- amount of computation

[figure labels: horse, car]


Some examples:

• NVIDIA news:
https://news.developer.nvidia.com/google-releases-tensorflow-1-0/
http://nvidianews.nvidia.com/news?q=neural+nets&year=&month=&c=&from=&to=
http://nvidianews.nvidia.com/news?q=deep+learning&year=&month=&c=&from=&to=

• Free book: http://neuralnetworksanddeeplearning.com/

• Other books: MIT: https://pdfs.semanticscholar.org/751f/aab15cbb955b07537fc38901bc96d4e70f57.pdf

• New companies: http://aidence.com/

• Papers:
Classical paper: http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
ImageNet: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks (cited 11,342 times)
CAD: https://www.nature.com/articles/srep24454

• Google TensorFlow: https://www.tensorflow.org/get_started/

• Kaggle Diabetic Retinopathy Challenge: https://www.kaggle.com/c/diabetic-retinopathy-detection (see also our BMIE project: www.retinacheck.org/zh/index.html)

• Google Diabetic Retinopathy paper: https://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html?m=1

Presenter
Presentation Notes
This graph shows the latest results as of the 2015 challenge, with years running right to left. The introduction of deep learning not only dropped the error by almost 10 points in 2012, but deep learning methods have improved in accuracy every year as networks are made deeper and deeper. Many of the contest winners and runners-up were done with Caffe or reproduced with Caffe, including the latest winner (ResNet). Mention speed as well as accuracy. Highlighted: done in Caffe (ResNet, VGG) or reproduced in Caffe (GoogLeNet, AlexNet).

Some Basics of Deep Learning


Presenter
Presentation Notes
Detection is the task of recognizing not only what but where: both the identity and the location of each object need to be predicted. While classification considered only presence or absence, detection demands the recognition of every instance, as we see for all three cars. Localization is difficult, especially for interacting or articulated objects like the person and horse, or for small objects that are easy to miss.


Deep Learning for Vision: outline
- Why Deep Learning? Applications; The Challenge of Recognition
- Dive into Deep Learning: What is DL? Why Now?
- Learning & Optimization
- Network Tour
- Transfer Learning
- Caffe First Sip

Embedded Vision Alliance Tutorial – © Shelhamer, Donahue, Long

Presenter
Presentation Notes
Deep learning is likewise having a remarkable impact on detection. PASCAL VOC is a gold-standard dataset and challenge with fierce competition; detection accuracy scores both recognition and localization. By 2012 progress had slowed and plateaued, only to be driven further by the adoption of deep learning and R-CNN. Gloss mean AP as "detection accuracy", a measure of recognition and localization. PASCAL VOC drove the dataset + challenge shift in computer vision; its successor is COCO.

First Dive Into Deep Learning


Deep Learning is Stacking Layers and Learning End-to-End

Presenter
Presentation Notes
Semantic segmentation is a visual recognition task that asks the identity of every pixel. For things, this means what kind of object the pixel is part of, as in this example output that shows which are the horse pixels and which are the person pixels. We could just as well ask what kind of stuff each pixel is, such as grass or sky; in the context of a satellite image there might be road, buildings, crops, water, and so forth. In this task there is a tension between recognizing what globally and where locally. Computational cost can be an obstacle now that there is a decision to be made for every pixel.

Deep networks are layered models made by stacking different types of transformations.

A layer is a transformation


Stacking Layers

x’ = layer(x)

x2 = layer1(x1)
x3 = layer2(x2)
...

How do layers stack?

Networks run layer-by-layer, composing the input-output transformation of each layer.
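As a minimal sketch in Python (the two toy layers here are made up purely for illustration), running a net is just function composition:

    # each layer is a transformation: x' = layer(x)
    def layer1(x):
        return 2.0 * x + 1.0      # a toy linear transformation

    def layer2(x):
        return max(0.0, x)        # a toy pointwise non-linearity (ReLU)

    def network(x):
        x = layer1(x)             # x2 = layer1(x1)
        x = layer2(x)             # x3 = layer2(x2)
        return x

    print(network(-3.0))          # layer1 gives -5.0; ReLU clips it to 0.0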


Layered Networks

[Figure: input → layer1 → layer2 → output; x1 = layer1(input), out = layer2(x1)]

During learning, the error is passed back layer-by-layer to tune the transformations (output + error).

What kind of layers should we stack?

The simplest layers (for example): Matrix Multiplication and Non-linearity


Matrix Multiplication

Multiply the input x by the weights W and add the bias b. Learns linear transformations.

W is K x O dimensional: K inputs, O outputs; b has O outputs.


Matrix Multiplication == Fully Connected Layer

Output is a function of every input, or the input and output are “fully connected”

Abbreviated as FC

[figure credit BDTI]
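A minimal NumPy sketch of a fully connected layer (the K and O values are arbitrary illustrative choices):

    import numpy as np

    K, O = 8, 3                   # K inputs, O outputs
    W = np.random.randn(K, O)     # K x O weight matrix
    b = np.random.randn(O)        # one bias per output

    x = np.random.randn(K)        # input vector
    y = x @ W + b                 # every output depends on every input
    print(y.shape)                # (3,)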

Presenter
Presentation Notes
note: animated

Linear Classification

- Suppose our data points x are 2D and each comes with a label y, where y = -1 or y = 1
- Learn a weight vector w = [w1; w2]
- Predict the class of a given x by sign(wTx) = sign(w1x1 + w2x2)

(a code sketch follows this list)
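A tiny sketch of this linear classifier (the weight values are made up for illustration; in practice w is learned):

    import numpy as np

    w = np.array([1.0, -2.0])                # weight vector [w1; w2]

    def predict(x):
        return np.sign(w @ x)                # sign(w1*x1 + w2*x2)

    print(predict(np.array([3.0, 1.0])))     # 1.0  -> predict class y = 1
    print(predict(np.array([0.5, 2.0])))     # -1.0 -> predict class y = -1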

To classify we need to separate the data into red vs. blue.

[Figure: 2D scatter of red (y = -1) and blue (y = 1) points on axes x1 and x2, with an unlabeled "?" point to classify]


Linearity is Not Enough

To classify we need to separate the data into red vs. blue, but no line separates this data (NO); a curved boundary does (YES). We need non-linearity!

[Figure: 2D scatter of red (y = -1) and blue (y = 1) points on axes x1 and x2, separable only by a non-linear boundary]

Presenter
Presentation Notes
Armed with matrix multiplication we can do linear classification: separate the data into red vs. blue. The data x is 2-dimensional, with axes x1 and x2, while the output y is simply -1 (red) or +1 (blue). Learn weights w1 and w2 to represent a separating line. What line? In practice linearity is not enough, and real-world data requires more sophisticated classifiers: what line separates this data? None. We need non-linearity.

The Limits of Linearity

Linear steps collapse and stay linear: composing linear layers yields just another linear map, so linear layers alone do not meaningfully stack.

The Shallowest Deep Net

Deep nets are made by stacking learned linear layers and simple pointwise non-linear layers.

Due to the Rectified Linear Unit (ReLU) non-linearity max(0, x), x3 cannot be computed as a linear function of x1.

Linear → Non-linear, Deep: add ReLU (see the sketch below)
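A sketch of this shallowest deep net in NumPy (toy weight values, chosen only to illustrate the structure):

    import numpy as np

    W1 = np.array([[1.0, -1.0],
                   [0.5,  2.0]])             # first linear layer (toy values)
    W2 = np.array([[1.0,  1.0]])             # second linear layer (toy values)

    def shallowest_deep_net(x1):
        x2 = W1 @ x1                         # linear step
        x2 = np.maximum(0.0, x2)             # ReLU: max(0, x), pointwise
        x3 = W2 @ x2                         # linear step
        return x3                            # not a linear function of x1

    print(shallowest_deep_net(np.array([1.0, 2.0])))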


Non-linearity is needed to deepen the representation. There are many non-linearities or activations to choose from: ReLU, Sigmoid, ...

Yet more non-linearities: ReLU, Sigmoid, TanH, Leaky ReLU, ELU

When in doubt, ReLU. Worth trying: Leaky ReLU, ELU. Avoid Sigmoid.
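For reference, a sketch of these activations as NumPy one-liners (the α defaults are common conventions, not prescribed by the slides):

    import numpy as np

    def relu(x):                   return np.maximum(0.0, x)
    def sigmoid(x):                return 1.0 / (1.0 + np.exp(-x))
    def tanh(x):                   return np.tanh(x)
    def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, alpha * x)
    def elu(x, alpha=1.0):         return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))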


Define Your First Net

Let’s go non-linear on a classification problem.

Try It Out: Deep Learning in-your-browser demos


Designing for Sight

Convolutional Networks or convnets are nets for vision

- a functional fit for the visual world by compositionality and feature sharing
- learned end-to-end to handle visual detail, for more accuracy and less engineering

Convnets are the dominant architectures for visual tasks


Visual Structure

Local Processing: pixels close together go together; receptive fields capture local detail.

Across Space: the same what, no matter where; recognize the same input in different places.

Can rely on spatial coherence: this is not a cat, but all of these are cats.

Presenter
Presentation Notes
Pointwise non-linearities

Vision Layers

- Convolution/Filtering: the linear layer for vision (a Learned Filter)
- Pooling: spatial summarization (e.g. max pool 2x2 with stride 2)

[figure credit A. Karpathy, cs231n course]

Convolution: A Linear Layer for Vision

Images have translation-invariant semantics: these are all equally squirrels. So use the same weights between nodes with the same spatial relationship.

This is convolution (or correlation; the terms are used interchangeably in vision). Convolution means fewer parameters for more efficient learning.


A Filter

Input is 3x32x32 data: a color image (3 RGB channels) and square (32x32).

A filter is a spatially local and cross-channel template. Convnet filters are learned.

The filter is 3x5x5 weights:
- spatially local: kernel size is 5x5
- cross-channel: connected across all input channels

Total parameters: 3·5² = 75 filter weights + 1 bias

[figure adapted from A. Karpathy]

One filter evaluation is a dot product between the input window and weights + bias
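A minimal NumPy sketch of this with the sizes above (3x32x32 input, one 3x5x5 filter); sliding the same dot product over every window yields the feature map described next:

    import numpy as np

    x = np.random.randn(3, 32, 32)   # input: 3 channels, 32x32
    w = np.random.randn(3, 5, 5)     # one filter: 3x5x5 = 75 weights
    b = 0.1                          # + 1 bias

    # output size: (32 - 5) / stride 1 + 1 = 28
    fmap = np.empty((28, 28))
    for i in range(28):
        for j in range(28):
            window = x[:, i:i+5, j:j+5]           # 3x5x5 input window
            fmap[i, j] = np.sum(window * w) + b   # dot product + bias

    print(fmap.shape)                # (28, 28): one feature map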

Convolution

input: 3x32x32; filter: 3x5x5; bias: 1; output: 1 feature map

[figure adapted from A. Karpathy]

Presenter
Presentation Notes
use the same weights for the same spatial relationship

Convolving the filter with the input gives a feature map: input 3x32x32, filter 3x5x5 (+1 bias) → feature map 1x28x28.

Filter parameters: 3·5² = 75. FC parameters (for comparison): 3·32² = 3,072.

[figure adapted from A. Karpathy]

Convolution Layer (conv)

input: 3x32x32; filters: 6x3x5x5; bias: 6; output: 6x28x28 feature maps

Convolution layers have multiple filters for more modeling capacity.

[figure adapted from A. Karpathy]

Learned Filters from AlexNet conv1

conv1 has 96 filters for edge, color, and frequency: richer than 3D RGB.

[figure adapted from A. Karpathy]

Presenter
Presentation Notes
weights are shared across space

Pooling (pool)

Spatial summary by computing an operation (max pooling or average pooling) over a window with a stride, e.g. 2x2 pooling with stride 2.
- overlapping or non-overlapping
- separate across channels
- current fashion: 3x3 max pooling with stride 2

[figure credit BDTI]

Presenter
Presentation Notes
weights are shared across space

[figure credit A. Karpathy]

Pooling


- reduce resolution
- increase receptive field size for later layers
- save computation
- add invariance to translation/noise within the pooling window

Example: 64x224x224 → 64x112x112 (a code sketch follows below)
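A minimal NumPy sketch of 2x2 max pooling with stride 2 (the 64x224x224 shape is taken from the example above):

    import numpy as np

    x = np.random.randn(64, 224, 224)        # 64 feature maps

    # non-overlapping 2x2 windows, stride 2: view each spatial axis as
    # (112, 2) blocks and take the max within each 2x2 block
    y = x.reshape(64, 112, 2, 112, 2).max(axis=(2, 4))
    print(y.shape)                           # (64, 112, 112)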

Presenter
Presentation Notes
weights are shared across space

Fully Connected Layers (FC)

Learn a global feature from the full feature maps. Often found at the end of convnets. Note: this could likewise be done by a large convolution kernel.

[Figure: 2x2x2 feature maps are unrolled into a 1x8 input, multiplied by 8x3 weights, and a 1x3 bias is added, giving 1x3 outputs or units; a code sketch follows below]
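A NumPy sketch of this unroll-and-multiply with the exact shapes from the figure:

    import numpy as np

    fmaps = np.random.randn(2, 2, 2)   # 2x2x2 feature maps
    W = np.random.randn(8, 3)          # 8x3 weights
    b = np.random.randn(1, 3)          # 1x3 bias

    x = fmaps.reshape(1, 8)            # unroll the feature maps to 1x8
    y = x @ W + b                      # 1x3 outputs (units)
    print(y.shape)                     # (1, 3)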


Normalization Layers (Deprecated)

Local response normalization was popular for a time but is now deprecated; more recent networks do not include these layers.

[figure credit BDTI]


- layers compute differentiable transformations

- types of layers: conv, ReLU, pool, FC

- parameters (conv, FC) or not (pool, ReLU)

- arguments like kernel size, stride, etc. (conv, pool)

Layer Review


Convnet Architecture

Input Image → stacked Conv 3x3s1, 10 / ReLU layers (Type: Conv, Kernel Size: 3x3, Stride: 1, Channels: 10, Activation: ReLU), with Max Pool 3x3s1 between groups → FC 10 → Scores

Stack convolution, non-linearity, and pooling until a global FC layer classifier.

[figure credit A. Karpathy]


Data augmentation: making much more data


Transform the training data without changing its truth:
- horizontal flips: a cat is still a cat
- random crops/scales: different views of the cat are still the cat
- relighting: a darker cat is still a cat
… and anything else you can come up with (and combinations of the above)!

[figure adapted from A. Karpathy]
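A sketch of these augmentations in NumPy (the crop fraction and brightness range are illustrative choices, not from the slides):

    import numpy as np

    def augment(img, rng):
        # img: HxWx3 array in [0, 255]; returns a transformed copy, same truth
        if rng.random() < 0.5:
            img = img[:, ::-1, :]                 # horizontal flip
        h, w = img.shape[:2]
        top  = rng.integers(0, h // 8 + 1)        # random crop offsets
        left = rng.integers(0, w // 8 + 1)
        img = img[top:top + h - h // 8, left:left + w - w // 8]
        img = img * rng.uniform(0.7, 1.3)         # relighting: scale brightness
        return np.clip(img, 0, 255)

    rng = np.random.default_rng(0)
    out = augment(np.random.rand(224, 224, 3) * 255, rng)
    print(out.shape)                              # cropped, e.g. (196, 196, 3)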


See a Net Learn to See

Let’s watch a convnet as it learns how to recognize objects in images.

MNIST demo: Try It Out
CIFAR-10 demo: Try It Out

Internal functionality


Supervised Learning

Given labeled data: (x1, y1), (x2, y2), …, (xN, yN)

Goal: find a function f such that yn = f(xn) for all n, “as well as possible”


What does “as well as possible” mean? Pick a loss function ℓ(y, ŷ): how wrong is it to predict ŷ when the true label is y? Minimize the total loss over all the data: Σn ℓ(yn, ŷn).

E.g. ℓ(y, ŷ) = ‖y − ŷ‖², the “Euclidean loss” of everyday linear regression

Supervised Loss


Parametric Learning

How do we find the label-prediction function f? Parametric answer: pick it from a family f(x) = f(x; θ) determined by a set of parameters θ.

E.g. f(x; θ) = θx, “linear prediction” (θ a matrix, x a vector). For us: f is a network, θ is a set of weights.


Parametric Supervised Learning

Altogether: our goal is to find θ to minimize the sum over the data

    Σn ℓ(yn, f(xn; θ))

where ℓ is the loss, yn the true label, f(xn; θ) the predicted label, f the model (network), and θ the parameters (weights).

Underfitting and Overfitting

underfitting: not enough parameters to model the data
overfitting: enough parameters to memorize the training set without generalizing
(the spectrum runs from fewer parameters to more parameters)

[figure credit A. Karpathy]

Regularization

How can we prevent overfitting without reducing the number of parameters? Add a regularization penalty to our loss: “complicated” solutions are made worse.

[figure credit A. Karpathy]

Regularization: Weight Decay and Dropout

Weight Decay: minimize L(θ) + λ‖θ‖² to pull the weights toward zero. λ (a scalar) is an optimization setting; pick it empirically. Aka “L2 regularization”.

Dropout: during training, randomly set a fraction p of the activations to zero. p is an optimization setting (often 0.5). Forces the model to be robust to noise. (A code sketch follows.)
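A sketch of dropout during training (“inverted” dropout, one common way to implement it):

    import numpy as np

    def dropout(activations, p=0.5, rng=np.random.default_rng()):
        # zero a random fraction p of the activations; scale the survivors
        # by 1/(1-p) so expected activations are unchanged at test time
        mask = rng.random(activations.shape) >= p
        return activations * mask / (1.0 - p)

    h = np.random.randn(8)
    print(dropout(h, p=0.5))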


Gradient Descent: Intuition

Want to minimize the “loss” function L(x; θ).

[Figure: loss curve L(x; θ) over the θ axis, showing a step from old θ to new θ]

Move against the direction of the gradient, i.e. downhill.

θ (vector): the parameter to update; x (vector): the input data (fixed on this slide)


The gradient tells you, for each element of the network parameters, how the loss changes in response to a change in that parameter.

Stochastic Gradient Descent (SGD)

Want to minimize the “loss” function L(x; θ):
1. Pick an input datum x
2. Compute the parameter gradient ∇θ L(x; θ)
3. Multiply by the learning rate η
4. Update the parameters: θ ← θ − η ∇θ L(x; θ)

(The alternative is to average the gradient over all available data, “batch gradient descent”: that’s too slow for big data!)

Why “Stochastic”?

The gradient depends on the choice of the input datum x. Choose x randomly (or just cycle through all the data in a fixed order).

SGD with Weight Decay and Momentum

v ← p v − η (∇θ L(x; θ) + λθ)
θ ← θ + v

Here λθ is the weight decay (regularization) term and p v is the momentum term (p is a number less than 1).

There are many other variants: Adam, RMSprop, AdaDelta, AdaGrad, Nesterov, ...

(a code sketch follows below)
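A minimal sketch of this optimizer on a toy problem (the quadratic loss, the data, and the hyperparameter values are illustrative assumptions, not from the slides):

    import numpy as np

    def grad_L(x, theta):
        # toy loss L(x; theta) = 0.5 * ||theta - x||^2, so the gradient is theta - x
        return theta - x

    rng = np.random.default_rng(0)
    data = rng.normal(loc=[1.0, -2.0], scale=0.1, size=(100, 2))

    theta, v = np.zeros(2), np.zeros(2)
    eta, lam, p = 0.1, 1e-4, 0.9           # learning rate, weight decay, momentum

    for step in range(1000):
        x = data[rng.integers(len(data))]  # pick an input datum at random
        g = grad_L(x, theta) + lam * theta # gradient plus weight decay
        v = p * v - eta * g                # momentum: remember past steps
        theta = theta + v                  # update parameters

    print(theta)                           # settles near the data mean [1, -2]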

Layer Gradients

Matrix multiply (y = Wx + b): ∂L/∂x = Wᵀ (∂L/∂y), ∂L/∂W = (∂L/∂y) xᵀ
ReLU (max(0, x)): the gradient is 1 where x > 0, and 0 elsewhere
Sigmoid σ(x): σ′(x) = σ(x)(1 − σ(x))

Back-propagation: The Chain Rule

[Figure: x → layer1 (θ) → loss (ℓ)]

A net is a composition of layer functions; the gradient of the net is the product of the layer gradients (the chain rule): ∂ℓ/∂θ = (∂ℓ/∂layer1)(∂layer1/∂θ).
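A sketch of the chain rule on the smallest possible net (a scalar toy example, made up for illustration):

    # net: prediction = theta * x, loss = 0.5 * (prediction - y)^2
    x, y, theta = 2.0, 3.0, 0.5

    pred = theta * x                            # forward pass through the layer
    loss = 0.5 * (pred - y) ** 2

    # backward pass: multiply the local (layer) gradients together
    dloss_dpred  = pred - y                     # gradient of loss w.r.t. output
    dpred_dtheta = x                            # gradient of layer w.r.t. theta
    dloss_dtheta = dloss_dpred * dpred_dtheta   # chain rule product
    print(dloss_dtheta)                         # (1.0 - 3.0) * 2.0 = -4.0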

Back-propagation in a Bigger Net

[Figure: forward pass: input x → layer1 (θ1) → layer2 (θ2) → output ŷ (θ3), compared against the truth y by the loss; backward pass: the error flows back through the same layers to every θ]