
UVA CS 6316: Machine Learning

Lecture 15: Neural Network / Deep Learning Basics

Logistic Regression

Sigmoid function (aka logistic, logit, "S", soft-step):
sigmoid(z) = e^z / (1 + e^z)

P(Y=1|x) = e^(wx+b) / (1 + e^(wx+b))

Logistic Regression

[Figure: inputs x1, x2, x3 are multiplied by weights w1, w2, w3 and combined by a summing function, together with the bias b (the "+1" input), to give z; a sigmoid function then maps z to the output ŷ]

z = Σi wi xi + b = wᵀx + b
ŷ = sigmoid(z) = e^z / (1 + e^z) = P(Y=1|x; w)

Dimensions: the input x is p×1 (here p = 3), w is p×1, and z and ŷ are 1×1 scalars.
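To make this concrete, here is a minimal NumPy sketch of the forward pass of one logistic-regression unit; the variable names and example values are illustrative, not taken from the slides.

```python
# Minimal sketch: one logistic-regression "neuron" (forward pass only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 0.5])    # input, p = 3 (example values)
w = np.array([0.4, -0.3, 0.1])   # weights w1, w2, w3 (example values)
b = 0.2                          # bias

z = w @ x + b                    # summing function: z = w^T x + b
y_hat = sigmoid(z)               # output: estimate of P(Y=1 | x; w)
print(z, y_hat)
```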

“Neuron” or “Perceptron”

[Figure: the same unit, relabeled as a neuron or perceptron: inputs x1, x2, x3, weights w1, w2, w3, bias b, a summation Σ producing z, and a sigmoid producing ŷ]

From here on, we leave out the bias for simplicity.

Neurons

[Figure from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

“Block View” of a Neuron

[Figure: block view: the input x feeds a dot-product block parameterized by w (a "parameterized block"), producing z; a sigmoid block then produces the output ŷ]

z = wᵀx    (wᵀ is 1×p, x is p×1, z is 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)    (1×1)

Neuron Representation

[Figure: the summation/sigmoid diagram and the block-view diagram shown side by side; both depict the same computation x → (*w) → z → sigmoid → ŷ]

The linear transformation and the nonlinearity together are typically considered a single neuron.

Multiple Neurons

[Figure: the input x = (x1, x2, x3) feeds d = 4 neurons in parallel; the neurons' weight vectors are collected into a weight matrix W, and the outputs are ŷ1, ŷ2, ŷ3, ŷ4]

z = Wᵀx    (x is p×1 with p = 3, Wᵀ is d×p with d = 4, z is d×1)
ŷ = sigmoid(z)    (applied element-wise; d×1)
ŷ = P(Y=1|x; W)

Neural Network (1 Hidden Layer)

[Figure: the input x = (x1, x2, x3) feeds a hidden layer of d = 4 sigmoid neurons (weights W1), whose outputs h1 feed a single output neuron (weights w2) producing ŷ; the subscript indexes the layer]

z1 = W1ᵀx,  h1 = sigmoid(z1)    (x is p×1, W1ᵀ is d×p, z1 and h1 are d×1)
z2 = w2ᵀh1,  ŷ = sigmoid(z2)    (w2ᵀ is 1×d, z2 and ŷ are 1×1)
ŷ = P(Y=1|x; W1, w2)

[Block view: x → (*W1, dot product) → z1 → sigmoid → h1 → (*w2, dot product) → z2 → sigmoid → ŷ; W1 is a matrix and z1 is a vector]
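A minimal NumPy sketch of this forward pass, following the shapes above (the sizes and random values are illustrative):

```python
# Minimal sketch: forward pass of a 1-hidden-layer network
# (bias omitted, as in the slides).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p, d = 3, 4
rng = np.random.default_rng(0)
x  = rng.normal(size=(p, 1))   # p x 1 input
W1 = rng.normal(size=(p, d))   # hidden-layer weights, so W1.T is d x p
w2 = rng.normal(size=(d, 1))   # output weights, so w2.T is 1 x d

z1 = W1.T @ x                  # d x 1
h1 = sigmoid(z1)               # d x 1 hidden activations
z2 = w2.T @ h1                 # 1 x 1
y_hat = sigmoid(z2)            # 1 x 1, estimate of P(Y=1 | x; W1, w2)
```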

Neural Network (2 Hidden Layers)

[Figure: x = (x1, x2, x3) → 1st hidden layer (weights W1, outputs h1) → 2nd hidden layer (weights W2, outputs h2) → output layer (weights w3) → output ŷ]

z1 = W1ᵀx,   h1 = sigmoid(z1)    (hidden layer 1 output)
z2 = W2ᵀh1,  h2 = sigmoid(z2)    (hidden layer 2 output)
z3 = w3ᵀh2,  ŷ = sigmoid(z3)

Multi-Class Neural Network (2 Hidden Layers)

[Figure: the same two-hidden-layer network, but the output layer now has one neuron per class, producing ŷ1, ŷ2, ŷ3; the output weights form a matrix W3]

z1 = W1ᵀx,   h1 = sigmoid(z1)    (hidden layer 1 output)
z2 = W2ᵀh1,  h2 = sigmoid(z2)    (hidden layer 2 output)
z3 = W3ᵀh2,  ŷ = sigmoid(z3)     (one output per class)

“Block View” of a Multi-Layer Neural Network

[Block diagram: x → (*W1) → z1 → sigmoid → h1 (1st hidden layer) → (*W2) → z2 → sigmoid → h2 (2nd hidden layer) → (*w3) → z3 → sigmoid → ŷ (output layer)]

“Deep” Neural Networks (i.e. > 1 hidden layer)

Toy Example: Will I pass this class?

[Figures: a small worked example from http://introtodeeplearning.com/2019/materials/2019_6S191_L1.pdf — two input features (with example values 4 and 5) feed a network with weights W1, W2, w3 that outputs ŷ, the predicted probability of passing]

Binary Classification Loss
Binary Cross Entropy Loss

[Figure: the network (weights W1, W2, w3) maps the input x to ŷ = P(Y=1|X, W); the loss compares ŷ with the true output y]

E = loss = -log P(Y = y | X = x) = -y log(ŷ) - (1 - y) log(1 - ŷ)
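A short NumPy sketch of this loss; the epsilon clipping is a standard numerical-safety addition of mine, not something stated on the slide.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # E = -y log(y_hat) - (1 - y) log(1 - y_hat)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

print(binary_cross_entropy(1.0, 0.9))   # small loss: confident and correct
print(binary_cross_entropy(1.0, 0.1))   # large loss: confident and wrong
```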

Multi-Class Classification Loss
Cross Entropy Loss

[Figure: the two-hidden-layer network (weights W1, W2, W3) ends in K = 3 output neurons ŷ1, ŷ2, ŷ3]

“Softmax” function: a normalizing function which converts each class output to a probability:
ŷi = P(yi = 1 | x)

E = loss = - Σj yj log(ŷj),  j = 1...K

The true label y is “0” for all except the true class, e.g. y = (0, 1, 0) with prediction ŷ = (0.1, 0.7, 0.2).
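A sketch of the softmax and the cross-entropy loss using a one-hot label like the example above; the max-subtraction is a standard numerical-stability trick, not something stated on the slide.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y, y_hat, eps=1e-12):
    # E = -sum_j y_j log(y_hat_j), with y one-hot
    return -np.sum(y * np.log(y_hat + eps))

y      = np.array([0.0, 1.0, 0.0])   # true class is class 2 (one-hot)
scores = np.array([0.5, 2.0, 0.2])   # example network outputs, K = 3
y_hat  = softmax(scores)             # roughly (0.16, 0.72, 0.12)
print(y_hat, cross_entropy(y, y_hat))
```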

Regression Loss
Mean Squared Error Loss

[Figure: the network (weights W1, W2, w3) maps x to a real-valued output ŷ, compared with the true output y]

E = loss = ½ (y - ŷ)²
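And a tiny sketch of the mean-squared-error loss for a single prediction:

```python
def mse_loss(y, y_hat):
    # E = 1/2 (y - y_hat)^2 ; its gradient w.r.t. y_hat is (y_hat - y)
    return 0.5 * (y - y_hat) ** 2

print(mse_loss(3.0, 2.5))  # 0.125
```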

Training Neural Networks

[Figure: the input x passes through the network (weights W1, W2, w3) to produce ŷ, which is scored by the loss E(ŷ; W)]

Goal: W* = argmin_W E(ŷ; W)

How do we learn the optimal weights WL for our task?
● Gradient descent:
   WL(t+1) = WL(t) - η ∂E/∂WL(t)

LeCun et al., Efficient BackProp, 1998

Gradient Descent

W* = argmin_W E(ŷ; W)

[Figures: the loss surface E(W) with successive gradient-descent steps; each step moves W in the direction of -∂E/∂W(t). From http://introtodeeplearning.com/2019/materials/2019_6S191_L1.pdf]
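A minimal sketch of the update W(t+1) = W(t) - η ∂E/∂W(t), applied here to a simple quadratic loss whose gradient is easy to write down (the loss and values are illustrative, not from the slides):

```python
import numpy as np

def grad_step(W, grad, lr=0.1):
    # W(t+1) = W(t) - eta * dE/dW(t)
    return W - lr * grad

# Illustrative example: minimize E(W) = ||W - W_target||^2 / 2,
# whose gradient is simply (W - W_target).
W_target = np.array([1.0, -2.0])
W = np.zeros(2)
for t in range(50):
    grad = W - W_target
    W = grad_step(W, grad, lr=0.1)
print(W)   # approaches [1, -2]
```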

Training Neural Networks

But how do we get the gradients of the lower layers (e.g. W1)?
● Backpropagation!
   ○ Repeated application of the chain rule of calculus
   ○ Locally minimizes the objective
   ○ Requires all “blocks” of the network to be differentiable

LeCun et al., Efficient BackProp, 1998

[Figure: the network diagram x → (W1, W2, w3) → ŷ]

Backpropagation Intro

[A step-by-step worked example of backpropagation on a small computational graph, from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

Tells us: by increasing x by a scale of 1, we decrease f by a scale of 4.
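The quoted takeaway matches the classic worked example in the referenced cs231n slides, f(x, y, z) = (x + y)·z evaluated at x = -2, y = 5, z = -4; that choice of function and values is an assumption on my part, since the figures are not reproduced here. A minimal sketch of the forward and backward passes:

```python
# Hedged sketch: forward and backward pass on f(x, y, z) = (x + y) * z,
# assuming the values used in the referenced cs231n example.
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule, starting from df/df = 1)
df_dq = z          # d(q*z)/dq = z = -4
df_dz = q          # d(q*z)/dz = q = 3
df_dx = df_dq * 1  # dq/dx = 1, so df/dx = -4
df_dy = df_dq * 1  # dq/dy = 1, so df/dy = -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
# df/dx = -4: increasing x by 1 decreases f by about 4, as stated above.
```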

Backpropagation (binary classification example)

Example on a 1-hidden-layer network for binary classification:

[Block diagram: x → (*W1) → z1 → sigmoid → h1 → (*w2) → z2 → sigmoid → ŷ = P(y=1|X; W1, w2)]

E = loss = -y log(ŷ) - (1 - y) log(1 - ŷ)

Gradient descent to minimize the loss: we need ∂E/∂w2 and ∂E/∂W1. Exploit the chain rule!

∂E/∂w2 = (∂E/∂ŷ)(∂ŷ/∂z2)(∂z2/∂w2)
∂E/∂W1 = (∂E/∂ŷ)(∂ŷ/∂z2)(∂z2/∂h1)(∂h1/∂z1)(∂z1/∂W1)

The leading factors of ∂E/∂W1 are already computed when evaluating ∂E/∂w2, so backpropagation reuses them as it moves down through the layers.
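A NumPy sketch of these gradients, using the standard simplification ∂E/∂z2 = ŷ - y for sigmoid output plus binary cross-entropy; the sizes and random values are illustrative, and the slides' exact intermediate expressions are not reproduced here.

```python
# Hedged sketch: manual backprop for the 1-hidden-layer binary classifier above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
p, d = 3, 4                      # input size, hidden size (example values)
x  = rng.normal(size=(p, 1))     # p x 1 input
y  = 1.0                         # true label
W1 = rng.normal(size=(p, d))     # so z1 = W1.T @ x is d x 1
w2 = rng.normal(size=(d, 1))     # so z2 = w2.T @ h1 is 1 x 1

# Forward pass
z1 = W1.T @ x                    # d x 1
h1 = sigmoid(z1)                 # d x 1
z2 = w2.T @ h1                   # 1 x 1
y_hat = sigmoid(z2)              # 1 x 1
E = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Backward pass (chain rule)
dE_dz2 = y_hat - y               # sigmoid + binary cross-entropy combined
dE_dw2 = h1 * dE_dz2             # d x 1, since z2 = w2.T @ h1
dE_dh1 = w2 * dE_dz2             # d x 1
dE_dz1 = dE_dh1 * h1 * (1 - h1)  # sigmoid local gradient
dE_dW1 = x @ dE_dz1.T            # p x d, since z1 = W1.T @ x
```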

“Local-ness” of Backpropagation

[Figure: a single block f with input x and output y; on the forward pass the block computes its activations, and on the backward pass it multiplies the gradient arriving from above by its own “local gradients”. From http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

Example: Sigmoid Block

sigmoid(x) = 𝛔(x), with local gradient d𝛔(x)/dx = 𝛔(x)(1 - 𝛔(x))

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
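A sketch of a sigmoid block with forward and backward methods; the class name and interface are illustrative, in the spirit of the block view above rather than the slide's own code.

```python
import numpy as np

class SigmoidBlock:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))   # cache activation for backward
        return self.out

    def backward(self, grad_out):
        # local gradient of sigmoid: sigma(x) * (1 - sigma(x))
        return grad_out * self.out * (1.0 - self.out)

block = SigmoidBlock()
y = block.forward(np.array([0.0, 2.0]))
dx = block.backward(np.ones(2))   # chain rule: upstream grad times local grad
```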

Deep Learning: Layers of Differentiable Parameterized Functions (with nonlinearities)

[Figure: x → f1 → f2 → f3 → ŷ → E(ŷ, y); each fi is a differentiable, parameterized block]

Building Deep Neural Nets

[Figure: composing blocks f with inputs x and outputs y into larger networks, from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

Block Example Implementation

[Figure: an example forward/backward implementation of a block, from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

Deep Learning Frameworks

PyTorch Sample Code
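The slide's own code is not reproduced in this transcript; as a stand-in, here is a minimal PyTorch sketch of a 1-hidden-layer binary classifier of the kind discussed above (layer sizes, data, and learning rate are illustrative).

```python
import torch
import torch.nn as nn

# 1-hidden-layer network: 3 inputs -> 4 hidden sigmoid units -> 1 sigmoid output
model = nn.Sequential(
    nn.Linear(3, 4),
    nn.Sigmoid(),
    nn.Linear(4, 1),
    nn.Sigmoid(),
)
loss_fn = nn.BCELoss()                          # binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3)                           # a batch of 8 examples
y = torch.randint(0, 2, (8, 1)).float()         # binary labels

for step in range(100):
    y_hat = model(x)                            # forward pass
    loss = loss_fn(y_hat, y)
    optimizer.zero_grad()
    loss.backward()                             # backprop computes gradients
    optimizer.step()                            # SGD weight update
```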


Building Deep Neural Nets

“GoogLeNet” for Object Classification

Backprop Demo

[Figure: a small network: inputs x1, x2 feed two hidden sigmoid units (weights w1, w2, w3, w4) whose outputs h1, h2 feed a linear output unit (weights w5, w6) producing ŷ]

f1:  z1 = x1·w1 + x2·w3,   z2 = x1·w2 + x2·w4
f2:  h1 = exp(z1) / (1 + exp(z1)),   h2 = exp(z2) / (1 + exp(z2))
f3:  ŷ = h1·w5 + h2·w6
f4:  E = (y - ŷ)²

Update rule: w(t+1) = w(t) - η ∂E/∂w(t)
∂E/∂wi = ??
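A NumPy sketch answering the "∂E/∂wi = ??" above, working backward through f4, f3, f2, f1 with the chain rule; the input values, target, and initial weights are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, target, and weights
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, w3, w4, w5, w6 = 0.1, -0.2, 0.3, 0.4, 0.5, -0.6

# Forward pass (f1 -> f2 -> f3 -> f4)
z1 = x1 * w1 + x2 * w3
z2 = x1 * w2 + x2 * w4
h1, h2 = sigmoid(z1), sigmoid(z2)
y_hat = h1 * w5 + h2 * w6
E = (y - y_hat) ** 2

# Backward pass (chain rule)
dE_dyhat = -2.0 * (y - y_hat)                     # f4
dE_dw5, dE_dw6 = dE_dyhat * h1, dE_dyhat * h2     # f3
dE_dh1, dE_dh2 = dE_dyhat * w5, dE_dyhat * w6
dE_dz1 = dE_dh1 * h1 * (1.0 - h1)                 # f2: sigmoid local gradient
dE_dz2 = dE_dh2 * h2 * (1.0 - h2)
dE_dw1, dE_dw3 = dE_dz1 * x1, dE_dz1 * x2         # f1
dE_dw2, dE_dw4 = dE_dz2 * x1, dE_dz2 * x2

eta = 0.5
w5 -= eta * dE_dw5   # one update: w(t+1) = w(t) - eta * dE/dw(t)
```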

Nonlinearity Functions (i.e. transfer or activation functions)

[Figure: the nonlinearity is the function applied to z = x·w after the summing function; from http://introtodeeplearning.com/2019/materials/2019_6S191_L1.pdf]

[Table: name, plot, equation, and derivative (w.r.t. x) of common activation functions; one is annotated “usually works best in practice”. From https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions]
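The comparison table itself is not reproduced in this transcript. As a sketch, here are three activations that commonly appear in such comparisons, with their derivatives with respect to x:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # sigma(x) * (1 - sigma(x))

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2    # 1 - tanh(x)^2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(float)    # 1 for x > 0, else 0

xs = np.linspace(-3, 3, 7)
print(relu(xs), d_relu(xs))
```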

Neural Net Pipeline

[Figure: x → network (weights W1, W2, w3) → ŷ → loss E]

1. Initialize weights
2. For each batch S of input samples x:
   a. Run the network “forward” on S to compute outputs and loss
   b. Run the network “backward” using the outputs and loss to compute gradients
   c. Update the weights using SGD (or a similar method)
3. Repeat step 2 until the loss converges
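A NumPy skeleton of this pipeline for the 1-hidden-layer binary classifier used throughout; the forward and backward steps reuse the formulas from the backpropagation example, while the toy data, batch size, and learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
p, d, n = 3, 4, 32
X = rng.normal(size=(n, p))                        # toy inputs (illustrative)
Y = (X[:, 0] + X[:, 1] > 0).astype(float)          # toy binary labels

# 1. Initialize weights
W1 = 0.1 * rng.normal(size=(p, d))
w2 = 0.1 * rng.normal(size=(d, 1))

eta, batch_size = 0.5, 8
for epoch in range(200):                           # 3. repeat until convergence
    for start in range(0, n, batch_size):          # 2. for each batch S
        xb = X[start:start + batch_size].T         # p x B
        yb = Y[start:start + batch_size]           # length B
        # 2a. forward: outputs and (implicit) BCE loss
        h1 = sigmoid(W1.T @ xb)                    # d x B
        y_hat = sigmoid(w2.T @ h1).ravel()         # length B
        # 2b. backward: gradients via backprop
        dz2 = (y_hat - yb) / len(yb)               # length B, averaged dE/dz2
        dw2 = h1 @ dz2[:, None]                    # d x 1
        dh1 = w2 @ dz2[None, :]                    # d x B
        dz1 = dh1 * h1 * (1 - h1)                  # d x B
        dW1 = xb @ dz1.T                           # p x d
        # 2c. SGD weight update
        W1 -= eta * dW1
        w2 -= eta * dw2
```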

Non-Convexity of Neural Nets

The neural-network objective is non-convex, but in very high dimensions there exist many local minima which are about equally good.

Pascanu et al., On the saddle point problem for non-convex optimization, 2014

Advantage of Neural Nets

As long as the whole model is fully differentiable, we can train it to automatically learn features for us.