COMP3411/9414: Artificial Intelligence
Deep Learning Introduction
COMP3411/9414 © Anthony Knittel, 2014
Deep Learning Networks
• Backpropagation trains a network by passing corrections back from the output nodes to earlier units in the network.
• Training is based on classification: the network weights are initially random and are adjusted by following the gradient of the error (the update rule is written out below).
• With increased depth, learning becomes less effective: the gradient grows weaker with depth, because the learning signal must propagate back from the classification layer.
• Backpropagation networks require extensive training data, which can be difficult or costly to produce.
• Training can become stuck in local minima and fail to find better solutions.
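As a point of reference, the standard gradient-descent rule adjusts each weight against the gradient of the output error E (the learning rate symbol η is my notation, not from the slides):

    \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}

By the chain rule, this gradient for a weight deep in the network is a product of terms contributed by every layer above it, which is one way to see why the learning signal weakens with depth.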
Deep Learning Networks
• Deep Learning techniques address a number of these issues:
• Discovery of re-used features, with the potential for more modular learning
• Learning from unlabelled data (followed by supervised learning)
• The ability to train deeper networks and capture intermediate features
Deep Learning Networks
• Machine Learning theory says a single-hidden-layer network can learn any function to arbitrary accuracy, so why bother? 2-layer MLPs and SVMs are "universal"
• The right representation can be much more efficient for particular tasks
• There is significant modularity in the brain: deep networks of re-used features are seen in vision, and are useful for audio and natural language tasks
• A more promising approach for more general AI
• http://www.technologyreview.com/featuredstory/513696/deep-learning
Deep Learning Networks
• Common techniques:
◮ Unsupervised learning, to pre-train the network
◮ Feature learning takes place one layer at a time: outputs from the features of one layer are used as inputs for the next
◮ After pre-training, supervised learning is performed on the network using backpropagation (a sketch of this recipe follows below)
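A minimal sketch of the recipe in Python/NumPy, assuming each layer is pre-trained as a tied-weight autoencoder with a linear decoder and plain gradient descent; the function names and toy data are illustrative, not taken from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_autoencoder_layer(X, n_hidden, lr=0.1, epochs=50):
        """Pre-train one layer as a tied-weight autoencoder: encode X into a
        smaller hidden code, decode it back, and minimise the squared
        reconstruction error by gradient descent."""
        n_visible = X.shape[1]
        W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
        for _ in range(epochs):
            H = sigmoid(X @ W)        # encode
            R = H @ W.T               # decode (tied weights, linear output)
            err = R - X               # reconstruction error
            # gradient of 0.5*||R - X||^2 w.r.t. the shared weight matrix W
            dW = X.T @ ((err @ W) * H * (1.0 - H)) + err.T @ H
            W -= lr * dW / X.shape[0]
        return W

    def greedy_pretrain(X, layer_sizes):
        """Feature learning one layer at a time: the hidden code of each
        trained layer becomes the input of the next layer."""
        weights, H = [], X
        for n_hidden in layer_sizes:
            W = pretrain_autoencoder_layer(H, n_hidden)
            weights.append(W)
            H = sigmoid(H @ W)        # output of this layer feeds the next
        return weights

    # toy usage: 100 random 20-dimensional inputs, two pre-trained layers
    X = rng.random((100, 20))
    stack = greedy_pretrain(X, layer_sizes=[10, 5])
    # after pre-training, a classification layer would be added on top and
    # the whole network fine-tuned with backpropagation on labelled data

In a real system the pre-trained weights would then initialise a deeper network whose top classification layer, and eventually the whole stack, is trained by backpropagation on labelled data.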
Layer-wise Pre-Training
[Figure (a): layer-wise pre-training]
Deep Learning Networks
• Main approaches:
◮ Autoencoder networks (unsupervised pre-training)
◮ Restricted Boltzmann Machine networks (unsupervised pre-training)
◮ Convolutional Neural Networks (sparse, deep topology)
Autoencoder networks
[Figure (b): an autoencoder network]
Autoencoder networks
• Data is provided as input, and the output of the network tries to reconstruct the input
• Learning is performed using backpropagation or related methods
• The target output of the network is set equal to the input
• The aim of training is to minimise the reconstruction error (written in symbols below)
• A reduced number of hidden units is used, creating an information bottleneck
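In symbols (a standard way to write it; the encoder f, decoder g and weights W are my notation), training chooses the weights to minimise the reconstruction error over the data, with the hidden code kept smaller than the input so the network cannot simply copy it:

    \min_{W} \sum_{x} \lVert x - g(f(x)) \rVert^{2}, \qquad \dim f(x) < \dim x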
Restricted Boltzmann Machine Networks
[Figure (d): a Restricted Boltzmann Machine]
• RBMs are another technique for pre-training, used to capture features of the input. They are recurrent networks with a number of stable states
• Given an input, the network can be sampled: activations are passed from the input to the hidden layer, then from the hidden layer back to the input layer, repeating until stability is reached
• The visible layer provides a reconstruction of the input. Training the network allows it to capture features of the input
Restricted Boltzmann Machine Networks
• Stable states of the network have low energy values
• Gibbs sampling is used to find a low-energy state
• The energy values of configurations are defined by the weights
• Probabilistic: weights → energy values → probabilities (see the standard formulation below)
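For reference, the standard RBM energy and probability of a joint configuration of visible units v and hidden units h are (the bias terms a_i and b_j are part of the usual formulation, though not named on the slide):

    E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

    P(v, h) = \frac{e^{-E(v,h)}}{Z}, \qquad Z = \sum_{v',h'} e^{-E(v',h')}

so configurations with lower energy receive higher probability, which is the sense in which the weights define energies and the energies define probabilities.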
Restricted Boltzmann Machine Networks
[Figure: activation function, plotting the firing probability P(y) against the input sum, rising from 0 to 1]
• Units are binary stochastic neurons: each is either on or off, and firing is given by a probability determined by the sum of its inputs
• The activation function gives the probability of firing P(y_j) as a function of the input sum ∑_i x_i w_ij (the sigmoid form is given below)
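With the logistic (sigmoid) activation that is standard for such units, this is:

    P(y_j = 1) = \sigma\Big(\sum_i x_i w_{ij}\Big) = \frac{1}{1 + \exp\big(-\sum_i x_i w_{ij}\big)}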
Alternating Gibbs Sampling
[Figure (c): alternating Gibbs sampling]
• The input is presented at the visible units
• Hidden units are updated based on probabilistic activations, and the visible units are updated subsequently. This repeats until stability is reached (a sketch of the sampling loop follows below)
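A minimal NumPy sketch of one such chain, assuming binary stochastic units with sigmoid activation and (for completeness) bias vectors, which are standard in RBMs even though the slides do not name them; all variable names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def alternating_gibbs(v, W, b_vis, b_hid, steps=10):
        """Alternate between sampling the hidden units given the visible
        units and sampling the visible units given the hidden units."""
        for _ in range(steps):
            p_h = sigmoid(v @ W + b_hid)                  # P(h = 1 | v)
            h = (rng.random(p_h.shape) < p_h).astype(float)
            p_v = sigmoid(h @ W.T + b_vis)                # P(v = 1 | h), the reconstruction
            v = (rng.random(p_v.shape) < p_v).astype(float)
        return v, h

    # toy usage: 6 visible and 3 hidden units with small random weights
    W = rng.normal(scale=0.1, size=(6, 3))
    v0 = (rng.random(6) < 0.5).astype(float)
    v_sample, h_sample = alternating_gibbs(v0, W, np.zeros(6), np.zeros(3))

Training algorithms such as contrastive divergence compare statistics of the data with statistics of these reconstructions to update the weights.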
Pre-trained Deep Networks
Putting it all together:
• This approach can be used to pre-train a network before performing supervised learning
• A classification layer can be added on top, representing the classes for supervised learning. A fine-tuning stage then adjusts the weights by backpropagation, using an error function defined at the output nodes (a sketch of the output-layer error follows below)
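A small sketch of the error signal at the added classification layer, assuming a softmax output with a cross-entropy loss (a common choice; the layer and variable names are illustrative). This is the quantity that backpropagation then pushes down through the pre-trained layers during fine-tuning:

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)      # for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def output_layer_gradients(H, labels, W_out, b_out):
        """H: activations of the top pre-trained layer (n_examples, n_hidden).
        Returns the output-node error signal and the gradients of the
        classification layer under softmax + cross-entropy."""
        P = softmax(H @ W_out + b_out)            # predicted class probabilities
        Y = np.eye(P.shape[1])[labels]            # one-hot targets
        delta = (P - Y) / H.shape[0]              # error at the output nodes
        return delta, H.T @ delta, delta.sum(axis=0)

    # toy usage: 5 examples, 4 hidden units, 3 classes
    rng = np.random.default_rng(0)
    H = rng.random((5, 4))
    labels = np.array([0, 2, 1, 2, 0])
    W_out = rng.normal(scale=0.1, size=(4, 3))
    delta, grad_W, grad_b = output_layer_gradients(H, labels, W_out, np.zeros(3))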
Deep Learning Networks
[Figure (f): from Zeiler & Fergus 2013]
Convolutional Neural Networks
[Figure (e): convolutional neural network architecture]
Convolutional Neural Networks
• CNNs are a form of deep neural network with a specific topology, based on structure seen in the visual system
• Each unit has a limited receptive field
• Units are convolutional: the same set of weights is used to find a response at multiple positions
• Convolutional and sub-sampling layers perform specific functions, acting in a manner similar to simple and complex cells in the visual system (a sketch of both operations follows below)
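A minimal NumPy sketch of the two layer types: a single shared filter swept across an image (the convolutional layer; strictly a cross-correlation, as is usual in CNNs) followed by max-pooling as the sub-sampling step. The image, filter and sizes are toy values, not from the lecture:

    import numpy as np

    def conv2d_valid(image, kernel):
        """Apply one shared filter at every position (valid convolution):
        the same weights respond to the same pattern anywhere in the image."""
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + kh, j:j + kw]   # limited receptive field
                out[i, j] = np.sum(patch * kernel)
        return out

    def max_pool(feature_map, size=2):
        """Sub-sampling layer: keep the strongest response in each block,
        giving a degree of position invariance (cf. complex cells)."""
        H, W = feature_map.shape
        H, W = H - H % size, W - W % size
        fm = feature_map[:H, :W]
        return fm.reshape(H // size, size, W // size, size).max(axis=(1, 3))

    # toy usage: random 8x8 "image" and a 3x3 shared filter
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    kern = rng.normal(size=(3, 3))
    pooled = max_pool(conv2d_valid(img, kern))

A real CNN stacks many such filter banks and pooling stages, and learns the filter weights by backpropagation.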
Summary
• Deep Learning approaches introduce a number of new techniques that allow an increase in the depth and modularity of neural networks
• Unsupervised learning allows structure to be captured from observations, without relying on feedback from classifications
• Unsupervised pre-training improves the reliability and accuracy of supervised learning
• These techniques offer many new opportunities for machine learning and more general artificial intelligence
Application: Speech Recognition
• "Context-Dependent Deep Neural Network Hidden Markov Model"
• Hidden Markov Model: learns the transition relationships between hidden states, and between hidden states and observations
Application: Speech Recognition
• The HMM system can be used to infer phoneme symbols from audio samples, based on the relationships between acoustic patterns and symbols, and on the probabilities of symbol sequences
• The Deep Neural Network is used to learn the probability distribution over the symbolic states from the audio (see the standard hybrid formulation below)
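In hybrid systems of this kind, the usual way to plug the DNN into the HMM (standard practice, though not spelled out on the slide) is to convert its posterior over states s given the acoustic input x into a scaled likelihood by dividing by the state prior:

    p(x \mid s) \propto \frac{p(s \mid x)}{p(s)}

These scaled likelihoods then take the place of the HMM's usual observation probabilities during decoding.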
Application: Speech Recognition
• 7 layers, with 2048 hidden units at each layer
• Trained on 309 hours of training data
• Each layer pre-trained as a Restricted Boltzmann Machine
• Fine-tuned using 9304 triphone states at the output layer
• Error rate improved from 27.4% to 18.5% (about a 30% relative improvement)
• Demonstration: http://www.youtube.com/watch?v=Nu-nlQqFCKg
References
• "Improving neural networks by preventing co-adaptation of feature detectors", Hinton et al. 2012, http://arxiv.org/abs/1207.0580
• "Reducing the Dimensionality of Data with Neural Networks", Hinton & Salakhutdinov 2006
• "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks", Seide et al. 2011, http://research.microsoft.com/apps/pub
• "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", Dahl et al. 2012, http://ieeexplore.ieee.org/stamp
References
• "Scientists See Promise in Deep-Learning Programs", http://www.nytimes.com/20
• "ImageNet Classification with Deep Convolutional Neural Networks", Krizhevsky et al. 2012, http://nips.cc/Conferences/2012/Program/even
• "Building high-level features using large scale unsupervised learning", Le et al. 2011, http://arxiv.org/abs/1112.6209
• "In a Big Network of Computers, Evidence of Machine Learning", http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-co
Figure Credits
(a) Yoshua Bengio, Montreal. "Learning Deep Architectures for AI"
(b) Andrew Ng, Stanford. "Sparse Autoencoder"
(c) Geoff Hinton, Toronto. "The next generation of neural networks"
(d) LISA lab, Montreal. "Deep Learning tutorials: Restricted Boltzmann Machines". http://deeplearning.net/tutorial/rbm.html
(e) LISA lab, Montreal. "Deep Learning tutorials: Convolutional Neural Networks". http://deeplearning.net/tutorial/lenet.html
(f) Zeiler & Fergus 2013