Advanced Topics
Last class
• Neural Networks
Popularity and Applications
• Major increase in popularity of Neural Networks
• Google developed a couple of efficient methods that allow for the training of huge deep neural networks:
– Asynchronous Distributed Gradient Descent
– L-BFGS
• These have recently been made available to the public
Large-Scale ANNs
[Figure: a large neural network model and its training data. From Machine Learning and AI via Brain simulations - Andrew Ng]
Popularity and Applications
[Figure: two architectures for large-scale training, from Machine Learning and AI via Brain simulations - Andrew Ng. Asynchronous Distributed Gradient Descent: model replicas (workers) process separate data shards and exchange parameters asynchronously with a central parameter server. L-BFGS: a coordinator sends small messages to workers, which exchange parameters with the parameter server.]
• 20,000 cores in a single cluster
• Up to 1 billion data items per mega-batch (in ~1 hour)
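To make the parameter-server pattern concrete, here is a minimal single-machine sketch in Python of asynchronous gradient descent on a toy least-squares model. The ParameterServer class, the four-worker setup, and the synthetic data are all illustrative assumptions, not Google's actual implementation (which runs across many machines).

```python
import threading
import numpy as np

# Minimal sketch of the parameter-server pattern: model replicas
# (workers) pull the current parameters, compute a gradient on their
# own data shard, and push updates back without waiting for each other.

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad   # apply each update as it arrives

def worker(server, X_shard, y_shard, steps=200):
    for _ in range(steps):
        w = server.pull()                                    # fetch parameters
        grad = X_shard.T @ (X_shard @ w - y_shard) / len(y_shard)
        server.push(grad)                                    # asynchronous update

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.0, 1.0, 2.0, 3.0, 4.0])

server = ParameterServer(dim=5)
threads = [threading.Thread(target=worker, args=(server, X[i::4], y[i::4]))
           for i in range(4)]           # 4 workers, 4 data shards
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.w)                         # ~ [0, 1, 2, 3, 4]
```

The point of the pattern is that workers never block on one another: each pulls the latest parameters, computes a gradient on its shard, and pushes the update immediately.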
Neural Networks
[Figure, from Machine Learning and AI via Brain simulations - Andrew Ng]
Popularity and Applications
[Figures, from Machine Learning and AI via Brain simulations - Andrew Ng]
Learning from Unlabeled Data
[Figure: accuracy vs. training set size (millions) for several learning algorithms; Banko & Brill, 2001. From Machine Learning and AI via Brain simulations - Andrew Ng]
“It’s not who has the best algorithm that wins. It’s who has the most data.”
[Course roadmap: Preliminaries → Data Understanding → Data Preprocessing → Data Modeling → Validation & Interpretation; current stage: Advanced Topics]
Feature Learning
A set of techniques in machine learning that learn a transformation of “raw” inputs into a representation that can be effectively exploited in a supervised learning task, such as classification
Feature Learning
• Can be supervised or unsupervised
• Examples of algorithms:
– Autoencoders
– Matrix Factorization
– Restricted Boltzmann machine
– Clustering
– Neural Networks
Feature Learning
• Each of the k centroids can be used to represent a feature for supervised learning
• Feature $j$ has value 1 iff the $j$-th centroid learned by k-means is the closest to the instance under consideration
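As a sketch of this encoding (the data and the choice of k are made up; scikit-learn’s KMeans stands in for the k-means step):

```python
import numpy as np
from sklearn.cluster import KMeans

# k-means as feature learning: learn k centroids on the raw inputs,
# then encode each instance as a one-hot vector whose j-th entry is 1
# iff centroid j is the closest.

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 4))            # "raw" unlabeled inputs

k = 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_raw)

def one_hot_features(X, km):
    nearest = km.predict(X)                  # index of the closest centroid
    F = np.zeros((len(X), km.n_clusters))
    F[np.arange(len(X)), nearest] = 1.0      # feature j = 1 iff j is nearest
    return F

F = one_hot_features(X_raw, km)
print(F.shape, F.sum(axis=1)[:5])            # each row has exactly one 1
```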
Deep Learning
• Machine learning algorithms that model high-level data abstractions using a series of transformations
• Based on feature learning
• The motivation: some data representations make it easier to learn particular tasks (e.g., image classification)
• Ex., our Assignment 2: it’s hard to give your computer an image and ask “what season does it represent?”, but if we can apply transformations that describe similar images with a simpler representation, we might accomplish that task
Self-Taught Learning (Unsupervised Feature Learning)
[Figure: a motorcycle classifier trained on labeled examples (Motorcycles / Not motorcycles) plus unlabeled images; testing: “What is this?”. From Machine Learning and AI via Brain simulations - Andrew Ng]
• Supervised learning: labeled cars and motorcycles
• Semi-supervised learning: labeled cars and motorcycles, plus unlabeled cars & motorcycles
• Self-taught learning (unsupervised feature learning): labeled cars and motorcycles, plus random unlabeled images
Self-Taught Learning
• One can always try to get more labeled data, but this can be expensive:
– Amazon Mechanical Turk
– Expert feature engineering
• The promise of self-taught learning and unsupervised feature learning:
– If we can get our algorithms to learn from unlabeled data, then we can easily obtain and learn from massive amounts of it
• 1 instance of unlabeled data < 1 instance of labeled data
• Billions of instances of unlabeled data >> some labeled data
Self-Taught Learning
• The idea:
– Give the algorithm a large amount of unlabeled data
– The algorithm learns a feature representation of that data
– If the end goal is to perform classification, one can find a small set of labeled instances to probe the model and adapt it to the supervised task
Autoencoders
• An artificial neural network used for learning efficient codings
• Can be used for dimensionality reduction
• Similar to a Multilayer Perceptron with an input layer and one or more hidden layers
• The difference between autoencoders and MLPs:
– Autoencoders have the same number of inputs and outputs
– Instead of predicting y, autoencoders try to reconstruct x
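A minimal numpy sketch of such an autoencoder, assuming a single sigmoid hidden layer narrower than the input and a linear reconstruction layer; the sizes, learning rate, and synthetic data are illustrative:

```python
import numpy as np

# Autoencoder sketch: same number of inputs and outputs, one narrower
# hidden layer, trained by gradient descent to reconstruct x.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))               # unlabeled data, 20 inputs

n_hidden = 5                                 # bottleneck: 5 < 20
W1 = rng.normal(scale=0.1, size=(20, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, 20))

lr = 0.5
for epoch in range(500):
    A = sigmoid(X @ W1)                      # hidden activations (the code)
    err = (A @ W2 - X) / len(X)              # reconstruction error
    gW2 = A.T @ err                          # backprop of the squared loss
    gW1 = X.T @ (err @ W2.T * A * (1 - A))
    W1 -= lr * gW1
    W2 -= lr * gW2

codes = sigmoid(X @ W1)                      # compressed representation
print(codes.shape)                           # (500, 5)
```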
Autoencoders
If the hidden layers are narrower than the input layer, then the activations of the final hidden layer can be regarded as a compressed representation of the input
Self-Taught Learning in Practice
• Suppose we have an unlabeled training set $\{x_u^{(1)}, x_u^{(2)}, \dots, x_u^{(m_u)}\}$ with $m_u$ unlabeled instances
• Step 1: Train an autoencoder on this data
Self-Taught Learning in Practice
• After Step 1, we will have learned all weight parameters for the network
• We can also visualize the algorithm for computing the features/activations $a$ as the following neural network:
[Figure: the encoder part of the trained autoencoder, mapping inputs $x$ to activations $a$]
Self-Taught Learning in Practice
• Step 2:
– Training set: $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \dots, (x_l^{(m_l)}, y^{(m_l)})\}$ of $m_l$ labeled examples
– Feed training example $x_l^{(1)}$ to the autoencoder and obtain its corresponding vector of activations $a_l^{(1)}$
– Repeat for all training examples
Self-Taught Learning in Practice
• Step 3:
– Replace the original features with the activations $a_l^{(i)}$
– The training set then becomes $\{(a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \dots, (a_l^{(m_l)}, y^{(m_l)})\}$
• Step 4:
– Train a supervised algorithm on this new training set to obtain a function that makes predictions on the y values
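Putting Steps 1–4 together, a self-contained Python sketch; the toy data, the hidden-layer size, and the use of logistic regression for Step 4 are assumptions (the slides only say “a supervised algorithm”):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Self-taught learning pipeline: train an autoencoder on unlabeled
# data, re-encode the labeled data with the learned features, then
# train a supervised classifier on the new representation.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=500, seed=0):
    """Step 1: learn encoder weights W1 by reconstructing X through a bottleneck."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    W2 = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))
    for _ in range(epochs):
        A = sigmoid(X @ W1)
        err = (A @ W2 - X) / len(X)
        gW2 = A.T @ err
        gW1 = X.T @ (err @ W2.T * A * (1 - A))
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1

rng = np.random.default_rng(1)
X_u = rng.normal(size=(1000, 20))            # m_u unlabeled instances
W1 = train_autoencoder(X_u, n_hidden=5)      # Step 1

X_l = rng.normal(size=(60, 20))              # m_l labeled instances
y_l = (X_l.sum(axis=1) > 0).astype(int)      # toy labels for the task
A_l = sigmoid(X_l @ W1)                      # Steps 2-3: activations a_l

clf = LogisticRegression().fit(A_l, y_l)     # Step 4: supervised training
print(clf.score(A_l, y_l))                   # accuracy on the new features
```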
An Example of Self-Taught Learning
• What Google did:
– Trained on 10 million images (YouTube)
– 1000 machines (16,000 cores) for 1 week.
– 1.15 billion parameters
– Tested on novel images
[Figure: sample images from the training set (YouTube) and test set (FITW + ImageNet). From Machine Learning and AI via Brain simulations - Andrew Ng]
Result Highlights
• The face neuron
[Figure: top stimuli from the test set, and the optimal stimulus found by numerical optimization. From Machine Learning and AI via Brain simulations - Andrew Ng]
Result Highlights
• The face neuron
[Figure: histograms of the face neuron’s feature value (frequency) for faces vs. random distractors. From Machine Learning and AI via Brain simulations - Andrew Ng]
Result Highlights
• The cat neuron
[Figure: optimal stimulus found by numerical optimization. From Machine Learning and AI via Brain simulations - Andrew Ng]
Result Highlights
Best stimuli
[Figure: best stimuli for features 1–13, shown for one layer of the network. Layer hyperparameters: image size = 200, number of input channels = 3, number of output channels / maps = 8, RF (receptive field) size = 18, pooling size = 5, LCN size = 5; the layer’s output, an image with 8 channels, is the input to the layer above. Le et al., Building high-level features using large-scale unsupervised learning, ICML 2012]
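As a rough sketch, a layer with the hyperparameters above could look like the following in PyTorch. Note the simplifications: Le et al. used untied local receptive fields and true local contrast normalization, while this sketch uses an ordinary convolution and LocalResponseNorm as a stand-in for LCN.

```python
import torch
import torch.nn as nn

# One layer with the slide's hyperparameters: 200x200 RGB input,
# 8 feature maps, 18x18 receptive fields, 5x5 L2 pooling, then
# local normalization.
layer = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=18),
    nn.LPPool2d(norm_type=2, kernel_size=5),   # 5x5 L2 pooling
    nn.LocalResponseNorm(size=5),              # stand-in for LCN
)

x = torch.randn(1, 3, 200, 200)                # one 200x200 RGB image
out = layer(x)
print(out.shape)                               # torch.Size([1, 8, 36, 36])
```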