Transcript of ECML-2015 Presentation
Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training
By Anirban Santara*, Debapriya Maji, DP Tejas, Pabitra Mitra and Arobinda Gupta
Department of Computer Science and Engineering
Paper ID: 6, PDCKDD Workshop
7 September 2015
Page #2
Introduction

Autoencoder:
• An artificial neural network used for unsupervised learning.
• Consists of an encoder followed by a symmetrical decoder.
• Learns to reconstruct the input with a minimum amount of deformation at the output of the decoder.

Deep stacked autoencoder:
• An autoencoder with 3 or more hidden layers of neurons.
• Learns representations of hierarchically increasing levels of abstraction from the data.

Uses:
• Efficient non-linear dimensionality reduction, e.g. Hinton 2006.
• Data-driven representation learning, e.g. Vincent 2009.

Fig: an encoder followed by a symmetric decoder
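The encoder/decoder idea can be sketched in a few lines of numpy. This is a toy single-hidden-layer autoencoder with tied weights and made-up sizes, not the slide's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy autoencoder: encoder weight W, decoder W.T (tied weights).
n_in, n_hid = 8, 3
W = rng.normal(scale=0.1, size=(n_in, n_hid))
b_enc = np.zeros(n_hid)
b_dec = np.zeros(n_in)

def reconstruct(x):
    h = sigmoid(x @ W + b_enc)       # encoder: compress to n_hid units
    return sigmoid(h @ W.T + b_dec)  # decoder: mirror of the encoder

x = rng.random(n_in)
mse = np.mean((x - reconstruct(x)) ** 2)  # reconstruction error
```

Training would adjust `W`, `b_enc` and `b_dec` to drive `mse` down; the slides do this with RBM pre-training followed by backpropagation.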
Page #3
Training a stacked autoencoder

Initialization: greedy layer-wise unsupervised pre-training, using RBMs for example (Bengio 2009, Hinton 2006).

Fine-tuning: back-propagation over the entire network (Hinton 1989).
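The greedy schedule can be summarized as a loop; the function names here are illustrative, not from the paper:

```python
# Greedy layer-wise pre-training sketch: each layer trains for its full
# budget of epochs on the output of the layers already trained, strictly
# one layer at a time.
def pretrain_greedy(data, layers, epochs, train_epoch, transform):
    for layer in layers:
        for _ in range(epochs):
            train_epoch(layer, data)    # e.g. one epoch of RBM training
        data = transform(layer, data)   # activations feed the next layer
    return data

# Toy usage: count epochs per layer instead of real training.
counts = {0: 0, 1: 0, 2: 0}
result = pretrain_greedy(
    data=1.0,
    layers=[0, 1, 2],
    epochs=20,
    train_epoch=lambda l, d: counts.__setitem__(l, counts[l] + 1),
    transform=lambda l, d: d + 1,       # stand-in for "encode with layer l"
)
```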
Page #4
Efforts at parallelization

Data-level parallelism:
• Calculations pertaining to different subsets of the data are carried out at different processing nodes, and the updates generated are averaged.
• Suitable for computing clusters, as it requires little communication.

Network-level parallelism:
• The neural network is partitioned (physically or logically), and each part trains in parallel at a different computing node on the same whole dataset.
• Suitable for multi-core CPUs, which allow fast inter-processor communication.

To the best of our knowledge, all existing methods of pre-training use a greedy layer-by-layer approach.

Fig: data partitioning vs. network partitioning
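A minimal sketch of the data-level scheme, with the per-shard calls written sequentially for clarity (a cluster would run them concurrently; `compute_update` is a stand-in for one node's gradient computation):

```python
import numpy as np

# Data-level parallelism: each node computes an update from its own shard
# of the data; a single averaged update is then applied to the model.
def averaged_update(shards, compute_update):
    updates = [compute_update(s) for s in shards]  # one per node
    return np.mean(updates, axis=0)

shards = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
avg = averaged_update(shards, compute_update=lambda s: s * 0.1)
```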
Page #5
A major drawback of greedy layer-wise pre-training

Fig: timeline of layers L1..L4 training sequentially on data D1..D4 for N1..N4 epochs

Every layer Li waits idle for:
• all layers L1 through Li-1, before it can start learning;
• all the remaining layers, after it has finished learning.

The guiding philosophy of the proposed algorithm is to reduce the idle time of greedy layer-wise pre-training by introducing parallelism with synchronization.
Page #6
Proposed algorithm: synchronized layer-wise pre-training

Fig: timeline of threads T1..T4, where Di[n] denotes the data for the ith layer in the nth epoch

• The algorithm starts with T1 beginning to learn L1.
• Ti waits until Ti-1 has completed one epoch of training.
• Every time Ti completes one epoch, it transforms Di with the current weights and biases and modifies Di+1.
Page #7
Proposed algorithm: synchronized layer-wise pre-training (contd.)

Fig: timeline of threads T1..T4, each running its stipulated N1..N4 epochs plus extra epochs on D2[N2+1], D3[N3+1] and D4[N4+1]

• Every thread Ti executes a specified Ni epochs of learning and goes to sleep.
• If Ti-1 modifies Di after that, Ti wakes up, executes one epoch of learning and goes back to sleep.
• The algorithm terminates when all the threads have finished their stipulated iterations.
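The wait/wake discipline can be sketched with a condition variable. This is a deliberately simplified lockstep variant (thread i runs its epoch e only after thread i-1 has finished epoch e), not the paper's exact scheduling, and the helper names are illustrative:

```python
import threading

# Thread i may run epoch e on layer i only after thread i-1 has finished
# epoch e, i.e. only after its input data D_i has been refreshed.
def synchronized_pretrain(n_layers, n_epochs, train_epoch):
    done = [0] * n_layers               # epochs completed per layer
    cond = threading.Condition()

    def worker(i):
        for e in range(1, n_epochs + 1):
            with cond:
                while i > 0 and done[i - 1] < e:
                    cond.wait()         # sleep until D_i is modified
            train_epoch(i, e)           # one epoch of training on layer i
            with cond:
                done[i] = e             # here D_{i+1} would be regenerated
                cond.notify_all()       # wake any thread waiting on D_{i+1}

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_layers)]
    for t in threads:
        t.start()
    for t in threads:                   # terminate once all budgets are spent
        t.join()
    return done
```

Each worker spends its idle time blocked in `cond.wait()` instead of holding a core, which is the source of the speedup the slides report.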
Page #8
Experimental set-up

• Problem: dimensionality reduction of handwritten digits from the MNIST dataset using a deep stacked autoencoder, with mean squared error used to measure reconstruction accuracy.

• Architecture:

Parameter             Value
Depth                 5
Layer dimensions      784, 1000, 500, 250, 30
Activation function   sigmoid

• Experiments:

Experiment                                Pre-training                                  Fine-tuning
1. Benchmark with greedy layer-wise       20 epochs of greedy layer-wise pre-training   10 epochs of fine-tuning with backpropagation
   pre-training                           of each layer using RBM                       over the entire architecture
2. Verification of the proposed           Minimum of 20 epochs (Ni = 20) of             10 epochs of fine-tuning with backpropagation
   synchronized layer-wise                synchronized layer-wise pre-training of       over the entire architecture
   pre-training algorithm                 each layer using RBM
Fig: sample digits from MNIST
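The table's layer dimensions imply the following weight-matrix shapes for the encoder and its mirrored decoder:

```python
# The slide's architecture: a 5-layer encoder 784 -> 1000 -> 500 -> 250 -> 30,
# mirrored into a symmetric decoder for reconstruction.
enc_dims = [784, 1000, 500, 250, 30]
dec_dims = enc_dims[::-1]

# Weight-matrix shapes the stacked autoencoder needs:
enc_shapes = list(zip(enc_dims[:-1], enc_dims[1:]))
dec_shapes = list(zip(dec_dims[:-1], dec_dims[1:]))
```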
Page #9
Experimental set-up (contd.)

• Parameters for the learning algorithms:

Algorithm                      Parameter       Value
RBM (contrastive divergence)   Learning rate   0.1
                               Momentum        0.5 for the first 5 epochs, 0.9 afterwards
Backpropagation                Learning rate   0.001

• System specifications:

Parameter             Value
Number of CPU cores   8
Main memory           8 GB
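For illustration, one CD-1 style weight update with the table's hyper-parameters might look like this. Biases are omitted and mean-field probabilities are used in place of sampled binary states, so this is a sketch rather than a full RBM trainer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One contrastive-divergence (CD-1) update with learning rate 0.1 and
# momentum 0.5 (the table switches momentum to 0.9 after 5 epochs).
def cd1_update(W, v0, vel, lr=0.1, momentum=0.5):
    h0 = sigmoid(v0 @ W)              # positive phase: hidden activations
    v1 = sigmoid(h0 @ W.T)            # one-step reconstruction of visibles
    h1 = sigmoid(v1 @ W)              # negative phase
    grad = np.outer(v0, h0) - np.outer(v1, h1)
    vel = momentum * vel + lr * grad  # momentum-smoothed update
    return W + vel, vel

W = rng.normal(scale=0.01, size=(6, 4))
vel = np.zeros_like(W)
W2, vel = cd1_update(W, v0=rng.random(6), vel=vel)
```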
Page #10
Results (convergence)

Fig: reconstruction error for the training set
Fig: reconstruction error for the validation set
Page #11
Results: curious behaviour of the innermost layer

Fig: reconstruction error for the training set
Fig: reconstruction error for the validation set
Page #12
Results: variation of overall reconstruction error on the validation set

Fig: error during pre-training
Fig: error during fine-tuning
Page #13
Comparison with greedy layer-wise pre-training

• Average squared reconstruction error per digit:

Algorithm                   Training error   Test error
Greedy pre-training         8.00             8.19
Synchronized pre-training   8.39             8.57

• Execution times:

Algorithm                   Pre-training time   Fine-tuning time
Greedy pre-training         3h 14min 43sec      2h 16min 59sec
Synchronized pre-training   1h 49min 11sec      2h 15min 42sec

The proposed algorithm converges 1h 26min 49sec faster, which is a 26.17% speedup.
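The speedup figure can be checked directly from the timing table:

```python
# Total wall-clock time (pre-training + fine-tuning) for each algorithm.
def to_sec(h, m, s):
    return 3600 * h + 60 * m + s

greedy_total = to_sec(3, 14, 43) + to_sec(2, 16, 59)
synced_total = to_sec(1, 49, 11) + to_sec(2, 15, 42)
saved = greedy_total - synced_total        # 5209 s = 1h 26min 49sec
speedup_pct = 100 * saved / greedy_total   # ~26.17 %
```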
Fig: Samples from the MNIST dataset
Fig: Reconstructed digits from the benchmark algorithm
Fig: Reconstructed digits from the proposed algorithm
Page #14
Summary

Motivation: to reduce the idle time of greedy layer-wise pre-training by introducing parallelism with synchronization.

Approach:
• The hidden layers start learning from immature training data.
• The training data is updated after every epoch of learning of the previous layer.

Achievements:
• Convergence with performance at par with the benchmark (on the MNIST dataset).
• 26.17% faster convergence observed using multiple CPU cores.