Multimedia data mining using deep learning
Uploaded by peter-wlodarczak
![Page 2: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/2.jpg)
Agenda
Aims
Multimedia Data Mining
Artificial Neural Networks
Deep learning
Challenges
Discussion
![Page 3: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/3.jpg)
Aims
Analyze multimedia data for:
Object/face recognition
Voice commands
Natural Language Processing
Classification
Automatic caption generation
Record linkage (entity resolution)
![Page 4: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/4.jpg)
Multimedia Data Mining I
Multimedia data mining:
Unprecedented amount of Multimedia data
since Web 2.0 and Social Media
Prosumer data
Uses algorithms to extract useful patterns
and relations from image, audio and video
data
Traditional methods often not satisfactory
Unsuitable for high dimensionality
![Page 5: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/5.jpg)
Multimedia Data Mining II
Multimedia data mining has been
improved using deep learning in:
Visual data mining
Natural Language Processing
Deep learners are:
Machine Learning schemes
Usually multi-layered artificial neural
networks
![Page 6: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/6.jpg)
Artificial Neural Networks I
Artificial Neural Networks:
Suitable to give good approximations for
complex problems
Consist of perceptrons (the neurons)
and weighted connections (the axons)
![Page 7: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/7.jpg)
Artificial Neural Networks II
Perceptron (Neuron)
Linear classifier
Data linearly separable using a hyperplane:
$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0$
where $w$ = weights, $a$ = real-valued feature vector,
$a_0$ = bias input
Binary classifier $f(a)$ that maps its input
vector $a$ to a single, binary output value
![Page 8: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/8.jpg)
Artificial Neural Networks III
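[Figure: perceptron with a constant bias input 1 (weight w0) and attribute inputs a1, a2, a3 (weights w1, w2, w3)]
$f(a) = \sum_k w_k a_k + b$, where $b = w_0 \cdot 1$ is the bias term
Class decided by $f(a) > 0$ or $f(a) < 0$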
![Page 9: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/9.jpg)
Artificial Neural Networks III
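Training data

| name | sex | mask | cape | tie | ears | smokes | class |
|---|---|---|---|---|---|---|---|
| Batman | male | yes | yes | no | yes | no | Good |
| Robin | male | yes | yes | no | no | no | Good |
| Alfred | male | no | no | yes | no | no | Good |
| Penguin | male | no | no | yes | no | yes | Bad |
| Catwoman | female | yes | no | no | yes | no | Bad |
| Joker | male | no | no | no | no | no | Bad |

Test data

| name | sex | mask | cape | tie | ears | smokes | class |
|---|---|---|---|---|---|---|---|
| Batgirl | female | yes | yes | no | yes | no | ? |
| Riddler | male | yes | no | no | no | no | ? |
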
Supervised learning
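
The supervised setting above can be sketched with the classic perceptron learning rule. This is only an illustration, not the method used on the slides: the numeric encoding (female = 1, male = 0, yes = 1, no = 0, Good = +1, Bad = -1) and the names `TRAIN`, `TEST`, `score`, `train_perceptron` and `classify` are assumptions made for this example.

```python
# Hypothetical encoding of the table: female=1/male=0, yes=1/no=0.
# Feature order: sex, mask, cape, tie, ears, smokes
TRAIN = [
    ("Batman",   [0, 1, 1, 0, 1, 0], +1),  # +1 = Good
    ("Robin",    [0, 1, 1, 0, 0, 0], +1),
    ("Alfred",   [0, 0, 0, 1, 0, 0], +1),
    ("Penguin",  [0, 0, 0, 1, 0, 1], -1),  # -1 = Bad
    ("Catwoman", [1, 1, 0, 0, 1, 0], -1),
    ("Joker",    [0, 0, 0, 0, 0, 0], -1),
]
TEST = [
    ("Batgirl", [1, 1, 1, 0, 1, 0]),
    ("Riddler", [0, 1, 0, 0, 0, 0]),
]


def score(w, a):
    """f(a) = w0*1 + w1*a1 + ... + wk*ak, with w[0] as the bias weight."""
    return w[0] + sum(wi * ai for wi, ai in zip(w[1:], a))


def train_perceptron(data, max_epochs=1000):
    """Perceptron rule: on each mistake, move the weights towards the label."""
    w = [0.0] * (len(data[0][1]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for _, a, y in data:
            if y * score(w, a) <= 0:       # misclassified or on the boundary
                w[0] += y                  # the bias input is the constant 1
                for i, ai in enumerate(a):
                    w[i + 1] += y * ai
                mistakes += 1
        if mistakes == 0:                  # all training examples are correct
            break
    return w


def classify(w, a):
    return "Good" if score(w, a) > 0 else "Bad"


weights = train_perceptron(TRAIN)
for name, a in TEST:
    print(name, classify(weights, a))
```

With this encoding the six training rows happen to be linearly separable, so the rule stops once every training example is classified correctly. The labels printed for Batgirl and Riddler depend on the learned weights and should not be read as ground truth.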
![Page 10: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/10.jpg)
Artificial Neural Networks IV
Not all data is linearly separable
![Page 11: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/11.jpg)
Artificial Neural Networks V
Multilayer Perceptron
Perceptrons organized in several layers
A layer is fully interconnected with the next
layer
All nodes except the input nodes are perceptrons
Feedforward neural network
Uses backpropagation for training
Error propagated back to minimize loss function
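
A minimal sketch of backpropagation in a multilayer perceptron, using XOR as a standard example of data that is not linearly separable. The network size `H`, the learning rate `LR`, the tanh/sigmoid activations, the squared-error loss and the fixed random seed are all assumptions chosen for illustration, not details from the slides.

```python
import math
import random

random.seed(0)

# XOR: a standard example of data that is not linearly separable.
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
H = 4      # number of hidden units (hypothetical choice)
LR = 0.5   # learning rate (hypothetical choice)

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]  # input -> hidden
B1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]                      # hidden -> output
B2 = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x):
    """Feedforward pass: input -> tanh hidden layer -> sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + B1[j]) for j in range(H)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + B2)
    return h, y


def mean_loss():
    """Mean squared-error loss 0.5 * (y - t)^2 over the data set."""
    return sum(0.5 * (forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


def train_step(x, t):
    """One backpropagation step for a single example."""
    global B2
    h, y = forward(x)
    delta_out = (y - t) * y * (1 - y)                    # error at the output unit
    delta_hid = [delta_out * W2[j] * (1 - h[j] ** 2)     # error propagated back
                 for j in range(H)]
    for j in range(H):
        W2[j] -= LR * delta_out * h[j]
        B1[j] -= LR * delta_hid[j]
        for i in range(2):
            W1[j][i] -= LR * delta_hid[j] * x[i]
    B2 -= LR * delta_out


loss_before = mean_loss()
for _ in range(5000):
    for x, t in DATA:
        train_step(x, t)
loss_after = mean_loss()
print(loss_before, loss_after)
```

The hidden-layer errors are computed from the output error before any weight is changed, which is the "error propagated back" step; the loss after training is expected to be lower than before, although how close it gets to zero depends on the initialization.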
![Page 12: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/12.jpg)
Artificial Neural Networks VI
Multilayer perceptron can be used for
non-linear, multiclass classification
![Page 13: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/13.jpg)
Artificial Neural Networks VII
Gradient descent optimization method
for learning weights
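
As a rough illustration of gradient descent for learning a weight, the sketch below fits a single weight `w` in the model y = w·x by repeatedly stepping against the gradient of the mean squared error. The toy data, the learning rate and the step count are assumptions made for this example.

```python
XS = [1.0, 2.0, 3.0]
YS = [2.0, 4.0, 6.0]   # toy data generated by y = 2x


def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)


w = 0.0
learning_rate = 0.05   # hypothetical step size
for step in range(100):
    w -= learning_rate * gradient(w)   # step downhill along the gradient

print(w, loss(w))
```

Here the loss is a simple bowl, so the weight settles at the value that generated the data. A neural network's loss surface is far less regular, which is one reason the choice of step size matters in practice.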
![Page 14: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/14.jpg)
Artificial Neural Networks VIII
Model complexity has to be appropriate:
no more complex than the data requires
(Occam’s razor)
Schapire 2004
![Page 15: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/15.jpg)
Artificial Neural Networks IX
Schapire 2004
![Page 16: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/16.jpg)
Artificial Neural Networks X
For building an accurate classifier:
Enough training examples
Good performance on training set
Classifier that is not too complex,
to avoid overfitting
ANNs allow approximate solutions for
very complex problems
Support Vector Machines (SVM) are often a
simpler alternative to ANNs
![Page 17: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/17.jpg)
Deep learning I
Deep learning
No clear distinction from shallow learners
Multiple layers of non-linear processing
units
Each layer represents features at a higher
level
Forms a hierarchical representation
Majority of deep learners are ANNs
![Page 18: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/18.jpg)
Deep learning II
Deep learning neural networks
Often use the Rectified Linear Unit (ReLU)
Typically learn faster than with sigmoid or tanh units
Half-wave rectifier
f(z) = max(z, 0)
Use backpropagation for adjusting the
weights
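
The rectifier f(z) = max(z, 0) and the gradient that backpropagation would pass through it can be written in a few lines. The function names are hypothetical, and using 0 as the gradient at exactly z = 0 is a convention assumed here.

```python
def relu(z):
    """Half-wave rectifier: f(z) = max(z, 0)."""
    return max(z, 0.0)


def relu_grad(z):
    """Gradient used in backpropagation; 0 at z == 0 by convention (assumption)."""
    return 1.0 if z > 0 else 0.0


for z in (-2.0, 0.0, 3.5):
    print(z, relu(z), relu_grad(z))
```

Because the gradient is exactly 1 for positive inputs, the error signal is not shrunk as it passes through active units, which is commonly given as a reason for faster learning.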
![Page 19: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/19.jpg)
Deep learning III - ConvNet
LeNet 2015
![Page 20: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/20.jpg)
Deep learning IV - ConvNet
Convolutional neural networks
Inspired by the animal visual cortex
Visual cortex is the most powerful visual
processing system in existence
Typically two stages:
Convolutional stage
Pooling stage
Characterized by
sparse connectivity
shared weights
![Page 21: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/21.jpg)
Deep learning V - ConvNet
Shared weights
Subsets share weights and bias to form
feature map
Replicated across entire visual field
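
Weight sharing can be sketched as one small kernel slid over every position of the input, so that each output unit sees only a local patch (sparse connectivity) and all output units reuse the same weights and bias. The function `conv2d_valid`, the toy image and the 1x2 kernel are assumptions for illustration; like most ConvNet implementations, the sketch computes a cross-correlation (no kernel flip).

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            bias + sum(kernel[u][v] * image[r + u][c + v]
                       for u in range(kh) for v in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image with a vertical edge between the second and third column.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]   # hypothetical horizontal-difference filter

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is non-zero only where the edge sits, which shows how a single shared filter responds to the same feature wherever it occurs in the visual field.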
![Page 22: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/22.jpg)
Deep learning VI - ConvNet
Each layer accepts a 3D input volume and
transforms it into a 3D output volume
Filters activate when a specific feature is
detected
CS231n 2015
![Page 23: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/23.jpg)
Deep learning VII - ConvNet
Receptive field spans all feature maps
LeNet 2015
![Page 24: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/24.jpg)
Deep learning VIII - ConvNet
MaxPooling
Non-linear down-sampling
Partitions input into non-overlapping
rectangles
Outputs the maximum value for each sub-region
Reduces computation for the next layer
Reduces dimensionality of intermediate
representations
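
A minimal sketch of max-pooling over non-overlapping square regions. The function name `max_pool`, the 2x2 window and the toy feature map are assumptions; rows or columns that do not fill a complete window are simply dropped in this version.

```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + u][c * size + v]
                for u in range(size) for v in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)   # [[4, 2], [2, 8]]
```

The 4x4 input becomes a 2x2 output, so the next layer processes a quarter of the values.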
![Page 25: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/25.jpg)
Deep learning IX - ConvNet
Convolutional and sampling sublayers
UFLDL 2015
![Page 26: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/26.jpg)
Deep learning X - ConvNet
Image after cascading max-pooling with a
convolutional layer
Similar to edge detector
![Page 27: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/27.jpg)
Deep learning XI - RNN
Recurrent neural networks
Contain directed cycles
Take sequences as input; no fixed-size
input and output vectors, e.g. natural
speech
![Page 28: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/28.jpg)
Deep learning XII - RNN
No fixed size of computations
Much simpler than ConvNets
Maintain inner state exhibiting dynamic
temporal behavior
Optimized through backpropagation
Can be extended with long-term memory
extensions
Don’t necessarily need sequences of inputs
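
The inner state of a recurrent network can be sketched with a single-unit, Elman-style recurrence h_t = tanh(w_x·x_t + w_h·h_{t-1} + b). The scalar weights `W_X`, `W_H`, `B` and the function `run_rnn` are hypothetical and fixed by hand here; a real network would learn them.

```python
import math

# Hypothetical hand-picked scalar weights.
W_X = 1.0   # input -> hidden
W_H = 0.5   # hidden -> hidden (the directed cycle)
B = 0.0


def run_rnn(sequence, h0=0.0):
    """Apply the same weights at every time step and return all hidden states."""
    states = []
    h = h0
    for x in sequence:
        h = math.tanh(W_X * x + W_H * h + B)   # new state depends on the old state
        states.append(h)
    return states


print(run_rnn([1.0, 0.0]))
print(run_rnn([0.0, 1.0]))
print(run_rnn([1.0, 0.0, 0.0, 1.0, 1.0]))   # a longer input needs no new weights
```

The two length-2 inputs contain the same values in a different order and end in different states, which is the temporal behavior that a feedforward network with fixed-size input does not have.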
![Page 29: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/29.jpg)
Deep learning XIII - RNN
Training RNN is a non-linear global
optimization problem
Trained using stochastic gradient descent
Non-linear, differentiable activation
function, e.g. rectifier
Trained through backpropagation through
time (BPTT)
Genetic algorithms can be used for training
![Page 30: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/30.jpg)
Deep learning XIV - RNN
Many different architectures for RNN
Examples: Elman simple recurrent network (SRN), spiking neural network
![Page 31: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/31.jpg)
Deep learning XV - RNN
RNN learns to read house numbers
RNN learns to paint house numbers
Karpathy 2015
![Page 32: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/32.jpg)
Deep learning XVI - RNN
RNNs are used for
Transcribing speech to text
Speech synthesis
Machine translation
![Page 33: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/33.jpg)
Deep learning XVII
Combining ConvNets and RNN for
image descriptions
Image regions described by a ConvNet,
using language as the label space
Language synthesis using an RNN
Karpathy & Fei-Fei 2014
![Page 34: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/34.jpg)
Deep learning XVIII
ConvNet and RNN can be combined
Automated caption generation
![Page 35: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/35.jpg)
Deep learning XIX
Automatic feature extraction
No closed vocabulary set
Alignment of segments of sentences to
region on the image
Karpathy & Fei-Fei 2014
![Page 36: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/36.jpg)
Deep learning XX
Other applications
Object recognition
Movie classification
Handwriting recognition
Record linkage
![Page 37: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/37.jpg)
Challenges I
Main disadvantage: large volumes of
training data needed
Overfitting if not enough training data
Optimization difficult
Finding relevant information
Privacy-preserving data mining
![Page 38: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/38.jpg)
Challenges II
Describing actions
![Page 39: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/39.jpg)
Discussion
Future research in
Attention based models
Finding relevant information
Data democratization and Internet of
Things
Unsupervised learning
Semantic data modeling
Reasoning
![Page 40: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/40.jpg)
Thank you for the attention
Questions?
![Page 41: Multimedia data mining using deep learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/5876be271a28abad1a8b7571/html5/thumbnails/41.jpg)
References
Zhao, X, Li, X & Zhang, Z 2015, 'Multimedia Retrieval via Deep Learning to Rank', IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1487-91, <http://ieeexplore.ieee.org.ezproxy.usq.edu.au/xpls/abs_all.jsp?arnumber=7054452>.
Yu, W, Zhuang, F, He, Q & Shi, Z 2015, 'Learning deep representations via extreme learning machines', Neurocomputing, vol. 149, Part A, pp. 308-15, <http://www.sciencedirect.com/science/article/pii/S0925231214011461>.
Xu, K, Ba, J, Kiros, R, Cho, K, Courville, A, Salakhutdinov, R, Zemel, R & Bengio, Y 2015, 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention', Proceedings of the 32nd International Conference on Machine Learning from Data: Artificial Intelligence and Statistics, vol. 37.
Xin, J, Wang, Z, Qu, L & Wang, G 2015, 'Elastic extreme learning machine for big data classification', Neurocomputing, vol. 149, Part A, pp. 464-71, <http://www.sciencedirect.com/science/article/pii/S0925231214011503>.
Weston, J, Chopra, S & Bordes, A 2015, 'Memory Networks', in 3rd International Conference on Learning Representations: proceedings of the 3rd International Conference on Learning Representations San Diego, viewed <http://arxiv.org/pdf/1410.3916v10.pdf>.
Weilong, H, Xinbo, G, Dacheng, T & Xuelong, L 2015, 'Blind Image Quality Assessment via Deep Learning', Neural Networks and Learning Systems, IEEE Transactions on, vol. 26, no. 6, pp. 1275-86.
Wang, Y, Li, D, Du, Y & Pan, Z 2015, 'Anomaly detection in traffic using L1-norm minimization extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 415-25, <http://www.sciencedirect.com/science/article/pii/S0925231214011382>.
Vinyals, O, Toshev, A, Bengio, S & Erhan, D 2015, 'Show and Tell: A Neural Image Caption Generator', Google, <http://arxiv.org/pdf/1411.4555v1.pdf>.
Noda, K, Yamaguchi, Y, Nakadai, K, Okuno, H & Ogata, T 2015, 'Audio-visual speech recognition using deep learning', Applied Intelligence, vol. 42, no. 4, pp. 722-37, <http://dx.doi.org/10.1007/s10489-014-0629-7>.
Mao, W, Zhao, S, Mu, X & Wang, H 2015, 'Multi-dimensional extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 160-70, <http://www.sciencedirect.com/science/article/pii/S0925231214011540>.
Liu, X, Wang, L, Huang, G-B, Zhang, J & Yin, J 2015, 'Multiple kernel extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 253-64, <http://www.sciencedirect.com/science/article/pii/S0925231214011199>.
LeCun, Y, Bengio, Y & Hinton, G 2015, 'Deep learning', Nature, vol. 521, no. 7553, pp. 436-44, <http://dx.doi.org/10.1038/nature14539>.
Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I & Salakhutdinov, R 2014, 'Dropout: a simple way to prevent neural networks from overfitting', J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-58.
Karpathy, A & Fei-Fei, L 2014, 'Deep visual-semantic alignments for generating image descriptions', arXiv preprint arXiv:1412.2306.