CUDA & CAFFE
Using CUDA and Caffe to build deep neural networks
Babii A.S. - [email protected]
Why we need to learn deep learning methods
Deep learning for image recognition tasks
Image classification
Object detection and localization
Object class segmentation
Problems related to dataset size
What if we have a large dataset?
What about types of parallel computing?
GPU-specific
CPU-specific
1. Saman Amarasinghe, Matrix Multiply, a case study – 2008.
Optimization table for matrix multiplication[1]
No parallelization, but we still want it to run faster
1. Use a profiler (gprof, valgrind, ...)
2. Does the application use BLAS?
3. Represent the data in vector or matrix form and use BLAS
4. SIMD: if there is no other way, use it for maximum performance on one core
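The effect of step 3 can be sketched in a few lines. This is only an illustration under stated assumptions: NumPy is not mentioned on the slides, the matrix size of 64 is arbitrary, and the `@` operator is fast only when NumPy is linked against a BLAS library, which is the usual case. The helper name `matmul_naive` is hypothetical.

```python
import time
import numpy as np

def matmul_naive(a, b):
    """Triple-loop matrix product on plain Python lists (no BLAS, no SIMD)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(m):
                c[i][j] += a_ip * b[p][j]
    return c

rng = np.random.default_rng(0)
n = 64  # arbitrary size, chosen so the naive loop finishes quickly
a = rng.random((n, n))
b = rng.random((n, n))

# Scalar code: one multiply-add per interpreter step.
t0 = time.perf_counter()
c_naive = np.array(matmul_naive(a.tolist(), b.tolist()))
t_naive = time.perf_counter() - t0

# Matrix form: the whole product is handed to the linked BLAS routine.
t0 = time.perf_counter()
c_blas = a @ b
t_blas = time.perf_counter() - t0

max_abs_diff = float(np.abs(c_naive - c_blas).max())
print(f"naive: {t_naive:.4f} s, matrix form: {t_blas:.6f} s, max diff: {max_abs_diff:.2e}")
```

Both paths compute the same product; only the data representation and the library call differ, which is the point of steps 2 and 3.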
How to make it parallel?
1. MKL, PBLAS, ATLAS
2. When is a multicore CPU more efficient than a GPU?
3. NVIDIA CUDA.
4. OpenCL
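Whichever of these options is chosen, the underlying idea is to split the work into independent pieces. A minimal CPU-side sketch of that decomposition is shown here for matrix multiplication, assuming NumPy and a thread pool; it is not taken from the slides and uses none of the listed libraries directly. The function name `matmul_row_blocks` and the worker count are hypothetical. Threads can help in this case only because NumPy typically releases the interpreter lock inside the matrix product.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def matmul_row_blocks(a, b, workers=4):
    """Split A into row blocks and multiply each block by B in its own thread."""
    blocks = np.array_split(a, workers, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input blocks
        parts = list(pool.map(lambda blk: blk @ b, blocks))
    return np.vstack(parts)

rng = np.random.default_rng(1)
a = rng.random((200, 50))
b = rng.random((50, 30))

c_parallel = matmul_row_blocks(a, b, workers=4)
c_reference = a @ b
print(np.allclose(c_parallel, c_reference))
```

Each row block of the result depends only on the matching row block of A, so the pieces need no synchronization. A GPU applies the same kind of decomposition at a much finer grain.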
CUDA
Deep convolutional neural networks, CAFFE implementation
ConvNet configuration by Krizhevsky [2]
Deep convolutional network example
Convolutional Neural Network architecture model [3]
Feature maps
http://www.songho.ca/dsp/convolution/convolution.html
Convolution & pooling
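The two operations named on this slide can be sketched with NumPy. This is an illustration, not Caffe's implementation. The function names, the 6x6 input, and the 3x3 filter are assumptions made for the example. As in most neural network frameworks, the "convolution" here slides the kernel without flipping it (strictly speaking, cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding; one output per position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    windows = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

# Toy input: pixel value grows by 1 per column and by 6 per row.
image = np.arange(36, dtype=float).reshape(6, 6)
# Horizontal-gradient filter: left column minus right column.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)  # 6x6 input, 3x3 kernel -> 4x4 map
pooled = max_pool2d(feature_map, size=2)   # 4x4 map, 2x2 windows -> 2x2 map
print(feature_map.shape, pooled.shape)
```

The convolution produces one feature map per filter, and pooling then shrinks that map while keeping the strongest response in each window.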
A set of primitives for deep learning networks
1. Convolutional layer
2. Filtering layer
3. Pooling layer
Integration with Caffe
24-core Intel E5-2679v2 CPU @ 2.4 GHz vs. NVIDIA K40
Feature maps
Feature map [4]
We overlay them on top of one another, but with a "transparency coefficient"
Libraries for deep learning
- Caffe: deep convolutional neural network framework, http://caffe.berkeleyvision.org
- ConvNetJS: JS-based deep learning framework, http://cs.stanford.edu/people/karpathy/convnetjs/
- DL4J: Java-based deep learning framework, http://deeplearning4j.org/
- Theano: CPU/GPU symbolic expression compiler in Python, http://deeplearning.net/software/theano
- Cuda-Convnet: a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks, http://code.google.com/p/cuda-convnet/
- Torch: provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua, http://www.torch.ch/
- Accord.NET: C# deep learning, http://accord-framework.net/, tutorial: http://whoopsidaisies.hatenablog.com/entry/2014/08/19/015420
http://deeplearning.net/software_links/
Working with Caffe
It is best to start with the command-line utilities:
build/tools
The most accessible example is based on MNIST: handwritten digit recognition
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh

cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
In what form are the input and output data supplied?
Input data
- databases (LevelDB or LMDB)
- directly from memory
- from files on disk in HDF5
- common image formats
http://symas.com/mdb/ http://leveldb.org/
Output data
- snapshot file with the model
- snapshot file with the solver state
Solver? Yes, interrupted training can be resumed from a snapshot
Caffe layer types
Caffe stores and communicates data in 4-dimensional arrays called blobs.

name: "LogReg"
layers {
  name: "mnist"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layers {
  name: "ip"
  type: INNER_PRODUCT
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
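To make the data flow of the "LogReg" network concrete, here is a NumPy sketch of its forward pass. It does not use Caffe. The blob names `data`, `label`, `ip`, `loss` and the values `batch_size: 64` and `num_output: 2` come from the config above. The 1x28x28 item shape, the random inputs, the zero-initialized weights, and the function names are assumptions made for the example.

```python
import numpy as np

def inner_product(blob, weights, bias):
    """Flatten each item of an (N, C, H, W) blob and apply a fully connected layer."""
    n = blob.shape[0]
    flat = blob.reshape(n, -1)        # shape (N, C*H*W)
    return flat @ weights.T + bias    # shape (N, num_output)

def softmax_loss(scores, labels):
    """Softmax followed by multinomial logistic loss, averaged over the batch."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

rng = np.random.default_rng(0)
batch_size, num_output = 64, 2          # values from the config
channels, height, width = 1, 28, 28     # assumed item shape, not given in the config

data = rng.random((batch_size, channels, height, width))     # "data" blob, 4-D
label = rng.integers(0, num_output, size=batch_size)          # "label" blob
weights = np.zeros((num_output, channels * height * width))   # assumed zero initialization
bias = np.zeros(num_output)

ip = inner_product(data, weights, bias)  # "ip" blob
loss = softmax_loss(ip, label)           # "loss" blob, a single number
print(ip.shape, round(loss, 4))
```

With all-zero weights both classes get equal probability, so the loss equals ln 2 regardless of the labels; this is a handy sanity check before any training happens.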
Layer types
Convolutional layer
Required fields:
- num_output (c_o): the number of filters
- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
Pooling layer
Required fields:
- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
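The kernel size, together with the optional stride and padding, determines the spatial size of each layer's output. The sketch below applies the usual output-size arithmetic. The function names are hypothetical. The layer stack (28-pixel input, 5x5 convolutions, 2x2 pooling with stride 2) is an assumed LeNet-style example, not a configuration given on the slides. Rounding down for convolution and up for pooling is an assumption about Caffe's behavior and should be checked against the version in use.

```python
import math

def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Output height or width of a convolution (assumed to round down)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1

def pool_output_size(in_size, kernel_size, stride=1, pad=0):
    """Output height or width of a pooling layer (assumed to round up)."""
    return math.ceil((in_size + 2 * pad - kernel_size) / stride) + 1

# Assumed LeNet-style stack: (layer kind, kernel_size, stride)
layers = [("conv", 5, 1), ("pool", 2, 2), ("conv", 5, 1), ("pool", 2, 2)]

size = 28          # assumed input height/width (an MNIST-sized image)
sizes = [size]
for kind, kernel, stride in layers:
    if kind == "conv":
        size = conv_output_size(size, kernel, stride)
    else:
        size = pool_output_size(size, kernel, stride)
    sizes.append(size)

print(sizes)  # [28, 24, 12, 8, 4]
```

Tracing the sizes this way before training helps catch a kernel that is larger than its input, or a stack that shrinks the feature maps to nothing.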
Loss Layers, Activation / Neuron Layers, Data Layers, Common Layers
How to configure?
Ready-to-use models are in the folder: examples
Solving your own task
1. Take care of the correctness, size, and coverage of the datasets.
2. Compile Caffe with GPU support.
3. Configure the network, starting from the examples.
4. Train, then check the result on the test set.
5. If the result is unsatisfactory, tune and retrain until the result is good enough.
6. To use the trained network on single images, write a config and use C++, Python, or MATLAB.
References
1. L. Deng and D. Yu, "Deep Learning: Methods and Applications", http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
2. ConvNet configuration by Krizhevsky et al., http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
3. Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster, http://parse.ele.tue.nl/education/cluster2
4. http://www.cs.toronto.edu/~ranzato/research/projects.html
5. http://www.amolgmahurkar.com/classifySTLusingCNN.html
Thank you for your attention!