Learning Spatiotemporal Features with 3D Convolutional...
![Page 1: Learning Spatiotemporal Features with 3D Convolutional ...faculty.iitmandi.ac.in/~aditya/cs671/cs671_2017/data/Lect23.pdf · Learning Spatiotemporal Features with 3D Convolutional](https://reader033.fdocuments.net/reader033/viewer/2022060211/5f04c1757e708231d40f8b9d/html5/thumbnails/1.jpg)
Learning Spatiotemporal Features with 3D Convolutional Networks
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
Effective Video Descriptor
• Generic – can represent different types of video
• Compact – processing, storage
• Efficient – computation
• Simple – implementation
3D Convolution and Pooling
• 3D convolution is better than 2D convolution at modeling temporal information.
  – 2D CONV: performed only spatially; loses temporal information.
  – 3D CONV: performed spatio-temporally; preserves temporal information.
• The same holds for pooling.
2D Convolution on 1-ch Input
• Result: 2D image.
2D Convolution on n-ch Input
• Result: 2D image.
3D Convolution on n-ch Input
• Result: volume.
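The difference between the three cases above can be checked with a minimal NumPy sketch. The function names, the tiny input sizes, and the "valid" (no padding) convention are assumptions chosen for illustration; this is a naive loop implementation, not how C3D itself is implemented.

```python
import numpy as np

def conv2d_multichannel(x, w):
    """Valid 2D convolution (cross-correlation) of a C x H x W input with one
    C x kh x kw filter. All channels are summed, so the result is one 2D map."""
    _, h, wd = x.shape
    _, kh, kw = w.shape
    out = np.zeros((h - kh + 1, wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w)
    return out

def conv3d_multichannel(x, w):
    """Valid 3D convolution of a C x T x H x W input with one C x kt x kh x kw
    filter. The filter also slides along time, so the result is a volume."""
    _, t, h, wd = x.shape
    _, kt, kh, kw = w.shape
    out = np.zeros((t - kt + 1, h - kh + 1, wd - kw + 1))
    for k in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[:, k:k + kt, i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
frame = rng.standard_normal((3, 8, 8))    # one 3-channel image
clip = rng.standard_normal((3, 6, 8, 8))  # 3 channels, 6 frames

# 2D convolution on a multi-channel image: a single 2D map.
image_out = conv2d_multichannel(frame, rng.standard_normal((3, 3, 3)))

# 2D convolution on a clip whose frames are stacked as 18 channels:
# still a single 2D map, so the temporal axis is gone after one layer.
stacked_out = conv2d_multichannel(clip.reshape(18, 8, 8),
                                  rng.standard_normal((18, 3, 3)))

# 3D convolution on the same clip: the output keeps a temporal axis.
volume_out = conv3d_multichannel(clip, rng.standard_normal((3, 3, 3, 3)))

print(image_out.shape, stacked_out.shape, volume_out.shape)
```

Only the 3D case returns an array with a time dimension, which is what lets later layers keep modeling motion.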
Identify Best Architecture for 3D ConvNets (on UCF101)
• Common network settings
  – All video frames are resized to 128x171.
  – Videos are split into non-overlapping 16-frame clips.
  – Input: 3x16x128x171.
  – 5 convolution and pooling layers
  – 2 fully connected layers
  – Softmax loss layer to predict action labels
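The clip-splitting step can be sketched in a few lines of NumPy. The helper name `split_into_clips`, the frames-first input layout (T x H x W x 3), and dropping leftover frames at the end of the video are assumptions for illustration; the slides only state the clip length and the final 3x16x128x171 input size.

```python
import numpy as np

CLIP_LEN = 16  # frames per clip, as stated on the slide

def split_into_clips(frames, clip_len=CLIP_LEN):
    """Split a T x H x W x 3 frame array into non-overlapping clips.

    Returns an N x 3 x clip_len x H x W array. Trailing frames that do not
    fill a whole clip are dropped (an assumption, not from the slides).
    """
    n_clips = frames.shape[0] // clip_len
    usable = frames[: n_clips * clip_len]
    clips = usable.reshape(n_clips, clip_len, *frames.shape[1:])  # N, T, H, W, C
    return clips.transpose(0, 4, 1, 2, 3)                         # N, C, T, H, W

# Synthetic 40-frame video already resized to 128x171; every pixel of
# frame t holds the value t so the clip boundaries are easy to verify.
video = np.zeros((40, 128, 171, 3), dtype=np.uint8)
video[:] = np.arange(40, dtype=np.uint8)[:, None, None, None]

clips = split_into_clips(video)
print(clips.shape)
```

With 40 frames, two full clips are produced and the last 8 frames are discarded under the assumption above.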
Identify Best Architecture for 3D ConvNets (on UCF101)
• Varying network architecture
  – Homogeneous temporal depth
    • Depth d for d = 1, 3, 5, 7
  – Varying temporal depth
    • Increasing: 3-3-5-5-7
    • Decreasing: 7-5-5-3-3
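To get a feel for how kernel temporal depth changes model size, the sketch below counts convolution parameters for each configuration. The filter counts (64, 128, 256, 256, 256) and the 3x3 spatial kernel size come from the C3D paper's architecture-search setup rather than from these slides, so treat them as assumptions; `conv_param_count` is a hypothetical helper.

```python
# Assumed filter counts for the 5 conv layers (from the C3D paper, not the slides).
FILTERS = [64, 128, 256, 256, 256]
IN_CHANNELS = 3  # RGB input

def conv_param_count(temporal_depths, filters=FILTERS,
                     in_channels=IN_CHANNELS, spatial=3):
    """Total weights plus biases for conv layers with d x spatial x spatial kernels."""
    total = 0
    c_in = in_channels
    for d, c_out in zip(temporal_depths, filters):
        total += c_out * (c_in * d * spatial * spatial + 1)
        c_in = c_out
    return total

configs = {
    "depth-1": [1] * 5,
    "depth-3": [3] * 5,
    "depth-5": [5] * 5,
    "depth-7": [7] * 5,
    "increasing": [3, 3, 5, 5, 7],
    "decreasing": [7, 5, 5, 3, 3],
}

counts = {name: conv_param_count(depths) for name, depths in configs.items()}
for name, n in counts.items():
    print(f"{name:>10}: {n:,} conv parameters")
```

Note that depth-1 kernels span a single frame, so that configuration amounts to applying 2D convolution to each frame separately.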
3D Convolution Kernel Temporal Depth Search
Spatiotemporal Feature Learning
• Best network architecture
  – With 3x3x3 kernels
Spatiotemporal Feature Learning
• Dataset for training
  – Sports1M dataset
    • Largest video classification benchmark
    • 1.1 million sports videos
    • 487 categories
Sports1M Classification Results
C3D Video Descriptor
• The C3D model can be used as a feature extractor for various video analysis tasks.
  – Action recognition
  – Action similarity
  – Scene and object recognition
• Uses fc6 activations
  – 4096 dimensions
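A minimal sketch of turning per-clip fc6 activations into a single video descriptor follows. Averaging the clip features and then L2-normalizing is the procedure described in the C3D paper, not on this slide; the helper name `video_descriptor` and the random stand-in features are assumptions, since running the actual network is out of scope here.

```python
import numpy as np

FC6_DIM = 4096  # descriptor size stated on the slide

def video_descriptor(clip_features):
    """Average N x 4096 per-clip fc6 activations and L2-normalize the result."""
    mean_feat = np.mean(clip_features, axis=0)
    norm = np.linalg.norm(mean_feat)
    # Guard against an all-zero feature vector.
    return mean_feat / norm if norm > 0 else mean_feat

# Random stand-in for the fc6 activations of 5 clips from one video.
rng = np.random.default_rng(0)
fake_fc6 = rng.random((5, FC6_DIM))

desc = video_descriptor(fake_fc6)
print(desc.shape, np.linalg.norm(desc))
```

The resulting fixed-length vector can then be passed to a simple classifier, such as a linear SVM, for the downstream tasks listed above.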
Action Recognition
• Dataset: UCF101
  – 13,320 videos
  – 101 human action classes
Action Similarity Labeling
• Dataset: ASLAN
  – 3,631 videos
  – 432 action classes
Scene and Object Recognition
• Dataset: YUPENN
  – 420 videos
  – 14 scene categories
• Dataset: Maryland
  – 130 videos
  – 13 scene categories
Why C3D Features?
• Generic
• Compact
• Efficient
• Simple
Visualisation using the t-SNE method:
L. van der Maaten and G. Hinton. Visualizing data using t-SNE. JMLR, 2008
What Does C3D Learn?
Using the deconvolution method from M. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014
Useful Links
• http://vlg.cs.dartmouth.edu/c3d/
• https://github.com/facebook/C3D
Tools and software required:
- keras
- tensorflow
- ffmpeg (compiled from source)
- opencv (compiled from source)
Thank you