S9391 GstCUDA: Easy GStreamer and CUDA Integration
Eng. Daniel Garbanzo, MSc. Michael Grüner
GTC March 2019
Agenda
About RidgeRun
GStreamer Overview
CUDA Overview
GstCUDA Introduction
Application Examples
Performance Statistics
GstCUDA Demo on TX2
Q&A
About Us
● US Company - R&D Lab in Costa Rica
● 15 years of experience
● Embedded Linux and GStreamer experts
● Custom multimedia solutions
● Digital signal/image processing
● AI and Machine Learning solutions
● System optimization: CUDA, GStreamer, OpenCL, OpenGL, OpenVX, Vulkan
● Support for embedded and resource-constrained systems
● Professional services, dedicated teams and specialized tools
● Complex multimedia applications require a lot of processing resources
● GStreamer offers a flexible way to create multimedia applications
● CUDA offers high-performance accelerated processing capabilities
Application areas: medical industry, automotive industry, smart devices, computer vision
● Open source framework for audio and video applications
● Based on a pipeline architecture
● Extensible design based on plugins (more than 1000 freely available)
● Automatic format and synchronization handling
● Tools for easy prototyping
Modularity, Flexibility, Portability
● Each plugin represents a different processing module
● The plugins are linked and arranged in a pipeline
● Freedom to build arbitrary pipelines for different applications
Figure: Basic MP4 player GStreamer pipeline
Modular design lets you change your application easily!
Easily change your application end use
Easily change from SW to HW accelerated processing
Code equivalent: gst-launch v4l2src ! videoconvert ! omxh265enc ! mpegtsmux ! udpsink
Code equivalent: gst-launch v4l2src ! videoconvert ! x265enc ! mpegtsmux ! filesink
GstCUDA
What Does GstCUDA Solve?
Integration Complexities
Development Time
Without GstCUDA:
● Create GStreamer plugin with CUDA support: 3 months
● Generate CUDA algorithm: 10 days
● Integrate CUDA algorithm: 5 days
● Total = 3.5 months
With GstCUDA:
● Generate CUDA algorithm: 10 days
● Integrate CUDA algorithm: 0.1 day
● Total = 10.1 days
● Reduce development time
● Focus on the CUDA logic
● Minimize time to market
Performance Bottleneck
[Figure: pipeline diagram with two Memcpy stages]
Without GstCUDA:
● Data transfer bottlenecks cause poor performance
● Limited framerate at high resolutions
With GstCUDA:
● Efficient memory handling improves performance
● Up to 2x 4K@60fps
Supported Platforms
● Focused on NVIDIA embedded platforms
● Jetson TX1, TX2, TX2i and Nano
● Jetson AGX Xavier
GstCUDA Key Features
Framework Overview
Quick Prototyping Elements
Cudafilter Element
location = median_filter.so
Cudamux Element
location = thermal_overlay.so
[Figure: Cudamux pipeline diagram, with an input labeled IR]
CUDA Algorithm Interface
● Make your CUDA algorithm compatible by implementing these interfaces
Cudafilter Interface
bool open();
bool close();
bool process (const GstCudaData &inbuf, GstCudaData &outbuf);
bool process_ip (const GstCudaData &inbuf, GstCudaData &outbuf);
Cudamux Interface
bool open();
bool close();
bool process (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
bool process_ip (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
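To make the four Cudafilter methods concrete, here is a minimal sketch of an algorithm class that provides them. It is not GstCUDA's own code: the `GstCudaData` struct below is a hypothetical stand-in (the slides do not show its fields), `InvertFilter` is an invented name, and a plain CPU loop takes the place of a CUDA kernel launch so the sketch compiles without the CUDA toolkit. The assumption that `process_ip` receives the same memory in both arguments is also ours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL stand-in for GstCudaData: the real type's fields are not
// shown on the slides, so this layout is an assumption for illustration.
struct GstCudaData {
  std::uint8_t *data;  // pixel bytes
  std::size_t size;    // buffer length in bytes
};

// Example algorithm exposing the four Cudafilter methods.
class InvertFilter {
 public:
  // Acquire resources once, before the first buffer is processed.
  bool open() { ready_ = true; return true; }

  // Release resources when processing stops.
  bool close() { ready_ = false; return true; }

  // Not in place: read inbuf, write the result to a separate outbuf.
  bool process(const GstCudaData &inbuf, GstCudaData &outbuf) {
    if (!ready_ || inbuf.size != outbuf.size) return false;
    invert(inbuf.data, outbuf.data, inbuf.size);
    return true;
  }

  // In place: signature as on the slide. ASSUMPTION: both arguments
  // describe the same memory, so the frame is overwritten.
  bool process_ip(const GstCudaData &inbuf, GstCudaData &outbuf) {
    if (!ready_ || inbuf.data != outbuf.data) return false;
    invert(outbuf.data, outbuf.data, outbuf.size);
    return true;
  }

 private:
  // CPU loop standing in for a CUDA kernel launch.
  static void invert(const std::uint8_t *src, std::uint8_t *dst,
                     std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
      dst[i] = static_cast<std::uint8_t>(255 - src[i]);
  }

  bool ready_ = false;
};
```

In this sketch a caller would call `open()` once, then `process()` or `process_ip()` per frame, and finally `close()`. A Cudamux-style class would have the same shape, taking a `vector` of input buffers instead of one.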
Buffer Processing Methods
● process_ip (in place)
● process (not in place)
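The difference between the two methods can be sketched with plain host buffers. The function names below are hypothetical and the code runs on the CPU; it only illustrates the memory behaviour, under the assumption that "in place" means the input frame itself is overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Not in place: the input stays intact and the result goes to a second,
// pre-allocated buffer of the same size (two frame buffers are live).
void threshold_copy(const std::vector<std::uint8_t> &in,
                    std::vector<std::uint8_t> &out, std::uint8_t level) {
  assert(out.size() == in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = static_cast<std::uint8_t>(in[i] >= level ? 255 : 0);
}

// In place: the same buffer is read and overwritten, so no second
// frame buffer is needed, but the original pixel values are lost.
void threshold_in_place(std::vector<std::uint8_t> &frame,
                        std::uint8_t level) {
  for (std::uint8_t &px : frame)
    px = static_cast<std::uint8_t>(px >= level ? 255 : 0);
}
```

In-place processing suits per-pixel operations like this one. Algorithms that read neighbouring pixels, such as a convolution, generally need a separate output buffer, because an in-place write would corrupt values that later pixels still have to read.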
Create Your Custom Element
● Some applications may require specialized elements
● GstCUDA provides base classes to simplify development
GstCUDA Framework Usage Example
GstCUDA Framework Summary
The framework includes:
GstCUDA API
● Utils to handle memory interfaces
● GStreamer Unified Memory allocators
● Parent classes for different topologies
Quick prototyping elements
● Generic elements to evaluate custom algorithms
● Runtime loading of CUDA algorithms
Set of examples
● Complete GstCUDA element boilerplate
● CUDA algorithms for the prototyping elements
GstCUDA Application Areas Examples Video
Industrial Applications: Border Enhancement
Automation Applications: Hough Transform
Security Applications: Motion Detection/Estimation
Performance Statistics
Varying Algorithm / Fixed Image Size
Test Conditions
● Image convolution algorithm
● Stressing compute capabilities
● Variable convolution kernel size
● 1080p@240fps / 1080p@60fps stream input
● Cudafilter element
● Unified Memory allocator
● Jetson TX2 platform
● Not in-place
location = convolution.so
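Why a variable kernel size stresses compute can be seen from a reference convolution. The sketch below is a CPU version written for illustration, not the contents of convolution.so; the function names, the single-channel float image layout and the zero padding at the borders are all assumptions. For a k x k kernel, every output pixel costs k*k multiply-accumulate operations, so the work per frame grows quadratically with k while the image size stays fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiply-accumulate operations for one frame with a k x k kernel.
std::size_t macs_per_frame(std::size_t width, std::size_t height,
                           std::size_t k) {
  return width * height * k * k;
}

// Reference 2D convolution on a single-channel, row-major float image.
// Pixels outside the image are treated as zero (an assumption).
// The result goes to a separate buffer, i.e. not in place.
std::vector<float> convolve(const std::vector<float> &img, int width,
                            int height, const std::vector<float> &kernel,
                            int k) {
  assert(k > 0 && k % 2 == 1);
  assert(img.size() == static_cast<std::size_t>(width) * height);
  assert(kernel.size() == static_cast<std::size_t>(k) * k);
  std::vector<float> out(img.size(), 0.0f);
  const int r = k / 2;  // kernel radius
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      float acc = 0.0f;
      for (int ky = -r; ky <= r; ++ky) {
        for (int kx = -r; kx <= r; ++kx) {
          const int sy = y + ky;
          const int sx = x + kx;
          if (sy < 0 || sy >= height || sx < 0 || sx >= width) continue;
          // Kernel is indexed flipped, as the convolution definition requires.
          acc += img[sy * width + sx] * kernel[(r - ky) * k + (r - kx)];
        }
      }
      out[y * width + x] = acc;
    }
  }
  return out;
}
```

For a 1920x1080 frame, `macs_per_frame` gives 18,662,400 operations with a 3x3 kernel and 51,840,000 with a 5x5 kernel, before multiplying by the frame rate.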
Framerate Stats
Processing Time Stats
CPU Load Stats / GPU Load Stats
*baseline = simple capture pipeline (without GstCUDA)
Fixed Algorithm / Varying Image Size
Test Conditions
● Memory copy algorithm
● Stressing data transfer
● Variable input resolution
● Cudafilter element
● Unified Memory allocator
● Jetson TX2 platform
● In-place vs. not in-place
location = memcpy.so
Framerate Stats
Note: Maximum Framerate limited to 245 fps by the video source
Processing Time Stats
CPU Load Stats / GPU Load Stats
*baseline = simple capture pipeline (without GstCUDA)
Fixed Algorithm / Varying Image Size
Test Conditions
● Simple image mixing algorithm
● Stressing data transfer
● Variable input resolution
● Cudamux element
● Unified Memory allocator
● In-place=True
● Jetson TX2 platform
location = mixer.so
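A simple image mix in the style of the Cudamux `process_ip` method might look like the sketch below. It is not the contents of mixer.so: `FrameBuf` is a hypothetical stand-in for `GstCudaData`, `mix_ip` is an invented name, the mix is a plain per-pixel average, and the loop runs on the CPU. The sketch also assumes that in-place operation means the output shares memory with the first input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL stand-in for GstCudaData; the field layout is assumed.
struct FrameBuf {
  std::uint8_t *data;  // pixel bytes
  std::size_t size;    // buffer length in bytes
};

// Mux-style in-place mix: average all inputs pixel by pixel and write the
// result into outbuf, which must alias the first input (an assumption).
bool mix_ip(std::vector<FrameBuf> &inbufs, FrameBuf &outbuf) {
  if (inbufs.empty() || outbuf.data != inbufs[0].data) return false;
  for (const FrameBuf &in : inbufs)
    if (in.data == nullptr || in.size != outbuf.size) return false;
  assert(outbuf.data != nullptr);
  for (std::size_t i = 0; i < outbuf.size; ++i) {
    std::size_t sum = 0;
    for (const FrameBuf &in : inbufs) sum += in.data[i];
    // Each pixel is read from every input before it is overwritten,
    // so writing into the first input's memory is safe here.
    outbuf.data[i] = static_cast<std::uint8_t>(sum / inbufs.size());
  }
  return true;
}
```

Because each output pixel depends only on the input pixels at the same position, this kind of mix can run in place without a separate output frame.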
Framerate Stats
Note: Maximum Framerate limited to 240fps by the video source
CPU Load Stats / GPU Load Stats
*baseline = simple capture pipeline (without GstCUDA)
GstCUDA Live Demo on Jetson TX2
Sobel Filter 1080p60fps
Code equivalent:
gst-launch-1.0 nvcamerasrc sensor-id=2 fpsRange=60,60 ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1,format=I420" ! nvvidconv ! "video/x-raw" ! queue ! cudafilter in-place=false location=/borders.so ! queue ! nvoverlaysink
Resources
● GstCUDA wiki page:
○ gstcuda.ridgerun.com
● RidgeRun Website:
○ ridgerun.com
● RidgeRun Contact:
○ ridgerun.com/contact