FOSDEM 2015 Alexandre Courbot, NVIDIA Porting Nouveau to ... · Alexandre Courbot, NVIDIA FOSDEM...

Porting Nouveau to Tegra K1
How NVIDIA became a Nouveau contributor
Alexandre Courbot, NVIDIA
FOSDEM 2015

The Story So Far...
In 2014 NVIDIA released the Tegra K1 SoC
● 32-bit quad-core or 64-bit dual-core ARM
● 192-core low-power Kepler GPU (OpenGL 4.3, CUDA)
● Desktop Kepler already supported by Nouveau
2014/02/01: NVIDIA to contribute Nouveau GK20A support

(Incomplete) Credits
NVIDIANs: Thierry Reding, Terje Bergström, Gregory Roth, Vince Hsu, Ken Adams, Lauri Peltonen, Stephen Warren, Mark Zhang
… and the whole Nouveau community!

Outline
GK20A/Nouveau overview
Nouveau bringup on Tegra K1
Challenges with memory management
Engines layout on Tegra
User-space (Mesa)

GK20A Overview
Fully-featured Kepler part with unified shaders and per-process virtualization of the GPU
● Each process gets its own GPU context
● Memory virtualized per-context
● Graphics jobs submitted by user-space using pushbuffers

Nouveau Architecture
Supports GPUs from Riva TNT (1998) to Maxwell (2014)
● Extremely modular
● A GPU is literally an assembly of engines and sub-devices
Supporting GK20A means
● Finding/writing engines/subdevs for the chip
● Allowing Nouveau to run on Tegra

Platform Bus Support
Nouveau expects the GPU to be on a PCI bus
● PCI provides the I/O addresses of the GPU registers and BARs
● pci_map_page() used to map system RAM to GPU
Abstract the bus and add platform bus support
● I/O addresses provided by Device Tree
● Replace deprecated pci_map_page() with DMA API
→ Nouveau can be instantiated from PCI or Device Tree
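
The bus abstraction described above can be pictured with a small Python model. This is only a sketch, not Nouveau's code: the class names, the `register_region()` method and the addresses are all assumptions made for illustration. The point is that the probe path asks "where are my registers?" without caring whether the answer came from a PCI BAR or a Device Tree node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MmioRegion:
    base: int   # physical address of the register window
    size: int   # length of the window in bytes


class PciBus:
    """Hypothetical PCI back end: the register window is the first BAR."""

    def __init__(self, bars):
        self.bars = bars

    def register_region(self):
        return self.bars[0]


class PlatformBus:
    """Hypothetical platform back end: the window comes from a Device Tree 'reg' entry."""

    def __init__(self, reg):
        self.reg = reg  # (base, size) pair as it would be read from the device node

    def register_region(self):
        base, size = self.reg
        return MmioRegion(base, size)


def probe(bus):
    """Bus-agnostic probe: it only relies on register_region()."""
    region = bus.register_region()
    return f"GPU registers at {region.base:#x}, {region.size:#x} bytes"


# Illustrative addresses only, not taken from the slides.
print(probe(PciBus([MmioRegion(0xF6000000, 0x1000000)])))
print(probe(PlatformBus((0x57000000, 0x1000000))))
```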

No VBIOS
Video BIOS provides useful information (e.g. voltage tables for DVFS) and also performs critical initialization
● Alternate way to provide power information via per-chip static tables
● Perform necessary initialization for GK20A in-driver
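
A per-chip static table could look roughly like the following Python sketch. The frequencies and voltages are invented placeholders, not real GK20A values, and `STATIC_PSTATES` and `voltage_for()` are hypothetical names. It only shows the idea of replacing a VBIOS lookup with data compiled into the driver.

```python
from bisect import bisect_left

# Hypothetical per-chip table, sorted by frequency: (MHz, millivolts).
# The numbers are placeholders for illustration, not real GK20A data.
STATIC_PSTATES = {
    "gk20a": [(72, 800), (252, 850), (540, 950), (852, 1100)],
}


def voltage_for(chip, freq_mhz):
    """Return the voltage of the lowest table entry that reaches freq_mhz."""
    table = STATIC_PSTATES[chip]
    freqs = [freq for freq, _ in table]
    index = bisect_left(freqs, freq_mhz)
    if index == len(table):
        raise ValueError(f"{freq_mhz} MHz is above the highest entry for {chip}")
    return table[index][1]


print(voltage_for("gk20a", 300))
```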

No VRAM
GK20A has no video memory of its own
● GPU is a direct client of Tegra’s Memory Controller
● Free and direct access to system memory
● Huge consequences for the driver

Address Translation on Desktop Kepler
[Diagram: address translation between CPU VA, CPU PA, System RAM, BAR1, the PCIe bus, GPU VA and Video RAM]

Nouveau Memory Model
● 2 allocation targets:
  ● VRAM
  ● TT (system memory mapped to GPU)
● Target specified at buffer creation time
● Coherency maintained thanks to BAR1 (for VRAM) and PCIe (for TT)

Address Translation on Mobile Kepler
[Diagram: address translation between CPU VA, CPU PA, System RAM, BAR1, GPU VA and IOMMU VA]

Mobile Kepler Memory Model
● No more dedicated video memory
● All allocations in system memory
  ● Not a carve out!
● No coherency between CPU and GPU
  ● Must flush/invalidate CPU cache ourselves

Living Without VRAM
How to handle VRAM allocations?
● Emulate VRAM?
  ● Sub-optimal memory management
● Dismiss VRAM allocations altogether?
  ● Requires more changes in the kernel & Mesa
Decision taken to not use a RAM device for GK20A
● Better reflects reality, simplifies memory management
● User-space needs to be aware of no-VRAM devices

Using IOMMU
IOMMU introduces a second level of address translation
● Useful to “flatten” context objects
  ● Instance blocks, PGTs, etc.
● Also maximizes large page usage on the GPU
  ● IOMMU more efficient than GPU MMU
[Diagram labels: GPU VA, RAM; GPU VA, IOMMU VA, RAM]
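
A toy Python model of the "flattening" idea: scattered physical pages are mapped to one contiguous IOMMU range, so the GPU side can cover the buffer with a single large page instead of many small ones. The page sizes (4 KiB and 128 KiB), the `Iommu` class and `gpu_pte_count()` are assumptions for illustration, not Nouveau or Tegra SMMU code.

```python
SMALL_PAGE = 4096          # assumed system page size
LARGE_PAGE = 128 * 1024    # assumed GPU large page size


class Iommu:
    """Toy second-level translation: IOMMU VA page -> physical page."""

    def __init__(self, base=0x8000_0000):
        self.next_va = base
        self.table = {}

    def map_pages(self, phys_pages):
        """Map scattered physical pages to one contiguous, large-page-aligned range."""
        start = (self.next_va + LARGE_PAGE - 1) // LARGE_PAGE * LARGE_PAGE
        for index, phys in enumerate(phys_pages):
            self.table[start + index * SMALL_PAGE] = phys
        self.next_va = start + len(phys_pages) * SMALL_PAGE
        return start

    def translate(self, iova):
        page = iova & ~(SMALL_PAGE - 1)
        return self.table[page] | (iova & (SMALL_PAGE - 1))


def gpu_pte_count(size, contiguous):
    """Number of GPU page table entries needed to map `size` bytes."""
    use_large = contiguous and size % LARGE_PAGE == 0
    return size // (LARGE_PAGE if use_large else SMALL_PAGE)


# 32 physical pages scattered across RAM, 128 KiB in total.
scattered = [0x1000_0000 + i * 0x3000 for i in range(32)]
iommu = Iommu()
iova = iommu.map_pages(scattered)
print(hex(iova), gpu_pte_count(len(scattered) * SMALL_PAGE, contiguous=True))
```

Without the IOMMU the same buffer would need one small-page entry per physical page in the GPU page tables.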

CPU/GPU Coherency
● Handled transparently by PCIe for desktop
● No such thing on Tegra: explicitly flush/invalidate buffer objects (DMA API)
● New flag for objects that must always be coherent
  ● Fences, GPFIFOs
● ARM makes things more difficult
  ● A memory page cannot be mapped twice with different attributes
  ● Kernel already maps lowmem (first 760 MB) cached
  ● Cannot remap this memory with uncached attribute
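
Why explicit flush/invalidate matters can be shown with a toy Python model of a non-coherent buffer. The class and method names are hypothetical; `sync_for_device()` and `sync_for_cpu()` loosely mirror the direction of the Linux DMA API sync calls but are not those functions.

```python
class NonCoherentBuffer:
    """Toy model: RAM as seen by the GPU, with a CPU cache in front of it."""

    def __init__(self, size):
        self.ram = bytearray(size)   # backing memory, what the GPU reads and writes
        self.cache = {}              # offset -> byte value held in the CPU cache
        self.dirty = set()           # cached offsets not yet written back to RAM

    def cpu_write(self, offset, value):
        self.cache[offset] = value
        self.dirty.add(offset)

    def cpu_read(self, offset):
        if offset not in self.cache:
            self.cache[offset] = self.ram[offset]
        return self.cache[offset]

    def gpu_write(self, offset, value):
        self.ram[offset] = value

    def gpu_read(self, offset):
        return self.ram[offset]

    def sync_for_device(self):
        """Flush: write dirty cached bytes back before the GPU uses the buffer."""
        for offset in self.dirty:
            self.ram[offset] = self.cache[offset]
        self.dirty.clear()

    def sync_for_cpu(self):
        """Invalidate: drop cached bytes so the CPU re-reads what the GPU wrote."""
        self.cache.clear()
        self.dirty.clear()


buf = NonCoherentBuffer(16)
buf.cpu_write(0, 42)
stale_for_gpu = buf.gpu_read(0)   # GPU still sees the old RAM content
buf.sync_for_device()
fresh_for_gpu = buf.gpu_read(0)   # now the CPU write is visible
print(stale_for_gpu, fresh_for_gpu)
```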

Multiple CPU Mappings Coherency
How to address the coherency issue?
● Use GPU path when writing coherent buffers
  ● PRAMIN window (slow)
  ● BAR1 (relatively scarce resource)
● Allocate coherent buffers using DMA API
  ● dma_alloc_coherent() can fix the lowmem mapping
  ● But we end up with a permanent kernel mapping

Engines Layout
[Diagram, Discrete GPU: Nouveau; GR, DISP, ENC, ...; VRAM]
[Diagram, Tegra: Nouveau, TegraDRM, V4L2; GR, DISP, ENC; tegra-mc; System RAM]
● GeForce GTX 680 (GK104) provides a graphics engine (GR), display controllers, 3 copy engines, video decoder, video encoder, VRAM, ...
● GK20A only includes a graphics engine
● Other functions already provided by different Tegra IPs

Engines Layout
PRIME support is critical for this setup
● Export required to display GPU buffers
Tegra K1 is a perfect fit for render nodes
● card0 (tegradrm) is the display device
● renderD128 (nouveau) is the render device
→ requires support at application or Mesa level
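
On the application side, telling the display node from the render node could start with something as simple as the Python sketch below. It classifies DRM device nodes purely by file name, which is a simplification (a real application would query the driver behind each node), and the function names are hypothetical.

```python
import re


def classify_drm_node(path):
    """Classify a DRM device node by its file name only."""
    name = path.rsplit("/", 1)[-1]
    if re.fullmatch(r"renderD\d+", name):
        return "render"
    if re.fullmatch(r"card\d+", name):
        return "display"
    return "other"


def pick_devices(paths):
    """Return (display_node, render_node); either is None if not found."""
    display = next((p for p in paths if classify_drm_node(p) == "display"), None)
    render = next((p for p in paths if classify_drm_node(p) == "render"), None)
    return display, render


# Node names as given on the slide: card0 is tegradrm, renderD128 is nouveau.
print(pick_devices(["/dev/dri/card0", "/dev/dri/renderD128"]))
```

Buffers rendered through the render node would then be exported with PRIME and imported by the display device for scanout.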

Who Should Provide Memory?
The first driver in the chain?
A neutral allocator? (e.g. ION)
Why should each driver have its own allocator?
How to handle the different capabilities of each engine?
[Diagram: Nouveau (GR), TegraDRM (DISP), V4L2 (ENC), V4L2 (CAM); tegra-mc; System RAM]

User-space (Mesa) changes
~25 LoC changed to recognize GK20A… and Mesa fully works
Some work required to avoid VRAM allocations
Some more work to integrate seamlessly with tegradrm?
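
One way to picture "avoiding VRAM allocations" in user space is a placement fix-up like the Python sketch below. This is not Mesa's actual code: `Domain`, `effective_domain()` and the `has_vram` parameter are hypothetical, with VRAM and TT named after the two allocation targets of the Nouveau memory model.

```python
from enum import Flag, auto


class Domain(Flag):
    VRAM = auto()   # dedicated video memory
    TT = auto()     # system memory mapped to the GPU


def effective_domain(requested, has_vram):
    """Rewrite a placement request for devices that have no VRAM."""
    if has_vram:
        return requested
    # No VRAM on this device: drop the VRAM bit and fall back to system memory.
    return (requested & ~Domain.VRAM) | Domain.TT


print(effective_domain(Domain.VRAM, has_vram=False))
```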

Conclusion
GK20A is close to working out of the box with Nouveau
Remaining tasks:
● Firmware distribution
● A few more kernel and Mesa patches pending
Great experience working with the Nouveau community
● Plans to keep contributing support for future Tegra SoCs

Thank you!
https://github.com/NVIDIA/tegra-nouveau-rootfs