Common Accelerator Framework Warpdrive Update - BKK19-401 Zhangfei Gao, Linaro 2019.04


Agenda
● Background
● Target: Provide Accelerator
● Investigation
● Warpdrive
● Performance

Background

1. More and more hardware accelerators, such as compressors/decompressors, encryptors/decryptors, and AI engines, are being introduced to the market. Most of them need to be used from user space, so software infrastructure is needed to support these applications.

2. This is especially important for ARM-based solutions, because ARM is well suited to domain-specific customization.

General platform

[Diagram: CPUs with MMU, memory, IOMMU, and Root Complex. Device classes shown: non-SVA capable devices (e.g. legacy devices), discrete devices (e.g. PCIe attached devices), integrated devices (e.g. processor accelerators).]

Target

Kunpeng920

[Diagram: zip/gzip, HiAccQM, on-chip PCIe interface, accelerator engine]

Apps call the accelerator engine directly.

1024 queues, supporting up to 1024 processes.

SR-IOV: supports a PF and 63 VFs.

Accelerators: zip/gzip, hpre, SM3/4, sec, poe
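
The relation between 1024 queues and up to 1024 processes can be pictured as a fixed pool from which each process takes one queue. The sketch below is only a toy model of that budget: the `QueuePool` class and its `get_queue`/`put_queue` methods are hypothetical names chosen for illustration and are not the uacce or warpdrive API.

```python
class QueuePool:
    """Toy model of a fixed pool of hardware queues (hypothetical, not the uacce API)."""

    def __init__(self, num_queues=1024):
        self.free = list(range(num_queues))   # queue ids not yet handed out
        self.owner = {}                       # queue id -> owning process id

    def get_queue(self, pid):
        """Hand one free queue to process `pid`, or return None when none is left."""
        if not self.free:
            return None
        qid = self.free.pop()
        self.owner[qid] = pid
        return qid

    def put_queue(self, qid):
        """Return a queue to the pool so another process can use it."""
        del self.owner[qid]
        self.free.append(qid)


pool = QueuePool()
queues = [pool.get_queue(pid) for pid in range(1024)]  # one queue per process
extra = pool.get_queue(1024)      # a further request finds no free queue
pool.put_queue(queues[0])
retry = pool.get_queue(1024)      # succeeds once a queue has been released
```

With one queue per process, the pool is exhausted after 1024 requests; a process that wants several queues would reduce the number of processes that can be served at once.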

Investigation #1: no-iommu

[Diagram: CPU and device access host/physical memory. Labels: VA, PA, CPU page tables, dma_alloc_coherent, dma_mmap_coherent, no-iommu.]

Limitations:
a. Contiguous memory is required.
b. Reserved memory (CMA) may be required.

Investigation #2: iommu

[Diagram: CPU and device access host/physical memory. The CPU translates VA to PA through the CPU page tables; the device translates IOVA to PA through the IOMMU page tables.]

Limitations:
a. IOMMU_DOMAIN_UNMANAGED mode has to be used to avoid IOVA conflicts with vfio, so the dma_api cannot be used.
b. Multi-process is not supported.

Investigation #3: iommu SVA

[Diagram: CPU and device access host/physical memory. Both translate VA to PA through the CPU page tables; the device does so via IOMMU SVA.]

Pros:
a. The IOMMU directly uses the CPU page tables, so user space addresses can be recognized by the kernel.
b. A malloc buffer can be used for kernel DMA, thanks to device page faults.
c. Multi-process is supported through PASID.
d. The kernel dma_api can still be used, where cd=0 is used.

Shared Virtual Address (SVA)

[Diagram: the STE points to the CD table. ssv = 0 selects the kernel io-pgtable; ssv = 1 & ssid = x selects the process io-pgtable.]

DMA accesses to the control queue are performed with ssv=0.
DMA accesses to the data queue are performed with ssv=1 & ssid = x (pasid).

Shared Virtual Address (SVA)

[Diagram: SID selects an STE in the stream tables; SSID selects a CD in the context (Ctx) table; the page tables (PTE) translate IOVA to IPA.]

SVA native enabling on ARM platform (Jean-Philippe Brucker, ARM): https://lkml.org/lkml/2019/2/20/518

SID: Stream ID, identifies a device
SSID: Substream ID, identifies an address space (pasid)
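
The two-level selection described on these slides (SID picks the device entry, SSID picks the address space) can be sketched with nested dictionaries. This is a toy model only: the ids, the table layout, and the addresses are made up, and mapping a transaction without an SSID to context 0 is a simplification of the ssv=0 / cd=0 case mentioned above.

```python
# Toy model: all ids, table layouts and addresses below are made up.
PAGE = 4096

# One page table per address space: virtual page number -> physical page number.
kernel_pgtable = {0x10: 0x80}
process_a_pgtable = {0x10: 0x200}
process_b_pgtable = {0x10: 0x300}

# Stream table: SID -> STE. Each STE here only holds the device's CD table.
# CD table: SSID -> page table. Entry 0 stands in for the ssv=0 (kernel) context.
stream_table = {
    7: {"cd_table": {0: kernel_pgtable, 1: process_a_pgtable, 2: process_b_pgtable}},
}


def translate(sid, addr, ssid=None):
    """Resolve a device access to a physical address; raises KeyError on a miss."""
    ste = stream_table[sid]                    # SID selects the device entry
    cd_index = 0 if ssid is None else ssid     # no SSID: use context 0
    pgtable = ste["cd_table"][cd_index]        # SSID selects the address space
    vpn, offset = divmod(addr, PAGE)
    return pgtable[vpn] * PAGE + offset


addr = 0x10 * PAGE + 0x24
control_pa = translate(7, addr)            # control queue access, ssv=0
data_pa_a = translate(7, addr, ssid=1)     # data queue access for process A
data_pa_b = translate(7, addr, ssid=2)     # same address, process B
```

The same device address resolves to three different physical pages depending on the context, which is what lets one device serve several processes without their buffers colliding.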

Warpdrive

1. Accelerator framework for user space applications. Proposed by Kenneth Lee from Hisilicon ([email protected]).
2. Includes kernel (uacce) and user (warpdrive lib) facilities.
3. Based on the IOMMU; protects the kernel and other applications by setting a boundary on the hardware access range.
4. Makes particular use of the iommu-sva feature, maintaining a unified address space between the process and the hardware context.
5. Multi-process & multi-queue support.
6. Compatibility (no-iommu, no-sva capable, sva capable).
7. SR-IOV support, vfio-pci for virtual machines.

Warpdrive

[Diagram: layered stack. Labels: app, zip app, library, uacce (chrdev interface, sys interface, get/put_queue), crypto, qm, register, helper; accelerators: SM3/4, hpre, zip, SEC, POE.]

Warpdrive kernel

1. Current status
a. RFC v3 was sent by Kenneth, directly using the iommu interface: https://lkml.org/lkml/2018/11/12/1951
b. Supports no-iommu, non-sva capable, and sva-capable devices.
c. Jean-Philippe Brucker's sva v4 patch has been verified, using platform device stall mode.
i. https://lkml.org/lkml/2019/2/20/518
ii. git://linux-arm.org/linux-jpb.git sva/current
d. Supports zip/gzip, hpre.

2. Plan
a. Support in progress: SM3/4, SEC, POE

Warpdrive user

1. Current status
a. Supports zip/gzip, hpre.
b. Supports multi-queue.
c. Supports async mode and batch processing.

2. Plan
a. OpenSSL interface.
b. Provide patches for compatibility with zlib, switching to the built-in zlib if no hardware is found.
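
The planned zlib behaviour (use the hardware when present, otherwise fall back to the built-in software path) follows a common probe-then-fallback pattern. The sketch below illustrates only that pattern: `find_hw_compressor` is a hypothetical probe invented for this example, not a warpdrive function, and it is hard-wired to report that no accelerator is present.

```python
import zlib


def find_hw_compressor():
    """Hypothetical probe: return a compress(data) callable, or None if no accelerator."""
    return None  # assumption: no accelerator on the machine running this sketch


def compress(data: bytes) -> bytes:
    """Compress with the accelerator when one is found, else with software zlib."""
    hw = find_hw_compressor()
    if hw is not None:
        return hw(data)
    return zlib.compress(data)   # software fallback


payload = b"warpdrive " * 1000
packed = compress(payload)
restored = zlib.decompress(packed)
```

Keeping the choice inside one `compress` function means callers do not need to know whether the hardware path was taken, which is the point of a zlib-compatible interface.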

Challenge

1. SVA patches are still in review: https://lkml.org/lkml/2019/2/20/518
a. SVA only shares stage-1 page tables with the CPU; sharing stage-2 is not supported yet.
b. SVA supports the PF, but VFs are not yet considered.
c. Platform devices using the stall feature need quirks.

Performance

              cpu      no-iommu   iommu: non-sva   iommu: sva   iommu: sva(2q)
real(s)       1.91s    0.01s      0.01s            0.02s        0.01s
user(s)       1.91s    0.00s      0.00s            0.00s        0.00s
sys(s)        0.00s    0.01s      0.01s            0.01s        0.01s
speed(M/s)    12.565   2400       2400             1200         2400

cmd:
time gzip < data.super > del
time ./test_hisi_zip -g < data.super > dst

size: data.super 24M
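
The speed row appears to be the file size divided by the "real" time. The short check below recomputes it from the numbers on the slide; the variable names are illustrative, and the division is an assumption about how the figures were derived.

```python
SIZE_MB = 24  # size of data.super, from the slide

# "real" times in seconds, copied from the table
real_seconds = {
    "cpu": 1.91,
    "no-iommu": 0.01,
    "iommu: non-sva": 0.01,
    "iommu: sva": 0.02,
    "iommu: sva(2q)": 0.01,
}

# Throughput in M/s, rounded to three decimals like the cpu column
speed = {name: round(SIZE_MB / t, 3) for name, t in real_seconds.items()}

for name, value in speed.items():
    print(f"{name:16s} {value:10.3f} M/s")
```

Because `time` reports only two decimal places, the accelerator figures rest on a single 0.01 s tick, so the gap between 1200 and 2400 M/s is within the timer's resolution.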

Performance

Welcome to join

Kernel: https://github.com/Kenneth-Lee/linux-kernel-warpdrive.git
User: https://github.com/Kenneth-Lee/warpdrive.git

Demo

Offline demo

[email protected]