HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.


Transcript of HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Page 1: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

HW-Accelerated HD video playback under Linux

Zou Nan hai
Open Source Technology Center

Page 2: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Media Engine

[Figure: block diagram of the media engine. Labeled blocks: 3D, Media (Video Front End), Command Streamer, URB, Thread Spawner, Thread Dispatcher, EU Kernel, Data port, Sampler, Video memory. Labeled data paths: indirect data, thread payload.]

Page 3: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Mode of operation

[Figure: decode pipeline from coded data to output pixel through the stages VLD, IS, IQ, IDCT and MC. The stages are grouped under two labels: "VFE or host" and "EU Kernels".]

Page 4: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Current XvMC implementation

[Figure: decode pipeline from coded data to output pixel through the stages VLD, IS, IQ, IDCT and MC. Labels: Host Software, per slice data, per macroblock data, EU Kernels.]

Page 5: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

XvMC

[Figure: software stack. Components: Media Application, XvMC lib, DRI interface, X Server, Graphic Hardware. Labeled paths: mpeg stream; decode slice of macro blocks; render, sync, resource management; media commands, video memory management.]

Page 6: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Video Memory Layout

[Figure: layout of video memory. Labeled items: command stream, media pointer command, media object command, flush command, VFE state, interface descriptors, selected interface, EU kernel instruction, binding tables, surface state (three entries), media surface (three entries).]

Page 7: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Execute Unit introduction

SIMD code (variable execute size up to 16) with predication and control mask.
Float and integer data types.
Region-based direct and indirect register addressing.
Supports scalar and immediate source operands.

Page 8: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

EU Registers

GRF (General Register File)
– 256 bits per register (g0, g1, g2, gxx)

MRF (Message Register File)
– 256 bits per register (m0, m1, m2, mx), write only
– Used to pass payload from thread to shared function unit

ARF (Architecture Register File)
– e.g. null, ip and flag register

Immediate
– encoded in instruction
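As a small worked example of the 256-bit register width above, the sketch below counts how many elements of each data type fit in one register. The type suffixes (ub, w, f) are the ones that appear in the instruction samples in this deck; their sizes (8, 16 and 32 bits) and the helper name are assumptions made for illustration.

```python
REGISTER_BITS = 256  # from the slide: GRF and MRF registers are 256 bits wide

# Assumed element sizes in bits for the type suffixes used in this deck.
TYPE_BITS = {"ub": 8, "w": 16, "f": 32}


def elements_per_register(type_suffix):
    """Return how many elements of the given type fit in one 256-bit register."""
    return REGISTER_BITS // TYPE_BITS[type_suffix.lower()]


if __name__ == "__main__":
    for suffix in ("ub", "w", "f"):
        print(suffix, elements_per_register(suffix))
```

Under these assumptions a 16-wide float operand needs two consecutive registers, while a 16-wide word operand fits in one.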

Page 9: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Register Region

Regnum.Subregnum<VertStride,Width,HorzStride>Type

Example: g5.2<16,8,2>w
– origin regnum=5, subregnum=2
– VertStride=16
– Width=8
– HorzStride=2
– Type=w

Example: g15.3<16,16,1>UB

[Figure: the elements selected by each region, drawn on numbered element grids of 256-bit registers (label: g0 (256 bits)).]
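The region notation can be sketched in a few lines of code. The function below lists which elements a region selects, as offsets from the start of the origin register. It assumes that the subregister number and both strides are counted in units of the data type, and that offsets past the end of a register simply continue into the next one; the function name is hypothetical.

```python
def region_offsets(subreg, vert_stride, width, horz_stride, exec_size):
    """List element offsets (in units of the data type) selected by a region.

    Elements are laid out in rows of `width` elements. Within a row,
    consecutive elements are `horz_stride` apart; consecutive rows start
    `vert_stride` apart. Offsets are relative to element 0 of the origin
    register (assumption: subreg is also counted in elements).
    """
    offsets = []
    for i in range(exec_size):
        row, col = divmod(i, width)
        offsets.append(subreg + row * vert_stride + col * horz_stride)
    return offsets


if __name__ == "__main__":
    # g5.2<16,8,2>w with an execute size of 16: two rows of 8 words each.
    print(region_offsets(2, 16, 8, 2, 16))
    # g15.3<16,16,1>UB with an execute size of 16: one row of 16 bytes.
    print(region_offsets(3, 16, 16, 1, 16))
```

For g5.2<16,8,2>w the first row starts at word 2 and takes every second word, and the second row starts 16 words after the first.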

Page 10: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Data operation

[Figure: two layouts of vector data in registers 0 to 3.
Array of structure (vertex shader): each register holds one vector, W Z Y X.
Structure of array (pixel shader and media code): each register holds one component of several vectors: X X X X, Y Y Y Y, Z Z Z Z, W W W W.]
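The difference between the two layouts can be illustrated with plain lists. This is only a sketch of the data arrangement, not of any hardware instruction; the function name and the sample values are made up for the example.

```python
def aos_to_soa(vectors):
    """Convert an array of (x, y, z, w) vectors into one list per component.

    Array of structure: one register per vector, holding x, y, z, w.
    Structure of array: one register per component, holding that
    component for every vector.
    """
    xs, ys, zs, ws = (list(component) for component in zip(*vectors))
    return {"x": xs, "y": ys, "z": zs, "w": ws}


if __name__ == "__main__":
    aos = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]
    soa = aos_to_soa(aos)
    for name in "xyzw":
        print(name, soa[name])
```

In the structure-of-array form a single SIMD operation works on the same component of many pixels at once, which is the layout the slide associates with pixel shaders and media code.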

Page 11: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Instruction sample

(f0) add.sat(16) g28.0<2>ub g3.0<16, 16, 1>f g10.0<16, 16, 1>w {align1}

– (f0): predicate register
– (16): execute size
– g28, g3, g10: register number
– .0: subregister number
– <16, 16, 1>: VertStride, Width, HorizStride
– ub, f, w: type
– {align1}: access mode

Page 12: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Instruction set

Normal SIMD instructions
– add, mul, avg, mov etc.
– dp3, dp4 etc.

Branch control instructions
– if, else, do, while, jmpi etc.
– branch is needed in media code

Send instructions
– communicate with shared function units
– media kernels use it to control thread life cycle and to read and write surfaces

Page 13: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Instruction example

add.sat(16) g28.0<2>UB g3.0<16, 16, 1>f g10.0<16, 16, 1>W {align1}

[Figure: the 16 float elements X of the first source span g3 and g4, the 16 word elements Y of the second source fill g10, and each of the 16 sums is written as a byte Z into every second byte of g28.]
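The effect of this instruction can be approximated in software. The sketch below adds 16 float elements to 16 word elements, saturates each sum to the unsigned byte range, and writes it to every second byte of the destination. The float-to-integer conversion used here (Python's `round`) is an assumption and may not match the hardware rounding mode; the optional predicate argument models the (f0) prefix of the earlier sample in a simplified way.

```python
def add_sat_ub(src0, src1, dst, dst_stride=2, predicate=None):
    """Software model of a saturating add with an unsigned byte destination.

    src0, src1 : equal-length sequences of numbers (one per SIMD channel)
    dst        : bytearray standing in for the destination register
    dst_stride : distance in bytes between written elements (<2> on the slide)
    predicate  : optional per-channel booleans; disabled channels are skipped
    """
    for i, (a, b) in enumerate(zip(src0, src1)):
        if predicate is not None and not predicate[i]:
            continue  # channel disabled, destination byte left unchanged
        total = int(round(a + b))  # assumed conversion, see lead-in
        dst[i * dst_stride] = max(0, min(255, total))  # .sat clamps to 0..255
    return dst


if __name__ == "__main__":
    g3_g4 = [100.0] * 16      # 16 floats, two registers wide
    g3_g4[0] = 250.0
    g10 = [10] * 16           # 16 words, one register wide
    g10[1] = -200
    g28 = bytearray(32)       # one 256-bit register
    add_sat_ub(g3_g4, g10, g28)
    print(list(g28[:8]))
```

The first channel overflows and is clamped to 255, the second underflows and is clamped to 0, and the odd bytes of the destination are never touched.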

Page 14: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

An example: input and output

[Figure: register layout of a media kernel.
Register groups: payload register passed from inline data (x, y, mv, field flags etc.); input Y0-Y3; input U; input V; reference Y; reference U; reference V; tmp registers; result registers, organized in YUV420 format.
Data sources and sinks: indirect data payload; media read from reference surface; media write to destination surface; constant data.]

Page 15: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Planar data vs Packed data

Easy to handle by media kernel
Hard to apply some filters
Cannot be directly used as a sampler source in hardware implementation
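To make the two layouts concrete, here is a minimal sketch that repacks planar YUV 4:2:0 data into a packed 4:2:2 stream in Y0 U Y1 V byte order. It reads the bullets above as describing planar output, which is an interpretation rather than something the slide states. The function name is hypothetical, chroma rows are simply repeated for both luma rows they cover, and width and height are assumed to be even.

```python
def planar420_to_yuyv(y_plane, u_plane, v_plane, width, height):
    """Repack planar YUV 4:2:0 into packed Y0 U Y1 V bytes.

    y_plane has width*height samples; u_plane and v_plane each have
    (width//2)*(height//2) samples. Each chroma sample is reused for the
    2x2 block of luma samples it covers (nearest-neighbour upsampling).
    """
    out = bytearray()
    chroma_width = width // 2
    for row in range(height):
        for col in range(0, width, 2):
            c = (row // 2) * chroma_width + col // 2
            out += bytes((
                y_plane[row * width + col],
                u_plane[c],
                y_plane[row * width + col + 1],
                v_plane[c],
            ))
    return bytes(out)


if __name__ == "__main__":
    packed = planar420_to_yuyv(bytes([1, 2, 3, 4]), bytes([50]), bytes([60]), 2, 2)
    print(list(packed))
```

In the planar form each component sits in its own contiguous block, while the packed form interleaves luma and chroma in a single surface.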

Page 16: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Work flow

[Figure: decoding work flow for I, P and B pictures. Labels: kernel (three instances), slice of macroblocks, DCT Data, indirect data, inline data, forward reference frame, backward reference frame, Media read message, Media write message, Destination surface.]
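The arithmetic at the heart of this flow can be sketched as follows: each output pixel is a prediction taken from the reference frames plus the residual that comes out of the IDCT, clamped to the 8-bit range. For bidirectional prediction the two references are averaged with rounding, as in MPEG-2. The sketch ignores motion vectors, half-pel interpolation and field modes, and the function names are hypothetical.

```python
def clamp_pixel(value):
    """Clamp an integer to the 8-bit pixel range."""
    return max(0, min(255, value))


def reconstruct(residual, forward=None, backward=None):
    """Combine IDCT residuals with reference pixels for one block.

    residual : signed IDCT output, one value per pixel
    forward  : pixels fetched from the forward reference frame, or None
    backward : pixels fetched from the backward reference frame, or None
    With no reference (intra block) the residual alone forms the pixel.
    """
    out = []
    for i, r in enumerate(residual):
        if forward is not None and backward is not None:
            prediction = (forward[i] + backward[i] + 1) >> 1  # rounded average
        elif forward is not None:
            prediction = forward[i]
        elif backward is not None:
            prediction = backward[i]
        else:
            prediction = 0
        out.append(clamp_pixel(prediction + r))
    return out


if __name__ == "__main__":
    print(reconstruct([5, -10, 300]))                          # intra
    print(reconstruct([1, 1], forward=[100, 254]))             # P-style
    print(reconstruct([0, 0], forward=[10, 11], backward=[20, 20]))  # B-style
```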

Page 17: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

About XvMC API

Post processing missing in XvMC API design
Video output mixer.

Page 18: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

High Level Language

Why is a high level language for media kernels preferred?
– Easy to debug
– Easy to reuse code
– Hides platform details, easy to understand and maintain

Possible choice
– GLSL is not OK
– Simple C extension?

Page 19: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

H.264

Kernels become much more complex because of the different MC and DCT size combinations.
Not suitable for a slice level API, because of intra prediction.
Needs scheduling and dependency control for media threads, because of intra prediction.
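One common way to express such dependencies is a wavefront schedule, sketched below. It is not taken from the slides and is not necessarily what the author had in mind. It assumes that a macroblock may need its left, top-left, top and top-right neighbours, so macroblock (x, y) is placed in wave x + 2*y and every neighbour it needs falls into an earlier wave. The function name is hypothetical.

```python
def wavefront_order(mb_width, mb_height):
    """Group macroblocks into waves that can run in parallel.

    Macroblock (x, y) goes into wave x + 2*y. Its left neighbour is in
    wave -1, top-right in -1, top in -2 and top-left in -3 relative to
    it, so all of them are finished before the macroblock starts.
    """
    waves = {}
    for y in range(mb_height):
        for x in range(mb_width):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[index] for index in sorted(waves)]


if __name__ == "__main__":
    for number, wave in enumerate(wavefront_order(4, 3)):
        print(number, wave)
```

All macroblocks within one wave are independent of each other, so their threads could be dispatched together once the previous wave has completed.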

Page 20: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

VAAPI

Picture level API.
Covers mpeg2, h264 and vc1 from different entry points.
Post processing and video output mixer are missing.

Page 21: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

TODO

IDCT code optimization
Mpeg2 XvMC VLD extension
VAAPI for mpeg2
VAAPI for AVC
Video post processing and mixer

Page 22: HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.

Q&A

Thank You!