Efficient Prediction Structure for Multi-view Video Coding

Efficient Prediction Structure for Multi-view Video Coding. Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand. CSVT 2007


Transcript of Efficient Prediction Structure for Multi-view Video Coding

Page 1: Efficient Prediction Structure for Multi-view Video Coding

Efficient Prediction Structure for Multi-view Video Coding

Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand

CSVT 2007

Page 2: Efficient Prediction Structure for Multi-view Video Coding

Outline

Multi-view video coding (MVC) introduction

Requirements and test conditions for MVC

Prediction structures

Experimental results

Conclusion

Page 3: Efficient Prediction Structure for Multi-view Video Coding

MVC Introduction

MVC: Multi-view Video Coding

Multi-view video (MVV): video captured by a system that uses multiple camera views of the same scene.

Usage: 3DTV, free viewpoint video (FVV), etc.

Page 4: Efficient Prediction Structure for Multi-view Video Coding

Requirements for MVC

Temporal random access

View random access

Scalability

Backward compatibility

Quality consistency

Parallel processing

Page 5: Efficient Prediction Structure for Multi-view Video Coding

Temporal and inter-view correlation

[Figure: reference picture candidates for a picture at time T, labeled temporal, inter-view, and temporal/inter-view mixed mode]

Page 6: Efficient Prediction Structure for Multi-view Video Coding

Temporal and inter-view correlation analysis

H.264/AVC encoder was used with the following settings:

Motion compensation block size of 16×16

Search range of ±32 pixels

Lagrange parameter (λ) of 29.5

ΔJ denotes the decrease of the average Lagrangian cost J in comparison to temporal prediction only.

Page 7: Efficient Prediction Structure for Multi-view Video Coding

Temporal and inter-view correlation analysis (cont’d)

Simply including temporal and inter-view prediction modes

Page 8: Efficient Prediction Structure for Multi-view Video Coding

Lagrangian cost function

Lagrangian cost function:

$$ J = D + \lambda R \qquad (1) $$

D denotes distortion.

R denotes the number of bits to transmit all components of the motion vector.

For each block $S_i$ in a picture, the algorithm chooses the MV $m_i$ within a search range $M$ that minimizes $J$:

$$ m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \qquad (2) $$

The distortion in the subject macroblock B is calculated by:

$$ D(S_i, m) = \sum_{(x,y) \in B} \left| s(x, y, t) - s(x - m_x, y - m_y, t - m_t) \right|^2 \qquad (3) $$
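The search that minimizes the Lagrangian cost (distortion plus λ times rate) can be illustrated with a minimal sketch. It is not the encoder used in the slides: it searches a single reference picture only (no choice of temporal or inter-view reference), and the rate model `mv_bits` is a hypothetical stand-in that charges a signed Exp-Golomb code length per motion vector component. The function names and the toy pictures are assumptions made for illustration.

```python
def ssd(cur, ref, bx, by, size, mx, my):
    """Sum of squared differences between the block at (bx, by) in `cur`
    and the block displaced by (mx, my) in `ref`."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            diff = cur[y][x] - ref[y - my][x - mx]
            total += diff * diff
    return total


def se_bits(v):
    """Length in bits of a signed Exp-Golomb code word for integer v."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def mv_bits(mx, my):
    """Hypothetical rate model: bits for both motion vector components."""
    return se_bits(mx) + se_bits(my)


def best_mv(cur, ref, bx, by, size, search, lam):
    """Full search over [-search, search]^2; returns (cost, mx, my)
    with the smallest cost = SSD + lam * mv_bits."""
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            x0, y0 = bx - mx, by - my
            # Skip candidates whose reference block leaves the picture.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = ssd(cur, ref, bx, by, size, mx, my) + lam * mv_bits(mx, my)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best


# Toy data: `cur` equals `ref` displaced by 2 pixels in x and 1 pixel in y.
ref = [[100 * (x + 20 * y) for x in range(12)] for y in range(12)]
cur = [[100 * ((x - 2) + 20 * (y - 1)) for x in range(12)] for y in range(12)]

print(best_mv(cur, ref, 4, 4, 4, 2, 29.5))  # (236.0, 2, 1)
```

With λ = 29.5 as on the settings slide, the true displacement wins because its distortion is zero and only the 8 bits of the assumed rate model contribute to the cost.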

Page 9: Efficient Prediction Structure for Multi-view Video Coding

Test data and test conditions

1D camera arrangement: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arched)

2D camera arrangement: Flamenco2 (cross), AkkoKayo (array)

Use 5 to 16 camera views

Target high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.

Page 10: Efficient Prediction Structure for Multi-view Video Coding

Knowledge – hierarchical B pictures, QP cascading

Hierarchical B pictures, key pictures, non-key pictures:

[Figure: hierarchical B picture structure between two key pictures]

QP cascading [1]:

$$ QP_k = QP_{k-1} + (k = 1 \;?\; 4 : 1) $$

[1] “Analysis of hierarchical B pictures and MCTF”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006
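As a small worked example of the QP cascading rule (add 4 when going from the key pictures to hierarchy level 1, then add 1 per further level), the sketch below lists the QP per level. The function name and the key-picture QP of 26 are assumptions chosen for illustration, not values from the slides.

```python
def cascaded_qps(qp_key, levels):
    """Return [QP_0, QP_1, ..., QP_levels] where QP_0 is the key-picture QP
    and QP_k = QP_{k-1} + (4 if k == 1 else 1)."""
    qps = [qp_key]
    for k in range(1, levels + 1):
        qps.append(qps[-1] + (4 if k == 1 else 1))
    return qps


print(cascaded_qps(26, 3))  # [26, 30, 31, 32]
```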

Page 11: Efficient Prediction Structure for Multi-view Video Coding

Knowledge – DPB size

Decoded Picture Buffer (DPB) size is increased to [2]:

$$ 2 \cdot \text{GOP\_length} + \text{number\_of\_views} $$

Memory-efficient reordering of multi-view input for compression

[2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006
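The DPB size rule (twice the GOP length plus the number of views) is easy to check numerically. The sketch below is only an arithmetic illustration; the GOP length of 12 and the 8 views are assumed example values, not figures taken from the slides.

```python
def dpb_size(gop_length, number_of_views):
    """DPB size in pictures: 2 * GOP_length + number_of_views."""
    return 2 * gop_length + number_of_views


print(dpb_size(12, 8))  # 32
```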

Page 12: Efficient Prediction Structure for Multi-view Video Coding

Two tasks

1. To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets.

2. To adapt the prediction structures to the random access specification.

Page 13: Efficient Prediction Structure for Multi-view Video Coding

Prediction structure

Simulcast coding structure

To allow synchronization and random access, all key pictures are coded in intra mode.

Page 14: Efficient Prediction Structure for Multi-view Video Coding

Prediction structure (cont’d)

The first view, S0, is called the base view; its key pictures remain I frames.

Page 15: Efficient Prediction Structure for Multi-view Video Coding

Prediction structure (cont’d)

Alternative inter-view prediction structures for key pictures

[Figure: KS_IPP, KS_PIP, and KS_IBP structures for a linear camera arrangement and a 2D camera array]
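To make the three scheme names concrete for a linear camera arrangement, the sketch below assigns a key-picture type to each view. The mapping is an assumption read off the names only: KS_IPP starts with an I picture followed by P pictures, KS_PIP places the I picture at the middle view, and KS_IBP alternates B and P after the I picture, with a trailing B turned into P because it would lack a second neighboring view. The figure on the slide, not this sketch, is the authoritative definition, and the function name is hypothetical.

```python
def key_picture_types(scheme, num_views):
    """Key-picture type per view for a linear camera arrangement
    (illustrative reading of the scheme names)."""
    if scheme == "KS_IPP":
        return ["I"] + ["P"] * (num_views - 1)
    if scheme == "KS_PIP":
        mid = num_views // 2
        return ["P"] * mid + ["I"] + ["P"] * (num_views - mid - 1)
    if scheme == "KS_IBP":
        types = ["I"]
        for view in range(1, num_views):
            types.append("B" if view % 2 == 1 else "P")
        # A B picture in the last view has no view on its far side.
        if num_views > 1 and types[-1] == "B":
            types[-1] = "P"
        return types
    raise ValueError("unknown scheme: " + scheme)


for scheme in ("KS_IPP", "KS_PIP", "KS_IBP"):
    print(scheme, " ".join(key_picture_types(scheme, 5)))
```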

Page 16: Efficient Prediction Structure for Multi-view Video Coding

Prediction structure (cont’d)

Inter-view prediction for key and non-key pictures

AS_IPP mode

Page 17: Efficient Prediction Structure for Multi-view Video Coding

Experimental results – objective evaluation


Ballroom test result

Average coding gains compared with anchor coding

Page 18: Efficient Prediction Structure for Multi-view Video Coding

Experimental results – subjective evaluation

Different bit-rates were selected for the different data sets.

Ballroom test result

Race1 test result

Page 19: Efficient Prediction Structure for Multi-view Video Coding

Experimental results – subjective evaluation

AS_IBP outperforms the anchors significantly.

The gain decreases slightly with higher bit-rates.

Average results over all test sequences

Page 20: Efficient Prediction Structure for Multi-view Video Coding

Influence of camera density

Uses the Rena sequence, consisting of 16 linearly arranged cameras with a 5 cm distance between adjacent cameras.

Repeated for each shifted set of 9 adjacent cameras.

The structures are applied to every time instance of the MVV sequence without temporal prediction.
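The shifted sets of 9 adjacent cameras out of 16 can be enumerated as sliding windows. This is a minimal sketch of that bookkeeping only; the zero-based camera indices and the function name are assumptions.

```python
def adjacent_camera_sets(num_cameras, set_size):
    """All windows of `set_size` adjacent cameras, shifted by one camera."""
    return [list(range(start, start + set_size))
            for start in range(num_cameras - set_size + 1)]


sets = adjacent_camera_sets(16, 9)
print(len(sets))   # 8
print(sets[0])     # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(sets[-1])    # [7, 8, 9, 10, 11, 12, 13, 14, 15]

# Each set spans 8 camera gaps of 5 cm, i.e. 40 cm end to end.
print((9 - 1) * 5)  # 40
```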

Page 21: Efficient Prediction Structure for Multi-view Video Coding

Results of experiments on camera density

Coding gain increases with decreasing camera distance and decreasing reconstruction quality.


Page 22: Efficient Prediction Structure for Multi-view Video Coding

Results of experiments on camera density (cont’d)

Average per-camera rate relative to the one-camera case (→)

A larger QP value leads to a larger coding gain


Page 23: Efficient Prediction Structure for Multi-view Video Coding

Conclusion

Resulting multi-view prediction: achieves significant coding gains and is highly flexible.

Parallel processing is supported by the presented sequential processing approach.

Problems:

Large disparities between the different views of multi-view video sequences

Illumination and color inconsistencies across views