
Pré-analyse de la vidéo pour un codage adapté

Application au codage de la TVHD en flux H.264

Olivier Brouard

École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)

Spécialité : Automatique, Robotique, Traitement du Signal et Informatique Appliquée

20 juillet 2010

Encadrants : Dominique Barba et Vincent Ricordel

Pre-analysis of video for its advanced coding

Application to the HDTV coding in H.264 streams

Olivier Brouard

École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)

Specialty: Automatic Control, Robotics, Signal Processing and Applied Computer Science

July 20th, 2010

Supervisors: Dominique Barba and Vincent Ricordel

Motivations

Emergence of HDTV => new displays

• SDTV: 720x576 pixels
• HDTV: 1920x1080 pixels

10 April 2023 Olivier Brouard

Introduction

From SDTV to HDTV
• better immersion for the users: from 4% to 20% of the visual field
• more pixels (5x)

Need for a new video coding standard => H.264 (or MPEG-4 AVC)

Slide 3/47

H.264

Advanced video coder (asymmetric coding)

+ rich prediction modes
+ advanced entropy coding

=> higher bit-rate reduction (up to 50% compared with MPEG-2)

But: short-term decisions based on the « low-level » signal => no coding consistency

Reference frames

Slide 4/47


Introduction

Human as the final observer

Needs

Control the perceptual quality => avoid perceptible distortions
- blocking effects
- flickering effects

Ensure the temporal coherence of the coding of objects => the rendering of an object has to be temporally consistent

Slide 5/47

Objectives & proposals


Introduction

No such tools within the current encoders

Solution: perform a video pre-analysis before the encoding step => guide the encoder in its decisions

How? Medium/long-term decisions, « high-level » considerations

Slide 6/47


Outline

1. Video pre-analysis

1.1 Advanced motion estimation

1.2 Spatio-temporal segmentation

1.3 Visual attention modeling

2. Applications: H.264 video coding

2.1 GOP structure adaptation

2.2 Adaptive quantization


Slide 7/47


1- Video pre-analysis

Based on HVS properties => « high-level » information for the encoder

The Human Visual System (HVS)
• Luminance perception
• Color perception
• Contrast sensitivity
• Masking effects

Visual attention
• Bottom-up: guided by the saliency
• Top-down: guided by the tasks

Slide 8/47



Visual attention

Attributes guiding the deployment of visual attention [Wolfe 04]
• Contrast, Motion, Color, Orientation, …

Visual attention modeling [Itti 01; Le Meur 07; Marat 10], based on the Koch and Ullman model [Koch 85]

Perceptually important regions => most salient objects (physically and semantically)

Shapes of regions (saliency maps) => shape of objects [Milanese 1993]

Slide 9/47

moving objects attract our visual attention



Video pre-analysis

Slide 10/47


1- Video pre-analysis – Advanced motion estimation

Spatio-temporal tube (1)

Visual fixation time in the HVS: ~200 ms

Next generation of HDTV: 1920x1080 in progressive mode at 50 Hz
=> temporal segment of 9 frames: 180 ms [Péchard 2007]

Assumption: uniform motion
=> coherence of the motion over a perceptually significant duration
=> spatio-temporal tube
=> more homogeneous motion vector field

Slide 11/47


Spatio-temporal tube (2)

Implementation
• spatial down-sampling
• temporal down-sampling
- central frame => current frame
- 4 reference frames

The spatio-temporal tube minimizes the global error MSEG

MSEk (k = -4, -2, +2, +4) based on the 3 YUV components
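The tube search can be sketched as a block-matching loop over the four reference frames. This is a minimal illustration, not the thesis implementation: the formula for MSEG did not survive on the slide, so the sketch assumes MSEG is the plain sum of the four MSEk, assumes an exhaustive integer search, and uses hypothetical names (`tube_error`, `best_tube`). The motion vector `v` is expressed per step of two frames, so that the displacement in frame k is `(k // 2) * v`.

```python
import numpy as np

OFFSETS = (-4, -2, 2, 4)  # reference frames k around the central frame (values from the slide)

def tube_error(frames, center, y, x, size, v):
    """Assumed MSEG: sum over k of the MSE between the central block and the
    block displaced by (k // 2) * v in frame center + k (uniform motion)."""
    block = frames[center][y:y + size, x:x + size].astype(np.float64)
    total = 0.0
    for k in OFFSETS:
        step = k // 2
        yy, xx = y + step * v[0], x + step * v[1]
        ref = frames[center + k]
        h, w = ref.shape[:2]
        if yy < 0 or xx < 0 or yy + size > h or xx + size > w:
            return float("inf")  # the tube leaves the picture: reject this candidate
        cand = ref[yy:yy + size, xx:xx + size].astype(np.float64)
        total += np.mean((block - cand) ** 2)  # MSEk over the 3 colour components
    return total

def best_tube(frames, center, y, x, size=8, search=2):
    """Exhaustive search of the (dy, dx) vector that minimises the tube error."""
    candidates = [(dy, dx)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda v: tube_error(frames, center, y, x, size, v))
```

Because a single vector has to explain all four reference frames at once, isolated matching errors in one frame are less likely to win, which is one way to read the "more homogeneous motion vector field" claim.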

Slide 12/47



1- Video pre-analysis – Spatio-temporal segmentation

Global motion

Apparent motions are due to
- moving objects
- camera motion

=> motion segmentation based on the residual motion

Affine model
• a1, a2, a3, a4: deformation parameters
• tx, ty: translation parameters
• Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube)

Slide 13/47

Global motion parameters estimation

Motion vector field parameters estimation [Coudray 2005]

Global motion estimation in 2 steps:

1. For each MV (tube): calculation of the derivatives
• accumulation of the parameter hypotheses
• localization of the main peak

2. Accumulation of the residual MVs (tubes)
• 2-D histogram (tx, ty)
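The two-step estimation can be illustrated with a short sketch. The affine equations are not readable on the slide, so the usual six-parameter form is assumed here: Vx = tx + a1·x + a2·y and Vy = ty + a3·x + a4·y. Under that assumption the spatial derivatives of the MV field vote for a1..a4, and the deformation-compensated vectors vote for (tx, ty). The function names, bin counts and ranges are illustrative choices, not values from [Coudray 2005].

```python
import numpy as np

def histogram_peak(values, bins=41, value_range=(-1.0, 1.0)):
    """Centre of the most populated bin of a 1-D histogram (out-of-range votes are ignored)."""
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def estimate_global_motion(vx, vy, step=1.0, t_range=(-16.0, 16.0), t_bins=65):
    """vx, vy: 2-D arrays of MV components sampled on a regular grid of spacing `step`."""
    # Step 1: finite differences of the MV field vote for the deformation parameters.
    a1 = histogram_peak(np.diff(vx, axis=1) / step)  # dVx/dx
    a2 = histogram_peak(np.diff(vx, axis=0) / step)  # dVx/dy
    a3 = histogram_peak(np.diff(vy, axis=1) / step)  # dVy/dx
    a4 = histogram_peak(np.diff(vy, axis=0) / step)  # dVy/dy
    # Step 2: compensate the deformation and accumulate the residual translations.
    rows, cols = vx.shape
    y, x = np.mgrid[0:rows, 0:cols] * step
    rx = vx - a1 * x - a2 * y
    ry = vy - a3 * x - a4 * y
    hist, xe, ye = np.histogram2d(rx.ravel(), ry.ravel(), bins=t_bins,
                                  range=(t_range, t_range))
    i, j = np.unravel_index(int(np.argmax(hist)), hist.shape)
    tx = 0.5 * (xe[i] + xe[i + 1])
    ty = 0.5 * (ye[j] + ye[j + 1])
    return (a1, a2, a3, a4), (tx, ty), hist
```

The returned 2-D histogram is the accumulation space in which the main peak corresponds to the global (camera) motion; the remaining peaks are what the motion segmentation step analyses.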

Slide 14/47



Motion segmentation

2-D histogram of the translation parameters (tx, ty) of the residual MVs

Each histogram peak => a moving object => analysis of all the peaks

Iterative approach

1. Initialisation: detection of the main peak => greedy approach (local gradient)

2. Detection of the other peaks => greedy approach

[Figure: accumulation histogram and segmented space, with the main peak and a secondary peak]
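One possible reading of the iterative, greedy peak analysis is sketched below: take the highest unassigned bin as a new peak, grow its region downhill through neighbouring bins, then repeat. The growth rule (8-connected neighbours with a count no larger than the bin they are reached from) and the `min_votes` threshold are assumptions made for illustration; the slide only states that the approach is greedy and uses the local gradient.

```python
from collections import deque
import numpy as np

def segment_peaks(hist, min_votes=1):
    """Label the bins of a 2-D histogram: label 1 = main peak, 2, 3, ... = secondary peaks."""
    labels = np.zeros(hist.shape, dtype=int)  # 0 = unassigned
    peaks = []
    while True:
        # Highest bin that is still unassigned and has enough votes.
        free = np.where((labels == 0) & (hist >= min_votes), hist, -1)
        peak = tuple(int(v) for v in np.unravel_index(int(np.argmax(free)), hist.shape))
        if free[peak] < 0:
            break  # no candidate bin left
        peaks.append(peak)
        label = len(peaks)
        labels[peak] = label
        queue = deque([peak])
        while queue:  # greedy downhill growth from the peak
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < hist.shape[0] and 0 <= nj < hist.shape[1]
                            and labels[ni, nj] == 0
                            and min_votes <= hist[ni, nj] <= hist[i, j]):
                        labels[ni, nj] = label
                        queue.append((ni, nj))
    return peaks, labels
```

Each tube can then receive the label of the histogram bin its residual vector falls into, which gives the raw motion-based segmentation.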

Slide 15/47



Motion segmentation – results

=> need for spatial and temporal regularization

Slide 16/47



Video pre-analysis

Slide 17/47



Spatio-temporal regularization

Motion-based segmentation => some blocks are misclassified

More criteria to improve the segmentation
• connectivity
• color
• texture
• motion

=> Markovian approach

Slide 18/47



Markovian approach

The Hammersley-Clifford theorem [Besag 1974]: Gibbs distribution <=> Markov Random Field

=> the optimal label configuration minimizes a global energy function

• E: label field
• O: observation field

Slide 19/47

Markovian property => U(o, e): sum of potential functions defined on cliques

• site => spatio-temporal tube
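For reference, the standard form of this formulation can be written as follows. The slide's own equations are not recoverable, so this is the textbook statement rather than a transcription: the normalisation constant Z, the clique set \mathcal{C} and the potentials V_c are notation assumed here, and any temperature parameter is omitted.

```latex
P(E = e \mid O = o) = \frac{1}{Z} \exp\bigl(-U(o, e)\bigr),
\qquad
U(o, e) = \sum_{c \in \mathcal{C}} V_c(o, e),
\qquad
\hat{e} = \arg\max_{e} P(E = e \mid O = o) = \arg\min_{e} U(o, e)
```

Maximising the posterior probability of the label field is therefore equivalent to minimising the global energy U(o, e).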

Spatial regularization

Spatial connectivity
• segmented region => locally homogeneous

Color features
• color distributions (discrete densities) => Bhattacharyya coefficient

Texture features
• texture distributions from 2 spatial gradients (Sobel filters) => Bhattacharyya coefficient
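The Bhattacharyya coefficient between two discrete densities p and q is the sum over bins of sqrt(p_i · q_i): it equals 1 for identical distributions and 0 for distributions with no common bin. A minimal sketch follows; the histogram helper, its 16 bins and the 0..255 value range are illustrative assumptions, not parameters taken from the thesis.

```python
import numpy as np

def region_histogram(values, bins=16, value_range=(0, 256)):
    """Histogram of one feature (e.g. one colour channel) over the pixels of a region."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    return counts

def bhattacharyya_coefficient(p, q):
    """Similarity in [0, 1] between two histograms, normalised to discrete densities."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

The same function serves for the color and texture potentials: only the feature whose histogram is compared changes (color components in one case, gradient responses in the other).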

Slide 20/47



Temporal regularization

Motion features
• distance between the MVs

Regions tracking
• criteria: color, texture, overlap
=> video objects tracking

Temporal connectivity
• segmented region => temporally homogeneous
• segmentation map of the previous temporal segment

Slide 21/47



Energy minimization

The global energy function
- potential functions
- weighting factors

Sequential processing of the sites
- instability stack
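A sequential relaxation driven by a stack of unstable sites can be sketched as follows. This is a generic ICM-style scheme used as a stand-in: the thesis potentials and weighting factors are not given on the slide, so a per-site data cost plus a Potts penalty weighted by a hypothetical factor `beta` is assumed, on a 4-connected 2-D grid of sites.

```python
import numpy as np

def relax_labels(data_cost, labels, beta=1.0, max_visits=100_000):
    """data_cost[y, x, l]: cost of giving label l to site (y, x).
    Sites are processed one at a time; when a site changes label, its
    neighbours become potentially unstable and are pushed back on the stack."""
    labels = labels.copy()
    h, w, n_labels = data_cost.shape
    stack = [(y, x) for y in range(h) for x in range(w)]  # every site starts as unstable
    visits = 0
    while stack and visits < max_visits:
        y, x = stack.pop()
        visits += 1
        neighbours = [(y + dy, x + dx)
                      for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= y + dy < h and 0 <= x + dx < w]
        # Local energy of each candidate label: data term + Potts smoothness term.
        energies = [data_cost[y, x, l] + beta * sum(labels[n] != l for n in neighbours)
                    for l in range(n_labels)]
        best = int(np.argmin(energies))
        if best != labels[y, x]:
            labels[y, x] = best
            stack.extend(neighbours)
    return labels
```

Starting from the motion-only segmentation, an isolated misclassified site surrounded by a homogeneous region is relabelled as soon as the smoothness term outweighs its data term.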

Slide 22/47



Results

motion segmentation only

regularized spatio-temporal segmentation

Slide 23/47



Video pre-analysis

Slide 24/47


1- Video pre-analysis – Visual attention modeling

Spatial saliency

Spatial saliency based on the color contrast [Aziz 2008]

Color transformation: YUV to HSV

Color features influencing the visual attention
1- Saturation contrast
2- Intensity contrast
3- Hue contrast
4- Opponents contrast
5- Warm and cold colors contrast
6- Dominance of the warm colors
7- Dominance of the luminance and saturation

Spatial saliency SSP => combination of these 7 features

Slide 25/47



Temporal saliency

Temporal saliency based on the relative motion
• MV of the site s
• dominant motion
• relative motion of s

• maximum velocity of smooth pursuit of the eye [Daly 1998] => 80°/s

=> temporal saliency ST
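The formula for ST is lost on the slide, so the sketch below only illustrates the ingredients that are named: the relative motion (MV of the site minus the dominant motion) and the 80°/s pursuit limit. The mapping itself, relative speed divided by the limit and clipped to [0, 1], is an assumption, and `deg_per_pixel` is a hypothetical parameter that depends on the viewing distance and the display.

```python
import numpy as np

MAX_PURSUIT_DEG_PER_S = 80.0  # smooth-pursuit limit quoted on the slide [Daly 1998]

def temporal_saliency(mv, dominant, deg_per_pixel, frame_rate=50.0):
    """mv: (H, W, 2) motion vectors in pixels/frame; dominant: (2,) dominant motion.
    Returns an (H, W) map in [0, 1] (assumed mapping, see the note above)."""
    relative = np.asarray(mv, dtype=np.float64) - np.asarray(dominant, dtype=np.float64)
    # Relative speed converted from pixels/frame to degrees of visual angle per second.
    speed = np.linalg.norm(relative, axis=-1) * deg_per_pixel * frame_rate
    return np.clip(speed / MAX_PURSUIT_DEG_PER_S, 0.0, 1.0)
```

With this mapping, sites that follow the dominant (camera) motion get a saliency of 0, and sites moving faster than the eye can track saturate at 1.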

Slide 26/47



Spatio-temporal saliency

Fusion of the spatial saliency and temporal saliency maps

Observers => focus on the center of the screen [Le Meur 2005]
=> weighting by a 2-D Gaussian function
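The center weighting can be sketched as a multiplication by a 2-D Gaussian centred on the frame. The fusion rule is not given on the slide, so a plain average of the two maps is assumed here, and the width of the Gaussian (`sigma_ratio`, a fraction of the frame size) is a hypothetical parameter.

```python
import numpy as np

def centre_weighted_fusion(spatial, temporal, sigma_ratio=0.25):
    """spatial, temporal: (H, W) saliency maps in [0, 1]. Returns the fused, centre-weighted map."""
    h, w = spatial.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = sigma_ratio * h, sigma_ratio * w
    # 2-D Gaussian equal to 1 at the centre of the frame and decreasing towards the borders.
    gauss = np.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
    fused = 0.5 * (spatial + temporal)  # assumed fusion: simple average
    return fused * gauss
```

For two uniform input maps the output is the Gaussian itself, which makes the effect of the weighting easy to inspect.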

Slide 27/47



Results

Slide 28/47



Possible applications

Video pre-analysis => information
- moving objects segmentation, objects tracking
- color, texture
- salient regions

Applications
- advanced video coding
- video transmission with priority (saliency maps)
- video summarization, indexing
- …

ArchiPEG (ANR Project)
- HD MPEG-4 AVC real-time compression
- pre-analysis video resource

Slide 29/47


Outline



1. Video pre-analysis

1.1 Advanced motion estimation

1.2 Spatio-temporal segmentation

1.3 Visual attention modeling

2. Applications: H.264 video coding

2.1 GOP structure adaptation

2.2 Adaptive quantization


Slide 30/47


2- Applications: H.264 video coding – GOP structure adaptation

GOP structure

Three kinds of frames: I, P, B

• a GOP begins with an I frame => intra coded
• P frames at regular intervals => predicted
• B frames between P frames => bi-predicted

Fixed interval between I frames
• not adapted to changing scenes and temporal variations of the video => more bits
=> dynamic GOP size, irregular I-frame insertion

Typically: number of B frames = 1 or 2 => good trade-off between bitrate and quality
• low motion or panning of the camera => increase the number of B frames

Slide 31/47



B frames adaptation (1) Analysis of the video sequences

x264 encoder with different fixed numbers of B frames: 0, 1, 2, 3

Video sequence             Optimal GOP configuration
New Mobile and Calendar    2 B frames
Night                      2 B frames
Knightshields              2 B frames
Crew                       1 B frame
Park run                   1 B frame
Park joy                   no B frame
Tractor                    no B frame
Umbrella                   no B frame

Optimal number of B frames => content dependent

=> classify videos according to their content

Slide 32/47



B frames adaptation (2) Spatio-temporal characterization

2 indices to evaluate the spatio-temporal activity
- IT: temporal activity => MVs
- IS: spatial activity => MSEG

Computed for each temporal segment and for the entire sequence

Slide 33/47



B frames adaptation (3)

Classification space: function of IT and IS
• class Ci => i B frames between P-P or I-P frames
• IT constant between P-P or I-P frames, same rule for IS

Slide 34/47



GOP size adaptation (1)

Detection of changes within a video shot

• high motion => significant changes => reduce the interval
• low motion => little variation => increase the interval
• mid-range motion => classical approach => fixed GOP size

2 thresholds to detect critical changes
- sh => high motion
- sb => low motion
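The three-way decision above can be written as a small rule applied to the temporal activity index IT of each temporal segment. The default interval of 25 frames is the x264 reference setting quoted later in the slides; the shortened and lengthened intervals (12 and 50) and the threshold values used in the example call are hypothetical, since the slides do not give them.

```python
def adapt_gop_sizes(it_values, s_high, s_low, default=25, short=12, long=50):
    """Map each temporal-activity value IT to an I-frame interval.
    s_high and s_low play the roles of the thresholds sh and sb."""
    sizes = []
    for it in it_values:
        if it > s_high:      # high motion: significant changes, insert I frames sooner
            sizes.append(short)
        elif it < s_low:     # low motion: little variation, space I frames out
            sizes.append(long)
        else:                # mid-range motion: keep the classical fixed GOP size
            sizes.append(default)
    return sizes

# Example with made-up IT values and thresholds.
print(adapt_gop_sizes([0.9, 0.1, 0.5], s_high=0.7, s_low=0.2))
```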

Slide 35/47



GOP size adaptation (2)

Analysis of IT evolution: 3 cases
• mid-range motion
• high motion
• low motion

Slide 36/47



Performances

8 video sequences

4 different bitrates defined by an expert group

Comparison between
- x264 encoder: GOP size = 25, 2 B frames
- a modified version => GOP structure adaptation

Slide 37/47



Results Rate – Distortion (PSNR) [Bjontegaard 2001]

Slide 38/47

Video sequence             Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar     9.15               0.32
Night                       2.45               0.09
Knightshields               1.68               0.06
Park run                   -0.1               -0.01
Umbrella                    4.11               0.13
Park joy                    2.83               0.09
Crew                        4.5                0.13
Tractor                    10.94               0.48
Average                     4.45               0.16
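Assuming the gains in this table are Bjontegaard deltas, as the slide title suggests, the PSNR gain between two rate-distortion curves can be computed as sketched below: each curve is fitted with a cubic polynomial of PSNR against log10(bitrate), and the difference between the two fits is averaged over the common bitrate interval. The function name and the sample values in the test are illustrative; they are not data from the thesis.

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR difference (dB) of the test curve over the reference curve."""
    lr_ref = np.log10(np.asarray(rate_ref, dtype=np.float64))
    lr_test = np.log10(np.asarray(rate_test, dtype=np.float64))
    # Cubic fit of PSNR as a function of log-rate (exact for 4 rate points).
    p_ref = np.polyfit(lr_ref, np.asarray(psnr_ref, dtype=np.float64), 3)
    p_test = np.polyfit(lr_test, np.asarray(psnr_test, dtype=np.float64), 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(lr_ref.min(), lr_test.min())
    hi = min(lr_ref.max(), lr_test.max())
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (int_test - int_ref) / (hi - lo)
```

With four bitrates per sequence, as in the evaluation described above, the cubic passes exactly through the four measured points of each coder.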



Subjective tests

Setup
• display resolution: 1920x1080
• standardized room [BT.500-11]
• ~30 naïve observers
• 72 video sequences (72 = 8x4x2 + 8)

Methodology: ACR (Absolute Category Rating)
• for each sequence, observers have to assess the quality

Slide 39/47



Results
• QGOP: MOS of the modified coder
• Qx264: MOS of the x264 coder

• sequences with a high IT value (high motion) => GOP structure adaptation is well suited

Slide 40/47

Video sequence             MOS difference (QGOP - Qx264)
New Mobile and Calendar     0.31
Night                      -0.02
Knightshields               0.24
Park run                    0.04
Umbrella                   -0.09
Park joy                    0.14
Crew                        0.48
Tractor                     0.33
Average                     0.18

2- Applications: H.264 video coding – Adaptive quantization

Adaptive quantization

Objective: control the distribution of the bit resources
• saliency maps => increase the perceived visual quality

Modification of the saliency maps
• quantization and morphological filtering

Modification of the coder
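How a saliency map could drive the quantizer is sketched below, under loudly stated assumptions: the map holds one value in [0, 1] per macroblock, it is quantized into a few levels, cleaned with a 3x3 morphological dilation, and mapped linearly to a QP offset (negative offset, i.e. finer quantization, for the most salient blocks). The number of levels, the dilation, the linear law and the offset range are illustrative choices; the slides do not specify the actual modification of the coder.

```python
import numpy as np

def dilate3x3(levels):
    """3x3 morphological dilation (maximum filter) with edge replication."""
    h, w = levels.shape
    padded = np.pad(levels, 1, mode="edge")
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)

def qp_offsets(saliency, n_levels=4, max_offset=4):
    """saliency: (rows, cols) map in [0, 1], one value per macroblock.
    Returns an integer QP offset per macroblock in [-max_offset, +max_offset]."""
    s = np.clip(saliency, 0.0, 1.0)
    # Quantization of the saliency map into n_levels classes (0 = least salient).
    levels = np.minimum((s * n_levels).astype(int), n_levels - 1)
    # Morphological filtering: extend salient regions to avoid isolated blocks.
    levels = dilate3x3(levels)
    # Assumed linear law: top level gets -max_offset, bottom level gets +max_offset.
    return np.rint(max_offset - 2.0 * max_offset * levels / (n_levels - 1)).astype(int)
```

Signalling a different QP per macroblock costs bits, which is one possible reason for the overhead questioned in the results that follow.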

Slide 41/47



Results (1) Rate – Distortion (PSNR) [Bjontegaard 2001]

Slide 42/47

                           Entire sequence                      Region of interest
Video sequence             Bitrate gain (%)   PSNR gain (dB)    Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar    -2.49              -0.09             -0.67              -0.03
Night                      -3.38              -0.12             -0.39              -0.02
Knightshields              -3.02              -0.12             -0.84              -0.03
Park run                   -0.81              -0.03              0.25               0.01
Umbrella                    2.34               0.07              4.17               0.14
Park joy                    2.68               0.09              4.42               0.14
Crew                       -0.36              -0.01              2.74               0.09
Tractor                    10.94               0.05              4.35               0.20
Average                    -0.52              -0.02              1.75               0.06



Subjective assessments Results

• QQA: MOS of the modified coder (adaptive quantization)
• Qx264: MOS of the x264 coder

Video sequence             MOS difference (QQA - Qx264)
New Mobile and Calendar    -0.13
Night                       0.09
Knightshields              -0.02
Park run                   -0.06
Umbrella                    0.06
Park joy                    0.17
Crew                       -0.06
Tractor                     0.04
Average                     0.04

• no specific content proves suitable => unsuitable for coding and broadcasting of HDTV at high bitrate
• overhead? linear law?

Slide 43/47


Conclusion

Conclusion (1)

Video pre-analysis
• visual attention modeling => saliency maps
• spatio-temporal segmentation => detection of moving objects, objects tracking

Applications
• advanced video coding
• video transmission with priority based on the saliency maps [Boulos 2010]
• video summarization, indexing
• …

Slide 44/47


Conclusion

Conclusion (2)

Applications of the video pre-analysis

• GOP structure adaptation
- dynamic variation of the number of B frames => temporal segment classification: IT and IS
- GOP size adaptation => I frame insertion, change detection: IT

• Adaptive quantization based on the saliency maps

Slide 45/47


Conclusion

Conclusion (3)

Subjective quality assessment tests

• GOP structure adaptation
- no significant differences: +0.18 (on a scale of 1 to 5)
- well suited for sequences with high motion

• Adaptive quantization
- no clear content suitability
- seems unsuitable for coding and broadcasting of HDTV at high bitrate
- … adaptation law could be modified …

Slide 46/47

Conclusion

Perspectives

Slide 47/47

Better performance evaluation of our visual attention model
=> eye-tracking experiments

Psychophysical experiments to optimize the model parameters
=> improve the fusion process [Marat 2010]

Add high-level visual information: face, flesh hue, …

Thank you.

Questions ?

Slide 48