Pre-analysis of video for its advanced coding
Application to HDTV coding in H.264 streams
(Original French title: Pré-analyse de la vidéo pour un codage adapté. Application au codage de la TVHD en flux H.264)
Olivier Brouard
École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)
Specialty: Automatic Control, Robotics, Signal Processing and Applied Computer Science
July 20th, 2010
Supervisors: Dominique Barba and Vincent Ricordel
Motivations
Emergence of HDTV: new displays
• SDTV: 720x576 pixels
• HDTV: 1920x1080 pixels
10 April 2023 Olivier Brouard
Introduction
From SDTV to HDTV
• more pixels (5x)
• the picture covers 20% of the visual field instead of 4% => better immersion for the users
Need for a new video coding standard: H.264 (or MPEG-4 AVC)
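The "5x" figure can be checked directly from the two resolutions quoted on the slide. A minimal Python check (the variable names are illustrative only):

```python
# Pixel counts for the two formats quoted on the slide.
sdtv_pixels = 720 * 576      # SDTV frame
hdtv_pixels = 1920 * 1080    # HDTV frame

# Ratio of HDTV to SDTV pixel counts.
ratio = hdtv_pixels / sdtv_pixels
print(ratio)  # 5.0
```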
H.264
Advanced video coder (asymmetrical coding)
• rich set of prediction modes
• advanced entropy coding
• reference frames
=> higher bit rate reduction (up to 50% compared with MPEG-2)
But: short-term decisions based on the "low-level" signal => no coding consistency
Human as the final observer
Needs
• Control the perceptual quality => avoid the perceptible distortions
- blocking effects
- flickering effects
• Ensure the temporal coherence of the coding of the objects => the rendering of an object has to be temporally consistent
Objectives & proposals
How? Medium/long-term decisions based on "high-level" considerations
• no such tools within the current encoders
Solution
• perform a video pre-analysis before the encoding step
• guide the encoder in its decisions
Outline
1. Video pre-analysis
1.1 Advanced motion estimation
1.2 Spatio-temporal segmentation
1.3 Visual attention modeling
2. Applications: H.264 video coding
2.1 GOP structure adaptation
2.2 Adaptive quantization
1- Video pre-analysis
Video pre-analysis
Based on HVS properties => "high-level" information for the encoder
The Human Visual System (HVS)
• Luminance perception
• Color perception
• Contrast sensitivity
• Masking effects
Visual attention
• Bottom-up: guided by the saliency
• Top-down: guided by the tasks
Visual attention
Attributes guiding the deployment of visual attention [Wolfe 04]
• Contrast, Motion, Color, Orientation, …
• moving objects attract our visual attention
Visual attention modeling [Itti 01; Le Meur 07; Marat 10], based on the Koch and Ullman model [Koch 85]
Perceptually important regions => most salient objects (physically and semantically)
Shapes of regions (saliency maps) => shape of objects [Milanese 1993]
1- Video pre-analysis – Advanced motion estimation
Spatio-temporal tube (1)
• Visual fixation time in the HVS: ~200 ms
• Next generation of HDTV: 1920x1080 in progressive mode at 50 Hz
=> temporal segment of 9 frames: 180 ms [Péchard 2007]
Assumption: uniform motion
• coherence of the motion along a perceptually significant duration => spatio-temporal tube
• more homogeneous motion vector field
Spatio-temporal tube (2)
Implementation
• spatial down-sampling
• temporal down-sampling
- central frame: current frame
- 4 reference frames
The spatio-temporal tube minimizes a global error MSEG, obtained from the errors MSEk with k = -4, -2, +2, +4
• each MSEk is based on the 3 YUV components
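The tube search described above can be sketched as a block-matching loop in which one motion vector is shared by the whole temporal segment. This is only an illustration under stated assumptions: the global error is taken here as the plain sum of the four MSEk, the search is exhaustive, the down-sampling steps are omitted, and the function name `tube_motion_search` and its parameters are hypothetical.

```python
import numpy as np

def tube_motion_search(frames, center, y, x, block=8, search=2,
                       offsets=(-4, -2, 2, 4)):
    """Find one per-frame displacement (dy, dx) shared by a whole tube.

    frames: array of shape (T, H, W, 3) holding YUV frames.
    center: index of the current (central) frame.
    Uniform motion is assumed: the block moves by k * (dy, dx) in the
    reference frame located k frames away from the central frame.
    """
    height, width = frames.shape[1:3]
    current = frames[center, y:y + block, x:x + block].astype(np.float64)
    best_mv, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost, valid = 0.0, True
            for k in offsets:
                yy, xx = y + k * dy, x + k * dx
                if yy < 0 or xx < 0 or yy + block > height or xx + block > width:
                    valid = False  # candidate tube leaves the picture
                    break
                ref = frames[center + k, yy:yy + block, xx:xx + block]
                # MSE_k over the three YUV components (assumed unweighted).
                cost += np.mean((current - ref.astype(np.float64)) ** 2)
            if valid and cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

Summing the errors over the four reference frames is what favors vectors that stay coherent over the whole 9-frame segment rather than vectors that only match one neighboring frame.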
1- Video pre-analysis – Spatio-temporal segmentation
Global motion
Apparent motions are due to
• moving objects
• camera motion
Motion segmentation based on the residual motion
Affine model
• a1, a2, a3, a4: deformation parameters
• tx, ty: translation parameters
• Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube)
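The equation of the affine model did not survive on the slide. A common six-parameter form that is consistent with the parameter list above is given here as an assumption; in particular, the assignment of a1 to a4 to the individual terms is a guess, and (x, y) denotes the position of the tube in the frame.

```latex
\begin{aligned}
V_x(x, y) &= t_x + a_1\, x + a_2\, y \\
V_y(x, y) &= t_y + a_3\, x + a_4\, y
\end{aligned}
```

Under this form, the residual motion of a tube is its MV minus the vector predicted by the global (camera) model at the same position.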
Global motion parameters estimation
Estimation of the parameters from the motion vector fields [Coudray 2005]
Global motion estimation in 2 steps:
1. For each MV (tube): calculation of the derivatives
• accumulation of the parameter hypotheses
• localization of the main peak
2. Accumulation of the residual MVs (tubes)
• 2-D histogram (tx, ty)
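The accumulation and main-peak localization for the translation parameters can be illustrated with a 2-D histogram vote. This is a minimal sketch, not the method of [Coudray 2005]: it assumes integer-valued vectors within a known range, and the name `dominant_translation` and the parameter `max_disp` are hypothetical.

```python
import numpy as np

def dominant_translation(motion_vectors, max_disp=16):
    """Vote the (tx, ty) vectors into a 2-D histogram and return its main peak.

    motion_vectors: sequence of (tx, ty) pairs, one per tube.
    Returns ((tx, ty), votes) for the most populated bin.
    """
    mvs = np.asarray(motion_vectors, dtype=np.float64)
    # One bin per integer displacement in [-max_disp, max_disp].
    edges = np.arange(-max_disp - 0.5, max_disp + 1.5)
    hist, x_edges, y_edges = np.histogram2d(mvs[:, 0], mvs[:, 1],
                                            bins=(edges, edges))
    ix, iy = np.unravel_index(np.argmax(hist), hist.shape)
    tx = int(round((x_edges[ix] + x_edges[ix + 1]) / 2))
    ty = int(round((y_edges[iy] + y_edges[iy + 1]) / 2))
    return (tx, ty), int(hist[ix, iy])
```

With the global motion removed, the same histogram built on the residual vectors shows one peak per independently moving object, which is what the motion segmentation exploits.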
Motion segmentation
2-D histogram of the translation parameters (tx, ty) of the residual MVs
Each histogram peak => a moving object => analysis of all the peaks
Iterative approach
1. Initialization: detection of the main peak with a greedy approach (local gradient)
2. Detection of the other peaks: greedy approach
[Figure: accumulation histogram and segmented space, showing the main peak and a secondary peak]
Motion segmentation – results
=> need for a spatial and temporal regularization
Spatio-temporal regularization
Motion-based segmentation => some blocks are misclassified
More criteria to improve the segmentation
• connectivity
• color
• texture
• motion
=> Markovian approach
Markovian approach
The Hammersley-Clifford theorem [Besag 1974]: Markov Random Field <=> Gibbs distribution
=> the optimal label configuration minimizes a global energy function
• E: label field
• O: observation field
• site: spatio-temporal tube
Markovian property => U(o, e): sum of potential functions defined on cliques
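In its usual textbook form, the link between the Gibbs distribution and the energy to minimize can be written as follows. The symbols Z (normalizing constant), C (set of cliques) and V_c (clique potentials) are notation introduced here for illustration, and the temperature is assumed equal to 1.

```latex
P(E = e \mid O = o) = \frac{1}{Z} \exp\bigl(-U(o, e)\bigr),
\qquad
U(o, e) = \sum_{c \in C} V_c(o, e),
\qquad
\hat{e} = \arg\min_{e} U(o, e)
```

Maximizing the posterior probability of the label field is therefore the same as minimizing the global energy U(o, e).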
Spatial regularization
Spatial connectivity
• segmented region => locally homogeneous
Color features
• color distributions (discrete densities) => Bhattacharyya coefficient
Texture features
• texture distributions: 2 spatial gradients (Sobel filters) => Bhattacharyya coefficient
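The Bhattacharyya coefficient between two discrete densities is the sum of the square roots of the products of their bins: 1 for identical distributions, 0 for distributions with no overlap. A minimal sketch follows; the function name is illustrative, and how the coefficient enters the potential functions is not specified on the slide.

```python
import numpy as np

def bhattacharyya_coefficient(hist_p, hist_q):
    """Similarity in [0, 1] between two histograms with the same bins."""
    p = np.asarray(hist_p, dtype=np.float64)
    q = np.asarray(hist_q, dtype=np.float64)
    # Normalize the histograms into discrete probability densities.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

For example, a color histogram of a tube can be compared in this way with the color histogram of a candidate region, and likewise for the histograms of Sobel gradient values.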
Temporal regularization
Motion features
• distance between the MVs
Temporal connectivity
• segmented region => temporally homogeneous
• based on the segmentation map of the previous temporal segment
Region tracking
• criteria: color, texture, overlap
=> video object tracking
Energy minimization
The global energy function
- potential functions
- weighting factors
Sequential processing of the sites
• instability stack
Results
[Figure: motion segmentation only versus regularized spatio-temporal segmentation]
1- Video pre-analysis – Visual attention modeling
Spatial saliency
Spatial saliency based on the color contrast [Aziz 2008]
• color transformation: YUV to HSV
• color features influencing the visual attention
1- Saturation contrast
2- Intensity contrast
3- Hue contrast
4- Opponent colors contrast
5- Warm and cold colors contrast
6- Dominance of the warm colors
7- Dominance of the luminance and saturation
Spatial saliency SSP => combination of these 7 features
Temporal saliency
Temporal saliency based on the relative motion
• for each site s: the MV of s, the dominant motion, and the relative motion of s with respect to the dominant motion
• maximum velocity of smooth pursuit of the eye [Daly 1998]: 80°/s
=> temporal saliency ST
Spatio-temporal saliency
Fusion of the spatial and temporal saliency maps
Observers focus on the center of the screen [Le Meur 2005]
=> weighting by a 2-D Gaussian function
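The center weighting can be illustrated by multiplying a saliency map by a separable 2-D Gaussian centered on the picture. This sketch covers only the weighting step, not the fusion rule, which is not given on the slide; the function name and the `sigma_ratio` parameter (standard deviation as a fraction of each picture dimension) are assumptions.

```python
import numpy as np

def center_weighting(saliency, sigma_ratio=0.25):
    """Weight a 2-D saliency map by a Gaussian centered on the picture."""
    h, w = saliency.shape
    # Coordinates relative to the picture center.
    ys = np.arange(h) - (h - 1) / 2.0
    xs = np.arange(w) - (w - 1) / 2.0
    gy = np.exp(-0.5 * (ys / (sigma_ratio * h)) ** 2)
    gx = np.exp(-0.5 * (xs / (sigma_ratio * w)) ** 2)
    # The outer product of the two 1-D profiles gives the 2-D Gaussian.
    return saliency * np.outer(gy, gx)
```

The weight equals 1 at the center and decreases toward the borders, so salient regions near the edges of the screen are attenuated.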
Visual attention modeling: results [figures]
Possible applications
Video pre-analysis
Information
- moving objects segmentation, objects tracking
- color, texture
- salient regions
Applications
- advanced video coding
- video transmission with priority (saliency maps)
- video summarization, indexing
- …
ArchiPEG (ANR Project)
- HD MPEG-4 AVC real-time compression
- video pre-analysis resource
2- Applications: H.264 video coding – GOP structure adaptation
GOP structure
Three kinds of frames: I, P, B
• a GOP begins with an I frame: intra coded
• P frames at regular intervals: predicted
• B frames between P frames: bi-predicted
Fixed interval between I frames
• not adapted to scene changes and temporal variations of the video => more bits
=> dynamic GOP size: irregular insertion of I frames
Typically, number of B frames = 1 or 2: good trade-off between bitrate and quality
• low motion or camera panning => increase the number of B frames
B frames adaptation (1)
Analysis of the video sequences
• x264 encoder with different fixed numbers of B frames: 0, 1, 2, 3
Video sequence               Optimal GOP configuration
New Mobile and Calendar      2 B frames
Night                        2 B frames
Knightshields                2 B frames
Crew                         1 B frame
Park run                     1 B frame
Park joy                     no B frame
Tractor                      no B frame
Umbrella                     no B frame
=> the optimal number of B frames is content dependent
=> classify videos according to their content
B frames adaptation (2)
Spatio-temporal characterization
• for each temporal segment
• for the entire sequence
=> 2 indices to evaluate the spatio-temporal activity
- IT: temporal activity => MVs
- IS: spatial activity => MSEG
B frames adaptation (3)
Classification space: function of IT and IS
• class Ci => i B frames between P-P or I-P frames
• IT constant between P-P or I-P frames; same rule for IS
GOP size adaptation (1)
Change detection within a video shot
• high motion => significant changes => reduce the interval
• low motion => little variation => increase the interval
• mid-range motion => classical approach: fixed GOP size
2 thresholds to detect critical changes
- sh => high motion
- sb => low motion
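The three cases can be summarized as a small decision rule on the temporal activity index IT. This is a sketch only: the default of 25 is the fixed GOP size of the x264 configuration used later for comparison, while the short and long interval values, the function name and the parameter names are hypothetical, and the actual threshold values are not given on the slide.

```python
def adapt_gop_size(temporal_activity, s_low, s_high,
                   base_gop=25, short_gop=12, long_gop=50):
    """Choose the interval between I frames from the temporal activity IT.

    s_high and s_low play the role of the thresholds sh and sb.
    """
    if temporal_activity > s_high:
        return short_gop   # high motion: insert I frames more often
    if temporal_activity < s_low:
        return long_gop    # low motion: I frames can be spaced out
    return base_gop        # mid-range motion: classical fixed GOP size
```

A call such as `adapt_gop_size(it_value, s_low=0.2, s_high=0.8)` would then be made per temporal segment, with threshold values chosen for illustration.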
GOP size adaptation (2)
Analysis of the evolution of IT: 3 cases
[Figure: evolution of IT for mid-range motion, high motion and low motion]
Performance
• 8 video sequences
• 4 different bitrates defined by an expert group
Comparison between
- x264 encoder: GOP size = 25, 2 B frames
- a modified version => GOP structure adaptation
Results
Rate – Distortion (PSNR) [Bjontegaard 2001]
Video sequence               Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar      9.15               0.32
Night                        2.45               0.09
Knightshields                1.68               0.06
Park run                     -0.1               -0.01
Umbrella                     4.11               0.13
Park joy                     2.83               0.09
Crew                         4.5                0.13
Tractor                      10.94              0.48
Average                      4.45               0.16
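The PSNR gains in the table are Bjontegaard deltas. The usual computation fits a cubic polynomial to PSNR as a function of the logarithm of the bitrate for each coder, then averages the gap between the two curves over the common bitrate range. The sketch below follows that commonly used procedure; it is not taken from the slides, and the function name is illustrative.

```python
import numpy as np

def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR difference (test minus reference) in dB.

    Each coder is described by four (bitrate, PSNR) points.
    """
    log_ref = np.log10(np.asarray(rates_ref, dtype=np.float64))
    log_test = np.log10(np.asarray(rates_test, dtype=np.float64))
    # Cubic fit of PSNR versus log-bitrate for each coder.
    poly_ref = np.polyfit(log_ref, psnr_ref, 3)
    poly_test = np.polyfit(log_test, psnr_test, 3)
    # Integrate both fits over the overlapping log-bitrate interval.
    low = max(log_ref.min(), log_test.min())
    high = min(log_ref.max(), log_test.max())
    int_ref = np.polyint(poly_ref)
    int_test = np.polyint(poly_test)
    area_ref = np.polyval(int_ref, high) - np.polyval(int_ref, low)
    area_test = np.polyval(int_test, high) - np.polyval(int_test, low)
    return (area_test - area_ref) / (high - low)
```

The bitrate gain is obtained in the same spirit by exchanging the roles of the two axes, that is, by fitting log-bitrate as a function of PSNR.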
Subjective tests
Setup
• display resolution: 1920x1080
• normalized room [BT.500-11]
• ~30 naïve observers
• 72 video sequences (72 = 8x4x2 + 8)
Methodology: ACR
• for each sequence, observers have to assess the quality
Results
• QGOP: MOS of the modified coder
• Qx264: MOS of the x264 coder
Video sequence               MOS difference (QGOP - Qx264)
New Mobile and Calendar      0.31
Night                        -0.02
Knightshields                0.24
Park run                     0.04
Umbrella                     -0.09
Park joy                     0.14
Crew                         0.48
Tractor                      0.33
Average                      0.18
• sequences with a high IT value (high motion) => GOP structure adaptation is well suited
2- Applications: H.264 video coding – Adaptive quantization
Adaptive quantization
Objective
• control the distribution of the bit resources
• saliency maps => increase the perceived visual quality
Modification of the saliency maps
• quantization and morphological filtering
Modification of the coder
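One way to picture saliency-driven quantization is a linear law that maps a quantized saliency level to a quantization parameter (QP) offset per macroblock: salient regions get a lower QP (finer quantization), non-salient regions a higher one. This is a sketch under assumptions, not the modification actually made to the coder: the base QP, the offset range, the number of levels and the function name are hypothetical, and the morphological filtering step is omitted. Only the QP range of 0 to 51 comes from H.264 itself.

```python
import numpy as np

def qp_map_from_saliency(saliency, base_qp=26, max_offset=4, levels=4):
    """Map per-macroblock saliency in [0, 1] to an H.264 QP value."""
    s = np.clip(np.asarray(saliency, dtype=np.float64), 0.0, 1.0)
    # Quantize the saliency into discrete levels 0 .. levels - 1.
    level = np.minimum((s * levels).astype(int), levels - 1)
    # Linear law: lowest level -> +max_offset, highest level -> -max_offset.
    offset = max_offset - 2.0 * max_offset * level / (levels - 1)
    qp = np.rint(base_qp + offset).astype(int)
    # H.264 quantization parameters lie in the range 0..51.
    return np.clip(qp, 0, 51)
```

Offsets that are symmetric around the base QP are a simple way of redistributing bits toward salient regions without, on average, steering the overall rate too far from the target.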
Results (1)
Rate – Distortion (PSNR) [Bjontegaard 2001]
                             Entire sequence                      Region of interest
Video sequence               Bitrate gain (%)   PSNR gain (dB)    Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar      -2.49              -0.09             -0.67              -0.03
Night                        -3.38              -0.12             -0.39              -0.02
Knightshields                -3.02              -0.12             -0.84              -0.03
Park run                     -0.81              -0.03             0.25               0.01
Umbrella                     2.34               0.07              4.17               0.14
Park joy                     2.68               0.09              4.42               0.14
Crew                         -0.36              -0.01             2.74               0.09
Tractor                      10.94              0.05              4.35               0.20
Average                      -0.52              -0.02             1.75               0.06
Subjective assessments
Results
• QQA: MOS of the modified coder (adaptive quantization)
• Qx264: MOS of the x264 coder
Video sequence               MOS difference (QQA - Qx264)
New Mobile and Calendar      -0.13
Night                        0.09
Knightshields                -0.02
Park run                     -0.06
Umbrella                     0.06
Park joy                     0.17
Crew                         -0.06
Tractor                      0.04
Average                      0.04
• no specific content is clearly suitable
• unsuitable for coding and broadcasting of HDTV at high bitrate
• overhead, linear law?
Conclusion
Conclusion (1)
Video pre-analysis
• visual attention modeling => saliency maps
• spatio-temporal segmentation => detection of moving objects, objects tracking
Applications
• advanced video coding
• video transmission with priority based on the saliency maps [Boulos 2010]
• video summarization, indexing
• …
Conclusion (2)
Applications of the video pre-analysis
• GOP structure adaptation
- dynamic variation of the number of B frames: temporal segment classification (IT and IS)
- GOP size adaptation: I frame insertion, change detection (IT)
• Adaptive quantization based on the saliency maps
Conclusion (3)
Subjective quality assessment tests
• GOP structure adaptation
- no significant differences: +0.18 (on a scale of 1 to 5)
- well suited for sequences with high motion
• Adaptive quantization
- no clear content suitability
- seems unsuitable for coding and broadcasting of HDTV at high bitrate
- … the adaptation law could be modified …
Perspectives
• Better performance evaluation of our visual attention model => eye-tracking experiments
• Psychophysical experiments to optimize the model parameters => improve the fusion process [Marat 2010]
• Add high-level visual information: face, flesh hue, …
Thank you.
Questions?