Coding for Virtual and Augmented Reality
Philip A. Chou, Microsoft Research
Packet Video Workshop, 15 July 2016
VR puts you in a Virtual World
AR puts virtual objects in your world
HMDs for Virtual Reality are Enclosed
Cardboard VR, Gear VR, Daydream
Playstation VR, HTC Vive, Oculus Rift
HMDs for Augmented Reality are See-Through
HoloLens, Meta, Daqri
Mixed Reality: from AR to VR
Paul Milgram, Haruo Takemura, Akira Utsumi, and Fumio Kishino, "Augmented reality: a class of displays on the reality-virtuality continuum," Proc. SPIE Telemanipulator and Telepresence Technologies, Dec. 1995
Eventual Merging of VR and AR
• Video see-through will be on VR devices
• Full immersion (high contrast and FOV) will be on AR devices
Redirected Walking in VR
Razzaque, Kohn, and Whitton (2001)
"Cinematic" Content for VR and AR
VR Capture: "Inside-Out"
Lytro Immerge
Samsung Beyond
Nokia Ozo
GoPro Odyssey
Jaunt VR
Facebook Surround
AR Capture: "Outside-In"
1G: Telephone (1876) → 2G: Television (1926) → 3G: Holoportation (2016)
Immersive communication is the real-time exchange of the natural social signals between people who are geographically separated, as if they were co-located.
(Telephone to television: 50 years. Television to holoportation: 90 years.)
VR/AR: Third Generation of Immersive Communication
VR/AR: Fourth Generation Computing Platform (after PC, web, and mobile)
http://www.digi-capital.com/news/2016/01/augmentedvirtual-reality-revenue-forecast-revised-to-hit-120-billion-by-2020/
http://www.goldmansachs.com/our-thinking/pages/technology-driving-innovation-folder/virtual-and-augmented-reality/report.pdf
Goldman Sachs: $80B by 2025
Coding
Degree of Motion Parallax determines Coding
Virtual Reality (panoramic)
• Motion parallax is limited because of limited movement
• Requires yaw, pitch, roll but not (much) x, y, z translation
• A stationary camera with a 360° view makes sense (inside-out)
• Representation can be spherical video (possibly stereo, possibly with depth)
• Coding: map spherical video to rectangular video

Augmented Reality (volumetric)
• Motion parallax is unlimited because roaming is allowed
• Requires x, y, z translation as well as yaw, pitch, roll (6DoF)
• Object capture makes sense (outside-in)
• Representation must be a light field, mesh, point cloud, etc.
• Coding: novel compression techniques
Omnidirectional Mappings
• Equirectangular projection
  • Simple, popular
  • Large horizontal oversampling near the poles
• Equal-area projection
  • Decreased vertical sampling near the poles
• Cube map
  • Popular in the graphics community
Courtesy Hari Lakshman and MPEG/JPEG
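To make the pole oversampling concrete, here is a minimal sketch (in Python; the function and its conventions are illustrative, not from the talk) of the equirectangular mapping from a unit view direction to pixel coordinates. Every image row spans the full 360° of longitude, so rows near the poles sample the sphere far more densely than rows near the equator.

```python
import math

def dir_to_equirect(x, y, z, width, height):
    """Map a unit view direction (x, y, z) to equirectangular pixel
    coordinates (u, v).  Longitude spans the full image width on every
    row, which is why the poles are horizontally oversampled."""
    lon = math.atan2(x, -z)                    # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))    # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * (width - 1)
    v = (0.5 - lat / math.pi) * (height - 1)
    return u, v
```

For example, the forward direction (0, 0, -1) lands at the image center, while the "up" pole (0, 1, 0) lands on the top row, where a single sphere point stretches across the whole row.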
Omnidirectional Media Application Format (OMAF)
• MPEG activity
• Phases and timeline:
  • OMAF v1 (2017): basic framework
  • OMAF v2 (2018): 3D audio metadata, new projections, AR
  • OMAF v3 (2020): new video coding
Issues for VR
• Mapping
• Audio
• Video coding may need adjustment (e.g., motion compensation)
• Evaluation criteria
• Streaming on demand
  • Directional or spatial random access, ROI coding
• Interactivity (much more rapid than trick modes)
• Latency
• Broadcasting
Pipeline for Video / VR / AR
capture → fuse → encode → decode → render → display

• Video: video capture → digital video → digital video → video display
• VR: array of video → digital video (may be stereo, may have depth) → digital video (may be stereo, may have depth) → display on:
  • Browser
  • Phone, tablet app
  • Low-end VR device (e.g., Cardboard VR, Gear VR)
  • High-end VR device (e.g., Oculus Rift, HTC Vive, Playstation VR)
• AR: array of video + depth → volumetric representation → volumetric representation → display on:
  • Browser
  • Phone, tablet app
  • Low-end VR device (e.g., Cardboard VR, Gear VR)
  • High-end VR device (e.g., Oculus Rift, HTC Vive, Playstation VR)
  • High-end AR device (e.g., HoloLens)
Volumetric Representations
• At present, not even the representation is agreed upon
• Some candidates:
  • Dynamic meshes
    • Advantages: typical pipeline; easy to interpolate, as a mesh represents a surface
    • Disadvantages: does not handle noise or non-surfaces well
  • Dynamic point clouds
    • Advantages: easily processed in parallel; handle noise and non-surfaces
    • Disadvantages: hard to interpolate
  • Hybrids
  • Light fields
  • Holograms
  • Dense volumetric functions and implicit surfaces
Meshes and Point Clouds: Irregular Domains
Graph Signal Processing (GSP): Framework for Signals on Irregular Domains
• Generalizes processing of real-valued signals defined on a regular domain (such as an image grid) to real-valued signals defined on a discrete graph.
• Applications to networks: social, sensor, communication, energy, transportation, neuronal; and to geometry.
• Generalizes linear transform, impulse response, shift invariance, spectrum, frequency response, filtering, smoothing, interpolation, denoising, convolution, sampling, translation, etc.
Reference: Shuman, Narang, Frossard, Ortega, and Vandergheynst, "The Emerging Field of Signal Processing on Graphs," IEEE Signal Processing Magazine, May 2013.
[Figure: a signal x assigns a value x(v) to each vertex v ∈ V of a graph (V, E) over the domain V]
Adjacency Matrix as Shift Operator
• A graph (V, E) is represented by an (optionally weighted) adjacency matrix A = (a_ij), with a_ij > 0 being the weight on edge (i, j) ∈ E.

Example 1: Left-shift operator on N points on a circle

    A = [ 0 1 0 ⋯ 0
          0 0 1 ⋯ 0
          ⋮       ⋱
          0 0 0 ⋯ 1
          1 0 0 ⋯ 0 ]

Eigenvectors are x(n) = e^{jωn} with ω = 2πk/N, since (Ax)(n) = x(n+1) = e^{jω} x(n).

Example 2: Symmetric random-walk operator on N points on a circle

    A = (1/2) [ 0 1 0 ⋯ 1
                1 0 1 ⋯ 0
                ⋮       ⋱
                1 0 ⋯ 1 0 ]

[Figure: an eigenvector of the random-walk operator plotted over 150 points]
A. Sandryhaila and J. M. F. Moura, βDiscrete signal processing on graphs,β IEEE Transactions on Signal Processing, vol. 61, no. 7, pp.1644-1656, 2013
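Example 1 can be checked numerically. The following sketch (NumPy; helper names are mine) builds the cyclic left-shift adjacency matrix and verifies that the complex exponentials are its eigenvectors with eigenvalue e^{j2πk/N}:

```python
import numpy as np

def cyclic_shift(N):
    """Adjacency matrix of the left-shift operator on N points on a
    circle: A[n, (n+1) mod N] = 1, so (A x)(n) = x(n+1)."""
    A = np.zeros((N, N))
    A[np.arange(N), (np.arange(N) + 1) % N] = 1.0
    return A

def shift_eigvec(N, k):
    """The eigenvector x(n) = e^{j 2 pi k n / N} and its eigenvalue."""
    x = np.exp(2j * np.pi * k * np.arange(N) / N)
    lam = np.exp(2j * np.pi * k / N)
    return x, lam
```

Since these eigenvectors are exactly the DFT basis, classical Fourier analysis is the special case of GSP where the graph is a directed cycle.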
Linear Shift Invariant Operators and the GFT
• {LSI operators} ⊇ {analytic functions h(A)} ⊇ {rational functions q(A)⁻¹ p(A)} ⊇ {polynomials h(A) = h_0 I + h_1 A + ⋯ + h_k A^k}
• h(A) A = (Σ_i h_i A^i) A = Σ_i h_i A^{i+1} = A (Σ_i h_i A^i) = A h(A)
• If the shift operator A has eigenvectors Ψ and eigenvalues Λ (i.e., AΨ = ΨΛ), then any LSI operator h(A) has eigenvectors Ψ and eigenvalues h(Λ) (i.e., h(A)Ψ = Ψ h(Λ))
  • A = ΨΛΨ⁻¹
  • h(A) = Σ_i h_i A^i = Σ_i h_i (ΨΛΨ⁻¹)^i = Ψ (Σ_i h_i Λ^i) Ψ⁻¹ = Ψ h(Λ) Ψ⁻¹
• Therefore h(A) x = Ψ h(Λ) Ψ⁻¹ x.
• Ψ⁻¹ x is known as the Graph Fourier Transform of x.
Graph Laplacian
• Often the shift A is stochastic (rows or columns sum to 1)
  • L = I − A is the graph Laplacian (same eigenvectors as A, eigenvalues I − Λ)
• More generally
  • L = D − A is the graph Laplacian (D = diag(d_i), d_i = Σ_j A_ij)
• Normalizations
  • L = I − D⁻¹A is the "random walk" graph Laplacian
  • L = I − D^{−1/2} A D^{−1/2} is the "normalized" graph Laplacian
• Frequently L = ΨΛΨ⁻¹ is the starting point instead of A, with L real and PSD
  • The eigenvalues diag(Λ) are real and non-negative (the graph spectrum)
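As a small illustration (a NumPy sketch, not from the talk): the combinatorial Laplacian L = D − A of a 4-vertex path graph, its non-negative spectrum, and the Graph Fourier Transform Ψ⁻¹x (which here is just Ψᵀx, since L is symmetric and Ψ is orthonormal):

```python
import numpy as np

# Path graph on 4 vertices; combinatorial Laplacian L = D - A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# L is real symmetric PSD, so its eigenvalues (the graph spectrum) are
# real and non-negative, and its eigenvectors Psi are orthonormal.
lam, Psi = np.linalg.eigh(L)

# Graph Fourier Transform of a signal x and its inverse.
x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = Psi.T @ x        # forward GFT (Psi^{-1} = Psi^T here)
x_rec = Psi @ x_hat      # inverse GFT recovers x
```

The smallest eigenvalue is 0 (the constant eigenvector), mirroring the DC component of classical Fourier analysis.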
Mesh as a Graph
Eigenvectors of a Mesh
Eigenvalues (Spectrum) as "Frequencies"
[Figures: the eigenvalues of L in increasing order, and the number of zero-crossings of each eigenvector, which increases with the eigenvalue]
Spectrum of Geometry as a Signal
[Figures: spectra of the X, Y, and Z components of the geometry as graph signals]
Energy Compaction
[Figures: reconstructions from 100%, 50%, 10%, and 2% of the coefficients]
Spectral Domain Filtering / Denoising
Impulse Response is Localized
Polynomials and Vertex-Domain Filtering
• Spectral filters do not generally have compact vertex-domain support.
• Spectral filters h(λ) correspond to LSI transforms h(L) = Ψ h(Λ) Ψᵀ.
• Polynomial spectral filters have compact support
  • y = h(L) x = Σ_{i=0}^{k} h_i L^i x
  • If x(n) = δ(n, n_0), then y(n) ≠ 0 only if d(n, n_0) ≤ k, where d is geodesic distance
• Polynomial filters can be efficiently evaluated in the vertex domain
  • Evaluate Lx by message passing, then L²x, …, and lastly Lᵏx.
  • Finally sum Σ_{i=0}^{k} h_i L^i x. All can be done in parallel (e.g., on a GPU).
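The vertex-domain evaluation above can be sketched as follows (NumPy; the filter taps h are arbitrary illustrative values). Each multiplication by L is one round of message passing between neighboring vertices:

```python
import numpy as np

def poly_filter(L, h, x):
    """Evaluate y = sum_i h[i] * L^i x using only repeated
    matrix-vector products (one per message-passing round),
    never forming the eigendecomposition of L."""
    y = h[0] * x
    Lix = x
    for hi in h[1:]:
        Lix = L @ Lix          # one more message-passing round
        y = y + hi * Lix
    return y
```

The result matches the spectral evaluation Ψ h(Λ) Ψᵀ x exactly, but costs only k sparse matrix-vector products instead of a full eigendecomposition.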
Prediction and Interpolation
• Problem: Find x_unknown such that x = [x_known; x_unknown] minimizes the norm
  ‖L^{1/2} x‖² = xᵀ L x
  of the high-pass signal y = L^{1/2} x (using high-pass filter h(λ) = λ^{1/2})
• Solution: x_unknown = −L_22⁻¹ L_21 x_known
• Remark: same as E[z_unknown | z_known = x_known] if z ∼ N(0, L⁻¹)

Previous Frame
[Figure: previous frame, front and back views]
Current Frame
[Figure: current frame, front and back views]
Warp of Previous to Current Frame
[Figure: warped previous frame, front and back views]
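The solution above can be sketched directly (NumPy; index names are mine): partition L into known/unknown blocks and solve x_unknown = −L_uu⁻¹ L_uk x_known.

```python
import numpy as np

def gsp_interpolate(L, known, unknown, x_known):
    """Fill in unknown vertex values by minimizing x^T L x given the
    known values: x_u = -L_uu^{-1} L_uk x_k."""
    L_uu = L[np.ix_(unknown, unknown)]
    L_uk = L[np.ix_(unknown, known)]
    return -np.linalg.solve(L_uu, L_uk @ x_known)
```

On a 4-vertex path graph with known endpoint values 0 and 3, minimizing xᵀLx (the sum of squared differences along edges) fills in the interior with the linear interpolation 1, 2, as expected.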
Smoothing
• Smoothing is similar, with a loose constraint:
  Find x = [x_1; x_2] minimizing ‖x_1 − x_known‖² + α xᵀLx
• Special case:
  Find x minimizing ‖x − x_known‖² + α xᵀLx
• Solution:
  x* = ( [I 0; 0 0] + αL )⁻¹ [x_known; 0]
  (equivalent to low-pass Tikhonov filtering with h(λ) = 1/(1 + αλ) in the special case)
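The special case is one linear solve (NumPy sketch; α is an illustrative smoothing strength): solving (I + αL) x* = x_noisy is the same as applying the spectral low-pass filter h(λ) = 1/(1 + αλ).

```python
import numpy as np

def tikhonov_smooth(L, x_noisy, alpha):
    """Graph Tikhonov smoothing: x* = (I + alpha L)^{-1} x_noisy,
    equivalently the spectral filter h(lambda) = 1 / (1 + alpha*lambda)."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + alpha * L, x_noisy)
```

The equivalence follows because I + αL = Ψ(I + αΛ)Ψᵀ shares the Laplacian's eigenvectors.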
Perfect Reconstruction Critically Sampled 2-Channel Filter Banks with Compact Support
Narang and Ortega, "Compact Support Biorthogonal Wavelet Filterbanks for Arbitrary Undirected Graphs," TSP 2012
Graph Wavelet with Compact Support

J_β = diag(β_i), with β_i = +1 if vertex i ∈ L (lowpass set) and β_i = −1 if i ∈ H (highpass set)

    I = (1/2) G_0 (I + J_β) H_0 + (1/2) G_1 (I − J_β) H_1
      = (1/2) (G_0 H_0 + G_1 H_1) + (1/2) (G_0 J_β H_0 − G_1 J_β H_1)

Design for a bipartite graph:
• ψ_{2−λ} = J_β ψ_λ (spectral folding)
• Perfect reconstruction if
  • g_0(λ) h_0(λ) + g_1(λ) h_1(λ) = 2
  • g_0(λ) h_0(2−λ) − g_1(λ) h_1(2−λ) = 0
• e.g., g_0(λ) = h_1(2−λ), g_1(λ) = h_0(2−λ)

[Figure: two-channel filter bank with downsampling projections ½(I + J_β) and ½(I − J_β)]
Multi-Resolution Processing
Iterate Low-Pass Filtering Procedure
• At level L, use some rule to color the graph with only two colors (one color = low pass, the other = high pass)
  • Easy if the graph is bipartite
  • Otherwise separate the graph into a union of bipartite graphs
• Filter to get coefficients at the L/H vertices
• At level L−1, use some rule to reconnect the low-pass vertices
• Repeat
Application to Mesh Geometry and Color Compression
Nguyen, Chou, and Chen, "Compression of Human Body Sequences Using Graph Wavelet Filter Banks," ICASSP 2014
Anis, Chou, and Ortega, "Compression of Dynamic 3D Point Clouds using Subdivisional Meshes and Graph Wavelet Transforms," ICASSP 2016
Quad Mesh Subdivision
[Figures: a coarse quad mesh is subdivided step by step into the target quad mesh; vertices are labeled by the subdivision level (0-4) at which they appear, and at each level are split into a lowpass subband and a highpass subband]
Tri Mesh Subdivision
[Figures: the same subdivision for a triangle mesh; vertices are labeled by subdivision level (0-2) and split into lowpass and highpass subbands]
Hybrid Predictive-Transform Coding
• Graph transform, uniform scalar quantization, entropy coding
• Temporal prediction removes the temporal redundancy and creates motion vectors
• The graph transform removes spatial correlation among the motion vectors
[Figure: predictive coding loop with frame delay z⁻¹]
Context-Adaptive Arithmetic Coding
• Let b = 0, 1, 2, 3, 4 be the subband index
• If the coefficient at node n is in subband J_b, model it as Laplacian with parameter

    σ_n = β_{b,0} + Σ_{i=1}^{b−1} β_{b,i} · ( Σ_{m ∈ N_n ∩ J_{b−i}} |x_m| ) / |N_n ∩ J_{b−i}|

  depending on the average of the magnitudes |x_m| of the neighbors m ∈ N_n ∩ J_{b−i} in previous subbands
• Fit the constants {β_{b,i}} to data
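A sketch of this context model in Python (all names and the β values below are illustrative, not fitted): the Laplacian parameter for a coefficient is an affine function of the average neighbor magnitudes in each earlier subband.

```python
def laplacian_param(n, b, x, neighbors, subband, beta):
    """Estimate sigma for the coefficient at node n in subband b from
    the average magnitudes of its neighbors in previous subbands.
    beta[b] = (beta_{b,0}, beta_{b,1}, ...), constants fitted offline."""
    sigma = beta[b][0]
    for i in range(1, b):
        prev = [m for m in neighbors[n] if subband[m] == b - i]
        if prev:
            sigma += beta[b][i] * sum(abs(x[m]) for m in prev) / len(prev)
    return sigma
```

The encoder and decoder compute the same σ_n from already-decoded neighbors, so no side information beyond the fitted constants is needed.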
[Figure: subdivided mesh with vertices labeled by subband index 0-4]
Rate-Distortion Performance
[Figures: rate-distortion curves for geometry and for color]
Application to Point Cloud Geometry and Color Compression
Thanou, Chou, and Frossard, "Graph-based compression of dynamic 3D point cloud sequences," TIP 2015
Queiroz and Chou, "Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform," TIP 2016
Queiroz and Chou, "Motion-Compensated Compression of Dynamic Voxelized Point Clouds," TIP 2016
Octree Coding for (Static) Geometry
[Figure: recursive subdivision of occupied cubes; each occupied node is signaled by one occupancy byte, e.g., 10010001, 11001001]
Octree Coding for Color?
[Figure: the same octree with RGB colors at the leaves, e.g., (221,136,255), (255,153,255), (255,102,255), (153,153,255)]
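A minimal sketch of octree occupancy coding (Python; the function and traversal order are illustrative): voxel coordinates are integers in [0, 2^depth), and each occupied node emits one byte whose bits mark which of its 8 octants contain points.

```python
def octree_occupancy(points, depth):
    """Serialize the octree of a set of voxels (x, y, z), each an int in
    [0, 2**depth), as one occupancy byte per occupied node, breadth-first."""
    stream = []
    nodes = [list(points)]                # root node contains all points
    for level in range(depth):
        shift = depth - 1 - level         # coordinate bit that splits this level
        next_nodes = []
        for pts in nodes:
            children = [[] for _ in range(8)]
            for (x, y, z) in pts:
                octant = (((x >> shift) & 1) << 2) \
                       | (((y >> shift) & 1) << 1) \
                       | ((z >> shift) & 1)
                children[octant].append((x, y, z))
            stream.append(sum(1 << i for i in range(8) if children[i]))
            next_nodes.extend(c for c in children if c)
        nodes = next_nodes
    return stream
```

Geometry serializes naturally this way because occupancy is binary per octant; color does not, which is why the attributes need a transform of their own.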
Quantization, Entropy Coding, and Transmission

Haar Butterfly
[Figure: butterfly diagram combining pairs of coefficients (g_{l+1,2n}, g_{l+1,2n+1}) level by level, from g_{3,0}, ..., g_{3,7} down to g_{0,0} and h_{0,0}, with ±1/√2 taps]

Transform:

    [ g_{l,n} ]   [  1/√2  1/√2 ] [ g_{l+1,2n}   ]
    [ h_{l,n} ] = [ −1/√2  1/√2 ] [ g_{l+1,2n+1} ]

Inverse transform:

    [ ĝ_{l+1,2n}   ]   [ 1/√2 −1/√2 ] [ ĝ_{l,n} ]
    [ ĝ_{l+1,2n+1} ] = [ 1/√2  1/√2 ] [ ĥ_{l,n} ]
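The butterfly above in code (a sketch): the forward step is an orthonormal 2×2 rotation, and the inverse is its transpose, so the round trip is exact and energy is preserved.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_forward(g1, g2):
    """One Haar butterfly: a pair of lowpass values -> (lowpass g, highpass h)."""
    return S * (g1 + g2), S * (-g1 + g2)

def haar_inverse(g, h):
    """Inverse butterfly (the transpose of the forward rotation)."""
    return S * (g - h), S * (g + h)
```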
Haar Tree
[Figure: binary tree over g_{3,0}, ..., g_{3,7}; each level stores the lowpass g_{l,n} and highpass h_{l,n}, down to the root g_{0,0}, h_{0,0}]
Quantization, Entropy Coding, and Transmission
[Figure: the same Haar butterflies and their inverses applied along the tree]
Region Adaptive Haar Transform (RAHT)

[Figure: tree over only the occupied leaf coefficients g_{3,0}, g_{3,1}, g_{3,3}, g_{3,6}; each node carries a weight equal to the number of occupied leaves below it, e.g., w_{3,0} = w_{3,1} = w_{3,3} = w_{3,6} = 1, w_{2,0} = 2, w_{2,1} = w_{2,3} = 1, w_{1,0} = 3, w_{1,1} = 1, w_{0,0} = 4]

Transform:

    [ g_{l,n} ]   [  a  b ] [ g_{l+1,2n}   ]
    [ h_{l,n} ] = [ −b  a ] [ g_{l+1,2n+1} ]

Inverse transform:

    [ ĝ_{l+1,2n}   ]   [ a −b ] [ ĝ_{l,n} ]
    [ ĝ_{l+1,2n+1} ] = [ b  a ] [ ĥ_{l,n} ]

where

    a = √( w_{l+1,2n} / (w_{l+1,2n} + w_{l+1,2n+1}) )
    b = √( w_{l+1,2n+1} / (w_{l+1,2n} + w_{l+1,2n+1}) ),  a² + b² = 1
    w_{l,n} = w_{l+1,2n} + w_{l+1,2n+1}

Scaled DC: √(w_{l+1,2n} + w_{l+1,2n+1}) g_{l,n} = √(w_{l+1,2n}) g_{l+1,2n} + √(w_{l+1,2n+1}) g_{l+1,2n+1}

Quantization, Entropy Coding, and Transmission
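One RAHT butterfly as code (a sketch; with equal weights it reduces to the plain Haar butterfly). Note that the lowpass values carry a √weight scaling, per the scaled-DC relation, which is why a region of constant color still produces zero highpass coefficients as the merges proceed up the tree.

```python
import math

def raht_merge(g1, w1, g2, w2):
    """Region Adaptive Haar Transform butterfly: merge two (value, weight)
    pairs into a lowpass coefficient of weight w1 + w2 and a highpass
    coefficient, with a = sqrt(w1/(w1+w2)), b = sqrt(w2/(w1+w2))."""
    a = math.sqrt(w1 / (w1 + w2))
    b = math.sqrt(w2 / (w1 + w2))
    g = a * g1 + b * g2       # lowpass; satisfies the scaled-DC relation
    h = -b * g1 + a * g2      # highpass
    return g, h, w1 + w2
```

For example, merging a weight-3 node holding constant attribute 5 (so g1 = √3·5) with a weight-1 leaf of the same attribute yields h = 0 and g = √4·5 = 10.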
Tree specifies bipartite graph
• The tree is a rule:
  • At level L, how to connect and color nodes to form a bipartite graph
  • At level L−1, how to reconnect the low-pass nodes into a bipartite graph
• Could be the way to generalize beyond the 2-tap Haar using GSP
[Figure: tree nodes g_{l,n}, h_{l,n} partitioned into lowpass and highpass subbands]
Feedforward Adaptive Arithmetic Coding
• All transformed coefficients with the same weight are grouped into the same subband
• Typically there are about 1000 subbands
• Each subband is encoded using arithmetic coding
  • Probabilities according to a Laplacian distribution
  • Both encoder and decoder have to agree on the standard deviation σ
• For each subband b, we send σ_b
  • The σ_b are optimally quantized and encoded using Run-Length Golomb-Rice coding
RAHT Results (1)
Santa, Octopus
Comparison to Huang, Peng, Kuo, and Gopi, "A generic scheme for progressive point cloud coding," IEEE Trans. Vis. Comput. Graph., 2008
RAHT Results (2)
Comparison to Zhang, Florencio, and Loop, "Point cloud attribute compression with graph transform," ICIP 2014
RAHT Results (3)
Andrew Phil
Ricardo Sarah
Comparison to Zhang, Florencio, and Loop, βPoint cloud attribute compression with graph transform,β ICIP 2014
Compression of Dynamic Point Clouds
Dorina Thanou, Philip A. Chou, and Pascal Frossard, Graph-Based Motion Estimation and Compensation for Dynamic 3D Point Cloud Compression, IEEE Trans. Image Processing, Apr. 2016
Ricardo L. de Queiroz and Philip A. Chou, Motion-Compensated Compression of Dynamic Voxelized Point Clouds, IEEE Trans. Image Processing, submitted 2016
RAHT, 2.6 bpv (intra coded) vs. MCIC, 2.6 bpv (block motion compensation)
Sparse motion estimation; dense (interpolated) motion compensation
Rough sense of overall bitrates
• 2 Gbps: barely coded (holoportation demo), 1 Mpixel color
• 450 Mbps: intra-frame only, octree for geometry, uncoded 8:1:1 YUV
• 150 Mbps: intra-frame only, octree for geometry, RAHT for YUV
• 45 Mbps: inter-frame, hybrid (real time)
• 15 Mbps: poorly-coded mesh + H.264-coded texture (SIGGRAPH paper, not real time)
• 8 Mbps: well-coded mesh + H.265-coded texture
• 6 Mbps: well-coded mesh + periodic texture refresh
Future Issues
• Better motion estimation and compensation for point clouds
• Perceptually relevant distortion measures
• Hybrid mesh / point cloud coding: best of both
• Hybrid AR/VR coding: outside-in + inside-out
• Scalable coding
  • Spatial as well as temporal random access
  • Object scalability, spatial resolution scalability, signal resolution (SNR) scalability
• Rate-distortion optimization
• Error resilience
• Rate control, buffer management
• Streaming architecture
  • Low-end vs. high-end devices and rendering location
Conclusion
• VR/AR are the third generation of immersive communication and the fourth generation computing platform, forecast at ~$100 billion / year
• VR is closer in deployment and technology (similar to video coding); AR is later but may subsume VR (and will need new coding tools)
• GSP provides the extension of classic tools needed to process AR signals
• AR coding is analogous to video coding circa 1980: the coding paradigm is not yet commonly agreed upon. The history of video coding gives us a roadmap for the development of AR coding.
Thanks
Special thanks go to my interns Ha Nguyen, Dorina Thanou, Amir Anis, and Eduardo Pavel Carvelli; visiting researcher Ricardo de Queiroz; colleagues Dinei Florencio, Cha Zhang, Charles Loop, Qin Cai, and the I3D group; and university collaborators Pascal Frossard, Antonio Ortega, Gene Cheung, and Minh Do.