Post on 14-Dec-2015
View-Casting: View-based Stream Dissemination and Control for Multi-Party 3D Tele-immersive Environments
Klara Nahrstedt, University of Illinois at Urbana-Champaign
Joint work with:
• UC Berkeley: Prof. Ruzena Bajcsy, Dr. Gregorij Kurillo, Dr. Sang-Hack Jung (3D Video)
• Illinois: Dr. Zhenyu Yang, Dr. Bin Yu, Dr. Jin Liang, Dr. Yi Cui, Wanmin Wu (QoS Management)
• NAIST Japan: Dr. Morihiko Tamai, Prof. Keiichi Yasumoto (Wii Interfaces)
• UC Berkeley/Illinois: Lisa Wymore, Renata Sheppard (Dance Choreography)
Outline
• Motivation and Goals
• Challenges
• TEEVE Solution Details
  • View Concept
  • View Casting: view selection, forest-casting, BW allocation, delay-sensitive streaming
• Experimental Results
• Conclusion
Tele-immersive Environment
site 1
site 2
site 3
Photo courtesy of Prof. Ruzena Bajcsy.
Tele-Immersive Environment (TI) Abstraction
TI Application Model
[Figure: TI application model. 3D cameras 1-8 capture the scene (3D capturing), the streams are transmitted, and the receiving site performs 3D rendering for the user view.]
3D Camera Characteristics and their Challenges for Internet and Multimedia Systems
• 3D camera
  • RGB and depth yield 5 bytes per pixel
  • Frame size: 640x480 pixels
  • Real-time capturing and content creation (currently 10-15 fps, desired 30-60 fps)
  • Each 3D camera needs one PC
  • Cameras need a FireWire bus to get high throughput
• 3D multi-camera environment
  • 10-20 3D cameras per room
  • Camera spacing issue: different spacing for physical activity than for collaborative conferencing
  • 180, 300, or 360 degree camera coverage
Data Model
• 3D frame for camera i at time t: $f_{i,t}$
  • Each pixel in the frame carries color+depth data and can be independently rendered
• Stream for camera i: $VS_i = \{ f_{i,t_1}, f_{i,t_2}, \ldots \}$
• Macro-frame: $F_t = \{ f_{1,t}, f_{2,t}, \ldots, f_{n,t} \}$
• TI stream: $TI_{site}$ (e.g., $TI_{UIUC}$)
• Camera orientation: $O_i$
[Figure: streams VS1 … VSn from cameras 1 … n of TI_UIUC, with macro-frames Ft1, Ft2, Ft3 cutting across all streams at each time instant.]
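The data model can be sketched as a small set of Python types. This is only an illustration under assumptions: the class and field names (`Frame3D`, `TIStream`, `payload`) are hypothetical and do not come from the TEEVE code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Frame3D:
    """One 3D frame f_{i,t}: color+depth data from camera i at time t."""
    camera: int
    t: int
    payload: bytes = b""  # hypothetical container for the color+depth pixels


@dataclass
class TIStream:
    """TI stream of one site: the per-camera streams VS_i, keyed by camera."""
    site: str
    streams: Dict[int, List[Frame3D]] = field(default_factory=dict)

    def add(self, frame: Frame3D) -> None:
        # Append the frame to the stream VS_i of its camera.
        self.streams.setdefault(frame.camera, []).append(frame)

    def macro_frame(self, t: int) -> List[Frame3D]:
        """Macro-frame F_t: the frames of all cameras captured at time t."""
        return [f for cam in sorted(self.streams)
                for f in self.streams[cam] if f.t == t]


# Three cameras, two time instants.
ti = TIStream("UIUC")
for t in (1, 2):
    for cam in (1, 2, 3):
        ti.add(Frame3D(camera=cam, t=t))
```

Calling `ti.macro_frame(2)` returns one frame per camera, i.e. the macro-frame for that time instant.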
Implications: 4D Content Challenge
• Deal with huge TI frame size $F_t$
  • Macro-frame (e.g., 10 3D frames in one time instance)
• TI frame rate
  • Applications desire 15-60 macro-frames per second
• Huge bandwidth uncompressed
  • One 3D stream Si: 640x480 (pixels/frame) x 5 (bytes/pixel) x 15 (frames/sec) = 23,040,000 bytes/sec (approx. 23 MBytes/sec = 184 Mbps)
  • One TI stream TIroom (e.g., 10 3D streams): 230,400,000 bytes/sec (approx. 230 MBytes/sec = 1.84 Gbps)
  • If we increase the frame rate, the bandwidth explodes
• Slow 3D reconstruction, 3D modeling, 3D rendering
  • 71 ms per frame in 3D reconstruction, …
• Need 3D compression
• Need parallel processing
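The uncompressed bandwidth figures follow directly from the frame parameters. A short check of the arithmetic (the function name is hypothetical; the numbers are the ones on the slide):

```python
def stream_rate_bytes(width=640, height=480, bytes_per_pixel=5, fps=15):
    """Uncompressed rate of one 3D stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


one_stream = stream_rate_bytes()          # one 3D stream
ti_room = 10 * one_stream                 # one TI stream with 10 3D streams

one_stream_mbps = one_stream * 8 / 1e6    # megabits per second
ti_room_gbps = ti_room * 8 / 1e9          # gigabits per second

# Doubling the frame rate to 30 fps doubles the demand.
one_stream_30fps = stream_rate_bytes(fps=30)
```

Because the rate is linear in the frame rate, moving from 15 fps to the desired 30-60 fps multiplies the demand by 2 to 4.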
Multi-Party 3D Tele-Immersive System
[Figure: multi-party 3D tele-immersive system connecting sites such as UIUC and UC Berkeley. Legend: C = camera, D = display, G = service gateway.]
Network Characteristics and their Challenges on Internet and Multimedia Systems
• Heterogeneous networks / mismatch in bandwidth (LAN/WAN)
  • LANs with 100 Mbps Ethernet cards and a Gigabit Ethernet switch
  • WAN: Internet2 network with a 100 Mbps bottleneck link between UIUC and UC Berkeley
• Varying Internet2 bandwidth depending on the time of day
• TCP/IP protocol stack, used because of its large packet sizes and hence smaller context-switching overhead
• No real-time/QoS capabilities in Internet2 or in the OS running the TCP/IP stack
Implications: Timing Challenges
[Figure: LAN-WAN-LAN path across the Internet; streams 1, 2, …, N send frames Fi at times Tsend(i).]
Need multi-stream coordination
• Synchronization of multiple streams
• Synchrony at frame capturing time
• Spacing at Fj's sending time
• Jitter control at Fj's receiving time
• Skew control at rendering/display time
Deal with
• limited buffering at IP routers
• TCP/IP retransmission delays
• different end-to-end (LAN/WAN) delays
Implications: Distributed Operating System Challenges
Deal with
• Large scale of cameras/displays
  • Camera/display PCs are distributed over LAN and WAN networks
  • PCs are multi-core
• Real-time processing (3D reconstruction, rendering)
  • Consider 5-15 3D streams with a frame rate of 10 fps, rendered into one 4D stream at 10 fps
  • Run the rendering station at 50-150 fps with 640x480 pixels per frame
Need
• Configuration/coordination of various 3D/4D services in a flexible and effective manner
• Software synchronization protocols with approximately hardware accuracy
• Portable and easily deployable distributed TI/OS services
• Low-maintenance OS services
• Real-time OS services on multi-core architectures
Implications: Networking Challenges
Deal with
• Large bandwidth demand
  • One 3D stream: 23 MBytes/sec
  • One TI (e.g., 10 3D streams): 230 MBytes/sec
• Real-time communication for interactivity
  • 80-100 millisecond interactive delay
Need
• Multi-stream coordination
  • Due to large bursts at routers, need packet spacing
• Smart delay/bandwidth management over the best-effort Internet
• Synchronization of multiple streams
  • Need to send all streams within a constrained time interval so that the frame can be rendered in a timely fashion
• Dealing with scale
  • Multi-party / many-to-many multicast forest protocols (camera arrays, participants, microphone arrays, …)
TEEVE Architecture
Network Model
[Figure: TEEVE network model. At each site, 3D cameras (C) and 3D displays (D) attach to a service gateway (SG); the service gateways are connected over the Internet, together with a session controller.]
TEEVE Service Middleware
[Figure: layered architecture, top to bottom]
• Tele-immersive Application
• Service Middleware: Stream Selection, Bandwidth Allocation/View-Casting, Streaming Control
• Networking
Data Input and Output
[Figure: 3D streams 1 … N with camera orientations O1 … ON carry 3D macro-frames Ft1, Ft2, …, Ftm through the networking layer (Internet2) and the service middleware layer to the 3D renderer, which renders them for the user view Ou.]
[Figure: View-Cast control loop. Stream selection uses the user view and the camera views to produce the selected streams; bandwidth allocation uses the selected streams and the bandwidth estimation to produce a bandwidth allocation scheme; streaming control on both sides of the network delivers the streams to the renderer, which returns frame rate feedback.]
Semantic Multi-Camera Stream Management
View-based Forest Dissemination and Control
Multi-Camera Stream Selection (1)
• Determine which streams are of interest given the current user view
[Figure: the color portion of a macro-frame. Photo courtesy of Sang-Hack Jung.]
Multi-Camera Stream Selection (2)
• View-based approach
  • Free view selection is a key feature of 3D video
  • User selects view
  • Reasonable to use the user view as the major QoS control feedback
• Adaptation based on the contribution of each 3D stream to the current user view
• Metric: contribution factor (CF) based on the angle between the camera and the user view
View-aware Stream Differentiation (3)
[Figure: 3D cameras 1-8 capture and transmit streams for 3D rendering. Streams whose camera direction makes a small angle θ with the user viewing direction contribute more to the user view; the others are less important. Source: http://zing.ncsl.nist.gov/~gseidman/vrml/]
Stream Selection (4)
[Figure: user viewing direction and camera direction separated by angle θ. Source: http://zing.ncsl.nist.gov/~gseidman/vrml/]
View Model
[Figure: view model with the vectors s.fov (stream s) and v.w (view v).]
View-aware Stream Differentiation
• Differentiation function: $df(s, v) = s.fov \cdot v.w$ (dot product of unit vectors)
• When s.fov and v.w are unit vectors, the dot product gives $\cos\theta$, where $\theta$ is the angle between s.fov and v.w
• Therefore, if s.fov and v.w are close, df(s, v) is close to 1; otherwise, df(s, v) goes to -1
• Select streams with $df > T$ (T being a threshold)
• of: optimization function for the selected view across multiple parties, $of(S', v) = \sum_{s \in S'} df(s, v)$
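The differentiation function and the threshold test can be sketched in a few lines. This is a minimal sketch, assuming directions are given as 3D tuples; the helper names (`unit`, `select_streams`, `of_value`) and the example camera layout are hypothetical.

```python
import math


def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vec))
    return tuple(c / norm for c in vec)


def df(s_fov, v_w):
    """Differentiation function: dot product of the two unit vectors (cos of the angle)."""
    a, b = unit(s_fov), unit(v_w)
    return sum(x * y for x, y in zip(a, b))


def select_streams(streams, v_w, threshold):
    """Keep the streams whose df exceeds the threshold T; returns {stream: df}."""
    scores = {name: df(fov, v_w) for name, fov in streams.items()}
    return {name: d for name, d in scores.items() if d > threshold}


def of_value(selected):
    """Optimization function: sum of df over the selected streams."""
    return sum(selected.values())


# Hypothetical layout: four cameras at 0, 45, 90 and 180 degrees from the view.
cameras = {
    "S1": (1.0, 0.0, 0.0),
    "S2": (1.0, 1.0, 0.0),
    "S3": (0.0, 1.0, 0.0),
    "S4": (-1.0, 0.0, 0.0),
}
selected = select_streams(cameras, v_w=(1.0, 0.0, 0.0), threshold=0.5)
```

With T = 0.5, only the cameras within 60 degrees of the viewing direction survive, here S1 and S2.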
Differentiating Factor Metric (df)
[Figure: 3D cameras 1-8 capture and transmit streams for 3D rendering at a user/node with view request v.w. Streams S4 and S5, with directions s.fov(S4) and s.fov(S5), contribute more to the user view; the other streams are less important.]
$df(S_4, v) = s.fov(S_4) \cdot v.w$
$df(S_5, v) = s.fov(S_5) \cdot v.w$
$of(S, v) = df(S_4, v) + df(S_5, v)$
ViewCast (VC) – Problem Formulation
• Minimum Quality Problem
  • Satisfy a minimum $of(R_i, v_i)$ for every $v_i \in V$
  • Subject to system constraints
  • NP-hard
• Maximum Quality Problem
  • Maximize $\sum_i of(R_i, v_i)$
  • Subject to system constraints
  • NP-hard
VC – General Guideline
• Minimum quality guarantee: each renderer should receive a minimum set of streams if possible
• View change resilience: when a user's view changes, the impact on other affected nodes should be minimized
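Why is view change a problem? (Research: ongoing)
[Figure: session controller and nodes U2, U3, U4 with views U2.w, U3.w, U4.w over nodes V1-V4, carrying streams S2, S3, S4, S5, S7, S8; when U3 changes its view U3.w, an affected node is marked as the victim.]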
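Bandwidth Allocation (1)
[Figure: stream selection picks streams out of VS1 … VSN; bandwidth allocation then assigns each selected stream, with frame size fs, an allocated size (A1 … Am) under the target macro-frame size TFS, which determines quality.]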
Bandwidth Allocation (2)
• Frame size allocation
  • TFS: target macro-frame size estimated by the underlying streaming control
  • Given macro-frame $F_t$ and TFS, allocate the frame size ($A_i$) for each frame ($f_{i,t}$)
  • Priority scheme based on CF
Bandwidth Allocation (3)
• Frame size allocation
  • $A_i$ is allocated in descending order of CF
  • fs: frame size of a 3D frame
  • [Equation: two-case (if/else) allocation rule for $A_i$]
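One way to picture a priority scheme that allocates in descending order of CF is a greedy pass over the frames. The rule used here is an assumption for illustration only, not the slide's formula: each frame receives its full size fs while the TFS budget lasts, and the first frame that does not fit receives whatever remains. The function name and the example values are hypothetical.

```python
def allocate_frame_sizes(frames, tfs):
    """Greedy allocation sketch.

    frames: list of (stream_id, cf, fs) tuples
    tfs:    target macro-frame size (same unit as fs)
    Returns {stream_id: A_i}.
    """
    remaining = tfs
    allocation = {}
    # Visit frames from highest to lowest contribution factor.
    for stream_id, cf, fs in sorted(frames, key=lambda f: f[1], reverse=True):
        a_i = min(fs, remaining)   # assumed rule: full size if it fits, else the rest
        allocation[stream_id] = a_i
        remaining -= a_i
    return allocation


# Hypothetical macro-frame: three frames of 100 units each, budget of 250.
frames = [("VS1", 0.9, 100), ("VS2", 0.5, 100), ("VS3", 0.7, 100)]
allocation = allocate_frame_sizes(frames, tfs=250)
```

Under this assumed rule the two highest-CF streams keep their full size and the lowest-CF stream absorbs the shortfall.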
Streaming Control (1)
• Basic end-to-end protocol
• Sender
  • receives the bandwidth allocation
  • generates token = ((VS1, A1), (VS2, A2), …, (VSN, AN))
  • passes the token along the camera clusters
• Receiver
  • maintains a periodic timer for rendering at a fixed interval
  • monitors the deadline of each macro-frame
  • when the last frame of the current macro-frame is received at time T, calculates the difference between T and the deadline
  • sends the difference back to the sender
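The two ends of the protocol can be sketched as a pair of small functions. This is a minimal sketch with hypothetical names; it assumes the feedback is computed as deadline minus T, so a positive value means the macro-frame arrived early.

```python
def make_token(allocation):
    """Sender side: token = ((VS1, A1), (VS2, A2), ..., (VSN, AN))."""
    return tuple(allocation.items())


def deadline_feedback(deadline, arrival_times):
    """Receiver side: difference between the deadline and T.

    T is the time at which the last frame of the macro-frame was received.
    """
    t_last = max(arrival_times)
    return deadline - t_last


# Hypothetical values, times in milliseconds.
token = make_token({"VS1": 100, "VS2": 50})
early = deadline_feedback(deadline=100.0, arrival_times=[70.0, 85.0, 92.0])
late = deadline_feedback(deadline=100.0, arrival_times=[70.0, 85.0, 112.0])
```

A negative value signals a missed deadline, which the sender can use to reduce the target macro-frame size.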
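[Figure: cameras 1-5 send a macro-frame (after adaptation) through the network; the receiver's streaming control compares the arrival time T with the deadline and returns e(t) to the sender's streaming control, which drives bandwidth allocation and the token.]
$e(t) = \text{deadline} - T$
$TFS_{t+1} = k_p \, e(t) + k_i \sum_{j=0}^{t} e(j)$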
Experimental Examples of Tele-immersive Environments
[Figure: TI in UIUC (Prof. Nahrstedt) and TI in UC Berkeley (Prof. Bajcsy), connected over Internet2.]
Experimental Setup (TEEVE)
• Metrics
  • Overall throughput of macro-frame Fj
  • Completion time interval for macro-frame Fj
  • Individual throughput of stream i
  • End-to-end delay of a macro-frame Fj
• Experimental parameters of the remote testbed
  • Number of sender gateways: 2
  • Number of receiver gateways: 2
  • Number of 3D streams: 12 (UIUC-UCB)
• Equipment
  • Dell Precision 450 (dual Xeon processor with 1 GByte memory), running Fedora Core 2
  • LAN: 100 Mbps; Internet2 between UCB and UIUC
Timing Performance Validation
Macro-Frame Delay (End-to-end Delay – UIUC/UCB)
Macro-frame Completion Interval at the receiver side
Sender rate of each cluster camera – 5-8fps
Continuity Test
average: 70.99 ms (frame rate: 14 Hz)
Performance of 3D Reconstruction
Resolution of each 3D frame is 320x240 pixels
Bandwidth Performance between UIUC/UCB on Internet2 during Dance Experiment
average: 44.5 Mbps
Evaluation
TFS reduces
Dancer’s Perceptual Evaluation
Experiment Setup: Simulated 3DTI Session
parameter                 value
topology                  mesh (connected)
# of vertices (n)         5, 6, …, 10
# of edges (CR)           25%, 50%, 75%, 100% (of n(n-1))
degree bound (Ii, Oi)     12, 24, 36
streams/vertex (|Si|)     8
max. streams/view         4
simulation time           200 minutes
view change interval      normal distribution (μ = 60 s, σ = 10 s)
view change pattern       (a) random walk with average degree change of 30, (b) Zipf (n = 10)
delay bound (B)           200 ms
edge delay                normal distribution (μ = 80 ms, σ = 20 ms)
Experiment Setup
• Counterpart for comparison: MC–n
  • A multicast-based scheme where n streams (n = 1, 2, 3, or 4) must be retrieved to serve a view; otherwise, the view request is rejected
  • No preemption
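Rejection Ratio
[Figure: rejection ratio for CR = 100%, Ii (Oi) = 12; curves for MC–4, MC–3, MC–2, MC–1, and VC.]
Rejection Ratio
[Figure: rejection ratio for CR = 100%, Ii (Oi) = 24; curves for MC–4, MC–3, MC–2, MC–1, and VC.]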
Streams/View
[Figure: streams per view for CR = 100%, Ii (Oi) = 24.]
• VC averages 3.2 streams per view: better than MC–3 performance, but with a 22% lower rejection ratio
Conclusion
Question: Shall we dance …. across thousands of miles?
Answer: We can waltz, but not tango yet!!!
References
Z. Yang, K. Nahrstedt, Y. Cui, B. Yu, J. Liang, S.-H. Jung, R. Bajcsy, “TEEVE: The Next Generation Architecture for Tele-immersive Environments”, IEEE International Symposium on Multimedia (ISM 2005), Irvine, CA, 2005
Z. Yang, B. Yu, K. Nahrstedt, R. Bajcsy, “A Multi-stream Adaptation Framework for Bandwidth Management in 3D Tele-Immersion”, ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2006), New York, NY, 2006
M. Tamai, W. Wu, K. Nahrstedt, K. Yasumoto, “A View Control Interface for 3D Tele-Immersive Environments”, IEEE International Conference on Multimedia and Expo (ICME 2008), Hannover, Germany, 2008
Acknowledgement
This work is being continuously supported by the National Science Foundation
Streaming Control (2)
• Bandwidth estimation
  • Lots of schemes proposed (e.g., work by Dovrolis)
  • Bandwidth estimation based on a PI controller
• (deadline − T) is used as the feedback e(t) in the PI controller to derive a suitable target frame size (i.e., TFS)
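A PI controller driven by the deadline feedback can be sketched as below. This is an illustration under assumptions, not the TEEVE implementation: it uses the textbook positional form (proportional term plus gain times the running sum of errors), the gains are arbitrary example values, and the clamping to a `[min_tfs, max_tfs]` range is an added safeguard that the slides do not mention.

```python
class PIController:
    """Derives a target macro-frame size (TFS) from e(t) = deadline - T."""

    def __init__(self, kp, ki, min_tfs, max_tfs):
        self.kp = kp              # proportional gain (hypothetical value)
        self.ki = ki              # integral gain (hypothetical value)
        self.min_tfs = min_tfs    # assumed lower bound on TFS
        self.max_tfs = max_tfs    # assumed upper bound on TFS
        self.error_sum = 0.0      # running sum of e(j)

    def update(self, error):
        """Feed one feedback value e(t); returns the next TFS."""
        self.error_sum += error
        tfs = self.kp * error + self.ki * self.error_sum
        # Keep the result inside the assumed valid range.
        return max(self.min_tfs, min(self.max_tfs, tfs))


controller = PIController(kp=1000.0, ki=500.0, min_tfs=0.0, max_tfs=1e9)
tfs_after_early = controller.update(10.0)   # macro-frame arrived 10 ms early
tfs_after_late = controller.update(-4.0)    # macro-frame arrived 4 ms late
tfs_recovering = controller.update(20.0)    # early again
```

Early arrivals (positive error) raise the target size, while late arrivals pull it down; the integral term smooths out single-frame fluctuations.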
Program Flow
1. Initializing data structures, PTG cameras, sockets
2. Establishing connections with trigger and gateway
3. Grabbing background images
4. Main grabbing and processing loop (new code using OpenMP)
   4.1. Waiting for trigger
   4.2. Grabbing images
   4.3. Pre-processing: median filter, background subtraction (new code using OpenCV functions, e.g., cvBacSub)
   4.4. Computing depth (new code computing triangulation)
   4.5. Building data packets and sending them to the gateway
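Steps 4.3 and 4.4 can be illustrated in plain Python on a single image row. This is only a sketch under stated assumptions: it does not use OpenCV or OpenMP, all function names are hypothetical, and the depth step assumes a rectified stereo pair, where depth equals focal length times baseline divided by disparity. The actual camera geometry is not given on the slide.

```python
from statistics import median


def median_filter_1d(row, k=3):
    """Median filter over one row of pixel values, replicating edge pixels."""
    half = k // 2
    padded = [row[0]] * half + list(row) + [row[-1]] * half
    return [median(padded[i:i + k]) for i in range(len(row))]


def foreground_mask(frame, background, threshold):
    """Background subtraction: True where a pixel differs from the background."""
    return [abs(p - b) > threshold for p, b in zip(frame, background)]


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation for a rectified stereo pair (assumed geometry)."""
    if disparity_px <= 0:
        return None  # no valid match for this pixel
    return focal_px * baseline_m / disparity_px


# Hypothetical values: one noisy row, a flat background, one disparity sample.
filtered = median_filter_1d([10, 200, 12, 11, 13])
mask = foreground_mask([10, 50, 12], [10, 10, 10], threshold=5)
depth_m = depth_from_disparity(disparity_px=40, focal_px=800, baseline_m=0.1)
```

The median filter removes the single-pixel spike (200) before subtraction, which is why it precedes the background step in the flow.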