Master's Thesis WS 2011/2012
Telecommunications Lab

Synchronized Multi-Stream Transport of Stereoscopic HDTV

Hanjo Viets

Submitted: 19th Mar 2012
Reviewer: Prof. Dr.-Ing. Thorsten Herfet
2nd Reviewer: Prof. Dr. Antonio Krüger
Advisors: Manuel Gorius, Goran Petrovic
Saarland University
Faculty of Natural Sciences and Technology I
Department of Computer Science
Universität des Saarlandes
Postfach 15 11 50, 66041 Saarbrücken
UNIVERSITÄT DES SAARLANDES
Lehrstuhl für Nachrichtentechnik
FR Informatik
Prof. Dr. Th. Herfet
Universität des Saarlandes Campus Saarbrücken C6 3, 10. OG 66123 Saarbrücken
Telefon (0681) 302-6541 Telefax (0681) 302-6542
www.nt.uni-saarland.de
Master’s Thesis for Hanjo Viets
Synchronized Multi-Stream Transport of Stereoscopic HDTV
3D-HDTV is considered to be the next milestone in the evolution of digital video transmission and might only be the beginning of the era of multi-view video services. Therefore, the efficient transport and delivery of such media services over Internet Protocol is an active research topic. As a part of the ITC3D1 project, an efficient delivery and rendering system for 3D-HDTV is being designed and developed.
Flexibility is desirable in transport and delivery strategies for stereoscopic IPTV in order to keep the service distribution scalable. For instance, bottlenecks in the network bandwidth might require the service to be downgraded to a single view. On the other hand, it depends on the userʼs preference whether content should be received in 3D. Obviously there is a strong requirement for seamless coexistence of 2D and 3D services.
In order to benefit from the efficiency of IP multicast, the ITC3D project considers the independent transport of both components of a stereoscopic video stream as well as their potential delivery over multiple network routes. Since route diversity leads to different propagation delays, proper synchronization is required at the receiver.
The scope of this master's thesis is the design of a synchronized and backward-compatible 3D video transport scheme. In particular, the tasks to be solved are the following:
• Evaluate available network synchronization protocols with respect to their applicability to synchronized streaming servers.
• Design and implement a suitable synchronization architecture for component-wise streaming and reception of stereoscopic video streams over multiple network routes.
• Design and implement a transport and multiplexing scheme for the backward-compatible delivery of stereoscopic video.
• Demonstrate the solution in a suitable environment. The video transport scheme should rely on the Predictably Reliable Real-time Transport Protocol (PRRT2).
Tutor: Manuel Gorius, Goran Petrovic
Supervisor: Prof. Dr.-Ing. Th. Herfet
1 http://www.intel-vci.uni-saarland.de/en/projects/internet-transport-and-coding-for-3d-hd-tv.html
2 http://www.nt.uni-saarland.de/projects/prrt
Eidesstattliche Erklärung
Ich erkläre hiermit an Eides Statt, dass ich die vorliegende Arbeit selbstständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe.
Statement in Lieu of an Oath
I hereby confirm that I have written this thesis on my own and that I have not used any other media or materials than the ones referred to in this thesis.
Einverständniserklärung
Ich bin damit einverstanden, dass meine (bestandene) Arbeit in beiden Versionen in die Bibliothek der Informatik aufgenommen und damit veröffentlicht wird.

Declaration of Consent
I agree to make both versions of my thesis (with a passing grade) accessible to the public by having them added to the library of the Computer Science Department.

Saarbrücken, …………… (Datum / Date)   …………… (Unterschrift / Signature)
Eidesstattliche Erklärung
Ich erkläre hiermit an Eides Statt, dass die vorliegende Arbeit mit der elektronischen Version übereinstimmt.
Statement in Lieu of an Oath
I hereby confirm the congruence of the contents of the printed data and the electronic version of the thesis.

Saarbrücken, …………… (Datum / Date)   …………… (Unterschrift / Signature)
Abstract
Current technologies for IP-based 3D-TV neither use media-aware transport protocols that implement error correction and timely delivery, nor are they backward-compatible in an efficient manner. A protocol is therefore needed that provides Predictable Reliability under Predictable Delay (PRPD) on the one hand, and on the other is backward compatible in a way that does not waste bandwidth. To achieve this, multicast is needed, and server and client synchronization is essential. This thesis presents a solution in which each view is sent by its own server, in order to serve geographic demand in mesh networking. PRRT is the network protocol providing PRPD and multicast. The video codec MVC provides good compression and backward compatibility, MPEG2-TS handles synchronization of the receivers, and NTP synchronizes the server clocks. Base view and enhancement view are transmitted from different servers over different routes and remultiplexed at the receiver side into a stereoscopic 3D video, without loss of resolution.
Contents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Basic Concept 7
2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Essential Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Network Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.3 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Proposed Method 28
3.1 Transport Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Application Layer: Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Application Layer: Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Time analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.2 TS analysis & streaming . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.3 Two Server TS streaming . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.4 Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.5 MVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.6 MVC-Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Experimental Evaluation 42
4.1 Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 TS Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Two Server TS Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 MVC Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.5 Nonuniform Round Trip Times . . . . . . . . . . . . . . . . . . . . . . . 52
5 Conclusion 58
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix 60
A References 60
B List of Figures 63
C List of Tables 64
D List of Listings 64
E CD 65
1 Introduction
Three-dimensional television is television enhanced by one or more additional views to create an illusion of depth for the viewer. To create this illusion, multi-view video is used to deliver the same scene from two or more perspectives. Three-dimensional television can be transported over the same channels as classical television. One way is via the broadcast media DVB-T, DVB-C or DVB-S, the digital transmission methods for terrestrial, cable and satellite communication. Another way is the Internet. On the Internet the predominant transmission mode is unicast, but multicast is also used. Television over IP (IPTV) uses multicast streaming: receivers subscribe to a channel, which means a constant push transmission of video data from a server to its subscribers.
This thesis evaluates the possibilities of creating a 2D-backward-compatible 3D television service on IP-based networks using a protocol that offers Predictable Reliability under Predictable Delay (PRPD) and implements error correction, avoiding the unnecessary bandwidth consumption or quality loss introduced by lossy network paths such as a wireless connection at the receiver's side. Backward compatibility saves bandwidth in the sense that the base view only has to be sent once to all receivers, 2D and 3D alike; for a 3D receiver, only the enhancement view has to be sent additionally. After evaluating available protocols and techniques, the most promising setup is implemented and evaluated in detail. The goal is to set up a multicast distribution network where each receiver can subscribe to either one or two camera perspectives, depending on what it can process (one for 2D, two for plano-stereoscopic1 3D). Each server delivers a single view using multicast. To achieve synchronicity, all servers must share a common time. NTP and PTP (IEEE 1588) are two possible solutions to this problem. The MVC standard can be used to achieve a higher compression efficiency compared to independently encoding the views.
1 In plano-stereoscopic 3D, two cameras at the distance of the human eyes capture a scene twice to create the illusion of depth for the viewer. The viewpoint is static; the viewer cannot choose to watch the scene from another angle. From now on this is simply referred to as “stereo”.
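The clock synchronization that NTP performs can be summarized by its core offset computation from RFC 5905; the following is an illustrative sketch, not part of the thesis implementation.

```python
# Classic NTP offset/delay estimation from one request/response exchange.
# t1: client send time, t2: server receive time,
# t3: server send time, t4: client receive time (t1/t4 on the client clock).

def ntp_offset_delay(t1, t2, t3, t4):
    # Offset of the server clock relative to the client clock,
    # assuming symmetric network paths.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip delay excluding the server's processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

For example, a server clock running 14.5 s ahead with 1 s of round-trip delay yields `ntp_offset_delay(0, 15, 16, 2) == (14.5, 1)`.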
A solution that transmits the two views of stereo television separately has multiple advantages. One advantage is that it saves bandwidth for those users who cannot or do not want to watch 3D television, by omitting the enhancement view without having to create two versions of the same television channel, one in 2D and one in 3D. Another advantage is that providers can create a flexible billing system for 2D and 3D television, where upgrading from 2D to 3D is easy. It is also beneficial for the IPTV infrastructure, because it can better adapt to serve geographic demand in mesh networking.
1.1 Background
The following list briefly describes some protocols and techniques that are currently
used in 3D video transmission and are important references for better understanding.
This list is not exhaustive and the protocols and techniques are not described in full
detail, but it should help in following the author’s thoughts.
Forward Error Correction (FEC)
Forward Error Correction is used in DVB-T, -C and -S because it allows receivers to correct errors that occurred during transmission. As retransmission of corrupted data packets is not feasible in broadcasting, FEC is a way to compensate for transmission errors. It is also useful for the suggested approach, as multicast transmissions profit from FEC in the same way that broadcast transmissions do.
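As a minimal illustration of the idea behind packet-level FEC, the sketch below uses a single XOR parity packet over k data packets; this is a deliberate simplification of the erasure codes used in practice, not the thesis implementation.

```python
# One XOR parity packet over k equal-size data packets: any single
# lost packet can be rebuilt from the remaining k-1 plus the parity.

def xor_parity(packets):
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)

def recover_single_loss(received, parity):
    # `received` holds the data packets in order, with the lost one as None.
    present = [pkt for pkt in received if pkt is not None]
    rebuilt = xor_parity(present + [parity])
    return [pkt if pkt is not None else rebuilt for pkt in received]
```

With a Reed-Solomon-style optimal erasure code the same principle generalizes to recovering up to n-k losses from n transmitted packets.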
Automatic Repeat reQuest (ARQ)
ARQ is an error correction mechanism that retransmits lost or corrupted data, triggered either by missing positive acknowledgements (ACKs) or by explicit negative acknowledgements (NACKs). In contrast to FEC, it usually does not make sense to use ARQ for broadcast or multicast transmission, because the number of ACKs or NACKs and the required retransmissions would exceed the server's resources. This of course depends on the size of the multicast group: in a small multicast group with only a few receivers and a network with very little packet loss or corruption, ARQ can be more efficient than FEC.
Transmission Control Protocol (TCP)
TCP is one of the two most used protocols on top of IP. It is connection-oriented and reliable and guarantees ordered delivery. Currently it carries most video transmission on the Internet, because its reliability is ideal for video-on-demand websites like YouTube or movie stores like iTunes, which have to serve users across a heterogeneous landscape of Internet access speed and quality. It is well suited when on-time delivery is not important.
User Datagram Protocol (UDP)
UDP is the other of the two most used protocols on top of IP. It is connectionless and unreliable and does not guarantee ordered delivery. It allows multicast and broadcast. Currently it is used for the IPTV services some Internet providers offer their customers, and for live transmission of important events on the Internet such as the World Cup. UDP can be used for on-time delivery, but it remains unreliable.
Stream Control Transmission Protocol (SCTP)
SCTP is a connection-oriented, reliable transport protocol with sequenced delivery.
SCTP can aggregate different paths. Unlike TCP it allows the sequenced delivery
to be discontinued, e.g. when a packet is lost. Then the delivered packets will
not be delayed until the replacement has arrived. This is useful for Voice over IP,
which it has been designed for. SCTP is currently not used in any notable video
transmission technique on the Internet. SCTP is listed for the sake of completeness.
Datagram Congestion Control Protocol (DCCP)
DCCP on the other hand is (like UDP) a real-time datagram service, but it is
connection-oriented and adds congestion control. Unlike SCTP it does not offer
sequenced delivery. DCCP is currently also not used in any notable video trans-
mission technique on the Internet. DCCP is listed for the sake of completeness.
MPEG2 Transport Stream (TS)
The MPEG2 Transport Stream is defined in [13]. It encapsulates audio, video and other services important to TV into a constant stream of data consisting of packets of equal size (188 bytes) to allow efficient transmission. Its timestamps are important for clock synchronization and playback speed. It is used in DVB-T, -C and -S as well as in IPTV. MPEG2-TS is used in the implementation of this thesis.
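To give an impression of the format, the following sketch parses the fixed 4-byte TS packet header as specified in ISO/IEC 13818-1; it is a simplified illustration, not the thesis code.

```python
TS_PACKET_SIZE = 188  # every TS packet is exactly 188 bytes
SYNC_BYTE = 0x47      # first byte of every TS packet

def parse_ts_header(packet):
    """Extract PID, payload_unit_start and continuity counter from a TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    payload_unit_start = bool(packet[1] & 0x40)  # a new PES packet begins here
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit packet identifier
    continuity_counter = packet[3] & 0x0F        # detects lost packets per PID
    return pid, payload_unit_start, continuity_counter
```

The PID selects the elementary stream a packet belongs to, and gaps in the 4-bit continuity counter reveal packet loss within one PID.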
Real-Time Transport Protocol (RTP)
RTP is a packet format that is likewise used to encapsulate audio, video and other TV-related services into a constant stream of data. It offers sequence numbers to restore a total order, as well as timestamps. In addition it can provide services such as monitoring Quality of Service (QoS) with the help of the RTP Control Protocol (RTCP). It is sometimes used in IPTV, but it is not used in this thesis, as the features it offers are either not necessary or already implemented in the other protocols used. Since the jitter and reordering common on IP networks do not exist on DVB-T, -C or -S, RTP is not used there.
Advanced Video Coding (AVC)
AVC is a standard for video compression and, as of today, one of the most used video codecs. It uses independently decodable frames (I-frames), frames that require previous frames (P-frames) and frames that require previous and following frames (B-frames) for decoding. AVC is often used for IPTV and video-on-demand, on many mobile devices and in DVB. Many graphics cards have hardware support for decoding AVC. AVC is used in this work for compatibility reasons.
Multiview Video Coding (MVC)
MVC is an addition to AVC which allows multiview (3D) video to be encoded
more efficiently. The views are stored inside the same stream. The frames of
the enhancement views can also depend on frames of different views which saves
bandwidth, because nearby views are generally alike. This is explained in more
detail in Section 2.2.2.2. Next to AVC, MVC is used in the implementation to
transmit multi-view video.
Network Abstraction Layer (NAL-) Unit
A NAL-Unit is the smallest unit inside an AVC/MVC bitstream. The NAL-Unit header carries information about which view perspective the payload belongs to, and can therefore be used to decide on which channel the unit should be sent. This is explained in more detail in Section 2.2.2.2.
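As an illustration, the sketch below extracts the view perspective from a NAL unit following the H.264/AVC Annex H syntax; the exact bit layout stated in the comment is an assumption made for this example and is not taken from the thesis implementation.

```python
def mvc_view_id(nal):
    """Return the view_id of an MVC NAL unit, or 0 for base-view units.

    Assumed layout of the 3-byte MVC extension after the 1-byte NAL header:
    svc_extension_flag(1) non_idr_flag(1) priority_id(6) | view_id(10)
    temporal_id(3) anchor_pic_flag(1) inter_view_flag(1) reserved_one_bit(1).
    """
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type not in (14, 20):  # prefix / coded slice extension
        return 0  # plain AVC NAL unit: belongs to the base view
    # view_id = all 8 bits of the third byte plus the top 2 bits of the fourth
    return (nal[2] << 2) | (nal[3] >> 6)
```

A demultiplexer along these lines can route base-view NAL units to one stream and enhancement-view units to another.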
Service Compatible Mode
In 3D video transmission, service compatible mode means that the 2D component of the video can be played by devices that do not support 3D playback. The 3D enhancement view is coded such that it does not appear on a 2D player. Modern 3D Blu-ray movies use this technique. Service compatible mode is implemented in MVC, which is used in this thesis.
Frame Compatible Mode
In 3D video transmission, frame compatible mode means that the 3D video is "squeezed" into a 2D video: the two view perspectives are transmitted in a single 2D video stream, either side-by-side, top-bottom or aligned by some other means. This method loses video resolution and is used for 3D broadcast television because existing transmission protocols can be reused. This work, however, uses service compatible mode.
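The resolution loss of frame-compatible mode can be made concrete with a small illustrative calculation (hypothetical helper, not thesis code):

```python
def side_by_side_crop(frame_width, frame_height):
    """Crop rectangles (x, y, w, h) for the two eyes of a side-by-side frame."""
    half = frame_width // 2
    left_eye = (0, 0, half, frame_height)
    right_eye = (half, 0, half, frame_height)
    return left_eye, right_eye
```

A 1920x1080 coded frame thus yields two 960x1080 sub-images that the display must stretch back to full width, halving the horizontal resolution per eye.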
1.2 State of the Art
At the time of writing, few 3D IPTV services are commercially available.

A big provider of IPTV is Deutsche Telekom AG (German Telecom), which offers its landline Internet customers a paid IP television service called “Entertain”2, provided their bandwidth is large enough. Entertain uses DVB-IPI with RTP over multicast for public channels3; private channels are transmitted using a proprietary protocol. Since August 2011, a 3D channel has also been available4. As Telekom is Internet and service provider at the same time, it can prioritize the multimedia data on the Internet connection. This is done by creating two VLANs, one for Entertain and one for everything else5.
2 More information at http://www.telekom.de/entertain
3 From http://www.ard-digital.de/Empfang--Technik/IPTV/Software-Download/Software-Download
4 See http://www.netzwelt.de/news/87712-entertain-telekom-startet-3d-tv-ab-august.html
5 Described in http://www2.lancom.de/kb.nsf/1275/637DAC48142CA0F8C1257646004D6848?OpenDocument
As described in 6 and 7, Japan's national public broadcasting organization NHK (Nippon Hoso Kyokai, Japan Broadcasting Corporation) has set up a hybrid broadcasting solution where the primary 2D view is transmitted using classical broadcast technologies like DVB-C or DVB-S and the enhancement view is transmitted over an IP connection. The exact technology behind it remains undocumented; however, the NHK engineers' reason for creating this solution was to transmit both views in full resolution, in contrast to the approach currently in use, where only half the resolution is available for each eye.
6 See http://www.boljour.de/2011/05/broadcast-und-internet-in-3d-vereint/
7 See http://techon.nikkeibp.co.jp/english/NEWS_EN/20110526/192128/
2 Basic Concept
In this section the main problem will be stated, the complete system architecture will
be presented and the underlying techniques for the network protocol, the multiplexing
and the synchronization will be explained and compared in detail.
2.1 Problem Statement
For various reasons, explained in more detail in this section, the currently available approaches to transmitting 3DTV over IP have some disadvantages. 3DTV can, for example, be transmitted using existing IPTV techniques such as DVB-IPTV, which is a multicast solution, or on YouTube, which is a unicast solution providing video-on-demand.
There are two major problems that current solutions do not address. First, they do not really deal with backward compatibility: when frame compatible mode is used, for example, the video can still be played by old receivers, but with noteworthy constraints. Either the viewer has to live with a divided view if his receiver does not understand the proper upscaling, or, if it does the upscaling, the resolution is diminished. Second, current solutions either use unicast TCP (when HTTP is used), which results in a great deal of unnecessary bandwidth usage, or multicast UDP (often together with RTP, e.g. DVB-IPTV), which leads to problems with packet loss or corruption. Examples of unicast HTTP/TCP are YouTube as a video hosting platform and iTunes as a video-on-demand (VoD) service; VoD services are, by their nature, always unicast. An example of multicast UDP is the Entertain IPTV service by Deutsche Telekom.
Therefore a new technique for transmitting 3DTV over IP is needed that offers backward compatibility to devices that can only present 2D video, as well as transmission with Predictable Reliability under Predictable Delay (PRPD). Predictable Delay means that every data packet reaches its destination after some fixed amount of time with no or negligibly small jitter. Predictable Reliability means that at most a certain fraction of packets gets lost or cannot be delivered on time. The aim of a protocol that implements PRPD is to minimize the error rate under a given delay.
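The PRPD guarantee can be phrased as a simple predicate; the parameter names below are hypothetical, chosen only for illustration.

```python
def meets_prpd(arrival_delays, d_target, p_max):
    """Check the PRPD guarantee over a trace of packets.

    arrival_delays: one delay per packet, None if the packet was lost.
    d_target: target delay; p_max: tolerated fraction of lost/late packets.
    """
    bad = sum(1 for d in arrival_delays if d is None or d > d_target)
    return bad / len(arrival_delays) <= p_max
```

A protocol implementing PRPD tunes its redundancy so that this predicate holds for the negotiated d_target and p_max.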
The technology to transmit 3DTV on IP has some noticeable drawbacks. The following
is typical for current 3D television on IP:
• Multiple views are transmitted on the same channel by a single server, which means that devices that can only render 2D video either waste bandwidth by having to receive the extended view (when service-compatible mode is used, e.g. in MVC) or cannot play the video at all.
• In cases where only a 2D transmission technique is available, 3D video is transmitted using regular 2D video coding with two frames "squeezed" into one, either side-by-side or top-bottom (frame-compatible mode, e.g. in AVC). This not only halves the spatial resolution, but 2D receivers cannot really watch the channel, at least not nicely [20]. Even when options are set that instruct the 2D receiver to scale the base view up to the whole screen, it is not guaranteed that every receiver (especially older ones) implements this correctly.
• Unfortunately, unicast is sometimes used even where broadcast-style television is transmitted, as well as for video-on-demand and download services. Unicast transmission has the advantage that reliable protocols such as TCP can be used, where the data is guaranteed to reach its destination (as in Internet TV over HTTP), but the data has to be sent n times for n receivers.
• In some scenarios multicast is already used (e.g. in IPTV), which allows the data to be sent once for n receivers; however, it is then not guaranteed that the data reaches all receivers in full. RTP often extends multicast transmission to provide some additional features such as Quality of Service (QoS) monitoring.
• TCP and UDP, the two major IP-based protocols, have disadvantages that make them hard to use for multimedia transport. Audiovisual applications have strong constraints on reliability and delivery time. TCP uses positive acknowledgements, which allows for a reliable connection, but it cannot provide a given delivery time and makes multicast transmission infeasible. In contrast, UDP allows for timely delivery but does not provide a reliable connection, because it has no acknowledgements at all. [16]
• The Stream Control Transmission Protocol (SCTP) is a connection-oriented, reliable transport protocol with sequenced delivery. It uses positive acknowledgements like TCP and therefore does not support multicast. [16]
• DCCP on the other hand is (like UDP) a real-time datagram service (without
sequenced delivery), but it adds congestion control. However it also uses positive
acknowledgements which again makes multicast infeasible. [16]
To sum up, the main disadvantages of the current technologies are ineffectual backward compatibility, which causes unnecessary bandwidth consumption or reduces the spatial resolution, and the use of non-media-aware transport protocols such as TCP or pure UDP without any extension.
A possible solution to these disadvantages is to split the 3D video into two parts, one carrying the base view only and the other the enhancement view only. The two views can then be sent via distinct multicast groups (preferably from distinct servers) to the receivers, and each receiver can decide (depending on its display and bandwidth capabilities and personal preferences) to subscribe to the base view only or to both the base and the enhancement view. In addition, a protocol can be used that offers Predictable Reliability under Predictable Delay (PRPD). Distinct servers should be used for various reasons. One reason is to distribute the load. Another is that the servers can be positioned according to geographic demand: in an area with only 2D receivers and no 3D receivers, it is not necessary to set up a secondary server for the enhancement view.
Multiple problems arise when implementing this 2D-backward-compatible 3D television service on IP-based networks. One problem is how to split the video stream into two streams and restore the same total order at the receiver. Another is how to synchronize the two servers such that they operate strictly isochronously. A third is how to transmit the video to many receivers at once in a backward-compatible and media-friendly way (i.e. with low delay) while minimizing transmission cost. These problems have not been addressed in this context so far and will be solved in the following sections.
2.1.1 System Architecture
The system to be implemented consists of two servers. One is the base server, responsible for streaming the base view; the other is the enhancement server, responsible for streaming the enhancement view. The servers need to share the same local time, so that corresponding frames of the different views are sent to the receivers at the same time. To synchronize the two servers, a common time server will be considered. Multicast is used for the transmission because it allows multiple receivers to connect without the data having to be sent multiple times. Each view has its own multicast address, so that receivers can subscribe to the view(s) they want. In addition, the system should be able to compensate for errors during transmission and provide a predictable delay, such that the two views are received contemporaneously, no matter which routes they have taken. The receivers run a service which receives either one stream (the base stream only) or both (base and enhancement) and constructs a standard MVC stream which can be displayed by any MVC- or AVC-capable player. See also Figure 2.1.
Figure 2.1: Architecture (a common time source synchronizes the base-view and enhancement-view servers, which stream over PRRT to 2D and 3D receivers; the PCR is used for receiver-side timing)
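The receiver-side subscription logic of this architecture can be sketched as follows; the multicast addresses and port handling are illustrative assumptions, not taken from the thesis implementation.

```python
import socket
import struct

BASE_GROUP = '239.0.0.1'  # illustrative multicast address for the base view
ENH_GROUP = '239.0.0.2'   # illustrative address for the enhancement view

def groups_to_join(want_3d):
    """Multicast groups this receiver subscribes to: 2D joins one, 3D both."""
    return [BASE_GROUP] + ([ENH_GROUP] if want_3d else [])

def open_receiver(port, want_3d):
    """Open a UDP socket joined to the required multicast group(s)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', port))
    for group in groups_to_join(want_3d):
        mreq = struct.pack('4s4s', socket.inet_aton(group),
                           socket.inet_aton('0.0.0.0'))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

Upgrading a receiver from 2D to 3D is then simply a matter of joining the second group.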
2.2 Essential Techniques
As explained before, this section deals with the technical background of the components that will be used in this thesis. Most of the information is extracted from international standards or has been researched by others. How these tools link together is explained in the next section.
The following techniques listed here and explained on the next pages are essential for
the realization of this project:
Network Protocol: For the transmission, a protocol is needed that is capable of multicast and at the same time allows for error correction. This predictable reliability needs to be achievable under predictable delay (PRPD). Section 2.2.1 explains a protocol that has these features.
Multiplexing: For 3D video, a codec is needed that is backward compatible and easy to demultiplex and remultiplex. A transmission format is also needed that addresses the challenges of media transmission. This is explained in Section 2.2.2.
Server Synchronization: As explained earlier, it is also very important to keep the media servers synchronous, so that corresponding frames arrive at the receivers at (almost) the same moment in time. Section 2.2.3 explains the necessary tools for this.
2.2.1 Network Protocol
Media transport places very specific demands on the transport protocol. First, it needs to support multicast, so that multiple receivers can be served at once. It also needs to minimize data loss, i.e. ensure that all the data actually reaches the receivers (reliability), and to keep the transmission delay constant to a certain degree. Of the protocols presented, UDP is the only one that allows multicast, but it has no error correction.
2.2.1.1 PRRT
In [7], a protocol called Predictably Reliable Real-time Transport (PRRT) has been developed that addresses the issues with media transport that TCP/UDP and SCTP/DCCP have8. It uses an Adaptive Hybrid Error Correction (AHEC) scheme that provides Predictable Reliability under Predictable Delay (PRPD) and combines the advantages of Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ). More information and examples on this protocol can be found in [6–9, 15, 16].
What the described protocol needs above all is predictable reliability, which is essential for quasi error free (QEF) video playback. Predictable delay is also needed for the timely reception of the video data: all receivers use a buffer to compensate for jitter, but the larger the buffer, the longer a user has to wait when tuning in or switching channels. Predictable delay gives an upper bound for the buffer.

8 More information can be found at http://www.nt.uni-saarland.de/projects/prrt/
PRRT fulfills the constraints necessary to provide an IP-based, multicast, backward-compatible 3D transmission service.
2.2.1.1.1 Functional Description
PRRT is a protocol that efficiently supports multimedia services such as three-dimensional television, but also other services such as tele-presence. Its hybrid error correction allows both proactive (FEC) and reactive (ARQ) reliability mechanisms. The architects strove for an optimal trade-off between those mechanisms while meeting the desired delay constraint [8].
This is done because pure retransmission can only guarantee partial reliability when a target delay is in effect: the number of retransmissions is limited by the round-trip time (RTT). Predictable reliability can be achieved by using an optimal ratio of proactive and reactive redundancy under the given time constraint [9].
Figure 2.2: AHEC Architecture [8, p. 2] (packet-level FEC encoder and decoder between data source and data sink, with NACK-driven parameter calculation under the target delay D_target and information length k)
Figure 2.2 shows the AHEC architecture that PRRT uses. k is the FEC information length and D_target is the desired total transmission delay. The figure also depicts that the ARQ uses negative acknowledgements (NACKs) and that the FEC is adaptive. Lost or erroneous data packets can be corrected by FEC independently of the connection's round-trip time (RTT), or flagged as lost explicitly via the ARQ's NACKs.
In addition to the information length k, the FEC code takes a parity distribution vector N_p as an argument. The parity packets are split into portions of size N_p[w], and each parity portion is transmitted in its corresponding transmission cycle w:

N_p = (N_p[0], N_p[1], ..., N_p[r])

This vector defines “the portions of parity for r+1 cycles, where N_p[0] refers to the initial transmission immediately after the k data packets. The portions for 1 ≤ w ≤ r are sent upon receiving a negative acknowledgment (NACK) from the receiver, indicating that additional parity is required to recover lost data packets. The properties of optimal erasure codes ensure the recovery of all data packets as soon as arbitrary k packets from the same block are available at the receiver.” ([8, p. 3])
After the encoding, the data and Np[0] parity packets are sent in advance. Note that
Np[0] can also be zero, in which case no parity packets are sent in the first round. Parity
packets for the upcoming rounds need to be stored at the sender in a local packet cache,
depending on Dtarget. The receiver can check for completeness using sequence number
gap detection. Received parity packets can be used to recover lost data; however, if the
available parity is insufficient, a NACK can be sent to request more parity, at most r
times. These NACK messages not only trigger more parity to be sent by the sender,
but are also used, together with the RTT, to adapt the AHEC parameters to better suit
the network conditions [8]. The packet erasure rate Pe, the number of receivers R in a
multicast group and the average source packet interval Ts (or the data rate, depending on
the packet size) are also used to derive the parameters [9].
Experimental data on optimal values for different network transmissions can be seen in
Table 1 in [8, p. 4]. A key conclusion of those experiments, however, was that the optimal
parameters are strongly RTT dependent.
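The RTT dependence can be illustrated with a back-of-the-envelope calculation: each NACK-triggered retransmission cycle costs roughly one round-trip time, so the delay budget Dtarget bounds the number of reactive rounds that are usable. The following sketch is illustrative only and is not the actual AHEC parameter calculation from [9]; the function name and the simplification are assumptions.

```python
def max_reactive_rounds(d_target_ms: float, rtt_ms: float) -> int:
    """Illustrative simplification: every NACK-triggered retransmission
    cycle costs roughly one RTT, so at most floor(Dtarget / RTT) reactive
    rounds fit into the delay budget."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return max(0, int(d_target_ms // rtt_ms))
```

With Dtarget = 100 ms, this yields five reactive rounds at a 20 ms RTT but none at a 150 ms RTT, mirroring the observation that the optimal parameters are strongly RTT dependent.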
2.2.2 Multiplexing
For this project there exist two noteworthy media types: Stereo video (also known as
3D video or 3DV) and free viewpoint video (FVV). Multi-view video coding (MVC) is
used for wrapping the 3D video stream.
2.2.2.1 3DV / FVV
3DV is what is commonly referred to as 3D television these days. Two cameras placed
roughly an eye distance apart capture the scene to create the illusion of depth for the
viewer. However, the viewpoint is static; the viewer cannot choose to watch the scene from
another angle. In 3DV there is one base view, which is a common 2D video stream, and an
enhancement view, which is predicted not only temporally but also across views. Instead of
storing two views (base and enhancement) it is also possible to store one view (base) and a
depth or disparity map (enhancement). [1, 20, 21]
In the first solution the video for the left eye is compressed as it would be for conventional
2D video. The frames of the video for the right eye are then compressed with
reference not only to preceding and succeeding frames, but also to the corresponding
frames of the left-eye video, because the two are very similar. In the second solution,
a video-plus-depth format is used: the video for the left eye is again compressed
as for 2D video, and an additional depth map is stored, i.e. a grayscale video
in which dark means close and white means far away (or the other
way around). An advantage of this over the conventional approach is that the disparity
can be altered at will. The depth map can again be compressed using conventional codecs
[21, 23].
Figure 2.3: Top: Base and Enhancement View; Bottom: Base View and Depth Map⁹
Free viewpoint video, on the other hand, enables viewers to choose from where they
want to view the scene. Many cameras are needed to shoot a scene in FVV. The viewer
can then choose a viewpoint, either by selecting which two streams (one for each eye)
to decode or, if the device is capable of displaying more streams, by moving. In the
first case it is problematic to decode views when the base view is not included, because
it is always needed for decoding any enhancement view. If the desired enhancement
views are too far apart, the intermediate enhancement views are also needed. Therefore
in FVV it is not possible (without extra work) to receive an arbitrary subset of channels.
When using depth maps for FVV one also has to consider that at some point there is
not enough picture information available to reconstruct the image. Therefore, when using
depth maps, it might be necessary to insert additional enhancement or base views of the
scene. [1, 20, 21]
⁹Sources: http://www.flickr.com/photos/57605784@N06/5437070989/ and http://mangtrithuc.net/173566-Chuy%E1%BB%83n-t%E1%BB%AB-2D-sang-3D-:-nh%E1%BA%ADn-%C4%91%C3%A0o-t%E1%BA%A1o-v%C3%A0-chuy%E1%BB%83n-giao-c%C3%B4ng-ngh%E1%BB%87.html
Figure 2.4: Free viewpoint video, interactive selection of virtual viewpoint and viewing direction within practical limits [21, p. 319]
2.2.2.2 MVC / AVC
Multi-view video coding (MVC) is a way to encode “a set of N temporally synchro-
nized video streams” [21, p. 314]. 3DV means N = 2, whereas FVV has N > 2. MVC is a
new profile of the H.264/AVC standard. It uses not only inter-frame prediction, but also
inter-view prediction. It is backward compatible with AVC through its base view; in other
words, a device that understands AVC but not MVC can still play the (2D) base view. The
other views are called enhancement views. The media data in H.264/AVC is organized in
so-called network abstraction layer (NAL) units. For the enhancement views in MVC a new
NAL unit type is used, which is simply ignored by pure AVC decoders. The view (base or
enhancement) can be identified by the corresponding NAL unit header [1, 20]. The NAL
unit header information can therefore be used to demultiplex the MVC stream; see also
Figure 2.5.
Figure 2.5: MVC NAL unit interface [20, p. 673]
A challenge when using FVV is the complex prediction structure used in MVC. When
3DV is used, frames from the base view only depend on other frames of the base view,
and frames from the single enhancement view depend on frames from the base view or
from the enhancement view itself. This can be seen in Figure 2.6.
Figure 2.6: Typical MVC prediction structure for 3DV [1, 24]
However, when it comes to FVV, the prediction structure is more complex: frames
from an enhancement view do not depend solely on frames from the base view and the
same enhancement view, but also on frames from other (multiple) enhancement views.
This can be seen in Figure 2.7.
Figure 2.7: Typical MVC prediction structure for FVV [1, p. 3]
2.2.2.3 MPEG2-TS
MPEG-2 Transport Stream (TS) is the common transmission format for DVB systems.
It is not only used in DVB-T/-T2 (terrestrial), -C/-C2 (cable) and -S/-S2 (satellite)
television, but also in DVB-IPTV.
An MPEG transport stream multiplexes multiple packetized elementary streams (PES).
These can be, for instance, a video and an audio stream, but also several of each. The
TS deals with synchronization in two ways: first, it makes sure that corresponding audio
and video streams are played out concurrently to achieve “lip-sync”; second, it makes
sure that the stream is played back at the same speed at which the sender emits it, to
avoid buffer overflow or underflow. Two types of timestamps are provided by an MPEG
TS. One is the presentation timestamp (PTS), which is used for synchronization between
two PESs; it defines the exact moment when an audio or video frame should be played.
The other is the decoding timestamp (DTS). It is used to restore the original order inside
a PES and thus ensures that the buffer neither fills up nor runs empty.
As the local clocks in sender and receiver usually do not run at exactly the same speed,
it is necessary to synchronize them as well to avoid buffer over- or underruns. These drifts
are corrected via the program clock reference (PCR), which is also carried in the TS: at
periodic intervals a timestamp consisting of a 90 kHz base (33 bits) and a 27 MHz
extension (9 bits) is included in the stream. The 9-bit extension, resolved at 27 MHz,
refines the 33-bit base resolved at 90 kHz; since its resolution is higher, the extension
provides additional precision between two base timestamps. [12, 13]
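The combination of base and extension can be expressed numerically: since 27 MHz / 90 kHz = 300, the extension counts from 0 to 299 between two base ticks. A minimal sketch (the function name is an assumption, not taken from any implementation):

```python
def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """Combine the 33-bit 90 kHz base and the 9-bit 27 MHz extension
    into a single 27 MHz tick count, then convert to seconds."""
    ticks_27mhz = pcr_base * 300 + pcr_ext  # 27 MHz / 90 kHz = 300
    return ticks_27mhz / 27_000_000.0
```

For example, a base of 90000 with a zero extension corresponds to exactly one second of stream time.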
ISO/IEC 14496-10:2004 [14, Annex B] describes how to code NAL-Units in a
bytestream:
The first NAL-Unit in the bytestream is preceded by the four-byte sequence
0x00000001. Every other NAL-Unit is preceded by the three-byte sequence
0x000001. In addition, each NAL-Unit can be followed by one or more null bytes (0x00).
As explained before, the NAL-Unit header is one byte, extended by three additional
octets for MVC NAL-Units. The first byte of the header consists of a zero bit (a bit
which is always zero); the next two bits indicate whether the data in the NAL-Unit is
used by upcoming frames or whether it can be discarded after decoding. The five remaining
bits of the one-byte NAL-Unit header are the NAL-Unit type. A value of 14 or 20 indicates
an extended header for MVC. That means that whenever a NAL-Unit header has one of
those two types, that NAL-Unit can be ignored when only 2D playback is
desired.
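The header layout just described can be decoded with a few bit operations. The following sketch (function names are assumptions) parses only the one-byte AVC header; the three additional MVC octets are left out:

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL-Unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must always be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,          # 0 => discardable after decoding
        "nal_unit_type": first_byte & 0x1F,              # 14 / 20 => MVC extension follows
    }

def is_mvc_nal(first_byte: int) -> bool:
    """Types 14 and 20 carry the MVC header extension; a receiver that
    only needs the 2D base view can skip these NAL-Units entirely."""
    return parse_nal_header(first_byte)["nal_unit_type"] in (14, 20)
```

This predicate is exactly the filter needed for 2D playback: drop every NAL-Unit for which it returns true.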
The resulting bytestream is an AVC/MVC Elementary Stream (ES). It needs to be
packetized and encapsulated once again into a Packetized Elementary Stream (PES) as
described in ISO/IEC 13818-1:2000 [13, Section 2.4.3.7]:
Figure 2.8: NAL-Units in the Elementary Stream in the Packetized Elementary Stream in the Transport Stream
The header of a PES packet also begins with the three-byte sequence 0x000001, which is
followed by a one-byte stream id. The next two bytes define the packet length. Then,
depending on the stream id, either the data of the elementary stream follows or additional
header fields do; see [13] for a complete list. The optional PES header can end with up
to 32 stuffing bytes of value 0xFF to meet the requirements of the channel (e.g. the
transport stream).
These PES packets can now be encapsulated into TS packets. A TS packet is always
exactly 188 bytes long and its header takes at least 4 bytes (but can be longer), so a
single TS packet can carry at most 184 bytes of PES data; longer PES packets are split
across several TS packets. This is described in ISO/IEC 13818-1:2000 [13, Section 2.4.3.2]:
The header of a transport stream packet starts with a sync byte, which is always 0x47.
Then follow three bits that indicate transport error, payload unit start and transport
priority, followed by 13 bits that hold the PID. The next two bits control the transport
scrambling, and the two bits after that indicate whether this transport stream packet
contains an adaptation field and/or payload. The next four bits are a continuity counter,
followed by the optional adaptation field and the payload. If an adaptation field is present,
its first byte contains the total length of the adaptation field. It can also be followed by
stuffing bytes of value 0xFF.
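The fixed four-byte TS header just described can be decoded as follows. This is a minimal sketch (function name assumed), covering only the fields listed above:

```python
def parse_ts_header(pkt: bytes) -> dict:
    """Decode the fixed 4-byte header of a 188-byte MPEG2-TS packet."""
    if len(pkt) != 188 or pkt[0] != 0x47:  # sync byte check
        raise ValueError("not a valid 188-byte TS packet")
    return {
        "transport_error": (pkt[1] >> 7) & 0x1,
        "payload_unit_start": (pkt[1] >> 6) & 0x1,
        "transport_priority": (pkt[1] >> 5) & 0x1,
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],          # 13-bit PID
        "scrambling_control": (pkt[3] >> 6) & 0x3,
        "adaptation_field_control": (pkt[3] >> 4) & 0x3,  # AF and/or payload
        "continuity_counter": pkt[3] & 0x0F,
    }
```

In the implementation described later, the PID extracted here is what distinguishes the two views when two AVC streams share one transport stream.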
2.2.2.4 Analysis
The implementation is going to focus on stereo video (3DV) and backward compatibility
with 2D video. It is necessary to go down the protocol stack through the
Transport Stream and the Packetized Elementary Stream to the Elementary Stream
in order to get to the NAL-Units. The NAL-Units inside the TS packets decide whether a TS
packet belongs to the base view and is sent by the primary server, or belongs to the
enhancement view and is sent by the secondary server. A possible solution for bringing the
TS packets sent over two different routes back into the correct original total order is to add an
identifier (e.g. a sequence number) in front of every TS packet before transmitting them.
The same solution could also support FVV and at the same time be compatible with
3DV and 2DV [19]. MPEG2-TS is going to be used to take care of the transport over
PRRT/IP, especially for providing a clock for the receivers.
2.2.3 Synchronization
PRRT claims to be able to compensate for the different transmission delays; minor
delays that PRRT does not handle can be compensated for by a small local client buffer.
Nevertheless it is necessary that all servers share a common time, such that associated
frames are sent to the receivers at the same moment. Of course the transmission delay
can differ for each receiver and for each of the streams, but this cannot be addressed by
the servers and needs to be handled by the receivers, e.g. by using a buffer. This buffer
can compensate for minor differences caused by transmission delays and by differing
server times. However, if the server clocks are off by tens of seconds or even minutes,
the buffer has to be correspondingly large. As a result, a customer switching to a channel
would have to wait that amount of time until the buffer is filled.
Deeths and Brunette [2] give an example to illustrate this necessity: if one server's
seconds are 0.04 milliseconds longer than another server's seconds, the two clocks will
deviate by 21 minutes after one year. For an extended movie of 3 hours length the
difference would be 432 ms, or 11 frames (at 25 fps). Therefore the server clocks must
deviate as little as possible to minimize the size of the buffer and thus the delay
for the user.
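These figures follow directly from the stated drift; a quick arithmetic check:

```python
# Drift example from Deeths and Brunette: each second on the faster clock
# is 0.04 ms longer, i.e. the clocks drift apart by 0.04 ms per second.
drift = 0.04e-3                       # seconds of deviation gained per second

per_year = drift * 365 * 24 * 3600    # ≈ 1261 s, i.e. about 21 minutes
per_movie = drift * 3 * 3600          # ≈ 0.432 s over a 3-hour movie
frames = per_movie * 25               # ≈ 10.8, i.e. about 11 frames at 25 fps
```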
On the next pages two protocols that can be used for synchronizing the servers will be
explained. The first is the Network Time Protocol (NTP), the second is the Precision
Time Protocol (PTP) as specified in IEEE 1588.
2.2.3.1 NTP
The Network Time Protocol is one of the most popular time synchronization protocols
used in the Internet. It is important to note that it has been designed for WANs.
It typically achieves an accuracy of a few tens of milliseconds or better in a WAN and
can reach sub-millisecond accuracy in LANs. [2, 4]
Although there are servers that are directly connected to high-precision clocks, NTP is
not exclusively a master-slave system: peers synchronize to the best clock, while
clients synchronize to a server (or to multiple servers). [2]
NTP also helps maintain clock accuracy when the connection to the server is
disturbed: a constant drift can be corrected based on the measurement history. [2]
2.2.3.1.1 Functional Description
The offset is determined by measuring the delay over the whole path from the server to
the client, assuming that both directions have the same delay. [22]
As can be seen in Figure 2.9, the client initiates the communication in NTP. At time
T1 the client sends a request to the server, which receives it at time T2. At time T3 the
server sends the two values T2 and T3 to the client, which receives them at time T4. With
these four timestamps the client can now calculate the local clock offset and the round
trip delay. For this, two intermediate values are calculated:

a = T2 − T1 and b = T4 − T3

The clock offset is then calculated by

O = (a − b) / 2

and the round trip delay by

D = a + b
Figure 2.9: Measuring Delay and Offset [18, p. 3]; A = Client, B = Server
Once again it is important to note that the accuracy directly depends on the assumption
that the delay is symmetrical in both directions. As the delay is usually not
symmetrical, the uncertainty is ≤ D/2. [18, 22]
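The calculation above can be condensed into a few lines. The scenario in the comment is a constructed example (server clock 5 ms ahead of the client, symmetric 10 ms one-way delay), not a measurement from the experiments:

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Client-side NTP calculation from the four timestamps."""
    a = t2 - t1            # outbound delay plus clock offset
    b = t4 - t3            # return delay minus clock offset
    offset = (a - b) / 2   # O = (a - b) / 2
    delay = a + b          # D = a + b (round trip)
    return offset, delay

# Example: server 5 ms ahead, 10 ms one-way delay in both directions gives
# t1 = 0.000, t2 = 0.015, t3 = 0.016, t4 = 0.021.
offset, delay = ntp_offset_delay(0.000, 0.015, 0.016, 0.021)
```

Here the recovered offset is 5 ms and the round trip delay 20 ms, consistent with the construction.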
NTP is usually configured to use more than one server, which is more robust; but to
achieve the best mutual accuracy across multiple devices it is better to use a single server.
NTP daemons usually have a polling interval of 1024 s, or they self-adjust between 64 s
and 1024 s. The interval can, however, also be larger, or as small as 16 s. [22]
2.2.3.2 PTP
The Precision Time Protocol specified in IEEE 1588 is another time synchronization
protocol. It can synchronize clocks to an accuracy of better than one microsecond [25].
However, it is designed only for LANs, unlike NTP, which is designed for WANs.
PTP has a strict Master-Slave approach. The best clock in the network is the Master;
all other clocks are Slaves, which synchronize to the Master: first the offset, then the
drift. [25]
One fact to consider when nanosecond accuracy is desired is that the algorithm
expects the transit delay in both directions to be the same. Even in an Ethernet LAN
this is not exactly the case, because the wire pairs in an Ethernet cable do not all have
the same length. [3, 25]
Unlike NTP, PTP uses multicast and can therefore synchronize multiple computers at
once. It is also – technically – possible to use PTP on networks other than Ethernet,
for which it has been designed. [25]
As in NTP, the offset determination can consider the whole path for the delay measurement,
but since PTP claims to achieve microsecond or even nanosecond accuracy,
one has to consider that every hop (including hubs, switches, gateways, etc.) introduces
delay. A hub typically introduces network jitter in the range of 300-400 ns; a switch
introduces around 400 ns plus 2-10 µs of processing time. One also has to consider that
these devices have a queue in which data packets are processed, and a single very long
packet can delay the transmission by up to 122 µs. Therefore PTP introduces the concept
of Boundary Clocks. A Boundary Clock can be included in any of the just mentioned
devices, e.g. a switch.
It acts as a slave towards the side of the network where the master is, and as a master
towards the side where the slaves are. That means it syncs to the master, and all
slaves connected to the switch sync to the switch, which is then their master, instead of
to the original master. This way the internal latencies of the device do not affect
the synchronization accuracy. This is also shown in Figure 2.10. [3, 25]
Figure 2.10: Network topology and clock types [25, p. 5]
2.2.3.2.1 Functional Description
In PTP, timestamps can be taken close to the hardware, ideally directly in the network
card (if supported). This way PTP does not have to wait for the processor, which may
be busy, which would delay the process and diminish the accuracy. [3, 25]
As can be seen in Figure 2.11, the server initiates the communication in PTP. At
time t1 the server sends a sync message to the client, which receives it at time t2. In a
follow-up message the server sends the previously acquired timestamp t1 to the client.
After reception the client sends a message at time t3 to the server, which receives it at
time t4 and sends this timestamp back to the client. With these four timestamps the client
can now calculate the local clock offset and the delay. For this, two intermediate values
are calculated:
a = t2 − t1 and b = t4 − t3
The clock offset is then calculated by

O = (a − b) / 2

and the delay by

D = (a + b) / 2
Once again it is important to note that the accuracy is directly dependent on the
assumption that the delay is symmetrical in both directions [3, 25].
Figure 2.11: Offset and Delay Measurement [3, p. 19]
In PTP the sync interval is much smaller than in NTP, namely only 2 s by default.
[3, 25]
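The slave-side arithmetic is the same as in NTP, except that the delay here is the one-way value. The constructed example below (an assumption for illustration, not a measurement) shows how path asymmetry biases the offset estimate by half the asymmetry:

```python
def ptp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Slave-side PTP calculation; assumes symmetric path delay."""
    a = t2 - t1
    b = t4 - t3
    return (a - b) / 2, (a + b) / 2   # offset, one-way delay

# Asymmetric path: 12 ms master->slave, 8 ms slave->master, true offset 0.
# Master sends at t1=0.000, slave receives at t2=0.012; slave sends at
# t3=0.020, master receives at t4=0.028.
offset, delay = ptp_offset_delay(0.000, 0.012, 0.020, 0.028)
```

The estimated offset comes out as 2 ms even though the true offset is zero: exactly half of the 4 ms asymmetry, illustrating why the symmetry assumption matters.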
2.2.3.3 Analysis
Although PTP promises better accuracy, it is also more complex and more difficult
to set up. It has the big advantage that the delay from server to client need not be
treated as one large path; the time can be synced over smaller hops. This, however, is
only possible with hardware support. With respect to the implementation, the accuracy
of NTP seems to be good enough, because it promises an accuracy down to a few tens
of milliseconds or better in a WAN. Considering the frame rate of 25 fps, two frames are
40 ms apart, so already a very small buffer can compensate for a lag of more than half
the gap between two frames (20 ms). Therefore experiments in the implementation are
going to show whether NTP is good enough; if it is not, PTP can still be used to provide
the desired accuracy.
In addition it is important that the clients run at the same speed as the servers do
(although they do not need to have the exact same time). Otherwise the client buffer
could run empty if the client's clock runs faster, or fill up if it runs slower. Fortunately
the PCR in MPEG2-TS already takes care of this: the receiver uses the PCR values
to recover the sender's system clock by feeding them through a phase-locked loop
(PLL). The Presentation Time Stamp (PTS) and Decoding Time Stamp (DTS) that are
in the elementary stream are aligned to the original system clock recovered through
the PLL. For more details see [13, Annex D].
3 Proposed Method
The idea behind this work is to create a protocol which is able to provide 3D television
over an IP network while keeping the necessary bandwidth consumption as low as possible,
both in terms of redundant data transmission (i.e. using multicast instead of unicast) and
in terms of backward compatibility (i.e. not having to transmit enhancement view(s) to
receivers that can only display 2D). The main challenge is to keep the delay as low
as possible. To achieve a more effective load distribution in mesh networks such as the
Internet, it is advantageous to distribute the views among different servers. This, however,
introduces new problems: it is necessary to synchronize the servers such that they begin
transmitting the views of the video at the same point in time, and in addition they need
to maintain the same pace, such that corresponding frames from different views are
received concurrently (up to the different transmission delays caused by different
network routes).
To achieve this the protocol consists of three components, where one part depends on
the two others:
3.1 Transport Layer
Nowadays television is mostly transmitted over broadcast media such as satellite, cable
or terrestrial networks. However, television over IP is becoming more and more important,
and 3D television will be the next important step in broadcast television. For current 3D
television over IP it is typical that multiple views are transmitted on the same channel
(service-compatible mode, e.g. in MVC) sent by a single server. Another common method
is to use a single video stream which carries two videos at half the original resolution, either
side-by-side or top-bottom (frame-compatible mode, e.g. in AVC). IPTV often uses multicast,
but sometimes also unicast, especially for fast channel switching within a multicast
environment. Video-on-demand and download services always use unicast. For IPTV, UDP is
used in most cases; for video-on-demand and download services it is in almost all cases TCP.
Regarding the protocol stacks that are usually used, Figure 3.1 shows typical
protocol stacks for Internet television, while Figure 3.2 shows the actual protocol stacks
of some DVB implementations, especially DVB-IPTV.
Figure 3.1: Streaming protocol stacks [11, p. 698]
Figure 3.2: Digital TV delivery protocol stacks (DVB as an example for underlying layer technologies) [20, p. 674]
In Section 2 several protocols have been discussed. TCP uses positive acknowledgements
to achieve reliable data transmission, but reliability is achieved using pure retransmissions
only, which is infeasible for many clients. In addition, it cannot provide a maximum
delivery time, due to its retransmission nature, and it does not support multicast. UDP
on the other hand allows multicast and also provides timely delivery, but it has no error
correction mechanism: lost or corrupted data cannot be recovered, as neither acknowledgements
nor parity is available. SCTP has the same set of problems as TCP: it also uses positive
acknowledgements and therefore does not support multicast. DCCP, in contrast, does not
provide sequenced delivery (as TCP and SCTP do) and is therefore suitable for real-time
traffic; however, it still sticks to positive acknowledgements, which again makes multicast
infeasible. As these protocols all have drawbacks for multimedia transport, another protocol
has been evaluated:
As explained in detail in Section 2.2.1, PRRT is a network protocol that allows multicast
and guarantees Predictable Reliability under Predictable Delay (PRPD). PRRT
operates at the transport layer and transmits all data between the servers and the receivers.
At the time of conducting the experiments the PRRT protocol was not yet able to
derive the optimal parameters autonomously. Therefore it was necessary to set them
manually for the experiments. As all experiments were carried out on LANs it was
possible to assume a rather small Dtarget of 100 ms. In addition the tests were done using
a single client only, which makes it possible to set k to 1, making the code a repetition
code. Np is set to (0, 1, 1, 1), allowing three rounds of repetition.
3.2 Application Layer: Time
As explained in detail in Section 2.2.3, NTP is an industry-standard solution for clock
synchronization in distributed systems. NTP will be applied to synchronize the servers.
It is also recommended to synchronize the receivers using NTP, but it is not absolutely
necessary, as the MPEG2 Transport Stream deals with the synchronization between
sender and receiver anyway – without setting the local clock, though.
To achieve the best possible accuracy with NTP the polling exponent has been set to the
smallest possible value of four, which means the daemon polls every 2^4 s = 16 s. The
result of this can be seen in Section 4.1.
3.3 Application Layer: Data
At the time of writing, neither MVC video data nor an MVC encoder or decoder was
available. Therefore, to get interim results, a classical setup with two full AVC streams
has been considered.
This section explains how to implement the necessary tools using the techniques de-
scribed in Section 3.1 and 3.2.
3.3.1 Time analysis
Figure 3.3: Synchronized server/client architecture that sends timestamps using UDP
The first activity was to write an application that evaluates the delay of network trans-
mission and the usability of NTP. To get started a simple UDP server/client architecture
was implemented. The two hosts were synchronized to a common time server as described
above in Section 3.2. To be more realistic to the actual use case the UDP packet size was
extended to the size of a typical TS packet, namely 7 ∗ 188byte = 1316byte. These first
experiments already indicated that NTP is adequate for the desired application; detailed
experimental results can be seen in Section 4.1.
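The timestamp packets used in this experiment can be sketched as follows. The packet layout and function names are assumptions for illustration (the actual implementation may differ), and the socket transmission itself is omitted; the one-way delay is only meaningful because both hosts are NTP-synchronized:

```python
import struct
import time

PACKET_SIZE = 7 * 188  # 1316 bytes, matching seven TS packets

def make_timestamp_packet(now: float) -> bytes:
    """Pack the send time into the first 8 bytes, pad to 1316 bytes."""
    payload = struct.pack("!d", now)
    return payload + b"\x00" * (PACKET_SIZE - len(payload))

def one_way_delay(packet: bytes, recv_time: float) -> float:
    """Receiver side: difference between reception and the embedded send
    time; valid only if sender and receiver clocks are synchronized."""
    (send_time,) = struct.unpack("!d", packet[:8])
    return recv_time - send_time

pkt = make_timestamp_packet(time.time())
```

The sender would emit such packets over a UDP socket at a fixed rate; the receiver records `one_way_delay(pkt, time.time())` for each packet to build the delay statistics.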
Figure 3.4: Synchronized server/client architecture that sends timestamps using PRRT
The next step was to implement the same application on top of PRRT instead of UDP,
to find out the effect of PRRT on the transmission delay. Again the packet size was extended
to the size of a typical TS packet bundle. The Dtarget of PRRT was set to 100 ms; that means
that every data packet should arrive at the receiver within 100 ms. Detailed experimental
results can be seen in Section 4.1.
Figure 3.5: Synchronized dual-server/client architecture that sends timestamps using PRRT
In a third step the proposed architecture was taken into account and the delay for two
servers was measured. To achieve this a multi-threaded receiver was written which is
capable of receiving two concurrent streams from two distinct servers. Detailed
statistics on the distribution of the data packets in terms of delay and loss were recorded;
they can be seen in Section 4.1.
3.3.2 TS analysis & streaming
The next step was to write a Transport Stream parser which is able to extract the
PCR value, which appears every few packets¹⁰, because the PCR value is important for
the server to determine the pace at which to send the data of the stream.
Figure 3.6: Synchronized UDP server/client transport streaming
A tool had to be written which is able to stream an MPEG2-TS at the correct speed
(which can be determined from the PCR) over the network to another host. To minimize
the sources of error in the implementation, the first version used UDP, such that a standard
software player such as VLC can play the video. One problem with the PCR had to be
faced: one cannot count on the first PCR value of a video starting at zero. This is obvious
for videos such as TV recordings (which are in TS format already) or excerpts from DVDs
(newly encapsulated in TS format), and even more so for broadcast; but even when wrapping
a plain video file in TS, common tools (such as FFmpeg) do not necessarily start at zero.
Therefore it has to be considered that the PCR can start at any value.
¹⁰At least every 100 ms according to MPEG; at least every 40 ms according to DVB
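The pacing logic can be sketched as follows. This is a simplified model (names are assumptions) in which each burst of TS packets optionally carries a PCR already converted to seconds; the first PCR seen, whatever its value, is taken as the origin, so streams that do not start at zero are handled correctly:

```python
import time

def stream_paced(ts_chunks, pcr_seconds, send):
    """PCR-paced sending sketch: ts_chunks[i] is a burst of TS packets and
    pcr_seconds[i] its PCR in seconds, or None if the burst carries no PCR.
    Sending is delayed so that wall-clock time tracks PCR time."""
    origin_pcr = None
    origin_wall = None
    for chunk, pcr in zip(ts_chunks, pcr_seconds):
        if pcr is not None:
            if origin_pcr is None:
                # First PCR of the stream becomes the origin; it need not be 0.
                origin_pcr, origin_wall = pcr, time.monotonic()
            else:
                due = origin_wall + (pcr - origin_pcr)
                delay = due - time.monotonic()
                if delay > 0:
                    time.sleep(delay)
        send(chunk)
```

Here `send` would wrap a UDP (or later PRRT) socket; chunks between two PCRs are sent back-to-back, which is acceptable because the PCR appears every few packets.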
3.3.3 Two Server TS streaming
Figure 3.7: Synchronized PRRT server/client transport streaming
After having proven all the individual components to work, it was now time to cre-
ate applications that glue all the above together. As before a one server solution has
been written first, which is basically the same as the one with UDP, but using PRRT
instead. The receiver has been created twice: one version writes the received data onto
the harddisk, the other version forwards the data that has been transmitted with PRRT
using a UDP socket. This can be any host, but most likely is localhost. This way a
standard player (again such as VLC) can be used to play the video by just listening on
the loopback interface where it is very unlikely that errors occur, when the system is not
under heavy load.
Figure 3.8: Synchronized PRRT dual-server/client transport streaming
In the timestamp experiment in Section 3.3.1 the concept of multithreading was already
introduced. The receiver application (which again exists in two versions: as a file
writer and as a UDP relay) needs two threads to receive the two streams from the two
servers, while a third thread acts as a multiplexer which sorts the packets back into the
correct order.
It is of course necessary to set a time limit on this; otherwise the receiver would
become stuck if only a single packet were missing or out of order. To be able to
bring the TS packets into the correct order some additional logic is necessary. PRRT does
guarantee that the order of all data packets sent is maintained; however, this is of course
only true for a single PRRT channel, not for two. Therefore, to bring the packets back
into the correct total order, some kind of identifier needs to exist. As the TS packets do
not carry any unique identifier in their header, it was necessary to come up with an
independent approach: a four-byte identifier is inserted in front of every TS packet. It can
be used to restore the total order of all TS packets at the receiver. Four bytes means
4,294,967,296 values before this identifier wraps around; even at a data rate of 20 Mb/s it
would take three and a half days before the identifier repeats. Therefore one can assume
that two packets in the receiving buffer with the same identifier are actually the same
packet (in case that ever occurs), and one only has to deal with the wrap-around. That
also means that every TS packet is now 188 byte + 4 byte = 192 byte and every PRRT
packet is now 192 byte · 7 = 1344 byte. The servers need to insert this identifier in front
of every TS packet before sending it.
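The sender-side identifier insertion and the receiver-side split can be sketched in a few lines (function names are assumptions; a big-endian 4-byte unsigned counter with wrap-around at 2^32 is assumed, matching the value range stated above):

```python
import struct

TS_SIZE = 188

def add_identifiers(ts_packets, start_id: int = 0):
    """Sender side: prefix each 188-byte TS packet with a 4-byte sequence
    identifier, yielding 192-byte units; the counter wraps at 2**32."""
    out = []
    seq = start_id
    for pkt in ts_packets:
        assert len(pkt) == TS_SIZE
        out.append(struct.pack("!I", seq) + pkt)
        seq = (seq + 1) % 2**32
    return out

def split_identifier(unit192: bytes):
    """Receiver side: recover (identifier, ts_packet) from a 192-byte unit."""
    (seq,) = struct.unpack("!I", unit192[:4])
    return seq, unit192[4:]
```

Seven such 192-byte units then form one 1344-byte PRRT payload, as stated above.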
Another possible solution instead of this identifier would be the M2TS approach.
M2TS also places four bytes in front of every TS packet, but not as an identifier: two
bits are used for copy protection and 30 bits for a timestamp. However, this timestamp
cannot be used to restore the exact order of all packets (as multiple TS packets can have
the same timestamp) and is therefore not examined any further.
The multiplexer itself is the third thread in the receiving application. The two receiving
threads pass up to ten whole data packets at once to the multiplexer. The
multiplexer then divides each data packet, containing up to seven TS packets plus
identifiers, into the single TS packets and stores them in a map, where the identifier is
the key and the TS packet is the value. In a second map the time of reception of each TS
packet is stored, to be able to identify delayed packets.
Depending on the receiver being in 2D or 3D mode an endless loop sorts the packets
into the correct order. In the 2D mode everything in the map is just output in the order of
the identifiers and then removed from the map. This can be done, because PRRT makes
sure the order of the data is maintained. In the 3D mode it is evident that the packets
from the two servers are sorted into the correct total order. To achieve this in the loop a
method looks for the next packet (the packet that has an identifier which is exactly one
larger than the current packet). If it is found it is attached to the output routine and
removed from the map. If it is not found the next step depends on the state of the map.
If the map is empty, the thread is being put to sleep for a predefined amount of time –
the buffertime. During this time the two receiving threads can receive more data. When
they do, they wake the multiplexer thread. When they do not, the multiplexer thread is
woken after the buffertime. In the other case, when the map is not empty, the maximum
waittime is limited by the buffertime and the receiving time of the next TS packet in the
map. The thread waits and continues to run at latest at receiving time + buffertime.
When the TS packet appears during this time in the map it is directly sent to the output
and the normal operation continues. If it does not occur in the map during this time the
packet is considered to be lost and the normal operation continues.
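The wait-deadline rule just described can be condensed into a few lines (a sketch with hypothetical names; the real implementation operates on the map and thread wake-ups):

```c
#include <stdint.h>

/* Latest point in time (in µs) until which the multiplexer may wait for the
 * next in-order packet. If the map is empty it sleeps a full buffertime from
 * now; otherwise it gives up on the gap at reception time + buffertime of
 * the earliest packet already buffered. (Names are illustrative.) */
static int64_t wait_deadline_us(int map_empty, int64_t now_us,
                                int64_t earliest_recv_us, int64_t buffertime_us)
{
    if (map_empty)
        return now_us + buffertime_us;
    return earliest_recv_us + buffertime_us;
}
```

A packet whose deadline passes without the gap being filled is declared lost, which bounds the extra latency of the multiplexer by exactly one buffertime.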
As explained at the beginning of this section, no MVC material was available at this
point in time. The 3D stereoscopic video material used therefore consists of two AVC
streams in one transport stream. It is easy to distinguish between the two views, as each
of them gets its own PID in the header of each TS-packet. The implication is that the
primary server (for the base view) sends all TS-packets except those carrying the PID of
the enhancement view, and the secondary server (for the enhancement view) sends only
those TS-packets with the enhancement-view PID. As the clocks of the two servers run
at the same speed thanks to the use of NTP, it is not necessary to synchronize the actual
playback between the two servers; it is sufficient to agree upon a common point in time
at which to start the transmission. For this purpose a single TCP packet is sent from the
primary to the secondary server, containing the timestamp at which the transmission
should start, the PID the secondary server should transmit and a video source. TCP is
used because it guarantees reliability. The two servers then begin the transmission at the
same moment in time and send the data using PRRT, so that the packets arrive at the
receiver in almost the correct order. As described before, the receiver only needs to restore
the final order by looking at the identifiers. If the secondary server for the enhancement
view cannot be contacted within one second (which is plenty of time for TCP
retransmissions), the primary server starts without it and provides the base view only.
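The PID-based split can be illustrated with a small helper (a sketch; the field layout follows the standard MPEG-2 TS header, the helper names are hypothetical):

```c
#include <stdint.h>

/* The 13-bit PID sits in bytes 1 and 2 of the 188-byte TS packet header. */
static uint16_t ts_pid(const uint8_t *ts)
{
    return (uint16_t)(((ts[1] & 0x1F) << 8) | ts[2]);
}

/* Returns 1 if the packet belongs to the secondary server (enhancement
 * view), 0 if the primary server must send it (base view and everything
 * else, e.g. PAT/PMT and audio). */
static int send_via_secondary(const uint8_t *ts, uint16_t enhancement_pid)
{
    return ts_pid(ts) == enhancement_pid;
}
```

Everything that is not the enhancement view deliberately falls through to the primary server, so a 2D-only receiver subscribing to the primary channel alone still gets a complete, playable transport stream.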
3.3.4 Player
As stated before, VLC is a software player capable of playing UDP streams. A feature
of VLC useful for this project is that it can show two videos carried as elementary streams
in one Transport Stream concurrently and in perfect sync (as opposed to other players,
such as Xine). Unfortunately VLC is not prepared for 3D playback (as of version 1.1.11);
it can only display the two videos in two separate windows. A filter for VLC called
“Mosaic” allows preparing the output for a 3D display (e.g. line-by-line, side-by-side or
top-bottom, as supported by most 3D displays), which of course reduces the spatial
resolution by 50%. However, as this module has not been designed for this purpose it only
supports software processing (no hardware processing), and in tests even a very fast
state-of-the-art computer was not able to play the 3D video properly.
One possibility to display the video using VLC anyway was to trick VLC into squeezing
the videos to half their width and moving the windows into the correct position11 to
achieve a side-by-side representation.
This shows the general feasibility of the protocol implementation; however, current free
software players do not have the necessary tools to show the 3D video properly out of
the box.
11“Devil’s Pie” can be used for that: http://burtonini.com/blog/computers/devilspie/
3.3.5 MVC
The preceding sections showed the general feasibility of this approach. However, a very
important part could not be investigated yet – the MVC coding. At the beginning of
the work no MVC data was available, so it could not be tested. To get interim results,
the approach with two AVC streams was used instead; this, however, requires more
bandwidth, because an MVC stream encodes 3D video more efficiently than two
independent AVC streams.
When MVC data became available for testing, the first issue was how to get down to
the NAL-Units, which indicate to which view perspective the subsequent data belongs.
The answer is to follow the trail described in Section 2.2.2.3: the NAL-Units are
encapsulated in a Packetized Elementary Stream, and the individual PES-packets are in
turn encapsulated into the Transport Stream. To get to the NAL-Units it is necessary to
analyze the headers of the packets on the different levels and find the payloads that carry
the next lower level’s packets.
The payload unit start indicator in the header of the TS-packet gives valuable information:
it states whether the payload of the current TS-packet is the beginning of a PES-packet.
The next important information from the TS-header is the actual starting position of the
payload. It can be calculated by determining the presence and, if present, the length of
the adaptation field, which always precedes the payload. Now there are two paths that
need to be considered:
The first path applies when the TS-packet carries the beginning of a PES-packet
(indicated by a payload unit start indicator of one). In this case one can check for the
correct sync byte sequence (0x000001) and read the stream ID and packet length of the
PES-packet. The stream ID is important because it tells us whether the packetized
elementary stream (PES) contains video, audio or other program related content (e.g.
subtitles). As only the NAL-Units of the MVC stream are of interest here, the analysis
is restricted to stream ID 0xE0, which is the one for AVC/MVC. The packet length can
either be non-zero, which means the whole PES-packet is exactly of that length, or it can
be zero. In the latter case the length is undetermined and can only be identified by
listening for the next payload unit start indicator in TS-packets with the same TS PID.
To find the first NAL-Unit it is crucial to find the beginning of the PES payload, which
means one has to find the end of the PES header. The length of the header depends on
the stream ID. In the case of AVC/MVC the PES header always has an extended header
of dynamic length. However, the length is transmitted within the header, so it can easily
be read out. If the packet length is non-zero, the complete length can now be calculated
and taken as a “carryover” parameter for the next TS-packet. Once the beginning of the
PES payload has been found, it is possible to search for the sync bytes of the NAL-Units
as used in the elementary stream (ES). The sync bytes are either 0x00000001 or 0x000001
depending on the position inside the ES; this distinction is not important for the problem
and the short form can be used. A PES-packet is carried inside TS-packets, and the
payload carried by the PES-packets forms the ES. As multiple NAL-Units can be inside
the part of the ES contained in a single PES-packet, it is necessary to loop through the
whole payload. The header of each NAL-Unit contains a forbidden bit that may never
be one, two bits that indicate whether the data can be discarded after decoding (not of
interest for this problem) and five bits that represent the NAL-Unit type. This is the
important information for this solution. If even a single NAL-Unit is not of type 14 or 20
(the two values used in MVC), the TS-packet must be sent using the primary server,
because it contains information for the base view. In a subsequent PES-packet this
NAL-Unit type would now be used to decide whether the TS-packet is sent using the
primary or the secondary server. However, as this PES-packet is the first PES-packet and
contains the header, it always needs to be sent using the primary server. As one cannot
be sure that the next PES-packet starts with a NAL-Unit (it could also be a continuation
of the last NAL-Unit of the current PES-packet), the type of the current NAL-Unit is
preserved and considered for the next PES-packet.
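The PES header bookkeeping just described can be sketched as follows (assuming the extended header used by video streams; the function name is hypothetical):

```c
#include <stdint.h>

/* Start of the PES payload for stream IDs with an extended header
 * (video, 0xE0–0xEF): 6 bytes of start code, stream ID and packet length,
 * 2 flag bytes, 1 byte PES_header_data_length, then the optional fields
 * that this length byte counts. */
static int pes_payload_offset(const uint8_t *pes)
{
    if (pes[0] != 0x00 || pes[1] != 0x00 || pes[2] != 0x01)
        return -1;                       /* not a PES start code */
    return 9 + pes[8];                   /* pes[8] = PES_header_data_length */
}
```

Because PES_header_data_length is carried explicitly, the payload start can be computed without interpreting the optional fields (PTS/DTS etc.) themselves.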
The other path applies when the TS-packet transports subsequent parts of a PES-packet,
indicated by a payload unit start indicator of zero. As before it is necessary to search for
the sync bytes of NAL-Units. This is basically the same process as described above, but
with one important difference: the NAL-Units in the first PES-packet are always sent
using the primary server, whereas in subsequent PES-packets the TS-packet is sent using
the secondary server (which serves the enhancement view) whenever it contains only
NAL-Units of type 14 or 20. In any other case the TS-packet is sent using the primary
server and the data is allocated to the base view.
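The routing rule for subsequent PES-packets can be expressed compactly (a sketch; the one-byte NAL header layout is as described above, the helper names are hypothetical):

```c
#include <stdint.h>

/* One-byte NAL header: forbidden bit, 2-bit nal_ref_idc, 5-bit type. */
static int nal_unit_type(uint8_t hdr)
{
    return hdr & 0x1F;
}

/* A TS-packet inside a subsequent PES-packet goes to the secondary server
 * only if every NAL-unit it contains is of an MVC type (14 or 20); a single
 * other type forces it to the primary server. */
static int belongs_to_enhancement(const uint8_t *nal_headers, int count)
{
    for (int i = 0; i < count; i++) {
        int t = nal_unit_type(nal_headers[i]);
        if (t != 14 && t != 20)
            return 0;
    }
    return count > 0;
}
```

Erring towards the primary server is safe: a base-view-only receiver may get a few enhancement NAL-Units it ignores, but it can never miss base-view data.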
[Figure 3.9 shows the dual-server setup: Server 1 and Server 2 read the transport stream file (payload with interspersed PCRs), split it into TS-packets and send them over two PRRT channels; the client’s two receive threads (Recv1, Recv2) feed the multiplexer, which outputs the stream over UDP to the player in 2D or 3D mode; the start time is coordinated over TCP.]
Figure 3.9: Synchronized PRRT dual-server/client MVC transport streaming
After the analysis of this rather complicated encapsulation of protocols it was time to
use the code in an actual implementation. This was easy to accomplish: the solution
presented in Section 3.3.3 was copied, and the server application was adapted to use the
above concept to decide which server needs to send which TS-packets, instead of the
purely PID-based solution used before. The results can be seen in Section 4.4.
3.3.6 MVC-Player
Although MVC claims to be backward compatible with AVC, it is not possible to play
either of the transmitted variants (base and enhancement view vs. base view only)
in most software players. Only Xine is able to play both streams, in 2D. This is
most likely a signaling problem in the structure of the Transport Stream. As the
transmission of the base view only played just as well in Xine as the transmission of
both base and enhancement view, it can be assumed that the omission of NAL-Units
works as expected.
To get a precise answer to this question, other players were evaluated. At the time of
testing there were only very few software players available that supported MVC.
VLC is not capable of decoding MVC as of version 1.1.11, and no Linux player could
be found that supports MVC. The only player that could be used for testing is the
“Stereoscopic Player” by Peter Wimmer12. It has two major drawbacks: first, it only
runs on Windows, which makes it impossible to use together with the tools written here,
because of their dependency on PRRT, which is only available as a Linux kernel module.
Second, it does not accept UDP streams, which would have been one way to use it in
conjunction with a Linux computer acting as a relay between PRRT and UDP. An
attempt to redirect the UDP stream to a Named Pipe on Windows, which is then read by
the player, failed: Windows Named Pipes can only be read by applications that use the
Win32 API CreateFile() function to open the file, and apparently this player uses some
other method. Nevertheless, the player’s main functionality of playing MVC coded files
is fulfilled.
However, it has been shown before that the speed at which the data is transmitted by
the presented tools is correct and allows proper playback. Therefore it was only
necessary to check whether the transmitted data could be rendered in the Stereoscopic
Player in both variants – base view only as well as base and enhancement view. To test
this, the file version of the multiplexer described before was used to capture the data
into a file. Two files were recorded: for the first file the multiplexer acted as a 3D
receiver, for the second as a 2D receiver. The first file (with full 3D stereo content)
could be played in stereoscopic as well as monoscopic mode without any problems. The
second file (without the enhancement view) played exactly like the first one in
monoscopic mode. This shows that the proposed method of de-multiplexing and
re-multiplexing works.
12http://3dtv.at/
4 Experimental Evaluation
In this section all the steps explained in Section 3 are evaluated experimentally. The
experiments analyze the synchronization capabilities (Section 4.1), the transport stream
tools (Section 4.2) and the final two-server transport stream application (Section 4.3).
In addition, the MVC-based solution is tested in Section 4.4, and the effects of
nonuniform round trip times are evaluated in Section 4.5.
4.1 Time Analysis
As explained in Section 3.2 and Section 3.3.1, the first step is to evaluate the accuracy
of NTP. To achieve the best possible accuracy the polling interval is set to the smallest
possible value of four, which means the daemon polls every 2⁴ s = 16 s. Listing 4.1 shows
some typical NTP statistics for the two servers.
1      remote           refid      st t  when poll reach   delay   offset  jitter
2 ==============================================================================
3 *ptbtime1.ptb.de .PTB.            1 u     7   16   377  20.385   -0.027   0.290

5      remote           refid      st t  when poll reach   delay   offset  jitter
6 ==============================================================================
7 *ptbtime1.ptb.de .PTB.            1 u     2   16   377  20.291   -0.014   0.127
Listing 4.1: NTP peers statistics
For the measurements a publicly available timeserver is chosen, in this case the first
timeserver of the “Physikalisch-Technische Bundesanstalt”. The PTB is the German
national organization that is responsible by law for distributing the exact time. Its
server ptbtime1.ptb.de is a stratum 1 NTP server13.
Lines 1–3 in Listing 4.1 show the NTP statistics of the master server, whereas lines
5–7 show those of the slave server. The values for delay, offset and jitter are given in
milliseconds. It can be seen that the delay of the two servers to the NTP server is almost
identical, as are offset and jitter. In this particular example the offset difference is
(−0.027ms) − (−0.014ms) = −0.013ms, which means that the local clocks of the two
servers are (at this particular point in time) only 13µs apart. As the polling interval is as
13A stratum 0 device is a precise clock such as an atomic clock. A server directly connected to a stratum 0 device is called a stratum 1 device. A device connected via a network to a stratum 1 device is called a stratum 2 device, and so on.
small as possible with NTP (16s), the two clocks are corrected almost four times a minute
and cannot drift away from each other. Typically the difference in offset is between
1 and 50 microseconds, as a series of measurements over one hour has shown. Every
16 seconds the data was compared (250 times in total), with the following results:
        Delay       Offset      Jitter
Min     19.981ms    −0.072ms    0.044ms
Max     20.459ms     0.136ms    8.384ms
∆        0.478ms     0.208ms    8.340ms
∅       20.276ms    −0.000ms    0.737ms
σ        0.129ms     0.035ms    1.671ms

Table 4.1: Master Server statistics
        Delay       Offset      Jitter
Min     20.052ms    −0.079ms    0.019ms
Max     20.458ms     0.162ms    0.557ms
∆        0.406ms     0.241ms    0.536ms
∅       20.286ms     0.001ms    0.212ms
σ        0.118ms     0.039ms    0.139ms

Table 4.2: Slave Server statistics
        Delay       Offset      Jitter
Min      0.001ms     0.000ms    0.000ms
Max      0.209ms     0.115ms    8.312ms
∆        0.208ms     0.115ms    8.312ms
∅        0.046ms     0.028ms    0.624ms
σ        0.035ms     0.024ms    1.605ms

Table 4.3: Difference between Master and Slave Server statistics
Table 4.1 and Table 4.2 show the minimum and maximum values, the difference between
those two, the average and the standard deviation of the 250 measurements of delay,
offset and jitter for the master and slave server. Table 4.3 shows the difference between
the master and the slave server. Note that the difference is taken between measurements
taken at the same time, not between the values in Table 4.1 and Table 4.2. From
Table 4.1 and Table 4.2 one can see that the delay is fairly constant at ≈ 20ms with a
standard deviation of only ≈ 120µs. In fact, Table 4.3 shows that the delays differ on
average by 46µs with a standard deviation of 35µs. Another interesting observation is
that the difference in offset (i.e. how far the clocks actually are apart) is only 28µs
with a standard deviation of 24µs. These small values show that the tolerance is far
smaller than a frame interval and therefore verify that NTP is adequate for
synchronizing the two servers.
The first experiment is the one described in Section 3.2. These very basic experiments
showed that the time it takes for the data to be transported over UDP to the
destination is negligible:
1316028531 000345: "1316028531 000179"
1316028532 000256: "1316028532 000077"
1316028533 000296: "1316028533 000119"
1316028534 000156: "1316028534 000011"
1316028535 000279: "1316028535 000113"
1316028536 000256: "1316028536 000112"
Listing 4.2: UDP Timestamp test
The data in Listing 4.2 is an excerpt of a test run that sends a timestamp from a
server to a client; the two machines are synchronized using NTP as described before.
The first value (before the colon) is the timestamp of reception, where the first block
denotes seconds and the second block microseconds; the second value (after the colon)
is the timestamp of sending. Although this is only a very brief extract, one can see
that the difference between the two values is never larger than 200µs, and this already
includes transmission delay and clock inaccuracy.
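The per-packet delay in such a log can be computed directly from the two timestamps (a sketch assuming the line format shown in Listing 4.2; the function name is hypothetical):

```c
#include <stdio.h>

/* Parse one log line ("<rx_s> <rx_us>: \"<tx_s> <tx_us>\"") and return the
 * one-way delay in microseconds, or -1 if the line does not match. */
static long delay_us(const char *line)
{
    long rx_s, rx_us, tx_s, tx_us;
    if (sscanf(line, "%ld %ld: \"%ld %ld\"", &rx_s, &rx_us, &tx_s, &tx_us) != 4)
        return -1;
    return (rx_s - tx_s) * 1000000L + (rx_us - tx_us);
}
```

Applied to the first line of Listing 4.2 this yields a one-way delay of 166 µs, consistent with the sub-200 µs observation above.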
The second experiment is also described in Section 3.2. It is basically the same
experiment as before, but uses PRRT and sends at a higher rate:
1316029750 750509: "1316029750 592138 00000000000000000394"
1316029750 752212: "1316029750 592581 00000000000000000395"
1316029750 752227: "1316029750 593023 00000000000000000396"
1316029750 752239: "1316029750 601124 00000000000000000412"
1316029750 752252: "1316029750 601598 00000000000000000413"
1316029750 752266: "1316029750 602067 00000000000000000414"
1316029750 754434: "1316029750 602534 00000000000000000415"
1316029750 754457: "1316029750 603211 00000000000000000416"
Listing 4.3: PRRT Timestamp test
The data in Listing 4.3 is an excerpt of the corresponding test run. The format is the
same as in Listing 4.2, with a sequence number as additional third parameter. Although
this is only a very brief extract, one can see that the differences between the two values are
between 150ms and 160ms, although the Dtarget of PRRT was set to 100ms. As this is
not the case with UDP, one has to assume that this is due to the current implementation
of PRRT, which has difficulty guaranteeing the desired Dtarget, possibly for scheduling
reasons at the host.
One could argue that this is due to the higher rate, but the effect is visible even when
sending at the same rate as in the experiment before:
1316080212 118389: "1316080212 000046 00000000000000000014"
1316080213 118388: "1316080213 000078 00000000000000000015"
1316080214 118420: "1316080214 000085 00000000000000000016"
1316080215 118395: "1316080215 000087 00000000000000000017"
1316080216 118403: "1316080216 000114 00000000000000000018"
1316080217 118386: "1316080217 000075 00000000000000000019"
1316080218 118389: "1316080218 000105 00000000000000000020"
Listing 4.4: PRRT Timestamp test at lower rate
Although the values in Listing 4.4 show a delay of only 118ms, which is considerably less
than in the previous example, the Dtarget of 100ms is still clearly missed. However, as
this offset is static it remains predictable and is therefore not a problem for the remainder
of the project.
The third time-related experiment is again described in Section 3.2. In this experiment
not one but two servers are used concurrently. The receiver receives the data from both
servers (2000 packets/second, each 1316 bytes) and measures the delay, the number of
lost packets and the number of late packets:
1: 2569.134440 kB/s
1: 104-105=9, 105-106=6000, 106-107=6004, 107-108=5987, 108-109=6012, 109-110=6004,
   110-111=6000, 111-112=5980, 112-113=6028, 113-114=6019, 114-115=5999, 115-116=5969,
   116-117=6067, 117-118=6046, 118-119=6074, 119-120=5908, 120-121=6308, 121-122=6790,
   122-123=5748, 123-124=5672, 124-125=5314, 125-126=2, 126-127=2, 127-128=2, 128-129=1,
1: Packets = 119945, Lost = 0, Lossrate = 0.000000%, Late = 0, Laterate = 0.000000%
=======================================
2: 2569.648503 kB/s
2: 104-105=287, 105-106=5942, 106-107=6591, 107-108=6101, 108-109=5389, 109-110=6067,
   110-111=6783, 111-112=5818, 112-113=5469, 113-114=5975, 114-115=7157, 115-116=5510,
   116-117=5356, 117-118=6245, 118-119=7259, 119-120=5146, 120-121=5526, 121-122=6540,
   122-123=6739, 123-124=5276, 124-125=4793,
2: Packets = 119969, Lost = 0, Lossrate = 0.000000%, Late = 0, Laterate = 0.000000%
Listing 4.5: PRRT Timestamp test with two servers
The data shown in Listing 4.5 stems from a short test over ten seconds. It shows the
data rate of 2569kB/s and lists the packet delays in buckets of one millisecond each.
It can be seen that most packets arrive between 105ms and 125ms; a few take 104ms or
between 125ms and 129ms, but no packets are faster than 104ms or slower than 129ms.
Listing 4.5 also shows that there has been neither packet loss nor late packets. This is
because the physical bandwidth of the Ethernet has not been exceeded and no route with
a high probability of packet loss (such as a wireless LAN) has been used. To make the
data clearer, a graphical representation is helpful. However, as a ten-second measurement
could easily be dismissed as non-representative, data has been recorded over one hour
(7.2 million packets in total at the same data rate of 2569kB/s), as can be seen in
Figure 4.1.
Figure 4.1: PRRT Timestamp test with two servers over one hour; blue shows the primary server, red the secondary server; x = time/ms, y = number of packets
The results are very similar. One can also see that all packets were received after at
least 103ms and at most 125ms. It is interesting that the two streams are almost cyclic
and shifted against each other. At this point it is unknown why this is the case, but it
does not matter much, because 100% of all packets are received within a range of 22ms,
a fact all following experiments can take into account.
The experiments in this section showed that NTP is quite accurate and more than
adequate as a background service that synchronizes the two servers. They also showed
that PRRT provides a predictable delay; however, it does not achieve the Dtarget that
it takes as a parameter but needs about 25–50% longer. This can easily be compensated
for by a small buffer, which is needed anyway to restore the total order of the packets
from the two servers.
4.2 TS Analysis
As explained in Section 3.3.2, a program was written that extracts the PCR values from
a transport stream file. Extracting the PCR values is necessary to be able to stream
the video data from the server to the receiver at the correct speed. An excerpt of the
PCRs in a transport stream can be seen in Listing 4.6:
  1 #     3: PID = 0x100, PCR =   766669 us
  2 #    11: PID = 0x100, PCR =   866669 us
  3 #    33: PID = 0x100, PCR =   966669 us
  4 #    39: PID = 0x100, PCR =  1066669 us
  5 #    63: PID = 0x100, PCR =  1166669 us
  6 #    83: PID = 0x100, PCR =  1266669 us
...
182 # 65402: PID = 0x100, PCR = 18666669 us
183 # 65696: PID = 0x100, PCR = 18766669 us
184 # 66102: PID = 0x100, PCR = 18866669 us
185 # 66372: PID = 0x100, PCR = 18966669 us
...
411 #165085: PID = 0x100, PCR = 41466669 us
412 #165751: PID = 0x100, PCR = 41566669 us
413 #166369: PID = 0x100, PCR = 41666669 us
414 #167102: PID = 0x100, PCR = 41766669 us
Listing 4.6: PCR values in a TS file
The first value shows the number of the TS-packet, the second is the PID of the
TS-packet carrying the PCR, and the third is the actual PCR value in microseconds. In
line 1 of Listing 4.6 the first PCR is found in the third TS-packet, the PID is 256 (the
decimal value of hexadecimal 0x100) and the value is about 767ms. Once again, note
that this very first PCR value in the transport stream is not zero. In this example a
PCR value occurs every 100ms. It is also visible that PCRs only occur on the first
PID, although the tested transport stream contains four PIDs in total.
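The PCR extraction the tool performs can be sketched as follows (a minimal sketch of the standard MPEG-2 TS adaptation-field layout; the function name is hypothetical):

```c
#include <stdint.h>

/* Extract the PCR from a 188-byte TS packet and convert it to microseconds.
 * Returns 1 and stores the value if a PCR is present, 0 otherwise. */
static int ts_pcr_us(const uint8_t *ts, int64_t *pcr_us)
{
    if (ts[0] != 0x47)    return 0;          /* TS sync byte missing      */
    if (!(ts[3] & 0x20))  return 0;          /* no adaptation field       */
    if (ts[4] < 7)        return 0;          /* field too short for a PCR */
    if (!(ts[5] & 0x10))  return 0;          /* PCR flag not set          */

    int64_t base = ((int64_t)ts[6] << 25) | ((int64_t)ts[7] << 17) |
                   ((int64_t)ts[8] << 9)  | ((int64_t)ts[9] << 1)  |
                   (ts[10] >> 7);            /* 33-bit base, 90 kHz units */
    int ext = ((ts[10] & 0x01) << 8) | ts[11]; /* 9-bit extension, 27 MHz */

    *pcr_us = (base * 300 + ext) / 27;       /* 27 MHz ticks -> µs        */
    return 1;
}
```

The base/extension split reflects the 27 MHz system clock of MPEG-2; converting to microseconds as above yields exactly the values printed in Listing 4.6.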
As written in Section 3.3.2, a program is needed that streams a transport stream over
UDP at the correct speed depending on the PCR values. To test this, a video of exactly
100s length was wrapped into a transport stream (the same video as before). As just
seen, the first PCR in this video is 766669µs; the last is 100666669µs. The difference
between these two PCR values is 99900000µs = 99.9s, with a little bit of data before
the first and after the last TS-packet carrying a PCR that makes up for the missing 0.1s.
A test run with a small application that measures the time between the first and the
last received UDP packet ended with a total transmission time of 99902731µs, which is
only 2.7ms longer than expected. That is an error of 0.0027% and thus negligible. A
test with a standard media player such as VLC also showed that the streaming is done
at the correct speed. Another interesting experiment is to verify that no packets are
lost during the transmission. On a local (wired) network with no packet loss this should
always be the case. To test this, the received transport stream is written into a file and
compared to the transport stream file of the sender. In a first attempt the file sizes
differed, which would indicate lost packets, although the network introduces no packet
loss and no visible artifacts could be seen in the test with a media player (see above).
The problem was the poorly written helper application that listens to the UDP stream
and writes it into a file: it could not read the packets from the system’s internal UDP
packet buffer fast enough. Increasing this buffer as shown in Listing 4.7 solved the
problem, and the file size of the received transport stream was equal to that of the
source. In addition the checksums were compared and are also equal, which means that
the transport stream has been transmitted without packet loss or reordering at the
correct speed.
$ sysctl -a | grep net.core.rmem
net.core.rmem_max = 112640
net.core.rmem_default = 112640
$ sudo sysctl -w net.core.rmem_max=668000
net.core.rmem_max = 668000
$ sudo sysctl -w net.core.rmem_default=668000
net.core.rmem_default = 668000
$ sysctl -a | grep net.core.rmem
net.core.rmem_max = 668000
net.core.rmem_default = 668000
Listing 4.7: Increasing the UDP kernel receive buffer on Linux
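Alternatively (or additionally) the receiving application itself can request a larger socket buffer; a sketch using the standard setsockopt call (the kernel caps the effective size at net.core.rmem_max, so the sysctl above may still be required for very large buffers):

```c
#include <sys/socket.h>

/* Ask the kernel for a larger UDP receive buffer on this socket.
 * Returns 0 on success, -1 on error. */
static int set_udp_rcvbuf(int sock, int bytes)
{
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}
```

Setting the buffer per socket keeps the system-wide defaults untouched, which is preferable when the helper application is the only process needing the larger buffer.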
4.3 Two Server TS Analysis
As explained in Section 3.3.3, the same application as just described was written again,
but this time using PRRT instead of UDP. The two PRRT variants (stream via UDP,
write to a file) work exactly like the UDP version. The version that receives PRRT and
streams UDP (e.g. to localhost) can be used to watch the transport stream with a
software player such as VLC. The version that receives PRRT and writes a file yields
the same results as before: file size and checksum equal those of the source file.
Interestingly, the transmission of the whole stream takes a little longer than before,
most likely due to the per-packet delay variations of PRRT explained in Section 4.1. It
takes 99919720µs where the pure UDP-based transmission takes 99902731µs, i.e. 17ms
longer and almost 20ms in total. That equates to one frame of delay (assuming a 50fps
video); however, the software player itself has yet another buffer that can easily
compensate for this minor delay.
Section 3.3.3 also explains what needs to be done to use two servers instead of one.
When the two servers start sending, a multiplexer in the receiver sorts the TS-packets
back into their original order and either writes them to a file or forwards them as a
UDP stream (e.g. to localhost). The first version, which writes the data into a file,
produces the same result as in the experiments before: both file size and checksum are
equal to the original source file. This is also true for the UDP variant of the application.
During the experiment the time for the full transmission was measured again. This
time it was shorter again, namely 99900514µs, compared to 99902731µs for one server
over UDP and 99919720µs for one server over PRRT. It had not been expected that the
accuracy would be better with a multiplexer (which introduces extra complexity) than
with the two other solutions.
4.4 MVC Analysis
As explained in Section 3.3.5, for MVC it is necessary to go a few layers deeper than the
MPEG2-TS to get to the NAL-Units, which indicate to which view perspective the
payload belongs. To test this, the two-server application from above was rewritten to
consider the NAL-Unit type when deciding which channel should be used to send the
data. Once again a test video was transmitted. As before, the application is capable of
reconstructing the original file from the two streams; the transmission with only a
single stream is also successful. The video length of the tested file is 11701633µs; the
complete transmission (in both cases) takes 11730053µs, which is about 28ms longer
than expected. This, however, is not noticeable in the player.
Figure 4.2: MVC transmission test; blue shows the primary server, red the secondary server; x = time/ms in the video, y = time/ms for the data packet from the sender to the receiver
Figure 4.2 records the exact time each data packet needs to travel from one of the two
servers to the client. It shows that the delay is stable between 100 and 125ms, although
a few peaks exceed these numbers. This is due to a significantly higher data rate at
these moments, as can also be seen from the higher number of dots in the graph.
Figure 4.3: MVC transmission test datarate; x = time/ms in the video, y = number of TS data packets accumulated over 100ms
Figure 4.3 shows the data rate of the video transmission. It can be seen that the data
rate during the peaks described above is ten times as high as during the rest of the
transmission.
4.5 Nonuniform Round Trip Times
As the two servers can be located in different geographical locations, this section looks
at the effects of nonuniform round trip times (RTTs). In the experiments above the
RTT was the same for both servers within ±1ms. In this experiment the same
measurement as in Section 4.4 is executed, but a delay of 40ms in a first run and of
60ms in a second run is added to the RTT of the secondary server (which streams the
enhancement view). For a realistic scenario, delays of equal size are added to both
directions of the transmission: for an additional 40ms of RTT, 20ms is added to the
sending path and 20ms to the receiving path; for an additional 60ms, 30ms is added to
each path.
This can be achieved with network emulation. netem is included in current Linux
distributions running kernel 2.6 and is controlled via the tc command, which is part
of the iproute2 package and alters the traffic control settings. Listing 4.8 shows the
commands to add a delay for all outgoing data:
$ tc qdisc add dev eth0 root netem delay 20ms
$ tc qdisc show
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 20.0ms
$ tc qdisc del dev eth0 root netem delay 20ms
Listing 4.8: Adding and deleting 20ms of delay for outgoing data on eth0.
Adding a delay for incoming data is more complicated: it requires an Intermediate
Functional Block (IFB) device14, to which traffic control settings can then be applied.
This is done as shown in Listing 4.9.
$ modprobe ifb
$ ip link set dev ifb0 up
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 \
    action mirred egress redirect dev ifb0
$ tc qdisc add dev ifb0 root netem delay 20ms
$ sudo tc qdisc show
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8001: dev ifb0 root refcnt 2 limit 1000 delay 20.0ms
$ tc qdisc del dev ifb0 root netem delay 20ms
Listing 4.9: Adding and deleting 20ms of delay for incoming data on eth0.
To verify the added delay, an ICMP ECHO_REQUEST can be issued via the ping
command, as shown in Listing 4.10.
14 More information can be found here: http://www.linuxfoundation.org/collaborate/workgroups/networking/ifb
$ sudo tc qdisc show
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc pfifo_fast 0: dev ifb0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=1.47 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=0.159 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=0.222 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=0.190 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=0.170 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.159/0.443/1.478/0.518 ms

$ sudo tc qdisc add dev eth0 root netem delay 20ms
$ sudo tc qdisc add dev ifb0 root netem delay 20ms
$ sudo tc qdisc show
qdisc netem 8005: dev eth0 root refcnt 2 limit 1000 delay 20.0ms
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8006: dev ifb0 root refcnt 2 limit 1000 delay 20.0ms

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=40.3 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=40.2 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 40.212/40.276/40.309/0.182 ms

$ sudo tc qdisc change dev eth0 root netem delay 30ms
$ sudo tc qdisc change dev ifb0 root netem delay 30ms
$ sudo tc qdisc show
qdisc netem 8005: dev eth0 root refcnt 2 limit 1000 delay 30.0ms
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8006: dev ifb0 root refcnt 2 limit 1000 delay 30.0ms

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=60.1 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=60.1 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 60.169/60.237/60.284/0.045 ms
Listing 4.10: Verifying the RTT
It can be seen that the RTT increases as expected to 40ms and 60ms. netem can
also be used to create packet loss, packet duplication, packet corruption and packet
reordering; as these events are handled by PRRT, they are not analyzed here. PRRT
also copes with differing RTTs, but the aim of this experiment is to examine the effects
of nonuniform RTTs that remain smaller than PRRT's pre-configured target delay of
Dtarget = 100ms.
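The relation this experiment probes can also be stated numerically: with a fixed target delay, additional RTT shrinks the time budget left for retransmissions. The following is a deliberate simplification of PRRT's scheduling (the real protocol is more involved), assuming each repair round costs roughly one RTT:

```python
def retransmission_rounds(d_target_ms, rtt_ms, serialization_ms=0):
    """Rough number of ARQ rounds that fit into the delay budget.

    Simplification of PRRT's scheduling: each repair round is assumed
    to cost roughly one RTT, so D_target bounds the round count.
    """
    budget = d_target_ms - serialization_ms
    return max(0, int(budget // rtt_ms))
```

Under this simplification, with Dtarget = 100ms an additional RTT of 40ms still leaves room for two repair rounds, while 60ms leaves room for only one, which is why both configurations remain workable but with different delay margins.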
Figure 4.4a: secondary server with +40ms RTT
Figure 4.4b: secondary server with +60ms RTT
Figure 4.4: MVC transmission test; blue shows the primary server, red the secondary server;
x = time/ms in the video, y = time/ms for the data packet from the sender to the receiver
The results of the first experiment are shown in Figure 4.4a. Compared to the
experiment in Figure 4.2 there are no conspicuous differences, because PRRT can still
meet Dtarget with 40ms of additional RTT. The average per-packet delay, however,
does differ: it is 120ms for the experiment in Figure 4.2 and 131ms for the experiment
with 40ms additional RTT shown in Figure 4.4a.
Figure 4.4b shows the results of the second experiment. Again there are no striking
differences compared to Figure 4.2 or to the 40ms experiment in Figure 4.4a, since
PRRT can still meet Dtarget with 60ms of additional RTT. The average per-packet
delay here is 127ms, which is less than with 40ms additional RTT but still more than
in the experiment shown in Figure 4.2.
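The averages quoted in this section are plain means of per-packet one-way delays, i.e. differences of sender and receiver timestamps under synchronized clocks. A minimal sketch, assuming paired timestamp lists in milliseconds (not the actual log format of the measurement tool):

```python
def average_one_way_delay(send_ts_ms, recv_ts_ms):
    """Mean one-way delay from paired, clock-synchronized timestamps (ms).

    Valid only if both hosts share a common time base (here via NTP);
    otherwise the clock offset contaminates every difference.
    """
    delays = [r - s for s, r in zip(send_ts_ms, recv_ts_ms)]
    return sum(delays) / len(delays)
```

This is also why the NTP synchronization established earlier is a prerequisite for the delay figures in this chapter: without it, the per-packet differences would include an unknown clock offset.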
Although the whole transmission is delayed, the resulting video still plays perfectly,
as PRRT compensates this delay without any problems. With higher RTTs it is of
course necessary to configure the PRRT protocol accordingly.
5 Conclusion
This thesis evaluated the possibilities of creating a 2D-backward-compatible 3D tele-
vision service on IP-based networks using an AHEC protocol. The technological back-
ground has been explained and evaluated, and the applicable technologies have been
selected. The analysis resulted in the combination of NTP, PRRT and MVC inside an
MPEG2-TS and suggested that these can offer a solution to the described problem.
The prototype applications and the evaluation of their results confirmed the initial
assumption that the combination of these technologies can provide a set of applications
capable of 2D and 3D video transmission in a backward-compatible and bandwidth-
saving manner.
In addition, the first approach led to a solution that uses two AVC streams to achieve
the 3D effect and, thanks to its backward compatibility, is at the same time more
bandwidth-friendly than other solutions.
Altogether, this thesis showed that it is possible to create a bandwidth-efficient,
2D-backward-compatible 3D television service on IP-based networks using modern
media-aware technologies and an AHEC protocol.
5.1 Future Work
This section gives some ideas for future features and applications of this solution.
The solution presented considers only stereo video with one base view and one enhance-
ment view. The concept, however, can be extended to FVV with one base view and
many enhancement views, each enhancement view forming a distinct multicast group.
This is not easy to implement: keeping a receiver subscribed to the correct multicast
groups is challenging, since the views' dependencies are far more complex than in the
stereo case.
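The subscription problem can be illustrated as a transitive closure over the inter-view prediction dependencies: a receiver requesting one viewpoint must join the multicast group of that view and of every view it is (directly or indirectly) predicted from. The view names and the dependency map below are hypothetical:

```python
def groups_to_join(view, depends_on):
    """Transitive closure of inter-view prediction dependencies.

    depends_on maps a view to the views it is predicted from; the result
    is the full set of multicast groups the receiver must subscribe to.
    """
    needed, stack = set(), [view]
    while stack:
        v = stack.pop()
        if v not in needed:
            needed.add(v)
            stack.extend(depends_on.get(v, ()))
    return needed

# Hypothetical FVV layout: v0 is the base view, v2 is predicted from v0,
# and v4 is predicted from both v2 and v0.
deps = {"v0": [], "v2": ["v0"], "v4": ["v2", "v0"]}
```

Requesting v4 in this layout would require joining the groups of v4, v2 and v0, while requesting v2 requires only v2 and v0; a real implementation would additionally have to re-evaluate this set whenever the user switches viewpoints.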
To protect the base view better than the enhancement view, unequal error protection
can be used (e.g. the base view receives stronger protection than the enhancement
view(s)). Since each view has its own channel, this is easily implemented by using
different PRRT parameters for the base view and the enhancement view.
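One way to express this is a per-channel parameter set. The parameter names below are hypothetical, not PRRT's actual API; they only illustrate the idea that the base-view channel gets a larger repair budget than the enhancement-view channel:

```python
# Hypothetical per-channel PRRT settings implementing unequal error
# protection: the base view gets more FEC redundancy and more ARQ
# rounds than the enhancement view, at the same target delay.
prrt_params = {
    "base_view":        {"d_target_ms": 100, "fec_redundancy": 0.20, "arq_rounds": 2},
    "enhancement_view": {"d_target_ms": 100, "fec_redundancy": 0.05, "arq_rounds": 1},
}

def protection_level(channel):
    """Rough ordering key: more redundancy and more ARQ rounds = stronger."""
    p = prrt_params[channel]
    return (p["fec_redundancy"], p["arq_rounds"])
```

Under such a configuration, losing enhancement-view packets degrades the service to 2D, while the more strongly protected base view keeps playing.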
In Section 1.2 the concept of hybrid broadcasting as implemented by NHK was intro-
duced. The solution presented here could also be used in a hybrid environment by
sending the base view over a classical broadcast technology such as DVB-S2 and the
enhancement view over IP. This requires minor changes to the protocol, especially to
handle large differences in round trip times and the different error correction schemes
of the different transport routes.
Appendix
A References
[1] Y. Chen, Y.K. Wang, K. Ugur, M.M. Hannuksela, J. Lainema, and M. Gabbouj.
The emerging MVC standard for 3D video services. EURASIP Journal on Applied
Signal Processing, 2009:8, 2009.
[2] D. Deeths and G. Brunette. Using NTP to control and synchronize system clocks -
Part I: Introduction to NTP. Sun BluePrints OnLine, July 2001.
[3] J. Eidson. IEEE-1588 standard for a precision clock synchronization protocol for
networked measurement and control systems. Systems - Test and Measurement Ap-
plications, 10:10–13, 2005.
[4] J. C. Eidson. Measurement, Control, and Communication Using IEEE 1588 (Ad-
vances in Industrial Control). Springer-Verlag New York, Inc., Secaucus, NJ, USA,
2006. ISBN 1846282500.
[5] ETSI. Digital Video Broadcasting (DVB); Second generation framing structure,
channel coding and modulation systems for Broadcasting, Interactive Services, News
Gathering and other broadband satellite applications (DVB-S2) (ETSI EN 302 307
V1.2.1). European Telecommunications Standards Institute, 2009.
[6] M. Gorius, J. Miroll, and T. Herfet. Service compatible efficient 3D-HDTV delivery.
In 14th ITG Conference on Electronic Media, 2011.
[7] M. Gorius, J. Miroll, M. Karl, and T. Herfet. Predictable reliability and packet
loss domain separation for IP media delivery. In IEEE International Conference on
Communications (ICC), 2011.
[8] M. Gorius, G. Petrovic, and T. Herfet. Predictably reliable real-time transport over
large bandwidth-delay product networks. 2011.
[9] M. Gorius, Y. Shuai, and T. Herfet. Predictably reliable media transport over wire-
less home networks. 2011.
[10] A. Gotchev, G.B. Akar, T. Capin, D. Strohmeier, and A. Boev. Three-dimensional
media for mobile devices. Proceedings of the IEEE, 99(4):708–741, 2011.
[11] C.G. Gurler, B. Gorkemli, G. Saygili, and A.M. Tekalp. Flexible transport of 3-D
video over networks. Proceedings of the IEEE, 99(4):694–707, 2011.
[12] T. Herfet. Future media internet: Video & audio transport - a new paradigm.
Manuscript, 2011.
[13] ISO/IEC 13818-1:2000(E). Information technology - Generic coding of moving pic-
tures and associated audio information: Systems. ISO, Geneva, Switzerland, 2000.
[14] ISO/IEC 14496-10:2004(E). Information technology - Coding of audio-visual objects
- Part 10: Advanced Video Coding. ISO, Geneva, Switzerland, 2004.
[15] M. Karl, M. Gorius, and T. Herfet. A distributed multilink media transport ap-
proach. In International Conference on Intelligent Network and Computing (ICINC
2010), Kuala Lumpur, November 2010.
[16] M. Karl, M. Gorius, and T. Herfet. Routing: Why less intelligence sometimes is
more clever. In IEEE International Symposium on Broadband Multimedia Systems
and Broadcasting (ISBMSB), Shanghai, March 2010.
[17] D.L. Mills. Internet time synchronization: The Network Time Protocol. IEEE
Transactions on Communications, 39(10):1482–1493, 1991.
[18] D.L. Mills. Precision synchronization of computer network clocks. ACM SIGCOMM
Computer Communication Review, 24(2):28–43, 1994.
[19] H.M. Ozaktas and L. Onural. Three-Dimensional Television. Springer-Verlag
Berlin Heidelberg.
[20] T. Schierl and S. Narasimhan. Transport and storage systems for 3-D video using
MPEG-2 systems, RTP, and ISO file format. Proceedings of the IEEE, (99):1–13, 2011.
[21] A. Smolic, P. Merkle, K. Müller, C. Fehn, P. Kauff, and T. Wiegand. Compression of
multi-view video and associated data. Three-Dimensional Television, pages 313–350,
2008.
[22] V. Smotlacha et al. One-way delay measurement using NTP synchronization. In
Proc. TERENA Networking Conference, 2003.
[23] A.M. Tekalp and M.R. Civanlar. Efficient transport of 3DTV. Three-Dimensional
Television, pages 351–369, 2008.
[24] A. Vetro, T. Wiegand, and G.J. Sullivan. Overview of the stereo and multiview video
coding extensions of the H.264/MPEG-4 AVC standard. Proceedings of the IEEE, 99
(4):626–642, 2011.
[25] H. Weibel. High precision clock synchronization according to IEEE 1588 - imple-
mentation and performance issues. In Proc. Embedded World, 2005.
[26] S. Wenger. H.264/AVC over IP. IEEE Transactions on Circuits and Systems for
Video Technology, 13(7):645–656, 2003.
[27] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the
H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for
Video Technology, 13(7):560–576, 2003.
[28] L. Zhang, Z. Liu, H. Xia, et al. Clock synchronization algorithms for network mea-
surements. In Proceedings of IEEE INFOCOM 2002, volume 1, pages 160–169. IEEE,
2002.
B List of Figures
2.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 AHEC Architecture [8, p. 2] . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Top: Base and Enhancement View; Bottom: Base View and Depth Map . . . 16
2.4 Free viewpoint video, interactive selection of virtual viewpoint and viewing
direction within practical limits [21, p. 319] . . . . . . . . . . . . . . . . . 17
2.5 MVC NAL unit interface [20, p. 673] . . . . . . . . . . . . . . . . . . . . 18
2.6 Typical MVC prediction structure for 3DV [1, 24] . . . . . . . . . . . . . 18
2.7 Typical MVC prediction structure for FVV [1, p. 3] . . . . . . . . . . . . 19
2.8 NAL-Units in the Elementary Stream in the Packetized Elementary Stream
in the Transport Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.9 Measuring Delay and Offset [18, p. 3]; A = Client, B = Server . . . . . . 23
2.10 Network topology and clock types [25, p. 5] . . . . . . . . . . . . . . . . . 25
2.11 Offset and Delay Measurement [3, p. 19] . . . . . . . . . . . . . . . . . . 26
3.1 Streaming protocol stacks [11, p. 698] . . . . . . . . . . . . . . . . . . . . 29
3.2 Digital TV delivery protocol stacks (DVB as an example for underlying
layer technologies) [20, p. 674] . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Synchronized server/client architecture that sends timestamps using UDP 31
3.4 Synchronized server/client architecture that sends timestamps using PRRT 32
3.5 Synchronized dualserver/client architecture that sends timestamps using
PRRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6 Synchronized UDP server/client transport streaming . . . . . . . . . . . 33
3.7 Synchronized PRRT server/client transport streaming . . . . . . . . . . . 34
3.8 Synchronized PRRT dualserver/client transport streaming . . . . . . . . 34
3.9 Synchronized PRRT dualserver/client MVC transport streaming . . . . . 40
4.1 PRRT Timestamp test with two servers over one hour; blue shows the
primary server, red the secondary server; x = time/ms, y = number of
packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 MVC transmission test; blue shows the primary server, red the secondary
server; x = time/ms in the video, y = time/ms for the data packet from
the sender to the receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 MVC transmission test datarate; x = time/ms in the video, y = number
of TS data packets accumulated over 100ms . . . . . . . . . . . . . . . . 52
4.4 MVC transmission test; blue shows the primary server, red the secondary
server; x = time/ms in the video, y = time/ms for the data packet from
the sender to the receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4a secondary server with +40ms RTT . . . . . . . . . . . . . . . . . . 56
4.4b secondary server with +60ms RTT . . . . . . . . . . . . . . . . . . 56
C List of Tables
4.1 Master Server statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Slave Server statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Difference between Master and Slave Server statistics . . . . . . . . . . . 43
D List of Listings
4.1 NTP peers statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 UDP Timestamp test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 PRRT Timestamp test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4 PRRT Timestamp test at lower rate . . . . . . . . . . . . . . . . . . . . 45
4.5 PRRT Timestamp test with two servers . . . . . . . . . . . . . . . . . . . 45
4.6 PCR values in a TS file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.7 Increasing the UDP kernel receive buffer on Linux . . . . . . . . . . . . . 49
4.8 Adding and deleting 20ms of delay for outgoing data on eth0. . . . . . . 53
4.9 Adding and deleting 20ms of delay for incoming data on eth0. . . . . . . 53
4.10 Verifying the RTT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
E CD
This CD contains this thesis as a PDF file, the LaTeX source code of the thesis, images
and other included files.