Master's Thesis WS 2011/2012
Telecommunications Lab

Synchronized Multi-Stream Transport of Stereoscopic HDTV

Hanjo Viets

Submitted: 19th Mar 2012
Reviewer: Prof. Dr.-Ing. Thorsten Herfet
2nd Reviewer: Prof. Dr. Antonio Krüger
Advisors: Manuel Gorius, Goran Petrovic
Saarland University
Faculty of Natural Sciences and Technology I
Department of Computer Science
Universität des Saarlandes
Postfach 15 11 50, 66041 Saarbrücken
UNIVERSITÄT DES SAARLANDES
Lehrstuhl für Nachrichtentechnik
FR Informatik
Prof. Dr. Th. Herfet
Universität des Saarlandes Campus Saarbrücken C6 3, 10. OG 66123 Saarbrücken
Telefon (0681) 302-6541 Telefax (0681) 302-6542
www.nt.uni-saarland.de
Master’s Thesis for Hanjo Viets
Synchronized Multi-Stream Transport of Stereoscopic HDTV
3D-HDTV is considered to be the next milestone in the evolution of digital video transmission and might only be the beginning of the era of multi-view video services. Therefore, the efficient transport and delivery of such media services over Internet Protocol is an active research topic. As a part of the ITC3D1 project, an efficient delivery and rendering system for 3D-HDTV is being designed and developed.
Flexibility is desirable in transport and delivery strategies for stereoscopic IPTV in order to keep the service distribution scalable. For instance, bottlenecks in the network bandwidth might require the service to be downgraded to a single view. On the other hand, it depends on the userʼs preference whether content should be received in 3D. Obviously there is a strong requirement for seamless coexistence of 2D and 3D services.
In order to benefit from the efficiency of IP multicast, the ITC3D project considers the independent transport of both components of a stereoscopic video stream as well as their potential delivery over multiple network routes. Since route diversity leads to different propagation delays, proper synchronization is required at the receiver.
The scope of this master's thesis is the design of a synchronized and backward-compatible 3D video transport scheme. In particular, the tasks to be solved are the following:
• Evaluate available network synchronization protocols with respect to their applicability to synchronized streaming servers.
• Design and implement a suitable synchronization architecture for component-wise streaming and reception of stereoscopic video streams over multiple network routes.
• Design and implement a transport and multiplexing scheme for the backward-compatible delivery of stereoscopic video.
• Demonstrate the solution in a suitable environment. The video transport scheme should rely on the Predictably Reliable Real-time Transport Protocol (PRRT2).
Tutor: Manuel Gorius, Goran Petrovic
Supervisor: Prof. Dr.-Ing. Th. Herfet
1 http://www.intel-vci.uni-saarland.de/en/projects/internet-transport-and-coding-for-3d-hd-tv.html
2 http://www.nt.uni-saarland.de/projects/prrt
Eidesstattliche Erklärung
Ich erkläre hiermit an Eides Statt, dass ich die vorliegende Arbeit selbstständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe.
Statement in Lieu of an Oath
I hereby confirm that I have written this thesis on my own and that I have not used any other media or materials than the ones referred to in this thesis.
Einverständniserklärung
Ich bin damit einverstanden, dass meine (bestandene) Arbeit in beiden Versionen in die Bibliothek der Informatik aufgenommen und damit veröffentlicht wird.

Declaration of Consent
I agree to make both versions of my thesis (with a passing grade) accessible to the public by having them added to the library of the Computer Science Department.

Saarbrücken, …………… (Datum / Date)   …………… (Unterschrift / Signature)
Eidesstattliche Erklärung
Ich erkläre hiermit an Eides Statt, dass die vorliegende Arbeit mit der elektronischen Version übereinstimmt.
Statement in Lieu of an Oath
I hereby confirm the congruence of the contents of the printed data and the electronic version of the thesis.

Saarbrücken, …………… (Datum / Date)   …………… (Unterschrift / Signature)
Abstract
Current technologies for IP-based 3D-TV neither use media-aware transport protocols that implement error correction and timely delivery, nor are they backward-compatible in an efficient manner. A protocol is therefore needed that provides Predictable Reliability under Predictable Delay (PRPD) on the one hand, and on the other is backward compatible in a way that does not waste bandwidth. To achieve this, multicast is needed, and server and client synchronization is essential. This thesis presents a solution in which each view is sent by its own server, in order to serve geographic demand in mesh networking. PRRT is the network protocol providing PRPD and multicast. The video codec MVC provides good compression and backward compatibility, MPEG2-TS handles synchronization of the receivers, and NTP synchronizes the server clocks. Base view and enhancement view are transmitted from different servers over different routes and remultiplexed at the receiver side into a stereoscopic 3D video, without loss of resolution.
Contents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Basic Concept 7
2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Essential Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Network Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.3 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Proposed Method 28
3.1 Transport Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Application Layer: Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Application Layer: Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Time analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.2 TS analysis & streaming . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.3 Two Server TS streaming . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.4 Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.5 MVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.6 MVC-Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Experimental Evaluation 42
4.1 Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 TS Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Two Server TS Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 MVC Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.5 Nonuniform Round Trip Times . . . . . . . . . . . . . . . . . . . . . . . 52
5 Conclusion 58
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix 60
A References 60
B List of Figures 63
C List of Tables 64
D List of Listings 64
E CD 65
1 Introduction
Three-dimensional television is television enhanced by one or more additional views to create an illusion of depth for the viewer. To create this illusion, multi-view video is used to deliver the same scene from two or more perspectives. Three-dimensional television can be transported over the same channels as classical television. One way is via the broadcast media DVB-T, DVB-C or DVB-S, the digital transmission methods for terrestrial, cable and satellite communication. Another way is the Internet. On the Internet the predominant transmission mode is unicast, but multicast is also used. Television over IP (IPTV) uses multicast streaming: receivers subscribe to a channel, which means a constant push transmission of video data from a server to its subscribers.
This thesis evaluates the possibilities of creating a 2D-backward-compatible 3D television service on IP-based networks using a protocol that offers Predictable Reliability under Predictable Delay (PRPD) and implements error correction, avoiding the unnecessary bandwidth consumption or quality loss introduced by lossy network paths such as a wireless connection at the receiver's side. Backward compatibility saves bandwidth in the sense that the base view only has to be sent once to all receivers, 2D and 3D alike; for a 3D receiver, only the enhancement view has to be sent additionally. After evaluating available protocols and techniques, the most promising setup is implemented and evaluated in detail. The goal is to set up a multicast distribution network where each receiver can subscribe to either one or two camera perspectives, depending on what it can process (one for 2D, two for plano-stereoscopic1 3D). Each server delivers a single view using multicast. To achieve synchronicity, all servers must share a common time. NTP and PTP (IEEE 1588) are two possible solutions to this problem. The MVC standard can be used to achieve a higher compression efficiency compared to independently encoding the views.
1 In plano-stereoscopic 3D, two cameras at the distance of the human eyes capture a scene twice to create the illusion of depth for the viewer. The viewpoint is static; the viewer cannot choose to watch the scene from another angle. From now on this is simply referred to as “stereo”.
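The clock synchronization that NTP performs can be summarized by its core offset computation from RFC 5905; the following is an illustrative sketch, not part of the thesis implementation.

```python
# Classic NTP offset/delay estimation from one request/response exchange.
# t1: client send time, t2: server receive time,
# t3: server send time, t4: client receive time (t1/t4 on the client clock).

def ntp_offset_delay(t1, t2, t3, t4):
    # Offset of the server clock relative to the client clock,
    # assuming symmetric network paths.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip delay excluding the server's processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

For example, a server clock running 14.5 s ahead with 1 s of round-trip delay yields `ntp_offset_delay(0, 15, 16, 2) == (14.5, 1)`.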
A solution that transmits the two views of stereo television separately has multiple advantages. One advantage is that it saves bandwidth for those users who cannot or do not want to watch 3D television, by omitting the enhancement view without having to create two versions of the same television channel, one in 2D and one in 3D. Another advantage is that providers can create a flexible billing system for 2D and 3D television, where upgrading from 2D to 3D is easy. It is also beneficial for the IPTV infrastructure, because it can better adapt to serve geographic demand in mesh networking.
1.1 Background
The following list briefly describes some protocols and techniques that are currently
used in 3D video transmission and are important references for better understanding.
This list is not exhaustive and the protocols and techniques are not described in full
detail, but it should help in following the author’s thoughts.
Forward Error Correction (FEC)
Forward Error Correction is used in DVB-T, -C and -S because it allows receivers to correct errors that occurred during transmission. As retransmission of corrupted data packets is not feasible in broadcasting, FEC is a way to compensate for transmission errors. It is also useful for the suggested approach, as multicast transmissions profit from FEC in the same way that broadcast transmissions do.
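As a minimal illustration of the idea behind packet-level FEC, the sketch below uses a single XOR parity packet over k data packets; this is a deliberate simplification of the erasure codes used in practice, not the thesis implementation.

```python
# One XOR parity packet over k equal-size data packets: any single
# lost packet can be rebuilt from the remaining k-1 plus the parity.

def xor_parity(packets):
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)

def recover_single_loss(received, parity):
    # `received` holds the data packets in order, with the lost one as None.
    present = [pkt for pkt in received if pkt is not None]
    rebuilt = xor_parity(present + [parity])
    return [pkt if pkt is not None else rebuilt for pkt in received]
```

With a Reed-Solomon-style optimal erasure code the same principle generalizes to recovering up to n-k losses from n transmitted packets.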
Automatic Repeat reQuest (ARQ)
ARQ is an error correction mechanism that retransmits lost or corrupted data, triggered either by missing positive acknowledgements (ACKs) or by explicit negative acknowledgements (NACKs). In contrast to FEC, it usually does not make sense to use ARQ for broadcast or multicast transmission, because the number of ACKs or NACKs and the required retransmissions would exceed the server's resources. This of course depends on the size of the multicast group: in a small multicast group with only a few receivers and a network with very little packet loss or corruption, ARQ can be more efficient than FEC.
Transmission Control Protocol (TCP)
TCP is one of the two most used protocols on top of IP. It is connection-oriented and reliable and guarantees ordered delivery. Currently it carries most video transmission on the Internet, because its reliability is ideal for video-on-demand websites like YouTube or movie stores like iTunes, which have to serve users across a heterogeneous landscape of Internet access speed and quality. It is well suited when on-time delivery is not important.
User Datagram Protocol (UDP)
UDP is the other of the two most used protocols on top of IP. It is connectionless and unreliable and does not guarantee ordered delivery. It allows multicast and broadcast. Currently it is used for the IPTV services some Internet providers offer their customers, and for live transmission of important events on the Internet such as the World Cup. UDP can be used for on-time delivery, but it remains unreliable.
Stream Control Transmission Protocol (SCTP)
SCTP is a connection-oriented, reliable transport protocol with sequenced delivery.
SCTP can aggregate different paths. Unlike TCP it allows the sequenced delivery
to be discontinued, e.g. when a packet is lost. Then the delivered packets will
not be delayed until the replacement has arrived. This is useful for Voice over IP,
which it has been designed for. SCTP is currently not used in any notable video
transmission technique on the Internet. SCTP is listed for the sake of completeness.
Datagram Congestion Control Protocol (DCCP)
DCCP on the other hand is (like UDP) a real-time datagram service, but it is
connection-oriented and adds congestion control. Unlike SCTP it does not offer
sequenced delivery. DCCP is currently also not used in any notable video trans-
mission technique on the Internet. DCCP is listed for the sake of completeness.
MPEG2 Transport Stream (TS)
The MPEG2 Transport Stream is defined in [13]. It encapsulates audio, video and other services important to TV into a constant stream of data consisting of packets of equal size (188 bytes) to allow efficient transmission. Its timestamps are important for clock synchronization and playback speed. It is used in DVB-T, -C and -S as well as in IPTV. MPEG2-TS is used in the implementation of this thesis.
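To give an impression of the format, the following sketch parses the fixed 4-byte TS packet header as specified in ISO/IEC 13818-1; it is a simplified illustration, not the thesis code.

```python
TS_PACKET_SIZE = 188  # every TS packet is exactly 188 bytes
SYNC_BYTE = 0x47      # first byte of every TS packet

def parse_ts_header(packet):
    """Extract PID, payload_unit_start and continuity counter from a TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    payload_unit_start = bool(packet[1] & 0x40)  # a new PES packet begins here
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit packet identifier
    continuity_counter = packet[3] & 0x0F        # detects lost packets per PID
    return pid, payload_unit_start, continuity_counter
```

The PID selects the elementary stream a packet belongs to, and gaps in the 4-bit continuity counter reveal packet loss within one PID.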
Real-Time Transport Protocol (RTP)
RTP is a packet format that is likewise used to encapsulate audio, video and other TV-related services into a constant stream of data. It offers sequence numbers to restore a total order, as well as timestamps. In addition it can provide services such as monitoring Quality of Service (QoS) with the help of the RTP Control Protocol (RTCP). It is sometimes used in IPTV, but it is not used in this thesis, as the features it offers are either not necessary or already implemented in the other protocols used. Since the jitter and reordering common on IP networks do not exist on DVB-T, -C or -S, RTP is not used there.
Advanced Video Coding (AVC)
AVC is a standard for video compression and, as of today, one of the most used video codecs. It uses independently decodable frames (I-frames), frames that require previous frames (P-frames) and frames that require previous and following frames (B-frames) for decoding. AVC is often used for IPTV and video-on-demand, on many mobile devices and in DVB. Many graphics cards have hardware support for decoding AVC. AVC is used in this work for compatibility reasons.
Multiview Video Coding (MVC)
MVC is an addition to AVC which allows multiview (3D) video to be encoded
more efficiently. The views are stored inside the same stream. The frames of
the enhancement views can also depend on frames of different views which saves
bandwidth, because nearby views are generally alike. This is explained in more
detail in Section 2.2.2.2. Next to AVC, MVC is used in the implementation to
transmit multi-view video.
Network Abstraction Layer (NAL-) Unit
A NAL-Unit is the smallest unit inside an AVC/MVC bitstream. The NAL-Unit header carries information about which view perspective the payload belongs to, and can therefore be used to decide on which channel the unit should be sent. This is explained in more detail in Section 2.2.2.2.
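As an illustration, the sketch below extracts the view perspective from a NAL unit following the H.264/AVC Annex H syntax; the exact bit layout stated in the comment is an assumption made for this example and is not taken from the thesis implementation.

```python
def mvc_view_id(nal):
    """Return the view_id of an MVC NAL unit, or 0 for base-view units.

    Assumed layout of the 3-byte MVC extension after the 1-byte NAL header:
    svc_extension_flag(1) non_idr_flag(1) priority_id(6) | view_id(10)
    temporal_id(3) anchor_pic_flag(1) inter_view_flag(1) reserved_one_bit(1).
    """
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type not in (14, 20):  # prefix / coded slice extension
        return 0  # plain AVC NAL unit: belongs to the base view
    # view_id = all 8 bits of the third byte plus the top 2 bits of the fourth
    return (nal[2] << 2) | (nal[3] >> 6)
```

A demultiplexer along these lines can route base-view NAL units to one stream and enhancement-view units to another.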
Service Compatible Mode
In 3D video transmission, service compatible mode means that the 2D component of the video can be played by devices that do not support 3D playback. The 3D enhancement view is coded such that it does not appear on a 2D player. Modern 3D Blu-ray movies use this technique. Service compatible mode is implemented in MVC, which is used in this thesis.
Frame Compatible Mode
In 3D video transmission, frame compatible mode means that the 3D video is "squeezed" into a 2D video: the two view perspectives are transmitted in a single 2D video stream, either side-by-side, top-bottom or aligned by some other means. This method loses video resolution and is used for 3D broadcast television because existing transmission protocols can be reused. This work, however, uses service compatible mode.
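The resolution loss of frame-compatible mode can be made concrete with a small illustrative calculation (hypothetical helper, not thesis code):

```python
def side_by_side_crop(frame_width, frame_height):
    """Crop rectangles (x, y, w, h) for the two eyes of a side-by-side frame."""
    half = frame_width // 2
    left_eye = (0, 0, half, frame_height)
    right_eye = (half, 0, half, frame_height)
    return left_eye, right_eye
```

A 1920x1080 coded frame thus yields two 960x1080 sub-images that the display must stretch back to full width, halving the horizontal resolution per eye.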
1.2 State of the Art
At the time of writing, few 3D IPTV services are commercially available.

A big provider of IPTV is Deutsche Telekom AG (German Telecom), which offers its landline Internet customers a paid IP television service called “Entertain”2, provided their bandwidth is large enough. Entertain uses DVB-IPI with RTP over multicast for public channels3; private channels are transmitted using a proprietary protocol. Since August 2011, a 3D channel has also been available4. As Telekom is Internet and service provider at the same time, it can prioritize the multimedia data on the Internet connection. This is done by creating two VLANs, one for Entertain and one for everything else5.
2 More information at http://www.telekom.de/entertain
3 From http://www.ard-digital.de/Empfang--Technik/IPTV/Software-Download/Software-Download
4 See http://www.netzwelt.de/news/87712-entertain-telekom-startet-3d-tv-ab-august.html
5 Described in http://www2.lancom.de/kb.nsf/1275/637DAC48142CA0F8C1257646004D6848?OpenDocument
As described in 6 and 7, Japan's national public broadcasting organization NHK (Nippon Hoso Kyokai, Japan Broadcasting Corporation) has set up a hybrid broadcasting solution where the primary 2D view is transmitted using classical broadcast technologies like DVB-C or DVB-S and the enhancement view is transmitted over an IP connection. The exact technology behind it remains undocumented; however, the NHK engineers' reason for creating this solution was to transmit both views in full resolution, in contrast to the approach currently in use, where only half the resolution is available for each eye.
6 See http://www.boljour.de/2011/05/broadcast-und-internet-in-3d-vereint/
7 See http://techon.nikkeibp.co.jp/english/NEWS_EN/20110526/192128/
2 Basic Concept
In this section the main problem will be stated, the complete system architecture will
be presented and the underlying techniques for the network protocol, the multiplexing
and the synchronization will be explained and compared in detail.
2.1 Problem Statement
For various reasons, explained in more detail in this section, the currently available approaches to transmitting 3DTV over IP have some disadvantages. 3DTV can, for example, be transmitted using existing IPTV techniques such as DVB-IPTV, which is a multicast solution, or on YouTube, which is a unicast solution providing video-on-demand.
There are two major problems that current solutions do not address. First, they do not really deal with backward compatibility: when frame compatible mode is used, for example, the video can still be played by old receivers, but with noteworthy constraints. Either the viewer has to live with a divided view if his receiver does not understand the proper upscaling, or, if it does the upscaling, the resolution is diminished. Second, current solutions either use unicast TCP (when HTTP is used), which results in a great deal of unnecessary bandwidth usage, or multicast UDP (often together with RTP, e.g. DVB-IPTV), which leads to problems with packet loss or corruption. Examples of unicast HTTP/TCP are YouTube as a video hosting platform and iTunes as a video-on-demand (VoD) service; VoD services are, by their nature, always unicast. An example of multicast UDP is the Entertain IPTV service by Deutsche Telekom.
Therefore a new technique for transmitting 3DTV over IP is needed that offers backward compatibility to devices that can only present 2D video, as well as transmission with Predictable Reliability under Predictable Delay (PRPD). Predictable Delay means that every data packet reaches its destination after some fixed amount of time with no or negligibly small jitter. Predictable Reliability means that at most a certain fraction of packets gets lost or cannot be delivered on time. The aim of a protocol that implements PRPD is to minimize the error rate under a given delay.
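The PRPD guarantee can be phrased as a simple predicate; the parameter names below are hypothetical, chosen only for illustration.

```python
def meets_prpd(arrival_delays, d_target, p_max):
    """Check the PRPD guarantee over a trace of packets.

    arrival_delays: one delay per packet, None if the packet was lost.
    d_target: target delay; p_max: tolerated fraction of lost/late packets.
    """
    bad = sum(1 for d in arrival_delays if d is None or d > d_target)
    return bad / len(arrival_delays) <= p_max
```

A protocol implementing PRPD tunes its redundancy so that this predicate holds for the negotiated d_target and p_max.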
The technology to transmit 3DTV on IP has some noticeable drawbacks. The following
is typical for current 3D television on IP:
• Multiple views are transmitted on the same channel by a single server, which means that devices that can only render 2D video either waste bandwidth by having to receive the extended view (when service-compatible mode is used, e.g. in MVC) or cannot play the video at all.
• In cases where only a 2D transmission technique is available, 3D video is transmitted using regular 2D video coding with two frames "squeezed" into one, either side-by-side or top-bottom (frame-compatible mode, e.g. in AVC). This not only halves the spatial resolution, but 2D receivers cannot really watch the channel, at least not nicely [20]. Even when options are set that instruct the 2D receiver to scale the base view up to the whole screen, it is not guaranteed that every receiver (especially older ones) implements this correctly.
• Unfortunately, unicast is sometimes used even where broadcast-style television is transmitted, as well as for video-on-demand and download services. Unicast transmission has the advantage that reliable protocols such as TCP can be used, where the data is guaranteed to reach its destination (as in Internet TV over HTTP), but the data has to be sent n times for n receivers.
• In some scenarios multicast is already used (e.g. in IPTV), which allows the data to be sent once for n receivers; however, it is then not guaranteed that the data reaches all receivers in full. RTP often extends multicast transmission to provide some additional features such as Quality of Service (QoS) monitoring.
• TCP and UDP, the two major IP-based protocols, have disadvantages that make them hard to use for multimedia transport. Audiovisual applications have strong constraints on reliability and delivery time. TCP uses positive acknowledgements, which allows for a reliable connection, but it cannot provide a given delivery time and makes multicast transmission infeasible. In contrast, UDP allows for timely delivery but does not provide a reliable connection, because it has no acknowledgements at all. [16]
• The Stream Control Transmission Protocol (SCTP) is a connection-oriented, reliable transport protocol with sequenced delivery. It uses positive acknowledgements like TCP and therefore does not support multicast. [16]
• DCCP on the other hand is (like UDP) a real-time datagram service (without
sequenced delivery), but it adds congestion control. However it also uses positive
acknowledgements which again makes multicast infeasible. [16]
To sum up, the main disadvantages of the current technologies are ineffectual backward compatibility, which causes unnecessary bandwidth consumption or reduces the spatial resolution, and the use of non-media-aware transport protocols such as TCP or pure UDP without any extension.
A possible solution to these disadvantages is to split the 3D video into two parts, one carrying the base view only and the other the enhancement view only. The two views can then be sent via distinct multicast groups (preferably from distinct servers) to the receivers, and each receiver can decide (depending on its display and bandwidth capabilities and personal preferences) to subscribe to the base view only or to both the base and the enhancement view. In addition, a protocol can be used that offers Predictable Reliability under Predictable Delay (PRPD). Distinct servers should be used for various reasons. One reason is to distribute the load. Another is that the servers can be positioned according to geographic demand: in an area with only 2D receivers and no 3D receivers, it is not necessary to set up a secondary server for the enhancement view.
Multiple problems arise when implementing this 2D-backward-compatible 3D television service on IP-based networks. One problem is how to split the video stream into two streams and restore the same total order at the receiver. Another is how to synchronize the two servers such that they operate strictly isochronously. A third is how to transmit the video to many receivers at once in a backward-compatible and media-friendly way (i.e. with low delay) while minimizing transmission cost. These problems have not been addressed in this context so far and will be solved in the following sections.
2.1.1 System Architecture
The system to be implemented consists of two servers. One is the base server, responsible for streaming the base view; the other is the enhancement server, responsible for streaming the enhancement view. The servers need to share the same local time, so that corresponding frames of the different views are sent to the receivers at the same time. To synchronize the two servers, a common time server will be considered. Multicast is used for the transmission because it allows multiple receivers to connect without the data having to be sent multiple times. Each view has its own multicast address, so that receivers can subscribe to the view(s) they want. In addition, the system should be able to compensate for errors during transmission and provide a predictable delay, such that the two views are received contemporaneously, no matter which routes they have taken. The receivers run a service which receives either one stream (the base stream only) or both (base and enhancement) and constructs a standard MVC stream which can be displayed by any MVC- or AVC-capable player. See also Figure 2.1.
Figure 2.1: Architecture (a common time source synchronizes the base-view and enhancement-view servers, which stream over PRRT to 2D and 3D receivers; the PCR is used for receiver-side timing)
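The receiver-side subscription logic of this architecture can be sketched as follows; the multicast addresses and port handling are illustrative assumptions, not taken from the thesis implementation.

```python
import socket
import struct

BASE_GROUP = '239.0.0.1'  # illustrative multicast address for the base view
ENH_GROUP = '239.0.0.2'   # illustrative address for the enhancement view

def groups_to_join(want_3d):
    """Multicast groups this receiver subscribes to: 2D joins one, 3D both."""
    return [BASE_GROUP] + ([ENH_GROUP] if want_3d else [])

def open_receiver(port, want_3d):
    """Open a UDP socket joined to the required multicast group(s)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', port))
    for group in groups_to_join(want_3d):
        mreq = struct.pack('4s4s', socket.inet_aton(group),
                           socket.inet_aton('0.0.0.0'))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

Upgrading a receiver from 2D to 3D is then simply a matter of joining the second group.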
2.2 Essential Techniques
As explained before, this section deals with the technical background of the components that will be used in this thesis. Most of the information is extracted from international standards or has been researched by others. How these tools link together is explained in the next section.
The following techniques listed here and explained on the next pages are essential for
the realization of this project:
Network Protocol: For the transmission, a protocol is needed that is capable of multicast and at the same time allows for error correction. This predictable reliability needs to be achievable under predictable delay (PRPD). Section 2.2.1 explains a protocol that has these features.
Multiplexing: For 3D video, a codec is needed that is backward compatible and easy to demultiplex and remultiplex. A transmission format is also needed that addresses the challenges of media transmission. This is explained in Section 2.2.2.
Server Synchronization: As explained earlier, it is also very important to keep the media servers synchronous, so that corresponding frames arrive at the receivers at (almost) the same moment in time. Section 2.2.3 explains the necessary tools for this.
2.2.1 Network Protocol
Media transport places very specific demands on the transport protocol. First, it needs to support multicast, so that multiple receivers can be served at once. It also needs to minimize data loss, i.e. ensure that all the data actually reaches the receivers (reliability), and to keep the transmission delay constant to a certain degree. Of the protocols presented, UDP is the only one that allows multicast, but it has no error correction.
2.2.1.1 PRRT
In [7], a protocol called Predictably Reliable Real-time Transport (PRRT) has been developed that addresses the issues with media transport that TCP/UDP and SCTP/DCCP have8. It uses an Adaptive Hybrid Error Correction (AHEC) scheme that provides Predictable Reliability under Predictable Delay (PRPD) and combines the advantages of Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ). More information and examples on this protocol can be found in [6–9, 15, 16].
What the described protocol needs above all is predictable reliability, which is essential for quasi error free (QEF) video playback. Predictable delay is also needed for the timely reception of the video data: all receivers use a buffer to compensate for jitter, but the larger the buffer, the longer a user has to wait when tuning in or switching channels. Predictable delay gives an upper bound for the buffer.

8 More information can be found at http://www.nt.uni-saarland.de/projects/prrt/
PRRT fulfills the constraints necessary to provide an IP-based, multicast, backward-compatible 3D transmission service.
2.2.1.1.1 Functional Description
PRRT is a protocol that efficiently supports multimedia services such as three-dimensional television, but also other services such as tele-presence. Its hybrid error correction allows both proactive (FEC) and reactive (ARQ) reliability mechanisms. The architects strove for an optimal trade-off between those mechanisms while meeting the desired delay constraint [8].
This is done because pure retransmission can only guarantee partial reliability when a target delay is in effect: the number of retransmissions is limited by the round-trip time (RTT). Predictable reliability can be achieved by using an optimal ratio of proactive and reactive redundancy under the given time constraint [9].
Figure 2.2: AHEC Architecture [8, p. 2] (packet-level FEC encoder and decoder between data source and data sink, with NACK-driven parameter calculation under the target delay D_target and information length k)
Figure 2.2 shows the AHEC architecture that PRRT uses. k is the FEC information length and D_target is the desired total transmission delay. The figure also depicts that the ARQ uses negative acknowledgements (NACKs) and that the FEC is adaptive. Lost or erroneous data packets can be corrected by FEC independently of the connection's round-trip time (RTT), or flagged as lost explicitly via the ARQ's NACKs.
In addition to the information length k, the FEC code takes a parity distribution vector N_p as an argument. The parity packets are split into portions of size N_p[w], and each parity portion is transmitted in its corresponding transmission cycle w:

N_p = (N_p[0], N_p[1], ..., N_p[r])

This vector defines “the portions of parity for r+1 cycles, where N_p[0] refers to the initial transmission immediately after the k data packets. The portions for 1 ≤ w ≤ r are sent upon receiving a negative acknowledgment (NACK) from the receiver, indicating that additional parity is required to recover lost data packets. The properties of optimal erasure codes ensure the recovery of all data packets as soon as arbitrary k packets from the same block are available at the receiver.” ([8, p. 3])
After the encoding, the data and Np[0] parity packets are sent in advance. Note that
Np[0] can also be zero, in which case no parity packets are sent in the first round. Parity
packets for the upcoming rounds need to be stored at the sender in a local packet cache,
depending on Dtarget. The receiver can check for completeness using sequence number
gap detection. Received parity packets can be used to recover lost data; however, if the
available parity is insufficient, a NACK can be sent to request more parity, at most r
times. These NACK messages not only trigger more parity to be sent by the sender,
but are also used, together with the RTT, to adapt the AHEC parameters to better suit
the network conditions [8]. The packet erasure rate Pe, the number of receivers R in a
multicast group and the average source packet interval Ts (or the data rate, depending on
the packet size) are also used to derive the parameters [9].
Experimental data on optimal values for different network transmissions can be seen in
Table 1 in [8, p. 4]. A key conclusion of those experiments, however, was that the optimal
parameters are strongly RTT dependent.
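The RTT dependence can be illustrated with a back-of-the-envelope calculation: each NACK-triggered retransmission cycle costs roughly one round-trip time, so the delay budget Dtarget bounds the number of reactive rounds that are usable. The following sketch is illustrative only and is not the actual AHEC parameter calculation from [9]; the function name and the simplification are assumptions.

```python
def max_reactive_rounds(d_target_ms: float, rtt_ms: float) -> int:
    """Illustrative simplification: every NACK-triggered retransmission
    cycle costs roughly one RTT, so at most floor(Dtarget / RTT) reactive
    rounds fit into the delay budget."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return max(0, int(d_target_ms // rtt_ms))
```

With Dtarget = 100 ms, this yields five reactive rounds at a 20 ms RTT but none at a 150 ms RTT, mirroring the observation that the optimal parameters are strongly RTT dependent.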
2.2.2 Multiplexing
For this project there exist two noteworthy media types: Stereo video (also known as
3D video or 3DV) and free viewpoint video (FVV). Multi-view video coding (MVC) is
used for wrapping the 3D video stream.
2.2.2.1 3DV / FVV
3DV is what is commonly referred to as 3D television these days. Two cameras placed
roughly an eye distance apart capture the scene to create the illusion of depth for the
viewer. However, the viewpoint is static; the viewer cannot choose to watch the scene from
another angle. In 3DV there is one base view, which is a common 2D video stream, and an
enhancement view, which is predicted not only temporally but also across views. Instead of
storing two views (base and enhancement) it is also possible to store one view (base) and a
depth or disparity map (enhancement). [1, 20, 21]
In the first solution the video for the left eye is compressed as it would be for conventional
2D video. The frames of the video for the right eye are then compressed with
reference not only to preceding and succeeding frames, but also to the corresponding
frames of the left-eye video, because the two are very similar. In the second solution,
a video-plus-depth format is used: the video for the left eye is again compressed
as for 2D video, and an additional depth map is stored, i.e. a grayscale video
in which dark means close and white means far away (or the other
way around). An advantage of this over the conventional approach is that the disparity
can be altered at will. The depth map can again be compressed using conventional codecs
[21, 23].
Figure 2.3: Top: Base and Enhancement View; Bottom: Base View and Depth Map⁹
Free viewpoint video, on the other hand, enables viewers to choose from where they
want to view the scene. Many cameras are needed to shoot a scene in FVV. The viewer
can then choose a viewpoint, either by selecting which two streams (one for each eye)
to decode or, if the device is capable of displaying more streams, by moving. In the
first case it is problematic to decode views when the base view is not included, because
it is always needed for decoding any enhancement view. If the desired enhancement
views are too far apart, the intermediate enhancement views are also needed. Therefore
in FVV it is not possible (without extra work) to receive an arbitrary subset of channels.
When using depth maps for FVV one also has to consider that at some point there is
not enough picture information available to reconstruct the image. Therefore, when using
depth maps, it might be necessary to insert additional enhancement or base views of the
scene. [1, 20, 21]
⁹Sources: http://www.flickr.com/photos/57605784@N06/5437070989/ and http://mangtrithuc.net/173566-Chuy%E1%BB%83n-t%E1%BB%AB-2D-sang-3D-:-nh%E1%BA%ADn-%C4%91%C3%A0o-t%E1%BA%A1o-v%C3%A0-chuy%E1%BB%83n-giao-c%C3%B4ng-ngh%E1%BB%87.html
Figure 2.4: Free viewpoint video, interactive selection of virtual viewpoint and viewing direction within practical limits [21, p. 319]
2.2.2.2 MVC / AVC
Multi-view video coding (MVC) is a way to encode “a set of N temporally synchro-
nized video streams” [21, p. 314]. 3DV means N = 2, whereas FVV has N > 2. MVC is a
new profile of the H.264/AVC standard. It uses not only inter-frame prediction, but also
inter-view prediction. It is backward compatible with AVC through its base view; in other
words, a device that understands AVC but not MVC can still play the (2D) base view. The
other views are called enhancement views. The media data in H.264/AVC is organized in
so-called network abstraction layer (NAL) units. For the enhancement views in MVC a new
NAL unit type is used, which is simply ignored by pure AVC decoders. The view (base or
enhancement) can be identified by the corresponding NAL unit header [1, 20]. The NAL
unit header information can therefore be used to demultiplex the MVC stream; see also
Figure 2.5.
Figure 2.5: MVC NAL unit interface [20, p. 673]
A challenge when using FVV is the complex prediction structure used in MVC. When
3DV is used, frames from the base view only depend on other frames of the base view,
and frames from the single enhancement view depend on frames from the base view or
from the enhancement view itself. This can be seen in Figure 2.6.
Figure 2.6: Typical MVC prediction structure for 3DV [1, 24]
However, when it comes to FVV, the prediction structure is more complex: frames
from an enhancement view do not depend solely on frames from the base view and the
same enhancement view, but also on frames from other (multiple) enhancement views.
This can be seen in Figure 2.7.
Figure 2.7: Typical MVC prediction structure for FVV [1, p. 3]
2.2.2.3 MPEG2-TS
MPEG-2 Transport Stream (TS) is the common transmission format for DVB systems.
It is not only used in DVB-T/-T2 (terrestrial), -C/-C2 (cable) and -S/-S2 (satellite)
television, but also in DVB-IPTV.
An MPEG transport stream multiplexes multiple packetized elementary streams (PES).
These can be, for instance, a video and an audio stream, but also several of each. The
TS deals with synchronization in two ways: first, it makes sure that corresponding audio
and video streams are played out concurrently to achieve “lip-sync”; second, it makes
sure that the stream is played back at the same speed at which the sender emits it, to
avoid buffer overflow or underflow. Two types of timestamps are provided by an MPEG
TS. One is the presentation timestamp (PTS), which is used for synchronization between
two PESs; it defines the exact moment when an audio or video frame should be played.
The other is the decoding timestamp (DTS). It is used to restore the original order inside
a PES and thus ensures that the buffer neither fills up nor runs empty.
As the local clocks in sender and receiver usually do not run at exactly the same speed,
it is necessary to synchronize them as well to avoid buffer over- or underruns. These drifts
are corrected via the program clock reference (PCR), which is also carried in the TS: at
periodic intervals a timestamp consisting of a 90 kHz base (33 bits) and a 27 MHz
extension (9 bits) is included in the stream. The 9-bit extension, resolved at 27 MHz,
refines the 33-bit base resolved at 90 kHz; since its resolution is higher, the extension
provides additional precision between two base timestamps. [12, 13]
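The combination of base and extension can be expressed numerically: since 27 MHz / 90 kHz = 300, the extension counts from 0 to 299 between two base ticks. A minimal sketch (the function name is an assumption, not taken from any implementation):

```python
def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """Combine the 33-bit 90 kHz base and the 9-bit 27 MHz extension
    into a single 27 MHz tick count, then convert to seconds."""
    ticks_27mhz = pcr_base * 300 + pcr_ext  # 27 MHz / 90 kHz = 300
    return ticks_27mhz / 27_000_000.0
```

For example, a base of 90000 with a zero extension corresponds to exactly one second of stream time.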
ISO/IEC 14496-10:2004 [14, Annex B] describes how to code NAL-Units in a
bytestream:
The first NAL-Unit in the bytestream is preceded by the four-byte sequence
0x00000001. Every other NAL-Unit is preceded by the three-byte sequence
0x000001. In addition, each NAL-Unit can be followed by one or more null bytes (0x00).
As explained before, the NAL-Unit header is one byte, extended by three additional
octets for MVC NAL-Units. The first byte of the header consists of a zero bit (a bit
which is always zero); the next two bits indicate whether the data in the NAL-Unit is
used by upcoming frames or whether it can be discarded after decoding. The five remaining
bits of the one-byte NAL-Unit header are the NAL-Unit type. A value of 14 or 20 indicates
an extended header for MVC. That means that whenever a NAL-Unit header has one of
those two types, that NAL-Unit can be ignored when only 2D playback is
desired.
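The header layout just described can be decoded with a few bit operations. The following sketch (function names are assumptions) parses only the one-byte AVC header; the three additional MVC octets are left out:

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL-Unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must always be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,          # 0 => discardable after decoding
        "nal_unit_type": first_byte & 0x1F,              # 14 / 20 => MVC extension follows
    }

def is_mvc_nal(first_byte: int) -> bool:
    """Types 14 and 20 carry the MVC header extension; a receiver that
    only needs the 2D base view can skip these NAL-Units entirely."""
    return parse_nal_header(first_byte)["nal_unit_type"] in (14, 20)
```

This predicate is exactly the filter needed for 2D playback: drop every NAL-Unit for which it returns true.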
The resulting bytestream is an AVC/MVC Elementary Stream (ES). It needs to be
packetized and encapsulated once again into a Packetized Elementary Stream (PES) as
described in ISO/IEC 13818-1:2000 [13, Section 2.4.3.7]:
Figure 2.8: NAL-Units in the Elementary Stream in the Packetized Elementary Stream in the Transport Stream
The header of a PES packet also begins with the three-byte sequence 0x000001, which is
followed by a one-byte stream id. The next two bytes define the packet length. Then,
depending on the stream id, either the data of the elementary stream follows or additional
header fields do; see [13] for a complete list. The optional PES header can end with up
to 32 stuffing bytes of value 0xFF to meet the requirements of the channel (e.g. the
transport stream).
These PES packets can now be encapsulated into TS packets. A TS packet is always
exactly 188 bytes long and its header takes at least 4 bytes (but can be longer), so a
single TS packet can carry at most 184 bytes of PES data; longer PES packets are split
across several TS packets. This is described in ISO/IEC 13818-1:2000 [13, Section 2.4.3.2]:
The header of a transport stream packet starts with a sync byte, which is always 0x47.
Then follow three bits that indicate transport error, payload unit start and transport
priority, followed by 13 bits that hold the PID. The next two bits control the transport
scrambling, and the two bits after that indicate whether this transport stream packet
contains an adaptation field and/or payload. The next four bits are a continuity counter,
followed by the optional adaptation field and the payload. If an adaptation field is present,
its first byte contains the total length of the adaptation field. It can also be followed by
stuffing bytes of value 0xFF.
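The fixed four-byte TS header just described can be decoded as follows. This is a minimal sketch (function name assumed), covering only the fields listed above:

```python
def parse_ts_header(pkt: bytes) -> dict:
    """Decode the fixed 4-byte header of a 188-byte MPEG2-TS packet."""
    if len(pkt) != 188 or pkt[0] != 0x47:  # sync byte check
        raise ValueError("not a valid 188-byte TS packet")
    return {
        "transport_error": (pkt[1] >> 7) & 0x1,
        "payload_unit_start": (pkt[1] >> 6) & 0x1,
        "transport_priority": (pkt[1] >> 5) & 0x1,
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],          # 13-bit PID
        "scrambling_control": (pkt[3] >> 6) & 0x3,
        "adaptation_field_control": (pkt[3] >> 4) & 0x3,  # AF and/or payload
        "continuity_counter": pkt[3] & 0x0F,
    }
```

In the implementation described later, the PID extracted here is what distinguishes the two views when two AVC streams share one transport stream.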
2.2.2.4 Analysis
The implementation is going to focus on stereo video (3DV) and backward compatibility
with 2D video. It is necessary to go down the protocol stack through the
Transport Stream and the Packetized Elementary Stream to the Elementary Stream
in order to get to the NAL-Units. The NAL-Units inside the TS packets decide whether a TS
packet belongs to the base view and is sent by the primary server, or belongs to the
enhancement view and is sent by the secondary server. A possible solution for bringing the
TS packets sent over two different routes back into the correct original total order is to add an
identifier (e.g. a sequence number) in front of every TS packet before transmitting them.
The same solution could also support FVV and at the same time be compatible with
3DV and 2DV [19]. MPEG2-TS is going to be used to take care of the transport over
PRRT/IP, especially for providing a clock for the receivers.
2.2.3 Synchronization
PRRT claims to be able to compensate for the different transmission delays; minor
delays that PRRT does not handle can be compensated for by a small local client buffer.
Nevertheless it is necessary that all servers share a common time, such that associated
frames are sent to the receivers at the same moment. Of course the transmission delay
can differ for each receiver and for each of the streams, but this cannot be addressed by
the servers and needs to be handled by the receivers, e.g. by using a buffer. This buffer
can compensate for minor differences caused by transmission delays and by differing
server times. However, if the server clocks are off by tens of seconds or even minutes,
the buffer has to be correspondingly large. As a result, a customer switching to a channel
would have to wait that amount of time until the buffer is filled.
Deeths and Brunette [2] give an example to illustrate this necessity: if one server's
seconds are 0.04 milliseconds longer than another server's seconds, the two clocks will
deviate by 21 minutes after one year. For an extended movie of 3 hours length the
difference would be 432 ms, or 11 frames (at 25 fps). Therefore the server clocks must
deviate as little as possible to minimize the size of the buffer and thus the delay
for the user.
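These figures follow directly from the stated drift; a quick arithmetic check:

```python
# Drift example from Deeths and Brunette: each second on the faster clock
# is 0.04 ms longer, i.e. the clocks drift apart by 0.04 ms per second.
drift = 0.04e-3                       # seconds of deviation gained per second

per_year = drift * 365 * 24 * 3600    # ≈ 1261 s, i.e. about 21 minutes
per_movie = drift * 3 * 3600          # ≈ 0.432 s over a 3-hour movie
frames = per_movie * 25               # ≈ 10.8, i.e. about 11 frames at 25 fps
```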
On the next pages two protocols that can be used for synchronizing the servers will be
explained. The first is the Network Time Protocol (NTP), the second is the Precision
Time Protocol (PTP) as specified in IEEE 1588.
2.2.3.1 NTP
The Network Time Protocol is one of the most popular time synchronization protocols
used in the Internet. It is important to note that it has been designed for WANs.
It typically achieves an accuracy of a few tens of milliseconds or better in a WAN and
can reach sub-millisecond accuracy in LANs. [2, 4]
Although there are servers that are directly connected to high-precision clocks, NTP is
not exclusively a master-slave system: peers synchronize to the best clock, while
clients synchronize to a server (or to multiple servers). [2]
NTP also helps maintain clock accuracy when the connection to the server is
disturbed: a constant drift can be corrected based on the measurement history. [2]
2.2.3.1.1 Functional Description
The offset is determined by measuring the delay over the whole path from the server to
the client, assuming that both directions have the same delay. [22]
As can be seen in Figure 2.9, the client initiates the communication in NTP. At time
T1 the client sends a request to the server, which receives it at time T2. At time T3 the
server sends the two values T2 and T3 to the client, which receives them at time T4. With
these four timestamps the client can now calculate the local clock offset and the round
trip delay. For this, two intermediate values are calculated:

a = T2 − T1 and b = T4 − T3

The clock offset is then calculated by

O = (a − b) / 2

and the round trip delay by

D = a + b
Figure 2.9: Measuring Delay and Offset [18, p. 3]; A = Client, B = Server
Once again it is important to note that the accuracy directly depends on the assumption
that the delay is symmetrical in both directions. As the delay is usually not
symmetrical, the uncertainty is ≤ D/2. [18, 22]
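The calculation above can be condensed into a few lines. The scenario in the comment is a constructed example (server clock 5 ms ahead of the client, symmetric 10 ms one-way delay), not a measurement from the experiments:

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Client-side NTP calculation from the four timestamps."""
    a = t2 - t1            # outbound delay plus clock offset
    b = t4 - t3            # return delay minus clock offset
    offset = (a - b) / 2   # O = (a - b) / 2
    delay = a + b          # D = a + b (round trip)
    return offset, delay

# Example: server 5 ms ahead, 10 ms one-way delay in both directions gives
# t1 = 0.000, t2 = 0.015, t3 = 0.016, t4 = 0.021.
offset, delay = ntp_offset_delay(0.000, 0.015, 0.016, 0.021)
```

Here the recovered offset is 5 ms and the round trip delay 20 ms, consistent with the construction.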
NTP is usually configured to use more than one server, which is more robust; but to
achieve the best mutual accuracy across multiple devices it is better to use a single server.
NTP daemons usually have a polling interval of 1024 s, or they self-adjust between 64 s
and 1024 s. The interval can, however, also be larger, or as small as 16 s. [22]
2.2.3.2 PTP
The Precision Time Protocol specified in IEEE 1588 is another time synchronization
protocol. It can synchronize clocks to an accuracy of better than one microsecond [25].
However, it is designed only for LANs, unlike NTP, which is designed for WANs.
PTP has a strict Master-Slave approach. The best clock in the network is the Master;
all other clocks are Slaves, which synchronize to the Master: first the offset, then the
drift. [25]
One fact to consider when nanosecond accuracy is desired is that the algorithm
expects the transit delay in both directions to be the same. Even in an Ethernet LAN
this is not exactly the case, because the wire pairs in an Ethernet cable do not all have
the same length. [3, 25]
Unlike NTP, PTP uses multicast and can therefore synchronize multiple computers at
once. It is also – technically – possible to use PTP on networks other than Ethernet,
for which it has been designed. [25]
As in NTP, the offset determination can consider the whole path for the delay measurement,
but since PTP claims to achieve microsecond or even nanosecond accuracy,
one has to consider that every hop (including hubs, switches, gateways, etc.) introduces
delay. A hub typically introduces network jitter in the range of 300-400 ns; a switch
introduces around 400 ns plus 2-10 µs of processing time. One also has to consider that
these devices have a queue in which data packets are processed, and a single very long
packet can delay the transmission by up to 122 µs. Therefore PTP introduces the concept
of Boundary Clocks. A Boundary Clock can be included in any of the just mentioned
devices, e.g. a switch.
It acts as a slave towards the side of the network where the master is, and as a master
towards the side where the slaves are. That means it syncs to the master, and all
slaves connected to the switch sync to the switch, which is then their master, instead of
to the original master. This way the internal latencies of the device do not affect
the synchronization accuracy. This is also shown in Figure 2.10. [3, 25]
Figure 2.10: Network topology and clock types [25, p. 5]
2.2.3.2.1 Functional Description
In PTP, timestamps can be taken close to the hardware, ideally directly in the network
card (if supported). This way PTP does not have to wait for the processor, which may
be busy, which would delay the process and diminish the accuracy. [3, 25]
As can be seen in Figure 2.11, the server initiates the communication in PTP. At
time t1 the server sends a sync message to the client, which receives it at time t2. In a
follow-up message the server sends the previously acquired timestamp t1 to the client.
After reception the client sends a message at time t3 to the server, which receives it at
time t4 and sends this timestamp back to the client. With these four timestamps the client
can now calculate the local clock offset and the delay. For this, two intermediate values
are calculated:
a = t2 − t1 and b = t4 − t3
The clock offset is then calculated by

O = (a − b) / 2

and the delay by

D = (a + b) / 2
Once again it is important to note that the accuracy is directly dependent on the
assumption that the delay is symmetrical in both directions [3, 25].
Figure 2.11: Offset and Delay Measurement [3, p. 19]
In PTP the sync interval is much smaller than in NTP, namely only 2 s by default.
[3, 25]
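The slave-side arithmetic is the same as in NTP, except that the delay here is the one-way value. The constructed example below (an assumption for illustration, not a measurement) shows how path asymmetry biases the offset estimate by half the asymmetry:

```python
def ptp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Slave-side PTP calculation; assumes symmetric path delay."""
    a = t2 - t1
    b = t4 - t3
    return (a - b) / 2, (a + b) / 2   # offset, one-way delay

# Asymmetric path: 12 ms master->slave, 8 ms slave->master, true offset 0.
# Master sends at t1=0.000, slave receives at t2=0.012; slave sends at
# t3=0.020, master receives at t4=0.028.
offset, delay = ptp_offset_delay(0.000, 0.012, 0.020, 0.028)
```

The estimated offset comes out as 2 ms even though the true offset is zero: exactly half of the 4 ms asymmetry, illustrating why the symmetry assumption matters.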
2.2.3.3 Analysis
Although PTP promises better accuracy, it is also more complex and more difficult
to set up. It has the big advantage that the delay from server to client need not be
treated as one large path; the time can be synced over smaller hops. This, however, is
only possible with hardware support. With respect to the implementation, the accuracy
of NTP seems to be good enough, because it promises an accuracy down to a few tens
of milliseconds or better in a WAN. Considering the frame rate of 25 fps, two frames are
40 ms apart, so already a very small buffer can compensate for a lag of more than half
the gap between two frames (20 ms). Therefore experiments in the implementation are
going to show whether NTP is good enough; if it is not, PTP can still be used to provide
the desired accuracy.
In addition it is important that the clients run at the same speed as the servers do
(although they do not need to have the exact same time). Otherwise the client buffer
could run empty if the client's clock runs faster, or fill up if it runs slower. Fortunately
the PCR in MPEG2-TS already takes care of this: the receiver uses the PCR values
to recover the sender's system clock by feeding them through a phase-locked loop
(PLL). The Presentation Time Stamp (PTS) and Decoding Time Stamp (DTS) that are
in the elementary stream are aligned to the original system clock recovered through
the PLL. For more details see [13, Annex D].
3 Proposed Method
The idea behind this work is to create a protocol which is able to provide 3D television
over an IP network while keeping the necessary bandwidth consumption as low as possible,
both in terms of redundant data transmission (i.e. using multicast instead of unicast) and
in terms of backward compatibility (i.e. not having to transmit enhancement view(s) to
receivers that can only display 2D). The main challenge is to keep the delay as low
as possible. To achieve a more effective load distribution in mesh networks such as the
Internet, it is advantageous to distribute the views among different servers. This, however,
introduces new problems: it is necessary to synchronize the servers such that they begin
transmitting the views of the video at the same point in time, and in addition they need
to maintain the same pace, such that corresponding frames from different views are
received concurrently (up to the different transmission delays caused by different
network routes).
To achieve this the protocol consists of three components, where one part depends on
the two others:
3.1 Transport Layer
Nowadays television is mostly transmitted over broadcast media such as satellite, cable
or terrestrial networks. However, television over IP is becoming more and more important,
and 3D television will be the next important step in broadcast television. For current 3D
television over IP it is typical that multiple views are transmitted on the same channel
(service-compatible mode, e.g. in MVC) sent by a single server. Another common method
is to use a single video stream which carries two videos at half the original resolution, either
side-by-side or top-bottom (frame-compatible mode, e.g. in AVC). IPTV often uses multicast,
but sometimes also unicast, especially for fast channel switching within a multicast
environment. Video-on-demand and download services always use unicast. For IPTV, UDP is
used in most cases; for video-on-demand and download services it is in almost all cases TCP.
Regarding the protocol stacks that are usually used, Figure 3.1 shows typical
protocol stacks for Internet television, while Figure 3.2 shows the actual protocol stacks
of some DVB implementations, especially DVB-IPTV.
Figure 3.1: Streaming protocol stacks [11, p. 698]
Figure 3.2: Digital TV delivery protocol stacks (DVB as an example for underlying layer technologies) [20, p. 674]
In Section 2 several protocols have been discussed. TCP uses positive acknowledgements
to achieve reliable data transmission, but reliability is achieved using pure retransmissions
only, which is infeasible for many clients. In addition, it cannot provide a maximum
delivery time, due to its retransmission nature, and it does not support multicast. UDP
on the other hand allows multicast and also provides timely delivery, but it has no error
correction mechanism: lost or corrupted data cannot be recovered, as neither acknowledgements
nor parity is available. SCTP has the same set of problems as TCP: it also uses positive
acknowledgements and therefore does not support multicast. DCCP, in contrast, does not
provide sequenced delivery (as TCP and SCTP do) and is therefore suitable for real-time
traffic; however, it still sticks to positive acknowledgements, which again makes multicast
infeasible. As these protocols all have drawbacks for multimedia transport, another protocol
has been evaluated:
As explained in detail in Section 2.2.1, PRRT is a network protocol that allows multicast
and guarantees Predictable Reliability under Predictable Delay (PRPD). PRRT
operates at the transport layer and transmits all data between the servers and the receivers.
At the time of conducting the experiments the PRRT protocol was not yet able to
derive the optimal parameters autonomously. Therefore it was necessary to set them
manually for the experiments. As all experiments were carried out on LANs it was
possible to assume a rather small Dtarget of 100 ms. In addition the tests were done using
a single client only, which makes it possible to set k to 1, making the code a repetition
code. Np is set to (0, 1, 1, 1), allowing three rounds of repetition.
3.2 Application Layer: Time
As explained in detail in Section 2.2.3, NTP is an industry-standard solution for clock
synchronization in distributed systems. NTP will be applied to synchronize the servers.
It is also recommended to synchronize the receivers using NTP, but it is not absolutely
necessary, as the MPEG2 Transport Stream deals with the synchronization between
sender and receiver anyway – without setting the local clock, though.
To achieve the best possible accuracy with NTP the polling exponent has been set to the
smallest possible value of four, which means the daemon polls every 2^4 s = 16 s. The
result of this can be seen in Section 4.1.
3.3 Application Layer: Data
At the time of writing, neither MVC video data nor an MVC encoder or decoder was
available. Therefore, to get interim results, a classical setup with two full AVC streams
has been considered.
This section explains how to implement the necessary tools using the techniques de-
scribed in Section 3.1 and 3.2.
3.3.1 Time analysis
Figure 3.3: Synchronized server/client architecture that sends timestamps using UDP
The first activity was to write an application that evaluates the delay of network trans-
mission and the usability of NTP. To get started a simple UDP server/client architecture
was implemented. The two hosts were synchronized to a common time server as described
above in Section 3.2. To be more realistic to the actual use case the UDP packet size was
extended to the size of a typical TS packet, namely 7 ∗ 188byte = 1316byte. These first
experiments already indicated that NTP is adequate for the desired application; detailed
experimental results can be seen in Section 4.1.
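The timestamp packets used in this experiment can be sketched as follows. The packet layout and function names are assumptions for illustration (the actual implementation may differ), and the socket transmission itself is omitted; the one-way delay is only meaningful because both hosts are NTP-synchronized:

```python
import struct
import time

PACKET_SIZE = 7 * 188  # 1316 bytes, matching seven TS packets

def make_timestamp_packet(now: float) -> bytes:
    """Pack the send time into the first 8 bytes, pad to 1316 bytes."""
    payload = struct.pack("!d", now)
    return payload + b"\x00" * (PACKET_SIZE - len(payload))

def one_way_delay(packet: bytes, recv_time: float) -> float:
    """Receiver side: difference between reception and the embedded send
    time; valid only if sender and receiver clocks are synchronized."""
    (send_time,) = struct.unpack("!d", packet[:8])
    return recv_time - send_time

pkt = make_timestamp_packet(time.time())
```

The sender would emit such packets over a UDP socket at a fixed rate; the receiver records `one_way_delay(pkt, time.time())` for each packet to build the delay statistics.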
Figure 3.4: Synchronized server/client architecture that sends timestamps using PRRT
The next step was to implement the same application on top of PRRT instead of UDP,
to find out the effect of PRRT on the transmission delay. Again the packet size was extended
to the size of a typical TS packet bundle. The Dtarget of PRRT was set to 100 ms; that means
that every data packet should arrive at the receiver within 100 ms. Detailed experimental
results can be seen in Section 4.1.
Figure 3.5: Synchronized dual-server/client architecture that sends timestamps using PRRT
In a third step the proposed architecture was taken into account and the delay for two
servers was measured. To achieve this a multi-threaded receiver was written which is
capable of receiving two concurrent streams from two distinct servers. Detailed
statistics on the distribution of the data packets in terms of delay and loss were recorded;
they can be seen in Section 4.1.
3.3.2 TS analysis & streaming
The next step was to write a Transport Stream parser which is able to extract the
PCR value, which appears every few packets¹⁰, because the PCR value is important for
the server to determine the pace at which to send the data of the stream.
Figure 3.6: Synchronized UDP server/client transport streaming
A tool had to be written which is able to stream an MPEG2-TS at the correct speed
(which can be determined from the PCR) over the network to another host. To minimize
the sources of error in the implementation, the first version used UDP, such that a standard
software player such as VLC can play the video. One problem with the PCR had to be
faced: one cannot count on the first PCR value of a video starting at zero. This is obvious
for videos such as TV recordings (which are in TS format already) or excerpts from DVDs
(newly encapsulated in TS format), and even more so for broadcast; but even when wrapping
a plain video file in TS, common tools (such as FFmpeg) do not necessarily start at zero.
Therefore it has to be considered that the PCR can start at any value.
¹⁰At least every 100 ms according to MPEG; at least every 40 ms according to DVB
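The pacing logic can be sketched as follows. This is a simplified model (names are assumptions) in which each burst of TS packets optionally carries a PCR already converted to seconds; the first PCR seen, whatever its value, is taken as the origin, so streams that do not start at zero are handled correctly:

```python
import time

def stream_paced(ts_chunks, pcr_seconds, send):
    """PCR-paced sending sketch: ts_chunks[i] is a burst of TS packets and
    pcr_seconds[i] its PCR in seconds, or None if the burst carries no PCR.
    Sending is delayed so that wall-clock time tracks PCR time."""
    origin_pcr = None
    origin_wall = None
    for chunk, pcr in zip(ts_chunks, pcr_seconds):
        if pcr is not None:
            if origin_pcr is None:
                # First PCR of the stream becomes the origin; it need not be 0.
                origin_pcr, origin_wall = pcr, time.monotonic()
            else:
                due = origin_wall + (pcr - origin_pcr)
                delay = due - time.monotonic()
                if delay > 0:
                    time.sleep(delay)
        send(chunk)
```

Here `send` would wrap a UDP (or later PRRT) socket; chunks between two PCRs are sent back-to-back, which is acceptable because the PCR appears every few packets.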
3.3.3 Two Server TS streaming
Figure 3.7: Synchronized PRRT server/client transport streaming
After having proven all the individual components to work, it was now time to cre-
ate applications that glue all the above together. As before a one server solution has
been written first, which is basically the same as the one with UDP, but using PRRT
instead. The receiver has been created twice: one version writes the received data onto
the harddisk, the other version forwards the data that has been transmitted with PRRT
using a UDP socket. This can be any host, but most likely is localhost. This way a
standard player (again such as VLC) can be used to play the video by just listening on
the loopback interface where it is very unlikely that errors occur, when the system is not
under heavy load.
Figure 3.8: Synchronized PRRT dual-server/client transport streaming
In the timestamp experiment in Section 3.3.1 the concept of multithreading was already
introduced. The receiver application (which again exists in two versions: as a file
writer and as a UDP relay) needs two threads to receive the two streams from the two
servers, while a third thread acts as a multiplexer which sorts the packets back into the
correct order.
It is of course necessary to set a time limit on this; otherwise the receiver would
become stuck if only a single packet were missing or out of order. To be able to
bring the TS packets into the correct order some additional logic is necessary. PRRT does
guarantee that the order of all data packets sent is maintained; however, this is of course
only true for a single PRRT channel, not for two. Therefore, to bring the packets back
into the correct total order, some kind of identifier needs to exist. As the TS packets do
not carry any unique identifier in their header, it was necessary to come up with an
independent approach: a four-byte identifier is inserted in front of every TS packet. It can
be used to restore the total order of all TS packets at the receiver. Four bytes means
4,294,967,296 values before this identifier wraps around; even at a data rate of 20 Mb/s it
would take three and a half days before the identifier repeats. Therefore one can assume
that two packets in the receiving buffer with the same identifier are actually the same
packet (in case that ever occurs), and one only has to deal with the wrap-around. That
also means that every TS packet is now 188 byte + 4 byte = 192 byte and every PRRT
packet is now 192 byte · 7 = 1344 byte. The servers need to insert this identifier in front
of every TS packet before sending it.
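The sender-side identifier insertion and the receiver-side split can be sketched in a few lines (function names are assumptions; a big-endian 4-byte unsigned counter with wrap-around at 2^32 is assumed, matching the value range stated above):

```python
import struct

TS_SIZE = 188

def add_identifiers(ts_packets, start_id: int = 0):
    """Sender side: prefix each 188-byte TS packet with a 4-byte sequence
    identifier, yielding 192-byte units; the counter wraps at 2**32."""
    out = []
    seq = start_id
    for pkt in ts_packets:
        assert len(pkt) == TS_SIZE
        out.append(struct.pack("!I", seq) + pkt)
        seq = (seq + 1) % 2**32
    return out

def split_identifier(unit192: bytes):
    """Receiver side: recover (identifier, ts_packet) from a 192-byte unit."""
    (seq,) = struct.unpack("!I", unit192[:4])
    return seq, unit192[4:]
```

Seven such 192-byte units then form one 1344-byte PRRT payload, as stated above.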
Another possible solution instead of this identifier would be the M2TS approach.
M2TS also places four bytes in front of every TS packet, but not as an identifier: two
bits are used for copy protection and 30 bits for a timestamp. However, this timestamp
cannot be used to restore the exact order of all packets (as multiple TS packets can have
the same timestamp) and is therefore not examined any further.
The multiplexer itself is the third thread in the receiving application. The two receiving
threads pass up to ten whole data packets at once to the multiplexer. The
multiplexer then divides each data packet, containing up to seven TS packets plus
identifiers, into the single TS packets and stores them in a map, where the identifier is
the key and the TS packet is the value. In a second map the time of reception of each TS
packet is stored, to be able to identify delayed packets.
Depending on the receiver being in 2D or 3D mode an endless loop sorts the packets
into the correct order. In the 2D mode everything in the map is just output in the order of
the identifiers and then removed from the map. This can be done, because PRRT makes
sure the order of the data is maintained. In the 3D mode it is evident that the packets
from the two servers are sorted into the correct total order. To achieve this in the loop a
method looks for the next packet (the packet that has an identifier which is exactly one
larger than the current packet). If it is found it is attached to the output routine and
removed from the map. If it is not found the next step depends on the state of the map.
If the map is empty, the thread is being put to sleep for a predefined amount of time –
the buffertime. During this time the two receiving threads can receive more data. When
they do, they wake the multiplexer thread. When they do not, the multiplexer thread is
woken after the buffertime. In the other case, when the map is not empty, the maximum
waittime is limited by the buffertime and the receiving time of the next TS packet in the
map. The thread waits and continues to run at latest at receiving time + buffertime.
When the TS packet appears during this time in the map it is directly sent to the output
and the normal operation continues. If it does not occur in the map during this time the
packet is considered to be lost and the normal operation continues.
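The wait-deadline rule just described can be condensed into a few lines (a sketch with hypothetical names; the real implementation operates on the map and thread wake-ups):

```c
#include <stdint.h>

/* Latest point in time (in µs) until which the multiplexer may wait for the
 * next in-order packet. If the map is empty it sleeps a full buffertime from
 * now; otherwise it gives up on the gap at reception time + buffertime of
 * the earliest packet already buffered. (Names are illustrative.) */
static int64_t wait_deadline_us(int map_empty, int64_t now_us,
                                int64_t earliest_recv_us, int64_t buffertime_us)
{
    if (map_empty)
        return now_us + buffertime_us;
    return earliest_recv_us + buffertime_us;
}
```

A packet whose deadline passes without the gap being filled is declared lost, which bounds the extra latency of the multiplexer by exactly one buffertime.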
As explained at the beginning of this section, no MVC material was available at this
point in time. The 3D stereoscopic video material used therefore consists of two AVC
streams in one transport stream. It is easy to distinguish between the two views, as each
of them gets its own PID in the header of each TS-packet. The implication is that the
primary server (for the base view) sends all TS-packets except those carrying the PID of
the enhancement view, and the secondary server (for the enhancement view) sends only
those TS-packets with the enhancement-view PID. As the clocks of the two servers run
at the same speed thanks to the use of NTP, it is not necessary to synchronize the actual
playback between the two servers; it is sufficient to agree upon a common point in time
at which to start the transmission. For this purpose a single TCP packet is sent from the
primary to the secondary server, containing the timestamp at which the transmission
should start, the PID the secondary server should transmit and a video source. TCP is
used because it guarantees reliability. The two servers then begin the transmission at the
same moment in time and send the data using PRRT, so that the packets arrive at the
receiver in almost the correct order. As described before, the receiver only needs to restore
the final order by looking at the identifiers. If the secondary server for the enhancement
view cannot be contacted within one second (which is plenty of time for TCP
retransmissions), the primary server starts without it and provides the base view only.
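The PID-based split can be illustrated with a small helper (a sketch; the field layout follows the standard MPEG-2 TS header, the helper names are hypothetical):

```c
#include <stdint.h>

/* The 13-bit PID sits in bytes 1 and 2 of the 188-byte TS packet header. */
static uint16_t ts_pid(const uint8_t *ts)
{
    return (uint16_t)(((ts[1] & 0x1F) << 8) | ts[2]);
}

/* Returns 1 if the packet belongs to the secondary server (enhancement
 * view), 0 if the primary server must send it (base view and everything
 * else, e.g. PAT/PMT and audio). */
static int send_via_secondary(const uint8_t *ts, uint16_t enhancement_pid)
{
    return ts_pid(ts) == enhancement_pid;
}
```

Everything that is not the enhancement view deliberately falls through to the primary server, so a 2D-only receiver subscribing to the primary channel alone still gets a complete, playable transport stream.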
3.3.4 Player
As stated before, VLC is a software player capable of playing UDP streams. A feature
of VLC useful for this project is that it can show two videos carried as elementary streams
in one Transport Stream concurrently and in perfect sync (as opposed to other players,
such as Xine). Unfortunately VLC is not prepared for 3D playback (as of version 1.1.11);
it can only display the two videos in two separate windows. A filter for VLC called
“Mosaic” allows preparing the output for a 3D display (e.g. line-by-line, side-by-side or
top-bottom, as supported by most 3D displays), which of course reduces the spatial
resolution by 50%. However, as this module has not been designed for this purpose it only
supports software processing (no hardware processing), and in tests even a very fast
state-of-the-art computer was not able to play the 3D video properly.
One possibility to display the video using VLC anyway was to trick VLC into squeezing
the videos to half their width and moving the windows into the correct position11 to
achieve a side-by-side representation.
This shows the general feasibility of the protocol implementation; however, current free
software players do not have the necessary tools to show the 3D video properly out of
the box.
11“Devil’s Pie” can be used for that: http://burtonini.com/blog/computers/devilspie/
3.3.5 MVC
The preceding sections showed the general feasibility of this approach. However, a very
important part could not be investigated yet – the MVC coding. At the beginning of
the work no MVC data was available, so it could not be tested. To get interim results,
the approach with two AVC streams was used instead; this, however, requires more
bandwidth, because an MVC stream encodes 3D video more efficiently than two
independent AVC streams.
When MVC data became available for testing, the first issue was how to get down to
the NAL-Units, which indicate to which view perspective the subsequent data belongs.
The answer is to follow the trail described in Section 2.2.2.3: the NAL-Units are
encapsulated in a Packetized Elementary Stream, and the individual PES-packets are in
turn encapsulated into the Transport Stream. To get to the NAL-Units it is necessary to
analyze the headers of the packets on the different levels and find the payloads that carry
the next lower level’s packets.
The payload unit start indicator in the header of the TS-packet gives valuable information:
it states whether the payload of the current TS-packet is the beginning of a PES-packet.
The next important information from the TS-header is the actual starting position of the
payload. It can be calculated by determining the presence and, if present, the length of
the adaptation field, which always precedes the payload. Now there are two paths that
need to be considered:
The first path applies when the TS-packet carries the beginning of a PES-packet
(indicated by a payload unit start indicator of one). In this case one can check for the
correct sync byte sequence (0x000001) and read the stream ID and packet length of the
PES-packet. The stream ID is important because it tells us whether the packetized
elementary stream (PES) contains video, audio or other program related content (e.g.
subtitles). As only the NAL-Units of the MVC stream are of interest here, the analysis
is restricted to stream ID 0xE0, which is the one for AVC/MVC. The packet length can
either be non-zero, which means the whole PES-packet is exactly of that length, or it can
be zero. In the latter case the length is undetermined and can only be identified by
listening for the next payload unit start indicator in TS-packets with the same TS PID.
To find the first NAL-Unit it is crucial to find the beginning of the PES payload, which
means one has to find the end of the PES header. The length of the header depends on
the stream ID. In the case of AVC/MVC the PES header always has an extended header
of dynamic length. However, the length is transmitted within the header, so it can easily
be read out. If the packet length is non-zero, the complete length can now be calculated
and taken as a “carryover” parameter for the next TS-packet. Once the beginning of the
PES payload has been found, it is possible to search for the sync bytes of the NAL-Units
as used in the elementary stream (ES). The sync bytes are either 0x00000001 or 0x000001
depending on the position inside the ES; this distinction is not important for the problem
and the short form can be used. A PES-packet is carried inside TS-packets, and the
payload carried by the PES-packets forms the ES. As multiple NAL-Units can be inside
the part of the ES contained in a single PES-packet, it is necessary to loop through the
whole payload. The header of each NAL-Unit contains a forbidden bit that may never
be one, two bits that indicate whether the data can be discarded after decoding (not of
interest for this problem) and five bits that represent the NAL-Unit type. This is the
important information for this solution. If even a single NAL-Unit is not of type 14 or 20
(the two values used in MVC), the TS-packet must be sent using the primary server,
because it contains information for the base view. In a subsequent PES-packet this
NAL-Unit type would now be used to decide whether the TS-packet is sent using the
primary or the secondary server. However, as this PES-packet is the first PES-packet and
contains the header, it always needs to be sent using the primary server. As one cannot
be sure that the next PES-packet starts with a NAL-Unit (it could also be a continuation
of the last NAL-Unit of the current PES-packet), the type of the current NAL-Unit is
preserved and considered for the next PES-packet.
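The PES header bookkeeping just described can be sketched as follows (assuming the extended header used by video streams; the function name is hypothetical):

```c
#include <stdint.h>

/* Start of the PES payload for stream IDs with an extended header
 * (video, 0xE0–0xEF): 6 bytes of start code, stream ID and packet length,
 * 2 flag bytes, 1 byte PES_header_data_length, then the optional fields
 * that this length byte counts. */
static int pes_payload_offset(const uint8_t *pes)
{
    if (pes[0] != 0x00 || pes[1] != 0x00 || pes[2] != 0x01)
        return -1;                       /* not a PES start code */
    return 9 + pes[8];                   /* pes[8] = PES_header_data_length */
}
```

Because PES_header_data_length is carried explicitly, the payload start can be computed without interpreting the optional fields (PTS/DTS etc.) themselves.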
The other path applies when the TS-packet transports subsequent parts of a PES-packet,
indicated by a payload unit start indicator of zero. As before it is necessary to search for
the sync bytes of NAL-Units. This is basically the same process as described above, but
with one important difference: the NAL-Units in the first PES-packet are always sent
using the primary server, whereas in subsequent PES-packets the TS-packet is sent using
the secondary server (which serves the enhancement view) whenever it contains only
NAL-Units of type 14 or 20. In any other case the TS-packet is sent using the primary
server and the data is allocated to the base view.
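The routing rule for subsequent PES-packets can be expressed compactly (a sketch; the one-byte NAL header layout is as described above, the helper names are hypothetical):

```c
#include <stdint.h>

/* One-byte NAL header: forbidden bit, 2-bit nal_ref_idc, 5-bit type. */
static int nal_unit_type(uint8_t hdr)
{
    return hdr & 0x1F;
}

/* A TS-packet inside a subsequent PES-packet goes to the secondary server
 * only if every NAL-unit it contains is of an MVC type (14 or 20); a single
 * other type forces it to the primary server. */
static int belongs_to_enhancement(const uint8_t *nal_headers, int count)
{
    for (int i = 0; i < count; i++) {
        int t = nal_unit_type(nal_headers[i]);
        if (t != 14 && t != 20)
            return 0;
    }
    return count > 0;
}
```

Erring towards the primary server is safe: a base-view-only receiver may get a few enhancement NAL-Units it ignores, but it can never miss base-view data.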
[Figure 3.9 shows the dual-server setup: Server 1 and Server 2 read the transport stream file (payload with interspersed PCRs), split it into TS-packets and send them over two PRRT channels; the client’s two receive threads (Recv1, Recv2) feed the multiplexer, which outputs the stream over UDP to the player in 2D or 3D mode; the start time is coordinated over TCP.]
Figure 3.9: Synchronized PRRT dual-server/client MVC transport streaming
After the analysis of this rather complicated encapsulation of protocols it was time to
use the code in an actual implementation. This was easy to accomplish: the solution
presented in Section 3.3.3 was copied, and the server application was adapted to use the
above concept to decide which server needs to send which TS-packets, instead of the
purely PID-based solution used before. The results can be seen in Section 4.4.
3.3.6 MVC-Player
Although MVC claims to be backward compatible with AVC, it is not possible to play
either of the transmitted variants (base and enhancement view vs. base view only)
in most software players. Only Xine is able to play both streams, in 2D. This is
most likely a signaling problem in the structure of the Transport Stream. As the
transmission of the base view only played just as well in Xine as the transmission of
both base and enhancement view, it can be assumed that the omission of NAL-Units
works as expected.
To get a precise answer to this question, other players were evaluated. At the time of
testing there were only very few software players available that supported MVC.
VLC is not capable of decoding MVC as of version 1.1.11, and no Linux player could
be found that supports MVC. The only player that could be used for testing is the
“Stereoscopic Player” by Peter Wimmer12. It has two major drawbacks: first, it only
runs on Windows, which makes it impossible to use together with the tools written here,
because of their dependency on PRRT, which is only available as a Linux kernel module.
Second, it does not accept UDP streams, which would have been one way to use it in
conjunction with a Linux computer acting as a relay between PRRT and UDP. An
attempt to redirect the UDP stream to a Named Pipe on Windows, which is then read by
the player, failed: Windows Named Pipes can only be read by applications that use the
Win32 API CreateFile() function to open the file, and apparently this player uses some
other method. Nevertheless, the player’s main functionality of playing MVC coded files
is fulfilled.
However, it has been shown before that the speed at which the data is transmitted by
the presented tools is correct and allows proper playback. Therefore it was only
necessary to check whether the transmitted data could be rendered in the Stereoscopic
Player in both variants – base view only as well as base and enhancement view. To test
this, the file version of the multiplexer described before was used to capture the data
into a file. Two files were recorded: for the first file the multiplexer acted as a 3D
receiver, for the second as a 2D receiver. The first file (with full 3D stereo content)
could be played in stereoscopic as well as monoscopic mode without any problems. The
second file (without the enhancement view) played exactly like the first one in
monoscopic mode. This shows that the proposed method of de-multiplexing and
re-multiplexing works.
12http://3dtv.at/
4 Experimental Evaluation
In this section all the steps explained in Section 3 are evaluated experimentally. The
experiments analyze the synchronization capabilities (Section 4.1), the transport stream
tools (Section 4.2) and the final two-server transport stream application (Section 4.3).
In addition, the MVC-based solution is tested in Section 4.4, and the effects of
nonuniform round trip times are evaluated in Section 4.5.
4.1 Time Analysis
As explained in Section 3.2 and Section 3.3.1, the first step is to evaluate the accuracy
of NTP. To achieve the best possible accuracy the polling interval is set to the smallest
possible value of four, which means the daemon polls every 2⁴ s = 16 s. Listing 4.1 shows
some typical NTP statistics for the two servers.
1      remote           refid      st t  when poll reach   delay   offset  jitter
2 ==============================================================================
3 *ptbtime1.ptb.de .PTB.            1 u     7   16   377  20.385   -0.027   0.290

5      remote           refid      st t  when poll reach   delay   offset  jitter
6 ==============================================================================
7 *ptbtime1.ptb.de .PTB.            1 u     2   16   377  20.291   -0.014   0.127
Listing 4.1: NTP peers statistics
For the measurements a publicly available timeserver is chosen, in this case the first
timeserver of the “Physikalisch-Technische Bundesanstalt”. The PTB is the German
national organization that is responsible by law for distributing the exact time. Its
server ptbtime1.ptb.de is a stratum 1 NTP server13.
Lines 1–3 in Listing 4.1 show the NTP statistics of the master server, whereas lines
5–7 show those of the slave server. The values for delay, offset and jitter are given in
milliseconds. It can be seen that the delay of the two servers to the NTP server is almost
identical, as are offset and jitter. In this particular example the offset difference is
(−0.027ms) − (−0.014ms) = −0.013ms, which means that the local clocks of the two
servers are (at this particular point in time) only 13µs apart. As the polling interval is as
13A stratum 0 device is a precise clock such as an atomic clock. A server directly connected to a stratum 0 device is called a stratum 1 device. A device connected via a network to a stratum 1 device is called a stratum 2 device, and so on.
small as possible with NTP (16s), the two clocks are corrected almost four times a minute
and cannot drift away from each other. Typically the difference in offset is between
1 and 50 microseconds, as a series of measurements over one hour has shown. Every
16 seconds the data was compared (250 times in total), with the following results:
        Delay       Offset      Jitter
Min     19.981ms    −0.072ms    0.044ms
Max     20.459ms     0.136ms    8.384ms
∆        0.478ms     0.208ms    8.340ms
∅       20.276ms    −0.000ms    0.737ms
σ        0.129ms     0.035ms    1.671ms

Table 4.1: Master Server statistics
        Delay       Offset      Jitter
Min     20.052ms    −0.079ms    0.019ms
Max     20.458ms     0.162ms    0.557ms
∆        0.406ms     0.241ms    0.536ms
∅       20.286ms     0.001ms    0.212ms
σ        0.118ms     0.039ms    0.139ms

Table 4.2: Slave Server statistics
        Delay       Offset      Jitter
Min      0.001ms     0.000ms    0.000ms
Max      0.209ms     0.115ms    8.312ms
∆        0.208ms     0.115ms    8.312ms
∅        0.046ms     0.028ms    0.624ms
σ        0.035ms     0.024ms    1.605ms

Table 4.3: Difference between Master and Slave Server statistics
Table 4.1 and Table 4.2 show the minimum and maximum values, the difference between
those two, the average and the standard deviation of the 250 measurements of delay,
offset and jitter for the master and slave server. Table 4.3 shows the difference between
the master and the slave server. Note that the difference is taken between measurements
taken at the same time, not between the values in Table 4.1 and Table 4.2. From
Table 4.1 and Table 4.2 one can see that the delay is fairly constant at ≈ 20ms with a
standard deviation of only ≈ 120µs. In fact, Table 4.3 shows that the delays differ on
average by 46µs with a standard deviation of 35µs. Another interesting observation is
that the difference in offset (i.e. how far the clocks actually are apart) is only 28µs
with a standard deviation of 24µs. These small values show that the tolerance is far
smaller than a frame interval and therefore verify that NTP is adequate for
synchronizing the two servers.
The first experiment is the one described in Section 3.2. These very basic experiments
showed that the time it takes for the data to be transported over UDP to the
destination is negligible:
1316028531 000345: "1316028531 000179"
1316028532 000256: "1316028532 000077"
1316028533 000296: "1316028533 000119"
1316028534 000156: "1316028534 000011"
1316028535 000279: "1316028535 000113"
1316028536 000256: "1316028536 000112"
Listing 4.2: UDP Timestamp test
The data in Listing 4.2 is an excerpt of a test run that sends a timestamp from a
server to a client; the two machines are synchronized using NTP as described before.
The first value (before the colon) is the timestamp of reception, where the first block
denotes seconds and the second block microseconds; the second value (after the colon)
is the timestamp of sending. Although this is only a very brief extract, one can see
that the difference between the two values is never larger than 200µs, and this already
includes transmission delay and clock inaccuracy.
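The per-packet delay in such a log can be computed directly from the two timestamps (a sketch assuming the line format shown in Listing 4.2; the function name is hypothetical):

```c
#include <stdio.h>

/* Parse one log line ("<rx_s> <rx_us>: \"<tx_s> <tx_us>\"") and return the
 * one-way delay in microseconds, or -1 if the line does not match. */
static long delay_us(const char *line)
{
    long rx_s, rx_us, tx_s, tx_us;
    if (sscanf(line, "%ld %ld: \"%ld %ld\"", &rx_s, &rx_us, &tx_s, &tx_us) != 4)
        return -1;
    return (rx_s - tx_s) * 1000000L + (rx_us - tx_us);
}
```

Applied to the first line of Listing 4.2 this yields a one-way delay of 166 µs, consistent with the sub-200 µs observation above.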
The second experiment is also described in Section 3.2. It is basically the same
experiment as before, but uses PRRT and sends at a higher rate:
1316029750 750509: "1316029750 592138 00000000000000000394"
1316029750 752212: "1316029750 592581 00000000000000000395"
1316029750 752227: "1316029750 593023 00000000000000000396"
1316029750 752239: "1316029750 601124 00000000000000000412"
1316029750 752252: "1316029750 601598 00000000000000000413"
1316029750 752266: "1316029750 602067 00000000000000000414"
1316029750 754434: "1316029750 602534 00000000000000000415"
1316029750 754457: "1316029750 603211 00000000000000000416"
Listing 4.3: PRRT Timestamp test
The data in Listing 4.3 is an excerpt of the corresponding test run. The format is the
same as in Listing 4.2, with a sequence number as additional third parameter. Although
this is only a very brief extract, one can see that the differences between the two values are
between 150ms and 160ms, although the Dtarget of PRRT was set to 100ms. As this is
not the case with UDP, one has to assume that this is due to the current implementation
of PRRT, which has difficulty guaranteeing the desired Dtarget, possibly for scheduling
reasons at the host.
One could argue that this is due to the higher rate, but the effect is visible even when
sending at the same rate as in the experiment before:
1316080212 118389: "1316080212 000046 00000000000000000014"
1316080213 118388: "1316080213 000078 00000000000000000015"
1316080214 118420: "1316080214 000085 00000000000000000016"
1316080215 118395: "1316080215 000087 00000000000000000017"
1316080216 118403: "1316080216 000114 00000000000000000018"
1316080217 118386: "1316080217 000075 00000000000000000019"
1316080218 118389: "1316080218 000105 00000000000000000020"
Listing 4.4: PRRT Timestamp test at lower rate
Although the values in Listing 4.4 show a delay of only 118ms, which is considerably less
than in the previous example, the Dtarget of 100ms is still clearly missed. However, as
this offset is static it remains predictable and is therefore not a problem for the remainder
of the project.
The third time-related experiment is again described in Section 3.2. In this experiment
not one but two servers are used concurrently. The receiver receives the data from both
servers (2000 packets/second, each 1316 bytes) and measures the delay, the number of
lost packets and the number of late packets:
1: 2569.134440 kB/s
1: 104-105=9, 105-106=6000, 106-107=6004, 107-108=5987, 108-109=6012, 109-110=6004,
   110-111=6000, 111-112=5980, 112-113=6028, 113-114=6019, 114-115=5999, 115-116=5969,
   116-117=6067, 117-118=6046, 118-119=6074, 119-120=5908, 120-121=6308, 121-122=6790,
   122-123=5748, 123-124=5672, 124-125=5314, 125-126=2, 126-127=2, 127-128=2, 128-129=1,
1: Packets = 119945, Lost = 0, Lossrate = 0.000000%, Late = 0, Laterate = 0.000000%
=======================================
2: 2569.648503 kB/s
2: 104-105=287, 105-106=5942, 106-107=6591, 107-108=6101, 108-109=5389, 109-110=6067,
   110-111=6783, 111-112=5818, 112-113=5469, 113-114=5975, 114-115=7157, 115-116=5510,
   116-117=5356, 117-118=6245, 118-119=7259, 119-120=5146, 120-121=5526, 121-122=6540,
   122-123=6739, 123-124=5276, 124-125=4793,
2: Packets = 119969, Lost = 0, Lossrate = 0.000000%, Late = 0, Laterate = 0.000000%
Listing 4.5: PRRT Timestamp test with two servers
The data shown in Listing 4.5 stems from a short test over ten seconds. It shows the
data rate of 2569kB/s and lists the packet delays in buckets of one millisecond each.
It can be seen that most packets arrive between 105ms and 125ms; a few take 104ms or
between 125ms and 129ms, but no packets are faster than 104ms or slower than 129ms.
Listing 4.5 also shows that there has been neither packet loss nor late packets. This is
because the physical bandwidth of the Ethernet has not been exceeded and no route with
a high probability of packet loss (such as a wireless LAN) has been used. To make the
data clearer, a graphical representation is helpful. However, as a ten-second measurement
could easily be dismissed as non-representative, data has been recorded over one hour
(7.2 million packets in total at the same data rate of 2569kB/s), as can be seen in
Figure 4.1.
Figure 4.1: PRRT Timestamp test with two servers over one hour; blue shows the primary server, red the secondary server; x = time/ms, y = number of packets
The results are very similar. One can also see that all packets were received after at
least 103ms and at most 125ms. It is interesting that the two streams are almost cyclic
and shifted against each other. At this point it is unknown why this is the case, but it
does not matter much, because 100% of all packets are received within a range of 22ms,
a fact all following experiments can take into account.
The experiments in this section showed that NTP is quite accurate and more than
adequate as a background service that synchronizes the two servers. They also showed
that PRRT provides a predictable delay; however, it does not achieve the Dtarget that
it takes as a parameter but needs about 25–50% longer. This can easily be compensated
for by a small buffer, which is needed anyway to restore the total order of the packets
from the two servers.
4.2 TS Analysis
As explained in Section 3.3.2, a program was written that extracts the PCR values from
a transport stream file. Extracting the PCR values is necessary to be able to stream
the video data from the server to the receiver at the correct speed. An excerpt of the
PCRs in a transport stream can be seen in Listing 4.6:
  1 #     3: PID = 0x100, PCR =   766669 us
  2 #    11: PID = 0x100, PCR =   866669 us
  3 #    33: PID = 0x100, PCR =   966669 us
  4 #    39: PID = 0x100, PCR =  1066669 us
  5 #    63: PID = 0x100, PCR =  1166669 us
  6 #    83: PID = 0x100, PCR =  1266669 us
...
182 # 65402: PID = 0x100, PCR = 18666669 us
183 # 65696: PID = 0x100, PCR = 18766669 us
184 # 66102: PID = 0x100, PCR = 18866669 us
185 # 66372: PID = 0x100, PCR = 18966669 us
...
411 #165085: PID = 0x100, PCR = 41466669 us
412 #165751: PID = 0x100, PCR = 41566669 us
413 #166369: PID = 0x100, PCR = 41666669 us
414 #167102: PID = 0x100, PCR = 41766669 us
Listing 4.6: PCR values in a TS file
The first value shows the number of the TS-packet, the second is the PID of the
TS-packet carrying the PCR, and the third is the actual PCR value in microseconds. In
line 1 of Listing 4.6 the first PCR is found in the third TS-packet, the PID is 256 (the
decimal value of hexadecimal 0x100) and the value is about 767ms. Once again, note
that this very first PCR value in the transport stream is not zero. In this example a
PCR value occurs every 100ms. It is also visible that PCRs only occur on the first
PID, although the tested transport stream contains four PIDs in total.
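The PCR extraction the tool performs can be sketched as follows (a minimal sketch of the standard MPEG-2 TS adaptation-field layout; the function name is hypothetical):

```c
#include <stdint.h>

/* Extract the PCR from a 188-byte TS packet and convert it to microseconds.
 * Returns 1 and stores the value if a PCR is present, 0 otherwise. */
static int ts_pcr_us(const uint8_t *ts, int64_t *pcr_us)
{
    if (ts[0] != 0x47)    return 0;          /* TS sync byte missing      */
    if (!(ts[3] & 0x20))  return 0;          /* no adaptation field       */
    if (ts[4] < 7)        return 0;          /* field too short for a PCR */
    if (!(ts[5] & 0x10))  return 0;          /* PCR flag not set          */

    int64_t base = ((int64_t)ts[6] << 25) | ((int64_t)ts[7] << 17) |
                   ((int64_t)ts[8] << 9)  | ((int64_t)ts[9] << 1)  |
                   (ts[10] >> 7);            /* 33-bit base, 90 kHz units */
    int ext = ((ts[10] & 0x01) << 8) | ts[11]; /* 9-bit extension, 27 MHz */

    *pcr_us = (base * 300 + ext) / 27;       /* 27 MHz ticks -> µs        */
    return 1;
}
```

The base/extension split reflects the 27 MHz system clock of MPEG-2; converting to microseconds as above yields exactly the values printed in Listing 4.6.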
As written in Section 3.3.2, a program is needed that streams a transport stream over
UDP at the correct speed depending on the PCR values. To test this, a video of exactly
100s length was wrapped into a transport stream (the same video as before). As just
seen, the first PCR in this video is 766669µs; the last is 100666669µs. The difference
between these two PCR values is 99900000µs = 99.9s, with a little bit of data before
the first and after the last TS-packet carrying a PCR that makes up for the missing 0.1s.
A test run with a small application that measures the time between the first and the
last received UDP packet ended with a total transmission time of 99902731µs, which is
only 2.7ms longer than expected. That is an error of 0.0027% and thus negligible. A
test with a standard media player such as VLC also showed that the streaming is done
at the correct speed. Another interesting experiment is to verify that no packets are
lost during the transmission. On a local (wired) network with no packet loss this should
always be the case. To test this, the received transport stream is written into a file and
compared to the transport stream file of the sender. In a first attempt the file sizes
differed, which would indicate lost packets, although the network introduces no packet
loss and no visible artifacts could be seen in the test with a media player (see above).
The problem was the poorly written helper application that listens to the UDP stream
and writes it into a file: it could not read the packets from the system’s internal UDP
packet buffer fast enough. Increasing this buffer as shown in Listing 4.7 solved the
problem, and the file size of the received transport stream was equal to that of the
source. In addition the checksums were compared and are also equal, which means that
the transport stream has been transmitted without packet loss or reordering at the
correct speed.
$ sysctl -a | grep net.core.rmem
net.core.rmem_max = 112640
net.core.rmem_default = 112640
$ sudo sysctl -w net.core.rmem_max=668000
net.core.rmem_max = 668000
$ sudo sysctl -w net.core.rmem_default=668000
net.core.rmem_default = 668000
$ sysctl -a | grep net.core.rmem
net.core.rmem_max = 668000
net.core.rmem_default = 668000
Listing 4.7: Increasing the UDP kernel receive buffer on Linux
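Alternatively (or additionally) the receiving application itself can request a larger socket buffer; a sketch using the standard setsockopt call (the kernel caps the effective size at net.core.rmem_max, so the sysctl above may still be required for very large buffers):

```c
#include <sys/socket.h>

/* Ask the kernel for a larger UDP receive buffer on this socket.
 * Returns 0 on success, -1 on error. */
static int set_udp_rcvbuf(int sock, int bytes)
{
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}
```

Setting the buffer per socket keeps the system-wide defaults untouched, which is preferable when the helper application is the only process needing the larger buffer.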
4.3 Two Server TS Analysis
As explained in Section 3.3.3, the same application as just described was written again,
but this time using PRRT instead of UDP. The two PRRT variants (stream via UDP,
write to a file) work exactly like the UDP version. The version that receives PRRT and
streams UDP (e.g. to localhost) can be used to watch the transport stream with a
software player such as VLC. The version that receives PRRT and writes a file yields
the same results as before: file size and checksum equal those of the source file.
Interestingly, the transmission of the whole stream takes a little longer than before,
most likely due to the per-packet delay variations of PRRT explained in Section 4.1. It
takes 99919720µs where the pure UDP-based transmission takes 99902731µs, i.e. 17ms
longer and almost 20ms in total. That equates to one frame of delay (assuming a 50fps
video); however, the software player itself has yet another buffer that can easily
compensate for this minor delay.
Section 3.3.3 also explains what needs to be done to use two servers instead of one.
When the two servers start sending, a multiplexer in the receiver sorts the TS-packets
back into their original order and either writes them to a file or forwards them as a
UDP stream (e.g. to localhost). The first version, which writes the data into a file,
produces the same result as in the experiments before: both file size and checksum are
equal to the original source file. This is also true for the UDP variant of the application.
During the experiment the time for the full transmission was measured again. This
time it was shorter again, namely 99900514µs, compared to 99902731µs for one server
over UDP and 99919720µs for one server over PRRT. It had not been expected that the
accuracy would be better with a multiplexer (which introduces extra complexity) than
with the two other solutions.
4.4 MVC Analysis
As explained in Section 3.3.5, for MVC it is necessary to go a few layers deeper than the
MPEG2-TS to get to the NAL-Units, which indicate to which view perspective the
payload belongs. To test this, the two-server application from above was rewritten to
consider the NAL-Unit type when deciding which channel should be used to send the
data. Once again a test video was transmitted. As before, the application is capable of
reconstructing the original file from the two streams; the transmission with only a
single stream is also successful. The video length of the tested file is 11701633µs; the
complete transmission (in both cases) takes 11730053µs, which is about 28ms longer
than expected. This, however, is not noticeable in the player.
Figure 4.2: MVC transmission test; blue shows the primary server, red the secondary server; x = time/ms in the video, y = time/ms for the data packet from the sender to the receiver
Figure 4.2 records the exact time each data packet needs to travel from one of the two
servers to the client. It shows that the delay is stable between 100 and 125ms, although
a few peaks exceed these numbers. This is due to a significantly higher data rate at
these moments, as can also be seen from the higher number of dots in the graph.
Figure 4.3: MVC transmission test datarate; x = time/ms in the video, y = number of TS data packets accumulated over 100ms
Figure 4.3 shows the data rate of the video transmission. It can be seen that the data
rate during the peaks described above is ten times as high as during the rest of the
transmission.
4.5 Nonuniform Round Trip Times
As the two servers can be located in different geographical locations, this section looks
at the effects of nonuniform round trip times (RTTs). In the experiments above the
RTT was the same for both servers within ±1ms. In this experiment the same
measurement as in Section 4.4 is executed, but a delay of 40ms in a first run and of
60ms in a second run is added to the RTT of the secondary server (which streams the
enhancement view). For a realistic scenario, delays of equal size are added to both
directions of the transmission: for an additional 40ms of RTT, 20ms is added to the
sending path and 20ms to the receiving path; for an additional 60ms, 30ms is added to
each path.
This can be achieved with network emulation. netem is included in current Linux
distributions running kernel 2.6 and is controlled via the tc command, which is part
of the iproute2 package and alters the traffic control settings. Listing 4.8 shows the
commands to add a delay for all outgoing data:
$ tc qdisc add dev eth0 root netem delay 20ms
$ tc qdisc show
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 20.0ms
$ tc qdisc del dev eth0 root netem delay 20ms
Listing 4.8: Adding and deleting 20ms of delay for outgoing data on eth0.
Adding a delay for incoming data is more complicated: it requires an Intermediate
Functional Block (IFB) device14, to which traffic control settings can then be applied.
This is done as shown in Listing 4.9.
$ modprobe ifb
$ ip link set dev ifb0 up
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 \
    action mirred egress redirect dev ifb0
$ tc qdisc add dev ifb0 root netem delay 20ms
$ sudo tc qdisc show
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8001: dev ifb0 root refcnt 2 limit 1000 delay 20.0ms
$ tc qdisc del dev ifb0 root netem delay 20ms
Listing 4.9: Adding and deleting 20ms of delay for incoming data on eth0.
To verify the added delay, an ICMP ECHO_REQUEST can be issued via the ping
command, as shown in Listing 4.10.
14 More information can be found here: http://www.linuxfoundation.org/collaborate/workgroups/networking/ifb
$ sudo tc qdisc show
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc pfifo_fast 0: dev ifb0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=1.47 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=0.159 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=0.222 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=0.190 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=0.170 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.159/0.443/1.478/0.518 ms

$ sudo tc qdisc add dev eth0 root netem delay 20ms
$ sudo tc qdisc add dev ifb0 root netem delay 20ms
$ sudo tc qdisc show
qdisc netem 8005: dev eth0 root refcnt 2 limit 1000 delay 20.0ms
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8006: dev ifb0 root refcnt 2 limit 1000 delay 20.0ms

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=40.3 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=40.2 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=40.2 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 40.212/40.276/40.309/0.182 ms

$ sudo tc qdisc change dev eth0 root netem delay 30ms
$ sudo tc qdisc change dev ifb0 root netem delay 30ms
$ sudo tc qdisc show
qdisc netem 8005: dev eth0 root refcnt 2 limit 1000 delay 30.0ms
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc netem 8006: dev ifb0 root refcnt 2 limit 1000 delay 30.0ms

$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=60.2 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=60.1 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=60.1 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 60.169/60.237/60.284/0.045 ms
Listing 4.10: Verifying the RTT
It can be seen that the RTT increases as expected to 40ms and 60ms. netem can
also be used to create packet loss, packet duplication, packet corruption and packet
reordering; as these events are handled by PRRT, they are not analyzed here. PRRT
also copes with differing RTTs, but the aim of this experiment is to examine the effects
of nonuniform RTTs that remain smaller than PRRT's pre-configured target delay of
Dtarget = 100ms.
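The relation this experiment probes can also be stated numerically: with a fixed target delay, additional RTT shrinks the time budget left for retransmissions. The following is a deliberate simplification of PRRT's scheduling (the real protocol is more involved), assuming each repair round costs roughly one RTT:

```python
def retransmission_rounds(d_target_ms, rtt_ms, serialization_ms=0):
    """Rough number of ARQ rounds that fit into the delay budget.

    Simplification of PRRT's scheduling: each repair round is assumed
    to cost roughly one RTT, so D_target bounds the round count.
    """
    budget = d_target_ms - serialization_ms
    return max(0, int(budget // rtt_ms))
```

Under this simplification, with Dtarget = 100ms an additional RTT of 40ms still leaves room for two repair rounds, while 60ms leaves room for only one, which is why both configurations remain workable but with different delay margins.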
Figure 4.4a: secondary server with +40ms RTT
Figure 4.4b: secondary server with +60ms RTT
Figure 4.4: MVC transmission test; blue shows the primary server, red the secondary server;
x = time/ms in the video, y = time/ms for the data packet from the sender to the receiver
The results of the first experiment are shown in Figure 4.4a. Compared to the
experiment in Figure 4.2 there are no conspicuous differences, because PRRT can still
meet Dtarget with 40ms of additional RTT. The average per-packet delay, however,
does differ: it is 120ms for the experiment in Figure 4.2 and 131ms for the experiment
with 40ms additional RTT shown in Figure 4.4a.
Figure 4.4b shows the results of the second experiment. Again there are no striking
differences compared to Figure 4.2 or to the 40ms experiment in Figure 4.4a, since
PRRT can still meet Dtarget with 60ms of additional RTT. The average per-packet
delay here is 127ms, which is less than with 40ms additional RTT but still more than
in the experiment shown in Figure 4.2.
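The averages quoted in this section are plain means of per-packet one-way delays, i.e. differences of sender and receiver timestamps under synchronized clocks. A minimal sketch, assuming paired timestamp lists in milliseconds (not the actual log format of the measurement tool):

```python
def average_one_way_delay(send_ts_ms, recv_ts_ms):
    """Mean one-way delay from paired, clock-synchronized timestamps (ms).

    Valid only if both hosts share a common time base (here via NTP);
    otherwise the clock offset contaminates every difference.
    """
    delays = [r - s for s, r in zip(send_ts_ms, recv_ts_ms)]
    return sum(delays) / len(delays)
```

This is also why the NTP synchronization established earlier is a prerequisite for the delay figures in this chapter: without it, the per-packet differences would include an unknown clock offset.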
Although the whole transmission is delayed, the resulting video still plays perfectly,
as PRRT compensates this delay without any problems. With higher RTTs it is of
course necessary to configure the PRRT protocol accordingly.
5 Conclusion
This thesis evaluated the possibilities of creating a 2D-backward-compatible 3D tele-
vision service on IP-based networks using an AHEC protocol. The technological back-
ground has been explained and evaluated, and the applicable technologies have been
selected. The analysis resulted in the combination of NTP, PRRT and MVC inside an
MPEG2-TS and suggested that these can offer a solution to the described problem.
The prototype applications and the evaluation of their results confirmed the initial
assumption that the combination of these technologies can provide a set of applications
capable of 2D and 3D video transmission in a backward-compatible and bandwidth-
saving manner.
In addition, the first approach led to a solution that uses two AVC streams to achieve
the 3D effect and, thanks to its backward compatibility, is at the same time more
bandwidth-friendly than other solutions.
Altogether, this thesis showed that it is possible to create a bandwidth-efficient,
2D-backward-compatible 3D television service on IP-based networks using modern
media-aware technologies and an AHEC protocol.
5.1 Future Work
This section gives some ideas for future features and applications of this solution.
The solution presented considers only stereo video with one base view and one enhance-
ment view. The concept, however, can be extended to FVV with one base view and
many enhancement views, each enhancement view forming a distinct multicast group.
This is not easy to implement: keeping a receiver subscribed to the correct multicast
groups is challenging, since the views' dependencies are far more complex than in the
stereo case.
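The subscription problem can be illustrated as a transitive closure over the inter-view prediction dependencies: a receiver requesting one viewpoint must join the multicast group of that view and of every view it is (directly or indirectly) predicted from. The view names and the dependency map below are hypothetical:

```python
def groups_to_join(view, depends_on):
    """Transitive closure of inter-view prediction dependencies.

    depends_on maps a view to the views it is predicted from; the result
    is the full set of multicast groups the receiver must subscribe to.
    """
    needed, stack = set(), [view]
    while stack:
        v = stack.pop()
        if v not in needed:
            needed.add(v)
            stack.extend(depends_on.get(v, ()))
    return needed

# Hypothetical FVV layout: v0 is the base view, v2 is predicted from v0,
# and v4 is predicted from both v2 and v0.
deps = {"v0": [], "v2": ["v0"], "v4": ["v2", "v0"]}
```

Requesting v4 in this layout would require joining the groups of v4, v2 and v0, while requesting v2 requires only v2 and v0; a real implementation would additionally have to re-evaluate this set whenever the user switches viewpoints.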
To protect the base view better than the enhancement view, unequal error protection
can be used (e.g. the base view receives stronger protection than the enhancement
view(s)). Since each view has its own channel, this is easily implemented by using
different PRRT parameters for the base view and the enhancement view.
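One way to express this is a per-channel parameter set. The parameter names below are hypothetical, not PRRT's actual API; they only illustrate the idea that the base-view channel gets a larger repair budget than the enhancement-view channel:

```python
# Hypothetical per-channel PRRT settings implementing unequal error
# protection: the base view gets more FEC redundancy and more ARQ
# rounds than the enhancement view, at the same target delay.
prrt_params = {
    "base_view":        {"d_target_ms": 100, "fec_redundancy": 0.20, "arq_rounds": 2},
    "enhancement_view": {"d_target_ms": 100, "fec_redundancy": 0.05, "arq_rounds": 1},
}

def protection_level(channel):
    """Rough ordering key: more redundancy and more ARQ rounds = stronger."""
    p = prrt_params[channel]
    return (p["fec_redundancy"], p["arq_rounds"])
```

Under such a configuration, losing enhancement-view packets degrades the service to 2D, while the more strongly protected base view keeps playing.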
In Section 1.2 the concept of hybrid broadcasting as implemented by NHK was intro-
duced. The solution presented here could also be used in a hybrid environment by
sending the base view over a classical broadcast technology such as DVB-S2 and the
enhancement view over IP. This requires minor changes to the protocol, especially to
handle large differences in round trip times and the different error correction schemes
of the different transport routes.
Appendix
A References
[1] Y. Chen, Y.K. Wang, K. Ugur, M.M. Hannuksela, J. Lainema, and M. Gabbouj.
The emerging MVC standard for 3D video services. EURASIP Journal on Applied
Signal Processing, 2009:8, 2009.
[2] D. Deeths and G. Brunette. Using NTP to control and synchronize system clocks -
Part I: Introduction to NTP. Sun BluePrints OnLine, July 2001.
[3] J. Eidson. IEEE-1588 standard for a precision clock synchronization protocol for
networked measurement and control systems. Systems - Test and Measurement Ap-
plications, 10:10–13, 2005.
[4] J. C. Eidson. Measurement, Control, and Communication Using IEEE 1588 (Ad-
vances in Industrial Control). Springer-Verlag New York, Inc., Secaucus, NJ, USA,
2006. ISBN 1846282500.
[5] ETSI. Digital Video Broadcasting (DVB); Second generation framing structure,
channel coding and modulation systems for Broadcasting, Interactive Services, News
Gathering and other broadband satellite applications (DVB-S2) (ETSI EN 302 307
V1.2.1). European Telecommunications Standards Institute, 2009.
[6] M. Gorius, J. Miroll, and T. Herfet. Service compatible efficient 3D-HDTV delivery.
In 14th ITG Conference on Electronic Media, 2011.
[7] M. Gorius, J. Miroll, M. Karl, and T. Herfet. Predictable reliability and packet
loss domain separation for IP media delivery. In IEEE International Conference on
Communications (ICC), 2011.
[8] M. Gorius, G. Petrovic, and T. Herfet. Predictably reliable real-time transport over
large bandwidth-delay product networks. 2011.
[9] M. Gorius, Y. Shuai, and T. Herfet. Predictably reliable media transport over wire-
less home networks. 2011.
[10] A. Gotchev, G.B. Akar, T. Capin, D. Strohmeier, and A. Boev. Three-dimensional
media for mobile devices. Proceedings of the IEEE, 99(4):708–741, 2011.
[11] C.G. Gurler, B. Gorkemli, G. Saygili, and A.M. Tekalp. Flexible transport of 3-D
video over networks. Proceedings of the IEEE, 99(4):694–707, 2011.
[12] T. Herfet. Future media internet: Video & audio transport - a new paradigm.
Manuscript, 2011.
[13] ISO/IEC 13818-1:2000(E). Information technology - Generic coding of moving pic-
tures and associated audio information: Systems. ISO, Geneva, Switzerland, 2000.
[14] ISO/IEC 14496-10:2004(E). Information technology - Coding of audio-visual objects
- Part 10: Advanced Video Coding. ISO, Geneva, Switzerland, 2004.
[15] M. Karl, M. Gorius, and T. Herfet. A distributed multilink media transport ap-
proach. In International Conference on Intelligent Network and Computing (ICINC
2010), Kuala Lumpur, November 2010.
[16] M. Karl, M. Gorius, and T. Herfet. Routing: Why less intelligence sometimes is
more clever. In IEEE International Symposium on Broadband Multimedia Systems
and Broadcasting (ISBMSB), Shanghai, March 2010.
[17] D.L. Mills. Internet time synchronization: The Network Time Protocol. IEEE
Transactions on Communications, 39(10):1482–1493, 1991.
[18] D.L. Mills. Precision synchronization of computer network clocks. ACM SIGCOMM
Computer Communication Review, 24(2):28–43, 1994.
[19] H.M. Ozaktas and L. Onural. Three-Dimensional Television. Springer-Verlag
Berlin Heidelberg.
[20] T. Schierl and S. Narasimhan. Transport and storage systems for 3-D video using
MPEG-2 systems, RTP, and ISO file format. Proceedings of the IEEE, (99):1–13, 2011.
[21] A. Smolic, P. Merkle, K. Müller, C. Fehn, P. Kauff, and T. Wiegand. Compression of
multi-view video and associated data. Three-Dimensional Television, pages 313–350,
2008.
[22] V. Smotlacha et al. One-way delay measurement using NTP synchronization. In
Proc. TERENA Networking Conference, 2003.
[23] A.M. Tekalp and M.R. Civanlar. Efficient transport of 3DTV. Three-Dimensional
Television, pages 351–369, 2008.
[24] A. Vetro, T. Wiegand, and G.J. Sullivan. Overview of the stereo and multiview video
coding extensions of the H.264/MPEG-4 AVC standard. Proceedings of the IEEE, 99
(4):626–642, 2011.
[25] H. Weibel. High precision clock synchronization according to IEEE 1588 - imple-
mentation and performance issues. In Proc. Embedded World, 2005.
[26] S. Wenger. H.264/AVC over IP. IEEE Transactions on Circuits and Systems for
Video Technology, 13(7):645–656, 2003.
[27] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the
H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for
Video Technology, 13(7):560–576, 2003.
[28] L. Zhang, Z. Liu, H. Xia, et al. Clock synchronization algorithms for network mea-
surements. In Proceedings of IEEE INFOCOM 2002, volume 1, pages 160–169. IEEE,
2002.
B List of Figures
2.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 AHEC Architecture [8, p. 2] . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Top: Base and Enhancement View; Bottom: Base View and Depth Map . . . 16
2.4 Free viewpoint video, interactive selection of virtual viewpoint and viewing
direction within practical limits [21, p. 319] . . . . . . . . . . . . . . . . . 17
2.5 MVC NAL unit interface [20, p. 673] . . . . . . . . . . . . . . . . . . . . 18
2.6 Typical MVC prediction structure for 3DV [1, 24] . . . . . . . . . . . . . 18
2.7 Typical MVC prediction structure for FVV [1, p. 3] . . . . . . . . . . . . 19
2.8 NAL-Units in the Elementary Stream in the Packetized Elementary Stream
in the Transport Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.9 Measuring Delay and Offset [18, p. 3]; A = Client, B = Server . . . . . . 23
2.10 Network topology and clock types [25, p. 5] . . . . . . . . . . . . . . . . . 25
2.11 Offset and Delay Measurement [3, p. 19] . . . . . . . . . . . . . . . . . . 26
3.1 Streaming protocol stacks [11, p. 698] . . . . . . . . . . . . . . . . . . . . 29
3.2 Digital TV delivery protocol stacks (DVB as an example for underlying
layer technologies) [20, p. 674] . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Synchronized server/client architecture that sends timestamps using UDP 31
3.4 Synchronized server/client architecture that sends timestamps using PRRT 32
3.5 Synchronized dualserver/client architecture that sends timestamps using
PRRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6 Synchronized UDP server/client transport streaming . . . . . . . . . . . 33
3.7 Synchronized PRRT server/client transport streaming . . . . . . . . . . . 34
3.8 Synchronized PRRT dualserver/client transport streaming . . . . . . . . 34
3.9 Synchronized PRRT dualserver/client MVC transport streaming . . . . . 40
4.1 PRRT Timestamp test with two servers over one hour; blue shows the
primary server, red the secondary server; x = time/ms, y = number of
packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 MVC transmission test; blue shows the primary server, red the secondary
server; x = time/ms in the video, y = time/ms for the data packet from
the sender to the receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 MVC transmission test datarate; x = time/ms in the video, y = number
of TS data packets accumulated over 100ms . . . . . . . . . . . . . . . . 52
4.4 MVC transmission test; blue shows the primary server, red the secondary
server; x = time/ms in the video, y = time/ms for the data packet from
the sender to the receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4a secondary server with +40ms RTT . . . . . . . . . . . . . . . . . . 56
4.4b secondary server with +60ms RTT . . . . . . . . . . . . . . . . . . 56
C List of Tables
4.1 Master Server statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Slave Server statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Difference between Master and Slave Server statistics . . . . . . . . . . . 43
D List of Listings
4.1 NTP peers statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 UDP Timestamp test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 PRRT Timestamp test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4 PRRT Timestamp test at lower rate . . . . . . . . . . . . . . . . . . . . 45
4.5 PRRT Timestamp test with two servers . . . . . . . . . . . . . . . . . . . 45
4.6 PCR values in a TS file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.7 Increasing the UDP kernel receive buffer on Linux . . . . . . . . . . . . . 49
4.8 Adding and deleting 20ms of delay for outgoing data on eth0. . . . . . . 53
4.9 Adding and deleting 20ms of delay for incoming data on eth0. . . . . . . 53
4.10 Verifying the RTT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
E CD
This CD contains this thesis as a PDF file, the LaTeX source code of the thesis, images
and other included files.