Module 1 - VoIP Fundamentals

8/4/2019 Module 1 - VoIP Fundamentals


ITU Centres of Excellence for Europe

NGN Services VoIP and IPTV

Module 1:

VoIP fundamentals



Table of contents

1.1. ITU NGN Standards for main real time services
1.1.1. ITU IPTV Standards
1.1.2. VoIP Standards
1.2. NGN and Internet fundamentals
1.3. VoIP infrastructure
1.4. Peer-to-peer VoIP
1.5. VoIP protocols and codecs
1.6. Signaling for VoIP
1.6.1. H.323
1.6.2. SIP
1.6.3. Media Gateway Control Protocol (MGCP)
References



1.1. ITU NGN Standards for main real time services

Next generation networks (NGN) bring new realities to the telecommunication industry, characterised by factors such as the need to converge and optimise the operating networks and the extraordinary expansion of digital traffic (i.e., increasing demand for new multimedia services, including VoIP and IPTV, increasing demand for mobility, etc.). While different services converge at the level of digital transmission, the separation of distinct network layers (transport, control, service and application functions, see Figure 1.1) provides support for competition and innovation at each horizontal level in the NGN structure. At the same time, NGNs also create strong commercial incentives for network operators to bundle, and therefore increase vertical and horizontal integration, leveraging their market power across these layers. This may bring about the need for closer regulatory and policy monitoring, in order to prevent the restriction of potential development of competition and innovation in a next generation environment, and therefore the risk of reducing benefits for consumers and the potential of new networks for economic growth and for providing multimedia services (including the main real-time services, VoIP and IPTV) with a high level of QoS provisioning.

Although a significant amount of work on NGN is underway in standardisation forums, at the policy level there is still no complete agreement on a specific definition of NGNs or on VoIP and IPTV standards. The term is generally used to depict the shift to higher network speeds using broadband, the migration from the PSTN to an IP network, and a greater integration of services on a single network, and it is often representative of a vision and a market concept. ITU-T Recommendation Y.2001 (12/2004), General overview of NGN, gives the following definition:

A Next Generation Network (NGN) is a packet-based network able to provide Telecommunication Services to users and able to make use of multiple broadband, QoS-enabled transport technologies and in which service-related functions are independent of the underlying transport-related technologies. It enables unfettered access for users to networks and to competing service providers and services of their choice. It supports generalised mobility which will allow consistent and ubiquitous provision of services to users.



Figure 1.1. Separation of the functional planes into two NGN strata.

NGN standardization work started in 2003 within ITU-T and is today carried out worldwide in various major telecom standardization bodies. The most active NGN-relevant standardization bodies are ITU, ETSI, ATIS, CJK and TMF. The Next Generation Mobile Networks (NGMN) initiative is a major body for mobile-specific NGN activities, which are important contributors to the 3GPP specification for NGMN.

For readers unfamiliar with it, the ITU (International Telecommunication Union) is an international organization within the United Nations in which governments and the private sector coordinate global telecom networks and services. ITU-T is the telecommunication standardization sector of ITU. Its mission is to produce high-quality recommendations covering all fields of telecommunications.

In 2003, the pioneering NGN work was initiated under the name JRG-NGN (Joint Rapporteur Group on NGN). The key study topics are:

- NGN requirements;
- the general reference model;
- functional requirements and architecture of the NGN;
- evolution to NGN.

Moreover, the two fundamental recommendations on NGN are:

- Y.2001: General overview of NGN.
- Y.2011: General principles and general reference model for next-generation networks.

These two documents comprise the basic concept and definition of NGN. In May 2004, the FG-NGN (Focus Group on Next Generation Networks) was established in order to continue and accelerate the NGN activities initiated by the JRG-NGN. FG-NGN addressed the urgent need for an initial suite of global standards for NGN. The NGN standardization work was launched and mandated to FG-NGN.



On 18 November 2005, the ITU-T published its NGN specification Release 1, which is the first global standard of NGN and marked a milestone in ITU's work on NGN. The NGN specification Release 1, with 30 documents, specified the NGN framework, including the key features, functional architecture, component view, network evolution, etc. Lacking protocol specifications, the ITU NGN Release 1 is not at an implementable stage; however, it is clear enough to guide the evolution of today's telecom networks. With the publication of NGN Release 1, the FG-NGN fulfilled its mission and was closed.

As mentioned above, ITU coordinates the global efforts (including governments, regional and national SDOs, industry forums, vendors, operators, etc.) in developing the ITU recommendations. The ITU takes the following three-stage approach to developing the NGN standards:

(a) Stage 1: identify service requirements;
(b) Stage 2: describe network architecture and functions to map service requirements into network capabilities; and
(c) Stage 3: define protocol capabilities to support the services.

All services (including VoIP and IPTV) and capabilities have to be specified to stage 3 to ensure that the standards are implementable. ITU's NGN specifications are mainly contained in the Y-series and Q-series recommendations. The Y.2xxx series recommendations specify the overall characteristics of NGN, whereas the Y.19xx series recommendations specify IP Television (IPTV) over NGN. The Q.3xxx series recommendations focus on the signalling requirements and protocols for NGN. Some of these recommendations are listed below:

- Y.1900-Y.1999: IPTV over NGN
- Y.2250-Y.2299: Service aspects: Interoperability of services and networks in NGN
- ITU-T Rec. Y.1901 Requirements for the support of IPTV services
- ITU-T Rec. Y.1910 IPTV functional architecture
- ITU-T Rec. H.720 Overview of IPTV terminal devices and end systems
- ITU-T Rec. H.721 IPTV Terminal devices: Basic model
- ITU-T Rec. X.1191 Functional requirements and architecture for IPTV security aspects



1.1.1. ITU IPTV Standards

The ITU-T IPTV Focus Group gives the following definition: IPTV is defined as multimedia services such as television/video/audio/text/graphics/data delivered over IP based networks managed to provide the required level of QoS/QoE, security, interactivity and reliability.

The IPTV standardization work within ITU is ongoing under the umbrella of a Global Standards Initiative, i.e. the IPTV-GSI. To date ITU has published two major standards for IPTV:

- ITU-T Rec. Y.1901 Requirements for the support of IPTV services
- ITU-T Rec. Y.1910 IPTV functional architecture

Y.1901 specifies the high level requirements to support IPTV services, including the following major areas:

(a) general requirements on service offering, accounting and charging;
(b) QoS and performance, e.g. quality of experience (QoE), traffic management;
(c) security, including service and content protection, service security, network security, IPTV terminal security, subscriber security;
(d) network related aspects, including multicast distribution, mobility;
(e) end-system capabilities and interoperability aspects; and
(f) middleware and content aspects.

Moreover, Y.1910 describes the IPTV functional architecture to support IPTV services. The IPTV functional architecture is based on the use of existing network components as well as the NGN architecture, leading to three possible architectures:

(a) IPTV functional architecture for non-NGN network components;
(b) IPTV functional architecture based on the NGN functional architecture, but not based on IMS;
(c) IPTV functional architecture based on NGN and its IMS component.

Y.1910 also identifies the functional entities for each of the architectures mentioned above and the reference points (i.e. the interfaces) between these functional entities. It then describes the functional capabilities of these entities and reference points, including functional entities for interworking between different IPTV functional architectures, and with third-party applications. As envisaged by ITU, the next generation of IPTV may see a change that requires interoperation between service providers and/or network providers. A potential outcome of this is that a customer can go into a shop, buy an IPTV box, call their network operator and sign up, and then access services from a range of third-party service providers.

In addition to Y.1901 and Y.1910, there are other IPTV standards published by ITU-T, including the following:

- ITU-T Rec. H.720 Overview of IPTV terminal devices and end systems, which provides a high level description of the functionality of terminal devices for IPTV services;
- ITU-T Rec. H.721 IPTV Terminal devices: Basic model, which specifies the functionalities of IPTV terminal devices for IPTV basic services over a dedicated content delivery network, taking into account conditions on content delivery such as QoS; and
- ITU-T Rec. X.1191 Functional requirements and architecture for IPTV security aspects, which describes the functional requirements, architecture and mechanisms dealing with the security and protection of IPTV content, service, network, terminal devices and subscribers.

1.1.2. VoIP Standards

One of the most important emerging trends in telecommunications, and one whose development represents a major change in information and communication technologies, is undoubtedly Voice over IP: the transmission of voice over packet-switched IP networks. VoIP in general has been advocated and studied since the mid 1970s. It was the advent of DSP technology for voice compression in the late 1980s and early 1990s that gave these services the impetus they needed to enter the mainstream. Commercial-grade technologies and services started to appear in 1997 and books on the topic started to appear in 1998, with Mr. Minoli's co-authored book Delivering Voice over IP (Wiley, April 1, 1998) being the first text on the market on this topic. A lot has transpired since then. Nowadays, enterprise NGN networks, cellular carriers, voice-over-cable carriers, quadruple-play providers, pure-play VoIP carriers, and even traditional voice carriers are all moving rather aggressively to a VoIP paradigm.

VoIP has developed considerably in recent years and is gaining widespread public recognition and adoption through consumer solutions such as Skype and BT's strategy of moving to an IP-based network. The great potential for very low cost is driving the use of IP technology, but in the long term VoIP is more significant than simply introducing free phone calls: it represents a major change in telecommunications. The fact that VoIP transmits voice as digitised packets over the Internet means that it has the potential to converge with other digital technologies, which in turn will result in new services and applications becoming available. However, the adoption of VoIP, especially over mobile and wireless networks, is not without complications. The traditional PSTN telephone infrastructure has been built up over the last one hundred years or so and has developed into a robust voice communications system that provides reliability figures of nearly 100%. In contrast, VoIP is a relatively new technology with a fledgling architecture that is built on inherently less reliable data networks. There are therefore justifiable concerns around the extent to which it is deployed. However, today's technology offers opportunities for the development of new applications and educational services, particularly through the potential for converging voice with other media and data. In the long term, VoIP is likely to impact on some of the bigger-picture developments within further and higher education, such as virtual universities, identity management, and integration with enterprise-level services and applications.

The main factors promoting VoIP, and the main barriers to it, are presented below. The main factors that have been promoting VoIP include:

- Low cost/no cost software (softphone and configuration tools) for PCs and PDAs;
- Wide availability of analogue telephone adapters;
- Growing availability of broadband, wireless hot spots and other forms of broadband access;
- Packetised voice enables much more efficient use of the network (bandwidth is only used when something is actually being transmitted);
- The VoIP network can handle connections from many applications and many users at the same time (unlike the dedicated circuit-switched approach);
- Relatively high cost of PSTN calls.

On the other hand, the main barriers opposing VoIP include:

- High quality and reliability of the PSTN;
- VoIP quality of service can be variable;
- Lack of intrinsic QoS in many IP networks around the world;
- Many challenges for wireless VoIP users;
- Some VoIP feature, service and VoIP service provider interconnection limitations;
- Relative difficulty in setup and use;
- Problems with end-to-end integrity of the signalling and bearer path;
- Introduction of call plans and flat-rate charges by traditional PSTN operators.

As is well known, no permanent physical connection is established in packet-based networks such as IP-based networks. For VoIP, however, the communicating devices at the end-points build up a connection using corresponding protocols (such as H.323 or SIP: IETF RFC 3261, 2002; ITU-T Rec. H.323, 1998). The International Telecommunication Union (ITU) and the Internet Engineering Task Force (IETF) are the two major international organisations recommending standards for VoIP: the ITU recommends H.323 and the IETF recommends the Session Initiation Protocol (SIP). While there is some overlap of functionality, there are differences in approach and terminology.

The prior build-up of the link basically settles the agreement between the two end-points that speech data will be exchanged between them. Only at this stage of the communication setup is the connection-oriented Transmission Control Protocol (TCP) typically applied. After the connection is established, coded speech data are packetized and the packets are sent from source to destination. At different levels of packetization, header information is added to the speech data payload, successively increasing the packet size.

The subsequent packetization steps and the protocols involved are illustrated in Table 1.1; the resulting packet structure and header information are shown in simplified form in Figure 1.2.

Note that the headers can be compressed to reduce the amount of data to be sent across the network (e.g. IETF RFC 3095, 2001). Note also that while the TCP used for connection setup is connection-oriented, requiring acknowledgments between endpoints, the User Datagram Protocol (UDP) typically used for the transport of the speech data is connectionless, and hence yields fewer and smaller packets (see Table 1.2).

Table 1.1: Protocols and media access technologies involved in VoIP packetization.

* RTP: Real-time Transport Protocol (IETF RFC 3550, 2003); UDP: User Datagram Protocol (IETF RFC 768, 1980); IP: Internet Protocol; WLAN: Wireless Local Area Network (IEEE Std 802.11, 2005).

Figure 1.2. Illustration of the headers in VoIP.



Table 1.2: Header sizes of different protocols involved in VoIP.
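The effect of the successive headers on packet size can be estimated with a short calculation. The sketch below is illustrative only: it assumes the standard minimum header sizes for RTP (12 bytes), UDP (8 bytes) and IPv4 (20 bytes), ignores link-layer framing and header compression, and uses a hypothetical helper name `voip_packet_stats`. The codec bit rate and the 20 ms packetization interval are example values, not figures taken from the tables.

```python
# Minimum header sizes in bytes (no options or extensions).
RTP_HEADER = 12   # fixed RTP header, RFC 3550
UDP_HEADER = 8    # RFC 768
IPV4_HEADER = 20  # IPv4 header without options


def voip_packet_stats(codec_bitrate_bps, frame_ms):
    """Return (payload_bytes, packet_bytes, ip_bitrate_bps) for one direction.

    Link-layer headers (Ethernet, WLAN) are deliberately left out.
    """
    # bits per second * milliseconds / 8000 = bytes of speech per packet
    payload = codec_bitrate_bps * frame_ms // 8000
    packet = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    packets_per_second = 1000 / frame_ms
    return payload, packet, packet * 8 * packets_per_second


# Example: a 64 kbit/s codec (e.g. G.711) with 20 ms of speech per packet.
print(voip_packet_stats(64_000, 20))  # (160, 200, 80000.0)
# Example: an 8 kbit/s codec (e.g. G.729) with the same interval.
print(voip_packet_stats(8_000, 20))   # (20, 60, 24000.0)
```

Under these assumptions, the 40 bytes of headers add 25% to the IP-level bit rate of the 64 kbit/s stream and triple that of the 8 kbit/s stream, which is why header compression matters most for low-rate codecs.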

Once sent, the speech packets find their way through the network, where they are routed from one node to the next based on the destination address they carry. Consequently, subsequent packets may take different paths on their way to the destination. In case of congestion at some point of the network, they may arrive out of order or simply with considerable and/or varying delay (delay and jitter are the main QoS parameters for VoIP). Efficient speech communication cannot be carried out if the transmission delay becomes too large (more than 150 ms). Hence, packets arriving too late for timely playback may be discarded by the receiver (packet loss). Similarly, if a router in the network is faced with too many packets during a traffic-burst period, it may have to drop packets.
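The receiver-side behaviour described above (reordering packets and discarding those that miss their playback time) can be sketched as follows. This is a simplified illustration, not a real jitter buffer: it assumes sender and receiver clocks are synchronised, uses a fixed playout delay, and the function name `playout_schedule` and the sample timings are hypothetical. The 150 ms default simply reuses the delay bound mentioned in the text.

```python
def playout_schedule(packets, playout_delay_ms=150):
    """Split packets into those played and those discarded.

    packets: list of (seq, sent_ms, arrived_ms); arrived_ms is None if the
    packet was dropped inside the network.
    Returns (played, discarded) as lists of sequence numbers.
    """
    played, discarded = [], []
    # Restore the original order using the sequence number.
    for seq, sent_ms, arrived_ms in sorted(packets, key=lambda p: p[0]):
        deadline = sent_ms + playout_delay_ms
        if arrived_ms is None or arrived_ms > deadline:
            discarded.append(seq)  # lost in the network or too late to play
        else:
            played.append(seq)
    return played, discarded


# Packet 3 overtakes packet 2 but arrives after its deadline; packet 4 is lost.
sample = [(1, 0, 40), (3, 40, 250), (2, 20, 95), (4, 60, None)]
print(playout_schedule(sample))  # ([1, 2], [3, 4])
```

A larger playout delay would rescue packet 3 at the cost of higher end-to-end delay, which is the basic trade-off a jitter buffer has to make.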

Why are standards needed in VoIP? The answer is very simple: as with any communications technology, VoIP needs well-defined and industry-supported methods for signalling call control information in order to succeed. Without such standards, the ability to communicate between users becomes at best severely restricted or difficult to achieve. Initial implementations of commercial VoIP solutions used proprietary techniques until industry developed a consensus around the use of ITU-T's multimedia conferencing standards as a useful starting point. More specifically, it was the development and promotion of the H.323 specification that provided the initial focal point within the industry.

Other relevant VoIP standards and recommendations include:

- H.225 defines the lowest layer that formats the transmitted video, audio, data, and control streams for output to the network, and retrieves the corresponding streams from the network;
- H.235 specifies the security requirements for H.323 communications. Four security services are provided: authentication, integrity, privacy, and non-repudiation;
- H.245 specifies messages for opening and closing channels for media streams and other commands, requests and indications;
- H.248, also known as Megaco (MEdia GAteway COntrol), is a current draft standard and a co-operative proposal from IETF and ITU. It is also described in RFC 3015. It addresses the same requirements as MGCP and has many similarities to it;
- H.261: if video capabilities are provided, they must adhere to the H.261 protocol with QCIF as its mode;
- H.263 specifies the CODEC for video over the PSTN;
- Various audio CODECs are specified under G.711, G.722, G.723, G.723.1, G.726, G.729 and G.729a;
- T.120, a protocol for data and conference control.

Over 120 leading computer, telecommunication and technology organisations have indicated their intent to support and implement H.323 in their products and services. This wide-ranging support establishes H.323 as the de facto standard for audio and video conferencing over the Internet.

Sections 1.5 and 1.6 give more information and details about VoIP protocols, codecs and signalling (H.323, SIP, MGCP).



1.2. NGN and Internet fundamentals

The NGN is an evolutionary process and it can be expected that operators will take different migratory paths, switching to NGN while gradually phasing out existing circuit networks, or building a fully IP-enabled network. The investment in developing NGN is motivated by several factors (Table 1.3). Telecommunication operators across the whole world have been faced with a decline in the number of fixed-line telephone subscribers, coupled with a decrease in average revenue per user (ARPU), as a result of competition from mobile and broadband services.

Traditional sources of revenue (voice communications) have declined rapidly and fixed-line operators are subject to an increase in competitive pressure in the market to lower tariffs and offer innovative services. This has generated pressure from the investor community to decrease the cost and complexity of managing multiple legacy networks, by disinvesting from non-core assets and reducing operational and capital expenses.

Table 1.3. NGN drivers

In this context, the migration from separate network infrastructures to next generation core networks is a logical evolution, allowing operators to open up the development of new offers of innovative content and interactive, integrated services, with the objective to retain the user base, attract new users, and increase ARPU (see Box 3 in Table 1.3). NGN is therefore often considered essential for network operators to be more than "bit pipes" and to strategically position themselves to compete in the increasingly converged world of services and content, where voice is no longer the main source of revenue, and may become a simple commodity. The investment in next generation access networks, both wired and wireless, will be necessary in order to support the new services enabled by the IP-based environment, and to provide increased quality. At the same time, the important investment necessary to develop next generation infrastructures brings about new economic and regulatory issues, which will be analysed in the following sections.

Although the migration to all-IP networks is taking place at different paces in different countries, operators in several countries across the world have already updated their transport networks, and are now dealing with NGN at the local access level. Solutions embraced by fixed operators may also increasingly support IP Multimedia Subsystem (IMS), to enable fixed-mobile convergence.

For the moment the most common services provided through the new networks are PSTN/ISDN emulation services, i.e. the provision of PSTN/ISDN service capabilities and interfaces using adaptation to an IP infrastructure, and video on demand (VoD). At the same time the business world is showing an increasing interest in new NGN-enabled services and applications. Companies are migrating their Time Division Multiplexing switches to IP in order to enable integrated applications for specific industry-based functionalities and purposes.

Progress in the field of mobile (cellular) communications is taking shape with the development of the IMS standard. For the moment two services have been standardised under the IMS protocol, Push to Talk over Cellular (PoC) and Video Sharing. Prominent telecommunication network equipment suppliers are actively supporting the take-up of IMS and some of them are implementing IMS strategies and commercial IMS products. IMS is seen as the enabler for the migration of mobile operators to next generation networks and therefore for the implementation of fixed-mobile convergence. No evident "killer application" has currently emerged, with many operators focusing on one specific service: voice. Facilitating the use of voice applications, enabling users to handle their calls easily between fixed and mobile networks, and to receive calls wherever they are, is fundamental for the take-up of the service. Operating in an IMS environment would allow a seamless handover from WLAN (fixed) to mobile during calls (Voice Call Continuity). In order for real-time voice calls to be offered seamlessly between the circuit switched domain and the Wireless LAN interworking with IMS architecture, the Third Generation Partnership Project (3GPP) is currently working to develop the appropriate Technical Specifications to define this functionality as a standard 3GPP feature. The study of the standard by 3GPP is underway. In the meantime, fixed-mobile converged services have been launched by some mobile operators with access to fixed networks, using a different standard, Unlicensed Mobile Access (UMA), allowing users to seamlessly switch from fixed to mobile networks (see below, paragraph on Fixed Mobile Convergence).



In addition, increasing competitive pressure on mobile carriers is coming from the IP world. Thanks to the availability of dual-use devices and Wi-Fi hotspots, service providers such as Skype, Google, and others are able to offer on the market a host of new services for mobile users in a very short period of time. This rapidity constitutes an important comparative advantage, which in some cases provokes a reaction from mobile operators (and manufacturers), who tend to limit the services and applications users can access from their mobile handset.

Moreover, the technological developments associated with next generation networks should help combine the characteristics of the traditional telecommunication model and of the new Internet model, dissolving the current divisions and moving towards a harmonised and coherent approach across different platforms, gradually bringing to full convergence fixed and mobile networks, voice, data services, and broadcasting sectors. In short, in the future the choice of the technology used for the infrastructure or for access will no longer have an impact on the kinds and variety of services that are delivered. This however does not reflect the current situation, where the two worlds still have different visions and commercial models (Figure 1.3).

Figure 1.3. NGN convergent model

The telecommunications tradition emphasises the benefits of higher capacity local fibre access facilities, and powerful network intelligence. Access in this context should be simple and reliable, with centralised network management and control to guarantee the seamless provision of a wide range of services, bundled network-content-applications offers, and one-stop shop solutions.

On the other hand, the Internet world traditionally focuses on edge innovation and control over network use, user empowerment, freedom to choose and create applications and content, and open and unfettered access to networks, content, services and applications. Freedom at the edges is considered more important than the superior speed of managed next generation access networks. Indeed, the Internet still represents different things to different people, and next generation networks are seen either as a possibility for improved services or as a way to constrain the Internet into telecommunication boundaries, adding new control layers capable of discriminating between different content and monetising every single service accessed. Services provided over next generation networks allegedly will differ from services currently provided over the public Internet, which is based on a best effort approach where the quality of transmission may vary depending on traffic loading and congestion in the network, while with NGN packet delivery is enhanced with Multi Protocol Label Switching (MPLS). This allows operators to ensure a certain degree of Quality of Service, similar to the more constant quality of circuit switched networks, through traffic prioritisation, resource reservation, and other network-based control techniques, as well as to optimise network billing as in circuit-switched transport. The concept of network-based control seems to be the main difference between the public Internet approach and the next generation managed IP networks approach. NGN offers the possibility to provide detailed service control and security from within the network, so that networks are aware of both the services that they are carrying and the users for whom they are carrying them, and are able to respond in different ways to this information.

In contrast, the Internet aims to provide basic transmission, remaining unaware of the packets/services supported. While the Internet model therefore remains completely open to users and new applications and services, in managed IP networks operators are able to control the content going through the network. In turn, this may have negative implications for the content of third-party providers if their traffic is discriminated against in relation to that of an integrated operator. However, one thing is clear: the winning combination for the NGN protocol stack will be IP/Ethernet/Optical, because it gives the most intelligent infrastructure solution for NGN. This intelligent IP/Ethernet NGN structure is shown in Figure 1.4. Finally, one NGN transport and service configuration example is shown in Figure 1.5, together with the main functions that are supported by the NGN Release 1 specifications.

In Release 1 all services are carried over IP, although IP itself may in turn be carried over a number of underlying technologies, such as ATM, Ethernet, etc. Release 1 assumes IPv4 or IPv6 networking at packet interconnection points and packet network interfaces and therefore focuses on the definition of IP packet interfaces.



Figure 1.4. Intelligent infrastructure for IP NGN

Figure 1.5. Transport and service configuration of NGN



1.3. VoIP infrastructure

This section describes the main VoIP infrastructures that are used worldwide, with high reliability, functionality and QoS provisioning. First, we describe the main elements of the VoIP infrastructure (illustrated in Figure 1.6).

Figure 1.6. Basic VoIP Infrastructure

As we can see, the main VoIP entities are the following: VoIP Servers: These are the components responsible for processing

the VoIP signalling messages, routing the signalling messages to thecorrect destination and possibly executing additional services such as

user authentication, PSTN like services such as call forwarding and etc.As we can see here the VoIP servers are using the session initiationprotocol (SIP) (as the most worldwide signalling protocol), but it notexcusing the usage of H.323 too.

PSTN gateways: Often also called media gateways, these are thecomponents connecting the Internet to the PSTN and hence enablingcalls from the Internet to the PSTN and vice versa. These componentshave the following functionalities:

- Termination of PSTN signalling- Transcoding functionality (voice encoding from G.711 to other

media encoding-G.727, G.729 etc)

- Splitting voice samples into RTP Packets Address resolution servers: VoIP addresses are usually described

as URIs in the form of sip:user@domain or as an E.164 number+49303030. As with any Internet service, there is a need to translatebetween the high level service specific names and IP addresses. Forthis there are two major components in the VoIP architecture:- DNS servers: Domain name servers constitute a distributeddatabase enabling the mapping between domain names and IP


18/54

17

addresses.- ENUM severs: ENUM is a service enabling the mapping between anE.164 number and a SIP URI (for more details, see the next Module).

Authentication, Authorization and Accounting servers: AAA severscontain the necessary information to authenticate a user (e.g.,

password) as well as the user profile. The user profile indicates ingeneral the user specific services such as white and black lists or callforwarding specification for example. AAA servers are usually basedon protocols such as RADIUS or DIAMETER.

NAT-traversal support: SIP carries the addressing information of the communicating end parties inside the signalling messages. When private addresses are used, additional mechanisms are therefore needed to help the end systems traverse network address translators (NATs), because clients that advertise private addresses cannot be contacted from the public Internet. Some of the mechanisms that map private addresses to public ones are located directly at the NATs; others require additional servers to be provided by the VoIP provider (see Sec. 3.9 for details):

- STUN: If a NAT itself cannot provide the mapping from the client's private address to a publicly visible one, the client can contact a STUN server. This server provides mechanisms to detect the public address of the client, so that the client can generate messages that advertise its public address.
- RTP proxy: Some kinds of NAT do not allow incoming connections to a client unless the connection was initiated by that same client. If two clients behind such NATs want to establish an RTP connection, each client contacts a public host (the RTP proxy) that accepts incoming connections and relays the traffic between the two clients.

Application servers: Application servers are components that enhance the VoIP service with additional services such as conferencing, voicemail or integration with other applications such as calendars or media players.
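Two of the building blocks listed above, ENUM name construction and the STUN request a client sends to learn its public address, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the ENUM rule follows RFC 6116 (digits reversed, dot-separated, under e164.arpa), the STUN header layout follows RFC 5389, and no DNS query or network traffic is actually performed.

```python
import ipaddress
import os
import struct
from typing import Optional

STUN_BINDING_REQUEST = 0x0001   # message type for a Binding Request (RFC 5389)
STUN_MAGIC_COOKIE = 0x2112A442  # fixed magic cookie value (RFC 5389)


def enum_domain(e164_number: str) -> str:
    """Build the domain name an ENUM client would look up for an E.164 number."""
    digits = [c for c in e164_number if c.isdigit()]
    if not digits:
        raise ValueError("no digits found in E.164 number")
    return ".".join(reversed(digits)) + ".e164.arpa"


def needs_nat_traversal(advertised_ip: str) -> bool:
    """Return True if the address a client would advertise is not publicly routable.

    Note: is_private is also True for loopback and link-local addresses.
    """
    return ipaddress.ip_address(advertised_ip).is_private


def build_stun_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build the 20-byte header of a STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # type (2 bytes), message length (2 bytes, zero: no attributes), magic cookie (4 bytes)
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


print(enum_domain("+49303030"))
print(needs_nat_traversal("192.168.1.10"))
print(len(build_stun_binding_request()))
```

A real client would query the resulting ENUM name for NAPTR records and send the STUN request over UDP to a server chosen by the VoIP provider; both steps are omitted here.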

The most important SIP operation is that of inviting new participants to a call. Because SIP is the most widely used signalling protocol for VoIP (alongside others such as H.323 and MGCP), this section focuses on VoIP infrastructure solutions for SIP. Sections 1.5 and 1.6 describe the other VoIP signalling protocols and architectures in more detail.

In order to describe the SIP NGN VoIP infrastructure, we first need to explain the main SIP functionalities and SIP entities:

Registrar: User agents contact registrar servers to announce their presence in the network. The registrar server is essentially a database containing the locations and user preferences indicated by the user agents.

Proxy: A proxy server receives a request and forwards it towards the current location of the callee, either directly to the callee or to another server that might be better informed about the callee's actual location.

Redirect: A redirect server receives a request and informs the caller about the next-hop server. The caller then contacts the next-hop server directly.

User Agent: A logical entity in the terminal equipment that is responsible for generating and terminating SIP requests.

In SIP, a user is identified through a SIP URI in the form of sip:user@domain. This address can be resolved to a SIP proxy that is responsible for the user's domain. To identify the actual location of the user in terms of an IP address, the user needs to register his IP address at the SIP registrar responsible for his domain (see Figure 1.7).

Figure 1.7. SIP register flow

Figure 1.8. SIP INVITE flow via Proxy

When inviting a user, the caller sends the invitation to the SIP proxy responsible for the callee's domain, which looks up the location of the user in the registrar's database and forwards the invitation to the callee (see Figure 1.8). The callee can either accept or reject the invitation. The session initiation is then finalized by the caller acknowledging the reception of the callee's answer. During this message exchange, the caller and callee exchange the addresses at which they would like to receive the media and the kinds of media they can accept. After the session has been established, the end systems can exchange data directly without the involvement of any SIP proxy.
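The interplay between registrar and proxy described above can be illustrated with a toy model. The sketch below is not a SIP implementation: the class and function names are hypothetical, messages are reduced to plain strings, and the registrar is an in-memory dictionary. It only shows the lookup step that lets a proxy forward an INVITE to the callee's registered location, or answer 404 Not Found when no registration exists.

```python
from typing import Dict, Optional


class Registrar:
    """Toy location database: maps a SIP URI to the contact address the user registered."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, sip_uri: str, contact: str) -> None:
        # Corresponds to the REGISTER flow: the user agent announces where it can be reached.
        self._bindings[sip_uri] = contact

    def lookup(self, sip_uri: str) -> Optional[str]:
        return self._bindings.get(sip_uri)


def proxy_handle_invite(registrar: Registrar, callee_uri: str) -> str:
    """Decide what a proxy does with an INVITE addressed to callee_uri."""
    contact = registrar.lookup(callee_uri)
    if contact is None:
        return "404 Not Found"
    return "forward INVITE to " + contact


registrar = Registrar()
registrar.register("sip:bob@example.com", "192.0.2.15:5060")  # example address, assumed

print(proxy_handle_invite(registrar, "sip:bob@example.com"))
print(proxy_handle_invite(registrar, "sip:carol@example.com"))
```

In a real deployment the bindings expire and must be refreshed, and the media addresses are negotiated in the message bodies rather than by the proxy.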

One example of a VoIP infrastructure solution, the Cisco VoIP Infrastructure Solution for SIP, is illustrated next. The solution can be approached in two ways: an intranetwork approach and an internetwork approach.

As a first step toward a total SIP-based VoIP solution in the intranetwork approach, VoIP gateways configured to support SIP are implemented to replace the traditional DAL and bypass carrier toll lines. In Figure 1.9, Cisco SIP gateways and an IP network have been introduced between the private branch exchanges (PBXs).

Figure 1.9. Toll Bypass and DAL Replacement

As the next step, SIP proxy servers are used to provide support for a scalable private number plan. In Figure 1.10, SIP proxy servers have been added to the previous IP network.

Figure 1.10. Scalable Private Number Plan Support

In the next step, Cisco SIP IP phones are added. These phones connect directly to the IP network and, when used with the other SIP components, provide features such as call hold, call waiting, call transfer, and call forwarding. In Figure 1.11, Cisco SIP IP phones have been connected directly to the IP network.

As the next step, application services (such as a RADIUS server) are integrated with the SIP proxy servers. This enables the SIP proxy servers to perform authentication (via HTTP digest). It also provides end customers with enhanced services, such as "find me" and call screening. The Cisco SIP gateways interface with the application services using AAA and RADIUS for billing purposes.

Figure 1.11. Cisco SIP IP phone Support

In Figure 1.12, application servers have been added to the IP network to interface with the SIP proxy servers.

Figure 1.12. Application Services Support

In the final step, a unified-messaging server is added to provide voice mail. In Figure 1.13, a unified-messaging server has been added to the IP network.

Figure 1.13. IP Telephony Services with Unified Messaging



The intranetwork phase of the VoIP infrastructure (Solution for SIP) can be summarised as follows:

At the center is a QoS-enabled IP network using Cisco internetworking equipment with a set of Cisco SIP gateways and one or more SIP proxy servers.

The Cisco SIP gateways are connected to the PBXs via T1 or E1 lines with channel-associated signaling (CAS) or primary-rate-interface (PRI) signaling.

Several traditional telephones or fax machines are connected to the PBXs.

Cisco SIP IP phones are connected directly to the IP network.

A server running a unified-messaging application is also connected to the IP network.

SIP is used for signaling (or session initiation) between the SIP clients, Cisco SIP IP phones, Cisco SIP gateways, and SIP proxy servers.

RTP/RTCP is used to transmit voice data between the SIP endpoints after sessions are established.

As this example shows, the Cisco VoIP Infrastructure Solution for SIP is designed not only to provide an alternative to traditional telephony equipment, but also to interact with existing equipment.

The following describes a possible internetwork phase implementation of the Cisco VoIP Infrastructure Solution for SIP, which integrates a SIP-enabled VoIP network with a public switched telephone network (PSTN) infrastructure. This phased approach builds on an existing SIP VoIP network as outlined in the intranetwork phased approach above.

As the first step of the internetwork phased approach, Cisco Secure PIX Firewalls are added to the existing intranetwork for inside network security. In Figure 1.14, Cisco Secure PIX Firewalls have been added to the IP network.

Figure 1.14. The Cisco Secure PIX Firewall in a SIP Network



The final internetwork phase is to implement the Cisco SS7 Interconnect for Voice Gateways Solution for integrating the SIP-enabled VoIP network with a PSTN infrastructure. In Figure 1.15, Cisco SS7 Interconnect for Voice Gateways Solution components have been added.

Figure 1.15. Cisco SS7 Interconnect for Voice Gateways Solution Implemented with a SIP VoIP Network

From the ITU-T NGMN (Next Generation Mobile Network) perspective, mobile VoIP is assumed to be a seamless service, i.e. a VoIP service implemented so that mobile users do not experience any service disruption while changing their point of attachment. Mobile VoIP service requires support of service continuity for terminal mobility, taking into account network conditions (e.g. the number of user sessions, mobility events and bandwidth consumption) and users' requirements. Figure 1.16 illustrates a general network architecture involving two operators that support different types of access networks, i.e. cellular access networks (such as 3G), WiFi access networks and mobile WiMAX access networks, in which users of the mobile VoIP service may move between different access networks in the same operator domain or between different operator domains.

As shown in Figure 1.16, NGN architectural components, i.e. service control functions (SCF), mobility management and control functions (MMCF), network attachment control functions (NACF) and resource and admission control functions (RACF), are assumed to be used for supporting the QoS-enabled mobile VoIP service. For more details about the QoS provisioning architecture in VoIP, see section 2.2 (in the following Module 2).



Figure 1.16 - General network architecture for QoS enabled mobile VoIP service



1.4. Peer-to-peer VoIP

The term peer-to-peer (P2P) refers to the concept that in a network of equals (peers, see Figure 1.17) using appropriate information and communication systems, two or more individuals are able to spontaneously collaborate without necessarily needing central coordination. In contrast to client/server networks, P2P networks promise improved scalability, lower cost of ownership, self-organized and decentralized coordination of previously underused or limited resources, greater fault tolerance, and better support for building ad hoc networks. In addition, P2P networks provide opportunities for new user scenarios that could scarcely be implemented using customary approaches.

Figure 1.17 - Illustration of a peer-to-peer architecture

The core characteristics of P2P networks are as follows:

Sharing of distributed resources and services: In a P2P network each node can provide both client and server functionality, that is, it can act as both a provider and consumer of services or resources, such as information, files, bandwidth, storage and processor cycles. Occasionally, these network nodes are referred to as servents, a term derived from the terms client and server.

Decentralization: There is no central coordinating authority for the organization of the network (setup aspect) or the use of resources and communication between the peers in the network (sequence aspect). This applies in particular to the fact that no node has central control over the others. In this respect, communication between peers takes place directly. Frequently, a distinction is made between pure and hybrid P2P networks. Because all components share equal rights and equivalent functions, pure P2P networks represent the reference type of P2P design. Within these structures there is no entity that has a global view of the network. In hybrid P2P networks, selected functions, such as indexing or authentication, are allocated to a subset of nodes that, as a result, assume the role of a coordinating entity. This type of network architecture combines P2P and client/server principles.

Autonomy: Each node in a P2P network can autonomously determine when and to what extent it makes its resources available to other entities.

P2P technology first came into focus through file sharing systems such as Napster, Kazaa and BitTorrent, whose applications allow users to share their own files, as well as search for and download files of other users on the network. Instead of relying on a centralized client/server relationship, a peer-to-peer network gets its strength from each individual node, adding bandwidth and processing power with each new member for the good of the many. As a result, peer-to-peer services can scale to very large numbers of users without expensive central servers, which is attractive from a cost standpoint.

Peer-to-peer Internet VoIP is, like Napster, a software application that you download on to your computer from a peer-to-peer VoIP service provider. The software, or soft phone as it is called, is free to download, and calls to or from anyone on the network are free. The only hardware you need is a headset, or a microphone and speakers. Internet telephony headsets are inexpensive and either connect via USB or plug directly into the sound card. For those with web cams, many providers allow video calls to others on the network for free. The services offered with this technology go beyond those of traditional telcos, with conference calls, call forwarding, instant messaging and chat; peer-to-peer Internet telephony literally turns your computer into a telephone/videophone communications center.

As with traditional VoIP providers, calls within the network are free worldwide, but calls to a PSTN number, where this option is offered, will usually cost money. In some cases you can have additional numbers in other locations so that people can call you toll free from a land line, even from other countries. Even if you do have to pay to reach the PSTN, the rates are typically much lower than those of a traditional telecommunications company.

Just like the other forms of VoIP, peer-to-peer VoIP has had some technological hurdles to overcome. Quality of Service, NATed firewalls, and centralized directories of members using dynamic IP addresses are just a few. Also, as calls and instant messages are routed through the public Internet, encryption is a must for any user. However, peer-to-peer calls become important if callers use features like push-to-talk, video, and mesh-based audio conferencing. The VoIP versions of these features cannot be transmitted over the PSTN. A peer-to-peer VoIP call occurs when two VoIP phones communicate directly over IP without IP PBXs between them. A peer-to-peer call can be initiated directly, by calling a phone's SIP URI, or indirectly, by dialing a phone number.

Probably the best-known peer-to-peer VoIP applications are Skype, Yahoo IM and MSN. As an example to better explain the peer-to-peer VoIP concept, we focus here on Skype. Skype was developed by the organization that created Kazaa. It allows its users to place voice calls and send text messages to other users of Skype clients. In essence, it is very similar to the MSN and Yahoo IM applications, as it has capabilities for voice calls, instant messaging, audio conferencing, and buddy lists. However, the underlying protocols and techniques it employs are quite different.

Like its file sharing predecessor Kazaa, Skype uses an overlay peer-to-peer network. There are two types of nodes in this overlay network: ordinary hosts and super nodes (SN). An ordinary host is a Skype application that can be used to place voice calls and send text messages. A super node is an ordinary host's end-point on the Skype network. Any node with a public IP address and sufficient CPU, memory, and network bandwidth is a candidate to become a super node. An ordinary host must connect to a super node and must authenticate itself with the Skype login server. Although not a Skype node itself, the Skype login server is an important entity in the Skype network, as user names and passwords are stored at the login server. This server ensures that Skype login names are unique across the Skype name space. Starting with Skype version 1.2, the buddy list is also stored on the login server. Figure 1.18 illustrates the relationship between ordinary hosts, super nodes and the login server, and Figure 1.19 shows a world map of the super nodes to which Skype establishes a TCP connection at login.

Apart from the login server, there are SkypeOut and SkypeIn servers, which provide PC-to-PSTN and PSTN-to-PC bridging. SkypeOut and SkypeIn servers do not play a role in PC-to-PC call establishment, and hence we do not consider them to be a part of the Skype peer-to-peer network. Thus, we consider the login server to be the only central component in the Skype peer-to-peer network. Online and offline user information is stored and propagated in a decentralized fashion. The Skype login process is illustrated in Figure 1.20. In this process, the Skype client (SC) sends UDP packets of length 18 bytes to all bootstrap SNs. After 5 s, it attempts TCP connections to the seven bootstrap SN IP addresses on port 33033. Authentication with the login server is not shown in Figure 1.20.



Figure 1.18 - Skype P2P Network

Figure 1.19 - World map of super nodes to which Skype establishes a TCP connection at login

The Skype network is an overlay network, and thus each Skype client (SC) needs to build and refresh a table of reachable nodes. In Skype, this table is called the host cache (HC), and it contains the IP addresses and port numbers of super nodes. In addition, the Skype client listens on particular ports for incoming calls, uses wideband codecs, maintains a buddy list, encrypts messages end-to-end, and determines whether it is behind a NAT or a firewall.



Figure 1.20 - Skype login process

Starting with Skype v1.0, the HC is stored in an XML file. Skype has also implemented a 3G P2P or Global Index technology, which is guaranteed to find a user if that user has logged in to the Skype network in the last 72 hours.

Skype uses wideband codecs, which allow it to maintain reasonable call quality at an available bandwidth of 32 kb/s. It uses TCP for signaling, and both UDP and TCP for transporting media traffic.

In the following, the key features of Skype are summarized:

Online and offline user information is stored and propagated in a decentralized fashion, and so are the user search queries.

Skype is able to traverse firewalls and NATs by using a variant of the STUN and TURN protocols to determine the type of NAT and firewall it is behind.

Skype uses wideband codecs (iLBC and iSAC) to maintain reasonable call quality at an available bandwidth of 32 kb/s. Skype codecs allow frequencies between 50 and 8,000 Hz to pass through.

It uses TCP for signaling, and both UDP and TCP for transporting media traffic. Signaling and media traffic are not sent on the same ports.

Skype uses 256-bit AES (Advanced Encryption Standard) encryption, which has a total of about 1.1 x 10^77 possible keys, to encrypt the data in each Skype call or instant message. It also uses 1536- to 2048-bit RSA to negotiate symmetric AES keys. User public keys are certified by the Skype server at login.

Skype has also implemented a 3G P2P or Global Index technology, which is able to find a user if that user has logged in to the Skype network in the last 72 hours.

Silence suppression is not supported, in order to maintain UDP bindings and the TCP congestion window size.
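The key-space figure quoted for 256-bit AES in the list above can be checked directly, since a 256-bit key has 2^256 possible values. The short sketch below only performs that arithmetic; the variable names are illustrative.

```python
AES_KEY_BITS = 256

# Each key bit doubles the number of possible keys.
keyspace = 2 ** AES_KEY_BITS

print(f"{keyspace:.2e}")   # prints 1.16e+77
print(len(str(keyspace)))  # prints 78, the number of decimal digits
```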

Skype functions can be classified into startup, login, user search, call establishment and tear-down, media transfer, and presence messages. The main functions are explained below:

Startup: When Skype runs for the first time, it contacts the Skype server with an HTTP message to check for the latest version.

Login: During this process, the Skype client authenticates its user name and password with the login server, advertises its presence to other peers and its buddies, determines the type of NAT and firewall it is behind, and discovers online Skype nodes with public IP addresses.

Search: Skype uses its Global Index (GI) technology to search for a user.

Call establishment: If both users are on public IP addresses, are online and are in each other's buddy lists, then upon pressing the call button, the caller's Skype client establishes a TCP connection with the callee's client. Signaling information is exchanged over TCP.

Call tear-down: During call tear-down, signaling information is exchanged over TCP between caller and callee if they are both on public IP addresses, or between caller, callee and their respective SNs.

Compared to the Yahoo, MSN, and Google Talk applications, Skype was reported to have the best mouth-to-ear latency. Skype is also a selfish application: it tries to obtain the best available network and CPU resources for its execution, and it raises its application priority to high in Windows while a call is established. It evades blocking by routing its login messages over SNs. This also implies that Skype relies on SNs, which can misbehave, to route login messages to the login server. Skype does not allow a user to prevent his machine from becoming an SN, although it is possible to do so by putting a bandwidth limiter on the Skype application when no call is in progress. Theoretically speaking, if all Skype users decided to put a bandwidth limiter on their application, the Skype network could collapse, since the SNs hosted by Skype may not have enough bandwidth to relay all calls.



1.5. VoIP protocols and codecs

Besides the SIP and H.323 VoIP protocols (mentioned in section 1.1), a number of other protocols may be used in VoIP applications. Most of these protocols interoperate with the H.323 standards, but some may not. The main other VoIP protocols include:

Media Gateway Control Protocol (MGCP): A development of the SGCP and IPDC protocols. It is a signaling and control protocol for controlling Voice over IP (VoIP) gateways from external media gateway controllers or call agents. A VoIP gateway is a part of a network that provides conversion between the audio signals carried on telephone circuits and data packets carried over the Internet or over other packet networks. Media Gateway Control (MEGACO) and H.248 are an enhanced version of MGCP. MGCP is specified in RFC 3435, Media Gateway Control Protocol (MGCP) Version 1.0;

IP Device Control (IPDC). A group of protocols for controlling hardware devices, such as gateway devices at the boundary between the circuit-switched telephone network and the Internet. Examples of such devices include network access servers and VoIP gateways;

Real Time Transport Protocol (RTP). Described in IETF RFC 1889 (since obsoleted by RFC 3550), this is a real-time, end-to-end protocol that uses existing transport layers for data that has real-time properties;

RTP Control Protocol (RTCP). Also described in IETF RFC 1889, this protocol monitors QoS and carries information on the participants in a session. It also provides feedback on overall performance and quality, so that adjustments can be made;

Resource Reservation Protocol (RSVP). Described in IETF RFC 2205-2209. This is a general-purpose signalling protocol that allows network resources to be reserved for a connection's data stream, based on receiver-controlled requests. There may be scalability issues in using this protocol because it focuses on, and manages, individual application traffic flows;

Simple Gateway Control Protocol (SGCP). SGCP is a simple "remote control" protocol that the call agent uses to program gateways according to instructions received through signalling protocols such as H.323 or SIP. Now superseded by MGCP;

Session Announcement Protocol (SAP). Protocol used by multicast session managers to distribute a multicast session description to a large group;

Real Time Streaming Protocol (RTSP). Interface management to a server providing real-time data;

Session Description Protocol (SDP). Describes the session for other protocols including SAP, SIP and RTSP.
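To make the role of RTP in the list above more concrete, the following sketch packs the 12-byte fixed RTP header. It is an illustration only: the function name is hypothetical, the field layout is the one defined in RFC 3550 (the successor of RFC 1889), header extensions and CSRC entries are left out, and payload type 0 is the static assignment for G.711 mu-law (PCMU) audio.

```python
import struct

RTP_VERSION = 2


def build_rtp_header(payload_type: int, sequence_number: int,
                     timestamp: int, ssrc: int, marker: bool = False) -> bytes:
    """Pack the fixed 12-byte RTP header (no padding, no extension, no CSRC list)."""
    first_byte = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII",
                       first_byte,
                       second_byte,
                       sequence_number & 0xFFFF,       # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,         # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)              # 32-bit source identifier


# Example values (assumed): second packet of a PCMU stream with 160 samples per packet.
header = build_rtp_header(payload_type=0, sequence_number=2,
                          timestamp=320, ssrc=0x12345678)
print(len(header))
print(header.hex())
```

The receiver uses the sequence number to restore packet order and the timestamp to schedule playout, which is what allows RTP to run over an unreliable transport such as UDP.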

In common with many communication and data systems, the protocols used in VoIP generally follow a layered hierarchy, similar to the Open Systems Interconnection (OSI) reference model developed by the International Organization for Standardization (ISO). There are, however, exceptions to this, for example IP over ATM. Table 1.4 provides an overview of the principal VoIP protocols (as described by Cisco).

Table 1.4. Illustration of the main VoIP protocols

VoIP codecs

A VoIP codec ("coder-decoder") is an algorithm that compresses (the "coder") digitized audio so that it fits more easily into a VoIP data channel (IP packets), and then re-expands it (the "decoder") so that the user can hear the audio again. VoIP codecs operate by taking uncompressed digital audio and applying an agreed, standardized algorithm to reduce the number of bits it takes to represent that audio. Both ends of a phone call must agree on what that algorithm should be, and in a VoIP phone call (using H.323, SIP or similar signaling) this agreement is reached when the call is first placed, through a process called "capabilities exchange". In a broader sense, codecs provide the means to convert analogue voice signals to digital signals and to reverse the process on delivery. Codecs are also known as vocoders or voice coder/decoders. After conversion from analogue to digital, the data stream is packetised and transported across the network. The receiving endpoint not only has to reassemble the packets into the correct sequence, but also has to decode the contents. Clearly, commonality of standards and codecs is essential if the communication is to be intelligible. Any detected signaling tones are routed around the codec, which could otherwise modify the tones to the point where they are not recognized by the device being signaled.

Every VoIP phone contains one or more codecs, and during call establishment the phones share their lists of supported codecs. One phone, for example, may say "Hey stranger, I can support codecs A, B, or C", and the other one will respond "Nice to meet you, I can support codecs B, C, or D." At this point, both phones recognize that they could converse in either B or C (this process can easily be compared to two multilingual strangers meeting on the street, figuring out what languages they share, then deciding which of the shared languages to proceed in). Depending on how they have been set to prioritize various parameters, one phone may then say "well, since C gives better audio bandwidth than B, let's proceed with that," or "B uses a lower bit rate and my company thinks that's more important, so let's proceed with that."
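The selection step in this exchange can be sketched as picking the first codec in one phone's preference order that the other phone also supports. The function below is a simplified illustration with a hypothetical name; real capabilities exchange in H.323 or SIP/SDP carries more parameters than a codec name.

```python
from typing import List, Optional


def negotiate_codec(local_preferences: List[str],
                    remote_supported: List[str]) -> Optional[str]:
    """Return the most preferred local codec that the remote side also supports."""
    remote = set(remote_supported)
    for codec in local_preferences:  # ordered from most to least preferred
        if codec in remote:
            return codec
    return None  # no common codec, so the call cannot carry audio


# Phone 1 supports A, B, C and phone 2 supports B, C, D, as in the text.
print(negotiate_codec(["C", "B", "A"], ["B", "C", "D"]))  # prefers audio bandwidth: C
print(negotiate_codec(["B", "C", "A"], ["B", "C", "D"]))  # prefers low bit rate: B
```

The two calls differ only in the preference order, which mirrors the two policies quoted above.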

Most VoIP phones contain a number of different codecs covering a range of performance levels and, often, bandwidths. Having wideband capability doesn't mean that a phone is unable to connect to a narrowband phone; it just means that it has a wider repertoire and can do both, like a musician who can play both clarinet and saxophone (perhaps not at the same time). So now let's consider the following question: On what basis does one evaluate and choose those wideband codecs?

Here's a first glance at the most important codec characteristics:

Audio bandwidth (higher is better)

Data rate or bit rate (how many bits per second, fewer is better)

Audio quality loss (how much does it degrade the audio, lower is better)

Kind of audio (does it only work with speech, or with anything?)

Processing power required (less is better)

Processor memory required (less is better)

Openly available to vendors? (yes is essential)

Inserted delay (audio latency caused by the algorithm, less is better)

Resilience (how insensitive to lost or corrupted packets, more is better)

ITU standards-based (standardized by the International Telecommunication Union; yes is better)

It is quickly evident from the large number of parameters, however, that no codec is likely to be "best" in all categories at any given time. As you read through this, it's possible you already have experience in evaluating these parameters among narrowband codecs for existing VoIP systems (if you've ever compared G.711 against G.729, for example). Except for boosting the audio bandwidth to wideband, the other tradeoffs are much the same. Let's look at some of the key parameters, and compare them among the most popular wideband VoIP codecs.

The principal wideband VoIP codecs today are:

L256. The simplest of all wideband codecs, the 7 kHz L256 directly sends all the bits of digital audio sampled into 16-bit words at 16 kilosamples per second (ksps), using no compression whatever, hence the name ("Linear 256 kbps"). L256 is a basic requirement in all VoIP phones, but is seldom used because of its high bit rate.

G.719. Perhaps the best match among requirements for communication systems at 20 kHz, G.719 is a recent ITU-approved arrival that combines excellent quality for music and voice with low latency, modest processor load, and network-friendly bit rates.

G.722. This is the grandfather of 7 kHz wideband VoIP codecs, and the most widely deployed so far. G.722 applies adaptive differential pulse code modulation (ADPCM) to high and low frequencies separately, yielding an algorithm that works equally well with music or voice.

G.722.1. Also known as "Siren 7," this modern 7 kHz audio codec is in almost every videoconferencing system today and is gaining traction in VoIP because of its higher efficiency and lower bit rate. G.722.1 is a "transform" (as in "Fourier transform") codec and works by removing frequency redundancies in any kind of audio.

G.722.2. This codec, "AMR-WB," is a 7 kHz wideband extension of the popular adaptive multi-rate (AMR) cellphone algorithm, and excels in delivering wideband high-quality voice at the lowest bit rates. G.722.2's algebraic code excited linear prediction (ACELP) algorithm is optimized for speech, and works by continuously sending descriptions of how to shape and excite a model of the human vocal tract so that it reproduces the sound fed into the codec.

G.722.1 Annex C. Also known as "Siren 14," this is a 14 kHz extension of G.722.1 and is popular because of its wider bandwidth, its efficiency, and its availability (under license) for zero royalty.

Speex. Speex is an open-source CELP codec.

MPEG. There are more than 25 versions of the Moving Picture Experts Group (MPEG) transform codecs, each delivering a set of performance levels optimized for various parameters. The variant best suited to telecommunications is MPEG4 AAC-LD, a lower-delay version of the intended MP3 successor, MPEG4 AAC.

MP3. The popular MP3 format uses a form of transform coding, and is optimized for media distribution.

FLAC. The Free Lossless Audio Codec (FLAC) produces much higher bit rates than most other codecs, but compensates by preserving complete audio quality.
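The bit rate behind the L256 name in the list above follows from simple arithmetic: uncompressed linear audio needs bits-per-sample times samples-per-second. The sketch below reproduces that figure and, as an added assumption not taken from the text, shows how many payload bytes one packet would carry at a 20 ms packet interval. The function names are illustrative.

```python
def pcm_bit_rate_kbps(bits_per_sample: int, sample_rate_ksps: float) -> float:
    """Bit rate of uncompressed linear PCM audio, in kbit/s."""
    return bits_per_sample * sample_rate_ksps


def payload_bytes_per_packet(bit_rate_kbps: float, packet_ms: float) -> float:
    """Audio payload carried by one packet of the given duration, in bytes."""
    return bit_rate_kbps * packet_ms / 8  # kbit/s times ms gives bits


l256_kbps = pcm_bit_rate_kbps(bits_per_sample=16, sample_rate_ksps=16)
print(l256_kbps)                                # prints 256
print(payload_bytes_per_packet(l256_kbps, 20))  # prints 640.0 (20 ms is an assumed interval)
```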



Each conferencing environment has its own acoustical challenges that require an appropriately designed conferencing solution. Let's examine some of these differences.

Audio Bandwidth

Audio bandwidth corresponds to audio fidelity, that is, the ability to carry sounds ranging from very low pitches, like a kettledrum or a sonic boom, to very high pitches, like a cymbal or a plucked guitar. Therefore, more bandwidth is better. The human voice has important content beyond 14 kHz (this is why wideband telephony, even at 7 kHz, delivers such a telling improvement over older 3 kHz analog phones). The human ear can be sensitive to 20 kHz, and virtually every medium we experience today carries sound over this full range. VoIP is also working its way toward supporting up to 20 kHz, but today there are more codecs available to support 7 kHz audio than these even higher bandwidths. This is because 7 kHz in and of itself provides an easily achievable and dramatic improvement in voice-only communications. Desktop phones supporting 7 kHz are available from many vendors today, but to date, only Polycom has introduced conference phones at 14 kHz and 20 kHz.

Data Rate

The data rate required by a compressed audio channel becomes important when network bandwidth is limited, especially when supporting multiple phone connections. This is a common issue in narrowband VoIP telephony (comparing the bit rates of G.711 vs. G.729 is a common discussion), and its importance in wideband-capable systems is no different. Table 1.5 shows some typical numbers.

Table 1.5. Audio bandwidth versus bit rate for some popular VoIP codecs.

BW (kHz)    Typical bit rate (kbps)
3.3         8 (G.729), 56 (G.711)
7           10 (G.722.2), 24 (G.722.1), 64 (G.722)
14          32 (G.722.1C)
20          32 (G.719), 64 (AAC-LD)
22          32 (Siren22)

From Table 1.5 we can easily conclude that typical bit rates don't necessarily rise with rising audio bandwidth; the bit rate has as much to do with the codec chosen as with the bandwidth. The reasons for this are twofold: audio contains most of its information in the lower frequencies, so there's less information to be coded and sent in the higher frequencies, and the human ear is less sensitive to inaccuracies at the higher frequencies, so a compression algorithm can be a little less precise without being noticed.
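One way to see this point in the numbers is to divide each bit rate in Table 1.5 by the audio bandwidth it delivers. The sketch below does this for the table's values; the "kbps per kHz" figure is an ad hoc measure introduced here for illustration, not a standard codec metric.

```python
# (codec, audio bandwidth in kHz, typical bit rate in kbps), copied from Table 1.5
TABLE_1_5 = [
    ("G.729", 3.3, 8), ("G.711", 3.3, 56),
    ("G.722.2", 7, 10), ("G.722.1", 7, 24), ("G.722", 7, 64),
    ("G.722.1C", 14, 32),
    ("G.719", 20, 32), ("AAC-LD", 20, 64),
    ("Siren22", 22, 32),
]


def kbps_per_khz(bandwidth_khz: float, bit_rate_kbps: float) -> float:
    """Bits per second spent per hertz of audio bandwidth (ad hoc measure)."""
    return bit_rate_kbps / bandwidth_khz


# Sort from the fewest to the most bits spent per kHz of audio bandwidth.
ranked = sorted(TABLE_1_5, key=lambda row: kbps_per_khz(row[1], row[2]))

for codec, bw, rate in ranked:
    print(f"{codec:10s} {bw:5.1f} kHz {rate:3d} kbps {kbps_per_khz(bw, rate):6.2f} kbps/kHz")
```

In this ranking several 14 to 22 kHz codecs spend fewer bits per kilohertz than the 7 kHz G.722, which is consistent with the observation that bit rate depends on the codec at least as much as on the bandwidth.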

Another point to note is the span of bit rates among wideband codecs. For example, 7 kHz audio requires 64 kbps from G.722, but only 10 kbps from G.722.2. Here, the difference is due to the assumptions made by the codecs. G.722.2, an ACELP codec, assumes that it's working on human speech. It knows that it's not going to be fed the sounds of a violin or a speeding locomotive, so it takes a whole different approach to compression and consequently can be extremely efficient about it. This is why G.722.2 is preferred for cellphone use, where the cost of the bit rate is high, but another codec such as G.722.1 would be preferred if the application were broader and included multiple talkers, or music. Figure 1.21 shows how these different VoIP codecs stack up when comparing bandwidth to bit rate.

Figure 1.21: Audio bandwidth versus bit rate for different VoIP codecs.

Processor loading

High-complexity codecs drive up the cost of a VoIP phone or endpoint because they require faster, more expensive processors and more memory. The issue multiplies with VoIP phones that perform multi-party bridged calls internally, which is a common VoIP-enabled feature today. Table 1.6 gives a couple of good examples of how codecs can differ in their appetites for processor power. At 7 kHz, G.722.2 shows the highest demand for processor power, but recall that its operation also results in the lowest bit rate. G.722.1, at one-seventh the processor power of G.722.2 and 40 percent of the bit rate of G.722, is a good compromise. Comparing the two 20 kHz codecs, the difference in processor loading is striking, but surprisingly, there's no compensating advantage in bit rate or quality.



Table 1.6 MIPS versus audio bandwidth for some popular codecs.

BW (kHz)    Codec            MIPS
7           G.722.2          38
7           G.722            14
7           G.722.1          5.5
20          G.719            18
20          MPEG-4 AAC-LD    36

This may be because the MPEG codec adapts technology originally intended for media streaming and recording, whereas G.719 was always targeted at VoIP and telecommunications, but it does form a good demonstration of how important differences can pop up.
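The "one-seventh" and "40 percent" figures quoted for G.722.1 can be checked directly against the tables. The short sketch below simply recomputes the two ratios from the values in Tables 1.5 and 1.6; the dictionary layout and function name are illustrative choices only.

```python
# Figures copied from Tables 1.5 and 1.6 for the three 7 kHz codecs.
CODECS_7KHZ = {
    "G.722.2": {"kbps": 10, "mips": 38},
    "G.722": {"kbps": 64, "mips": 14},
    "G.722.1": {"kbps": 24, "mips": 5.5},
}


def ratio(codec, reference, key):
    """Return codec[key] / reference[key] for two codecs in the table."""
    return CODECS_7KHZ[codec][key] / CODECS_7KHZ[reference][key]


mips_ratio = ratio("G.722.1", "G.722.2", "mips")  # 5.5 / 38
rate_ratio = ratio("G.722.1", "G.722", "kbps")    # 24 / 64

print(f"G.722.1 needs {mips_ratio:.3f} of the MIPS of G.722.2")
print(f"G.722.1 needs {rate_ratio:.3f} of the bit rate of G.722")
```

The first ratio comes out close to one-seventh and the second is 0.375, which the text rounds to 40 percent.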

Audio Quality

One reason that comparing codecs can get tricky is that audio quality, a somewhat subjective measure, is also tightly related to bit rate within a particular codec. One codec may tout extremely low bit rates, but a quick listening will reveal that the audio quality at those lowest bit rates is almost unusable. In this paper, I have tried to relate bit rates at comparable audio qualities, so the "typical" figures here will often be higher than the provider's "minimum" figures. But they are realistic, and appropriate for VoIP usage. There are standard measures of audio quality (MOS, PESQ, etc.), but if you are making a serious comparison of codecs, it's best to do a real "apples-to-apples" test and apply the same standard test track to all candidate algorithms, with each candidate running at its planned bit rate. Private and open-source codec suppliers will be glad to provide an algorithm simulator that runs on a standard PC, making it possible for you to do this test yourself. Give some thought to this test track. Even in VoIP applications, we're not always dealing with just the human voice. You often will find two or more voices speaking at the same time, or someone talking in a room with lots of reverberation: two situations that can really throw off a codec built on a model of the human vocal tract (some CELP and ACELP implementations, for example, can be particularly sensitive to these things). Even a door closing or pencil dropping while someone is talking can come across very strangely, so it's good to have a full test in order to build a good comparison.

Latency

Latency is the time delay from when you say a word until the other person hears it, also referred to as the "mouth-to-ear" delay. When it gets too long, conversations become difficult and stilted, with participants frequently but inadvertently interrupting each other and not understanding why. Twenty years ago, we often heard very long latencies on long-distance calls because of the widespread use of satellite links (even at the speed of light, a couple of hops off of satellites perched about 22,300 miles above the earth add up to the better part of a second), but today long latencies are mostly the consequence of older videoconferencing systems or carelessly planned VoIP phone systems.

A common recommendation (for example, ITU-T G.114) is that one-way latency, which includes the codec, should not exceed 150 milliseconds. This is not a problem in a well-designed system using telecom codecs such as G.722, G.722.1, etc. But occasionally a media or streaming codec will find its way into a VoIP system with disruptive results. Media codecs, such as those used to transmit streaming audio over the internet, are often not optimized for latency because one-way streaming connections are not sensitive to latency. Because they can insert an appreciable fraction of a second of delay, they should be avoided in VoIP and telecommunication systems.

Another contributor to latency in a VoIP system is a "jitter buffer." This is a kind of shock absorber for data flow that soaks up the momentary variations in packet arrival time that occur in any IP network. Jitter buffers are sometimes embedded within a codec, which makes it important to be sure that multiple, redundant jitter buffers are not inadvertently built into a system (a jitter buffer can add 20 to 80 milliseconds or more).
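A simple way to reason about the 150 millisecond guideline is to add up the individual delay contributions. The sketch below does this for a hypothetical system; every component value is an illustrative assumption rather than a measurement, and only the 150 ms limit and the 20 to 80 ms jitter buffer range come from the text.

```python
# Sketch: one-way ("mouth-to-ear") latency budget check.
# All component values below are illustrative assumptions.

ONE_WAY_LIMIT_MS = 150.0  # common recommendation quoted in the text


def one_way_latency_ms(components):
    """Sum the delay contributions (in milliseconds) along one direction."""
    return sum(components.values())


budget = {
    "codec_delay": 40.0,
    "packetization": 20.0,
    "network_transit": 30.0,
    "jitter_buffer": 40.0,  # within the 20 to 80 ms range mentioned above
}

total = one_way_latency_ms(budget)
print(f"total = {total:.0f} ms, within limit: {total <= ONE_WAY_LIMIT_MS}")

# Accidentally stacking a second, redundant jitter buffer breaks the budget.
with_duplicate = dict(budget, second_jitter_buffer=40.0)
total_dup = one_way_latency_ms(with_duplicate)
print(f"with duplicate buffer = {total_dup:.0f} ms, "
      f"within limit: {total_dup <= ONE_WAY_LIMIT_MS}")
```

With these assumed numbers the system fits the budget until a second jitter buffer is added, which illustrates why redundant buffers are worth checking for.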

Cost

As users, we usually do not see the cost of a codec, but cost can influence its selection into a phone system or a phone. There are license fees, or royalties, associated with some codecs; often, a per-year minimum fee, with a per-port or per-phone fee, and perhaps an initial fee as well.

Some of the codecs we use today, such as G.722, are royalty-free because the underlying patents have expired. Some codecs, such as G.722.1 Annex C, are royalty-free because their vendor has decided that the industry is better served if high performance codecs are widely deployed. Other codecs, such as MPEG-4 AAC-LD, still bear royalties. Royalties are not necessarily a bad thing, because codecs are often the result of long and expensive research resulting in valuable characteristics (such as low bit rate in G.722.2, which saves money in use). They are simply something to be aware of when considering your VoIP network deployment plans.

Standardization and Availability

The ITU is the de facto worldwide agency for standardization of telecommunications codecs. This is the organization that assigns the numbers to our familiar codecs; G.722.1's full name, in fact, is ITU-T G.722.1, because it is a product of the ITU Telecommunication Standardization Sector (ITU-T), and like all ITU standards it has been subjected to open, rigorous multi-vendor evaluation before being accepted. While proprietary codecs may be incorporated in limited-use systems, it's of paramount importance that business VoIP telephony systems, which require worldwide interoperability and high reliability, be configured with ITU-approved codecs. The ITU sanction also ensures that codecs are available to all vendors on fair and reasonable terms.

Finally, to conclude this section, there are two trends to keep in mind in VoIP audio bandwidth today: one is strategic, and one is technical.

The strategic trend is this: VoIP telephony is moving toward full-bandwidth 20 kHz sound, because the VoIP endpoint is undergoing transformation to a multi-purpose, multimedia device that integrates communications, applications, and even entertainment. As you have seen, there's little cost or bit rate penalty in going to wideband telephony using modern codecs (even the fullband G.719 codec has a lower bit rate than G.711), and competitive pressure will drive VoIP vendors to achieve full human compatibility in a very few years. Some applications will remain at the voice-friendly 7 kHz point due to tight cost or size constraints, but we'll see an increasing tide of fully capable 20 kHz VoIP systems with unified capabilities.

The technical trend follows from the strategic: which codecs will bring us to this 20 kHz world? At 7 kHz, G.722 is mature, free, and already widely deployed in endpoints and in PBXs and softswitches. G.722.2 will be deployed in applications where its higher cost is offset by its very low bit rate and high quality, much of this driven by mobile phones. Its adoption there will push the network, and consequently wired endpoints, to follow. And finally, G.722.1 adds multimedia capability at less than half the bit rate of G.722, and one-seventh the processing cost of G.722.2. These three codecs form a functionally complete set for 7 kHz performance. The choice at 14 kHz is G.722.1 Annex C because of its maturity, modest bit rate and processing needs, and zero-cost license.

And finally, 20 kHz performance in the VoIP world will come from the ITU's new G.719, the likely successor to Siren22 (which is itself a principal predecessor of G.719).



1.6. Signaling for VoIP

The first hurdle to overcome when making a VoIP phone call is to establish a connection between the parties involved. In legacy telephony, this is done by switching circuits until a physical path is established between locations. The Internet Protocol, on the other hand, is connectionless by nature. IP packets may take different routes through the network and may arrive out of order. For time-sensitive applications such as voice and video this is unacceptable. Steps must be taken to establish a session between the endpoints and to keep it open for the duration of the call. Similar to the message exchange of the DHCP protocol, the VoIP signaling protocols use a handshake, carried over TCP or (commonly for SIP and MGCP) UDP, to set up, manage and tear down the VoIP phone call. Signaling protocols are not concerned with the actual media stream of voice or video, nor with QoS and traffic engineering. Their basic functions are to first initiate a session, then to find common ground for communication between the parties involved, and to terminate the session at the call's end. In the following, the most widely used signaling protocols for VoIP (H.323, SIP and MGCP) are presented.
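The "find common ground" step can be pictured as choosing a codec that both parties support. The sketch below is a deliberate simplification: real systems negotiate through H.245 capability exchange (in H.323) or an SDP offer/answer (with SIP), and the function name and preference-list representation here are hypothetical.

```python
def negotiate_codec(caller_preferences, callee_supported):
    """Return the first codec in the caller's preference order that the
    callee also supports, or None if the two lists have nothing in common."""
    supported = set(callee_supported)
    for codec in caller_preferences:
        if codec in supported:
            return codec
    return None


# Caller prefers wideband codecs; callee supports only G.711 and G.722.
chosen = negotiate_codec(["G.722.2", "G.722", "G.711"], ["G.711", "G.722"])
print(chosen)
```

If no common codec exists the function returns None, which corresponds to a session that cannot be established without a transcoding gateway.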

1.6.1. H.323

Derived from related specifications for multimedia conferencing over ISDN, H.323 defines a protocol architectural framework (see Figure 1.22) that can be used in both direct-routed and server-routed signalling modes. Within this architecture, the server-routed signalling mode is known as gatekeeper-routed signalling, after the term used within the H.323 specifications for the server component.

Figure 1.22. Protocol Architecture for H.323.



H.323 is actually an umbrella standard, encompassing several other protocols, including H.225, H.245, and others. It acts as a wrapper for a suite of media control recommendations by the ITU. Each of these protocols has a specific role in the call setup process, and all but one use dynamically assigned ports. Figure 1.23 shows the basic H.323 architecture and Figure 1.24 provides an overview of the typical H.323 registration and call set-up process.

Figure 1.23. Basic Architecture of H.323.

An H.323 network is made up of several endpoints (terminals), a gateway, and possibly a gatekeeper, a Multipoint Control Unit, and a Back End Service. The gatekeeper is often one of the main components in H.323 systems. It provides address resolution and bandwidth control. The gateway serves as a bridge between the H.323 network and the outside world of (possibly) non-H.323 devices. This includes SIP networks and traditional PSTN networks. This brokering can add to delays in VoIP, and hence there has been a movement towards the consolidation of at least the two major VoIP protocols. A Multipoint Control Unit is an optional element that facilitates multipoint conferencing and other communications between more than two endpoints. Gatekeepers are an optional but widely used component of a VoIP network. If a gatekeeper is present, a Back End Service (BES) may exist to maintain data about endpoints, including their permissions, services, and configuration.

H.323 is a standard recommended by the Telecommunication Standardization Sector of the ITU (ITU-T); it was first approved in 1996 and has been revised in several later versions. It defines real-time multimedia communications and conferencing over packet-based networks that do not provide a guaranteed Quality of Service (QoS), such as LANs and the Internet. As mentioned before, it is an umbrella standard belonging to the H.32x class of standards recommended by the ITU for videoconferencing applications.



Figure 1.24. H.323 typical registration and call set-up process

These were amongst the earliest standards to classify and provide solutions to VoIP (given in section 1.1):

- H.310 for conferencing over Broadband ISDN (B-ISDN);
- H.320 for conferencing over Narrowband ISDN;
- H.321 for conferencing over ATM;
- H.322 for conferencing over LANs with guaranteed QoS;
- H.324 for conferencing over Public Switched Telephone Networks.

Earlier versions of H.323 had a large overhead in control signalling, particularly when establishing a session. This presented some scalability limitations, especially with a large number of simultaneous sessions. Subsequent versions have focussed on addressing these issues.

However, H.323 is an immensely powerful technology, incorporating many features that can be switched on or off depending upon the network deployment context. It is by the careful choice of these options and appropriate design of gatekeeper-based applications to route signalling messages that H.323 networks may be scaled to very large dimensions. As can be seen in Figure 1.24, the calling client initially engages in a registration sequence to identify itself to the network. This type of behaviour is an essential feature of a VoIP system and effectively provides a degree of inherent terminal and user mobility, since the client may register from anywhere in the connected IP network.
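The gatekeeper roles described in this section (registration, address resolution and bandwidth control) can be sketched as a toy model. The message names RRQ/RCF (registration request/confirm) and ARQ/ACF/ARJ (admission request/confirm/reject) are the H.225.0 RAS names; the class, its methods, the addresses and the bandwidth figures are hypothetical, and real RAS messages are binary-encoded and carry many more fields.

```python
class Gatekeeper:
    """Toy H.323 gatekeeper: registration, address resolution, bandwidth control."""

    def __init__(self, total_bandwidth_kbps):
        self.available_kbps = total_bandwidth_kbps
        self.registrations = {}  # alias -> transport address

    def register(self, alias, address):
        """Handle a Registration Request (RRQ) and confirm it (RCF)."""
        self.registrations[alias] = address
        return "RCF"

    def admit(self, callee_alias, requested_kbps):
        """Handle an Admission Request (ARQ).

        Returns ("ACF", callee_address) when the callee is registered and
        enough bandwidth remains, otherwise ("ARJ", None).
        """
        address = self.registrations.get(callee_alias)
        if address is None or requested_kbps > self.available_kbps:
            return ("ARJ", None)
        self.available_kbps -= requested_kbps
        return ("ACF", address)


gk = Gatekeeper(total_bandwidth_kbps=128)
gk.register("alice", "192.0.2.10:1720")  # the client may register from any address
gk.register("bob", "192.0.2.20:1720")
first = gk.admit("bob", 64)    # resolved and admitted
second = gk.admit("bob", 64)   # uses the remaining bandwidth
third = gk.admit("bob", 64)    # rejected: no bandwidth left
print(first, second, third)
```

Because the registration table maps an alias to whatever address the client last registered from, the same mechanism provides the terminal and user mobility mentioned above.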

1.6.2. SIP

In contrast to H.323, the IETF has been developing a competing, but potentially complementary, architecture for multiparty, multimedia conferencing on the Internet, as shown in Figure 1.25. The Session Initiation Protocol (SIP) is a component of this architecture and provides the basic session control mechanism used within it. SIP has gained a substantial following within the industry by offering the potential for an easily implemented method of establishing and controlling basic voice calls. From its very lightweight inception, SIP has been developed to address the challenges of being used beyond basic point-to-point voice calls and an overly simplistic direct-mode signaling model. As can be seen by contrasting the same simple call flows using H.323 (Figure 1.24) with those using SIP (Figure 1.26), the differences between the technologies are not always as obvious as some may wish them to be, which belies the true roles each should be able to play.

Figure 1.25. IETF multimedia conferencing architecture.

Moreover, the architecture of a SIP network is different from the H.323 structure. A SIP network is made up of end points, a proxy and/or redirect server, a location server, and a registrar. A diagram is provided in Figure 1.27. In the SIP model, a user is not bound to a specific host (nor is this the case in H.323, where the gatekeeper provides address resolution). The user initially reports their location to a registrar, which may be integrated into a proxy or redirect server. This information is in turn stored in the external location server.

Furthermore, the messages from endpoints must be routed through either a proxy or redirect server. The proxy server intercepts messages from endpoints or other services, inspects their To: field, contacts the location server to resolve the username into an address, and forwards the message along to the appropriate end point or another server. Redirect servers perform the same resolution functionality, but the onus is placed on the end points to perform the actual transmission. That is, redirect servers obtain the actual address of the destination from the location server and return this information to the original sender, which then must send its message directly to this resolved address (similar to H.323 direct-routed calls with a gatekeeper).
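The difference between the two server types can be sketched in a few lines. In this simplified model the location service is a plain dictionary, and the user name, address and function names are hypothetical; the status lines "302 Moved Temporarily" and "404 Not Found" are standard SIP responses.

```python
# Hypothetical location-service binding, as stored after registration.
LOCATION_SERVICE = {"bob@example.com": "192.0.2.20:5060"}


def proxy_handle(to_user, message, send):
    """Proxy server: resolve the user and forward the message itself."""
    contact = LOCATION_SERVICE.get(to_user)
    if contact is None:
        return "404 Not Found"
    send(contact, message)  # the proxy performs the transmission
    return "forwarded"


def redirect_handle(to_user):
    """Redirect server: resolve the user and return the address to the sender,
    which must then send its message directly to that address."""
    contact = LOCATION_SERVICE.get(to_user)
    if contact is None:
        return ("404 Not Found", None)
    return ("302 Moved Temporarily", contact)


sent = []
result = proxy_handle("bob@example.com", "INVITE", lambda c, m: sent.append((c, m)))
print(result, sent)
print(redirect_handle("bob@example.com"))
```

Both functions perform the same lookup; they differ only in who transmits the message afterwards.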

The SIP call set-up itself is modeled on the three-way handshake used in TCP: an INVITE request is answered by a 200 OK response, which is in turn confirmed by an ACK (see Figure 1.26).

Figure 1.26. Typical registration and call set-up using SIP.
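SIP messages are plain text, which is part of what makes the protocol easy to implement. As a rough illustration, the sketch below assembles a minimal INVITE request with the header fields that a SIP request must carry. The helper name, addresses, tag, branch and Call-ID values are made up for the example, and a real INVITE would normally also carry an SDP body describing the media.

```python
def build_invite(caller, callee, caller_host, call_id, branch, tag):
    """Assemble a minimal, body-less SIP INVITE as a text message.

    caller and callee are addresses of the form user@domain; caller_host is
    the host (and optional port) the caller sends from.
    """
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{caller_host}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


msg = build_invite(
    caller="alice@example.org",
    callee="bob@example.com",
    caller_host="192.0.2.10:5060",
    call_id="a84b4c76e66710@192.0.2.10",
    branch="z9hG4bK776asdhds",  # branch values start with the magic cookie z9hG4bK
    tag="1928301774",
)
print(msg)
```

The callee's 200 OK and the caller's ACK reuse the same Call-ID and tags, which is how the three messages are tied together into one dialog.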

The main advantage of SIP is that it offers an easily implemented, powerful control environment capable of scaling to very large networks due to its simple request/response message format. This, combined with its relative immaturity compared with H.323, encouraged its adoption in the access segment of third generation networks, since this afforded the opportunity to incorporate any mobile-specific elements that were subsequently identified.

On the other hand, both protocols can be extended to manage new capabilities. The argument has been advanced that H.323 is more stable because of its maturity, but SIP provides better support for some functionality and is easier to implement. Fortunately, the ITU and the IETF are now co-operating in developing standards in this area.



Figure 1.27. Typical SIP Architecture.

Moreover, Figure 1.28 gives a simplified illustration of a call between VoIP SIP phones within the same SIP IP telephony network. When calls are made within a single SIP IP telephony network, the process typically involves the origination and destination phones and a single proxy server.

Figure 1.28. Calls Within a Single SIP VoIP Network

In this illustration, the following sequence occurs:

1. Cisco SIP IP phone A initiates a call by sending an INVITE message to the SIP proxy server. (There can be more than one proxy server for redundancy.)



2. The SIP proxy server interacts with the location server and possibly with application services to determine user addressing, location, or features.

3. The SIP proxy server then proxies the INVITE message to the destination phone.

4. Responses and acknowledgments are exchanged, and an RTP session is established between Cisco SIP IP phones A and B.
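The four steps above can be written out as an ordered message trace. This is a sketch under stated assumptions: the participant names are placeholders, the location lookup is shown as a single abstract step, and the ACK is shown going directly from phone A to phone B, which assumes the proxy does not insist on staying in the signaling path.

```python
def single_proxy_call_flow(caller="phone A", proxy="proxy", callee="phone B"):
    """Return the call set-up as a list of (sender, receiver, message) tuples."""
    return [
        (caller, proxy, "INVITE"),              # step 1
        (proxy, "location server", "lookup"),   # step 2
        (proxy, callee, "INVITE"),              # step 3
        (callee, proxy, "180 Ringing"),         # step 4: responses ...
        (proxy, caller, "180 Ringing"),
        (callee, proxy, "200 OK"),
        (proxy, caller, "200 OK"),
        (caller, callee, "ACK"),                # ... and acknowledgment
        (caller, callee, "RTP media"),          # media flows end to end
    ]


flow = single_proxy_call_flow()
for sender, receiver, message in flow:
    print(f"{sender:>8} -> {receiver:<15} {message}")
```

Note that only the signaling passes through the proxy; the RTP media stream runs between the two phones.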

When calls are made between SIP VoIP networks, the process typically involves the origination and destination phones as well as two or more SIP proxy servers. Figure 1.29 is a simplified illustration of a call between SIP VoIP phones in different SIP VoIP networks.

Figure 1.29 Calls Between SIP IP Telephony Networks

In this illustration, the following sequence occurs:

1. Cisco SIP IP phone A initiates a call by sending an INVITE to the SIP proxy server. (There can be more than one proxy server for redundancy.)

2. The SIP proxy server might interact with application services such as RADIUS to obtain additional information.

3. The SIP proxy server in phone A's network contacts the SIP proxy server in phone B's network. The local proxy uses the domain name system (DNS) domain to determine if it should handle the call or route it to another proxy. The remote proxy is contacted based on the domain of the destination device.

4. The SIP proxy server in phone B's network might interact with application services to obtain additional information.



5. The SIP proxy server in phone B's network contacts the destination phone (Cisco SIP IP phone B).

6. Responses and acknowledgments are exchanged, and an RTP session is established between Cisco SIP IP phones A and B.
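The routing decision in step 3 of the sequence above can be sketched as a comparison between the domain in the destination address and the proxy's own domain. The function name and the example domains are hypothetical, and the sketch only compares strings; a real proxy would locate the remote proxy through DNS lookups (the procedure is described in RFC 3263).

```python
def next_hop(request_uri, local_domain):
    """Decide whether the local proxy handles a call or routes it onward.

    Returns ("local", user) when the destination is in this proxy's domain,
    otherwise ("remote", domain) naming the domain whose proxy to contact.
    """
    if request_uri.startswith("sip:"):
        request_uri = request_uri[len("sip:"):]
    user, _, domain = request_uri.partition("@")
    if domain.lower() == local_domain.lower():
        return ("local", user)
    return ("remote", domain)


print(next_hop("sip:bob@example.com", "example.org"))    # different network
print(next_hop("sip:carol@example.org", "example.org"))  # same network
```

Domain names are compared case-insensitively here because DNS names are not case-sensitive.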

Moreover, SIP 200 OK, 180 Ringing, and 183 Session Progress messages pass through the same set of proxies, because they belong to the same call sequence. SIP CANCEL or BYE requests sent by a terminating user agent might or might not pass through the same set of proxies.

Furthermore, when calls are made between a SIP VoIP network and a traditional telephony network, the process typically involves the origination phone, one or more proxy servers, a gateway, and a PBX or PSTN device. Figure 1.30 is a simplified illustration of a call between a Cisco SIP IP phone and a traditional phone in a traditional PSTN.

Figure 1.30 Calls Between a SIP VoIP Network and a Traditional Telephony Network

In this illustration, the following occurs:

1. Cisco SIP IP phone A initiates a call by sending an INVITE to the SIP proxy server. (There can be more than one proxy server for redundancy.)

2. The SIP proxy server might interact with application services such as RADIUS to obtain additional information.

3. The SIP pr
