INFO 203: IT For Engineers
Week #6: Application and Transport Layers
Dr. Jennifer Booker

Application Layer
The Application Layer is the reason the rest of the network exists: to serve applications
Most of the software familiar to end users consists of applications
  Email, FTP, newsgroups, chat, the Web, streaming video, video conferencing, IPTV, etc.
We focus first on key concepts related to the Application Layer, then briefly discuss some specific applications

Application Layer
New applications designed for network implementation need to decide whether the application is based on
  Client-server architecture
  Peer to peer (P2P)
  Or some hybrid combination of the two

Client-server Architecture
In client-server architecture, the server
  Handles requests from many clients, and
  Is generally always available
  Often has a fixed IP address
Clients generally don’t communicate with each other, and may be on or off independently of each other and the server
Client-server applications include email, FTP, the Web, remote login

P2P Architecture
P2P architecture assumes the clients are on or off at will, and all are treated equally as potential servers and/or clients
  Apps include BitTorrent, Skype, and IPTV
Client-server and P2P combinations exist, called a Hybrid Architecture

Process Communication
Any network application (no matter which architecture) needs to communicate between hosts using processes
  In this sense, a process is a program running on a client, server, or peer host
Processes may communicate with other processes on the same host; this is controlled by the host’s operating system (OS)
We are interested in processes that communicate between hosts

Process Communication
Processes exchange messages
  The sending or client process creates a message and sends it into the network
  The receiving or server process gets the message from the network and might reply
Notice that “client process” and “server process” refer only to their relative roles in exchanging a message, not to the client-server or other architectures mentioned earlier

Addressing Processes
For the server process to get the message, it has to be addressed correctly
The host address and receiving process are the key parts of the address
  The host address is its IP address (the 32-bit IPv4 or 128-bit IPv6 address of the host’s network interface)
  The receiving process is identified by its port number, since many processes can be running at once

Addressing Processes
[Figure: a client process and a server process, each attached through a socket to TCP or UDP and the lower layers, communicating across the Internet; the server process is reached by its IP address and port]
Sockets send packets
Ports listen for them
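A minimal sketch of this addressing idea, assuming Python's standard socket module and the loopback address 127.0.0.1. The function names run_echo_server and echo_once are made up for illustration; the point is only that the client reaches the server process through an (IP address, port) pair.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives."""
    conn, _client_addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_once(message: bytes) -> bytes:
    """Send one short message to a local echo server and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # The server process binds its socket to (IP address, port).
        # Port 0 asks the OS for any free port; a real service would use a fixed one.
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        host, port = server_sock.getsockname()

        worker = threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True)
        worker.start()

        # The client process addresses the server by that same (IP, port) pair.
        with socket.create_connection((host, port), timeout=5) as client_sock:
            client_sock.sendall(message)
            reply = client_sock.recv(1024)

        worker.join(timeout=5)
        return reply


if __name__ == "__main__":
    print(echo_once(b"hello"))
```

Binding to port 0 is a convenience for the demo only; a well-known service would listen on its assigned port.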

Port Number
Port numbers follow default values, set by the IANA, unless specified otherwise
  21 = FTP
  23 = Telnet
  25 = SMTP
  53 = DNS
  80 = HTTP, so http://mine.com implies http://mine.com:80
  110 = POP3
  194 = IRC, and hundreds more
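A small sketch of how a default port gets filled in when a URL does not name one. The table copies only the defaults listed on this slide, and effective_port is a hypothetical helper name, not a standard library function.

```python
from urllib.parse import urlsplit

# Defaults from the slide (a tiny subset of the IANA registry).
DEFAULT_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "dns": 53,
    "http": 80,
    "pop3": 110,
    "irc": 194,
}


def effective_port(url: str) -> int:
    """Return the port a URL refers to, using the default when none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port overrides the default
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    print(effective_port("http://mine.com"))       # default for http
    print(effective_port("http://mine.com:8080"))  # explicit port
```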

More Protocols
Application-layer protocols define how a particular application’s processes exchange messages
  What types of messages are allowed
  The syntax of those messages
  The meaning of the fields in the syntax
  Rules for processing messages: when and how to send messages, how to reply, etc.

Application vs its protocols
A single application often needs to use several application-layer protocols
  A web browser might use HTTP, but also FTP, telnet, gopher, etc.
  An email application might use POP3, SMTP, IMAP, etc.
Many app protocols are defined in RFCs
But many application-layer protocols are proprietary

RFC Summary
The “Internet Official Protocol Standards” RFC used to identify the current standards (STD) for every protocol
As a result of RFC 7100, that information is now on a website
  http://www.rfc-editor.org/search/standards.php
For example, STD 9 is the standard for FTP

Application Services
The transport layer connects the application layer to everything else
You have a choice of two protocols, TCP and UDP, unless you want to write your own!
Key services include
  Reliable data transfer: how important is it? Or is your app loss-tolerant?

Application Services
How much bandwidth or throughput does your app need?
  Does sending rate have to equal receiving rate?
  Some apps are elastic: they can tolerate wide ranges of available bandwidth
How sensitive is your app to timing?
  Games and telephony tend to be sensitive to slow or erratic transmission delays
How important is security?

TCP Services
TCP provides a connection-oriented service, where the sockets of the client and server recognize a connection for the duration of the session
  Connection is duplex: messages can go both ways at once
TCP is highly reliable: the bits leaving one side all get to the other side, and get put back in the original order

TCP Services
TCP also provides congestion control, for the benefit of the Internet
  This throttles the sending processes when the connection is congested, and can limit bandwidth
TCP does not guarantee any level of transmission rate, or provide delay guarantees
So you’ll get your data across, but we don’t know when

UDP Services
UDP is a lightweight protocol, meaning it doesn’t do much!
  UDP is connectionless
  UDP is unreliable: data may never get there
  UDP packets may arrive out of order, and UDP will not notice
  There are no transmission rate guarantees

Services NOT Provided
TCP and UDP do not provide guarantees of throughput or timing
TCP does nothing for security per se, but SSL can be added on
  See Chapter 7 in INFO 331

Application Protocols
We’ll examine protocols for Internet-based applications
  HTTP
  FTP
  SMTP
  POP3
  IMAP
  DNS

HTTP
The HyperText Transfer Protocol (HTTP) is the heart of the Web
  Defined by RFCs 1945 (v1.0) and 2616 (v1.1)
Has client and server programs which communicate via HTTP messages
Web pages contain objects: files of various sorts, such as a base HTML file, which references JPG and/or GIF images, etc.
The app that uses HTTP is a browser

HTTP
A Web server houses the objects
  Apache and Microsoft Internet Information Services (IIS) are common Web server apps
HTTP defines the messages that pass between client and server
  Uses TCP as its transport protocol
HTTP has no memory of previous actions (a stateless protocol), so if you ask for a file 126 times, it will send the file 126 times

HTTP vs HTML
Don’t confuse HTTP with HTML
  HTTP is the protocol that defines how files are requested and transferred between server and clients
  HTML is the format of web pages
So an HTML file might be the content of an entity body transferred using HTTP

HTTP Messages
HTTP messages are of two types, request messages (from client) and response messages (from server)
The start-line and headers of all HTTP messages are plain ASCII text; the message-body may carry binary data
  ‘Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.’ [RFC 2616, para 4.1]
  CRLF is a “carriage return and line feed”
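The message layout quoted above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a full HTTP parser: build_request and parse_message are hypothetical helper names, header values are assumed to be plain ASCII, and repeated or folded headers are ignored.

```python
CRLF = "\r\n"


def build_request(method: str, uri: str, headers: dict, body: str = "") -> str:
    """Assemble start-line, header fields, an empty line, and an optional body."""
    start_line = f"{method} {uri} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # The empty string in the list produces the blank line that ends the headers.
    return CRLF.join([start_line, *header_lines, "", body])


def parse_message(raw: str):
    """Split a message back into (start_line, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


if __name__ == "__main__":
    raw = build_request("GET", "/index.html", {"Host": "mine.com"})
    print(repr(raw))
    print(parse_message(raw))
```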

HTTP Messages
There are many headers which could appear in requests or responses
  Cache-Control, Connection, Date, Pragma, Trailer, Transfer-Encoding, Upgrade, Via, and/or Warning [RFC 2616, para 4.5]
Disclaimer: RFC 2616 is 176 pages long, so we’re just providing a summary!

HTTP Requests
Request messages have a variable number of lines, depending on the method called
General request syntax is
  Method Request-URI HTTP-Version
Methods are OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, or CONNECT [RFC 2616, para 5.1.1]
  Most commonly used is GET
Request-URI is the desired Uniform Resource Identifier (URI, commonly called a URL)

HTTP Requests
HTTP-Version is what it sounds like, e.g. HTTP/1.1
There are many possible request headers
  Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE (extension transfer-codings), and/or User-Agent [RFC 2616, para 5.3]

HTTP Responses
HTTP responses go from server to client
General syntax starts with
  HTTP-Version Status-Code Reason-Phrase [RFC 2616, para 6.1]
The Status-Code could be dozens of values
  "200" OK
  "403" Forbidden
  "404" Not Found
The Reason-Phrase is a short text description of the Status-Code

HTTP Responses
Response headers can include
  Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, and/or WWW-Authenticate [RFC 2616, para 6.2]
Responses usually include entities, unless the HEAD method was used

HTTP Entities
An entity is the object sent or returned with an HTTP message
  Entities can accompany requests or responses
Entity headers include Allow, Content-Encoding, Content-Language, Content-Length (bytes), Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, and/or extension-header [RFC 2616, para 7.1]
  Where extension-header is any allowable message-header for that kind of message

HTTP
So HTTP describes request and response message formats
Both types typically have a first line which tells its purpose (the request or status line)
  There can be many header lines
  There might be an entity attached

FTP
The File Transfer Protocol is one of the oldest Internet applications (now RFC 959, but started as RFC 114 in 1971)
HTTP and FTP both send files, but
  FTP uses two connections, one for control and one for data (control information is out-of-band)
    User login and commands are on the control connection, files move on the data connection
  HTTP uses one connection for both purposes (control information is in-band)

FTP
FTP uses TCP, and usually connects to the server on port 21 for control and port 20 for data
The client sends user ID and password
  Some sites accept a generic ID, known as anonymous FTP
Once logged in, the user may navigate and view directories, and upload (STOR or PUT) or download (RETR or GET) files

Electronic Mail
E-mail is another ancient Internet application, with origins in RFC 772 in 1980
It provides asynchronous text communication and allows files to be attached to messages
  Even voice and video messages
Main elements are users (sender and recipient), mail servers, and the Simple Mail Transfer Protocol (SMTP, RFC 5321)
  Careful, there’s also an SNTP for network time

Electronic Mail
Email is composed in a client, which sends it to a mail queue in the sender’s mail server
The sending mail server uses SMTP to send the message to the recipient’s mail server
  If mail can’t be sent successfully, the sender’s mail server will keep the message in the queue, and keep trying (typically for 3 days)
The recipient is notified that the message is present, and reads it with their client

Electronic Mail
Each user has a mailbox on the mail server
  Access to the mailbox is controlled with user name and password
SMTP is the main protocol to get email from one mail server to another
  It uses TCP, not surprisingly
  Defined in draft standard RFC 5321
  Only uses 7-bit ASCII for message AND body
    Forces binary files to be converted to ASCII & back

Mail Message Formats
Email contains header information defined by RFC 822, now RFC 5322 “Internet Message Format”
The sender headers can include: FROM, SENDER, REPLY-TO, RESENT-FROM, RESENT-SENDER, and RESENT-REPLY-TO
Receiver headers can be: TO, CC, and BCC
Reference headers can be: MESSAGE-ID, IN-REPLY-TO, REFERENCES and KEYWORDS

MIME
Multipurpose Internet Mail Extensions (MIME) are used for handling non-ASCII contents in email, e.g. non-Latin character sets, binary files, images, audio, video, etc.
MIME (RFC 2045) adds the ability to handle
  (1) textual message bodies in character sets other than US-ASCII,
  (2) an extensible set of different formats for non-textual message bodies,
  (3) multi-part message bodies, and
  (4) textual header information in character sets other than US-ASCII.
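A short sketch of MIME at work, assuming Python's standard email package. The addresses, subject, and filename are placeholders invented for the example. It shows a binary attachment being carried inside a multi-part message as base64 text, so the whole message stays within ASCII.

```python
from email.message import EmailMessage


def build_message_with_attachment(binary_data: bytes) -> EmailMessage:
    """Build a text message and attach arbitrary binary data to it."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"       # placeholder address
    msg["To"] = "recipient@example.com"      # placeholder address
    msg["Subject"] = "Binary attachment"
    msg.set_content("See the attached file.")
    # Adding an attachment turns the message into a multi-part body;
    # the bytes are base64-encoded so they survive 7-bit transport.
    msg.add_attachment(
        binary_data,
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",
    )
    return msg


if __name__ == "__main__":
    message = build_message_with_attachment(bytes(range(256)))
    print(message.get_content_type())
    attachment = next(message.iter_attachments())
    print(attachment["Content-Transfer-Encoding"])
```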

MIME
The received message also includes a Received: header added to the top of the message
This is familiar in email if you look at the full headers

Mail Access Protocols
If you log directly into your email server, SMTP is all you need to handle email
But if you wish to access email from a local host, you need to use a mail access protocol
The biggies at present are
  Post Office Protocol version 3 (POP3) and
  Internet Message Access Protocol (IMAP)

POP3
POP3 is defined in RFC 1939
  It’s a pretty simple protocol compared to many
SMTP sends mail between mail servers, and from the user agent (email app) to their mail server
POP3 transfers mail from your mail server to your user agent
From a user’s view, SMTP handles outgoing email, and POP3 handles incoming email

IMAP
IMAP, defined in RFC 3501, allows folders to be defined on the mail server to organize email there
  Messages are associated with a folder: first the generic INBOX, then moved by the user
  Hence state information about the folder for each message must be saved across sessions
IMAP also provides search capability within the mailbox

DNS
A key need, once the Internet grew beyond a few thousand hosts, was to automate converting human* readable addresses or hostnames (www.microsoft.com) to IP addresses (207.46.198.60)
That is the purpose of the Domain Name System (DNS)
  Before DNS, really big lookup tables were used!
* Humans who read English, at least!

Host vs Domain Names
A hostname is the name of a particular host computer, such as banner.drexel.edu
  May really represent multiple computers, but logically they are all the same host
A domain name is the top level domain and the specific domain name, like drexel.edu
  Top level domains are com, edu, gov, mil, org, net, etc. and the country codes uk, de, fr, etc.

IP Addresses
IPv4 addresses have four bytes, each with a value from 0 to 255, separated by periods
  Why called bytes? Each value from 0 to 255 corresponds to a value from 0 to (2^8 - 1), and a byte is eight bits
IP addresses are typically static (fixed) for servers and other semi-permanent Internet connections, and dynamic for temporary connections (e.g. dial-up, wireless)
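To make the "four bytes" point concrete, here is a sketch that packs a dotted address into a single 32-bit number, one byte at a time. The helper name dotted_quad_to_int is made up for this example; Python's standard ipaddress module performs the same conversion and is used here only as a cross-check.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Combine the four bytes of a dotted IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left one byte, then add the next byte
    return value


if __name__ == "__main__":
    example = "207.46.198.60"  # the address used on the DNS slide
    print(dotted_quad_to_int(example))
    print(dotted_quad_to_int(example) == int(ipaddress.IPv4Address(example)))
```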

DNS
DNS runs over UDP, port 53 (something uses UDP!)
DNS is managed by DNS servers, typically running Berkeley Internet Name Domain (BIND) software
DNS is used by other applications (HTTP, SMTP, FTP) to translate host names to IP addresses
  You can also do a reverse DNS lookup (convert 205.188.97.2 to www-vd03.evip.aol.com)

DNS
DNS also provides other key services
  Host aliasing allows the true or canonical hostname to have aliases
    When blah.com works to get to www.blah.com, it’s because blah.com is a host alias of www.blah.com
  Mail server aliasing: same concept, but for mail server names
  Load distribution across many servers for the same hostname, so everyone in the world doesn’t use one IP address for microsoft.com

DNS Lookup
DNS lookups would be terribly tedious without caching
  Common queries are stored on each level of DNS server, so they don’t have to be looked up constantly
  Cached values are cleared typically every two days or less, in case the data changes
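A sketch of the caching idea, not of how any real DNS server is implemented. CachingResolver is a hypothetical class, the lookup function passed in stands for whatever actually performs the query, and the two-day lifetime simply mirrors the figure on the slide.

```python
import time
from typing import Callable, Dict, Tuple

TWO_DAYS = 2 * 24 * 60 * 60  # seconds; the typical upper bound quoted on the slide


class CachingResolver:
    """Remember answers for a limited time so repeated lookups are skipped."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        ttl_seconds: float = TWO_DAYS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # hostname -> (address, expiry time)

    def resolve(self, hostname: str) -> str:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: answer from the cache
        address = self._lookup(hostname)  # expired or unknown: ask again
        self._cache[hostname] = (address, now + self._ttl)
        return address
```

Passing the clock in as a parameter is only a convenience so the expiry behaviour can be exercised without waiting two days.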

nslookup
The command nslookup provides basic IP data for a hostname or domain
  nslookup snip.net
    Server: ns2.snip.net
    Address: 209.204.64.3
    Name: snip.net
    Address: 216.83.103.123
A registrar makes changes to the DNS database
  The list of registrars is at http://www.internic.net/

Transport Layer
The Transport Layer handles logical communication between processes
  It’s the lowest layer that is not involved in routing between hosts, so it’s the last thing a client process and the first thing a server process sees of a packet
By logical communication, we recognize that the means used to get between processes, and the distance covered, are irrelevant

Transport vs Network
Notice we didn’t say ‘hosts’ in the previous slide…that’s because
  The network layer provides logical communication between hosts

Two Choices
Here we choose between TCP and UDP
  In the transport layer, a packet is a segment
  In the network layer, a packet is a datagram
The network layer is home to the Internet Protocol (IP)
  IP provides logical communication between hosts
  IP makes a “best effort” to get segments where they belong: no guarantees of delivery, or delivery sequence, or delivery integrity

IP
Each host has an IP address
The common purpose of UDP and TCP is to extend delivery of IP data to the host’s processes
  This is called transport-layer multiplexing and demultiplexing
Both UDP and TCP also provide error checking
  That’s it for UDP: data delivery and error checking!

TCP
TCP also provides reliable data transfer (not just data delivery)
  Uses flow control, sequence numbers, acknowledgements, and timers to ensure data is delivered correctly and in order
TCP also provides congestion control
  TCP applications share the available bandwidth (they watched Sesame Street!)
  UDP takes whatever it can get (greedy little protocol)

Segment Header
Hence the segment header starts with the source and destination port numbers
  Each port number is a 16-bit (2 byte) value (0 to 65,535)
  Well known port numbers are from 0 to 1023 (2^10 - 1)
After the port numbers are other headers, specific to TCP or UDP, then the message

UDP
The most minimal transport layer has to do multiplexing and demultiplexing
  UDP does this and a little error checking and, well, um, that’s about it!
UDP was defined in RFC 768
An app that uses UDP almost talks directly to IP
  Adds only two small data fields (length and checksum) to the header, after the requisite source/destination port numbers
  There’s no handshaking; UDP is connectionless
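The UDP header is small enough to build by hand. The sketch below uses Python's struct module to pack the four 16-bit fields defined in RFC 768 (source port, destination port, length, checksum). The helper names are illustrative, and the checksum is left at zero, which RFC 768 reserves to mean "no checksum computed".

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order: 8 bytes total.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prefix the payload with a UDP header; length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment: bytes):
    """Return (source port, destination port, length, checksum, payload)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(segment[:UDP_HEADER.size])
    payload = segment[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


if __name__ == "__main__":
    segment = build_udp_segment(50000, 53, b"query")
    print(len(segment), parse_udp_segment(segment))
```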

UDP for DNS
DNS uses UDP
A DNS query is packaged into a segment, and is passed to the network layer
  The DNS app waits for a response; if it doesn’t get one soon enough (times out), it tries another server or reports no reply
Hence the app must allow for the unreliability of UDP, by planning what to do if no response comes back
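The "wait, time out, try another server" pattern can be sketched with plain UDP sockets. This is not a DNS client: the payload is an arbitrary byte string, query_with_fallback and demo are hypothetical names, and the two local sockets in demo stand in for one unresponsive server and one that answers.

```python
import socket
import threading


def query_with_fallback(servers, query: bytes, timeout: float = 0.2):
    """Try each (host, port) in turn; return the first reply, or None if none answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for server in servers:
            sock.sendto(query, server)
            try:
                reply, _addr = sock.recvfrom(512)
                return reply
            except OSError:
                # Timed out (or the send was refused): move on to the next server.
                continue
    return None


def demo():
    """First server never answers, second one does."""
    silent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    answering = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    silent.bind(("127.0.0.1", 0))
    answering.bind(("127.0.0.1", 0))

    def answer_once():
        data, addr = answering.recvfrom(512)
        answering.sendto(b"reply:" + data, addr)

    worker = threading.Thread(target=answer_once, daemon=True)
    worker.start()
    try:
        return query_with_fallback([silent.getsockname(), answering.getsockname()], b"query")
    finally:
        worker.join(timeout=1)
        silent.close()
        answering.close()


if __name__ == "__main__":
    print(demo())
```

The application, not UDP, decides how long to wait and what to do next; that is the planning the slide refers to.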

UDP Advantages
Still UDP is good when:
  You want the app to have detailed control over what is sent across the network; UDP changes it little
  No connection establishment delay
  No connection state data in the end hosts; hence a server can support more UDP clients than TCP
  Small packet header overhead per segment
    TCP uses 20 bytes of header data, UDP only 8 bytes

UDP Apps
Other than DNS, UDP is also used for
  Network management (SNMP)
  Routing (RIP)
  Multimedia & telephony (proprietary protocols)
  Remote file server (NFS)
The lack of congestion control in UDP can be a problem when lots of large UDP messages are being sent: they can crowd out TCP apps

Checksum
Noise in the transmission lines can lose bits of data or rearrange them in transit
Checksums are a common method to detect errors (RFC 1071)
To create a checksum:
  Find the sum of the 16-bit words of the message, using 1s (ones) complement addition
  The checksum is the 1s (ones) complement of the sum
  If the message is uncorrupted, the sum of the message plus checksum is all ones 1111111111111…
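The recipe above can be sketched as follows, following the 16-bit ones' complement arithmetic described in RFC 1071. The function names are illustrative, and the sample bytes are the numerical example given in that RFC.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as 16-bit words, wrapping any carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """The checksum is the ones' complement (bitwise inverse) of the sum."""
    return ~ones_complement_sum(data) & 0xFFFF


if __name__ == "__main__":
    message = bytes.fromhex("0001f203f4f5f6f7")  # sample words from RFC 1071
    checksum = internet_checksum(message)
    print(hex(checksum))
    # Receiver's check: message plus checksum should sum to all ones.
    print(hex(ones_complement_sum(message + checksum.to_bytes(2, "big"))))
```

If any bit is flipped in transit, the receiver's sum is very likely to differ from all ones, which is how the error is detected.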

1s Complement?
The 1s complement is the bit-by-bit inverse of a binary number: change all the zeros to ones, and ones to zeros
  So the 1s complement of 00101110101 is 11010001010
UDP does error checking because not all lower layer protocols do error checking
  This provides end-to-end error checking, since it’s more efficient than checking at every step along the way

Reliable Data Transfer Mechanisms
Checksum, to detect bit errors in a packet
Timer, to know when a packet or its ACK was lost
Sequence number, to detect lost or duplicate packets
Acknowledgement, to know packet got to receiver correctly
Negative acknowledgement, to tell packet was corrupted but received
Window, to pipeline many packets at once before an ACK was received for any of them

TCP Intro
Now see how all this applies to TCP
  Defined in RFC 793; its congestion control is specified in RFC 2581
  Invented circa 1974 by Vint Cerf and Robert Kahn
TCP starts with a handshake protocol, which defines many connection variables
  Connection exists only at the hosts, not in between
  Routers are oblivious to whether TCP is used!
TCP is a full duplex service (data can flow both directions at once) and is connection-oriented

TCP Segment Structure
A TCP segment consists of header fields and a data field
The data field size is limited by the maximum segment size (MSS)
Typical header size is 20 bytes
  The header is 32 bits wide (4 bytes), so it has five lines at a minimum

TCP Header Structure
The header lines are
  Source and destination port numbers (16 bit ea.)
  Sequence number (32 bit)
  ACK number (32 bit)
  A bunch of little stuff (header length, URG, ACK, PSH, RST, SYN, and FIN bits), then the receive window (16 bit)
  Internet checksum, urgent data pointer (16 bit ea.)
  And possibly several options
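The five header lines can be unpacked with Python's struct module. This is a sketch of the fixed 20-byte part only, assuming no options and the six flag bits named on the slide; parse_tcp_header is a hypothetical helper name.

```python
import struct

# ports (2 x 16 bit), sequence and ACK numbers (2 x 32 bit),
# header length + flags (16 bit), receive window, checksum, urgent pointer (16 bit each)
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(header: bytes) -> dict:
    """Decode the fixed part of a TCP header into named fields."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent_ptr) = TCP_HEADER.unpack(header[:TCP_HEADER.size])
    header_words = offset_flags >> 12  # top 4 bits: header length in 32-bit lines
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_bytes": header_words * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent_ptr,
    }


if __name__ == "__main__":
    # A made-up SYN segment header: 5 lines long, SYN flag set.
    raw = TCP_HEADER.pack(50000, 80, 1000, 0, (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
    print(parse_tcp_header(raw))
```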

TCP Segment Structure
We’ve seen the port numbers (16 bits each)
Sequence and ACK numbers (32 bits each) keep track of pieces of a file
The ‘bunch of little stuff’ includes
  Header length (4 bits)
  A flag field includes six one-bit fields: ACK, RST, SYN, FIN, PSH, and URG
    The URG bit marks that the segment contains urgent data
The receive window is used for flow control

TCP Segment Structure
The checksum is used for bit error detection, as with UDP
The urgent data pointer tells where the urgent data is located
The options include negotiating the MSS, scaling the window size, or time stamping

Telnet Example
Telnet (RFC 854) is an old app for remote login via TCP
Telnet interactively echoes whatever was typed to show it got to the other side

Timeout Calculation
We want the timeout interval larger than EstimatedRTT (the estimated round-trip time), but not huge; use
  TimeoutInterval = EstimatedRTT + 4*DevRTT
  EstimatedRTT is a running average RTT
  DevRTT is a running standard deviation for RTT
Timeout interval is constantly being calculated, with frequent measurement of SampleRTT to find current values for:
  EstimatedRTT, DevRTT, & TimeoutInterval
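A sketch of how the running values might be updated with each new SampleRTT. The slide gives only the TimeoutInterval formula; the weights ALPHA = 1/8 and BETA = 1/4 and the update order are assumptions taken from RFC 6298, and update_timeout is a hypothetical function name.

```python
ALPHA = 0.125  # weight of the newest sample in EstimatedRTT (assumed, per RFC 6298)
BETA = 0.25    # weight of the newest deviation in DevRTT (assumed, per RFC 6298)


def update_timeout(estimated_rtt: float, dev_rtt: float, sample_rtt: float):
    """Fold one SampleRTT into the running values and return the new timeout."""
    # Deviation is updated first, against the previous estimate.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    # The estimate is an exponentially weighted running average.
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


if __name__ == "__main__":
    # Times in milliseconds: previous estimate 100, deviation 10, new sample 120.
    print(update_timeout(100.0, 10.0, 120.0))
```

A single slow sample nudges the estimate only slightly but widens the deviation, so the timeout grows with a safety margin rather than chasing every measurement.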

Flow Control
TCP connection hosts maintain a receive buffer, for bytes received correctly and in order
  Apps might not read from the buffer for a while, so it can overflow
Flow control focuses on preventing overflow of the receive buffer
  So it also depends on how fast the receiving app is reading the data!

Flow Control
Hence the sender in TCP maintains a receive window (RcvWindow) variable: how much room is left in the receive buffer
  The amount of room in RcvWindow is returned to the sender in the receive window field of every segment
If the RcvWindow goes to zero, the sender can’t send more data to the receiver ever!
  To prevent this, TCP makes the sender transmit one byte messages when RcvWindow is zero, so the receiver’s acknowledgements can report when buffer space opens up
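A toy model of the receive window, assuming a buffer measured in bytes. ReceiveBuffer and sender_may_send are invented for illustration and ignore sequence numbers, acknowledgements, and everything else a real TCP implementation tracks.

```python
class ReceiveBuffer:
    """Receiver side: tracks how much room is left (the advertised RcvWindow)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffered = 0  # bytes received but not yet read by the app

    @property
    def rcv_window(self) -> int:
        return self.capacity - self.buffered

    def receive(self, nbytes: int) -> int:
        """Store arriving bytes; anything beyond the free space is dropped."""
        accepted = min(nbytes, self.rcv_window)
        self.buffered += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        """The application reads data, which frees buffer space."""
        read = min(nbytes, self.buffered)
        self.buffered -= read
        return read


def sender_may_send(advertised_window: int, wanted: int) -> int:
    """Sender side: never exceed the advertised window, but probe with one byte at zero."""
    if wanted <= 0:
        return 0
    if advertised_window == 0:
        return 1  # one-byte probe keeps the window updates flowing
    return min(wanted, advertised_window)


if __name__ == "__main__":
    buf = ReceiveBuffer(4096)
    buf.receive(sender_may_send(buf.rcv_window, 5000))
    print(buf.rcv_window, sender_may_send(buf.rcv_window, 5000))
    buf.app_read(1000)
    print(buf.rcv_window, sender_may_send(buf.rcv_window, 5000))
```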

UDP Flow Control
There ain’t none (sic!)
UDP adds newly arrived segments to a buffer in front of the receiving socket
  If the buffer gets full, segments are dropped
  Bye-bye data!

Congestion Control
Now address congestion control issues
  Congestion is a traffic jam in the middle of the network somewhere
  Most common cause is too many sources sending data too fast into the network
Key lessons are:
  A congested network forces retransmissions for packets lost due to buffer overflow, which adds to the congestion

Congestion Control
And:
  A congested network can waste its bandwidth by sending duplicate packets which weren’t lost in the first place
  Dropping a packet wastes the transmission capacity of every upstream link that packet saw
If loss and transmission delay are small, CongWin (the congestion window) bytes of data can be sent every RTT, for a send rate of CongWin/RTT

Fairness
Unequal connections are less fair
  Lower RTT gets more bandwidth (CongWin increases faster)
  UDP traffic can force out the more polite TCP traffic
  Multiple TCP connections from a single host (e.g. from downloading many parts of a Web page at once) get more bandwidth

Are We Done Yet?
So we’ve covered transport layer protocols from the terribly simple UDP to a seemingly exhaustive study of TCP
Key features along the way include
  multiplexing/demultiplexing, error detection, acknowledgements, timers, retransmissions, sequence numbers, connection management, flow control, end-to-end congestion control
So much for the “edge” of the Internet; next is the network layer, to start looking at the core
