Book for Internet Technology

Information Technology for B.Sc. IT Semester V

Chapter Particulars

Part 1 Networking
    Basic Networking
    Network Protocols
    TCP/IP (Transmission Control Protocol / Internet Protocol)
    ARP (Address Resolution Protocol)
    RARP (Reverse Address Resolution Protocol)
    RIP (Routing Information Protocol)
    OSPF (Open Shortest Path First) Protocol
    BGP (Border Gateway Protocol)

Part 2 Network Programming
    Introduction to Network Programming
    Socket Programming (using TCP and UDP sockets)

Part 3 Remote Method Invocation (RMI)
    Introduction to Distributed Computing with RMI
    RMI Architecture
    Naming Remote Objects
    Using RMI: Interfaces, Implementations, Stub, Skeleton, Host Server, Client, Running RMI Systems
    Parameters in RMI: Primitive, Object, Remote Object
    RMI Client-side Callbacks
    Distributing and Installing RMI Software

Part 4 CORBA
    Introduction to CORBA
    What is CORBA?
    CORBA Architecture
    Comparison between RMI and CORBA

Part 5 Wireless LAN
    Introduction to Wireless LAN
    How does WLAN work?
    WLAN setups (Ad-hoc, infrastructure LAN)
    Use of WLAN
    Benefits of WLAN
    Restrictions and Problems with WLAN

Chapter 1 Basic Networking



• Data communication is the transfer of data from one device to another via some form of transmission medium.

• A data communications system must transmit data to the correct destination in an accurate and timely manner.

• The five components that make up a data communications system are the message, sender, receiver, medium, and protocol.

• Text, numbers, images, audio, and video are different forms of information.

• Data flow between two devices can occur in one of three ways: simplex, half-duplex, or full-duplex.

• A network is a set of communication devices connected by media links.

• In a point-to-point connection, two and only two devices are connected by a dedicated link. In a multipoint connection, three or more devices share a link.

• Topology refers to the physical or logical arrangement of a network. Devices may be arranged in a mesh, star, bus, or ring topology.

• A network can be categorized as a local area network (LAN), a metropolitan-area network (MAN), or a wide area network (WAN).

• A LAN is a data communication system within a building, plant, or campus, or between nearby buildings.

• A MAN is a data communication system covering an area the size of a town or city.

• A WAN is a data communication system spanning states, countries, or the whole world.

• An internet is a network of networks.

• The Internet is a collection of many separate networks.

• TCP/IP is the protocol suite for the Internet.

• There are local, regional, national, and international Internet service providers (ISPs).

• A protocol is a set of rules that governs data communication; the key elements of a protocol are syntax, semantics, and timing.

• Standards are necessary to ensure that products from different manufacturers can work together as expected.

• The ISO, ITU-T, ANSI, IEEE, and EIA are some of the organizations involved in standards creation.

• Forums are special-interest groups that quickly evaluate and standardize new technologies.

• A Request for Comment (RFC) is an idea or concept that is a precursor to an Internet standard.



Network Models

• The five-layer model provides guidelines for the development of universally compatible networking protocols.

• The physical, data link, and network layers are the network support layers.

• The application layer is the user support layer.

• The transport layer links the network support layers and the user support layer.

• The physical layer coordinates the functions required to transmit a bit stream over a physical medium.

• The data link layer is responsible for delivering data units from one station to the next without errors.

• The network layer is responsible for the source-to-destination delivery of a packet across multiple network links.

• The transport layer is responsible for the process-to-process delivery of the entire message.

• The application layer enables the users to access the network.

Signals

• Data must be transformed into electromagnetic signals prior to transmission across a network.

• Data and signals can be either analog or digital.

• A signal is periodic if it consists of a continuously repeating pattern.

• Each sine wave can be characterized by its amplitude, frequency, and phase.

• Frequency and period are inverses of each other.

• A time-domain graph plots amplitude as a function of time.

• A frequency-domain graph plots each sine wave’s peak amplitude against its frequency.

• By using Fourier analysis, any composite signal can be represented as a combination of simple sine waves.

• The spectrum of a signal consists of the sine waves that make up the signal.

• The bandwidth of a signal is the range of frequencies the signal occupies.

• Bandwidth is determined by finding the difference between the highest and lowest frequency components.

• Bit rate (number of bits per second) and bit interval (duration of 1 bit) are terms used to describe digital signals.

• A digital signal is a composite signal with an infinite bandwidth.

• Bit rate and bandwidth are proportional to each other.

• The Nyquist formula determines the theoretical data rate for a noiseless channel.

• The Shannon capacity determines the theoretical maximum data rate for a noisy channel.

• Attenuation, distortion, and noise can impair a signal.
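The Nyquist and Shannon limits mentioned above can be tried out numerically. The following is a minimal sketch, assuming the usual textbook forms of the two formulas (bit rate = 2 × bandwidth × log2(levels) for a noiseless channel, capacity = bandwidth × log2(1 + SNR) for a noisy one). The function names and the sample values are illustrative only and do not come from the text.

```python
import math


def nyquist_bit_rate(bandwidth_hz, signal_levels):
    """Theoretical bit rate of a noiseless channel: 2 * B * log2(L)."""
    return 2 * bandwidth_hz * math.log2(signal_levels)


def shannon_capacity(bandwidth_hz, snr):
    """Theoretical capacity of a noisy channel: B * log2(1 + SNR).

    snr is the signal-to-noise power ratio as a plain number, not in dB.
    """
    return bandwidth_hz * math.log2(1 + snr)


if __name__ == "__main__":
    # Hypothetical channel: 3000 Hz wide, two signal levels.
    print(nyquist_bit_rate(3000, 2))
    # The same bandwidth with an assumed signal-to-noise ratio of 3162.
    print(round(shannon_capacity(3000, 3162)))
```

Doubling the number of signal levels adds only one bit per signal unit, which is why the Nyquist rate grows with the logarithm of the level count rather than linearly.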



• Attenuation is the loss of a signal’s energy due to the resistance of the medium.

• The decibel measures the relative strength of two signals or a signal at two different points.

• Distortion is the alteration of a signal due to the differing propagation speeds of each of the frequencies that make up a signal.

• Noise is the external energy that corrupts a signal.

• We can evaluate transmission media by throughput, propagation speed, and propagation time.

• The wavelength of a frequency is defined as the propagation speed divided by the frequency.

Encoding and Modulation

• Line coding is the process of converting binary data to a digital signal.

• The number of different values allowed in a signal is the signal level. The number of symbols that represent data is the data level.

• Bit rate is a function of the pulse rate and data level.

• Line coding methods must eliminate the dc component and provide a means of synchronization between the sender and the receiver.

• Line coding methods can be classified as unipolar, polar, or bipolar.

• NRZ, RZ, Manchester, and differential Manchester encoding are the most popular polar encoding methods.

• AMI is a popular bipolar encoding method.

• Block coding can improve the performance of line coding through redundancy and error correction.

• Block coding involves grouping the bits, substitution, and line coding.

• 4B/5B, 8B/10B, and 8B/6T are common block coding methods.

• Analog-to-digital conversion relies on PCM (pulse code modulation).

• PCM involves sampling, quantizing, and line coding.

• The Nyquist theorem says that the sampling rate must be at least twice the highest-frequency component in the original signal.

• Digital transmission can be either parallel or serial in mode.

• In parallel transmission, a group of bits is sent simultaneously, with each bit on a separate line.

• In serial transmission, there is only one line and the bits are sent sequentially.

• Serial transmission can be either synchronous or asynchronous.

• In asynchronous serial transmission, each byte (group of 8 bits) is framed with a start bit and a stop bit. There may be a variable-length gap between each byte.

• In synchronous serial transmission, bits are sent in a continuous stream without start and stop bits and without gaps between bytes. Regrouping the bits into meaningful bytes is the responsibility of the receiver.
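The framing used in asynchronous serial transmission can be illustrated with a small sketch. It assumes the common convention of a 0 start bit, a 1 stop bit, and least-significant-bit-first ordering, and it ignores idle gaps between bytes. These conventions and the function names are assumptions for illustration, not something the text specifies.

```python
def frame_async(data):
    """Frame each byte as: start bit (0), 8 data bits (LSB first), stop bit (1)."""
    bits = []
    for byte in data:
        bits.append(0)                                   # start bit
        bits.extend((byte >> i) & 1 for i in range(8))   # data bits, LSB first
        bits.append(1)                                   # stop bit
    return bits


def deframe_async(bits):
    """Recover the bytes, checking the start and stop bit of every 10-bit frame."""
    out = bytearray()
    for i in range(0, len(bits), 10):
        chunk = bits[i:i + 10]
        if len(chunk) != 10 or chunk[0] != 0 or chunk[9] != 1:
            raise ValueError("framing error")
        out.append(sum(bit << k for k, bit in enumerate(chunk[1:9])))
    return bytes(out)


if __name__ == "__main__":
    framed = frame_async(b"Hi")
    print(framed)
    print(deframe_async(framed))
```

Under these assumptions every 8 data bits cost 10 bits on the line, so at most 80 percent of the line rate carries data. Synchronous transmission avoids this per-byte overhead.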



Analog Transmission

• Digital-to-analog modulation can be accomplished using the following:

*Amplitude shift keying (ASK): the amplitude of the carrier signal varies.

*Frequency shift keying (FSK): the frequency of the carrier signal varies.

*Phase shift keying (PSK): the phase of the carrier signal varies.

*Quadrature amplitude modulation (QAM): both the phase and amplitude of the carrier signal vary.

• QAM enables a higher data transmission rate than other digital-to-analog methods.

• Baud rate and bit rate are not synonymous. Bit rate is the number of bits transmitted per second. Baud rate is the number of signal units transmitted per second. One signal unit can represent one or more bits.

• The minimum required bandwidth for ASK and PSK is the baud rate.

• The minimum required bandwidth (BW) for FSK modulation is $BW = f_{c1} - f_{c0} + N_{baud}$, where $f_{c1}$ is the frequency representing a 1 bit, $f_{c0}$ is the frequency representing a 0 bit, and $N_{baud}$ is the baud rate.

• A regular telephone line uses frequencies between 600 and 3000 Hz for data communication.

• ASK modulation is especially susceptible to noise.

• Because it uses two carrier frequencies, FSK modulation requires more bandwidth than ASK and PSK.

• PSK and QAM modulation have two advantages over ASK:

*They are not as susceptible to noise.

*Each signal change can represent more than one bit.

• Trellis coding is a technique that uses redundancy to provide a lower error rate.

• The 56K modems are asymmetric; they download at a rate of 56 Kbps and upload at 33.6 Kbps.

• Analog-to-analog modulation can be implemented by using the following:

*Amplitude modulation (AM)

*Frequency modulation (FM)

*Phase modulation (PM)

• In AM radio, the bandwidth of the modulated signal must be twice the bandwidth of the modulating signal.

• In FM radio, the bandwidth of the modulated signal must be 10 times the bandwidth of the modulating signal.
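The relationship between baud rate and bit rate, and the minimum-bandwidth rules stated in this section, can be sketched as a few one-line helpers. The function names and the sample frequencies are hypothetical; the FSK helper assumes the two carrier frequencies are given in hertz and takes the absolute difference so the argument order does not matter.

```python
def bit_rate(baud_rate, bits_per_signal_unit):
    """Bit rate = signal units per second * bits carried by each signal unit."""
    return baud_rate * bits_per_signal_unit


def min_bandwidth_ask_psk(baud_rate):
    """For ASK and PSK the minimum bandwidth equals the baud rate."""
    return baud_rate


def min_bandwidth_fsk(f_c1, f_c0, baud_rate):
    """For FSK: the distance between the two carriers plus the baud rate."""
    return abs(f_c1 - f_c0) + baud_rate


if __name__ == "__main__":
    # A hypothetical 16-point QAM signal carries 4 bits per signal unit.
    print(bit_rate(1000, 4))
    # Hypothetical FSK carriers at 13 kHz (1 bit) and 10 kHz (0 bit), 2000 baud.
    print(min_bandwidth_fsk(13000, 10000, 2000))
```

The same baud rate therefore yields a higher bit rate when each signal unit encodes more bits, which is the advantage the text attributes to PSK and QAM.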

Multiplexing

• Multiplexing is the simultaneous transmission of multiple signals across a single data link.



• Frequency-division multiplexing (FDM) and wave-division multiplexing (WDM) are techniques for analog signals, while time-division multiplexing (TDM) is for digital signals.

• In FDM, each signal modulates a different carrier frequency. The modulated carriers are combined to form a new signal that is then sent across the link.

• In FDM, multiplexers modulate and combine signals while demultiplexers decompose and demodulate.

• In FDM, guard bands keep the modulated signals from overlapping and interfering with one another.

• Telephone companies use FDM to combine voice channels into successively larger groups for more efficient transmission.

• Wave-division multiplexing is similar in concept to FDM. The signals being multiplexed, however, are light waves.

• In TDM, digital signals from n devices are interleaved with one another, forming a frame of data (bits, bytes, or any other data unit).

• Framing bits allow the TDM multiplexer to synchronize properly.

• Digital signal (DS) is a hierarchy of TDM signals.

• T lines (T-1 to T-4) are the implementation of DS services. A T-1 line consists of 24 voice channels.

• T lines are used in North America. The European standard defines a variation called E lines.

• Inverse multiplexing splits a data stream from one high-speed line onto multiple lower-speed lines.
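The interleaving performed by a TDM multiplexer can be sketched in a few lines. This is a simplified illustration that assumes every source supplies the same number of data units and omits framing bits from the frames themselves. The T-1 constants at the end use the standard figures of 8 bits per channel, 1 framing bit per frame, and 8000 frames per second, which are not stated in the text above.

```python
def tdm_multiplex(sources):
    """Build frames by taking one data unit from each source in turn."""
    return [list(units) for units in zip(*sources)]


def tdm_demultiplex(frames):
    """Undo the interleaving: slot k of every frame belongs to source k."""
    return [list(channel) for channel in zip(*frames)]


# Standard T-1 arithmetic (assumed figures, see the lead-in).
T1_CHANNELS = 24
T1_FRAME_BITS = T1_CHANNELS * 8 + 1      # 24 slots of 8 bits plus 1 framing bit
T1_BIT_RATE = T1_FRAME_BITS * 8000       # 8000 frames per second

if __name__ == "__main__":
    frames = tdm_multiplex(["AAA", "BBB", "CCC"])
    print(frames)
    print(tdm_demultiplex(frames))
    print(T1_BIT_RATE)
```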

Transmission Media

• Transmission media lie below the physical layer.

• A guided medium provides a physical conduit from one device to another.

• Twisted-pair cable, coaxial cable, and optical fiber are the most popular types of guided media.

• Twisted-pair cable consists of two insulated copper wires twisted together. Twisting allows each wire to have approximately the same noise environment.

• Twisted-pair cable is used in telephone lines for voice and data communications.

• Coaxial cable has the following layers (starting from the center): a metallic rod-shaped inner conductor, an insulator covering the rod, a metallic outer conductor (shield), an insulator covering the shield, and a plastic cover.

• Coaxial cable can carry signals of higher frequency ranges than twisted-pair cable.

• Coaxial cable is used in cable TV networks and traditional Ethernet LANs.

• Fiber-optic cables are composed of a glass or plastic inner core surrounded by cladding, all encased in an outside jacket.



• Fiber-optic cables carry data signals in the form of light. The signal is propagated along the inner core by reflection.

• Fiber-optic transmission is becoming increasingly popular due to its noise resistance, low attenuation, and high-bandwidth capabilities.

• Signal propagation in optical fibers can be multimode (multiple beams from a light source) or single-mode (essentially one beam from a light source).

• In multimode step-index propagation, the core density is constant and the light beam changes direction suddenly at the interface between the core and the cladding.

• In multimode graded-index propagation, the core density decreases with distance from the center. This causes a curving of the light beams.

• Fiber-optic cable is used in backbone networks, cable TV networks, and Fast Ethernet networks.

• Unguided media (usually air) transport electromagnetic waves without the use of a physical conductor.

• Wireless data is transmitted through ground propagation, sky propagation, and line-of-sight propagation.

• Wireless data can be classified as radio waves, microwaves, or infrared waves.

• Radio waves are omnidirectional. The radio wave band is under government regulation.

• Microwaves are unidirectional; propagation is line of sight. Microwaves are used for cellular phone, satellite, and wireless LAN communications.

• The parabolic dish antenna and the horn antenna are used for transmission and reception of microwaves.

• Infrared waves are used for short-range communications such as those between a PC and a peripheral device.

Error Detection and Correction Access Method

• Errors can be categorized as a single-bit error or a burst error. A single-bit error has one bit error per data unit. A burst error has two or more bit errors per data unit.

• Redundancy is the concept of sending extra bits for use in error detection.

• Three common redundancy methods are parity check, cyclic redundancy check (CRC), and checksum.

• An extra bit (parity bit) is added to the data unit in the parity check.

• The parity check can detect only an odd number of errors; it cannot detect an even number of errors.

• In the two-dimensional parity check, a redundant data unit follows n data units.

• CRC, a powerful redundancy checking technique, appends a sequence of redundant bits derived from binary division to the data unit.
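The parity check and the CRC described above can be sketched with bit strings. This is a minimal illustration, assuming even parity and a CRC computed by modulo-2 (XOR) long division with the remainder appended to the data. The function names, the sample data `100100`, and the divisor `1101` are chosen for the example and are not taken from the text.

```python
def even_parity_bit(bits):
    """Parity bit that makes the total number of 1s even."""
    return str(bits.count("1") % 2)


def mod2_remainder(dividend, divisor):
    """Remainder of modulo-2 binary division (subtraction is XOR)."""
    work = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if work[i] == "1":                       # divisor "goes into" this position
            for j, d in enumerate(divisor):
                work[i + j] = "0" if work[i + j] == d else "1"   # XOR
    return "".join(work[-(len(divisor) - 1):])


def crc_generate(data, divisor):
    """Append len(divisor) - 1 zeros, divide, and attach the remainder."""
    padded = data + "0" * (len(divisor) - 1)
    return data + mod2_remainder(padded, divisor)


def crc_check(codeword, divisor):
    """A received codeword is accepted when the remainder is all zeros."""
    return set(mod2_remainder(codeword, divisor)) <= {"0"}


if __name__ == "__main__":
    codeword = crc_generate("100100", "1101")
    print(codeword)
    print(crc_check(codeword, "1101"))
    print(crc_check("100100011", "1101"))   # one bit flipped in transit
```

The receiver repeats the same division on the whole codeword. Any single-bit error leaves a nonzero remainder here, whereas a single parity bit would miss an error pattern that flips an even number of bits.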



• The divisor in the CRC generator is often represented as an algebraic polynomial.

• Errors are corrected through retransmission and by forward error correction.

• The Hamming code is an error correction method using redundant bits. The number of bits is a function of the length of the data bits.

• In the Hamming code, for a data unit of m bits, use the formula $2^r \geq m + r + 1$ to determine r, the number of redundant bits needed.

• By rearranging the order of bit transmission of the data units, the Hamming code can correct burst errors.

Data Link Controls and Protocols

• Flow control is the regulation of the sender’s data rate so that the receiver buffer does not become overwhelmed.

• Error control is both error detection and error correction.

• In Stop-and-Wait ARQ, the sender sends a frame and waits for an acknowledgment from the receiver before sending the next frame.

• In Go-Back-N ARQ, multiple frames can be in transit at the same time. If there is an error, retransmission begins with the last unacknowledged frame even if subsequent frames have arrived correctly. Duplicate frames are discarded.

• In Selective Repeat ARQ, multiple frames can be in transit at the same time. If there is an error, only the unacknowledged frame is retransmitted.

• Flow control mechanisms with sliding windows have control variables at both sender and receiver sites.

• Piggybacking couples an acknowledgment with a data frame.

• The bandwidth-delay product is a measure of the number of bits a system can have in transit.

• HDLC is a protocol that implements ARQ mechanisms. It supports communication over point-to-point or multipoint links.

• HDLC stations communicate in normal response mode (NRM) or asynchronous balanced mode (ABM).

• The HDLC protocol defines three types of frames: the information frame (I-frame), the supervisory frame (S-frame), and the unnumbered frame (U-frame).

• HDLC handles data transparency by adding a 0 whenever there are five consecutive 1s following a 0. This is called bit stuffing.
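Bit stuffing as described in the last point can be sketched on a string of bits. This simplified version inserts a 0 after every run of five consecutive 1s, regardless of what preceded the run, and the receiver removes the bit that follows any five 1s. The function names are illustrative, and flag handling is deliberately left out.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")      # stuffed bit
            run = 0
    return "".join(out)


def bit_unstuff(bits):
    """Remove the bit that follows every run of five consecutive 1s."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:                 # this is the stuffed 0: drop it
            skip = False
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            skip = True
    return "".join(out)


if __name__ == "__main__":
    data = "0111111011111011111"
    stuffed = bit_stuff(data)
    print(stuffed)
    print(bit_unstuff(stuffed) == data)
```

After stuffing, the payload can never contain six 1s in a row, so the receiver can tell data apart from the frame delimiter.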

Chapter 2 Network Protocols



Point-to-Point Access: PPP

• The Point-to-Point Protocol (PPP) was designed to provide a dedicated line for users who need Internet access via a telephone line or a cable TV connection.

• A PPP connection goes through these phases: idle, establishing, authenticating (optional), networking, and terminating.

• At the data link layer, PPP employs a version of HDLC.

• The Link Control Protocol (LCP) is responsible for establishing, maintaining, configuring, and terminating links.

• Password Authentication Protocol (PAP) and Challenge Handshake Authentication Protocol (CHAP) are two protocols used for authentication in PPP.

• PAP is a two-step process. The user sends authentication identification and a password. The system determines the validity of the information sent.

• CHAP is a three-step process. The system sends a value to the user. The user manipulates the value and sends its result. The system verifies the result.

• Network Control Protocol (NCP) is a set of protocols to allow the encapsulation of data coming from network layer protocols; each set is specific for a network layer protocol that requires the services of PPP.

• Internetwork Protocol Control Protocol (IPCP), an NCP protocol, establishes and terminates a network layer connection for IP packets.
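The three CHAP steps listed above (challenge, response, verification) can be sketched as follows. The response here is computed as an MD5 hash over the identifier, the shared secret, and the challenge, which is the calculation defined for CHAP in RFC 1994. The function names and the sample secret are hypothetical, and packet formats are not modelled.

```python
import hashlib
import hmac
import os


def chap_response(identifier, secret, challenge):
    """Step 2 (user side): hash the identifier, shared secret, and challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier, secret, challenge, response):
    """Step 3 (system side): recompute the hash and compare."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    secret = b"shared-secret"              # hypothetical, known to both sides
    challenge = os.urandom(16)             # step 1: the system sends a random value
    response = chap_response(1, secret, challenge)
    print(chap_verify(1, secret, challenge, response))
    print(chap_verify(1, b"wrong-secret", challenge, response))
```

Unlike PAP, the password itself never crosses the link; only the challenge and the hashed result do.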

Multiple Access

• Medium access methods can be categorized as random, controlled, or channelized.

• In the carrier sense multiple-access (CSMA) method, a station must listen to the medium prior to sending data onto the line.

• A persistence strategy defines the procedure to follow when a station senses an occupied medium.

• Carrier sense multiple access with collision detection (CSMA/CD) is CSMA with a postcollision procedure.

• Carrier sense multiple access with collision avoidance (CSMA/CA) is CSMA with procedures that avoid a collision.

• Reservation, polling, and token passing are controlled-access methods.

• In the reservation access method, a station reserves a slot for data by setting its flag in a reservation frame.

• In the polling access method, a primary station controls transmissions to and from secondary stations.

• In the token-passing access method, a station that has control of a frame called a token can send data.



• Channelization is a multiple-access method in which the available bandwidth of a link is shared in time, frequency, or through code, between stations on a network.

• FDMA, TDMA, and CDMA are channelization methods.

• In FDMA, the bandwidth is divided into bands; each band is reserved for the use of a specific station.

• In TDMA, the bandwidth is not divided into bands; instead the bandwidth is timeshared.

• In CDMA, the bandwidth is not divided into bands, yet data from all inputs are transmitted simultaneously.

• CDMA is based on coding theory and uses sequences of numbers called chips. The sequences are generated using Walsh tables.

Host-to-Host Delivery: Internetworking, Addressing, and Routing

• There are two popular approaches to packet switching: the datagram approach and the virtual circuit approach.

• In the datagram approach, each packet is treated independently of all other packets.

• At the network layer, a global addressing system that uniquely identifies every host and router is necessary for delivery of a packet from network to network.

• The Internet address (or IP address) is a 32-bit address (for IPv4) that uniquely and universally defines a host or router on the Internet.

• The portion of the IP address that identifies the network is called the netid.

• The portion of the IP address that identifies the host or router on the network is called the hostid.

• There are five classes of IP addresses. Classes A, B, and C differ in the number of hosts allowed per network. Class D is for multicasting, and class E is reserved.

• The class of a network is easily determined by examination of the first byte.

• Unicast communication is one source sending a packet to one destination.

• Multicast communication is one source sending a packet to multiple destinations.

• Sub-netting divides one large network into several smaller ones.

• Sub-netting adds an intermediate level of hierarchy in IP addressing.

• Default masking is a process that extracts the network address from an IP address.

• Subnet masking is a process that extracts the sub-network address from an IP address.

• Super-netting combines several networks into one large one.

• In classless addressing, there are variable-length blocks that belong to no class. The entire address space is divided into blocks based on organization needs.
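Classful addressing and masking, as summarised above, can be sketched with Python's standard `ipaddress` module. The class boundaries used here (first byte below 128 is class A, below 192 class B, below 224 class C, below 240 class D, otherwise class E) are the standard classful ranges. The helper names and the sample address are illustrative.

```python
import ipaddress

# Default masks for the three classes that carry a netid/hostid split.
DEFAULT_MASKS = {"A": "255.0.0.0", "B": "255.255.0.0", "C": "255.255.255.0"}


def address_class(ip):
    """Determine the class from the first byte of a dotted-decimal address."""
    first = int(ip.split(".")[0])
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"
    return "E"


def apply_mask(ip, mask):
    """Masking is a bitwise AND of the address with the mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


if __name__ == "__main__":
    ip = "141.14.72.24"                               # hypothetical host address
    cls = address_class(ip)
    print(cls)
    print(apply_mask(ip, DEFAULT_MASKS[cls]))         # network address
    print(apply_mask(ip, "255.255.192.0"))            # subnetwork address
```

The default mask recovers the network address, while a longer subnet mask keeps extra bits and so recovers the subnetwork address from the same IP address.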



• The first address and the mask in classless addressing can define the whole block.

• A mask can be expressed in slash notation which is a slash followed by the number of 1s in the mask.

• Every computer attached to the Internet must know its IP address, the IP address of a router, the IP address of a name server, and its subnet mask (if it is part of a subnet).

• DHCP is a dynamic configuration protocol with two databases.

• The DHCP server issues a lease for an IP address to a client for a specific period of time.

• Network address translation (NAT) allows a private network to use a set of private addresses for internal communication and a set of global Internet addresses for external communication.

• NAT uses translation tables to route messages.

• The IP protocol is a connectionless protocol. Every packet is independent and has no relationship to any other packet.

• Every host or router has a routing table to route IP packets.

• In next-hop routing, instead of a complete list of the stops the packet must make, only the address of the next hop is listed in the routing table.

• In network-specific routing, all hosts on a network share one entry in the routing table.

• In host-specific routing, the full IP address of a host is given in the routing table.

• In default routing, a router is assigned to receive all packets with no match in the routing table.

• A static routing table's entries are updated manually by an administrator.

• Classless addressing requires hierarchical and geographic routing to prevent immense routing tables.

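The routing-table ideas in this section (next-hop, host-specific, network-specific, and default routing, with masks in slash notation) can be combined in one small lookup sketch. It assumes the usual rule that the most specific matching entry wins. The table contents and next-hop addresses are invented for illustration.

```python
import ipaddress

# Hypothetical routing table: (destination block in slash notation, next hop).
ROUTES = [
    ("192.168.5.7/32", "10.0.0.3"),   # host-specific entry
    ("192.168.5.0/24", "10.0.0.2"),   # network-specific entry
    ("0.0.0.0/0", "10.0.0.1"),        # default entry: matches everything
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific entry containing destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


if __name__ == "__main__":
    for address in ("192.168.5.7", "192.168.5.99", "8.8.8.8"):
        print(address, "->", next_hop(address))
```

Only the next hop is stored for each entry, not the full path, and the default entry catches every destination that has no better match.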



Network Layer Protocols: ARP, IPv4, and IPv6

• The Address Resolution Protocol (ARP) is a dynamic mapping method that finds a physical address, given an IP address.

• An ARP request is broadcast to all devices on the network.

• An ARP reply is unicast to the host requesting the mapping.

• IP is an unreliable connectionless protocol responsible for source-to-destination delivery.

• Packets in the IP layer are called datagrams.

• A datagram consists of a header (20 to 60 bytes) and data.

• The MTU is the maximum number of bytes that a data link protocol can encapsulate. MTUs vary from protocol to protocol.

• Fragmentation is the division of a datagram into smaller units to accommodate the MTU of a data link protocol.

• The fields in the IP header that relate to fragmentation are the identification number, the fragmentation flags, and the fragmentation offset.

• The Internet Control Message Protocol (ICMP) sends five types of error-reporting messages and four pairs of query messages to support the unreliable and connectionless Internet Protocol (IP).

• ICMP messages are encapsulated in IP datagrams.

• The destination-unreachable error message is sent to the source host when a datagram is undeliverable.

• The source-quench error message is sent in an effort to alleviate congestion.

• The time-exceeded message notifies a source host that (1) the time-to-live field has reached zero or (2) fragments of a message have not arrived in a set amount of time.

• The parameter-problem message notifies a host that there is a problem in the header field of a datagram.

• The redirection message is sent to make the routing table of a host more effective.

• The echo-request and echo-reply messages test the connectivity between two systems.

• The time-stamp-request and time-stamp-reply messages can determine the roundtrip time between two systems or the difference in time between two systems.

• The address-mask request and address-mask reply messages are used to obtain the subnet mask.

• The router-solicitation and router-advertisement messages allow hosts to update their routing tables.

• IPv6, the latest version of the Internet Protocol, has a 128-bit address space, resource allocation, and increased security measures.

• IPv6 uses hexadecimal colon notation with abbreviation methods available.

• Three strategies used to make the transition from version 4 to version 6 are dual stack, tunneling, and header translation.
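The fragmentation points earlier in this section can be made concrete with a short sketch. It assumes a 20-byte header with no options and uses the IPv4 rule that the fragmentation offset is expressed in units of 8 bytes, so every fragment except the last carries a multiple of 8 data bytes. The function name and the dictionary keys are illustrative.

```python
def fragment(payload_len, mtu, header_len=20):
    """Split payload_len data bytes into fragments that fit the given MTU.

    Returns one dict per fragment: the offset field (in 8-byte units),
    the number of data bytes carried, and the more-fragments flag.
    """
    max_data = (mtu - header_len) // 8 * 8    # largest multiple of 8 that fits
    if max_data <= 0:
        raise ValueError("MTU too small for the header")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            "offset": offset // 8,
            "length": size,
            "more_fragments": offset + size < payload_len,
        })
        offset += size
    return fragments


if __name__ == "__main__":
    # A hypothetical datagram with 4000 data bytes crossing a 1500-byte MTU link.
    for frag in fragment(4000, 1500):
        print(frag)
```

All fragments of one datagram share the same identification number, which is how the destination knows which pieces to reassemble together.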

Chapter 3 TCP/IP (Transmission Control Protocol / Internet Protocol)



• UDP and TCP are transport-layer protocols that create a process-to-process communication.

• UDP is an unreliable and connectionless protocol that requires little overhead and offers fast delivery.

• In the client-server paradigm, an application program on the local host, called the client, needs services from an application program on the remote host, called a server.

• Each application program has a unique port number that distinguishes it from other programs running at the same time on the same machine.

• The client program is assigned a random port number called the ephemeral port number.

• The server program is assigned a universal port number called a well-known port number.

• The combination of the IP address and the port number, called the socket address, uniquely defines a process and a host.

• The UDP packet is called a user datagram.

• UDP has no flow control mechanism.

• Transmission Control Protocol (TCP) is a connection-oriented, reliable, stream transport-layer protocol in the Internet model.

• The unit of data transfer between two devices using TCP software is called a segment; it has 20 to 60 bytes of header, followed by data from the application program.

• TCP uses a sliding window mechanism for flow control.

• Error detection is handled in TCP by the checksum, acknowledgment, and time-out.

• Corrupted and lost segments are retransmitted, and duplicate segments are discarded.

• TCP uses four timers (retransmission, persistence, keep-alive, and time-waited) in its operation.

• Connection establishment requires three steps; connection termination normally requires four steps.

• TCP software is implemented as a finite state machine.

• The TCP window size is determined by the receiver.
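The three-step connection establishment and the finite-state-machine view of TCP can be sketched together. The table below covers only the states involved in opening a connection (CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED). Sequence numbers, timers, and termination are left out, and the event names are invented for this illustration.

```python
# (current state, event) -> (next state, segment to send or None)
TRANSITIONS = {
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "recv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "recv_ACK"): ("ESTABLISHED", None),
}


def step(state, event):
    """Look up the next state and the segment emitted for one event."""
    return TRANSITIONS[(state, event)]


def three_way_handshake():
    """Drive a client and a server through connection establishment."""
    client, server = "CLOSED", "CLOSED"
    segments = []
    server, _ = step(server, "passive_open")        # server waits for clients
    client, seg = step(client, "active_open")       # step 1: client sends SYN
    segments.append(seg)
    server, seg = step(server, "recv_" + seg)       # step 2: server sends SYN+ACK
    segments.append(seg)
    client, seg = step(client, "recv_" + seg)       # step 3: client sends ACK
    segments.append(seg)
    server, _ = step(server, "recv_" + seg)
    return client, server, segments


if __name__ == "__main__":
    print(three_way_handshake())
```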

Chapter 4 ARP Address Resolution Protocol


In computer networking, the Address Resolution Protocol (ARP) is the method for finding a host's hardware address when only its network layer address is known. Due to the overwhelming prevalence of IPv4 and Ethernet, ARP is primarily used to translate IP addresses to Ethernet MAC addresses. It is also used for IP over other LAN technologies, such as Token Ring, FDDI, or IEEE 802.11, and for IP over ATM.

ARP is used in four cases of two hosts communicating:

1. When two hosts are on the same network and one desires to send a packet to the other

2. When two hosts are on different networks and must use a gateway/router to reach the other host

3. When a router needs to forward a packet for one host through another router

4. When a router needs to forward a packet from one host to the destination host on the same network
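The choice between these cases comes down to one question: is the destination on the sender's own network? A minimal sketch of that decision, using the standard `ipaddress` module, is shown below. The function name, parameters, and sample addresses are hypothetical. A router makes the same decision, with its next-hop router taking the place of the gateway.

```python
import ipaddress


def arp_target(sender_ip, dest_ip, netmask, gateway_ip):
    """Return the IP address whose hardware address must be resolved.

    If the destination is on the sender's own network, ARP for it directly;
    otherwise ARP for the gateway that will forward the packet.
    """
    network = ipaddress.ip_network(f"{sender_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dest_ip) in network:
        return dest_ip
    return gateway_ip


if __name__ == "__main__":
    # Same network: resolve the destination itself.
    print(arp_target("192.168.1.10", "192.168.1.20", "255.255.255.0", "192.168.1.1"))
    # Different network: resolve the gateway instead.
    print(arp_target("192.168.1.10", "8.8.8.8", "255.255.255.0", "192.168.1.1"))
```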

The first case is used when two hosts are on the same physical network (that is, they can directly communicate without going through a router). The last three cases are the most used over the Internet as two computers on the Internet are typically separated by more than 3 hops. Imagine computer A sends a packet to computer D and there are two routers, B and C, between them. Case 2 covers A sending to B; case 3 covers B sending to C; and case 4 covers C sending to D.

ARP is defined in RFC 826.

Variants of the ARP protocol

1. ARP was not originally designed as an IP-only protocol although today it is primarily used to map IP addresses to MAC addresses.

2. ARP can map the addresses of many different Layer 3 protocols to MAC addresses. ARP has also been adapted to resolve other kinds of Layer 2 addresses; for example, ATMARP is used to resolve ATM NSAP addresses in the Classical IP over ATM protocol.

ARP Mediation

ARP Mediation refers to the process of resolving Layer 2 addresses when different resolution protocols are used on either circuit, e.g. ATM on one end and Ethernet on the other.

Inverse ARP

The Inverse Address Resolution Protocol, also known as Inverse ARP or InARP, is a protocol used for obtaining Layer 3 addresses (e.g. IP addresses) of other stations from Layer 2 addresses (e.g. the DLCI in Frame Relay networks). It is primarily used in Frame Relay and ATM networks, where Layer 2 addresses of virtual circuits are sometimes obtained from Layer 2 signalling, and the corresponding Layer 3 addresses must be available before these virtual circuits can be used.

Comparison between ARP and InARP

ARP translates Layer 3 addresses to Layer 2 addresses; InARP can therefore be viewed as its inverse. In addition, InARP is actually implemented as an extension to ARP. The packet formats are the same; only the operation code and the filled fields differ.

Reverse ARP (RARP), like InARP, also translates Layer 2 addresses to Layer 3 addresses. However, RARP is used to obtain the Layer 3 address of the requesting station itself, while in InARP the requesting station already knows its own Layer 2 and Layer 3 addresses, and it is querying the Layer 3 address of another station. RARP has since been abandoned in favor of BOOTP, which was subsequently replaced by DHCP.

Packet structure

The following is the packet structure used for ARP requests and replies. On Ethernet networks, these packets use an EtherType of 0x0806, and requests are sent to the broadcast MAC address FF:FF:FF:FF:FF:FF. Note that the packet structure shown in the table has SHA, SPA, THA, and TPA as 32-bit words, but this is just for convenience; their actual lengths are determined by the hardware and protocol length fields.

Bit offset   Fields (in transmission order)
0            Hardware type (HTYPE), 16 bits | Protocol type (PTYPE), 16 bits
32           Hardware length (HLEN), 8 bits | Protocol length (PLEN), 8 bits | Operation (OPER), 16 bits
64           Sender hardware address (SHA)
?            Sender protocol address (SPA)
?            Target hardware address (THA)
?            Target protocol address (TPA)

Hardware type (HTYPE): Each data link layer protocol is assigned a number used in this field. For example, Ethernet is 1.
Protocol type (PTYPE): Each protocol is assigned a number used in this field. For example, IPv4 is 0x0800.
Hardware length (HLEN): Length in bytes of a hardware address. Ethernet addresses are 6 bytes long.
Protocol length (PLEN): Length in bytes of a logical address. IPv4 addresses are 4 bytes long.
Operation (OPER): Specifies the operation the sender is performing: 1 for request, and 2 for reply.
Sender hardware address (SHA): Hardware address of the sender.
Sender protocol address (SPA): Protocol address of the sender.



Target hardware address (THA): Hardware address of the intended receiver. This field is zero on request.
Target protocol address (TPA): Protocol address of the intended receiver.

Request

Bit offset   Fields (in transmission order)
0            Hardware type = 1 | Protocol type = 0x0800
32           Hardware length = 6 | Protocol length = 4 | Operation = 1
64           SHA (first 32 bits) = 0x000958D8
96           SHA (last 16 bits) = 0x1122 | SPA (first 16 bits) = 0x0A0A
128          SPA (last 16 bits) = 0x0A7B | THA (first 16 bits) = 0x0000
160          THA (last 32 bits) = 0x00000000
192          TPA = 0x0A0A0A8C

If a host with an IPv4 address of 10.10.10.123 and a MAC address of 00:09:58:D8:11:22 wants to send a packet to another host at 10.10.10.140 but does not know its MAC address, then it must send an ARP request to discover the address. The packet above shows what would be broadcast over the local network. If the host 10.10.10.140 is running and available, then it would receive the ARP request and send the appropriate reply.

Reply

Bit offset   Fields (in transmission order)
0            Hardware type = 1 | Protocol type = 0x0800
32           Hardware length = 6 | Protocol length = 4 | Operation = 2
64           SHA (first 32 bits) = 0x000958D8
96           SHA (last 16 bits) = 0x33AA | SPA (first 16 bits) = 0x0A0A
128          SPA (last 16 bits) = 0x0A8C | THA (first 16 bits) = 0x0009
160          THA (last 32 bits) = 0x58D81122
192          TPA = 0x0A0A0A7B

Given the scenario laid out in the request section, if the host 10.10.10.140 has a MAC address of 00:09:58:D8:33:AA, then it would send the reply packet shown. Note that the sender and target address blocks have been swapped (the sender of the reply is the target of the request; the target of the reply is the sender of the request). Furthermore, the host 10.10.10.140 has filled in its MAC address in the sender hardware address. Any hosts on the same network as these two hosts would also see the request (since it is a broadcast), so they are able to cache information about the source of the request.
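The request and reply above can be reproduced byte for byte with Python's standard `struct` module. This is a minimal sketch that assumes Ethernet hardware addresses (6 bytes) and IPv4 protocol addresses (4 bytes); the helper names `build_arp` and `parse_arp` are illustrative only, and the sketch builds just the 28-byte ARP payload, not the surrounding Ethernet frame.

```python
import socket
import struct

# "!" selects network (big-endian) byte order with no padding:
# HTYPE, PTYPE (16 bits each), HLEN, PLEN (8 bits each), OPER (16 bits),
# then SHA (6 bytes), SPA (4 bytes), THA (6 bytes), TPA (4 bytes).
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def bytes_to_mac(raw: bytes) -> str:
    return ":".join(f"{b:02X}" for b in raw)


def build_arp(oper: int, sha: str, spa: str, tha: str, tpa: str) -> bytes:
    """Pack an ARP payload for Ethernet (HTYPE 1) and IPv4 (PTYPE 0x0800)."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, oper,
                       mac_to_bytes(sha), socket.inet_aton(spa),
                       mac_to_bytes(tha), socket.inet_aton(tpa))


def parse_arp(packet: bytes) -> dict:
    """Unpack a 28-byte Ethernet/IPv4 ARP payload into named fields."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, packet)
    return {"htype": htype, "ptype": ptype, "hlen": hlen, "plen": plen,
            "oper": oper,
            "sha": bytes_to_mac(sha), "spa": socket.inet_ntoa(spa),
            "tha": bytes_to_mac(tha), "tpa": socket.inet_ntoa(tpa)}


# The request from the example: THA is zero because it is not yet known.
request = build_arp(1, "00:09:58:D8:11:22", "10.10.10.123",
                    "00:00:00:00:00:00", "10.10.10.140")
# The reply: sender and target blocks are swapped, and SHA is filled in.
reply = build_arp(2, "00:09:58:D8:33:AA", "10.10.10.140",
                  "00:09:58:D8:11:22", "10.10.10.123")

print(request.hex())
print(parse_arp(reply))
```

Printing `request.hex()` gives the same values as the request table when read left to right, which is a quick way to check that the field widths and byte order are right.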
The ARP reply (if any) is directed only to the originator of the request, so information in the ARP reply is not available to other hosts on the same network.

ARP Announcements

An ARP announcement (also known as "Gratuitous ARP") is a packet (usually an ARP Request) containing a valid SHA and SPA for the host which sent it, with TPA equal to SPA. Such a request is not intended to solicit a reply, but merely updates the ARP caches of other hosts which receive the packet. This is commonly done by many operating systems on startup, and helps to resolve problems which would otherwise occur if, for example, a network card had recently been changed (changing the IP address to MAC address mapping) and other hosts still had the old mapping in their ARP cache. ARP announcements are also used for 'defending' IP addresses in the RFC 3927 (Zeroconf) protocol.

Abstract

The implementation of protocol P on a sending host S decides, through protocol P's routing mechanism, that it wants to transmit to a target host T located some place on a connected piece of 10Mbit Ethernet cable. To actually transmit the Ethernet packet a 48.bit Ethernet address must be generated. The addresses of hosts within protocol P are not always compatible with the corresponding Ethernet address (being different lengths or values). Presented here is a protocol that allows dynamic distribution of the information needed to build tables to translate an address A in protocol P's address space into a 48.bit Ethernet address.

Generalizations have been made which allow the protocol to be used for non-10Mbit Ethernet hardware. Some packet radio networks are examples of such hardware.

--------------------------------------------------------------------

The protocol proposed here is the result of a great deal of discussion with several other people, most notably J. Noel Chiappa, Yogen Dalal, and James E. Kulp, and helpful comments from David Moon.

[The purpose of this RFC is to present a method of Converting Protocol Addresses (e.g., IP addresses) to Local Network Addresses (e.g., Ethernet addresses). This is an issue of general concern in the ARPA Internet community at this time. The method proposed here is presented for your consideration and comment. This is not the specification of an Internet Standard.]
Notes:

This protocol was originally designed for the DEC/Intel/Xerox 10Mbit Ethernet. It has been generalized to allow it to be used for other types of networks. Much of the discussion will be directed toward the 10Mbit Ethernet. Generalizations, where applicable, will follow the Ethernet-specific discussion.

DOD Internet Protocol will be referred to as Internet.



Numbers here are in the Ethernet standard, which is high byte first. This is the opposite of the byte addressing of machines such as PDP-11s and VAXes. Therefore, special care must be taken with the opcode field (ar$op) described below.

An agreed upon authority is needed to manage hardware name space values (see below). Until an official authority exists, requests should be submitted to

David C. Plummer
Symbolics, Inc.
243 Vassar Street
Cambridge, Massachusetts 02139

Alternatively, network mail can be sent to DCP@MIT-MC.

The Problem:

The world is a jungle in general, and the networking game contributes many animals. At nearly every layer of a network architecture there are several potential protocols that could be used. For example, at a high level, there is TELNET and SUPDUP for remote login. Somewhere below that there is a reliable byte stream protocol, which might be CHAOS protocol, DOD TCP, Xerox BSP or DECnet. Even closer to the hardware is the logical transport layer, which might be CHAOS, DOD Internet, Xerox PUP, or DECnet. The 10Mbit Ethernet allows all of these protocols (and more) to coexist on a single cable by means of a type field in the Ethernet packet header. However, the 10Mbit Ethernet requires 48.bit addresses on the physical cable, yet most protocol addresses are not 48.bits long, nor do they necessarily have any relationship to the 48.bit Ethernet address of the hardware. For example, CHAOS addresses are 16.bits, DOD Internet addresses are 32.bits, and Xerox PUP addresses are 8.bits. A protocol is needed to dynamically distribute the correspondences between a <protocol, address> pair and a 48.bit Ethernet address.

Motivation:

Use of the 10Mbit Ethernet is increasing as more manufacturers supply interfaces that conform to the specification published by DEC, Intel and Xerox. With this increasing availability, more and more software is being written for these interfaces.
There are two alternatives: (1) Every implementor invents his/her own method to do some form of address resolution, or (2) every implementor uses a standard so that his/her code can be distributed to other systems without need for modification. This proposal attempts to set the standard.

Definitions:

Define the following for referring to the values put in the TYPE field of the Ethernet packet header: ether_type$XEROX_PUP, ether_type$DOD_INTERNET, ether_type$CHAOS,



and a new one: ether_type$ADDRESS_RESOLUTION. Also define the following values (to be discussed later): ares_op$REQUEST (= 1, high byte transmitted first) and ares_op$REPLY (= 2), and ares_hrd$Ethernet (= 1).

Packet format:

To communicate mappings from <protocol, address> pairs to 48.bit Ethernet addresses, a packet format that embodies the Address Resolution protocol is needed. The format of the packet follows.

Ethernet transmission layer (not necessarily accessible to the user):
    48.bit: Ethernet address of destination
    48.bit: Ethernet address of sender
    16.bit: Protocol type = ether_type$ADDRESS_RESOLUTION

Ethernet packet data:
    16.bit: (ar$hrd) Hardware address space (e.g., Ethernet, Packet Radio Net.)
    16.bit: (ar$pro) Protocol address space. For Ethernet hardware, this is from the set of type fields ether_typ$<protocol>.
     8.bit: (ar$hln) byte length of each hardware address
     8.bit: (ar$pln) byte length of each protocol address
    16.bit: (ar$op) opcode (ares_op$REQUEST | ares_op$REPLY)
    nbytes: (ar$sha) Hardware address of sender of this packet, n from the ar$hln field.
    mbytes: (ar$spa) Protocol address of sender of this packet, m from the ar$pln field.
    nbytes: (ar$tha) Hardware address of target of this packet (if known).
    mbytes: (ar$tpa) Protocol address of target.

Packet Generation:

As a packet is sent down through the network layers, routing determines the protocol address of the next hop for the packet and on which piece of hardware it expects to find the station with the immediate target protocol address. In the case of the 10Mbit Ethernet, address resolution is needed and some lower layer (probably the hardware driver) must consult the Address Resolution module (perhaps implemented in the Ethernet support module) to convert the <protocol type, target protocol address> pair to a 48.bit Ethernet address. The Address Resolution module tries to find this pair in a table. If it finds the pair, it gives the corresponding 48.bit Ethernet address back to the caller



(hardware driver) which then transmits the packet. If it does not, it probably informs the caller that it is throwing the packet away (on the assumption the packet will be retransmitted by a higher network layer), and generates an Ethernet packet with a type field of ether_type$ADDRESS_RESOLUTION. The Address Resolution module then sets the ar$hrd field to ares_hrd$Ethernet, ar$pro to the protocol type that is being resolved, ar$hln to 6 (the number of bytes in a 48.bit Ethernet address), ar$pln to the length of an address in that protocol, ar$op to ares_op$REQUEST, ar$sha with the 48.bit ethernet address of itself, ar$spa with the protocol address of itself, and ar$tpa with the protocol address of the machine that is trying to be accessed. It does not set ar$tha to anything in particular, because it is this value that it is trying to determine. It could set ar$tha to the broadcast address for the hardware (all ones in the case of the 10Mbit Ethernet) if that makes it convenient for some aspect of the implementation. It then causes this packet to be broadcast to all stations on the Ethernet cable originally determined by the routing mechanism.

Packet Reception:

When an address resolution packet is received, the receiving Ethernet module gives the packet to the Address Resolution module which goes through an algorithm similar to the following. Negative conditionals indicate an end of processing and a discarding of the packet.

?Do I have the hardware type in ar$hrd?
Yes: (almost definitely)
    [optionally check the hardware length ar$hln]
    ?Do I speak the protocol in ar$pro?
    Yes:
        [optionally check the protocol length ar$pln]
        Merge_flag := false
        If the pair <protocol type, sender protocol address> is already in my translation table, update the sender hardware address field of the entry with the new information in the packet and set Merge_flag to true.
        ?Am I the target protocol address?
        Yes:
            If Merge_flag is false, add the triplet <protocol type, sender protocol address, sender hardware address> to the translation table.
            ?Is the opcode ares_op$REQUEST? (NOW look at the opcode!!)
            Yes:
                Swap hardware and protocol fields, putting the local hardware and protocol addresses in the sender fields.
                Set the ar$op field to ares_op$REPLY
                Send the packet to the (new) target hardware address on the same hardware on which the request was received.

Notice that the <protocol type, sender protocol address, sender hardware address> triplet is merged into the table before the opcode is looked at. This is on the assumption that communication is bidirectional; if A has some reason to talk to B, then B will probably have some reason to talk to A. Notice also that if an entry already exists for the <protocol type, sender protocol address> pair, then the new hardware address supersedes the old one. Related Issues gives some motivation for this.

Generalization:

The ar$hrd and ar$hln fields allow this protocol and packet format to be used for non-10Mbit Ethernets. For the 10Mbit Ethernet <ar$hrd, ar$hln> takes on the value <1, 6>. For other hardware networks, the ar$pro field may no longer correspond to the Ethernet type field, but it should be associated with the protocol whose address resolution is being sought.

Why is it done this way??

Periodic broadcasting is definitely not desired. Imagine 100 workstations on a single Ethernet, each broadcasting address resolution information once per 10 minutes (as one possible set of parameters). This is one packet every 6 seconds. This is almost reasonable, but what use is it? The workstations aren't generally going to be talking to each other (and therefore have 100 useless entries in a table); they will be mainly talking to a mainframe, file server or bridge, but only to a small number of other workstations (for interactive conversations, for example). The protocol described in this paper distributes information as it is needed and only once (probably) per boot of a machine.

This format does not allow for more than one resolution to be done in the same packet. This is for simplicity. If things were multiplexed the packet format would be considerably harder to digest, and much of the information could be gratuitous.
Think of a bridge that talks four protocols telling a workstation all four protocol addresses, three of which the workstation will probably never use. This format allows the packet buffer to be reused if a reply is generated; a reply has the same length as a request, and several of the fields are the same. The value of the hardware field (ar$hrd) is taken from a list for this purpose. Currently the only defined value is for the 10Mbit Ethernet (ares_hrd$Ethernet = 1). There has been talk of using this protocol for Packet Radio Networks as well, and this will require another value as will other future hardware mediums that wish to use this protocol. For the 10Mbit Ethernet, the value in the protocol field (ar$pro) is taken from the set ether_type$. This is a natural reuse of the assigned protocol types. Combining this with



the opcode (ar$op) would effectively halve the number of protocols that can be resolved under this protocol and would make a monitor/debugger more complex (see Network Monitoring and Debugging below). It is hoped that we will never see 32768 protocols, but Murphy made some laws which don't allow us to make this assumption. In theory, the length fields (ar$hln and ar$pln) are redundant, since the length of a protocol address should be determined by the hardware type (found in ar$hrd) and the protocol type (found in ar$pro). It is included for optional consistency checking, and for network monitoring and debugging (see below). The opcode is to determine if this is a request (which may cause a reply) or a reply to a previous request. 16 bits for this is overkill, but a flag (field) is needed. The sender hardware address and sender protocol address are absolutely necessary. It is these fields that get put in a translation table. The target protocol address is necessary in the request form of the packet so that a machine can determine whether or not to enter the sender information in a table or to send a reply. It is not necessarily needed in the reply form if one assumes a reply is only provoked by a request. It is included for completeness, network monitoring, and to simplify the suggested processing algorithm described above (which does not look at the opcode until AFTER putting the sender information in a table). The target hardware address is included for completeness and network monitoring. It has no meaning in the request form, since it is this number that the machine is requesting. Its meaning in the reply form is the address of the machine making the request. In some implementations (which do not get to look at the 14.byte ethernet header, for example) this may save some register shuffling or stack space by sending this field to the hardware driver as the hardware destination address of the packet. There are no padding bytes between addresses. 
The packet data should be viewed as a byte stream in which only 3 byte pairs are defined to be words (ar$hrd, ar$pro and ar$op) which are sent most significant byte first (Ethernet/PDP-10 byte style).

Network monitoring and debugging:

The above Address Resolution protocol allows a machine to gain knowledge about the higher level protocol activity (e.g., CHAOS, Internet, PUP, DECnet) on an Ethernet cable. It can determine which Ethernet protocol type fields are in use (by value) and the protocol addresses within each protocol type. In fact, it is not necessary for the monitor to speak any of the higher level protocols involved. It goes something like this:

When a monitor receives an Address Resolution packet, it always enters the <protocol type, sender protocol address, sender hardware address> in a table. It can determine the length of the hardware and protocol address from the ar$hln and ar$pln fields of the packet. If the opcode is a REPLY the monitor can then throw the packet away. If the



opcode is a REQUEST and the target protocol address matches the protocol address of the monitor, the monitor sends a REPLY as it normally would. The monitor will only get one mapping this way, since the REPLY to the REQUEST will be sent directly to the requesting host. The monitor could try sending its own REQUEST, but this could get two monitors into a REQUEST sending loop, and care must be taken.

Because the protocol and opcode are not combined into one field, the monitor does not need to know which request opcode is associated with which reply opcode for the same higher level protocol. The length fields should also give enough information to enable it to "parse" a protocol address, although it has no knowledge of what the protocol addresses mean.

A working implementation of the Address Resolution protocol can also be used to debug a non-working implementation. Presumably a hardware driver will successfully broadcast a packet with Ethernet type field of ether_type$ADDRESS_RESOLUTION. The format of the packet may not be totally correct, because initial implementations may have bugs, and table management may be slightly tricky. Because requests are broadcast a monitor will receive the packet and can display it for debugging if desired.

An Example:

Let there exist machines X and Y that are on the same 10Mbit Ethernet cable. They have Ethernet addresses EA(X) and EA(Y) and DOD Internet addresses IPA(X) and IPA(Y). Let the Ethernet type of Internet be ET(IP). Machine X has just been started, and sooner or later wants to send an Internet packet to machine Y on the same cable. X knows that it wants to send to IPA(Y) and tells the hardware driver (here an Ethernet driver) IPA(Y). The driver consults the Address Resolution module to convert <ET(IP), IPA(Y)> into a 48.bit Ethernet address, but because X was just started, it does not have this information.
It throws the Internet packet away and instead creates an ADDRESS RESOLUTION packet with

    (ar$hrd) = ares_hrd$Ethernet
    (ar$pro) = ET(IP)
    (ar$hln) = length(EA(X))
    (ar$pln) = length(IPA(X))
    (ar$op)  = ares_op$REQUEST
    (ar$sha) = EA(X)
    (ar$spa) = IPA(X)
    (ar$tha) = don't care
    (ar$tpa) = IPA(Y)

and broadcasts this packet to everybody on the cable.

Machine Y gets this packet, and determines that it understands the hardware type (Ethernet), that it speaks the indicated protocol (Internet) and that the packet is for it ((ar$tpa)=IPA(Y)). It enters (probably replacing any existing entry) the information that <ET(IP), IPA(X)> maps to EA(X). It then notices that it is a request, so it swaps fields,



putting EA(Y) in the new sender Ethernet address field (ar$sha), sets the opcode to reply, and sends the packet directly (not broadcast) to EA(X). At this point Y knows how to send to X, but X still doesn't know how to send to Y.

Machine X gets the reply packet from Y, forms the map from <ET(IP), IPA(Y)> to EA(Y), notices the packet is a reply and throws it away. The next time X's Internet module tries to send a packet to Y on the Ethernet, the translation will succeed, and the packet will (hopefully) arrive. If Y's Internet module then wants to talk to X, this will also succeed since Y has remembered the information from X's request for Address Resolution.

Related issue:

It may be desirable to have table aging and/or timeouts. The implementation of these is outside the scope of this protocol. Here is a more detailed description (thanks to MOON@SCRC@MIT-MC).

If a host moves, any connections initiated by that host will work, assuming its own address resolution table is cleared when it moves. However, connections initiated to it by other hosts will have no particular reason to know to discard their old address. However, 48.bit Ethernet addresses are supposed to be unique and fixed for all time, so they shouldn't change. A host could "move" if a host name (and address in some other protocol) were reassigned to a different physical piece of hardware. Also, as we know from experience, there is always the danger of incorrect routing information accidentally getting transmitted through hardware or software error; it should not be allowed to persist forever. Perhaps failure to initiate a connection should inform the Address Resolution module to delete the information on the basis that the host is not reachable, possibly because it is down or the old translation is no longer valid.
Or perhaps receiving of a packet from a host should reset a timeout in the address resolution entry used for transmitting packets to that host; if no packets are received from a host for a suitable length of time, the address resolution entry is forgotten. This may cause extra overhead to scan the table for each incoming packet. Perhaps a hash or index can make this faster.

The suggested algorithm for receiving address resolution packets tries to lessen the time it takes for recovery if a host does move. Recall that if the <protocol type, sender protocol address> is already in the translation table, then the sender hardware address supersedes the existing entry. Therefore, on a perfect Ethernet where a broadcast REQUEST reaches all stations on the cable, each station will get the new hardware address.

Another alternative is to have a daemon perform the timeouts. After a suitable time, the daemon considers removing an entry. It first sends (with a small number of retransmissions if needed) an address resolution packet with opcode REQUEST directly to the Ethernet address in the table. If a REPLY is not seen in a short amount of time, the entry is deleted. The request is sent directly so as not to bother every station on the



Ethernet. Just forgetting entries will likely cause useful information to be forgotten, which must be regained. Since hosts don't transmit information about anyone other than themselves, rebooting a host will cause its address mapping table to be up to date. Bad information can't persist forever by being passed around from machine to machine; the only bad information that can exist is in a machine that doesn't know that some other machine has changed its 48.bit Ethernet address. Perhaps manually resetting (or clearing) the address mapping table will suffice. This issue clearly needs more thought if it is believed to be important. It is caused by any address resolution-like protocol.
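The packet reception algorithm described earlier in this chapter can be sketched in a few lines. This is only an illustration under simplifying assumptions: addresses are plain strings, the optional length checks are omitted, there is no table aging, and the names `ArpPacket` and `ArpModule` are hypothetical rather than taken from any real implementation.

```python
from dataclasses import dataclass
from typing import Optional

REQUEST, REPLY = 1, 2   # ares_op$REQUEST, ares_op$REPLY
ETHERNET = 1            # ares_hrd$Ethernet


@dataclass
class ArpPacket:
    hrd: int   # hardware address space
    pro: int   # protocol address space
    op: int    # opcode
    sha: str   # sender hardware address
    spa: str   # sender protocol address
    tha: str   # target hardware address
    tpa: str   # target protocol address


class ArpModule:
    def __init__(self, my_hw: str, my_protocols: dict):
        self.my_hw = my_hw
        self.my_protocols = my_protocols   # {protocol type: my protocol address}
        self.table = {}                    # {(protocol type, protocol address): hardware address}

    def receive(self, pkt: ArpPacket) -> Optional[ArpPacket]:
        """Process one packet; return a reply to send, or None."""
        if pkt.hrd != ETHERNET:                 # ?Do I have the hardware type?
            return None
        if pkt.pro not in self.my_protocols:    # ?Do I speak the protocol?
            return None
        merge_flag = False
        key = (pkt.pro, pkt.spa)
        if key in self.table:                   # existing entry is superseded
            self.table[key] = pkt.sha
            merge_flag = True
        if pkt.tpa != self.my_protocols[pkt.pro]:   # ?Am I the target?
            return None
        if not merge_flag:
            self.table[key] = pkt.sha
        if pkt.op != REQUEST:                   # only NOW look at the opcode
            return None
        # Swap sender and target fields and fill in the local addresses.
        return ArpPacket(hrd=pkt.hrd, pro=pkt.pro, op=REPLY,
                         sha=self.my_hw, spa=self.my_protocols[pkt.pro],
                         tha=pkt.sha, tpa=pkt.spa)


ET_IP = 0x0800
x = ArpModule("EA(X)", {ET_IP: "IPA(X)"})
y = ArpModule("EA(Y)", {ET_IP: "IPA(Y)"})

request = ArpPacket(ETHERNET, ET_IP, REQUEST, "EA(X)", "IPA(X)", "", "IPA(Y)")
reply = y.receive(request)    # Y learns X's mapping and builds a reply
x.receive(reply)              # X learns Y's mapping and discards the reply
print(y.table, x.table)
```

The run at the bottom follows the X and Y example from the text: after one request and one reply, each machine holds the other's mapping, even though only X asked.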

Chapter 5 RARP Reverse Address Resolution Protocol


Chapter 5 Reverse Address Resolution Protocol (RARP)

Reverse Address Resolution Protocol (RARP) is a network layer protocol used to resolve an IP address from a given hardware address (such as an Ethernet address). It has been rendered obsolete by BOOTP and the more modern DHCP, which both support a much greater feature set than RARP.

The primary limitations of RARP are that each MAC address must be manually configured on a central server, and that the protocol only conveys an IP address. This leaves configuration of subnetting, gateways, and other information to other protocols or the user. Another limitation of RARP compared to BOOTP or DHCP is that it is a non-IP protocol. This means that, like ARP, it can't be handled by the TCP/IP stack on the client, but is instead implemented separately.

RARP is the complement of ARP. RARP is described in RFC 903.

In computing, BOOTP, short for Bootstrap Protocol, is a UDP network protocol used by a network client to obtain its IP address automatically. This is usually done in the bootstrap process of computers or operating systems running on them. The BOOTP servers assign the IP address from a pool of addresses to each client. The protocol was originally defined in RFC 951.

BOOTP enables 'diskless workstation' computers to obtain an IP address prior to loading any advanced operating system. Historically, it has been used for Unix-like diskless workstations (which also obtained the location of their boot image using this protocol) and also by corporations to roll out a pre-configured client (e.g. Windows) installation to newly purchased PCs. Originally requiring the use of a boot floppy disk to establish the initial network connection, the protocol became embedded in the BIOS of some network cards themselves (such as 3c905c) and in many modern motherboards, thus allowing direct network booting.
Recently those with an interest in diskless stand-alone media center PCs have shown new interest in this method of booting a Windows operating system (see eg. Personal Computer World, Feb 2005, pg 156 'Putting the Boot in'). DHCP (Dynamic Host Configuration Protocol) is a more advanced protocol based on BOOTP, but is far more complex to implement. Most DHCP servers also offer BOOTP support. The Dynamic Host Configuration Protocol (DHCP) is a set of rules used by a communications device such as a computer, router or networking adapter to allow the device to request and obtain an IP address from a server, which has a list of addresses available for assignment.



Introduction

DHCP is a protocol used by networked computers (clients) to obtain IP addresses and other parameters such as the default gateway, subnet mask, and IP addresses of DNS servers from a DHCP server. It facilitates access to a network because these settings would otherwise have to be made manually for the client to participate in the network.

The DHCP server ensures that all IP addresses are unique, i.e., no IP address is assigned to a second client while the first client's assignment is valid (its lease has not expired). Thus IP address pool management is done by the server and not by a human network administrator.

DHCP emerged as a standard protocol in October 1993. As of 2006, RFC 2131 (dated March 1997) provides the latest DHCP definition. DHCP functionally became a successor to the older BOOTP protocol, whose leases were given for infinite time and which did not support options. Due to the backward-compatibility of DHCP, very few networks continue to use pure BOOTP. The specification of DHCPv6 (DHCP in an IPv6 environment) appeared in July 2003 as RFC 3315.

Overview

The Dynamic Host Configuration Protocol (DHCP) automates the assignment of IP addresses, subnet masks, default gateway, and other IP parameters. The assignment occurs when the DHCP-configured machine boots up or regains connectivity to the network. The DHCP client sends out a query requesting a response from a DHCP server on the locally attached network. The query is typically initiated immediately after booting up and before the client initiates any IP based communication with other hosts. The DHCP server then replies to the client with its assigned IP address, subnet mask, DNS server and default gateway information.

The assignment of the IP address generally expires after a predetermined period of time, at which point the DHCP client and server renegotiate a new IP address from the server's predefined pool of addresses.
Typical intervals range from one hour to several months, and can, if desired, be set to infinite (never expire). The length of time the address is available to the device it was assigned to is called a lease, and is determined by the server.

Configuring firewall rules to accommodate access from machines that receive their IP addresses via DHCP is therefore more difficult, because the remote IP address will vary from time to time. Administrators must usually allow access to the entire remote DHCP subnet for a particular TCP/UDP port.

Most home routers and firewalls are configured in the factory to be DHCP servers for a home network. An alternative to a home router is to use a computer as a DHCP server. ISPs generally use DHCP to assign clients individual IP addresses.
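The server-side bookkeeping described above (a pool of addresses, one lease per client, and expiry) can be sketched as follows. This is a toy model only: it ignores the actual DHCP message exchange and all options, the class name `LeasePool` and the addresses are hypothetical, and time is passed in explicitly as a number of seconds so that the behaviour is easy to follow.

```python
import ipaddress


class LeasePool:
    """Toy address pool: one lease per client, released on expiry."""

    def __init__(self, first: str, last: str, lease_seconds: int):
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self.free = [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]
        self.lease_seconds = lease_seconds
        self.leases = {}   # client MAC -> (IP address, expiry time)

    def expire(self, now: int) -> None:
        """Return addresses whose lease has run out to the free list."""
        for mac, (ip, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[mac]
                self.free.append(ip)

    def request(self, mac: str, now: int):
        """Grant or renew a lease; return the address, or None if the pool is empty."""
        self.expire(now)
        if mac in self.leases:
            ip, _ = self.leases[mac]      # renewal keeps the same address
        elif self.free:
            ip = self.free.pop(0)         # never hand out an address twice
        else:
            return None
        self.leases[mac] = (ip, now + self.lease_seconds)
        return ip


pool = LeasePool("192.168.1.100", "192.168.1.101", lease_seconds=3600)
first = pool.request("aa:aa:aa:aa:aa:aa", now=0)
second = pool.request("bb:bb:bb:bb:bb:bb", now=10)
third = pool.request("cc:cc:cc:cc:cc:cc", now=20)       # pool exhausted
renewed = pool.request("aa:aa:aa:aa:aa:aa", now=1800)   # renewal before expiry
late = pool.request("cc:cc:cc:cc:cc:cc", now=3700)      # second lease has expired
print(first, second, third, renewed, late)
```

Because the server is the only party that hands out addresses, uniqueness follows from the free list: an address is either free or held by exactly one unexpired lease.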



DHCP is a broadcast-based protocol. As with other types of broadcast traffic, it does not cross a router unless specifically configured to do so. Users who desire this capability must configure their routers to pass DHCP traffic across UDP ports 67 and 68. Home users, however, will practically never need this functionality.

Chapter 6 RIP Routing Information Protocol


Routing Information Protocol

A set of routing protocols that are used within an autonomous system are referred to as interior gateway protocols (IGP).

In contrast, an exterior gateway protocol determines network reachability between autonomous systems (AS) and relies on IGPs to resolve routes within an AS.

The interior gateway protocols can be divided into two categories: 1) Distance-vector routing protocol and 2) Link-state routing protocol.

Types of Interior gateway protocols

Distance-vector routing protocol

These protocols use the Bellman-Ford algorithm to calculate paths. In distance-vector routing protocols, no router possesses information about the full network topology. Each router advertises its distances to other routers and receives similar advertisements from them. Using these routing advertisements, each router populates its routing table. In the next advertisement cycle, a router advertises updated information from its routing table. This process continues until the routing tables of all routers converge to stable values.

These protocols have the disadvantage of slow convergence. However, they are usually simple to handle and are well suited for use with small networks. Some examples of distance-vector routing protocols are:

1. Routing Information Protocol (RIP)

2. Interior Gateway Routing Protocol (IGRP)

Link-state routing protocol

In the case of Link-state routing protocols, each node possesses information about the complete network topology. Each node then independently calculates the best next hop from it for every possible destination in the network using local information of the topology. The collection of best next hops forms the routing table for the node.

This contrasts with distance-vector routing protocols, which work by having each node share its routing table with its neighbors. In a link-state protocol, the only information passed between the nodes is information used to construct the connectivity maps.
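
As an illustration of how a node with the complete topology can compute its own best next hops, here is a minimal sketch using Dijkstra's shortest-path algorithm (the algorithm commonly used by link-state protocols). The function name `next_hops` and the example topology are assumptions made for this sketch, not part of any protocol specification.

```python
import heapq


def next_hops(topology, source):
    """Return {destination: first hop from `source`} using Dijkstra's algorithm.

    `topology` maps each node to a dict of {neighbor: link cost}.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in topology[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Leaving the source, the first hop is the neighbor itself;
                # further out, it is inherited from the path used to get here.
                first_hop[neighbor] = neighbor if node == source else hop
                heapq.heappush(heap, (new_cost, neighbor, first_hop[neighbor]))
    return first_hop


# Hypothetical four-node topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"C": 1},
}
table_for_a = next_hops(topology, "A")
```

The resulting dictionary plays the role of the routing table for node A: every destination is reached through B, because the direct A-C link is more expensive than the path through B.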

Examples of link-state routing protocols are:

1. Open Shortest Path First (OSPF)

2. Intermediate System to Intermediate System (IS-IS)

Background

The Routing Information Protocol, or RIP, as it is more commonly called, is one of the most enduring of all routing protocols. RIP is also one of the more easily confused protocols because a variety of RIP-like routing protocols proliferated, some of which even used the same name! RIP and the myriad RIP-like protocols were based on the same set of algorithms that use distance vectors to mathematically compare routes to identify the best path to any given destination address. These algorithms emerged from academic research that dates back to 1957.

Today's open standard version of RIP, sometimes referred to as IP RIP, is formally defined in two documents: Request For Comments (RFC) 1058 and Internet Standard (STD) 56. As IP-based networks became both more numerous and greater in size, it became apparent to the Internet Engineering Task Force (IETF) that RIP needed to be updated. Consequently, the IETF released RFC 1388 in January 1993, which was then superseded in November 1994 by RFC 1723, which describes RIP 2 (the second version of RIP). These RFCs described an extension of RIP's capabilities but did not attempt to obsolete the previous version of RIP. RIP 2 enabled RIP messages to carry more information, which permitted the use of a simple authentication mechanism to secure table updates. More importantly, RIP 2 supported subnet masks, a critical feature that was not available in RIP.

This chapter summarizes the basic capabilities and features associated with RIP. Topics include the routing update process, RIP routing metrics, routing stability, and routing timers.

Routing Updates

RIP sends routing-update messages at regular intervals and when the network topology changes. When a router receives a routing update that includes changes to an entry, it updates its routing table to reflect the new route. The metric value for the path is increased by 1, and the sender is indicated as the next hop. RIP routers maintain only the best route (the route with the lowest metric value) to a destination. After updating its routing table, the router immediately begins transmitting routing updates to inform other network routers of the change. These updates are sent independently of the regularly scheduled updates that RIP routers send.

RIP Routing Metric

RIP uses a single routing metric (hop count) to measure the distance between the source and a destination network. Each hop in a path from source to destination is assigned a hop count value, which is typically 1. When a router receives a routing update that contains a new or changed destination network entry, the router adds 1 to the metric value indicated in the update and enters the network in the routing table. The IP address of the sender is used as the next hop.

RIP Stability Features

RIP prevents routing loops from continuing indefinitely by implementing a limit on the number of hops allowed in a path from the source to a destination. The maximum number of hops in a path is 15. If a router receives a routing update that contains a new or changed entry, and if increasing the metric value by 1 causes the metric to be infinity (that is, 16), the network destination is considered unreachable. The downside of this stability feature is that it limits the maximum diameter of a RIP network to less than 16 hops.
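
The hop-count rule and the limit of 16 can be sketched as follows. This is a simplified illustration, not the full RIP update procedure: the function name `learn_route` and the table layout are hypothetical, and the sketch only installs new or better routes (it does not handle a route that gets worse or times out).

```python
RIP_INFINITY = 16  # a metric of 16 means "unreachable"


def learn_route(table, sender_ip, destination, advertised_metric):
    """Apply one advertised entry to `table`; return True if the table changed.

    `table` maps destination -> {"metric": int, "next_hop": str}.
    """
    # Add one hop for the link to the sender, never exceeding infinity.
    metric = min(advertised_metric + 1, RIP_INFINITY)
    if metric >= RIP_INFINITY:
        return False  # unreachable through this sender, so not installed
    current = table.get(destination)
    if current is None or metric < current["metric"]:
        # Keep only the best route; the sender becomes the next hop.
        table[destination] = {"metric": metric, "next_hop": sender_ip}
        return True
    return False


table = {}
added = learn_route(table, "10.0.0.2", "192.168.5.0", 3)      # installed with metric 4
ignored = learn_route(table, "10.0.0.3", "192.168.5.0", 5)    # worse than metric 4
improved = learn_route(table, "10.0.0.3", "192.168.5.0", 1)   # better: metric 2
too_far = learn_route(table, "10.0.0.4", "172.16.0.0", 15)    # 15 + 1 = 16, unreachable
```

The last call shows why the network diameter is limited: a destination advertised at 15 hops becomes 16, that is, unreachable, for the router that receives the update.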

RIP includes a number of other stability features that are common to many routing protocols. These features are designed to provide stability despite potentially rapid changes in a network's topology. For example, RIP implements the split horizon and holddown mechanisms to prevent incorrect routing information from being propagated.

RIP Timers

RIP uses numerous timers to regulate its performance. These include a routing-update timer, a route-timeout timer, and a route-flush timer. The routing-update timer clocks the interval between periodic routing updates. Generally, it is set to 30 seconds, with a small random amount of time added whenever the timer is reset. This is done to help prevent congestion, which could result from all routers simultaneously attempting to update their neighbors. Each routing table entry has a route-timeout timer associated with it. When the route-timeout timer expires, the route is marked invalid but is retained in the table until the route-flush timer expires.
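
The interaction of the three timers can be sketched as below. The 30-second update interval comes from the text above; the jitter range, the 180-second timeout and the 240-second flush value are assumptions chosen for illustration only, since actual values depend on the implementation. The function names are hypothetical, and both the timeout and the flush time are measured here from the last update received for the route.

```python
import random

UPDATE_INTERVAL = 30.0  # seconds between periodic updates (from the text)


def next_update_delay(jitter=5.0, rng=random):
    """Delay until the next periodic update, with a small random amount added.

    The jitter range is an assumed value; it spreads updates out so that
    routers do not all transmit at the same moment.
    """
    return UPDATE_INTERVAL + rng.uniform(0.0, jitter)


def route_state(age, timeout=180.0, flush=240.0):
    """Classify a route by the seconds elapsed since it was last refreshed.

    `timeout` and `flush` are assumed example values, not taken from the text.
    """
    if age < timeout:
        return "valid"
    if age < flush:
        return "invalid"   # marked invalid but still kept in the table
    return "flushed"       # removed from the table


delay = next_update_delay()
states = [route_state(age) for age in (10, 200, 300)]
```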

Packet Formats

The following section focuses on the IP RIP and IP RIP 2 packet formats illustrated in Figures 47-1 and 47-2. Each illustration is followed by descriptions of the fields illustrated.

RIP Packet Format

Figure 47-1 illustrates the IP RIP packet format.

Figure 47-1 An IP RIP Packet Consists of Nine Fields



The following descriptions summarize the IP RIP packet format fields illustrated in Figure 47-1:

• Command—Indicates whether the packet is a request or a response. The request asks that a router send all or part of its routing table. The response can be an unsolicited regular routing update or a reply to a request. Responses contain routing table entries. Multiple RIP packets are used to convey information from large routing tables.

• Version number—Specifies the RIP version used. This field can signal different potentially incompatible versions.

• Zero—This field is not actually used by RFC 1058 RIP; it was added solely to provide backward compatibility with prestandard varieties of RIP. Its name comes from its defaulted value: zero.

• Address-family identifier (AFI)—Specifies the address family used. RIP is designed to carry routing information for several different protocols. Each entry has an address-family identifier to indicate the type of address being specified. The AFI for IP is 2.

• Address—Specifies the IP address for the entry.

• Metric—Indicates how many internetwork hops (routers) have been traversed in the trip to the destination. This value is between 1 and 15 for a valid route, or 16 for an unreachable route.

Note Up to 25 occurrences of the AFI, Address, and Metric fields are permitted in a single IP RIP packet. (Up to 25 destinations can be listed in a single RIP packet.)
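
As a rough sketch of the layout described above, the following code builds and parses an IP RIP (version 1) packet with Python's `struct` module. The field widths (a 4-byte header and 20-byte entries, with the unused bytes set to zero) and the command values 1 (request) and 2 (response) follow RFC 1058; the function names `build_rip1` and `parse_rip1` are hypothetical.

```python
import ipaddress
import struct

HEADER = struct.Struct("!BBH")     # command, version, zero
ENTRY = struct.Struct("!HH4s8xI")  # AFI, zero, IP address, 8 zero bytes, metric
AFI_IP = 2
REQUEST, RESPONSE = 1, 2
MAX_ENTRIES = 25


def build_rip1(command, routes):
    """Build a RIP version 1 packet from a list of (address, metric) pairs."""
    if len(routes) > MAX_ENTRIES:
        raise ValueError("a RIP packet carries at most 25 entries")
    packet = HEADER.pack(command, 1, 0)
    for address, metric in routes:
        raw = ipaddress.IPv4Address(address).packed
        packet += ENTRY.pack(AFI_IP, 0, raw, metric)
    return packet


def parse_rip1(packet):
    """Return (command, version, [(address, metric), ...]) for IP entries."""
    command, version, _zero = HEADER.unpack_from(packet, 0)
    routes = []
    for offset in range(HEADER.size, len(packet), ENTRY.size):
        afi, _zero, raw, metric = ENTRY.unpack_from(packet, offset)
        if afi == AFI_IP:  # entries for other address families are skipped
            routes.append((str(ipaddress.IPv4Address(raw)), metric))
    return command, version, routes


packet = build_rip1(RESPONSE, [("10.0.0.0", 1), ("192.168.1.0", 16)])
decoded = parse_rip1(packet)
```

A packet with two entries is 4 + 2 × 20 = 44 bytes long; a full packet with 25 entries is 504 bytes.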

RIP 2 Packet Format



The RIP 2 specification (described in RFC 1723) allows more information to be included in RIP packets and provides a simple authentication mechanism that is not supported by RIP. Figure 47-2 shows the IP RIP 2 packet format.

Figure 47-2 An IP RIP 2 Packet Consists of Fields Similar to Those of an IP RIP Packet

The following descriptions summarize the IP RIP 2 packet format fields illustrated in Figure 47-2:

• Command—Indicates whether the packet is a request or a response. The request asks that a router send all or a part of its routing table. The response can be an unsolicited regular routing update or a reply to a request. Responses contain routing table entries. Multiple RIP packets are used to convey information from large routing tables.

• Version—Specifies the RIP version used. In a RIP packet implementing any of the RIP 2 fields or using authentication, this value is set to 2.

• Unused—Has a value set to zero.

• Address-family identifier (AFI)—Specifies the address family used. RIPv2's AFI field functions identically to RFC 1058 RIP's AFI field, with one exception: If the AFI for the first entry in the message is 0xFFFF, the remainder of the entry contains authentication information. Currently, the only authentication type is simple password.

• Route tag—Provides a method for distinguishing between internal routes (learned by RIP) and external routes (learned from other protocols).

• IP address—Specifies the IP address for the entry.

• Subnet mask—Contains the subnet mask for the entry. If this field is zero, no subnet mask has been specified for the entry.

• Next hop—Indicates the IP address of the next hop to which packets for the entry should be forwarded.

• Metric—Indicates how many internetwork hops (routers) have been traversed in the trip to the destination. This value is between 1 and 15 for a valid route, or 16 for an unreachable route.
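
A single RIP 2 route entry can be sketched in the same way. The 20-byte layout used here (AFI, route tag, IP address, subnet mask, next hop, metric) follows RFC 1723; the function names are hypothetical, and for an authentication entry the sketch only reports the authentication type without decoding the password bytes.

```python
import ipaddress
import struct

# AFI, route tag, IP address, subnet mask, next hop, metric
RIP2_ENTRY = struct.Struct("!HH4s4s4sI")
AFI_IP = 2
AFI_AUTH = 0xFFFF  # marks an authentication entry instead of a route


def pack_rip2_entry(address, mask, next_hop, metric, route_tag=0):
    """Pack one RIP 2 route entry into 20 bytes."""
    ip = ipaddress.IPv4Address
    return RIP2_ENTRY.pack(AFI_IP, route_tag, ip(address).packed,
                           ip(mask).packed, ip(next_hop).packed, metric)


def unpack_rip2_entry(raw):
    """Decode one 20-byte entry into a dictionary."""
    afi, tag, address, mask, next_hop, metric = RIP2_ENTRY.unpack(raw)
    if afi == AFI_AUTH:
        # In an authentication entry the second 16-bit field is the type.
        return {"authentication": True, "auth_type": tag}
    ip = ipaddress.IPv4Address
    return {
        "route_tag": tag,
        "address": str(ip(address)),
        "mask": str(ip(mask)),
        "next_hop": str(ip(next_hop)),
        "metric": metric,
    }


entry = pack_rip2_entry("172.16.4.0", "255.255.255.0", "0.0.0.0", 3)
fields = unpack_rip2_entry(entry)
```

The subnet mask field is what allows RIP 2 to advertise subnets such as 172.16.4.0/24, which the version 1 entry has no room to express.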



The Routing Information Protocol (RIP)

The Routing Information Protocol (RIP) described here is loosely based on the program "routed", distributed with the 4.3 Berkeley Software Distribution. However, there are several other implementations of what is supposed to be the same protocol. Unfortunately, these various implementations disagree in various details. The specifications here represent a combination of features taken from various implementations. We believe that a program designed according to this document will interoperate with routed, and with all other implementations of RIP of which we are aware.

Note that this description adopts a different view than most existing implementations about when metrics should be incremented. By making a corresponding change in the metric used for a local network, we have retained compatibility with other existing implementations. See section 3.6 for details on this issue.

1. Introduction

This memo describes one protocol in a series of routing protocols based on the Bellman-Ford (or distance vector) algorithm. This algorithm has been used for routing computations in computer networks since the early days of the ARPANET. The particular packet formats and protocol described here are based on the program "routed", which is included with the Berkeley distribution of Unix. It has become a de facto standard for exchange of routing information among gateways and hosts. It is implemented for this purpose by most commercial vendors of IP gateways. Note, however, that many of these vendors have their own protocols, which are used among their own gateways.

This protocol is most useful as an "interior gateway protocol". In a nationwide network such as the current Internet, it is very unlikely that a single routing protocol will be used for the whole network. Rather, the network will be organized as a collection of "autonomous systems". An autonomous system will in general be administered by a single entity, or at least will have some reasonable degree of technical and administrative control. Each autonomous system will have its own routing technology. This may well be different for different autonomous systems. The routing protocol used within an autonomous system is referred to as an interior gateway protocol, or "IGP". A separate protocol is used to interface among the autonomous systems. The earliest such protocol, still used in the Internet, is "EGP" (exterior gateway protocol). Such protocols are now usually referred to as inter-AS routing protocols.

RIP was designed to work with moderate-size networks using reasonably homogeneous technology. Thus it is suitable as an IGP for many campuses and for regional networks using serial lines whose speeds do not vary widely.



It is not intended for use in more complex environments. For more information on the context into which RIP is expected to fit, see Braden and Postel [3]. RIP is one of a class of algorithms known as "distance vector algorithms". The earliest description of this class of algorithms known to the author is in Ford and Fulkerson [6]. Because of this, they are sometimes known as Ford-Fulkerson algorithms. The term Bellman-Ford is also used. It comes from the fact that the formulation is based on Bellman's equation, the basis of "dynamic programming". (For a standard introduction to this area, see [1].) The presentation in this document is closely based on [2]. This text contains an introduction to the mathematics of routing algorithms. It describes and justifies several variants of the algorithm presented here, as well as a number of other related algorithms. The basic algorithms described in this protocol were used in computer routing as early as 1969 in the ARPANET. However, the specific ancestry of this protocol is within the Xerox network protocols. The PUP protocols (see [4]) used the Gateway Information Protocol to exchange routing information. A somewhat updated version of this protocol was adopted for the Xerox Network Systems (XNS) architecture, with the name Routing Information Protocol. (See [7].) Berkeley's routed is largely the same as the Routing Information Protocol, with XNS addresses replaced by a more general address format capable of handling IP and other types of address, and with routing updates limited to one every 30 seconds. Because of this similarity, the term Routing Information Protocol (or just RIP) is used to refer to both the XNS protocol and the protocol used by routed. RIP is intended for use within the IP-based Internet. The Internet is organized into a number of networks connected by gateways. The networks may be either point-to-point links or more complex networks such as Ethernet or the ARPANET. 
Hosts and gateways are presented with IP datagrams addressed to some host. Routing is the method by which the host or gateway decides where to send the datagram. It may be able to send the datagram directly to the destination, if that destination is on one of the networks that are directly connected to the host or gateway. However, the interesting case is when the destination is not directly reachable. In this case, the host or gateway attempts to send the datagram to a gateway that is nearer the destination. The goal of a routing protocol is very simple: It is to supply the information that is needed to do routing.

1.1. Limitations of the protocol

This protocol does not solve every possible routing problem. As mentioned above, it is primarily intended for use as an IGP, in reasonably homogeneous networks of moderate size. In addition, the following specific limitations should be mentioned:

The protocol is limited to networks whose longest path involves 15 hops. The designers believe that the basic protocol design is inappropriate for larger networks. Note that this statement of the limit assumes that a cost of 1 is used for each network. This is the way RIP is normally configured. If the system administrator chooses to use larger costs, the upper bound of 15 can easily become a problem.



The protocol depends upon "counting to infinity" to resolve certain unusual situations. (This will be explained in the next section.) If the system of networks has several hundred networks, and a routing loop was formed involving all of them, the resolution of the loop would require either much time (if the frequency of routing updates were limited) or bandwidth (if updates were sent whenever changes were detected). Such a loop would consume a large amount of network bandwidth before the loop was corrected. We believe that in realistic cases, this will not be a problem except on slow lines. Even then, the problem will be fairly unusual, since various precautions are taken that should prevent these problems in most cases.

This protocol uses fixed "metrics" to compare alternative routes. It is not appropriate for situations where routes need to be chosen based on real-time parameters such as measured delay, reliability, or load. The obvious extensions to allow metrics of this type are likely to introduce instabilities of a sort that the protocol is not designed to handle.

1.2. Organization of this document

The main body of this document is organized into two parts, which occupy the next two sections:

- A conceptual development and justification of distance vector algorithms in general.
- The actual protocol description.

Each of these two sections can largely stand on its own. Section 2 attempts to give an informal presentation of the mathematical underpinnings of the algorithm. Note that the presentation follows a "spiral" method. An initial, fairly simple algorithm is described. Then refinements are added to it in successive sections. Section 3 is the actual protocol description. Except where specific references are made to section 2, it should be possible to implement RIP entirely from the specifications given in section 3.

2. Distance Vector Algorithms

Routing is the task of finding a path from a sender to a desired destination.
In the IP "Catenet model" this reduces primarily to a matter of finding gateways between networks. As long as a message remains on a single network or subnet, any routing problems are solved by technology that is specific to the network. For example, the Ethernet and the ARPANET each define a way in which any sender can talk to any specified destination within that one network. IP routing comes in primarily when messages must go from a sender on one such network to a destination on a different one. In that case, the message must pass through gateways connecting the networks. If the networks are not adjacent, the message may pass through several intervening networks, and the gateways connecting them. Once the message gets to a gateway that is on the same network as the destination, that network's own technology is used to get to the destination.



Throughout this section, the term "network" is used generically to cover a single broadcast network (e.g., an Ethernet), a point to point line, or the ARPANET. The critical point is that a network is treated as a single entity by IP. Either no routing is necessary (as with a point to point line), or that routing is done in a manner that is transparent to IP, allowing IP to treat the entire network as a single fully-connected system (as with an Ethernet or the ARPANET). Note that the term "network" is used in a somewhat different way in discussions of IP addressing. A single IP network number may be assigned to a collection of networks, with "subnet" addressing being used to describe the individual networks. In effect, we are using the term "network" here to refer to subnets in cases where subnet addressing is in use. A number of different approaches for finding routes between networks are possible. One useful way of categorizing these approaches is on the basis of the type of information the gateways need to exchange in order to be able to find routes. Distance vector algorithms are based on the exchange of only a small amount of information. Each entity (gateway or host) that participates in the routing protocol is assumed to keep information about all of the destinations within the system. Generally, information about all entities connected to one network is summarized by a single entry, which describes the route to all destinations on that network. This summarization is possible because as far as IP is concerned, routing within a network is invisible. Each entry in this routing database includes the next gateway to which datagrams destined for the entity should be sent. In addition, it includes a "metric" measuring the total distance to the entity. Distance is a somewhat generalized concept, which may cover the time delay in getting messages to the entity, the dollar cost of sending messages to it, etc. 
Distance vector algorithms get their name from the fact that it is possible to compute optimal routes when the only information exchanged is the list of these distances. Furthermore, information is only exchanged among entities that are adjacent, that is, entities that share a common network. Although routing is most commonly based on information about networks, it is sometimes necessary to keep track of the routes to individual hosts. The RIP protocol makes no formal distinction between networks and hosts. It simply describes exchange of information about destinations, which may be either networks or hosts. (Note however, that it is possible for an implementer to choose not to support host routes. See section 3.2.) In fact, the mathematical developments are most conveniently thought of in terms of routes from one host or gateway to another. When discussing the algorithm in abstract terms, it is best to think of a routing entry for a network as an abbreviation for routing entries for all of the entities connected to that network. This sort of abbreviation makes sense only because we think of networks as having no internal structure that is visible at the IP level. Thus, we will generally assign the same distance to every entity in a given network. We said above that each entity keeps a routing database with one entry for every possible destination in the system. An actual implementation is likely to need to keep the following information about each destination: address: in IP implementations of these algorithms, this will be the IP address of the host or network.



- Gateway: the first gateway along the route to the destination.
- Interface: the physical network, which must be used to reach the first gateway.
- Metric: a number, indicating the distance to the destination.
- Timer: the amount of time since the entry was last updated.

In addition, various flags and other internal information will probably be included. This database is initialized with a description of the entities that are directly connected to the system. It is updated according to information received in messages from neighboring gateways.

The most important information exchanged by the hosts and gateways is that carried in update messages. Each entity that participates in the routing scheme sends update messages that describe the routing database as it currently exists in that entity. It is possible to maintain optimal routes for the entire system by using only information obtained from neighboring entities. The algorithm used for that will be described in the next section.

As we mentioned above, the purpose of routing is to find a way to get datagrams to their ultimate destinations. Distance vector algorithms are based on a table giving the best route to every destination in the system. Of course, in order to define which route is best, we have to have some way of measuring goodness. This is referred to as the "metric".

In simple networks, it is common to use a metric that simply counts how many gateways a message must go through. In more complex networks, a metric is chosen to represent the total amount of delay that the message suffers, the cost of sending it, or some other quantity, which may be minimized. The main requirement is that it must be possible to represent the metric as a sum of "costs" for individual hops.

Formally, if it is possible to get from entity i to entity j directly (i.e., without passing through another gateway between), then a cost, d(i,j), is associated with the hop between i and j.
In the normal case where all entities on a given network are considered to be the same, d(i,j) is the same for all destinations on a given network, and represents the cost of using that network. To get the metric of a complete route, one just adds up the costs of the individual hops that make up the route. For the purposes of this memo, we assume that the costs are positive integers.

Let D(i,j) represent the metric of the best route from entity i to entity j. It should be defined for every pair of entities. d(i,j) represents the costs of the individual steps. Formally, let d(i,j) represent the cost of going directly from entity i to entity j. It is infinite if i and j are not immediate neighbors. (Note that d(i,i) is infinite. That is, we don't consider there to be a direct connection from a node to itself.) Since costs are additive, it is easy to show that the best metric must be described by

\[
D(i,i) = 0 \quad \text{for all } i, \qquad
D(i,j) = \min_{k} \bigl[\, d(i,k) + D(k,j) \,\bigr] \quad \text{otherwise,}
\]



and that the best routes start by going from i to those neighbors k for which d(i,k) + D(k,j) has the minimum value. (These things can be shown by induction on the number of steps in the routes.) Note that we can limit the second equation to k's that are immediate neighbors of i. For the others, d(i,k) is infinite, so the term involving them can never be the minimum. It turns out that one can compute the metric by a simple algorithm based on this. Entity i gets its neighbors k to send it their estimates of their distances to the destination j. When i gets the estimates from k, it adds d(i,k) to each of the numbers. This is simply the cost of traversing the network between i and k. Now and then i compares the values from all of its neighbors and picks the smallest. A proof is given in [2] that this algorithm will converge to the correct estimates of D(i,j) in finite time in the absence of topology changes. The authors make very few assumptions about the order in which the entities send each other their information, or when the min is recomputed. Basically, entities just can't stop sending updates or recomputing metrics, and the networks can't delay messages forever. (Crash of a routing entity is a topology change.) Also, their proof does not make any assumptions about the initial estimates of D(i,j), except that they must be non-negative. The fact that these fairly weak assumptions are good enough is important. Because we don't have to make assumptions about when updates are sent, it is safe to run the algorithm asynchronously. That is, each entity can send updates according to its own clock. The network can drop updates, as long as they don't all get dropped. Because we don't have to make assumptions about the starting condition, the algorithm can handle changes. When the system changes, the routing algorithm starts moving to a new equilibrium, using the old one as its starting point. 
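
The computation just described can be sketched as a small program that repeatedly applies D(i,j) = min over neighbors k of [d(i,k) + D(k,j)] until no estimate changes. The function name `distance_vectors`, the dictionary layout, and the example costs are assumptions made for this sketch. For simplicity it runs in one process and starts from infinite estimates, rather than exchanging messages asynchronously between entities.

```python
INF = float("inf")


def distance_vectors(d):
    """Compute D(i,j) for all pairs from the direct costs d[i][k].

    `d` maps each entity i to a dict {neighbor k: cost d(i,k)}.
    Non-neighbors are simply absent, which corresponds to an infinite cost.
    """
    nodes = list(d)
    # Initial estimates: 0 to oneself, infinity to everything else.
    D = {i: {j: (0 if i == j else INF) for j in nodes} for i in nodes}
    changed = True
    while changed:
        changed = False
        for i in nodes:
            for j in nodes:
                if i == j:
                    continue
                # Add d(i,k) to each neighbor's estimate and pick the smallest.
                best = min((d[i][k] + D[k][j] for k in d[i]), default=INF)
                if best != D[i][j]:
                    D[i][j] = best
                    changed = True
    return D


# Hypothetical three-entity system: the direct A-C hop costs more than A-B-C.
costs = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2},
}
D = distance_vectors(costs)
```

In this example the estimate D(A,C) is first set to the direct cost 4 and is lowered to 3 on the next pass, once B's estimate of its distance to C is taken into account.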
It is important that the algorithm will converge in finite time no matter what the starting point. Otherwise certain kinds of changes might lead to non-convergent behavior. The statement of the algorithm given above (and the proof) assumes that each entity keeps copies of the estimates that come from each of its neighbors, and now and then does a min over all of the neighbors. In fact real implementations don't necessarily do that. They simply remember the best metric seen so far, and the identity of the neighbor that sent it. They replace this information whenever they see a better (smaller) metric. This allows them to compute the minimum incrementally, without having to store data from all of the neighbors. There is one other difference between the algorithm as described in texts and those used in real protocols such as RIP: the description above would have each entity include an entry for itself, showing a distance of zero. In fact this is not generally done. Recall that all entities on a network are normally summarized by a single entry for the network. Consider the situation of a host or gateway G that is connected to network A. C represents the cost of using network A (usually a metric of one). (Recall that we are assuming that the internal structure of a network is not visible to IP, and thus the cost of going between any two entities on it is the same.) In principle, G should get a message from every other entity H on network A, showing a cost of 0 to get from that entity to



itself. G would then compute C + 0 as the distance to H. Rather than having G look at all of these identical messages, it simply starts out by making an entry for network A in its table, and assigning it a metric of C. This entry for network A should be thought of as summarizing the entries for all other entities on network A. The only entity on A that can't be summarized by that common entry is G itself, since the cost of going from G to G is 0, not C. But since we never need those 0 entries, we can safely get along with just the single entry for network A. Note one other implication of this strategy: because we don't need to use the 0 entries for anything, hosts that do not function as gateways don't need to send any update messages. Clearly hosts that don't function as gateways (i.e., hosts that are connected to only one network) can have no useful information to contribute other than their own entry D(i,i) = 0. As they have only the one interface, it is easy to see that a route to any other network through them will simply go in that interface and then come right back out it. Thus the cost of such a route will be greater than the best cost by at least C. Since we don't need the 0 entries, non- gateways need not participate in the routing protocol at all. Let us summarize what a host or gateway G does. For each destination in the system, G will keep a current estimate of the metric for that destination (i.e., the total cost of getting to it) and the identity of the neighboring gateway on whose data that metric is based. If the destination is on a network that is directly connected to G, then G simply uses an entry that shows the cost of using the network, and the fact that no gateway is needed to get to the destination. It is easy to show that once the computation has converged to the correct metrics, the neighbor that is recorded by this technique is in fact the first gateway on the path to the destination. 
(If there are several equally good paths, it is the first gateway on one of them.) This combination of destination, metric, and gateway is typically referred to as a route to the destination with that metric, using that gateway.

The method so far only has a way to lower the metric, as the existing metric is kept until a smaller one shows up. It is possible that the initial estimate might be too low. Thus, there must be a way to increase the metric. It turns out to be sufficient to use the following rule: suppose the current route to a destination has metric D and uses gateway G. If a new set of information arrives from some source other than G, only update the route if the new metric is better than D. But if a new set of information arrives from G itself, always update D to the new value. It is easy to show that with this rule, the incremental update process produces the same routes as a calculation that remembers the latest information from all the neighbors and does an explicit minimum. (Note that the discussion so far assumes that the network configuration is static. It does not allow for the possibility that a system might fail.)

To summarize, here is the basic distance vector algorithm as it has been developed so far. (Note that this is not a statement of the RIP protocol. There are several refinements still to be added.) The



following procedure is carried out by every entity that participates in the routing protocol. This must include all of the gateways in the system. Hosts that are not gateways may participate as well.

- Keep a table with an entry for every possible destination in the system. The entry contains the distance D to the destination, and the first gateway G on the route to that network. Conceptually, there should be an entry for the entity itself, with metric 0, but this is not actually included.

- Periodically, send a routing update to every neighbor. The update is a set of messages that contain all of the information from the routing table. It contains an entry for each destination, with the distance shown to that destination.

- When a routing update arrives from a neighbor G', add the cost associated with the network that is shared with G'. (This should be the network over which the update arrived.) Call the resulting distance D'. Compare the resulting distances with the current routing table entries. If the new distance D' for N is smaller than the existing value D, adopt the new route. That is, change the table entry for N to have metric D' and gateway G'. If G' is the gateway from which the existing route came, i.e., G' = G, then use the new metric even if it is larger than the old one.

2.1. Dealing with changes in topology

The discussion above assumes that the topology of the network is fixed. In practice, gateways and lines often fail and come back up. To handle this possibility, we need to modify the algorithm slightly. The theoretical version of the algorithm involved a minimum over all immediate neighbors. If the topology changes, the set of neighbors changes. Therefore, the next time the calculation is done, the change will be reflected. However, as mentioned above, actual implementations use an incremental version of the minimization. Only the best route to any given destination is remembered.
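The incremental rule, which keeps only the best route per destination, can be sketched in Python as follows. This is a minimal illustration and not a statement of the RIP protocol: the names `Route` and `process_update`, and the use of plain dictionaries for the routing table and the incoming update, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    metric: int              # distance D to the destination
    gateway: Optional[str]   # first gateway G on the route (None if directly connected)


def process_update(table: Dict[str, Route], sender: str,
                   link_cost: int, advertised: Dict[str, int]) -> bool:
    """Apply one routing update received from neighbor `sender`.

    `advertised` maps each destination to the distance claimed by the sender.
    Returns True if any table entry changed.
    """
    changed = False
    for dest, claimed in advertised.items():
        new_metric = claimed + link_cost      # D' = claimed distance + cost of shared network
        current = table.get(dest)
        if current is None or new_metric < current.metric:
            # No route yet, or a strictly better one: adopt it.
            table[dest] = Route(new_metric, sender)
            changed = True
        elif current.gateway == sender and new_metric != current.metric:
            # News from the gateway we already use is always believed,
            # even when the metric gets worse.
            current.metric = new_metric
            changed = True
    return changed
```

Note that a worse metric is accepted only from the gateway currently in use; the same claim from any other neighbor is ignored, which is exactly why a crashed gateway can leave a stale route behind.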
If the gateway involved in that route should crash, or the network connection to it break, the calculation might never reflect the change. The algorithm as shown so far depends upon a gateway notifying its neighbors if its metrics change. If the gateway crashes, then it has no way of notifying neighbors of a change. In order to handle problems of this kind, distance vector protocols must make some provision for timing out routes. The details depend upon the specific protocol. As an example, in RIP every gateway that participates in routing sends an update message to all its neighbors once every 30 seconds. Suppose the current route for network N uses gateway G. If we don't hear from G for 180 seconds, we can assume



that either the gateway has crashed or the network connecting us to it has become unusable. Thus, we mark the route as invalid. When we hear from another neighbor that has a valid route to N, the valid route will replace the invalid one. Note that we wait for 180 seconds before timing out a route even though we expect to hear from each neighbor every 30 seconds. Unfortunately, messages are occasionally lost by networks. Thus, it is probably not a good idea to invalidate a route based on a single missed message. As we will see below, it is useful to have a way to notify neighbors that there currently isn't a valid route to some network. RIP, along with several other protocols of this class, does this through a normal update message, by marking that network as unreachable. A specific metric value is chosen to indicate an unreachable destination; that metric value is larger than the largest valid metric that we expect to see. In the existing implementation of RIP, 16 is used. This value is normally referred to as "infinity", since it is larger than the largest valid metric. 16 may look like a surprisingly small number. It is chosen to be this small for reasons that we will see shortly. In most implementations, the same convention is used internally to flag a route as invalid. 2.2. Preventing instability The algorithm as presented up to this point will always allow a host or gateway to calculate a correct routing table. However, that is still not quite enough to make it useful in practice. The proofs referred to above only show that the routing tables will converge to the correct values in finite time. They do not guarantee that this time will be small enough to be useful, nor do they say what will happen to the metrics for networks that become inaccessible. It is easy enough to extend the mathematics to handle routes becoming inaccessible. The convention suggested above will do that. We choose a large metric value to represent "infinity". 
This value must be large enough that no real metric would ever get that large. For the purposes of this example, we will use the value 16. Suppose a network becomes inaccessible. All of the immediately neighboring gateways time out and set the metric for that network to 16. For purposes of analysis, we can assume that all the neighboring gateways have gotten a new piece of hardware that connects them directly to the vanished network, with a cost of 16. Since that is the only connection to the vanished network, all the other gateways in the system will converge to new routes that go through one of those gateways. It is easy to see that once convergence has happened, all the gateways will have metrics of at least 16 for the vanished



network. Gateways one hop away from the original neighbors would end up with metrics of at least 17; gateways two hops away would end up with at least 18, etc. As these metrics are larger than the maximum metric value, they are all set to 16. It is obvious that the system will now converge to a metric of 16 for the vanished network at all gateways.

Unfortunately, the question of how long convergence will take is not amenable to quite so simple an answer. Before going any further, it will be useful to look at an example (taken from [2]). Note, by the way, that what we are about to show will not happen with a correct implementation of RIP. We are trying to show why certain features are needed. Note that the letters correspond to gateways, and the lines to networks.

    A-----B
     \   / \
      \ /  |
       C  /    all networks have cost 1, except
       | /     for the direct link from C to D, which
       |/      has cost 10
       D
       |<=== target network

Each gateway will have a table showing a route to each network. However, for purposes of this illustration, we show only the routes from each gateway to the network marked at the bottom of the diagram.

    D:  directly connected, metric 1
    B:  route via D, metric 2
    C:  route via B, metric 3
    A:  route via B, metric 3

Now suppose that the link from B to D fails. The routes should now adjust to use the link from C to D. Unfortunately, it will take a while for this to happen. The routing changes start when B notices that the route to D is no longer usable. For simplicity, the chart below assumes that all gateways send updates at the same time. The chart shows the metric for the target network, as it appears in the routing table at each gateway.

    time ------>

    D:  dir, 1   dir, 1   dir, 1   dir, 1  ...  dir, 1   dir, 1
    B:  unreach  C,   4   C,   5   C,   6       C,  11   C,  12
    C:  B,   3   A,   4   A,   5   A,   6       A,  11   D,  11
    A:  B,   3   C,   4   C,   5   C,   6       C,  11   C,  12

    dir = directly connected
    unreach = unreachable

Here's the problem: B is able to get rid of its failed route using a timeout mechanism. But vestiges of that route persist in the system for a long time. Initially, A and C still think they can get to D via B. So, they keep sending updates listing metrics of 3. In the next iteration, B will then claim that it can get to D via either A or C. Of course, it can't. The routes being claimed by A and C are now gone, but they have no way of knowing that yet. And even when they discover that their routes via B have gone away, they each think there is a route available via the other. Eventually the system converges, as all the mathematics claims it must. But it can take some time to do so. The worst case is when a network becomes completely inaccessible from some part of the system. In that case, the metrics may increase slowly in a pattern like the one above until they finally reach infinity. For this reason, the problem is called "counting to infinity".

You should now see why "infinity" is chosen to be as small as possible. If a network becomes completely inaccessible, we want counting to infinity to be stopped as soon as possible. Infinity must be large enough that no real route is that big. But it shouldn't be any bigger than required. Thus the choice of infinity is a tradeoff between network size and speed of convergence in case counting to infinity happens. The designers of RIP believed that the protocol was unlikely to be practical for networks with a diameter larger than 15.

There are several things that can be done to prevent problems like this. The ones used by RIP are called "split horizon with poisoned reverse", and "triggered updates".

2.2.1. Split horizon

Note that some of the problem above is caused by the fact that A and C are engaged in a pattern of mutual deception. Each claims to be able to get to D via the other.
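The slow climb of the metrics in the chart above can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: all gateways update at the same instant (as the chart assumes), each gateway takes an explicit minimum over its neighbors' previous metrics, and no split horizon is applied. The names `LINKS`, `step`, and `simulate` are invented for this example.

```python
INFINITY = 16

# Link costs after the B-D link has failed. The target network hangs off D,
# so D always reports metric 1 for it.
LINKS = {
    ("A", "B"): 1,
    ("A", "C"): 1,
    ("B", "C"): 1,
    ("C", "D"): 10,
}


def neighbors(node):
    """Yield (neighbor, link cost) pairs for `node`."""
    for (x, y), cost in LINKS.items():
        if x == node:
            yield y, cost
        elif y == node:
            yield x, cost


def step(metrics):
    """One synchronized round: every gateway recomputes from its neighbors' old metrics."""
    new = {"D": 1}  # D stays directly connected to the target network
    for node in ("A", "B", "C"):
        best = min(metrics[n] + cost for n, cost in neighbors(node))
        new[node] = min(best, INFINITY)
    return new


def simulate(rounds):
    """Return the metric tables, starting just after B has timed out its route."""
    metrics = {"D": 1, "B": INFINITY, "C": 3, "A": 3}
    history = [metrics]
    for _ in range(rounds):
        metrics = step(metrics)
        history.append(metrics)
    return history


if __name__ == "__main__":
    for t, table in enumerate(simulate(10)):
        print(t, {k: table[k] for k in "DBCA"})
```

Under these assumptions the metrics at A, B, and C rise by one per round until C's direct link to D (cost 10, giving metric 11) finally becomes the best choice, after which A and B settle at 12.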
This can be prevented by being a bit more careful about where information is sent. In particular, it is never useful to claim reachability for a destination network to the neighbor(s) from which the route was learned. "Split horizon" is a scheme for avoiding problems caused by including routes in updates



sent to the gateway from which they were learned. The "simple split horizon" scheme omits routes learned from one neighbor in updates sent to that neighbor. "Split horizon with poisoned reverse" includes such routes in updates, but sets their metrics to infinity. If A thinks it can get to D via C, its messages to C should indicate that D is unreachable. If the route through C is real, then C either has a direct connection to D, or a connection through some other gateway. C's route can't possibly go back to A, since that forms a loop. By telling C that D is unreachable, A simply guards against the possibility that C might get confused and believe that there is a route through A. This is obvious for a point to point line. But consider the possibility that A and C are connected by a broadcast network such as an Ethernet, and there are other gateways on that network. If A has a route through C, it should indicate that D is unreachable when talking to any other gateway on that network. The other gateways on the network can get to C themselves. They would never need to get to C via A. If A's best route is really through C, no other gateway on that network needs to know that A can reach D. This is fortunate, because it means that the same update message that is used for C can be used for all other gateways on the same network. Thus, update messages can be sent by broadcast. In general, split horizon with poisoned reverse is safer than simple split horizon. If two gateways have routes pointing at each other, advertising reverse routes with a metric of 16 will break the loop immediately. If the reverse routes are simply not advertised, the erroneous routes will have to be eliminated by waiting for a timeout. However, poisoned reverse does have a disadvantage: it increases the size of the routing messages. Consider the case of a campus backbone connecting a number of different buildings. In each building, there is a gateway connecting the backbone to a local network. 
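The difference between the two schemes, in terms of what a gateway advertises on a given network, can be sketched as follows. This is a hedged illustration only: the function name `build_update`, the table layout (destination mapped to a metric and the network the route was learned on), and the network labels are assumptions made for this example.

```python
INFINITY = 16


def build_update(table, out_network, poisoned_reverse=True):
    """Build the update to broadcast on `out_network`.

    `table` maps destination -> (metric, learned_on), where `learned_on` is the
    network over which the route was learned, or None for a directly-connected
    destination.
    """
    update = {}
    for dest, (metric, learned_on) in table.items():
        if learned_on == out_network:
            if poisoned_reverse:
                # Poisoned reverse: mention the route, but as unreachable.
                update[dest] = INFINITY
            # Simple split horizon: leave the route out entirely.
        else:
            update[dest] = metric
    return update


# A building gateway: one local network plus routes learned from the backbone.
table = {
    "local-net": (1, None),
    "remote-1": (2, "backbone"),
    "remote-2": (3, "backbone"),
}
simple = build_update(table, "backbone", poisoned_reverse=False)
poisoned = build_update(table, "backbone", poisoned_reverse=True)
```

With simple split horizon the backbone update carries only the local network; with poisoned reverse it also carries every backbone-learned route with a metric of 16, which is where the extra message size comes from.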
Consider what routing updates those gateways should broadcast on the backbone network. All that the rest of the network really needs to know about each gateway is what local networks it is connected to. Using simple split horizon, only those routes would appear in update messages sent by the gateway to the backbone network. If split horizon with poisoned reverse is used, the gateway must mention all routes that it learns from the backbone, with metrics of 16. If the system is large, this can result in a large update message, almost all of whose entries indicate unreachable networks. In a static sense, advertising reverse routes with a metric of 16 provides no additional information. If there are many gateways on one broadcast network, these extra entries can use significant



bandwidth. The reason they are there is to improve dynamic behavior. When topology changes, mentioning routes that should not go through the gateway as well as those that should can speed up convergence. However, in some situations, network managers may prefer to accept somewhat slower convergence in order to minimize routing overhead. Thus implementors may at their option implement simple split horizon rather than split horizon with poisoned reverse, or they may provide a configuration option that allows the network manager to choose which behavior to use. It is also permissible to implement hybrid schemes that advertise some reverse routes with a metric of 16 and omit others. An example of such a scheme would be to use a metric of 16 for reverse routes for a certain period of time after routing changes involving them, and thereafter omitting them from updates. 2.2.2. Triggered updates Split horizon with poisoned reverse will prevent any routing loops that involve only two gateways. However, it is still possible to end up with patterns in which three gateways are engaged in mutual deception. For example, A may believe it has a route through B, B through C, and C through A. Split horizon cannot stop such a loop. This loop will only be resolved when the metric reaches infinity and the network involved is then declared unreachable. Triggered updates are an attempt to speed up this convergence. To get triggered updates, we simply add a rule that whenever a gateway changes the metric for a route, it is required to send update messages almost immediately, even if it is not yet time for one of the regular update message. (The timing details will differ from protocol to protocol. Some distance vector protocols, including RIP, specify a small time delay, in order to avoid having triggered updates generate excessive network traffic.) Note how this combines with the rules for computing new metrics. Suppose a gateway's route to destination N goes through gateway G. 
If an update arrives from G itself, the receiving gateway is required to believe the new information, whether the new metric is higher or lower than the old one. If the result is a change in metric, then the receiving gateway will send triggered updates to all the hosts and gateways directly connected to it. They in turn may each send updates to their neighbors. The result is a cascade of triggered updates. It is easy to show which gateways and hosts are involved in the cascade. Suppose a gateway G times out a route to destination N. G will send triggered updates to all of its neighbors. However, the only neighbors who will believe the new information are those whose routes for N go through G. The other gateways and hosts will see this as information about a new route that is worse than the one they are already using, and ignore it.



The neighbors whose routes go through G will update their metrics and send triggered updates to all of their neighbors. Again, only those neighbors whose routes go through them will pay attention. Thus, the triggered updates will propagate backwards along all paths leading to gateway G, updating the metrics to infinity. This propagation will stop as soon as it reaches a portion of the network whose route to destination N takes some other path. If the system could be made to sit still while the cascade of triggered updates happens, it would be possible to prove that counting to infinity will never happen. Bad routes would always be removed immediately, and so no routing loops could form. Unfortunately, things are not so nice. While the triggered updates are being sent, regular updates may be happening at the same time. Gateways that haven't received the triggered update yet will still be sending out information based on the route that no longer exists. It is possible that after the triggered update has gone through a gateway, it might receive a normal update from one of these gateways that hasn't yet gotten the word. This could reestablish an orphaned remnant of the faulty route. If triggered updates happen quickly enough, this is very unlikely. However, counting to infinity is still possible. 3. Specifications for the protocol RIP is intended to allow hosts and gateways to exchange information for computing routes through an IP-based network. RIP is a distance vector protocol. Thus, it has the general features described in section 2. RIP may be implemented by both hosts and gateways. As in most IP documentation, the term "host" will be used here to cover either. RIP is used to convey information about routes to "destinations", which may be individual hosts, networks, or a special destination used to convey a default route. Any host that uses RIP is assumed to have interfaces to one or more networks. These are referred to as its "directly-connected networks". 
The protocol relies on access to certain information about each of these networks. The most important is its metric or "cost". The metric of a network is an integer between 1 and 15 inclusive. It is set in some manner not specified in this protocol. Most existing implementations always use a metric of 1. New implementations should allow the system administrator to set the cost of each network. In addition to the cost, each network will have an IP network number and a subnet mask associated with it. These are to be set by the system administrator in a manner not specified in this



protocol. Note that the rules specified in section 3.2 assume that there is a single subnet mask applying to each IP network, and that only the subnet masks for directly-connected networks are known. There may be systems that use different subnet masks for different subnets within a single network. There may also be instances where it is desirable for a system to know the subnet masks of distant networks. However, such situations will require modifications of the rules which govern the spread of subnet information. Such modifications raise issues of interoperability, and thus must be viewed as modifying the protocol.

Each host that implements RIP is assumed to have a routing table. This table has one entry for every destination that is reachable through the system described by RIP. Each entry contains at least the following information:

- The IP address of the destination.

- A metric, which represents the total cost of getting a datagram from the host to that destination. This metric is the sum of the costs associated with the networks that would be traversed in getting to the destination.

- The IP address of the next gateway along the path to the destination. If the destination is on one of the directly-connected networks, this item is not needed.

- A flag to indicate that information about the route has changed recently. This will be referred to as the "route change flag."

- Various timers associated with the route. See section 3.3 for more details on them.

The entries for the directly-connected networks are set up by the host, using information gathered by means not specified in this protocol. The metric for a directly-connected network is set to the cost of that network. In existing RIP implementations, 1 is always used for the cost. In that case, the RIP metric reduces to a simple hop-count. More complex metrics may be used when it is desirable to show preference for some networks over others, for example because of differences in bandwidth or reliability.
Implementors may also choose to allow the system administrator to



enter additional routes. These would most likely be routes to hosts or networks outside the scope of the routing system.

Entries for destinations other than these initial ones are added and updated by the algorithms described in the following sections.

In order for the protocol to provide complete information on routing, every gateway in the system must participate in it. Hosts that are not gateways need not participate, but many implementations make provisions for them to listen to routing information in order to allow them to maintain their routing tables.

3.1. Message formats

RIP is a UDP-based protocol. Each host that uses RIP has a routing process that sends and receives datagrams on UDP port number 520. All communications directed at another host's RIP processor are sent to port 520. All routing update messages are sent from port 520. Unsolicited routing update messages have both the source and destination port equal to 520. Those sent in response to a request are sent to the port from which the request came. Specific queries and debugging requests may be sent from ports other than 520, but they are directed to port 520 on the target machine.

There are provisions in the protocol to allow "silent" RIP processes. A silent process is one that normally does not send out any messages. However, it listens to messages sent by others. A silent RIP might be used by hosts that do not act as gateways, but wish to listen to routing updates in order to monitor local gateways and to keep their internal routing tables up to date. (See [5] for a discussion of various ways that hosts can keep track of network topology.) A gateway that has lost contact with all but one of its networks might choose to become silent, since it is effectively no longer a gateway. However, this should not be done if there is any chance that neighboring gateways might depend upon its messages to detect that the failed network has come back into operation.
(The 4BSD routed program uses routing packets to monitor the operation of point-to-point links.)

The packet format is shown in Figure 1.

Format of datagrams containing network information. Field sizes are given in octets. Unless otherwise specified, fields contain binary integers, in normal Internet order with the most-significant octet first. Each tick mark represents one bit.
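As a rough sketch of the layout that Figure 1 describes, the following Python code packs and unpacks a version 1 datagram with the standard `struct` module. The function names `pack_rip` and `unpack_rip` are invented for this illustration; the address family identifier 2 for IP and the limit of 25 entries per datagram are taken from the specification text that follows the figure.

```python
import socket
import struct

HEADER = struct.Struct("!BBH")      # command (1), version (1), must be zero (2)
ENTRY = struct.Struct("!HH4sIII")   # AFI (2), zero (2), IP (4), zero (4), zero (4), metric (4)
AF_IP = 2                           # address family identifier for IP
MAX_ENTRIES = 25


def pack_rip(routes, command=2, version=1):
    """Build one datagram from a list of (dotted-quad address, metric) pairs."""
    if len(routes) > MAX_ENTRIES:
        raise ValueError("at most 25 entries fit in one datagram")
    data = HEADER.pack(command, version, 0)
    for address, metric in routes:
        data += ENTRY.pack(AF_IP, 0, socket.inet_aton(address), 0, 0, metric)
    return data


def unpack_rip(data):
    """Return (command, version, [(address, metric), ...]) from a datagram."""
    command, version, _zero = HEADER.unpack_from(data, 0)
    routes = []
    for offset in range(HEADER.size, len(data), ENTRY.size):
        afi, _z1, raw_ip, _z2, _z3, metric = ENTRY.unpack_from(data, offset)
        if afi != AF_IP:
            continue  # skip entries for unsupported address families
        routes.append((socket.inet_ntoa(raw_ip), metric))
    return command, version, routes
```

The `!` prefix selects network byte order with no padding, so the header occupies 4 octets and each entry 20 octets; 25 entries therefore give 504 octets, which fits within the 512-octet maximum datagram size. This sketch does not perform the "must be zero" checks that input processing requires.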



 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| command (1)   | version (1)   |       must be zero (2)        |
+---------------+---------------+-------------------------------+
| address family identifier (2) |       must be zero (2)        |
+-------------------------------+-------------------------------+
|                         IP address (4)                        |
+---------------------------------------------------------------+
|                        must be zero (4)                       |
+---------------------------------------------------------------+
|                        must be zero (4)                       |
+---------------------------------------------------------------+
|                          metric (4)                           |
+---------------------------------------------------------------+
                                .
                                .
                                .

The portion of the datagram from address family identifier through metric may appear up to 25 times. IP address is the usual 4-octet Internet address, in network order.

Figure 1. Packet format

Every datagram contains a command, a version number, and possible arguments. This document describes version 1 of the protocol. Details of processing the version number are described in section 3.4. The command field is used to specify the purpose of this datagram. Here is a summary of the commands implemented in version 1:

1 - request    A request for the responding system to send all or part of its routing table.

2 - response   A message containing all or part of the sender's routing table. This message may be sent in response to a request or poll, or it may be an update message generated by the sender.

3 - traceon    Obsolete. Messages containing this command are to be ignored.

4 - traceoff   Obsolete. Messages containing this command are to be ignored.



5 - reserved   This value is used by Sun Microsystems for its own purposes. If new commands are added in any succeeding version, they should begin with 6. Messages containing this command may safely be ignored by implementations that do not choose to respond to it.

For request and response, the rest of the datagram contains a list of destinations, with information about each. Each entry in this list contains a destination network or host, and the metric for it. The packet format is intended to allow RIP to carry routing information for several different protocols. Thus, each entry has an address family identifier to indicate what type of address is specified in that entry. This document only describes routing for Internet networks. The address family identifier for IP is 2. None of the RIP implementations available to the author implement any other type of address. However, to allow for future development, implementations are required to skip entries that specify address families that are not supported by the implementation. (The size of these entries will be the same as the size of an entry specifying an IP address.) Processing of the message continues normally after any unsupported entries are skipped. The IP address is the usual Internet address, stored as 4 octets in network order. The metric field must contain a value between 1 and 15 inclusive, specifying the current metric for the destination, or the value 16, which indicates that the destination is not reachable. Each route sent by a gateway supersedes any previous route to the same destination from the same gateway.

The maximum datagram size is 512 octets. This includes only the portions of the datagram described above. It does not count the IP or UDP headers. The commands that involve network information allow information to be split across several datagrams. No special provisions are needed for continuations, since correct results will occur if the datagrams are processed individually.

3.2.
Addressing considerations As indicated in section 2, distance vector routing can be used to describe routes to individual hosts or to networks. The RIP protocol allows either of these possibilities. The destinations appearing in request and response messages can be networks, hosts, or a special code used to indicate a default address. In general, the kinds of routes actually used will depend upon the routing strategy used for the particular network. Many networks are set up so that routing



information for individual hosts is not needed. If every host on a given network or subnet is accessible through the same gateways, then there is no reason to mention individual hosts in the routing tables. However, networks that include point to point lines sometimes require gateways to keep track of routes to certain hosts. Whether this feature is required depends upon the addressing and routing approach used in the system. Thus, some implementations may choose not to support host routes. If host routes are not supported, they are to be dropped when they are received in response messages. (See section 3.4.2.)

The RIP packet formats do not distinguish among various types of address. Fields that are labeled "address" can contain any of the following:

- host address
- subnet number
- network number
- 0, indicating a default route

Entities that use RIP are assumed to use the most specific information available when routing a datagram. That is, when routing a datagram, its destination address must first be checked against the list of host addresses. Then it must be checked to see whether it matches any known subnet or network number. Finally, if none of these match, the default route is used.

When a host evaluates information that it receives via RIP, its interpretation of an address depends upon whether it knows the subnet mask that applies to the net. If so, then it is possible to determine the meaning of the address. For example, consider net 128.6. It has a subnet mask of 255.255.255.0. Thus 128.6.0.0 is a network number, 128.6.4.0 is a subnet number, and 128.6.4.1 is a host address. However, if the host does not know the subnet mask, evaluation of an address may be ambiguous. If there is a non-zero host part, there is no clear way to determine whether the address represents a subnet number or a host address. As a subnet number would be useless without the subnet mask, addresses are assumed to represent hosts in this situation.
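The 128.6 example can be worked through in a few lines of Python using the standard `ipaddress` module. This is a sketch for illustration only: the function name `classify` and its return strings are invented here, and net 128.6 is treated as a 16-bit network (128.6.0.0/16), which is an assumption consistent with its class B address.

```python
import ipaddress


def classify(address, network="128.6.0.0/16", subnet_mask="255.255.255.0"):
    """Interpret an address field for a host that knows the subnet mask."""
    addr = ipaddress.IPv4Address(address)
    net = ipaddress.IPv4Network(network)
    mask = int(ipaddress.IPv4Address(subnet_mask))

    if int(addr) == 0:
        return "default route"
    if addr not in net:
        return "outside this network"
    if addr == net.network_address:
        return "network number"
    # Bits not covered by the subnet mask form the host part.
    host_part = int(addr) & ~mask & 0xFFFFFFFF
    return "subnet number" if host_part == 0 else "host address"
```

Without the mask, the last test is not possible, which is why an address with a non-zero host part is then simply assumed to be a host.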
In order to avoid this sort of ambiguity, hosts must not send subnet routes to hosts that cannot be expected to know the appropriate subnet mask. Normally hosts only know the subnet masks for directly-connected networks. Therefore, unless special provisions have been made, routes to a subnet must not be sent outside the network of which the subnet is a part. This filtering is carried out by the gateways at the "border" of the



subnetted network. These are gateways that connect that network with some other network. Within the subnetted network, each subnet is treated as an individual network. Routing entries for each subnet are circulated by RIP. However, border gateways send only a single entry for the network as a whole to hosts in other networks. This means that a border gateway will send different information to different neighbors. For neighbors connected to the subnetted network, it generates a list of all subnets to which it is directly connected, using the subnet number. For neighbors connected to other networks, it makes a single entry for the network as a whole, showing the metric associated with that network. (This metric would normally be the smallest metric for the subnets to which the gateway is attached.) Similarly, border gateways must not mention host routes for hosts within one of the directly-connected networks in messages to other networks. Those routes will be subsumed by the single entry for the network as a whole. We do not specify what to do with host routes for "distant" hosts (i.e., hosts not part of one of the directly- connected networks). Generally, these routes indicate some host that is reachable via a route that does not support other hosts on the network of which the host is a part. The special address 0.0.0.0 is used to describe a default route. A default route is used when it is not convenient to list every possible network in the RIP updates, and when one or more closely- connected gateways in the system are prepared to handle traffic to the networks that are not listed explicitly. These gateways should create RIP entries for the address 0.0.0.0, just as if it were a network to which they are connected. The decision as to how gateways create entries for 0.0.0.0 is left to the implementor. Most commonly, the system administrator will be provided with a way to specify which gateways should create entries for 0.0.0.0. 
However, other mechanisms are possible. For example, an implementor might decide that any gateway that speaks EGP should be declared to be a default gateway. It may be useful to allow the network administrator to choose the metric to be used in these entries. If there is more than one default gateway, this will make it possible to express a preference for one over the other. The entries for 0.0.0.0 are handled by RIP in exactly the same manner as if there were an actual network with this address. However, the entry is used to route any datagram whose destination address does not match any other network in the table. Implementations are not required to support this convention. However, it is strongly recommended. Implementations that do not support 0.0.0.0 must ignore entries with this address.



In such cases, they must not pass the entry on in their own RIP updates. System administrators should take care to make sure that routes to 0.0.0.0 do not propagate further than is intended. Generally, each autonomous system has its own preferred default gateway. Thus, routes involving 0.0.0.0 should generally not leave the boundary of an autonomous system. The mechanisms for enforcing this are not specified in this document. 3.3. Timers This section describes all events that are triggered by timers. Every 30 seconds, the output process is instructed to generate a complete response to every neighboring gateway. When there are many gateways on a single network, there is a tendency for them to synchronize with each other such that they all issue updates at the same time. This can happen whenever the 30 second timer is affected by the processing load on the system. It is undesirable for the update messages to become synchronized, since it can lead to unnecessary collisions on broadcast networks. Thus, implementations are required to take one of two precautions. - The 30-second updates are triggered by a clock whose rate is not affected by system load or the time required to service the previous update timer. - The 30-second timer is offset by addition of a small random time each time it is set. There are two timers associated with each route, a "timeout" and a "garbage-collection time". Upon expiration of the timeout, the route is no longer valid. However, it is retained in the table for a short time, so that neighbors can be notified that the route has been dropped. Upon expiration of the garbage-collection timer, the route is finally removed from the tables. The timeout is initialized when a route is established, and any time an update message is received for the route. If 180 seconds elapse from the last time the timeout was initialized, the route is considered to have expired, and the deletion process which we are about to describe is started for it. 
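The second precaution (a small random offset on the 30-second timer) and the 180-second route timeout can be sketched as follows. The names `next_update_delay` and `expired_routes`, and the 5-second upper bound on the random offset, are assumptions made for this illustration; the text above does not say how large the offset should be.

```python
import random

UPDATE_INTERVAL = 30.0   # seconds between regular updates
ROUTE_TIMEOUT = 180.0    # seconds without an update before a route expires


def next_update_delay(max_jitter=5.0, rng=random):
    """Delay until the next regular update, offset by a small random time.

    `max_jitter` is a hypothetical bound chosen for this sketch.
    """
    return UPDATE_INTERVAL + rng.uniform(0.0, max_jitter)


def expired_routes(last_heard, now):
    """Return destinations whose timeout has run out.

    `last_heard` maps destination -> time the timeout was last initialized.
    """
    return [dest for dest, t in last_heard.items() if now - t >= ROUTE_TIMEOUT]
```

Because each gateway draws its own offset every time the timer is set, gateways that happen to fall into step drift apart again instead of staying synchronized.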
Deletions can occur for one of two reasons: (1) the timeout expires, or (2) the metric is set to 16 because of an update received from the current gateway. (See section 3.4.2 for a discussion of processing updates from other gateways.) In either case, the following events happen:

- The garbage-collection timer is set for 120 seconds.

- The metric for the route is set to 16 (infinity). This causes the route to be removed from service.

- A flag is set noting that this entry has been changed, and the output process is signalled to trigger a response.

Until the garbage-collection timer expires, the route is included in all updates sent by this host, with a metric of 16 (infinity). When the garbage-collection timer expires, the route is deleted from the tables.

Should a new route to this network be established while the garbage-collection timer is running, the new route will replace the one that is about to be deleted. In this case the garbage-collection timer must be cleared.

See section 3.5 for a discussion of a delay that is required in carrying out triggered updates. Although implementation of that delay will require a timer, it is more natural to discuss it in section 3.5 than here.

3.4. Input processing

This section will describe the handling of datagrams received on UDP port 520. Before processing the datagrams in detail, certain general format checks must be made. These depend upon the version number field in the datagram, as follows:

- Version 0: Datagrams whose version number is zero are to be ignored. These are from a previous version of the protocol, whose packet format was machine-specific.

- Version 1: Datagrams whose version number is one are to be processed as described in the rest of this specification. All fields that are described above as "must be zero" are to be checked. If any such field contains a non-zero value, the entire message is to be ignored.

- Version greater than 1: Datagrams whose version numbers are greater than one are to be processed as described in the rest of this specification. All fields that are described above as "must be zero" are to be ignored. Future versions of the protocol may put data into these fields. Version 1 implementations are to ignore this extra data and process only the fields specified in this document.

After checking the version number and doing any other preliminary checks, processing will depend upon the value in the command field.

3.4.1. Request

Request is used to ask for a response containing all or part of the host's routing table. [Note that the term host is used for either host or gateway; in most cases it would be unusual for a non-gateway host to send RIP messages.]

Normally, requests are sent as broadcasts, from a UDP source port of 520. In this case, silent processes do not respond to the request. Silent processes are by definition processes for which we normally do not want to see routing information. However, there may be situations involving gateway monitoring where it is desired to look at the routing table even for a silent process. In this case, the request should be sent from a UDP port number other than 520. If a request comes from port 520, silent processes do not respond. If the request comes from any other port, processes must respond even if they are silent.

The request is processed entry by entry. If there are no entries, no response is given. There is one special case. If there is exactly one entry in the request, with an address family identifier of 0 (meaning unspecified), and a metric of infinity (i.e., 16 for current implementations), this is a request to send the entire routing table. In that case, a call is made to the output process to send the routing table to the requesting port.

Except for this special case, processing is quite simple. Go down the list of entries in the request one by one. For each entry, look up the destination in the host's routing database. If there is a route, put that route's metric in the metric field in the datagram.
If there isn't a route to the specified destination, put infinity (i.e., 16) in the metric field in the datagram. Once all the entries have been filled in, set the command to response and send the datagram back to the port from which it came.

Note that there is a difference in handling depending upon whether the request is for a specified set of destinations, or for a complete routing table. If the request is for a complete host table, normal output processing is done. This includes split horizon (see section 2.2.1) and subnet hiding (section 3.2), so that certain entries from the routing table will not be shown. If the request is for specific entries, they are looked up in the host table and the information is returned. No split horizon processing is done, and subnets are returned if requested. We anticipate that these requests are likely to be used for different purposes. When a host first comes up, it broadcasts requests on every connected network asking for a complete routing table. In general, we assume that complete routing tables are likely to be used to update another host's routing table. For this reason, split horizon and all other filtering must be used. Requests for specific networks are made only by diagnostic software, and are not used for routing. In this case, the requester would want to know the exact contents of the routing database, and would not want any information hidden.

3.4.2. Response

Responses can be received for several different reasons:

- response to a specific query
- regular updates
- triggered updates triggered by a metric change

Processing is the same no matter how responses were generated.

Because processing of a response may update the host's routing table, the response must be checked carefully for validity. The response must be ignored if it is not from port 520. The IP source address should be checked to see whether the datagram is from a valid neighbor. The source of the datagram must be on a directly-connected network. It is also worth checking to see whether the response is from one of the host's own addresses. Interfaces on broadcast networks may receive copies of their own broadcasts immediately. If a host processes its own output as new input, confusion is likely, and such datagrams must be ignored (except as discussed in the next paragraph).

Before actually processing a response, it may be useful to use its presence as input to a process for keeping track of interface status. As mentioned above, we time out a route when we haven't heard from its gateway for a certain amount of time.
This works fine for routes that come from another gateway. It is also desirable to know when one of our own directly-connected networks has failed. This document does not specify any particular method for doing this, as such methods depend upon the characteristics of the network and the hardware interface to it. However, such methods often involve listening for datagrams arriving on the interface. Arriving datagrams can be used as an indication that the interface is working. However, some caution must be used, as it is possible for interfaces to fail in such a way that input datagrams are received, but output datagrams are never sent successfully.

Now that the datagram as a whole has been validated, process the entries in it one by one. Again, start by doing validation. If the metric is greater than infinity, ignore the entry. (This should be impossible, if the other host is working correctly. Incorrect metrics and other format errors should probably cause alerts or be logged.) Then look at the destination address. Check the address family identifier. If it is not a value which is expected (e.g., 2 for Internet addresses), ignore the entry. Now check the address itself for various kinds of inappropriate addresses. Ignore the entry if the address is class D or E, if it is on net 0 (except for 0.0.0.0, if we accept default routes) or if it is on net 127 (the loopback network). Also, test for a broadcast address, i.e., anything whose host part is all ones on a network that supports broadcast, and ignore any such entry. If the implementor has chosen not to support host routes (see section 3.2), check to see whether the host portion of the address is non-zero; if so, ignore the entry.

Recall that the address field contains a number of unused octets. If the version number of the datagram is 1, they must also be checked. If any of them is nonzero, the entry is to be ignored. (Many of these cases indicate that the host from which the message came is not working correctly. Thus some form of error logging or alert should be triggered.)

Update the metric by adding the cost of the network on which the message arrived. If the result is greater than 16, use 16. That is,

metric = MIN (metric + cost, 16)

Now look up the address to see whether this is already a route for it. In general, if not, we want to add one.
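
Some of the per-entry checks above, together with the metric update, can be sketched in Python. This is a partial, illustrative sketch only: the function names are assumptions, and the broadcast-address and host-route tests are left out because they depend on local interface and subnet configuration.

```python
import ipaddress

INFINITY = 16


def acceptable_destination(addr, accept_default=True):
    """Apply the class D/E, net 0 and net 127 checks to a dotted-quad address."""
    ip = ipaddress.IPv4Address(addr)
    first_octet = int(ip) >> 24
    if first_octet >= 224:      # class D (224-239) or class E (240-255)
        return False
    if first_octet == 127:      # loopback network
        return False
    if first_octet == 0:        # net 0, except the default route 0.0.0.0
        return accept_default and int(ip) == 0
    return True


def updated_metric(metric, cost):
    """metric = MIN(metric + cost, 16)."""
    return min(metric + cost, INFINITY)
```
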
However, there are various exceptions. If the metric is infinite, don't add an entry. (We would update an existing one, but we don't add new entries with infinite metric.) We want to avoid adding routes to hosts if the host is part of a net or subnet for which we have at least as good a route. If neither of these exceptions applies, add a new entry to the routing database. This includes the following actions:

- Set the destination and metric to those from the datagram.

- Set the gateway to be the host from which the datagram came.

- Initialize the timeout for the route. If the garbage-collection timer is running for this route, stop it. (See section 3.3 for a discussion of the timers.)

- Set the route change flag, and signal the output process to trigger an update (see 3.5).

If there is an existing route, first compare gateways. If this datagram is from the same gateway as the existing route, reinitialize the timeout. Next compare metrics. If the datagram is from the same gateway as the existing route and the new metric is different than the old one, or if the new metric is lower than the old one, do the following actions:

- Adopt the route from the datagram. That is, put the new metric in, and set the gateway to be the host from which the datagram came.

- Initialize the timeout for the route.

- Set the route change flag, and signal the output process to trigger an update (see 3.5).

- If the new metric is 16 (infinity), the deletion process is started.

If the new metric is 16 (infinity), this starts the process for deleting the route. The route is no longer used for routing packets, and the deletion timer is started (see section 3.3). Note that a deletion is started only when the metric is first set to 16. If the metric was already 16, then a new deletion is not started. (Starting a deletion sets a timer. The concern is that we do not want to reset the timer every 30 seconds, as new messages arrive with an infinite metric.)

If the new metric is the same as the old one, it is simplest to do nothing further (beyond reinitializing the timeout, as specified above). However, the 4BSD routed uses an additional heuristic here. Normally, it is senseless to change to a route with the same metric as the existing route but a different gateway. If the existing route is showing signs of timing out, though, it may be better to switch to an equally-good alternative route immediately, rather than waiting for the timeout to happen. (See section 3.3 for a discussion of timeouts.) Therefore, if the new metric is the same as the old one, routed looks at the timeout for the existing route. If it is at least halfway to the expiration point, routed switches to the new route. That is, the gateway is changed to the source of the current message. This heuristic is optional.

Any entry that fails these tests is ignored, as it is no better than the current route.

3.5. Output Processing

This section describes the processing used to create response messages that contain all or part of the routing table. This processing may be triggered in any of the following ways:

- by input processing when a request is seen. In this case, the resulting message is sent to only one destination.

- by the regular routing update. Every 30 seconds, a response containing the whole routing table is sent to every neighboring gateway. (See section 3.3.)

- by triggered updates. Whenever the metric for a route is changed, an update is triggered. (The update may be delayed; see below.)

Before describing the way a message is generated for each directly-connected network, we will comment on how the destinations are chosen for the latter two cases. Normally, when a response is to be sent to all destinations (that is, either the regular update or a triggered update is being prepared), a response is sent to the host at the opposite end of each connected point-to-point link, and a response is broadcast on all connected networks that support broadcasting. Thus, one response is prepared for each directly-connected network and sent to the corresponding (destination or broadcast) address. In most cases, this reaches all neighboring gateways. However, there are some cases where this may not be good enough. This may involve a network that does not support broadcast (e.g., the ARPANET), or a situation involving dumb gateways.
In such cases, it may be necessary to specify an actual list of neighboring hosts and gateways, and send a datagram to each one explicitly. It is left to the implementor to determine whether such a mechanism is needed, and to define how the list is specified.



Triggered updates require special handling for two reasons. First, experience shows that triggered updates can cause excessive loads on networks with limited capacity or with many gateways on them. Thus the protocol requires that implementors include provisions to limit the frequency of triggered updates. After a triggered update is sent, a timer should be set for a random time between 1 and 5 seconds. If other changes that would trigger updates occur before the timer expires, a single update is triggered when the timer expires, and the timer is then set to another random value between 1 and 5 seconds. Triggered updates may be suppressed if a regular update is due by the time the triggered update would be sent. Second, triggered updates do not need to include the entire routing table. In principle, only those routes that have changed need to be included. Thus messages generated as part of a triggered update must include at least those routes that have their route change flag set. They may include additional routes, or all routes, at the discretion of the implementor; however, when full routing updates require multiple packets, sending all routes is strongly discouraged. When a triggered update is processed, messages should be generated for every directly-connected network. Split horizon processing is done when generating triggered updates as well as normal updates (see below). If, after split horizon processing, a changed route will appear identical on a network as it did previously, the route need not be sent; if, as a result, no routes need be sent, the update may be omitted on that network. (If a route had only a metric change, or uses a new gateway that is on the same network as the old gateway, the route will be sent to the network of the old gateway with a metric of infinity both before and after the change.) Once all of the triggered updates have been generated, the route change flags should be cleared. 
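
The rate limit on triggered updates can be sketched in Python as a small state machine. This is an illustrative sketch, not a prescribed implementation: the class and method names are assumptions, and the caller is assumed to supply the current time in seconds and to call timer_expired when its own timer fires.

```python
import random


class TriggeredUpdateLimiter:
    """Coalesce triggered updates using a random 1-5 second timer."""

    def __init__(self, rng=random):
        self.rng = rng
        self.hold_until = 0.0   # time at which the current timer expires
        self.pending = False    # a change arrived while the timer was running

    def route_changed(self, now):
        """Return True if a triggered update may be sent immediately."""
        if now >= self.hold_until:
            self._arm(now)
            return True
        self.pending = True     # remember the change; send when timer expires
        return False

    def timer_expired(self, now):
        """Call when the timer fires; True means send one combined update."""
        if self.pending and now >= self.hold_until:
            self.pending = False
            self._arm(now)
            return True
        return False

    def _arm(self, now):
        # After sending, wait a random time between 1 and 5 seconds.
        self.hold_until = now + self.rng.uniform(1.0, 5.0)
```

Any number of changes that arrive while the timer is running collapse into a single update, which is what keeps the load down on networks with many gateways.
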
If input processing is allowed while output is being generated, appropriate interlocking must be done. The route change flags should not be changed as a result of processing input while a triggered update message is being generated. The only difference between a triggered update and other update messages is the possible omission of routes that have not changed. The rest of the mechanisms about to be described must all apply to triggered updates. Here is how a response datagram is generated for a particular directly-connected network:



The IP source address must be the sending host's address on that network. This is important because the source address is put into routing tables in other hosts. If an incorrect source address is used, other hosts may be unable to route datagrams. Sometimes gateways are set up with multiple IP addresses on a single physical interface. Normally, this means that several logical IP networks are being carried over one physical medium. In such cases, a separate update message must be sent for each address, with that address as the IP source address. Set the version number to the current version of RIP. (The version described in this document is 1.) Set the command to response. Set the bytes labeled "must be zero" to zero. Now start filling in entries. To fill in the entries, go down all the routes in the internal routing table. Recall that the maximum datagram size is 512 bytes. When there is no more space in the datagram, send the current message and start a new one. If a triggered update is being generated, only entries whose route change flags are set need be included. See the description in Section 3.2 for a discussion of problems raised by subnet and host routes. Routes to subnets will be meaningless outside the network, and must be omitted if the destination is not on the same subnetted network; they should be replaced with a single route to the network of which the subnets are a part. Similarly, routes to hosts must be eliminated if they are subsumed by a network route, as described in the discussion in Section 3.2. If the route passes these tests, then the destination and metric are put into the entry in the output datagram. Routes must be included in the datagram even if their metrics are infinite. If the gateway for the route is on the network for which the datagram is being prepared, the metric in the entry is set to 16, or the entire entry is omitted. Omitting the entry is simple split horizon. 
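
The choice between omitting an entry and advertising it with metric 16 can be sketched in Python for a single route. The function name and the use of plain network labels are assumptions made for illustration.

```python
INFINITY = 16


def advertised_metric(metric, gateway_network, out_network, poisoned_reverse=True):
    """Return the metric to advertise on out_network, or None to omit the entry.

    gateway_network is the network on which the route's gateway lives
    (None for a directly-connected route).
    """
    if gateway_network is not None and gateway_network == out_network:
        # The route was learned from a gateway on this same network.
        return INFINITY if poisoned_reverse else None
    return metric
```
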
Including an entry with metric 16 is split horizon with poisoned reverse. See Section 2.2 for a more complete discussion of these alternatives.

3.6. Compatibility

The protocol described in this document is intended to interoperate with routed and other existing implementations of RIP. However, a different viewpoint is adopted about when to increment the metric than was used in most previous implementations. Using the previous perspective, the internal routing table has a metric of 0 for all directly-connected networks. The cost (which is always 1) is added to the metric when the route is sent in an update message. By contrast, in this document directly-connected networks appear in the internal routing table with metrics equal to their costs; the metrics are not necessarily 1. In this document, the cost is added to the metrics when routes are received in update messages. Metrics from the routing table are sent in update messages without change (unless modified by split horizon).

These two viewpoints result in identical update messages being sent. Metrics in the routing table differ by a constant one in the two descriptions. Thus, there is no difference in effect. The change was made because the new description makes it easier to handle situations where different metrics are used on directly-attached networks.

Implementations that only support network costs of one need not change to match the new style of presentation. However, they must follow the description given in this document in all other ways.

4. Control functions

This section describes administrative controls. These are not part of the protocol per se. However, experience with existing networks suggests that they are important. Because they are not a necessary part of the protocol, they are considered optional. However, we strongly recommend that at least some of them be included in every implementation.

These controls are intended primarily to allow RIP to be connected to networks whose routing may be unstable or subject to errors. Here are some examples:

It is sometimes desirable to limit the hosts and gateways from which information will be accepted. On occasion, hosts have been misconfigured in such a way that they begin sending inappropriate information.

A number of sites limit the set of networks that they allow in update messages.
Organization A may have a connection to organization B that they use for direct communication. For security or performance reasons A may not be willing to give other organizations access to that connection. In such cases, A should not include B's networks in updates that A sends to third parties.



Here are some typical controls. Note, however, that the RIP protocol does not require these or any other controls.

- a neighbor list - the network administrator should be able to define a list of neighbors for each host. A host would accept response messages only from hosts on its list of neighbors.

- allowing or disallowing specific destinations - the network administrator should be able to specify a list of destination addresses to allow or disallow. The list would be associated with a particular interface in the incoming or outgoing direction. Only allowed networks would be mentioned in response messages going out or processed in response messages coming in. If a list of allowed addresses is specified, all other addresses are disallowed. If a list of disallowed addresses is specified, all other addresses are allowed.
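
Both controls amount to simple membership tests. A minimal Python sketch follows; the function names and the use of sets of address strings are assumptions, and a real implementation would attach the lists to a particular interface and direction.

```python
def accept_response(source, neighbors=None):
    """Neighbor list: accept responses only from listed hosts (None = no list)."""
    return neighbors is None or source in neighbors


def destination_permitted(dest, allowed=None, disallowed=None):
    """Apply an allow list or a disallow list to one destination address."""
    if allowed is not None:
        return dest in allowed          # all other addresses are disallowed
    if disallowed is not None:
        return dest not in disallowed   # all other addresses are allowed
    return True
```
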

Chapter 7 OSPF Open Shortest Path First


A set of routing protocols that are used within an autonomous system are referred to as interior gateway protocols (IGP).

In contrast, an exterior gateway protocol determines network reachability between autonomous systems (AS) and relies on IGPs to resolve routes within an AS.

The interior gateway protocols can be divided into two categories: 1) Distance-vector routing protocol and 2) Link-state routing protocol.

Types of Interior gateway protocols

Distance-vector routing protocol

These protocols use the Bellman-Ford algorithm to calculate paths. In distance-vector routing protocols, a router does not possess information about the full network topology. It advertises its distances to known destinations and receives similar advertisements from other routers. Using these routing advertisements, each router populates its routing table. In the next advertisement cycle, a router advertises updated information from its routing table. This process continues until the routing tables of all routers converge to stable values.
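
One advertisement cycle at a single router can be sketched in Python as follows. This is a simplified sketch of the Bellman-Ford relaxation step, not the behavior of any particular protocol; the function name and the table layout (destination mapped to a distance and next hop) are assumptions.

```python
INFINITY = float("inf")


def merge_advertisement(table, neighbor, link_cost, advertised):
    """Merge a neighbor's distance vector into table; return True if it changed.

    table maps destination -> (distance, next_hop).
    advertised maps destination -> distance as seen by the neighbor.
    """
    changed = False
    for dest, dist in advertised.items():
        candidate = dist + link_cost
        current, next_hop = table.get(dest, (INFINITY, None))
        # Adopt a shorter path, or any new distance from the current next hop.
        if candidate < current or (next_hop == neighbor and candidate != current):
            table[dest] = (candidate, neighbor)
            changed = True
    return changed
```

The tables have converged once every call returns False for every neighbor.
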

This set of protocols has the disadvantage of slow convergence; however, such protocols are usually simple to handle and are well suited for use with small networks. Some examples of distance-vector routing protocols are:

1. Routing Information Protocol (RIP)
2. Interior Gateway Routing Protocol (IGRP)

Link-state routing protocol

In the case of Link-state routing protocols, each node possesses information about the complete network topology. Each node then independently calculates the best next hop from it for every possible destination in the network using local information of the topology. The collection of best next hops forms the routing table for the node.

This contrasts with distance-vector routing protocols, which work by having each node share its routing table with its neighbors. In a link-state protocol, the only information passed between the nodes is information used to construct the connectivity maps.

Examples of link-state routing protocols are:

1. Open Shortest Path First (OSPF)
2. Intermediate System to Intermediate System (IS-IS)



Open Shortest Path First

The Open Shortest Path First (OSPF) protocol is a link-state, hierarchical interior gateway protocol (IGP) for network routing. Dijkstra's algorithm is used to calculate the shortest path tree. It uses cost as its routing metric. Each router builds a link-state database describing the network topology; this database is identical on all routers in the same area.

OSPF is perhaps the most widely used IGP in large networks. It can operate securely, using MD5 to authenticate peers before forming adjacencies, and before accepting link-state advertisements (LSA). A natural successor to the Routing Information Protocol (RIP), it was VLSM-capable or classless from its inception. A newer version of OSPF (OSPFv3) now supports IPv6 as well. Multicast extensions to OSPF, the Multicast Open Shortest Path First (MOSPF) protocols, have been defined, but these are not widely used at present. OSPF can "tag" routes, and propagate the tags along with the routes.

An OSPF network can be broken up into smaller networks. A special area called the backbone area forms the core of the network, and other areas are connected to it. Inter-area routing goes via the backbone. All areas must connect to the backbone; if no direct connection is possible, a virtual link may be established.

Routers in the same broadcast domain or at each end of a point-to-point telecommunications link form adjacencies when they have detected each other. This detection occurs when a router "sees" itself in a hello packet. This is called a two way state and is the most basic relationship. The routers elect a designated router (DR) and a backup designated router (BDR) which act as a hub to reduce traffic between routers. OSPF uses both unicast and multicast to send "hello packets" and link state updates. Multicast addresses 224.0.0.5 and 224.0.0.6 are reserved for OSPF. In contrast to the Routing Information Protocol (RIP) or the Border Gateway Protocol (BGP), OSPF does not use TCP or UDP but uses IP directly, via IP protocol 89.

Background

Open Shortest Path First (OSPF) is a routing protocol developed for Internet Protocol (IP) networks by the Interior Gateway Protocol (IGP) working group of the Internet Engineering Task Force (IETF). The working group was formed in 1988 to design an IGP based on the Shortest Path First (SPF) algorithm for use in the Internet. Similar to the Interior Gateway Routing Protocol (IGRP), OSPF was created because in the mid-1980s, the Routing Information Protocol (RIP) was increasingly incapable of serving large, heterogeneous internetworks. This chapter examines the OSPF routing environment, underlying routing algorithm, and general protocol components.

OSPF was derived from several research efforts, including Bolt, Beranek, and Newman's (BBN's) SPF algorithm developed in 1978 for the ARPANET (a landmark packet-switching network developed in the early 1970s by BBN), Dr. Radia Perlman's research on fault-tolerant broadcasting of routing information (1988), BBN's work on area routing (1986), and an early version of OSI's Intermediate System-to-Intermediate System (IS-IS) routing protocol.

OSPF has two primary characteristics. The first is that the protocol is open, which means that its specification is in the public domain. The OSPF specification is published as Request For Comments (RFC) 1247. The second principal characteristic is that OSPF is based on the SPF algorithm, which sometimes is referred to as the Dijkstra algorithm, named for the person credited with its creation.

OSPF is a link-state routing protocol that calls for the sending of link-state advertisements (LSAs) to all other routers within the same hierarchical area. Information on attached interfaces, metrics used, and other variables is included in OSPF LSAs. As OSPF routers accumulate link-state information, they use the SPF algorithm to calculate the shortest path to each node.

As a link-state routing protocol, OSPF contrasts with RIP and IGRP, which are distance-vector routing protocols. Routers running the distance-vector algorithm send all or a portion of their routing tables in routing-update messages to their neighbors.

Routing Hierarchy

Unlike RIP, OSPF can operate within a hierarchy. The largest entity within the hierarchy is the autonomous system (AS), which is a collection of networks under a common administration that share a common routing strategy. OSPF is an intra-AS (interior gateway) routing protocol, although it is capable of receiving routes from and sending routes to other ASs.

An AS can be divided into a number of areas, which are groups of contiguous networks and attached hosts. Routers with multiple interfaces can participate in multiple areas. These routers, which are called Area Border Routers, maintain separate topological databases for each area.

A topological database is essentially an overall picture of networks in relationship to routers. The topological database contains the collection of LSAs received from all routers in the same area. Because routers within the same area share the same information, they have identical topological databases.

The term domain sometimes is used to describe a portion of the network in which all routers have identical topological databases. Domain is frequently used interchangeably with AS.

An area's topology is invisible to entities outside the area. By keeping area topologies separate, OSPF passes less routing traffic than it would if the AS were not partitioned.



Area partitioning creates two different types of OSPF routing, depending on whether the source and the destination are in the same or different areas. Intra-area routing occurs when the source and destination are in the same area; interarea routing occurs when they are in different areas.

An OSPF backbone is responsible for distributing routing information between areas. It consists of all Area Border Routers, networks not wholly contained in any area, and their attached routers. Figure 46-1 shows an example of an internetwork with several areas.

In the figure, routers 4, 5, 6, 10, 11, and 12 make up the backbone. If Host H1 in Area 3 wants to send a packet to Host H2 in Area 2, the packet is sent to Router 13, which forwards the packet to Router 12, which sends the packet to Router 11. Router 11 then forwards the packet along the backbone to Area Border Router 10, which sends the packet through two intra-area routers (Router 9 and Router 7) to be forwarded to Host H2.

The backbone itself is an OSPF area, so all backbone routers use the same procedures and algorithms to maintain routing information within the backbone that any area router would. The backbone topology is invisible to all intra-area routers, as are individual area topologies to the backbone.

Areas can be defined in such a way that the backbone is not contiguous. In this case, backbone connectivity must be restored through virtual links. Virtual links are configured between any backbone routers that share a link to a nonbackbone area and function as if they were direct links.

Figure 46-1 An OSPF AS Consists of Multiple Areas Linked by Routers



AS border routers running OSPF learn about exterior routes through exterior gateway protocols (EGPs), such as Exterior Gateway Protocol (EGP) or Border Gateway Protocol (BGP), or through configuration information. For more information about these protocols, see the chapter on the Border Gateway Protocol.

SPF Algorithm

The Shortest Path First (SPF) routing algorithm is the basis for OSPF operations. When an SPF router is powered up, it initializes its routing-protocol data structures and then waits for indications from lower-layer protocols that its interfaces are functional.

After a router is assured that its interfaces are functioning, it uses the OSPF Hello protocol to acquire neighbors, which are routers with interfaces to a common network. The router sends hello packets to its neighbors and receives their hello packets. In addition to helping acquire neighbors, hello packets also act as keepalives to let routers know that other routers are still functional.

On multiaccess networks (networks supporting more than two routers), the Hello protocol elects a designated router and a backup designated router. Among other things, the designated router is responsible for generating LSAs for the entire multiaccess network. Designated routers allow a reduction in network traffic and in the size of the topological database.

When the link-state databases of two neighboring routers are synchronized, the routers are said to be adjacent. On multiaccess networks, the designated router determines which routers should become adjacent. Topological databases are synchronized between pairs of adjacent routers. Adjacencies control the distribution of routing-protocol packets, which are sent and received only on adjacencies.

Each router periodically sends an LSA to provide information on a router's adjacencies or to inform others when a router's state changes. By comparing established adjacencies to link states, failed routers can be detected quickly, and the network's topology can be altered appropriately. From the topological database generated from LSAs, each router calculates a shortest-path tree, with itself as root. The shortest-path tree, in turn, yields a routing table.
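
The step from topological database to routing table can be sketched in code. The example below is a minimal illustration, assuming the database has already been reduced to a dictionary of router-to-neighbor link costs; the router names, the costs, and the function names are hypothetical and are not part of OSPF itself. It builds the shortest-path tree with Dijkstra's algorithm, with the local router as root, and then reads a next hop for every destination off the tree.

```python
import heapq


def shortest_path_tree(link_state_db, root):
    """Run Dijkstra's algorithm over a link-state database.

    link_state_db maps each router to {neighbor: link cost}.
    Returns (cost, parent) dictionaries describing the tree rooted at root.
    """
    cost = {root: 0}
    parent = {root: None}
    queue = [(0, root)]
    finished = set()
    while queue:
        path_cost, router = heapq.heappop(queue)
        if router in finished:
            continue  # stale queue entry: this router is already settled
        finished.add(router)
        for neighbor, link_cost in link_state_db.get(router, {}).items():
            candidate = path_cost + link_cost
            if neighbor not in cost or candidate < cost[neighbor]:
                cost[neighbor] = candidate
                parent[neighbor] = router
                heapq.heappush(queue, (candidate, neighbor))
    return cost, parent


def routing_table(cost, parent, root):
    """Walk up the tree from each destination to find its first hop."""
    table = {}
    for destination in cost:
        if destination == root:
            continue
        hop = destination
        while parent[hop] != root:
            hop = parent[hop]
        table[destination] = (hop, cost[destination])
    return table


# Hypothetical four-router topology; every link is listed in both directions.
LSDB = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 2, "D": 4},
    "C": {"A": 5, "B": 2, "D": 1},
    "D": {"B": 4, "C": 1},
}

cost, parent = shortest_path_tree(LSDB, "A")
table = routing_table(cost, parent, "A")
```

Here router A reaches D through B and then C at a total cost of 4, so the routing-table entry for D names B as the next hop.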

Packet Format

All OSPF packets begin with a 24-byte header, as illustrated in Figure 46-2.

Figure 46-2 OSPF Packets Consist of Nine Fields

The following descriptions summarize the header fields illustrated in Figure 46-2.

• Version number—Identifies the OSPF version used.

• Type—Identifies the OSPF packet type as one of the following:

– Hello—Establishes and maintains neighbor relationships.

– Database description—Describes the contents of the topological database. These messages are exchanged when an adjacency is initialized.



– Link-state request—Requests pieces of the topological database from neighbor routers. These messages are exchanged after a router discovers (by examining database-description packets) that parts of its topological database are outdated.

– Link-state update—Responds to a link-state request packet. These messages also are used for the regular dispersal of LSAs. Several LSAs can be included within a single link-state update packet.

– Link-state acknowledgment—Acknowledges link-state update packets.

• Packet length—Specifies the packet length, including the OSPF header, in bytes.

• Router ID—Identifies the source of the packet.

• Area ID—Identifies the area to which the packet belongs. All OSPF packets are associated with a single area.

• Checksum—Checks the entire packet contents for any damage suffered in transit.

• Authentication type—Contains the authentication type. All OSPF protocol exchanges are authenticated. The authentication type is configurable on a per-area basis.

• Authentication—Contains authentication information.

• Data—Contains encapsulated upper-layer information.
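
The header fields listed above can be decoded with a few lines of code. The sketch below is illustrative only: the field widths (1, 1, 2, 4, 4, 2, 2, and 8 bytes) and the numeric type codes 1 to 5 are taken from the OSPF version 2 specification rather than from the text above, and the sample packet is built by hand, not captured from a network.

```python
import ipaddress
import struct
from collections import namedtuple

# Version, Type, Packet length, Router ID, Area ID, Checksum,
# Authentication type, Authentication: 24 bytes in network byte order.
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")

PACKET_TYPES = {
    1: "Hello",
    2: "Database description",
    3: "Link-state request",
    4: "Link-state update",
    5: "Link-state acknowledgment",
}

OspfHeader = namedtuple(
    "OspfHeader",
    "version packet_type length router_id area_id checksum auth_type authentication",
)


def parse_ospf_header(packet):
    """Decode the fixed header at the start of an OSPF packet."""
    if len(packet) < OSPF_HEADER.size:
        raise ValueError("packet shorter than the 24-byte OSPF header")
    (version, type_code, length, router_id, area_id,
     checksum, auth_type, auth) = OSPF_HEADER.unpack_from(packet)
    return OspfHeader(
        version=version,
        packet_type=PACKET_TYPES.get(type_code, "unknown (%d)" % type_code),
        length=length,
        router_id=str(ipaddress.IPv4Address(router_id)),
        area_id=str(ipaddress.IPv4Address(area_id)),
        checksum=checksum,
        auth_type=auth_type,
        authentication=auth,
    )


# Hand-built sample: a version 2 Hello from router 192.0.2.1 in area 0.0.0.0,
# followed by 20 placeholder bytes standing in for the data field.
sample = OSPF_HEADER.pack(
    2, 1, 44,
    ipaddress.IPv4Address("192.0.2.1").packed,
    ipaddress.IPv4Address("0.0.0.0").packed,
    0, 0, bytes(8),
) + bytes(20)

header = parse_ospf_header(sample)
```

Everything after the first 24 bytes is the data field, whose layout depends on the packet type.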

Additional OSPF Features

Additional OSPF features include equal-cost multipath routing and routing based on upper-layer type-of-service (TOS) requests. TOS-based routing supports those upper-layer protocols that can specify particular types of service. An application, for example, might specify that certain data is urgent. If OSPF has high-priority links at its disposal, these can be used to transport the urgent datagram.

OSPF supports one or more metrics. If only one metric is used, it is considered to be arbitrary, and TOS is not supported. If more than one metric is used, TOS is optionally supported through the use of a separate metric (and, therefore, a separate routing table) for each of the eight combinations created by the three IP TOS bits (the delay, throughput, and reliability bits). For example, if the IP TOS bits specify low delay, low throughput, and high reliability, OSPF calculates routes to all destinations based on this TOS designation.

IP subnet masks are included with each advertised destination, enabling variable-length subnet masks. With variable-length subnet masks, an IP network can be broken into many subnets of various sizes. This provides network administrators with extra network-configuration flexibility.
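
As a small worked example of variable-length subnet masks, the sketch below splits one network into subnets of three different sizes using Python's standard ipaddress module. The address block 192.168.10.0/24 and the names lan_a, lan_b, link_1, and link_2 are arbitrary choices made for illustration.

```python
import ipaddress


def usable_hosts(subnet):
    """Addresses left after reserving the network and broadcast addresses."""
    return subnet.num_addresses - 2


site = ipaddress.ip_network("192.168.10.0/24")

# Split the /24 in half, keep the first half as one large LAN ...
lan_a, remainder = site.subnets(new_prefix=25)
# ... split the second half again for a medium LAN ...
lan_b, link_pool = remainder.subnets(new_prefix=26)
# ... and carve two small point-to-point subnets from what is left.
link_1, link_2 = list(link_pool.subnets(new_prefix=30))[:2]

for subnet in (lan_a, lan_b, link_1, link_2):
    print(subnet, subnet.netmask, usable_hosts(subnet))
```

Because each advertised destination carries its own mask, all four subnets can be advertised side by side even though their prefix lengths differ.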

Review Questions

Q—When using OSPF, can you have two areas attached to each other where only one area has an interface in Area 0?

A—Yes, you can. This describes the use of a virtual link. One area has an interface in Area 0 (legal), and the other area is brought up and attached off an ABR in Area 1, so we'll call it Area 2. Area 2 has no interface in Area 0, so it must have a virtual link to Area 0 through Area 1. When this is in place, Area 2 looks like it is directly connected to Area 0. When Area 1 wants to send packets to Area 2, it must send them to Area 0, which in turn redirects them back through Area 1 using the virtual link to Area 2.

Q—Area 0 contains five routers (A, B, C, D, and E), and Area 1 contains three routers (R, S, and T). Which routers does Router T know exist? Router S is the ABR.

A—Router T knows about routers R and S only. Likewise, Router S only knows about R and T, as well as routers to the ABR in Area 0. Areas are kept separate so that router updates contain only the information needed for that area.

Chapter 8 BGP Border Gateway Protocol


The Border Gateway Protocol (BGP)

The Border Gateway Protocol (BGP) is the core routing protocol of the Internet. It works by maintaining a table of IP networks, or 'prefixes', which designate network reachability among autonomous systems (AS). It is described as a path vector protocol. BGP does not use traditional IGP metrics, but makes routing decisions based on path, network policies and/or rule sets. As of January 2006, the current version of BGP, version 4, is codified in RFC 4271.

BGP supports Classless Inter-Domain Routing and uses route aggregation to decrease the size of routing tables. Since 1994, version four of the protocol has been in use on the Internet. All previous versions are now obsolete.

BGP was created to replace the EGP routing protocol and to allow fully decentralized routing, which made it possible to remove the NSFNet Internet backbone network. This allowed the Internet to become a truly decentralized system.

Very large private IP networks can also make use of BGP. An example would be the joining of a number of large Open Shortest Path First (OSPF) networks where OSPF by itself would not scale to size. Another reason to use BGP would be multihoming a network for better redundancy.

Most Internet users do not use BGP directly. However, since most Internet service providers must use BGP to establish routing between one another (especially if they are multihomed), it is one of the most important protocols of the Internet. Compare this with Signaling System #7, which is the inter-provider core call setup protocol on the PSTN.

BGP operation

BGP neighbors, or peers, are established by manual configuration between routers to create a TCP session on port 179. A BGP speaker will periodically send 19-byte keep-alive messages to maintain the connection (every 60 seconds by default). Among routing protocols, BGP is unique in using TCP as its transport protocol.
When BGP is running inside an autonomous system (AS), it is referred to as Internal BGP (IBGP, Interior Border Gateway Protocol). IBGP routes have an administrative distance of 200. When BGP runs between ASs, it is called External BGP (EBGP, Exterior Border Gateway Protocol), and it has an administrative distance of 20. A BGP router that routes IBGP traffic is called a transit router. Routers that sit on the boundary of an AS and that use EBGP to exchange information with the ISP are border or edge routers.

In the simplest arrangement, all routers within a single AS that participate in BGP routing must be configured in a full mesh: each router must be configured as a peer to every other router. This causes scaling problems, since the number of required connections grows quadratically with the number of routers involved. To get around this, two solutions are built into BGP: route reflectors (RFC 2796) and confederations (RFC 3065).
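
The quadratic growth is easy to see with a little arithmetic: a full mesh of n routers needs n(n-1)/2 sessions. The sketch below compares that with a route-reflector layout. The reflector formula is an assumption made for illustration (every non-reflector router peers with each reflector, and the reflectors peer with one another); real deployments may be arranged differently.

```python
def full_mesh_sessions(routers):
    """IBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers, reflectors=1):
    """Sessions needed under the assumed layout: each client peers with every
    reflector, and the reflectors form a full mesh among themselves."""
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (5, 10, 50, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
```

With 100 routers the full mesh needs 4950 sessions, while two reflectors bring the total down to 197 under these assumptions.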



Route reflectors reduce the number of connections required in an AS. A single router (or two for redundancy) can be made a route reflector; other routers in the AS need only be configured as peers to them. Confederations are used in very large networks where a large AS can be configured to encompass smaller, more manageable internal ASs. Confederations can be used in conjunction with route reflectors.

Finite state machine

In order to make decisions in its operations with other BGP peers, a BGP peer uses a simple finite state machine that consists of six states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established. For each peer-to-peer session, a BGP implementation maintains a state variable that tracks which of these six states the session is in. The BGP specification defines the messages that each peer should exchange in order to change the session from one state to another.

Introduction

The Border Gateway Protocol (BGP) is an inter-autonomous system routing protocol. It is built on experience gained with EGP as defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as described in RFC 1092 [2] and RFC 1093 [3].



The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the autonomous systems (AS's) that traffic must transit to reach these networks. This information is sufficient to construct a graph of AS connectivity from which routing loops may be pruned and policy decisions at an AS level may be enforced.

BGP runs over a reliable transport level protocol. This eliminates the need to implement explicit update fragmentation, retransmission, acknowledgement, and sequencing. Any authentication scheme used by the transport protocol may be used in addition to BGP's own authentication mechanisms.

The initial BGP implementation is based on TCP [4], however any reliable transport may be used. A message passing protocol such as VMTP [5] might be more natural for BGP. TCP will be used, however, since it is present in virtually all commercial routers and hosts. In the following descriptions the phrase "transport protocol connection" can be understood to refer to a TCP connection. BGP uses TCP port 179 for establishing its connections.

2. Summary of Operation

Two hosts form a transport protocol connection between one another. They exchange messages to open and confirm the connection parameters. The initial data flow is the entire BGP routing table. Incremental updates are sent as the routing tables change. Keep-alive messages are sent periodically to ensure the liveness of the connection. Notification messages are sent in response to errors or special conditions. If a connection encounters an error condition, a notification message is sent and the connection is optionally closed.

The hosts executing the Border Gateway Protocol need not be routers. A non-routing host could exchange routing information with routers via EGP or even an interior routing protocol.
That non-routing host could then use BGP to exchange routing information with a border gateway in another autonomous system. The implications and applications of this architecture are for further study.

If a particular AS has more than one BGP gateway, then all these gateways should have a consistent view of routing. A consistent view of the interior routes of the autonomous system is provided by the intra-AS routing protocol. A consistent view of the routes exterior to the AS may be provided in a variety of ways. One way is to use the BGP protocol to exchange routing information between the BGP gateways within a single AS. In this case, in order to maintain consistent routing information, these gateways MUST have direct BGP sessions with each other (the BGP sessions should form a complete graph). Note that this requirement does not imply that all BGP gateways within a single AS must have direct links to each other; other methods may be used to ensure consistent routing information.

3. Message Formats

This section describes message formats and actions to be taken when errors are detected while processing these messages.

Messages are sent over a reliable transport protocol connection. A message is processed after it is entirely received. The maximum message size is 1024 bytes. All implementations are required to support this maximum message size. The smallest message that may be sent consists of a BGP header without a data portion, or 8 bytes.

The phrase "the BGP connection is closed" means that the transport protocol connection has been closed and that all resources for that BGP connection have been deallocated. Routing table entries associated with the remote peer are marked as invalid. This information is passed to other BGP peers before being deleted from the system.

3.1 Message Header Format

Each message has a fixed size header. There may or may not be a data portion following the header, depending on the message type. The layout of these fields is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Marker            |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Type      |           Hold Time           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Marker: 16 bits

The Marker field is 16 bits of all ones. This field is used to mark the start of a message. If the first two bytes of a message are not all ones then we have a synchronization error and the BGP connection should be closed after sending a notification message with opcode 5 (connection not synchronized). No notification data is sent.

Length: 16 bits

The Length field is 16 bits. It is the total length of the message, including header, in bytes.
If an illegal length is encountered (more than 1024 bytes or less than 8 bytes), a notification message with opcode 6 (bad message length) and two data bytes of the bad length should be sent and the BGP connection closed.

Version: 8 bits

The Version field is 8 bits of protocol version number. The BGP version number for the message format described here is 1. If a bad version number is found, a notification message with opcode 8 (bad version number) should be sent and the BGP connection closed. The bad version number should be included in one byte of notification data.

Type: 8 bits

The Type field is 8 bits of message type code. The following type codes are defined:

   1 - OPEN
   2 - UPDATE
   3 - NOTIFICATION
   4 - KEEPALIVE
   5 - OPEN CONFIRM

If an unrecognized type value is found, a notification message with opcode 7 (bad type code) and data consisting of the byte of type field in question should be sent and the BGP connection closed.

Hold Time: 16 bits

This field contains the number of seconds that may elapse since receiving a BGP KEEPALIVE or BGP UPDATE message from our BGP peer before we declare an error and close the BGP connection.

3.2 OPEN Message Format

After a transport protocol connection is established, the first message sent by either side is an OPEN message. If the OPEN message is acceptable, an OPEN CONFIRM message confirming the OPEN is sent back. Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION messages may be exchanged.

In addition to the fixed size BGP header, the OPEN message contains the following fields.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     My Autonomous System      |   Link Type   |  Auth. Code   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                      Authentication Data                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

My Autonomous System: 16 bits

This field is our 16 bit autonomous system number. If there is a problem with this field, a notification message with opcode 9 (invalid AS field) should be sent and the BGP connection closed. No notification data is sent.



Link Type: 8 bits

The Link Type field is a single octet containing one of the following codes defining our position in the AS graph relative to our peer.

   0 - INTERNAL
   1 - UP
   2 - DOWN
   3 - H-LINK

UP indicates the peer is higher in the AS hierarchy, DOWN indicates lower, and H-LINK indicates at the same level. INTERNAL indicates that the peer is another BGP speaking host in our autonomous system. INTERNAL links are used to keep AS routing information consistent within an AS that has multiple border gateways. If the Link Type field is unacceptable, a notification message with opcode 1 (link type error in open) and data consisting of the expected link type should be sent and the BGP connection closed. The acceptable values for the Link Type fields of two BGP peers are discussed below.

Authentication Code: 8 bits

The Authentication Code field is an octet whose value describes the authentication mechanism being used. A value of zero indicates no BGP authentication. Note that a separate authentication mechanism may be used in establishing the transport level connection. If the authentication code is not recognized, a notification message with opcode 2 (unknown authentication code) and no data is sent and the BGP connection is closed.

Authentication Data: variable length

The Authentication Data field is a variable length field containing authentication data. If the value of Authentication Code field is zero, the Authentication Data field has zero length. If authentication fails, a notification message with opcode 3 (authentication failure) and no data is sent and the BGP connection is closed.

3.3 OPEN CONFIRM Message Format

An OPEN CONFIRM message is sent after receiving an OPEN message. This completes the BGP connection setup. UPDATE, NOTIFICATION, and KEEPALIVE messages may now be exchanged.

An OPEN CONFIRM message consists of a BGP header with an OPEN CONFIRM type code. There is no data in an OPEN CONFIRM message.
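
The header and OPEN layouts described above can be exercised with a short sketch. It follows the 8-byte header of this section (Marker, Length, Version, Type, Hold Time) and the fixed part of the OPEN message, and it raises an error carrying the notification opcode named in the text when a check fails. The names build_open, parse_header, parse_open, and BgpNotification are hypothetical, only authentication code zero is handled, and nothing is sent over a real TCP connection.

```python
import struct

HEADER = struct.Struct("!HHBBH")    # Marker, Length, Version, Type, Hold Time
OPEN_FIXED = struct.Struct("!HBB")  # My Autonomous System, Link Type, Auth. Code
MARKER = 0xFFFF                     # 16 bits of all ones
VERSION = 1
TYPE_OPEN = 1
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
                 4: "KEEPALIVE", 5: "OPEN CONFIRM"}
LINK_TYPES = {0: "INTERNAL", 1: "UP", 2: "DOWN", 3: "H-LINK"}


class BgpNotification(Exception):
    """Carries the opcode that the notification message would contain."""

    def __init__(self, opcode, text):
        super().__init__("opcode %d: %s" % (opcode, text))
        self.opcode = opcode


def build_open(my_as, link_type, hold_time):
    """Build an OPEN message with authentication code zero (no auth data)."""
    body = OPEN_FIXED.pack(my_as, link_type, 0)
    header = HEADER.pack(MARKER, HEADER.size + len(body),
                         VERSION, TYPE_OPEN, hold_time)
    return header + body


def parse_header(message):
    """Check the fixed header in the order the field descriptions give."""
    if len(message) < HEADER.size:
        raise ValueError("fewer than 8 bytes received")
    marker, length, version, type_code, hold_time = HEADER.unpack_from(message)
    if marker != MARKER:
        raise BgpNotification(5, "connection not synchronized")
    if length < HEADER.size or length > 1024:
        raise BgpNotification(6, "bad message length")
    if len(message) < length:
        raise ValueError("message not yet entirely received")
    if version != VERSION:
        raise BgpNotification(8, "bad version number")
    if type_code not in MESSAGE_TYPES:
        raise BgpNotification(7, "bad type code")
    return length, type_code, hold_time


def parse_open(message):
    length, type_code, hold_time = parse_header(message)
    if type_code != TYPE_OPEN or length < HEADER.size + OPEN_FIXED.size:
        raise ValueError("not a complete OPEN message")
    my_as, link_type, auth_code = OPEN_FIXED.unpack_from(message, HEADER.size)
    if link_type not in LINK_TYPES:
        raise BgpNotification(1, "link type error in open")
    if auth_code != 0:  # this sketch knows no other authentication codes
        raise BgpNotification(2, "unknown authentication code")
    return {"my_as": my_as, "link_type": LINK_TYPES[link_type],
            "hold_time": hold_time}


message = build_open(my_as=64512, link_type=1, hold_time=90)
fields = parse_open(message)
```

An OPEN message without authentication data is 12 bytes long: 8 bytes of header plus 4 bytes of OPEN fields.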
3.4 UPDATE Message Format

UPDATE messages are used to transfer routing information between BGP peers. The information in the UPDATE packet can be used to construct a graph describing the relationships of the various autonomous systems. By applying rules to be discussed, routing information loops and some other anomalies may be detected and removed from the inter-AS routing.

Whenever an error in an UPDATE message is detected, a notification message is sent with opcode 4 (bad update), a two byte subcode describing the nature of the problem, and a data field consisting of as much of the UPDATE message data portion as possible. UPDATE messages have the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Gateway                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AS count    |   Direction   |           AS Number           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      repeat (Direction, AS Number) pairs AS count times       |
   /                                                               /
   /                                                               /
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Net Count           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Network                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Metric            |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |        repeat (Network, Metric) pairs Net Count times         |
   /                                                               /
   /                                                               /
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Gateway: 32 bits

The Gateway field is the address of a gateway that has routes to the Internet networks listed in the rest of the UPDATE message. This gateway MUST belong to the same AS as the BGP peer who advertises it. If there is a problem with the gateway field, a notification message with subcode 6 (invalid gateway field) is sent.

AS count: 8 bits

This field is the count of Direction and AS Number pairs in this UPDATE message. If an incorrect AS count field is detected, subcode 1 (invalid AS count) is specified in the notification message.



Direction: 8 bits

The Direction field is an octet containing the direction taken by the routing information when exiting the AS defined by the succeeding AS Number field. The following values are defined.

   1 - UP (went up a link in the graph)
   2 - DOWN (went down a link in the graph)
   3 - H_LINK (horizontal link in the graph)
   4 - EGP_LINK (EGP derived information)
   5 - INCOMPLETE (incomplete information)

There is a special provision to pass exterior learned (non-BGP) routes over BGP. If an EGP learned route is passed over BGP, then the Direction field is set to EGP_LINK and the AS Number field is set to the AS number of the EGP peer that advertised this route. All other exterior-learned routes (non-BGP and non-EGP) may be passed by setting the AS Number field to zero and the Direction field to INCOMPLETE. If the direction code is not recognized, a notification message with subcode 2 (invalid direction code) is sent.

AS Number: 16 bits

This field is the AS number that transmitted the routing information. If there is a problem with this AS number, a notification message with subcode 3 (invalid autonomous system) is sent.

Net Count: 16 bits

The Net Count field is the number of Metric and Network field pairs which follow this field. If there is a problem with this field, a notification with subcode 7 (invalid net count field) is sent.

Network: 32 bits

The Network field is four bytes of Internet network number. If there is a problem with the network field, a notification message with subcode 8 (invalid network field) is sent.

Metric: 16 bits

The Metric field is 16 bits of an unspecified metric. BGP metrics are comparable ONLY if routes have exactly the same AS path. A metric of all ones indicates the network is unreachable. In all other cases the metric field is MEANINGLESS and MUST BE IGNORED. There are no illegal metric values.

3.5 NOTIFICATION Message Format

NOTIFICATION messages are sent when an error condition is detected.
The BGP connection is closed shortly after sending the notification message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Opcode            |              Data             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Opcode: 16 bits

The Opcode field describes the type of NOTIFICATION. The following opcodes have been defined.

   1 (*)  - link type error in open. Data is one byte of proper link type.
   2 (*)  - unknown authentication code. No data.
   3 (*)  - authentication failure. No data.
   4      - update error. See below for data description.
   5 (*)  - connection out of sync. No data.
   6 (*)  - invalid message length. Data is two bytes of bad length.
   7 (*)  - invalid message type. Data is one byte of bad message type.
   8 (*)  - invalid version number. Data is one byte of bad version.
   9 (*)  - invalid AS field in OPEN. No data.
   10 (*) - BGP Cease. No data.

The starred opcodes in the list above are considered fatal errors and cause transport connection termination.

The update error (opcode 4) has as data 16 bits of subcode followed by the last UPDATE message in question. After the subcode comes as much of the data portion of the UPDATE in question as possible. The following subcodes are defined:

   1 - invalid AS count
   2 - invalid direction code
   3 - invalid autonomous system
   4 - EGP_LINK or INCOMPLETE_LINK link type at other than the end of the AS path list
   5 - routing loop
   6 - invalid gateway field
   7 - invalid Net Count field
   8 - invalid network field

Data: variable

The Data field contains zero or more bytes of data to be used in diagnosing the reason for the NOTIFICATION. The contents of the Data field depend upon the opcode. See the opcode descriptions above for more details.

3.6 KEEPALIVE Message Format

BGP does not use any transport protocol based keepalive mechanism to determine if peers are reachable. Instead, KEEPALIVE messages are exchanged between peers often enough that the hold time (as advertised in the BGP header) does not expire. A reasonable minimum frequency of KEEPALIVE exchange would be one third of the Hold Time interval.

As soon as the Hold Time associated with a BGP peer has expired, the BGP connection is closed and BGP deallocates all resources associated with this peer.

The KEEPALIVE message is a BGP header without any data.

4. BGP Finite State Machine

This section specifies BGP operation in terms of a Finite State Machine (FSM). Following is a brief summary and overview of BGP operations by state as determined by this FSM. A condensed version of the BGP FSM is found in Appendix 1. Initially BGP is in the BGP_Idle state.

BGP_Idle state:

In this state BGP refuses all incoming BGP connections. No resources are allocated to the BGP neighbor. In response to the Start event (initiated by either system or operator) the local system initializes all BGP resources and changes its state to BGP_Active.

BGP_Active state:

In this state BGP is trying to acquire a BGP neighbor by opening a transport protocol connection. If the transport protocol open fails (for example, retransmission timeout), BGP stays in the BGP_Active state.

Otherwise, the local system sends an OPEN message to its peer, and changes its state to BGP_OpenSent. Since the hold time of the peer is still undetermined, the hold time is initialized to some large value.

In response to the Stop event (initiated by either system or operator) the local system releases all BGP resources and changes its state to BGP_Idle.

BGP_OpenSent state:

In this state BGP waits for an OPEN message from its peer. When an OPEN message is received, all fields are checked for correctness. If the initial BGP header checking detects an error, BGP deallocates all resources associated with this peer and returns to the BGP_Active state. Otherwise, the Link Type, Authentication Code, and Authentication Data fields are checked for correctness.

If the link type is incorrect, a NOTIFICATION message with opcode 1 (link type error in open) is sent.
The following combinations of link type fields are correct; all other combinations are invalid.

   Our view     Peer view
   UP           DOWN
   DOWN         UP
   INTERNAL     INTERNAL
   H-LINK       H-LINK

If the link between two peers is INTERNAL, then the AS number of both peers must be the same. Otherwise, a NOTIFICATION message with opcode 1 (link type error in open) is sent.

If both peers have the same AS number and the link type between these peers is not INTERNAL, then a NOTIFICATION message with opcode 1 (link type error in open) is sent.

If the value of the Authentication Code field is zero, any information in the Authentication Data field (if present) is ignored. If the Authentication Code field is non-zero it is checked for known authentication codes. If the authentication code is unknown, then the BGP NOTIFICATION message with opcode 2 (unknown authentication code) is sent.

If the Authentication Code value is non-zero, then the corresponding authentication procedure is invoked. The default values are a zero Authentication Code and no Authentication Data.

If any of the above tests detect an error, the local system closes the BGP connection and changes its state to BGP_Idle.

If there are no errors in the BGP OPEN message, BGP sends an OPEN CONFIRM message and goes into the BGP_OpenConfirm state. At this point the hold timer, which was originally set to some arbitrary large value (see above), is replaced with the value indicated in the OPEN message.

If disconnect notification is received from the underlying transport protocol or if the hold time expires, the local system closes the BGP connection and changes its state to BGP_Idle.

BGP_OpenConfirm state:

In this state BGP waits for an OPEN CONFIRM message. As soon as this message is received, BGP changes its state to BGP_Established. If the hold timer expires before an OPEN CONFIRM message is received, the local system closes the BGP connection and changes its state to BGP_Idle.

BGP_Established state:

In the BGP_Established state BGP can exchange UPDATE, NOTIFICATION, and KEEPALIVE messages with its peer.
If disconnect notification is received from the underlying transport protocol or if the hold time expires, the local system closes the BGP connection and changes its state to BGP_Idle.



In response to the Stop event initiated by either the system or operator, the local system sends a NOTIFICATION message with opcode 10 (BGP Cease), closes the BGP connection, and changes its state to BGP_Idle.

5. UPDATE Message Handling

A BGP UPDATE message may be received only in the BGP_Established state. When a BGP UPDATE message is received, each field is checked for validity. When a NOTIFICATION message is sent regarding an UPDATE, the opcode is always 4 (update error), the subcode depends on the type of error, and the rest of the data field is as much as possible of the data portion of the UPDATE that caused the error.

If the Gateway field is incorrect, a BGP NOTIFICATION message is sent with subcode 6 (invalid gateway field). All information in this UPDATE message is discarded.

If the AS Count field is less than or equal to zero, a BGP NOTIFICATION is sent with subcode 1 (invalid AS count). Otherwise, the complete AS path is extracted and checked as described below.

If one of the Direction fields in the AS route list is not defined, a BGP NOTIFICATION message is sent with subcode 2 (invalid direction code).

If one of the AS Number fields in the AS route list is incorrect, a BGP NOTIFICATION message is sent with subcode 3 (invalid autonomous system).

If either an EGP_LINK or an INCOMPLETE_LINK link type occurs at other than the end of the AS path, a BGP NOTIFICATION message is sent with subcode 4 (EGP_LINK or INCOMPLETE_LINK link type at other than the end of the AS path list).

If none of the above tests failed, the full AS route is checked for AS loops. AS loop detection is done by scanning the full AS route and checking that each AS in this route occurs only once. If an AS loop is detected, a BGP NOTIFICATION message is sent with subcode 5 (routing loop).

If any of the above errors are detected, no further processing is done. Otherwise, the complete AS path is correct and the rest of the UPDATE message is processed.

If the Net Count field is incorrect, a BGP NOTIFICATION message is sent with subcode 7 (invalid Net Count field).

Each network and metric pair listed in the BGP UPDATE message is checked for a valid network number. If the Network field is incorrect, a BGP NOTIFICATION message is sent with subcode 8 (invalid network field). No checking is done on the metric field. It is up to a particular implementation to decide whether to continue processing or terminate it upon the first incorrect network.
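
The AS path checks above can be sketched as one function that returns the subcode of the first failed test. This is only an illustration: the path is modelled as a list of (direction, AS number) pairs with the originating AS last, the function name check_as_path is hypothetical, and "incorrect AS number" is assumed here to mean a value outside the 16-bit range, which the text does not spell out.

```python
UP, DOWN, H_LINK, EGP_LINK, INCOMPLETE = 1, 2, 3, 4, 5
DIRECTIONS = {UP, DOWN, H_LINK, EGP_LINK, INCOMPLETE}


def check_as_path(as_path):
    """Return None for a valid AS path, else the update-error subcode.

    as_path is a list of (direction, as_number) pairs; the last pair is
    assumed to describe the AS where the route originated.
    """
    if len(as_path) <= 0:
        return 1                                  # invalid AS count
    for direction, _ in as_path:
        if direction not in DIRECTIONS:
            return 2                              # invalid direction code
    for _, as_number in as_path:
        if not 0 <= as_number <= 0xFFFF:          # assumption: 16-bit range only
            return 3                              # invalid autonomous system
    for direction, _ in as_path[:-1]:
        if direction in (EGP_LINK, INCOMPLETE):
            return 4                              # not at the end of the path
    seen = set()
    for _, as_number in as_path:
        if as_number in seen:
            return 5                              # routing loop
        seen.add(as_number)
    return None


good_path = [(UP, 100), (DOWN, 200), (EGP_LINK, 300)]
looped_path = [(UP, 100), (DOWN, 200), (UP, 100)]
print(check_as_path(good_path), check_as_path(looped_path))
```

The checks run in the same order as the paragraphs above, so a path with several problems reports the lowest-numbered applicable subcode.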



If the network, its complete AS path, and the gateway are correct,then the route is compared with other routes to the same network. If the new route is better than the current one, then it is flooded to other BGP peers as follows: If the BGP UPDATE was received over the INTERNAL link, it is not propagated over any other INTERNAL link. This restriction is due to the fact that all BGP gateways within a single AS form a completely connected graph (see above). Before sending a BGP UPDATE message over the non-INTERNAL links, check the AS path to insure that doing so would not cause a routing loop. The BGP UPDATE message is then propagated (subject to the local policy restrictions) over any of the non-INTERNAL link of a routing loop would not result. - If the BGP UPDATE message is propagated over a non-INTERNAL link, then the current AS number and link type of the link over which it is going to be propagated is prepended to the full AS path and the AS count field is incremented by 1. If the BGP UPDATE message is propagated over an INTERNAL link, then the full AS path passed unmodified and the AS count stays the same. The Gateway field is replaced with the sender's own address. 6. Acknowledgements We would like to express our thanks to Len Bosack (cisco Systems), Jeff Honig (Cornell University) and all members of the IWG task force for their contributions to this document. Appendix 1 BGP FSM State Transitions and Actions. This Appendix discusses the transitions between states in the BGP FSM in response to BGP events. The following is the list of these states and events. BGP States: 1 - BGP_Idle 2 - BGP_Active 3 - BGP_OpenSent 4 - BGP_OpenConfirm 5 - BGP_Established BGP Events: 1 - BGP Start 2 - BGP Transport connection open 3 - BGP Transport connection closed



4 - BGP Transport connection open failed
5 - Receive OPEN message
6 - Receive OPEN CONFIRM message
7 - Receive KEEPALIVE message
8 - Receive UPDATE messages
9 - Receive NOTIFICATION message
10 - Holdtime timer expired
11 - KeepAlive timer expired
12 - Receive CEASE message
13 - BGP Stop

The following table describes the state transitions of the BGP FSM and the actions triggered by these transitions.

Event   Actions                        Message Sent    Next State
--------------------------------------------------------------------
BGP_Idle (1)
 1      Initialize resources           none            2
BGP_Active (2)
 2      Initialize resources           OPEN            3
 4      none                           none            2
 13     Release resources              none            1
BGP_OpenSent (3)
 3      none                           none            1
 5      Process OPEN is OK             OPEN CONFIRM    4
        Process OPEN Message failed    NOTIFICATION    1
 11     Restart KeepAlive timer        KEEPALIVE       3
 13     Release resources              none            1
BGP_OpenConfirm (4)
 6      Complete initialization        none            5
 3      none                           none            1
 10     Close transport connection     none            1
 11     Restart KeepAlive timer        KEEPALIVE       4
 13     Release resources              none            1
BGP_Established (5)
 7      Process KEEPALIVE              none            5
 8      Process UPDATE is OK           UPDATE          5
        Process UPDATE failed          NOTIFICATION    5
 9      Process NOTIFICATION           none            5
 10     Close transport connection     none            1
 11     Restart KeepAlive timer        KEEPALIVE       5
 12     Close transport connection     NOTIFICATION    1
 13     Release resources              none            1
--------------------------------------------------------------------

All other state-event combinations are considered fatal errors and cause the termination of the BGP transport connection (if necessary) and a transition to the BGP_Idle state.

The following is a condensed version of the above state transition table.

Events | BGP_Idle | BGP_Active | BGP_OpenSent | BGP_OpenConfirm | BGP_Estab
       |   (1)    |    (2)     |     (3)      |       (4)       |    (5)
-------|----------|------------|--------------|-----------------|----------
  1    |    2     |            |              |                 |
  2    |          |     3      |              |                 |
  3    |          |            |      1       |        1        |
  4    |          |     2      |              |                 |
  5    |          |            |    4 or 1    |                 |
  6    |          |            |              |        5        |
  7    |          |            |              |                 |    5
  8    |          |            |              |                 |    5
  9    |          |            |              |                 |    5
 10    |          |            |              |        1        |    1
 11    |          |            |      3       |        4        |    5
 12    |          |            |              |                 |    1
 13    |          |     1      |      1       |        1        |    1
---------------------------------------------------------------------------
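The condensed table lends itself to a table-driven implementation. The following is a minimal sketch, not part of the BGP specification: the class name BgpFsmSketch, the method next and the key encoding are assumptions made for illustration. Event 5 in BGP_OpenSent is mapped to BGP_OpenConfirm, which assumes the OPEN message was processed successfully; any state-event pair missing from the table falls back to BGP_Idle, as the text above requires.

```java
import java.util.HashMap;
import java.util.Map;

/** Table-driven sketch of the BGP FSM; names are illustrative only. */
class BgpFsmSketch {
    // State numbers follow the appendix: 1 = BGP_Idle ... 5 = BGP_Established.
    static final int IDLE = 1, ACTIVE = 2, OPEN_SENT = 3, OPEN_CONFIRM = 4, ESTABLISHED = 5;

    // Key: state * 100 + event number; value: next state.
    private static final Map<Integer, Integer> NEXT = new HashMap<>();

    private static void add(int state, int event, int next) {
        NEXT.put(state * 100 + event, next);
    }

    static {
        add(IDLE, 1, ACTIVE);
        add(ACTIVE, 2, OPEN_SENT);
        add(ACTIVE, 4, ACTIVE);
        add(ACTIVE, 13, IDLE);
        add(OPEN_SENT, 3, IDLE);
        add(OPEN_SENT, 5, OPEN_CONFIRM); // assumes the OPEN was processed successfully
        add(OPEN_SENT, 11, OPEN_SENT);
        add(OPEN_SENT, 13, IDLE);
        add(OPEN_CONFIRM, 3, IDLE);
        add(OPEN_CONFIRM, 6, ESTABLISHED);
        add(OPEN_CONFIRM, 10, IDLE);
        add(OPEN_CONFIRM, 11, OPEN_CONFIRM);
        add(OPEN_CONFIRM, 13, IDLE);
        add(ESTABLISHED, 7, ESTABLISHED);
        add(ESTABLISHED, 8, ESTABLISHED);
        add(ESTABLISHED, 9, ESTABLISHED);
        add(ESTABLISHED, 10, IDLE);
        add(ESTABLISHED, 11, ESTABLISHED);
        add(ESTABLISHED, 12, IDLE);
        add(ESTABLISHED, 13, IDLE);
    }

    /** Returns the next state; unlisted combinations are fatal and lead to BGP_Idle. */
    static int next(int state, int event) {
        return NEXT.getOrDefault(state * 100 + event, IDLE);
    }
}
```

For example, feeding events 1, 2, 5 and 6 in turn, starting from BGP_Idle, walks the sketch through BGP_Active, BGP_OpenSent and BGP_OpenConfirm to BGP_Established.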

Chapter 9 Introduction to Network Programming


Introduction to Network Programming

Network programming is complex: one has to deal with a variety of protocols (IP, ICMP, UDP, TCP etc), concurrency, packet loss, host failure, timeouts, the Sockets API for the protocols, and subtle portability issues. The protocols are typically described in RFCs using informal prose and pseudo-code to characterise the behaviour of the systems involved. That informality has benefits, but inevitably these descriptions are somewhat ambiguous and incomplete. The protocols are hard to design and implement correctly; testing conformance against the standards is challenging; and understanding the many obscure corner cases and the failure semantics requires considerable expertise.

Ideally we would have the best of both worlds: protocol descriptions that are simultaneously:

1. Clear, accessible to a broad community and easy to modify;
2. Unambiguous, characterising exactly what behaviour is specified;
3. Sufficiently loose, characterising exactly what is not specified, and hence what is left to implementers (especially, permitting high-performance implementations without over-constraining their structure); and
4. Directly usable as a basis for conformance testing, not read-and-forget documents.

In this work we have established a practical technique for rigorous protocol specification, in HOL, that makes this ideal attainable for protocols as complex as TCP. We describe specification idioms that are rich enough to express the subtleties of TCP endpoint behaviour and that scale to the full protocol, all while remaining readable. We also describe novel tools for automated conformance testing between specifications and real-world implementations.

To develop the technique, and to demonstrate its feasibility, we have produced a post-hoc specification of existing protocols: a mathematically rigorous and experimentally validated characterisation of the behaviour of TCP, UDP, and the Sockets API, as implemented in practice.
The resulting specification may be useful in its own right in several ways. It has been extensively annotated to make it usable as a reference for TCP/IP stack implementers and Sockets API users, supplementing the existing informal standards and texts. It can also provide a basis for high-fidelity conformance testing of future implementations, and a basis for design (and conceivably formal proof) of higher-level communication layers. Perhaps more significantly, the work demonstrates that it would be feasible to carry out similar rigorous specification work for new protocols, in a tolerably light-weight style, both at design-time and during standardisation. We believe the increased clarity and precision over informal specifications, and the possibility of automated specification-based testing, would make this very much worthwhile, leading to clearer protocol designs and higher-quality implementations. We discuss some simple ways in which protocols could be designed to make testing computationally straightforward.



1.3. What are Sockets?

Sockets are just like "worm holes" in science fiction. When things go into one end, they (should) come out of the other. Different kinds of sockets have different properties. Sockets are either connection-oriented or connectionless. Connection-oriented sockets allow for data to flow back and forth as needed, while connectionless sockets (also known as datagram sockets) allow only one message at a time to be transmitted, without an open connection. There are also different socket families. The two most common are AF_INET for Internet connections, and AF_UNIX for Unix IPC (interprocess communication). As stated earlier, this FAQ deals only with AF_INET sockets.

1.4. How do Sockets Work?

The implementation is left up to the vendor of your particular Unix, but from the point of view of the programmer, connection-oriented sockets work a lot like files, or pipes. The most noticeable difference, once you have your file descriptor, is that read() or write() calls may actually read or write fewer bytes than requested. If this happens, then you will have to make a second call for the rest of the data. There are examples of this in the source code that accompanies the FAQ.
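The short-read behaviour described above applies to Java streams as well: a single read call on a socket's InputStream may return fewer bytes than requested. The sketch below loops until the requested number of bytes has arrived. The class ReadFullySketch and its methods are hypothetical names for this illustration, and the trickle helper only simulates a slow connection by handing back at most three bytes per call.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {
    /** Reads exactly len bytes into buf, looping because read() may return fewer. */
    static void readExactly(InputStream in, byte[] buf, int len) throws IOException {
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n; // a short read: go round again for the rest
        }
    }

    /** Test helper (an assumption for this sketch): returns at most 3 bytes per read. */
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }
}
```

In everyday code, DataInputStream.readFully performs the same loop, so a hand-written version is mainly useful for seeing what happens underneath.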

Chapter 10 Socket Programming


Client-Server Model: Socket Interface

• In the client-server model, the client runs a program to request a service and the server runs a program to provide the service. These two programs communicate with each other.
• One server program can provide services for many client programs.
• Clients can be run either iteratively (one at a time) or concurrently (many at a time).
• Servers can handle clients either iteratively (one at a time) or concurrently (many at a time).
• A connectionless iterative server uses UDP as its transport layer protocol and can serve one client at a time.
• A connection-oriented concurrent server uses TCP as its transport layer protocol and can serve many clients at the same time.
• When the operating system executes a program, an instance of the program, called a process, is created.
• If two application programs, one running on a local system and the other running on the remote system, need to communicate with each other, a network program is required.
• The socket interface is a set of declarations, definitions, and procedures for writing client-server programs.
• The communication structure needed for socket programming is called a socket.
• A stream socket is used with a connection-oriented protocol such as TCP.
• A datagram socket is used with a connectionless protocol such as UDP.
• A raw socket is used by protocols such as ICMP or OSPF that directly use the services of IP.

The Unix input/output (I/O) system follows a paradigm usually referred to as Open-Read-Write-Close. Before a user process can perform I/O operations, it calls Open to specify and obtain permissions for the file or device to be used. Once an object has been opened, the user process makes one or more calls to Read or Write data. Read reads data from the object and transfers it to the user process, while Write transfers data from the user process to the object. After all transfer operations are complete, the user process calls Close to inform the operating system that it has finished using that object.

When facilities for InterProcess Communication (IPC) and networking were added to Unix, the idea was to make the interface to IPC similar to that of file I/O. In Unix, a process has a set of I/O descriptors that one reads from and writes to. These descriptors may refer to files, devices, or communication channels (sockets). The lifetime of a descriptor is made up of three phases: creation (open socket), reading and writing (receive and send to socket), and destruction (close socket).

The IPC interface in BSD-like versions of Unix is implemented as a layer over the network TCP and UDP protocols. Message destinations are specified as socket addresses; each socket address is a communication identifier that consists of a port number and an Internet address.

The IPC operations are based on socket pairs, one socket belonging to each communicating process. IPC is done by transmitting data in a message between a socket in one process and a socket in another process. When messages are sent, they are queued at the sending socket until the underlying network protocol has transmitted them. When they arrive, the messages are queued at the receiving socket until the receiving process makes the necessary calls to receive them.

TCP/IP and UDP/IP communications

There are two communication protocols that one can use for socket programming: datagram communication and stream communication.

Datagram communication:

The datagram communication protocol, known as UDP (user datagram protocol), is a connectionless protocol, meaning that each time you send datagrams, you also need to send the local socket descriptor and the receiving socket's address. As you can tell, additional data must be sent each time a communication is made.

Stream communication:

The stream communication protocol is known as TCP (Transmission Control Protocol). Unlike UDP, TCP is a connection-oriented protocol. In order to communicate over TCP, a connection must first be established between the pair of sockets. While one of the sockets listens for a connection request (server), the other asks for a connection (client). Once two sockets have been connected, they can be used to transmit data in both (or either one of the) directions.

Sockets programming in Java

Writing your own client/server applications can be done seamlessly using Java. This tutorial presents an introduction to sockets programming over TCP/IP networks and shows how to write client/server applications in Java.



import java.io.*;
import java.net.*;

public class smtpClient {
    public static void main(String[] args) {
        // Declaration section:
        // smtpSocket: our client socket
        // os: output stream
        // is: input stream (wrapped in a reader so we can read whole lines)
        Socket smtpSocket = null;
        DataOutputStream os = null;
        BufferedReader is = null;

        // Initialization section:
        // Try to open a socket on port 25
        // Try to open input and output streams
        try {
            smtpSocket = new Socket("hostname", 25);
            os = new DataOutputStream(smtpSocket.getOutputStream());
            is = new BufferedReader(new InputStreamReader(smtpSocket.getInputStream()));
        } catch (UnknownHostException e) {
            System.err.println("Don't know about host: hostname");
        } catch (IOException e) {
            System.err.println("Couldn't get I/O for the connection to: hostname");
        }

        // If everything has been initialized then we want to write some data
        // to the socket we have opened a connection to on port 25
        if (smtpSocket != null && os != null && is != null) {
            try {
                // The capitalized word at the start of each line has a special
                // meaning to SMTP; you may want to read the SMTP specification
                // (RFC 821, since updated by RFC 5321). SMTP lines end in CRLF.
                os.writeBytes("HELO\r\n");
                os.writeBytes("MAIL From: [email protected]\r\n");
                os.writeBytes("RCPT To: [email protected]\r\n");
                os.writeBytes("DATA\r\n");
                os.writeBytes("From: [email protected]\r\n");
                os.writeBytes("Subject: testing\r\n");
                os.writeBytes("\r\n");          // blank line ends the headers
                os.writeBytes("Hi there\r\n");  // message body
                os.writeBytes("\r\n.\r\n");     // a lone dot ends the message
                os.writeBytes("QUIT\r\n");

                // Keep on reading from the socket till we receive the "Ok" from
                // SMTP; once we have received that we want to break.
                String responseLine;
                while ((responseLine = is.readLine()) != null) {
                    System.out.println("Server: " + responseLine);
                    if (responseLine.indexOf("Ok") != -1) {
                        break;
                    }
                }

                // Clean up:
                // close the output stream
                // close the input stream
                // close the socket
                os.close();
                is.close();
                smtpSocket.close();
            } catch (UnknownHostException e) {
                System.err.println("Trying to connect to unknown host: " + e);
            } catch (IOException e) {
                System.err.println("IOException: " + e);
            }
        }
    }
}

When programming a client, you must follow these four steps:

• Open a socket.
• Open an input and output stream to the socket.
• Read from and write to the socket according to the server's protocol.
• Clean up.

These steps are pretty much the same for all clients. The only step that varies is step three, since it depends on the server you are talking to.

2. Echo server

Now let's write a server. This server is very similar to the echo server running on port 7. Basically, the echo server receives text from the client and then sends that exact text back to the client. This is just about the simplest server you can write. Note that this server handles only one client. Try to modify it to handle multiple clients using threads.


Now, you might ask what protocol you should use -- UDP or TCP?



This depends on the client/server application you are writing. The following discussion shows the differences between the UDP and TCP protocols; this might help you decide which protocol you should use.

In UDP, as you have read above, every time you send a datagram, you have to send the local descriptor and the socket address of the receiving socket along with it. Since TCP is a connection-oriented protocol, on the other hand, a connection must be established before communications between the pair of sockets start. So there is a connection setup time in TCP.

In UDP, there is a size limit of 64 kilobytes on datagrams you can send to a specified location, while in TCP there is no limit. Once a connection is established, the pair of sockets behaves like streams: All available data are read immediately in the same order in which they are received.

UDP is an unreliable protocol -- there is no guarantee that the datagrams you have sent will arrive at all, or that they will be received in the same order by the receiving socket. On the other hand, TCP is a reliable protocol; it is guaranteed that the data you send will be received in the order in which it was sent.

In short, TCP is useful for implementing network services -- such as remote login (rlogin, telnet) and file transfer (FTP) -- which require data of indefinite length to be transferred. UDP is less complex and incurs fewer overheads. It is often used in implementing client/server applications in distributed systems built over local area networks.
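Although the rest of this tutorial uses TCP only, a small datagram example may make the comparison above more concrete. The following is a sketch under stated assumptions: the class UdpLoopbackSketch and the method sendAndReceive are hypothetical names, both sockets live in one process, and traffic stays on the loopback interface. Note how the destination address and port are attached to every packet, and how the receiver sets a timeout because delivery is not guaranteed.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpLoopbackSketch {
    /** Sends one datagram over loopback and returns the text the receiver got. */
    static String sendAndReceive(String text) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // give up after 2 s; UDP may drop packets

            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            // No connection is set up: each datagram names its own destination.
            sender.send(new DatagramPacket(payload, payload.length,
                                           loopback, receiver.getLocalPort()));

            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // blocks until a datagram arrives or the timeout fires
            return new String(in.getData(), in.getOffset(), in.getLength(),
                              StandardCharsets.UTF_8);
        }
    }
}
```

Compared with the TCP examples in the following sections, there is no accept step and no stream: each receive call yields one whole message or nothing.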

Programming sockets in Java

In this section we will answer the most frequently asked questions about programming sockets in Java. Then we will show some examples of how to write client and server applications.

Note: In this tutorial we will show how to program sockets in Java using the TCP/IP protocol only since it is more widely used than UDP/IP. Also: All the classes related to sockets are in the java.net package, so make sure to import that package when you program sockets.

How do I open a socket?

If you are programming a client, then you would open a socket like this:

    Socket MyClient;
    MyClient = new Socket("Machine name", PortNumber);



Where Machine name is the machine you are trying to open a connection to, and PortNumber is the port (a number) on which the server you are trying to connect to is running. When selecting a port number, you should note that port numbers between 0 and 1,023 are reserved for privileged users (that is, super user or root). These port numbers are reserved for standard services, such as email, FTP, and HTTP. When selecting a port number for your server, select one that is greater than 1,023!

In the example above, we didn't make use of exception handling; however, it is a good idea to handle exceptions. (From now on, all our code will handle exceptions!) The above can be written as:

    Socket MyClient;
    try {
        MyClient = new Socket("Machine name", PortNumber);
    } catch (IOException e) {
        System.out.println(e);
    }

If you are programming a server, then this is how you open a socket:

    ServerSocket MyService;
    try {
        MyService = new ServerSocket(PortNumber);
    } catch (IOException e) {
        System.out.println(e);
    }

When implementing a server you also need to create a socket object from the ServerSocket in order to listen for and accept connections from clients.

    Socket serviceSocket = null;
    try {
        serviceSocket = MyService.accept();
    } catch (IOException e) {
        System.out.println(e);
    }

How do I create an input stream?



On the client side, you can use the DataInputStream class to create an input stream to receive response from the server:

    DataInputStream input;
    try {
        input = new DataInputStream(MyClient.getInputStream());
    } catch (IOException e) {
        System.out.println(e);
    }

The class DataInputStream allows you to read lines of text and Java primitive data types in a portable way. It has methods such as read, readChar, readInt, readDouble, and readLine. Use whichever method suits your needs, depending on the type of data that you receive from the server. Note that DataInputStream.readLine is deprecated in current JDKs; wrapping the stream in a BufferedReader is the usual way to read lines of text.

On the server side, you can use DataInputStream to receive input from the client:

    DataInputStream input;
    try {
        input = new DataInputStream(serviceSocket.getInputStream());
    } catch (IOException e) {
        System.out.println(e);
    }

How do I create an output stream?

On the client side, you can create an output stream to send information to the server socket using the class PrintStream or DataOutputStream of java.io:

    PrintStream output;
    try {
        output = new PrintStream(MyClient.getOutputStream());
    } catch (IOException e) {
        System.out.println(e);
    }

The class PrintStream has methods for displaying textual representations of Java primitive data types. Its write and println methods are important here. Also, you may want to use the DataOutputStream:



    DataOutputStream output;
    try {
        output = new DataOutputStream(MyClient.getOutputStream());
    } catch (IOException e) {
        System.out.println(e);
    }

The class DataOutputStream allows you to write Java primitive data types; many of its methods write a single Java primitive type to the output stream. The method writeBytes is a useful one.

On the server side, you can use the class PrintStream to send information to the client.

    PrintStream output;
    try {
        output = new PrintStream(serviceSocket.getOutputStream());
    } catch (IOException e) {
        System.out.println(e);
    }

Note: You can use the class DataOutputStream as mentioned above.

How do I close sockets?

You should always close the output and input stream before you close the socket.

On the client side:

    try {
        output.close();
        input.close();
        MyClient.close();
    } catch (IOException e) {
        System.out.println(e);
    }

On the server side:

    try {
        output.close();
        input.close();
        serviceSocket.close();
        MyService.close();
    } catch (IOException e) {
        System.out.println(e);
    }
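The open, stream, and close steps described in this section can be tied together in one small program. The sketch below is an illustration under assumptions, not a definitive pattern: the class TcpRoundTripSketch and the method roundTrip are hypothetical names, client and server share one process on the loopback interface, and port 0 lets the operating system pick a free port. It uses try-with-resources, which closes the resources in reverse order of declaration, so the streams are closed before the sockets as recommended above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class TcpRoundTripSketch {
    /** Sends one line from a client socket to a server socket and returns it. */
    static String roundTrip(String text) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket myService = new ServerSocket(0, 1, loopback);
             // The connection is queued by the operating system until accept() runs.
             Socket myClient = new Socket(loopback, myService.getLocalPort());
             Socket serviceSocket = myService.accept();
             // "true" turns on auto-flush so println sends the line immediately.
             PrintStream output = new PrintStream(myClient.getOutputStream(), true);
             BufferedReader input = new BufferedReader(
                     new InputStreamReader(serviceSocket.getInputStream()))) {
            output.println(text);    // client side: write
            return input.readLine(); // server side: read
        } // resources are closed here: input, output, then the three sockets
    }
}
```

Running both ends in a single thread works here only because the message is tiny; a real client and server would normally be separate programs.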

Examples

In this section we will write two applications: a simple SMTP (simple mail transfer protocol) client, and a simple echo server.

1. SMTP client

Let's write an SMTP (simple mail transfer protocol) client -- one so simple that we have all the data encapsulated within the program. You may change the code around to suit your needs. An interesting modification would be to change it so that you accept the data from the command-line argument and also get the input (the body of the message) from standard input. Try to modify it so that it behaves the same as the mail program that comes with Unix.

import java.io.*;
import java.net.*;

public class echo3 {
    public static void main(String args[]) {
        // Declaration section:
        // declare a server socket and a client socket for the server
        // declare an input and an output stream
        ServerSocket echoServer = null;
        String line;
        BufferedReader is;
        PrintStream os;
        Socket clientSocket = null;

        // Try to open a server socket on port 9999
        // Note that we can't choose a port below 1024 if we are not
        // privileged users (root)
        try {
            echoServer = new ServerSocket(9999);
        } catch (IOException e) {
            System.out.println(e);
            return; // without a server socket there is nothing to accept on
        }

        // Create a socket object from the ServerSocket to listen and accept
        // connections.
        // Open input and output streams
        try {
            clientSocket = echoServer.accept();
            is = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
            os = new PrintStream(clientSocket.getOutputStream());

            // As long as we receive data, echo that data back to the client.
            // readLine() returns null once the client closes the connection.
            while ((line = is.readLine()) != null) {
                os.println(line);
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
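The echo server text suggests handling multiple clients using threads. One possible shape for that exercise is sketched below, with one thread per accepted connection. All names (ThreadedEchoSketch, handle, serve, ask, echoTwice) are assumptions for illustration, and the echoTwice helper exists only to exercise the server from the same process over loopback.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class ThreadedEchoSketch {
    /** Echoes lines back to one client until that client disconnects. */
    static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintStream out = new PrintStream(c.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    }

    /** Accepts clients until the server socket is closed, one thread per client. */
    static void serve(ServerSocket server) {
        try {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start();
            }
        } catch (IOException e) {
            // accept() fails once the server socket is closed; treat it as shutdown
        }
    }

    /** Sends one line on an open connection and returns the echoed reply. */
    static String ask(Socket s, String line) throws IOException {
        new PrintStream(s.getOutputStream(), true).println(line);
        return new BufferedReader(new InputStreamReader(s.getInputStream())).readLine();
    }

    /** Connects two clients at once and queries the second one first. */
    static String echoTwice(String a, String b) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> serve(server));
            acceptor.setDaemon(true);
            acceptor.start();
            try (Socket c1 = new Socket(server.getInetAddress(), server.getLocalPort());
                 Socket c2 = new Socket(server.getInetAddress(), server.getLocalPort())) {
                return ask(c2, b) + "," + ask(c1, a);
            }
        }
    }
}
```

With the single-client server above, the second client would wait until the first one disconnected; here both connections are served at the same time, which is why querying the second client first still gets an answer.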

Conclusion

Programming client/server applications is challenging and fun, and programming this kind of application in Java is easier than doing it in other languages, such as C. Socket programming in Java is seamless.

The java.net package provides a powerful and flexible infrastructure for network programming, so you are encouraged to refer to that package if you would like to know the classes that are provided.

The sun.* packages have some good classes for networking; however, you are not encouraged to use those classes at the moment because they may change in the next release. Also, some of the classes are not portable across all platforms.

Chapter 12 Introduction to Distributed Computing with RMI


Introduction

This is a brief introduction to Java Remote Method Invocation (RMI). Java RMI is a mechanism that allows one to invoke a method on an object that exists in another address space. The other address space could be on the same machine or a different one. The RMI mechanism is basically an object-oriented RPC mechanism. CORBA is another object-oriented RPC mechanism. CORBA differs from Java RMI in a number of ways:

1. CORBA is a language-independent standard.
2. CORBA includes many other mechanisms in its standard (such as a standard for TP monitors) none of which are part of Java RMI.
3. There is also no notion of an "object request broker" in Java RMI.

Java RMI has recently been evolving toward becoming more compatible with CORBA. In particular, there is now a form of RMI called RMI/IIOP ("RMI over IIOP") that uses the Internet Inter-ORB Protocol (IIOP) of CORBA as the underlying protocol for RMI communication.

This tutorial attempts to show the essence of RMI, without discussing any extraneous features. Sun has provided a Guide to RMI, but it includes a lot of material that is not relevant to RMI itself. For example, it discusses how to incorporate RMI into an Applet, how to use packages and how to place compiled classes in a different directory than the source code. All of these are interesting in themselves, but they have nothing at all to do with RMI. As a result, Sun's guide is unnecessarily confusing. Moreover, Sun's guide and examples omit a number of details that are important for RMI.

There are three processes that participate in supporting remote method invocation.

1. The Client is the process that is invoking a method on a remote object.
2. The Server is the process that owns the remote object. The remote object is an ordinary object in the address space of the server process.
3. The Object Registry is a name server that relates objects with names. Objects are registered with the Object Registry. Once an object has been registered, one can use the Object Registry to obtain access to a remote object using the name of the object.

In this tutorial, we will give an example of a Client and a Server that solve the classical "Hello, world!" problem. You should try extracting the code that is presented and running it on your own computer.

There are two kinds of classes that can be used in Java RMI.

1. A Remote class is one whose instances can be used remotely. An object of such a class can be referenced in two different ways:

   1. Within the address space where the object was constructed, the object is an ordinary object which can be used like any other object.

   2. Within other address spaces, the object can be referenced using an object handle. While there are limitations on how one can use an object handle compared to an object, for the most part one can use object handles in the same way as an ordinary object.

For simplicity, an instance of a Remote class will be called a remote object.

2. A Serializable class is one whose instances can be copied from one address space to another. An instance of a Serializable class will be called a serializable object. In other words, a serializable object is one that can be marshaled. Note that this concept has no connection to the concept of serializability in database management systems.

If a serializable object is passed as a parameter (or return value) of a remote method invocation, then the value of the object will be copied from one address space to the other. By contrast if a remote object is passed as a parameter (or return value), then the object handle will be copied from one address space to the other.
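The copy semantics of serializable objects can be observed without any networking by marshaling an object to bytes and back, which is roughly what happens when it crosses address spaces. This is a sketch for illustration only: the classes SerializableCopySketch and Point and the method copy are hypothetical and are not part of RMI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableCopySketch {
    /** Hypothetical serializable value type used only for this illustration. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Marshals an object to bytes and unmarshals it again, yielding a copy. */
    static <T extends Serializable> T copy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T result = (T) in.readObject();
            return result;
        }
    }
}
```

Changing a field of the copy leaves the original untouched, which is the behaviour a caller sees when a serializable argument is modified on the other side of a remote call. A remote object would instead be represented by a handle that refers back to the single original.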

One might naturally wonder what would happen if a class were both Remote and Serializable. While this might be possible in theory, it is a poor design to mix these two notions as it makes the design difficult to understand.

Serializable Classes

We now consider how to design Remote and Serializable classes. The easier of the two is a Serializable class. A class is Serializable if it implements the java.io.Serializable interface. Subclasses of a Serializable class are also Serializable. Many of the standard classes are Serializable, so a subclass of one of these is automatically also Serializable. Normally, any data within a Serializable class should also be Serializable. Although there are ways to include non-serializable objects within a serializable object, it is awkward to do so. See the documentation of java.io.Serializable for more information about this.

Using a serializable object in a remote method invocation is straightforward. One simply passes the object using a parameter or as the return value. The type of the parameter or return value is the Serializable class. Note that both the Client and Server programs must have access to the definition of any Serializable class that is being used. If the Client and Server programs are on different machines, then class definitions of Serializable classes may have to be downloaded from one machine to the other. Such a download could violate system security. This problem is discussed in the Security section.



The only Serializable class that will be used in the "Hello, world!" example is the String class, so no problems with security arise.

Remote Classes and Interfaces

Next consider how to define a Remote class. This is more difficult than defining a Serializable class. A Remote class has two parts: the interface and the class itself. The Remote interface must have the following properties:

1. The interface must be public.
2. The interface must extend the interface java.rmi.Remote.
3. Every method in the interface must declare that it throws java.rmi.RemoteException. Other exceptions may also be thrown.

The Remote class itself has the following properties:

1. It must implement a Remote interface.
2. It should extend the java.rmi.server.UnicastRemoteObject class. Objects of such a class exist in the address space of the server and can be invoked remotely. While there are other ways to define a Remote class, this is the simplest way to ensure that objects of a class can be used as remote objects. See the documentation of the java.rmi.server package for more information.
3. It can have methods that are not in its Remote interface. These can only be invoked locally.

Unlike the case of a Serializable class, it is not necessary for both the Client and the Server to have access to the definition of the Remote class. The Server requires the definition of both the Remote class and the Remote interface, but the Client only uses the Remote interface. Roughly speaking, the Remote interface represents the type of an object handle, while the Remote class represents the type of an object. If a remote object is being used remotely, its type must be declared to be the type of the Remote interface, not the type of the Remote class.

In the example program, we need a Remote class and its corresponding Remote interface. We call these Hello and HelloInterface, respectively. Here is the file HelloInterface.java:

import java.rmi.*;

/**
 * Remote Interface for the "Hello, world!" example.
 */
public interface HelloInterface extends Remote {
    /**
     * Remotely invocable method.
     * @return the message of the remote object, such as "Hello, world!".
     * @exception RemoteException if the remote invocation fails.
     */
    public String say() throws RemoteException;
}

Here is the file Hello.java:

import java.rmi.*;
import java.rmi.server.*;

/**
 * Remote Class for the "Hello, world!" example.
 */
public class Hello extends UnicastRemoteObject implements HelloInterface {
    private String message;

    /**
     * Construct a remote object
     * @param msg the message of the remote object, such as "Hello, world!".
     * @exception RemoteException if the object handle cannot be constructed.
     */
    public Hello (String msg) throws RemoteException {
        message = msg;
    }

    /**
     * Implementation of the remotely invocable method.
     * @return the message of the remote object, such as "Hello, world!".
     * @exception RemoteException if the remote invocation fails.
     */
    public String say() throws RemoteException {
        return message;
    }
}

All of the Remote interfaces and classes should be compiled using javac. Once this has been completed, the stubs and skeletons for the Remote classes should be generated using the rmic stub compiler. (On JDK 5 and later, stubs are generated dynamically at run time, so this step is needed only with older toolchains; rmic itself was removed in JDK 15.) The stub and skeleton of the example Remote class are compiled with the command:

rmic Hello

The only problem one might encounter with this command is that rmic might not be able to find the files Hello.class and HelloInterface.class even though they are in the same directory where rmic is being executed. If this happens to you, then try setting the CLASSPATH environment variable to the current directory, as in the following command:

setenv CLASSPATH .

If your CLASSPATH variable already has some directories in it, then you might want to add the current directory to the others.

Programming a Client



Having described how to define Remote and Serializable classes, we now discuss how to program the Client and Server. The Client itself is just a Java program. It need not be part of a Remote or Serializable class, although it will use Remote and Serializable classes.

A remote method invocation can return a remote object as its return value, but one must have a remote object in order to perform a remote method invocation. So to obtain a remote object one must already have one. Accordingly, there must be a separate mechanism for obtaining the first remote object. The Object Registry fulfills this requirement. It allows one to obtain a remote object using only the name of the remote object.

The name of a remote object includes the following information:

1. The Internet name (or address) of the machine that is running the Object Registry with which the remote object is being registered. If the Object Registry is running on the same machine as the one that is making the request, then the name of the machine can be omitted.

2. The port to which the Object Registry is listening. If the Object Registry is listening to the default port, 1099, then this does not have to be included in the name.

3. The local name of the remote object within the Object Registry.
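As a minimal sketch of how these three parts combine, the hypothetical helper below assembles a name of the kind passed to the Object Registry, leaving out the host and the port when they take their default values. The class RemoteObjectName and its build method are illustrative names only; they are not part of the RMI API.

```java
// Hypothetical helper: assembles the name of a remote object from its
// three parts. This class is illustrative and not part of java.rmi.
class RemoteObjectName {
    static final int DEFAULT_REGISTRY_PORT = 1099;

    // host may be null or empty when the registry runs on the local machine.
    static String build(String host, int port, String localName) {
        boolean localHost = (host == null || host.isEmpty());
        boolean defaultPort = (port == DEFAULT_REGISTRY_PORT);
        if (localHost && defaultPort) {
            return localName; // only the local name is needed
        }
        StringBuilder name = new StringBuilder("//");
        if (!localHost) {
            name.append(host);
        }
        if (!defaultPort) {
            name.append(':').append(port);
        }
        return name.append('/').append(localName).toString();
    }

    public static void main(String[] args) {
        System.out.println(build("ortles.ccs.neu.edu", 1099, "Hello"));
        System.out.println(build("ortles.ccs.neu.edu", 2005, "Hello"));
        System.out.println(build(null, 1099, "Hello"));
    }
}
```

The port 2005 in the second call is an arbitrary value chosen for the illustration.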

Here is the example Client program:

import java.rmi.*;

/**
 * Client program for the "Hello, world!" example.
 */
public class HelloClient {
    /**
     * @param argv The command line arguments which are ignored.
     */
    public static void main (String[] argv) {
        try {
            HelloInterface hello =
                (HelloInterface) Naming.lookup ("//ortles.ccs.neu.edu/Hello");
            System.out.println (hello.say());
        } catch (Exception e) {
            System.out.println ("HelloClient exception: " + e);
        }
    }
}

The Naming.lookup method obtains an object handle from the Object Registry running on ortles.ccs.neu.edu and listening to the default port. Note that the result of Naming.lookup must be cast to the type of the Remote interface.

The remote method invocation in the example Client is hello.say(). It returns a String which is then printed. A remote method invocation can return a String object because String is a Serializable class.

The code for the Client can be placed in any convenient class. In the example Client, it was placed in a class HelloClient that contains only the program above.



Distributed Applications

CORBA products provide a framework for the development and execution of distributed applications. But why would one want to develop a distributed application in the first place? As you will see later, distribution introduces a whole new set of difficult issues. However, sometimes there is no choice; some applications by their very nature are distributed across multiple computers because of one or more of the following reasons:

• The data used by the application are distributed
• The computation is distributed
• The users of the application are distributed

Data are Distributed

Some applications must execute on multiple computers because the data that the application must access exist on multiple computers for administrative and ownership reasons. The owner may permit the data to be accessed remotely but not stored locally. Or perhaps the data cannot be co-located and must exist on multiple heterogeneous systems for historical reasons.

Computation is Distributed

Some applications execute on multiple computers in order to take advantage of multiple processors computing in parallel to solve some problem. Other applications may execute on multiple computers in order to take advantage of some unique feature of a particular system. Distributed applications can take advantage of the scalability and heterogeneity of the distributed system.

Users are Distributed

Some applications execute on multiple computers because users of the application communicate and interact with each other via the application. Each user executes a piece of the distributed application on his or her computer, and shared objects typically execute on one or more servers.

Introduction to Distributed Computing with RMI

Remote Method Invocation (RMI) technology, first introduced in JDK 1.1, elevates network programming to a higher plane. Although RMI is relatively easy to use, it is a remarkably powerful technology and exposes the average Java developer to an entirely new paradigm: the world of distributed object computing.

This course provides you with an in-depth introduction to this versatile technology. RMI has evolved considerably since JDK 1.1, and has been significantly upgraded under the Java 2 SDK. Where applicable, the differences between the two releases will be indicated.



Goals

A primary goal for the RMI designers was to allow programmers to develop distributed Java programs with the same syntax and semantics used for non-distributed programs. To do this, they had to carefully map how Java classes and objects work in a single Java Virtual Machine (JVM) to a new model of how classes and objects would work in a distributed (multiple JVM) computing environment.

This section introduces the RMI architecture from the perspective of the distributed or remote Java objects, and explores how their behavior differs from that of local Java objects. The RMI architecture defines how objects behave, how and when exceptions can occur, how memory is managed, and how parameters are passed to, and returned from, remote methods.

Comparison of Distributed and Nondistributed Java Programs

The RMI architects tried to make the use of distributed Java objects similar to using local Java objects. While they succeeded, some important differences are listed in the table below.

Do not worry if you do not understand all of the differences. They will become clear as you explore the RMI architecture. You can use this table as a reference as you learn about RMI.

Object Definition
  Local object: A local object is defined by a Java class.
  Remote object: A remote object's exported behavior is defined by an interface that must extend the Remote interface.

Object Implementation
  Local object: A local object is implemented by its Java class.
  Remote object: A remote object's behavior is executed by a Java class that implements the remote interface.

Object Creation
  Local object: A new instance of a local object is created by the new operator.
  Remote object: A new instance of a remote object is created on the host computer with the new operator. A client cannot directly create a new remote object (unless using Java 2 Remote Object Activation).

Object Access
  Local object: A local object is accessed directly via an object reference variable.
  Remote object: A remote object is accessed via an object reference variable which points to a proxy stub implementation of the remote interface.

References
  Local object: In a single JVM, an object reference points directly at an object in the heap.
  Remote object: A "remote reference" is a pointer to a proxy object (a "stub") in the local heap. That stub contains information that allows it to connect to a remote object, which contains the implementation of the methods.

Active References
  Local object: In a single JVM, an object is considered "alive" if there is at least one reference to it.
  Remote object: In a distributed environment, remote JVMs may crash, and network connections may be lost. A remote object is considered to have an active remote reference to it if it has been accessed within a certain time period (the lease period). If all remote references have been explicitly dropped, or if all remote references have expired leases, then a remote object is available for distributed garbage collection.

Finalization
  Local object: If an object implements the finalize() method, it is called before an object is reclaimed by the garbage collector.
  Remote object: If a remote object implements the Unreferenced interface, the unreferenced method of that interface is called when all remote references have been dropped.

Garbage Collection
  Local object: When all local references to an object have been dropped, an object becomes a candidate for garbage collection.
  Remote object: The distributed garbage collector works with the local garbage collector. If there are no remote references and all local references to a remote object have been dropped, then it becomes a candidate for garbage collection through the normal means.

Exceptions
  Local object: Exceptions are either Runtime exceptions or Exceptions. The Java compiler forces a program to handle all Exceptions.
  Remote object: RMI forces programs to deal with any possible RemoteException objects that may be thrown. This was done to ensure the robustness of distributed applications.
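To make the Finalization entry more concrete, here is a minimal sketch of a remote object class that implements java.rmi.server.Unreferenced. The Session interface, the SessionImpl class, and the released flag are hypothetical names introduced for this illustration, and the code that exports the object to RMI is left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.Unreferenced;

// Hypothetical remote interface used only for this illustration.
interface Session extends Remote {
    String owner() throws RemoteException;
}

// A remote object class that asks to be told when no client holds a
// remote reference to it any more.
class SessionImpl implements Session, Unreferenced {
    private final String owner;
    private volatile boolean released = false;

    SessionImpl(String owner) {
        this.owner = owner;
    }

    public String owner() throws RemoteException {
        return owner;
    }

    // Called by the RMI runtime once all remote references have been
    // dropped or their leases have expired.
    public void unreferenced() {
        released = true; // a real service might close files or connections here
    }

    boolean isReleased() {
        return released;
    }
}
```

Unlike finalize(), this callback says nothing about local references; the object may still be in use inside the server JVM when it is called.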

Java RMI Architecture

The design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. RMI architects have succeeded, creating a system that extends the safety and robustness of the Java architecture to the distributed computing world.

Interfaces: The Heart of RMI



The RMI architecture is based on one important principle: the definition of behavior and the implementation of that behavior are separate concepts. RMI allows the code that defines the behavior and the code that implements the behavior to remain separate and to run on separate JVMs.

This fits nicely with the needs of a distributed system where clients are concerned about the definition of a service and servers are focused on providing the service.

Specifically, in RMI, the definition of a remote service is coded using a Java interface. The implementation of the remote service is coded in a class. Therefore, the key to understanding RMI is to remember that interfaces define behavior and classes define implementation.

While the following diagram illustrates this separation, remember that a Java interface does not contain executable code. RMI supports two classes that implement the same interface. The first class is the implementation of the behavior, and it runs on the server. The second class acts as a proxy for the remote service and it runs on the client. This is shown in the following diagram.

A client program makes method calls on the proxy object, RMI sends the request to the remote JVM, and forwards it to the implementation. Any return values provided by the implementation are sent back to the proxy and then to the client's program.
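The idea of two classes behind one interface can be tried out in a single JVM. The sketch below is not RMI itself: it uses java.lang.reflect.Proxy to build a stand-in object that implements the interface and forwards each call to the implementation, where a real RMI proxy would send the request over the network instead. Greeter, GreeterImpl, and ProxyDemo are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// The definition of the behavior: an interface with no executable code.
interface Greeter {
    String greet(String name);
}

// The implementation of the behavior; in RMI this class runs on the server.
class GreeterImpl implements Greeter {
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}

class ProxyDemo {
    // Builds a stand-in object that implements Greeter and forwards every
    // call to the given implementation, much as an RMI proxy forwards
    // calls to the remote JVM.
    static Greeter proxyFor(final Greeter target) {
        InvocationHandler forwarder = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                // A real stub would send the request over the network here.
                return method.invoke(target, args);
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                forwarder);
    }

    public static void main(String[] args) {
        Greeter g = proxyFor(new GreeterImpl());
        System.out.println(g.greet("world")); // prints "Hello, world!"
    }
}
```

The client code only ever sees the type Greeter, so it cannot tell whether it holds the implementation or the proxy.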



RMI Architecture Layers

With an understanding of the high-level RMI architecture, take a look under the covers to see its implementation.

The RMI implementation is essentially built from three abstraction layers. The first is the Stub and Skeleton layer, which lies just beneath the view of the developer. This layer intercepts method calls made by the client to the interface reference variable and redirects these calls to a remote RMI service.

The next layer is the Remote Reference Layer. This layer understands how to interpret and manage references made from clients to the remote service objects. In JDK 1.1, this layer connects clients to remote service objects that are running and exported on a server. The connection is a one-to-one (unicast) link. In the Java 2 SDK, this layer was enhanced to support the activation of dormant remote service objects via Remote Object Activation.

The transport layer is based on TCP/IP connections between machines in a network. It provides basic connectivity, as well as some firewall penetration strategies.

By using a layered architecture each of the layers could be enhanced or replaced without affecting the rest of the system. For example, the transport layer could be replaced by a UDP/IP layer without affecting the upper layers.

Stub and Skeleton Layer

The stub and skeleton layer of RMI lies just beneath the view of the Java developer. In this layer, RMI uses the Proxy design pattern as described in the book, Design Patterns by Gamma, Helm, Johnson and Vlissides. In the Proxy pattern, an object in one context is represented by another (the proxy) in a separate context. The proxy knows how to forward method calls between the participating objects. The following class diagram illustrates the Proxy pattern.



In RMI's use of the Proxy pattern, the stub class plays the role of the proxy, and the remote service implementation class plays the role of the RealSubject.

A skeleton is a helper class that is generated for RMI to use. The skeleton understands how to communicate with the stub across the RMI link. The skeleton carries on a conversation with the stub; it reads the parameters for the method call from the link, makes the call to the remote service implementation object, accepts the return value, and then writes the return value back to the stub.
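The conversation between stub and skeleton can be sketched without any network at all. The example below is a simplified illustration and not the JRMP protocol: the "link" is just a byte array passed between two objects, and the names Adder, AdderImpl, AdderStub, AdderSkeleton, and StubSkeletonDemo are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical service used only for this illustration.
interface Adder {
    long add(long a, long b) throws IOException;
}

class AdderImpl implements Adder {
    public long add(long a, long b) {
        return a + b;
    }
}

// Server side: reads the parameters from the link, calls the
// implementation, and writes the return value back.
class AdderSkeleton {
    private final Adder impl;

    AdderSkeleton(Adder impl) {
        this.impl = impl;
    }

    byte[] dispatch(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        String method = in.readUTF();
        if (!method.equals("add")) {
            throw new IOException("unknown method: " + method);
        }
        long result = impl.add(in.readLong(), in.readLong());
        ByteArrayOutputStream reply = new ByteArrayOutputStream();
        new DataOutputStream(reply).writeLong(result);
        return reply.toByteArray();
    }
}

// Client side: implements the same interface, but only marshals the call.
class AdderStub implements Adder {
    private final AdderSkeleton link; // stands in for the network connection

    AdderStub(AdderSkeleton link) {
        this.link = link;
    }

    public long add(long a, long b) throws IOException {
        ByteArrayOutputStream request = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(request);
        out.writeUTF("add");
        out.writeLong(a);
        out.writeLong(b);
        byte[] reply = link.dispatch(request.toByteArray());
        return new DataInputStream(new ByteArrayInputStream(reply)).readLong();
    }
}

class StubSkeletonDemo {
    static long run(long a, long b) {
        Adder adder = new AdderStub(new AdderSkeleton(new AdderImpl()));
        try {
            return adder.add(a, b);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 5)); // prints 9
    }
}
```

The caller works with an Adder and never sees the marshalling; that is the role the generated stub and skeleton classes play in RMI.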

In the Java 2 SDK implementation of RMI, the new wire protocol has made skeleton classes obsolete. RMI uses reflection to make the connection to the remote service object. You only have to worry about skeleton classes and objects in JDK 1.1 and JDK 1.1 compatible system implementations.

Remote Reference Layer

The Remote Reference Layer defines and supports the invocation semantics of the RMI connection. This layer provides a RemoteRef object that represents the link to the remote service implementation object.

The stub objects use the invoke() method in RemoteRef to forward the method call. The RemoteRef object understands the invocation semantics for remote services.

The JDK 1.1 implementation of RMI provides only one way for clients to connect to remote service implementations: a unicast, point-to-point connection. Before a client can use a remote service, the remote service must be instantiated on the server and exported to the RMI system. (If it is the primary service, it must also be named and registered in the RMI Registry).

The Java 2 SDK implementation of RMI adds a new semantic for the client-server connection. In this version, RMI supports activatable remote objects. When a method call is made to the proxy for an activatable object, RMI determines if the remote service implementation object is dormant. If it is dormant, RMI will instantiate the object and restore its state from a disk file. Once an activatable object is in memory, it behaves just like JDK 1.1 remote service implementation objects.

Other types of connection semantics are possible. For example, with multicast, a single proxy could send a method request to multiple implementations simultaneously and accept the first reply (this improves response time and possibly improves availability). In the future, Sun may add additional invocation semantics to RMI.

Transport Layer

The Transport Layer makes the connection between JVMs. All connections are stream-based network connections that use TCP/IP.

Even if two JVMs are running on the same physical computer, they connect through their host computer's TCP/IP network protocol stack. (This is why you must have an operational TCP/IP configuration on your computer to run the Exercises in this course). The following diagram shows the unfettered use of TCP/IP connections between JVMs.

As you know, TCP/IP provides a persistent, stream-based connection between two machines based on an IP address and port number at each end. Usually a DNS name is used instead of an IP address; this means you could talk about a TCP/IP connection between flicka.magelang.com:3452 and rosa.jguru.com:4432. In the current release of RMI, TCP/IP connections are used as the foundation for all machine-to-machine connections.

On top of TCP/IP, RMI uses a wire level protocol called Java Remote Method Protocol (JRMP). JRMP is a proprietary, stream-based protocol that is only partially specified and now exists in two versions. The first version was released with the JDK 1.1 version of RMI and required the use of Skeleton classes on the server. The second version was released with the Java 2 SDK. It has been optimized for performance and does not require skeleton classes. (Note that some alternate implementations, such as BEA Weblogic and NinjaRMI, do not use JRMP, but instead use their own wire level protocol. ObjectSpace's Voyager does recognize JRMP and will interoperate with RMI at the wire level.) Some other changes with the Java 2 SDK are that RMI service interfaces are not required to extend from java.rmi.Remote and their service methods do not necessarily throw RemoteException.

Sun and IBM have jointly worked on the next version of RMI, called RMI-IIOP, which will be available with Java 2 SDK Version 1.3. The interesting thing about RMI-IIOP is that instead of using JRMP, it will use the Object Management Group (OMG) Internet Inter-ORB Protocol, IIOP, to communicate between clients and servers.

The OMG is a group of more than 800 members that defines a vendor-neutral, distributed object architecture called Common Object Request Broker Architecture (CORBA). CORBA Object Request Broker (ORB) clients and servers communicate with each other using IIOP. With the adoption of the Objects-by-Value extension to CORBA and the Java Language to IDL Mapping proposal, the ground work was set for direct RMI to CORBA integration. This new RMI-IIOP implementation supports most of the RMI feature set, except for:

• java.rmi.server.RMISocketFactory
• UnicastRemoteObject
• Unreferenced
• The DGC interfaces

The RMI transport layer is designed to make a connection between clients and server, even in the face of networking obstacles.

While the transport layer prefers to use multiple TCP/IP connections, some network configurations only allow a single TCP/IP connection between a client and server (some browsers restrict applets to a single network connection back to their hosting server).

In this case, the transport layer multiplexes multiple virtual connections within a single TCP/IP connection.

Naming Remote Objects

During the presentation of the RMI Architecture, one question has been repeatedly postponed: "How does a client find an RMI remote service?" Now you'll find the answer to that question. Clients find remote services by using a naming or directory service. This may seem like circular logic. How can a client locate a service by using a service? In fact, that is exactly the case. A naming or directory service is run on a well-known host and port number, where "well-known" means that everyone in the organization knows what it is.

RMI can use many different directory services, including the Java Naming and Directory Interface (JNDI). RMI itself includes a simple service called the RMI Registry, rmiregistry. The RMI Registry runs on each machine that hosts remote service objects and accepts queries for services, by default on port 1099.



On a host machine, a server program creates a remote service by first creating a local object that implements that service. Next, it exports that object to RMI. When the object is exported, RMI creates a listening service that waits for clients to connect and request the service. After exporting, the server registers the object in the RMI Registry under a public name.

On the client side, the RMI Registry is accessed through the static class Naming. It provides the method lookup() that a client uses to query a registry. The method lookup() accepts a URL that specifies the server host name and the name of the desired service. The method returns a remote reference to the service object. The URL takes the form:

rmi://<host_name>[:<name_service_port>]/<service_name>

where the host_name is a name recognized on the local area network (LAN) or a DNS name on the Internet. The name_service_port needs to be specified only if the naming service is running on a port other than the default, 1099.
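The export, register, and lookup steps described above can be tried out inside one JVM. The sketch below makes several assumptions: it starts its own registry with LocateRegistry.createRegistry instead of running rmiregistry, it uses the arbitrary port 51099 (assumed to be free), it sets the java.rmi.server.hostname property so that all traffic stays on the loopback interface, and it relies on the two-argument exportObject method of recent JDKs, which creates the stub at run time. Echo, EchoImpl, RegistryDemo, and the name EchoService are hypothetical.

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote service used only for this illustration.
interface Echo extends Remote {
    String echo(String text) throws RemoteException;
}

class EchoImpl implements Echo {
    public String echo(String text) {
        return "echo: " + text;
    }
}

class RegistryDemo {
    // Assumed free port for the demonstration; the default would be 1099.
    static final int PORT = 51099;

    static String run() throws Exception {
        // Assumption: keep all traffic on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        // Start a registry inside this JVM instead of running rmiregistry.
        Registry registry = LocateRegistry.createRegistry(PORT);
        EchoImpl service = new EchoImpl();
        // Server side: export the object on an anonymous port.
        Remote stub = UnicastRemoteObject.exportObject(service, 0);
        try {
            // Server side: register the object under a public name.
            String url = "rmi://localhost:" + PORT + "/EchoService";
            Naming.rebind(url, stub);

            // Client side: look the service up by URL and call it.
            Echo echo = (Echo) Naming.lookup(url);
            return echo.echo("hello");
        } finally {
            // Unexport both objects so that the JVM can exit.
            UnicastRemoteObject.unexportObject(service, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Because the port differs from the default, it has to appear in the URL; with a registry on port 1099 the form rmi://localhost/EchoService would be enough.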

Using RMI

It is now time to build a working RMI system and get hands-on experience. In this section, you will build a simple remote calculator service and use it from a client program.

A working RMI system is composed of several parts.

• Interface definitions for the remote services
• Implementations of the remote services
• Stub and Skeleton files
• A server to host the remote services
• An RMI Naming service that allows clients to find the remote services
• A class file provider (an HTTP or FTP server)
• A client program that needs the remote services

In the next sections, you will build a simple RMI system in a step-by-step fashion. You are encouraged to create a fresh subdirectory on your computer and create these files as you read the text. To simplify things, you will use a single directory for the client and server code. By running the client and the server out of the same directory, you will not have to set up an HTTP or FTP server to provide the class files. (Details about how to use HTTP and FTP servers as class file providers will be covered in the section on Distributing and Installing RMI Software.)

Assuming that the RMI system is already designed, you take the following steps to build a system:

1. Write and compile Java code for interfaces
2. Write and compile Java code for implementation classes
3. Generate Stub and Skeleton class files from the implementation classes
4. Write Java code for a remote service host program
5. Develop Java code for RMI client program
6. Install and run RMI system

1. Interfaces

The first step is to write and compile the Java code for the service interface. The Calculator interface defines all of the remote features offered by the service:

public interface Calculator extends java.rmi.Remote {
    public long add(long a, long b) throws java.rmi.RemoteException;
    public long sub(long a, long b) throws java.rmi.RemoteException;
    public long mul(long a, long b) throws java.rmi.RemoteException;
    public long div(long a, long b) throws java.rmi.RemoteException;
}

Notice this interface extends Remote, and each method signature declares that it may throw a RemoteException object.

Copy this file to your directory and compile it with the Java compiler:

>javac Calculator.java

2. Implementation

Next, you write the implementation for the remote service. This is the CalculatorImpl class:

public class CalculatorImpl
        extends java.rmi.server.UnicastRemoteObject
        implements Calculator {

    // Implementations must have an explicit constructor
    // in order to declare the RemoteException exception
    public CalculatorImpl() throws java.rmi.RemoteException {
        super();
    }

    public long add(long a, long b) throws java.rmi.RemoteException {
        return a + b;
    }

    public long sub(long a, long b) throws java.rmi.RemoteException {
        return a - b;
    }

    public long mul(long a, long b) throws java.rmi.RemoteException {
        return a * b;
    }

    public long div(long a, long b) throws java.rmi.RemoteException {
        return a / b;
    }
}

Again, copy this code into your directory and compile it.



The implementation class uses UnicastRemoteObject to link into the RMI system. In the example the implementation class directly extends UnicastRemoteObject. This is not a requirement. A class that does not extend UnicastRemoteObject may use its exportObject() method to be linked into RMI.

When a class extends UnicastRemoteObject, it must provide a constructor that declares that it may throw a RemoteException object. When this constructor calls super(), it activates code in UnicastRemoteObject that performs the RMI linking and remote object initialization.

3. Stubs and Skeletons

You next use the RMI compiler, rmic, to generate the stub and skeleton files. The compiler runs on the remote service implementation class file.

>rmic CalculatorImpl

Try this in your directory. Because rmic names the generated files after the implementation class, you should find the file CalculatorImpl_Stub.class and, unless you generated stubs for the Java 2 stub protocol only (the -v1.2 option), CalculatorImpl_Skel.class.

Options for the JDK 1.1 version of the RMI compiler, rmic, are:

Usage: rmic <options> <class names>

where <options> includes:
  -keep              Do not delete intermediate generated source files
  -keepgenerated     (same as "-keep")
  -g                 Generate debugging info
  -depend            Recompile out-of-date files recursively
  -nowarn            Generate no warnings
  -verbose           Output messages about what the compiler is doing
  -classpath <path>  Specify where to find input source and class files
  -d <directory>     Specify where to place generated class files
  -J<runtime flag>   Pass argument to the java interpreter



The Java 2 platform version of rmic adds three new options:

  -v1.1     Create stubs/skeletons for JDK 1.1 stub protocol version
  -vcompat  (default) Create stubs/skeletons compatible with both JDK 1.1 and Java 2 stub protocol versions
  -v1.2     Create stubs for Java 2 stub protocol version only

4. Host Server

Remote RMI services must be hosted in a server process. The class CalculatorServer is a very simple server that provides the bare essentials for hosting.

import java.rmi.Naming;

public class CalculatorServer {

    public CalculatorServer() {
        try {
            Calculator c = new CalculatorImpl();
            Naming.rebind("rmi://localhost:1099/CalculatorService", c);
        } catch (Exception e) {
            System.out.println("Trouble: " + e);
        }
    }

    public static void main(String args[]) {
        new CalculatorServer();
    }
}



5. Client

The source code for the client follows:

import java.rmi.Naming;
import java.rmi.RemoteException;
import java.net.MalformedURLException;
import java.rmi.NotBoundException;

public class CalculatorClient {

    public static void main(String[] args) {
        try {
            Calculator c = (Calculator)
                Naming.lookup("rmi://localhost/CalculatorService");
            System.out.println( c.sub(4, 3) );
            System.out.println( c.add(4, 5) );
            System.out.println( c.mul(3, 6) );
            System.out.println( c.div(9, 3) );
        }
        catch (MalformedURLException murle) {
            System.out.println();
            System.out.println("MalformedURLException");
            System.out.println(murle);
        }
        catch (RemoteException re) {
            System.out.println();
            System.out.println("RemoteException");
            System.out.println(re);
        }
        catch (NotBoundException nbe) {
            System.out.println();
            System.out.println("NotBoundException");
            System.out.println(nbe);
        }
        catch (java.lang.ArithmeticException ae) {
            System.out.println();
            System.out.println("java.lang.ArithmeticException");
            System.out.println(ae);
        }
    }
}

6. Running the RMI System

You are now ready to run the system! You need to start three consoles, one for the server, one for the client, and one for the RMIRegistry.

Start with the Registry. You must be in the directory that contains the classes you have written. From there, enter the following:

rmiregistry

If all goes well, the registry will start running and you can switch to the next console.

In the second console start the server hosting the CalculatorService, and enter the following:

>java CalculatorServer

It will start, load the implementation into memory and wait for a client connection.

In the last console, start the client program.

>java CalculatorClient

If all goes well you will see the following output:

1
9
18
3

That's it; you have created a working RMI system. Even though you ran the three consoles on the same computer, RMI uses your network stack and TCP/IP to communicate between the three separate JVMs. This is a full-fledged RMI system.

Exercise

1. UML Definition of RMI Example System
2. Simple Banking System

Parameters in RMI

You have seen that RMI supports method calls to remote objects. When these calls involve passing parameters or accepting a return value, how does RMI transfer these between JVMs? What semantics are used? Does RMI support pass-by-value or pass-by-reference? The answer depends on whether the parameters are primitive data types, objects, or remote objects.

Parameters in a Single JVM

First, review how parameters are passed in a single JVM. The normal semantics for Java technology is pass-by-value. When a parameter is passed to a method, the JVM makes a copy of the value, places the copy on the stack and then executes the method. When the code inside a method uses a parameter, it accesses its stack and uses the copy of the parameter. Values returned from methods are also copies.

When a primitive data type (boolean, byte, short, int, long, char, float, or double) is passed as a parameter to a method, the mechanics of pass-by-value are straightforward. The mechanics of passing an object as a parameter are more complex. Recall that an object resides in heap memory and is accessed through one or more reference variables. And, while the following code makes it look like an object is passed to the method println()

String s = "Test";
System.out.println(s);

in the mechanics it is the reference variable that is passed to the method. In the example, a copy of reference variable s is made (so that, for the duration of the call, one more reference to the String object exists) and is placed on the stack. Inside the method, code uses the copy of the reference to access the object.
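A short sketch makes the consequence of copying the reference visible. ReferenceCopyDemo and its two methods are hypothetical names; a StringBuilder is used instead of a String because it can be modified in place.

```java
// Shows that a method receives a copy of the reference, not the object
// and not the caller's variable.
class ReferenceCopyDemo {
    static void reassign(StringBuilder sb) {
        // Only the local copy of the reference changes.
        sb = new StringBuilder("replaced");
    }

    static void mutate(StringBuilder sb) {
        // The copy still refers to the caller's object, so this is visible.
        sb.append(" changed");
    }

    public static void main(String[] args) {
        StringBuilder s = new StringBuilder("Test");
        reassign(s);
        System.out.println(s); // prints "Test"
        mutate(s);
        System.out.println(s); // prints "Test changed"
    }
}
```

Reassigning the parameter has no effect on the caller, while changing the object through the copied reference does.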

Now you will see how RMI passes parameters and return values between remote JVMs.



Primitive Parameters

When a primitive data type is passed as a parameter to a remote method, the RMI system passes it by value. RMI will make a copy of a primitive data type and send it to the remote method. If a method returns a primitive data type, it is also returned to the calling JVM by value.

Values are passed between JVMs in a standard, machine-independent format. This allows JVMs running on different platforms to communicate with each other reliably.
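What a machine-independent format means can be seen with java.io.DataOutputStream, which always writes the most significant byte first. This sketch is an illustration of the principle and does not reproduce the RMI wire protocol; PrimitiveFormatDemo and encode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PrimitiveFormatDemo {
    // Encodes an int as four bytes, most significant byte first,
    // whatever the byte order of the underlying hardware.
    static byte[] encode(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeInt(value);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode(258)) {
            System.out.print(b + " "); // prints "0 0 1 2 "
        }
        System.out.println();
    }
}
```

The value 258 is 0x00000102, so the same four bytes are produced on every platform.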

Object Parameters

When an object is passed to a remote method, the semantics change from the case of the single JVM. RMI sends the object itself, not its reference, between JVMs. It is the object that is passed by value, not the reference to the object. Similarly, when a remote method returns an object, a copy of the whole object is returned to the calling program.

Unlike primitive data types, sending an object to a remote JVM is a nontrivial task. A Java object can be simple and self-contained, or it could refer to other Java objects in a complex graph-like structure. Because different JVMs do not share heap memory, RMI must send the referenced object and all objects it references. (Passing large object graphs can use a lot of CPU time and network bandwidth.)

RMI uses a technology called Object Serialization to transform an object into a linear format that can then be sent over the network wire. Object serialization essentially flattens an object and any objects it references. Serialized objects can be de-serialized in the memory of the remote JVM and made ready for use by a Java program.
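The effect of serialization can be observed without RMI by writing an object to a byte array and reading it back. In the sketch below, Order and SerializationDemo are hypothetical names; the byte array stands in for the network, and the deserialized object plays the role of the copy that the remote JVM receives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parameter type: an object that refers to other objects.
class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final String customer;
    final List<String> items = new ArrayList<String>();

    Order(String customer) {
        this.customer = customer;
    }
}

class SerializationDemo {
    // Flattens the object and everything it references into bytes, then
    // rebuilds a separate copy, as would happen in the receiving JVM.
    static Order copyBySerialization(Order original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(original);
        out.close();
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        return (Order) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Order sent = new Order("jGuru");
        sent.items.add("book");
        Order received = copyBySerialization(sent);
        received.items.add("pen"); // changes the copy only
        System.out.println(sent.items);     // prints [book]
        System.out.println(received.items); // prints [book, pen]
    }
}
```

The list inside the Order travels with it, and a change made to the received copy is not seen by the sender; this is the pass-by-value behavior described above.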

Remote Object Parameters

RMI introduces a third type of parameter to consider: remote objects. As you have seen, a client program can obtain a reference to a remote object through the RMI Registry program. There is another way in which a client can obtain a remote reference: it can be returned to the client from a method call. In the following code, the BankManager service getAccount() method is used to obtain a remote reference to an Account remote service.

BankManager bm;
Account a;
try {
    bm = (BankManager) Naming.lookup(
        "rmi://BankServer/BankManagerService"
    );
    a = bm.getAccount( "jGuru" );
    // Code that uses the account
}
catch (RemoteException re) {
}
// Naming.lookup() can also throw the following two exceptions
catch (NotBoundException nbe) {
}
catch (MalformedURLException murle) {
}

In the implementation of getAccount(), the method returns a (local) reference to the remote service.

public Account getAccount(String accountName) {
    // Code to find the matching account
    AccountImpl ai = // return reference from search
    return ai;
}

When a method returns a local reference to an exported remote object, RMI does not return that object. Instead, it substitutes another object (the remote proxy for that service) in the return stream.

The following diagram illustrates how RMI method calls might be used to:

• Return a remote reference from Server to Client A
• Send the remote reference from Client A to Client B
• Send the remote reference from Client B back to Server



Notice that when the AccountImpl object is returned to Client A, the Account proxy object is substituted. Subsequent method calls continue to send the reference first to Client B and then back to Server. During this process, the reference continues to refer to one instance of the remote service.

It is particularly interesting to note that when the reference is returned to Server, it is not converted into a local reference to the implementation object. While this would result in a speed improvement, maintaining this indirection ensures that the semantics of using a remote reference is maintained.

Exercise

3. RMI Parameters

RMI Client-side Callbacks

In many architectures, a server may need to make a remote call to a client. Examples include progress feedback, time tick notifications, warnings of problems, etc.

To accomplish this, a client must also act as an RMI server. There is nothing really special about this as RMI works equally well between all computers. However, it may be impractical for a client to extend java.rmi.server.UnicastRemoteObject. In these cases, a remote object may prepare itself for remote use by calling the static method

UnicastRemoteObject.exportObject(<remote_object>)

Exercise

4. RMI Client Callbacks
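The callback arrangement described in this section can be sketched in one JVM. Everything named here (ProgressListener, Worker, WorkerImpl, ClientListener) is hypothetical and chosen only for illustration. The sketch also uses the two-argument overload exportObject(obj, 0) rather than the single-argument form shown above, on the assumption that no pregenerated stub classes are available.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface implemented on the client side.
interface ProgressListener extends Remote {
    void progress(int percent) throws RemoteException;
}

// Hypothetical service that reports progress back to its caller.
interface Worker extends Remote {
    void run(ProgressListener listener) throws RemoteException;
}

class WorkerImpl implements Worker {
    public void run(ProgressListener listener) throws RemoteException {
        listener.progress(50);  // remote call from server to client
        listener.progress(100);
    }
}

// The client-side class does not extend UnicastRemoteObject.
class ClientListener implements ProgressListener {
    private final List<Integer> seen = new ArrayList<>();
    public synchronized void progress(int percent) { seen.add(percent); }
    synchronized List<Integer> snapshot() { return new ArrayList<>(seen); }
}

class CallbackSketch {
    static void unexportQuietly(Remote r) {
        try {
            UnicastRemoteObject.unexportObject(r, true);
        } catch (NoSuchObjectException ignored) {
            // never exported; nothing to undo
        }
    }

    static List<Integer> demo() {
        // Assumption: single machine, so pin RMI to the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        WorkerImpl worker = new WorkerImpl();
        ClientListener listener = new ClientListener();
        try {
            Worker workerStub =
                (Worker) UnicastRemoteObject.exportObject(worker, 0);
            // Exporting the listener makes the client act as an RMI server.
            UnicastRemoteObject.exportObject(listener, 0);
            // RMI sends the listener's stub, not the listener itself.
            workerStub.run(listener);
            return listener.snapshot();
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        } finally {
            unexportQuietly(listener);
            unexportQuietly(worker);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the listener is exported before it is passed as a parameter, the server receives a remote reference and its calls to progress() travel back to the client.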

Distributing and Installing RMI Software

RMI adds support for a Distributed Class model to the Java platform and extends Java technology's reach to multiple JVMs. It should not be a surprise that installing an RMI system is more involved than setting up a Java runtime on a single computer. In this section, you will learn about the issues related to installing and distributing an RMI based system.

For the purposes of this section, it is assumed that the overall process of designing a distributed computing (DC) system has led you to the point where you must consider the allocation of processing to nodes, and that you are trying to determine how to install the system onto each node.

Distributing RMI Classes



To run an RMI application, the supporting class files must be placed in locations that can be found by the server and the clients.

For the server, the following classes must be available to its class loader:

• Remote service interface definitions
• Remote service implementations
• Skeletons for the implementation classes (JDK 1.1 based servers only)
• Stubs for the implementation classes
• All other server classes

For the client, the following classes must be available to its class loader:

• Remote service interface definitions
• Stubs for the remote service implementation classes
• Server classes for objects used by the client (such as return values)
• All other client classes

Once you know which files must be on the different nodes, it is a simple task to make sure they are available to each JVM's class loader.

Automatic Distribution of Classes

The RMI designers extended the concept of class loading to include the loading of classes from FTP servers and HTTP servers. This is a powerful extension as it means that classes can be deployed in one, or only a few places, and all nodes in an RMI system will be able to get the proper class files to operate.

RMI supports this remote class loading through the RMIClassLoader. If a client or server is running an RMI system and it sees that it must load a class from a remote location, it calls on the RMIClassLoader to do this work.

The way RMI loads classes is controlled by a number of properties. These properties can be set when each JVM is run:

java [ -D<PropertyName>=<PropertyValue> ]+ <ClassFile>

The property java.rmi.server.codebase is used to specify a URL. This URL points to a file:, ftp:, or http: location that supplies classes for objects that are sent from this JVM. If a program running in a JVM sends an object to another JVM (for example, as the return value from a method), that other JVM needs to load the class file for that object. When RMI sends the object via serialization, it embeds the URL specified by this property into the stream, alongside the object.

Note: RMI does not send class files along with the serialized objects.

If the remote JVM needs to load a class file for an object, it looks for the embedded URL and contacts the server at that location for the file.

When the property java.rmi.server.useCodebaseOnly is set to true, the JVM will load classes only from a location specified by the CLASSPATH environment variable or from the URL specified in its own java.rmi.server.codebase property; codebase URLs embedded in incoming streams are ignored.
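To make the -D syntax concrete, here is a small sketch that only assembles such a command line; it does not start a JVM. The codebase URL and the main class name are placeholders, and the helper launchCommand is a hypothetical name introduced for this illustration.

```java
import java.util.Arrays;
import java.util.List;

class RmiLaunchSketch {
    // Builds "java -D<name>=<value> ... <ClassFile>" for the two
    // class-loading properties discussed in the text.
    static List<String> launchCommand(String codebaseUrl,
                                      boolean useCodebaseOnly,
                                      String mainClass) {
        return Arrays.asList(
            "java",
            "-Djava.rmi.server.codebase=" + codebaseUrl,
            "-Djava.rmi.server.useCodebaseOnly=" + useCodebaseOnly,
            mainClass);
    }

    public static void main(String[] args) {
        // Hypothetical URL; the trailing slash marks a directory of classes.
        List<String> cmd = launchCommand(
            "http://classes.example.com/rmi/", true, "CalculatorServer");
        System.out.println(String.join(" ", cmd));
    }
}
```

A list in this form could be handed to java.lang.ProcessBuilder; passing the properties on the command line ensures they are in place before any RMI class is loaded.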

By using different combinations of the available system properties, a number of different RMI system configurations can be created.

Closed. All classes used by clients and the server must be located on the JVM and referenced by the CLASSPATH environment variable. No dynamic class loading is supported.

Server based. A client applet is loaded from the server's CODEBASE along with all supporting classes. This is similar to the way applets are loaded from the same HTTP server that supports the applet's web page.

Client dynamic. The primary classes are loaded by referencing the CLASSPATH environment variable of the JVM for the client. Supporting classes are loaded by the java.rmi.server.RMIClassLoader from an HTTP or FTP server on the network at a location specified by the server.

Server-dynamic. The primary classes are loaded by referencing the CLASSPATH environment variable of the JVM for the server. Supporting classes are loaded by the java.rmi.server.RMIClassLoader from an HTTP or FTP server on the network at a location specified by the client.

Bootstrap client. In this configuration, all of the client code is loaded from an HTTP or FTP server across the network. The only code residing on the client machine is a small bootstrap loader.

Bootstrap server. In this configuration, all of the server code is loaded from an HTTP or FTP server located on the network. The only code residing on the server machine is a small bootstrap loader.

The exercise for this section involves creating a bootstrap client configuration. Please follow the directions carefully as different files need to be placed and compiled within separate directories.

Exercise

5. Bootstrap Example



Firewall Issues

Firewalls are inevitably encountered by any networked enterprise application that has to operate beyond the sheltering confines of an Intranet. Typically, firewalls block all network traffic, with the exception of traffic intended for certain "well-known" ports.

Since the RMI transport layer opens dynamic socket connections between the client and the server to facilitate communication, the JRMP traffic is typically blocked by most firewall implementations. But luckily, the RMI designers had anticipated this problem, and a solution is provided by the RMI transport layer itself. To get across firewalls, RMI makes use of HTTP tunneling by encapsulating the RMI calls within an HTTP POST request.

Now, examine how HTTP tunneling of RMI traffic works by taking a closer look at the possible scenarios: the RMI client, the server, or both can be operating from behind a firewall. The following diagram shows the scenario where an RMI client located behind a firewall communicates with an external server.

In the above scenario, when the transport layer tries to establish a connection with the server, it is blocked by the firewall. When this happens, the RMI transport layer automatically retries by encapsulating the JRMP call data within an HTTP POST request. The HTTP POST header for the call is in the form:

http://hostname:port

If a client is behind a firewall, it is important that you also set the system property http.proxyHost appropriately. Since almost all firewalls recognize the HTTP protocol, the specified proxy server should be able to forward the call directly to the port on which the remote server is listening on the outside. Once the HTTP-encapsulated JRMP data is received at the server, it is automatically decoded and dispatched by the RMI transport layer. The reply is then sent back to the client as HTTP-encapsulated data.



The following diagram shows the scenario when both the RMI client and server are behind firewalls, or when the client proxy server can forward data only to the well-known HTTP port 80 at the server.

In this case, the RMI transport layer uses one additional level of indirection. This is because the client can no longer send the HTTP-encapsulated JRMP calls to arbitrary ports, as the server is also behind a firewall. Instead, the RMI transport layer places the JRMP calls inside HTTP packets and sends those packets to port 80 of the server. The HTTP POST header is now in the form:

http://hostname:80/cgi-bin/java-rmi?forward=<port>

This causes the execution of the CGI script, java-rmi.cgi, which in turn invokes a local JVM, unbundles the HTTP packet, and forwards the call to the server process on the designated port. RMI JRMP-based replies from the server are sent back as HTTP REPLY packets to the originating client port where RMI again unbundles the information and sends it to the appropriate RMI stub.

Of course, for this to work, the java-rmi.cgi script, which is included within the standard JDK 1.1 or Java 2 platform distribution, must be preconfigured with the path of the Java interpreter and located within the web server's cgi-bin directory. It is also equally important for the RMI server to specify the host's fully-qualified domain name via a system property upon startup to avoid any DNS resolution problems, as:

java.rmi.server.hostname=host.domain.com

Note: Rather than making use of a CGI script for the call forwarding, it is more efficient to use a servlet implementation of the same. You should be able to obtain the servlet's source code from Sun's RMI FAQ.



It should be noted that notwithstanding the built-in mechanism for overcoming firewalls, RMI suffers a significant performance degradation imposed by HTTP tunneling. There are other disadvantages to using HTTP tunneling too. For instance, your RMI application will no longer be able to multiplex JRMP calls on a single connection, since it would now follow a discrete request/response protocol. Additionally, using the java-rmi.cgi script exposes a fairly large security loophole on your server machine, as now, the script can redirect any incoming request to any port, completely bypassing your firewalling mechanism. Developers should also note that using HTTP tunneling precludes RMI applications from using callbacks, which in itself could be a major design constraint. Consequently, if a client detects a firewall, it can always disable the default HTTP tunneling feature by setting the property:

java.rmi.server.disableHttp=true


Distributed Garbage Collection

One of the joys of programming for the Java platform is not worrying about memory allocation. The JVM has an automatic garbage collector that will reclaim the memory from any object that has been discarded by the running program.

One of the design objectives for RMI was seamless integration into the Java programming language, which includes garbage collection. Designing an efficient single-machine garbage collector is hard; designing a distributed garbage collector is very hard.

The RMI system provides a reference counting distributed garbage collection algorithm based on Modula-3's Network Objects. This system works by having the server keep track of which clients have requested access to remote objects running on the server. When a reference is made, the server marks the object as "dirty" and when a client drops the reference, it is marked as being "clean."

The interface to the DGC (distributed garbage collector) is hidden in the stubs and skeletons layer. However, a remote object can implement the java.rmi.server.Unreferenced interface and get a notification via the unreferenced method when there are no longer any clients holding a live reference.

In addition to the reference counting mechanism, a live client reference has a lease with a specified time. If a client does not refresh the connection to the remote object before the lease term expires, the reference is considered to be dead and the remote object may be garbage collected. The lease time is controlled by the system property java.rmi.dgc.leaseValue. The value is in milliseconds and defaults to 10 minutes.

Because of these garbage collection semantics, a client must be prepared to deal with remote objects that have "disappeared."
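The sketch below illustrates both points from this section: a remote object that implements Unreferenced, and a client that is prepared for the object to be gone. It is an approximation, not a demonstration of the lease mechanism itself: the hypothetical SessionImpl is removed with an explicit unexportObject() call, which stands in for the object being released after its references lapse, and the demo does not wait for the runtime to call unreferenced().

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.rmi.server.Unreferenced;

// Hypothetical remote service used only for this illustration.
interface Session extends Remote {
    String ping() throws RemoteException;
}

class SessionImpl implements Session, Unreferenced {
    volatile boolean released;

    public String ping() { return "pong"; }

    // Invoked by the RMI runtime when no client holds a live reference.
    public void unreferenced() { released = true; }
}

class DisappearingObjectSketch {
    static String demo() {
        // Assumption: single machine, so pin RMI to the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        SessionImpl impl = new SessionImpl();
        try {
            Session stub = (Session) UnicastRemoteObject.exportObject(impl, 0);
            String first = stub.ping();
            // Stand-in for the object being discarded on the server side.
            UnicastRemoteObject.unexportObject(impl, true);
            try {
                stub.ping();
                return first + ", then still reachable";
            } catch (RemoteException gone) {
                // The client must cope with a vanished remote object.
                return first + ", then gone";
            }
        } catch (RemoteException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A real client would typically react to the RemoteException by looking the service up again or by reporting the failure to the user.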



In the following exercise, you will have the opportunity to experiment with the distributed garbage collector.

Exercise

6. Distributed Garbage Collection

Serializing Remote Objects

When designing a system using RMI, there are times when you would like to have the flexibility to control where a remote object runs. Today, when a remote object is brought to life on a particular JVM, it will remain on that JVM. You cannot "send" the remote object to another machine for execution at a new location. RMI makes it difficult to have the option of running a service locally or remotely.

The very reason RMI makes it easy to build some distributed application can make it difficult to move objects between JVMs. When you declare that an object implements the java.rmi.Remote interface, RMI will prevent it from being serialized and sent between JVMs as a parameter. Instead of sending the implementation class for a java.rmi.Remote interface, RMI substitutes the stub class. Because this substitution occurs in the RMI internal code, one cannot intercept this operation.

There are two different ways to solve this problem. The first involves manually serializing the remote object and sending it to the other JVM. There are two techniques for doing this. One is to create an ObjectInputStream and ObjectOutputStream connection between the two JVMs; with this, you can explicitly write the remote object to the stream. The other is to serialize the object into a byte array and send the byte array as the return value of an RMI method call. Both of these techniques require that you code at a level below RMI, and this can lead to extra coding and maintenance complications.

The second way is to use a delegation pattern. In this pattern, you place the core functionality into a class that:

• Does not implement java.rmi.Remote
• Does implement java.io.Serializable

Then you build a remote interface that declares remote access to the functionality. When you create an implementation of the remote interface, instead of reimplementing the functionality, you allow the remote implementation to defer, or delegate, to an instance of the local version.

Now look at the building blocks of this pattern. Note that this is a very simple example. A real-world example would have a significant number of local fields and methods.



// Place functionality in a local object
public class LocalModel implements java.io.Serializable {
    public String getVersionNumber() {
        return "Version 1.0";
    }
}

Next, you declare a java.rmi.Remote interface that defines the same functionality:

interface RemoteModelRef extends java.rmi.Remote {
    String getVersionNumber() throws java.rmi.RemoteException;
}

The implementation of the remote service accepts a reference to the LocalModel and delegates the real work to that object:

public class RemoteModelImpl
        extends java.rmi.server.UnicastRemoteObject
        implements RemoteModelRef {
    LocalModel lm;

    public RemoteModelImpl(LocalModel lm) throws java.rmi.RemoteException {
        super();
        this.lm = lm;
    }

    // Delegate to the local model implementation
    public String getVersionNumber() throws java.rmi.RemoteException {
        return lm.getVersionNumber();
    }
}

Finally, you define a remote service that provides access to clients. This is done with a java.rmi.Remote interface and an implementation:

interface RemoteModelMgr extends java.rmi.Remote {
    RemoteModelRef getRemoteModelRef() throws java.rmi.RemoteException;
    LocalModel getLocalModel() throws java.rmi.RemoteException;
}

public class RemoteModelMgrImpl
        extends java.rmi.server.UnicastRemoteObject
        implements RemoteModelMgr {
    LocalModel lm;
    RemoteModelImpl rmImpl;

    public RemoteModelMgrImpl() throws java.rmi.RemoteException {
        super();
    }

    public RemoteModelRef getRemoteModelRef() throws java.rmi.RemoteException {
        // Lazy instantiation of delegatee
        if (null == lm) {
            lm = new LocalModel();
        }
        // Lazy instantiation of Remote Interface Wrapper
        if (null == rmImpl) {
            rmImpl = new RemoteModelImpl(lm);
        }
        return ((RemoteModelRef) rmImpl);
    }

    public LocalModel getLocalModel() throws java.rmi.RemoteException {
        // Return a reference to the same LocalModel that exists as the
        // delegatee of the RMI remote object wrapper
        // Lazy instantiation of delegatee
        if (null == lm) {
            lm = new LocalModel();
        }
        return lm;
    }
}

Exercises

7. Serializing Remote Objects: Server 8. Serializing Remote Objects: Client

Mobile Agent Architectures



The RMI-based solution for moving objects between JVMs is, at best, a work-around. Other distributed Java architectures have been designed to address this issue and others. These are collectively called mobile agent architectures. Some examples are IBM's Aglets Architecture and ObjectSpace's Voyager System. These systems are specifically designed to allow and support the movement of Java objects between JVMs, carrying their data along with their execution instructions.

Alternate Implementations

This module has covered the RMI architecture and Sun's implementation. There are other implementations available, including:

• NinjaRMI A free implementation built at the University of California, Berkeley. Ninja supports the JDK 1.1 version of RMI, with extensions.

• BEA Weblogic Server BEA Weblogic Server is a high performance, secure Application Server that supports RMI, Microsoft COM, CORBA, and EJB (Enterprise JavaBeans), and other services.

• Voyager ObjectSpace's Voyager product transparently supports RMI along with a proprietary DOM, CORBA, EJB, Microsoft's DCOM, and transaction services.

Additional Resources

Books and Articles

• Design Patterns, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (The Gang of Four)
• Sun's RMI FAQ
• RMI over IIOP
• RMI-USERS Mailing List Archive
• Implementing Callbacks with Java RMI, by Govind Seshadri, Dr. Dobb's Journal, March 1998

Copyright 1996-2000 jGuru.com. All Rights Reserved.


_______ 1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.



Java RMI Architecture

The design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. RMI architects have succeeded, creating a system that extends the safety and robustness of the Java architecture to the distributed computing world.

Interfaces: The Heart of RMI

The RMI architecture is based on one important principle: the definition of behavior and the implementation of that behavior are separate concepts. RMI allows the code that defines the behavior and the code that implements the behavior to remain separate and to run on separate JVMs.

This fits nicely with the needs of a distributed system where clients are concerned about the definition of a service and servers are focused on providing the service.

Specifically, in RMI, the definition of a remote service is coded using a Java interface. The implementation of the remote service is coded in a class. Therefore, the key to understanding RMI is to remember that interfaces define behavior and classes define implementation.

While the following diagram illustrates this separation, remember that a Java interface does not contain executable code. RMI supports two classes that implement the same interface. The first class is the implementation of the behavior, and it runs on the server. The second class acts as a proxy for the remote service and it runs on the client. This is shown in the following diagram.



A client program makes method calls on the proxy object, RMI sends the request to the remote JVM, and forwards it to the implementation. Any return values provided by the implementation are sent back to the proxy and then to the client's program.

RMI Architecture Layers

With an understanding of the high-level RMI architecture, take a look under the covers to see its implementation.

The RMI implementation is essentially built from three abstraction layers. The first is the Stub and Skeleton layer, which lies just beneath the view of the developer. This layer intercepts method calls made by the client to the interface reference variable and redirects these calls to a remote RMI service.

The next layer is the Remote Reference Layer. This layer understands how to interpret and manage references made from clients to the remote service objects. In JDK 1.1, this layer connects clients to remote service objects that are running and exported on a server. The connection is a one-to-one (unicast) link. In the Java 2 SDK, this layer was enhanced to support the activation of dormant remote service objects via Remote Object Activation.

The transport layer is based on TCP/IP connections between machines in a network. It provides basic connectivity, as well as some firewall penetration strategies.



By using a layered architecture each of the layers could be enhanced or replaced without affecting the rest of the system. For example, the transport layer could be replaced by a UDP/IP layer without affecting the upper layers.

Stub and Skeleton Layer

The stub and skeleton layer of RMI lies just beneath the view of the Java developer. In this layer, RMI uses the Proxy design pattern as described in the book, Design Patterns by Gamma, Helm, Johnson and Vlissides. In the Proxy pattern, an object in one context is represented by another (the proxy) in a separate context. The proxy knows how to forward method calls between the participating objects. The following class diagram illustrates the Proxy pattern.

In RMI's use of the Proxy pattern, the stub class plays the role of the proxy, and the remote service implementation class plays the role of the RealSubject.
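The shape of the Proxy pattern can be shown in a few lines of ordinary Java, with no RMI involved. In this sketch, Subject and RealSubject follow the role names used in Design Patterns, while SubjectProxy and the request() method are hypothetical names chosen for the illustration; an RMI stub plays the SubjectProxy role but forwards over the network instead of calling the target directly.

```java
// The shared interface: callers program against this type only.
interface Subject {
    String request(String arg);
}

// The object that does the real work (the remote service
// implementation plays this role in RMI).
class RealSubject implements Subject {
    public String request(String arg) {
        return "handled " + arg;
    }
}

// The stand-in that callers actually hold (the stub's role in RMI).
class SubjectProxy implements Subject {
    private final Subject target;
    private int forwarded; // counts the calls passed through

    SubjectProxy(Subject target) { this.target = target; }

    public String request(String arg) {
        forwarded++;
        return target.request(arg); // forward the call to the real subject
    }

    int forwardedCalls() { return forwarded; }
}

class ProxyPatternSketch {
    public static void main(String[] args) {
        SubjectProxy proxy = new SubjectProxy(new RealSubject());
        Subject s = proxy; // the caller sees only the interface
        System.out.println(s.request("deposit"));
        System.out.println(proxy.forwardedCalls());
    }
}
```

Because the caller depends only on the interface, it cannot tell whether it holds the real object or a proxy, which is what lets RMI place a stub in the client.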

A skeleton is a helper class that is generated for RMI to use. The skeleton understands how to communicate with the stub across the RMI link. The skeleton carries on a conversation with the stub; it reads the parameters for the method call from the link, makes the call to the remote service implementation object, accepts the return value, and then writes the return value back to the stub.

In the Java 2 SDK implementation of RMI, the new wire protocol has made skeleton classes obsolete. RMI uses reflection to make the connection to the remote service object. You only have to worry about skeleton classes and objects in JDK 1.1 and JDK 1.1 compatible system implementations.

Remote Reference Layer

The Remote Reference Layer defines and supports the invocation semantics of the RMI connection. This layer provides a RemoteRef object that represents the link to the remote service implementation object.

The stub objects use the invoke() method in RemoteRef to forward the method call. The RemoteRef object understands the invocation semantics for remote services.

The JDK 1.1 implementation of RMI provides only one way for clients to connect to remote service implementations: a unicast, point-to-point connection. Before a client can use a remote service, the remote service must be instantiated on the server and exported to the RMI system. (If it is the primary service, it must also be named and registered in the RMI Registry).

The Java 2 SDK implementation of RMI adds a new semantic for the client-server connection. In this version, RMI supports activatable remote objects. When a method call is made to the proxy for an activatable object, RMI determines if the remote service implementation object is dormant. If it is dormant, RMI will instantiate the object and restore its state from a disk file. Once an activatable object is in memory, it behaves just like JDK 1.1 remote service implementation objects.

Other types of connection semantics are possible. For example, with multicast, a single proxy could send a method request to multiple implementations simultaneously and accept the first reply (this improves response time and possibly improves availability). In the future, Sun may add additional invocation semantics to RMI.

Transport Layer

The Transport Layer makes the connection between JVMs. All connections are stream-based network connections that use TCP/IP.

Even if two JVMs are running on the same physical computer, they connect through their host computer's TCP/IP network protocol stack. (This is why you must have an operational TCP/IP configuration on your computer to run the Exercises in this course). The following diagram shows the unfettered use of TCP/IP connections between JVMs.

As you know, TCP/IP provides a persistent, stream-based connection between two machines based on an IP address and port number at each end. Usually a DNS name is used instead of an IP address; this means you could talk about a TCP/IP connection between flicka.magelang.com:3452 and rosa.jguru.com:4432. In the current release of RMI, TCP/IP connections are used as the foundation for all machine-to-machine connections.

On top of TCP/IP, RMI uses a wire level protocol called Java Remote Method Protocol (JRMP). JRMP is a proprietary, stream-based protocol that is only partially specified and now exists in two versions. The first version was released with the JDK 1.1 version of RMI and required the use of Skeleton classes on the server. The second version was released with the Java 2 SDK. It has been optimized for performance and does not require skeleton classes. (Note that some alternate implementations, such as BEA Weblogic and NinjaRMI, do not use JRMP, but instead use their own wire level protocol. ObjectSpace's Voyager does recognize JRMP and will interoperate with RMI at the wire level.) Note that in Sun's implementation, RMI service interfaces are still required to extend java.rmi.Remote, and their service methods must still declare RemoteException.

Sun and IBM have jointly worked on the next version of RMI, called RMI-IIOP, which will be available with Java 2 SDK Version 1.3. The interesting thing about RMI-IIOP is that instead of using JRMP, it will use the Object Management Group (OMG) Internet Inter-ORB Protocol, IIOP, to communicate between clients and servers.

The OMG is a group of more than 800 members that defines a vendor-neutral, distributed object architecture called Common Object Request Broker Architecture (CORBA). CORBA Object Request Broker (ORB) clients and servers communicate with each other using IIOP. With the adoption of the Objects-by-Value extension to CORBA and the Java Language to IDL Mapping proposal, the ground work was set for direct RMI to CORBA integration. This new RMI-IIOP implementation supports most of the RMI feature set, except for:

• java.rmi.server.RMISocketFactory
• UnicastRemoteObject
• Unreferenced
• The DGC interfaces

The RMI transport layer is designed to make a connection between clients and server, even in the face of networking obstacles.

While the transport layer prefers to use multiple TCP/IP connections, some network configurations only allow a single TCP/IP connection between a client and server (some browsers restrict applets to a single network connection back to their hosting server).

In this case, the transport layer multiplexes multiple virtual connections within a single TCP/IP connection.

Naming Remote Objects



During the presentation of the RMI Architecture, one question has been repeatedly postponed: "How does a client find an RMI remote service?" Now you'll find the answer to that question. Clients find remote services by using a naming or directory service. This may seem like circular logic. How can a client locate a service by using a service? In fact, that is exactly the case. A naming or directory service is run on a well-known host and port number.


RMI can use many different directory services, including the Java Naming and Directory Interface (JNDI). RMI itself includes a simple service called the RMI Registry, rmiregistry. The RMI Registry runs on each machine that hosts remote service objects and accepts queries for services, by default on port 1099.
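As a rough sketch of how a registry is used, the following program starts a registry inside its own JVM, binds a service under a name, and then looks it up the way a client would. The Echo service and the name "EchoService" are hypothetical. The sketch also picks a free port at run time instead of the default 1099, purely so that it does not collide with a registry that may already be running on the machine.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote service used only for this illustration.
interface Echo extends Remote {
    String echo(String text) throws RemoteException;
}

class EchoImpl implements Echo {
    public String echo(String text) { return "echo: " + text; }
}

class RegistrySketch {
    // Ask the operating system for a currently unused port.
    static int freePort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }

    static void unexportQuietly(Remote r) {
        try {
            UnicastRemoteObject.unexportObject(r, true);
        } catch (NoSuchObjectException ignored) {
            // never exported; nothing to undo
        }
    }

    static String demo() {
        // Assumption: single machine, so pin RMI to the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        EchoImpl impl = new EchoImpl();
        Registry registry = null;
        try {
            int port = freePort();
            // Server side: start a registry and register the service.
            registry = LocateRegistry.createRegistry(port);
            Echo stub = (Echo) UnicastRemoteObject.exportObject(impl, 0);
            registry.rebind("EchoService", stub);

            // Client side: contact the registry and look the service up.
            Registry found = LocateRegistry.getRegistry("127.0.0.1", port);
            Echo service = (Echo) found.lookup("EchoService");
            return service.echo("hello");
        } catch (IOException | NotBoundException e) {
            return "failed: " + e;
        } finally {
            // Unexport everything so that the JVM can exit.
            unexportQuietly(impl);
            if (registry != null) {
                unexportQuietly(registry);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The client needs to know only the registry's host, its port, and the service name; everything else arrives in the stub that lookup() returns.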






• Interface definitions for the remote services
• Implementations of the remote services
• Stub and Skeleton files
• A server to host the remote services
• An RMI Naming service that allows clients to find the remote services
• A class file provider (an HTTP or FTP server)
• A client program that needs the remote services

In the next sections, you will build a simple RMI system in a step-by-step fashion. You are encouraged to create a fresh subdirectory on your computer and create these files as you read the text. To simplify things, you will use a single directory for the client and server code. By running the client and the server out of the same directory, you will not have to set up an HTTP or FTP server to provide the class files. (Details about how to use HTTP and FTP servers as class file providers will be covered in the section on Distributing and Installing RMI Software.)

Assuming that the RMI system is already designed, you take the following steps to build a system:

1. Write and compile Java code for interfaces
2. Write and compile Java code for implementation classes
3. Generate Stub and Skeleton class files from the implementation classes
4. Write Java code for a remote service host program
5. Develop Java code for RMI client program
6. Install and run RMI system

1. Interfaces

The first step is to write and compile the Java code for the service interface. The Calculator interface defines all of the remote features offered by the service:
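The interface listing itself does not appear at this point. Judging from the method signatures in the CalculatorImpl class, it would look roughly like the sketch below. This is a reconstruction, not the original listing; the interface is left package-private here so that the sketch compiles from a single file, and the small CalculatorInterfaceCheck class is a hypothetical helper that exercises the interface in-process without exporting anything.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Reconstructed from the signatures used by CalculatorImpl.
interface Calculator extends Remote {
    long add(long a, long b) throws RemoteException;
    long sub(long a, long b) throws RemoteException;
    long mul(long a, long b) throws RemoteException;
    long div(long a, long b) throws RemoteException;
}

// Hypothetical helper: a plain local implementation, no RMI export.
class CalculatorInterfaceCheck {
    static long[] sample() {
        Calculator c = new Calculator() {
            public long add(long a, long b) { return a + b; }
            public long sub(long a, long b) { return a - b; }
            public long mul(long a, long b) { return a * b; }
            public long div(long a, long b) { return a / b; }
        };
        try {
            // The same four calls that the client program makes.
            return new long[] { c.sub(4, 3), c.add(4, 5), c.mul(3, 6), c.div(9, 3) };
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (long value : sample()) {
            System.out.println(value);
        }
    }
}
```

Every method declares RemoteException because any call on a remote interface may fail for network reasons, even when the arithmetic itself cannot.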







Next, you write the implementation for the remote service. This is the CalculatorImpl class:

public class CalculatorImpl
        extends java.rmi.server.UnicastRemoteObject
        implements Calculator {

    // Implementations must have an explicit constructor
    // in order to declare the RemoteException exception
    public CalculatorImpl() throws java.rmi.RemoteException {
        super();
    }

    public long add(long a, long b) throws java.rmi.RemoteException {
        return a + b;
    }

    public long sub(long a, long b) throws java.rmi.RemoteException {
        return a - b;
    }

    public long mul(long a, long b) throws java.rmi.RemoteException {
        return a * b;
    }

    public long div(long a, long b) throws java.rmi.RemoteException {
        return a / b;
    }
}









Usage: rmic <options> <class names>

where <options> includes:
  -keep              Do not delete intermediate generated source files
  -keepgenerated     (same as "-keep")
  -g                 Generate debugging info
  -depend            Recompile out-of-date files recursively
  -nowarn            Generate no warnings
  -verbose           Output messages about what the compiler is doing
  -classpath <path>  Specify where to find input source and class files
  -d <directory>     Specify where to place generated class files
  -J<runtime flag>   Pass argument to the java interpreter
  -v1.1              Create stubs/skeletons for JDK 1.1 stub protocol version
  -vcompat           (default) Create stubs/skeletons compatible with both JDK 1.1 and Java 2 stub protocol versions
  -v1.2              Create stubs for Java 2 stub protocol version only


import java.rmi.Naming;

public class CalculatorServer {

    public CalculatorServer() {
        try {
            Calculator c = new CalculatorImpl();
            Naming.rebind("rmi://localhost:1099/CalculatorService", c);
        } catch (Exception e) {
            System.out.println("Trouble: " + e);
        }
    }

    public static void main(String args[]) {
        new CalculatorServer();
    }
}




import java.rmi.Naming;
import java.rmi.RemoteException;
import java.net.MalformedURLException;
import java.rmi.NotBoundException;

public class CalculatorClient {

    public static void main(String[] args) {
        try {
            Calculator c = (Calculator)
                Naming.lookup("rmi://localhost/CalculatorService");
            System.out.println( c.sub(4, 3) );
            System.out.println( c.add(4, 5) );
            System.out.println( c.mul(3, 6) );
            System.out.println( c.div(9, 3) );
        } catch (MalformedURLException murle) {
            System.out.println();
            System.out.println("MalformedURLException");
            System.out.println(murle);
        } catch (RemoteException re) {
            System.out.println();
            System.out.println("RemoteException");
            System.out.println(re);
        } catch (NotBoundException nbe) {
            System.out.println();
            System.out.println("NotBoundException");
            System.out.println(nbe);
        } catch (java.lang.ArithmeticException ae) {
            System.out.println();
            System.out.println("java.lang.ArithmeticException");
            System.out.println(ae);
        }
    }
}





>java CalculatorServer

It will start, load the implementation into memory and wait for a client connection.



1
9
18
3

That's it; you have created a working RMI system. Even though you ran the three consoles on the same computer, RMI uses your network stack and TCP/IP to communicate between the three separate JVMs. This is a full-fledged RMI system.

Exercise



1. UML Definition of RMI Example System
2. Simple Banking System

Parameters in RMI

You have seen that RMI supports method calls to remote objects. When these calls involve passing parameters or accepting a return value, how does RMI transfer these between JVMs? What semantics are used? Does RMI support pass-by-value or pass-by-reference? The answer depends on whether the parameters are primitive data types, objects, or remote objects.

Parameters in a Single JVM

First, review how parameters are passed in a single JVM. The normal semantics for Java technology is pass-by-value. When a parameter is passed to a method, the JVM makes a copy of the value, places the copy on the stack and then executes the method. When the code inside a method uses a parameter, it accesses its stack and uses the copy of the parameter. Values returned from methods are also copies.

When a primitive data type (boolean, byte, short, int, long, char, float, or double) is passed as a parameter to a method, the mechanics of pass-by-value are straightforward. The mechanics of passing an object as a parameter are more complex. Recall that an object resides in heap memory and is accessed through one or more reference variables. The following code makes it look as if an object is passed to the method println():

String s = "Test";
System.out.println(s);

In the mechanics, however, it is the reference variable that is passed to the method. In the example, a copy of reference variable s is made (so one more reference now points to the String object) and is placed on the stack. Inside the method, code uses the copy of the reference to access the object.
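The difference between copying a reference and copying an object can be seen in a small sketch. The class and method names below are illustrative only and are not part of the tutorial's code: reassigning the parameter changes just the method's copy of the reference, while a change made through that copy reaches the one shared object.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceCopyDemo {

    // Reassigning the parameter only changes this method's copy of the reference.
    static void reassign(List<String> items) {
        items = new ArrayList<>();
        items.add("lost");
    }

    // Mutating through the copied reference changes the single shared object on the heap.
    static void mutate(List<String> items) {
        items.add("kept");
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        reassign(items);
        System.out.println(items); // prints []
        mutate(items);
        System.out.println(items); // prints [kept]
    }
}
```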

Now you will see how RMI passes parameters and return values between remote JVMs.

Primitive Parameters

When a primitive data type is passed as a parameter to a remote method, the RMI system passes it by value. RMI will make a copy of a primitive data type and send it to the remote method. If a method returns a primitive data type, it is also returned to the calling JVM by value.

Object Parameters

When an object is passed to a remote method, the semantics change from the case of the single JVM. RMI sends the object itself, not its reference, between JVMs. It is the object that is passed by value, not the reference to the object. Similarly, when a remote method returns an object, a copy of the whole object is returned to the calling program.
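The copy semantics can be imitated inside one JVM with Java object serialization, which is the mechanism RMI relies on to move non-remote objects. The sketch below is only an approximation of what happens on a remote call: Point and copyByValue are hypothetical names, and the round trip goes through an in-memory buffer rather than a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CopySemanticsDemo {

    // Hypothetical parameter type; it must be Serializable to travel by value.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        Point(int x) { this.x = x; }
    }

    // Serialize and deserialize in memory, roughly what happens to a non-remote argument.
    static <T extends Serializable> T copyByValue(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        Point local = new Point(1);
        Point received = copyByValue(local);
        received.x = 99;                 // changes the copy only
        System.out.println(local.x);     // prints 1
        System.out.println(received.x);  // prints 99
    }
}
```

A change made by the receiver is therefore invisible to the sender, which is the opposite of what happens with an object parameter inside a single JVM.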



Remote Object Parameters

RMI introduces a third type of parameter to consider: remote objects. As you have seen, a client program can obtain a reference to a remote object through the RMI Registry program. There is another way in which a client can obtain a remote reference: it can be returned to the client from a method call. In the following code, the BankManager service getAccount() method is used to obtain a remote reference to an Account remote service.

BankManager bm;
Account a;
try {
    bm = (BankManager) Naming.lookup("rmi://BankServer/BankManagerService");
    a = bm.getAccount("jGuru");
    // Code that uses the account
} catch (RemoteException re) {
    // The registry or the remote service could not be reached
} catch (NotBoundException nbe) {
    // No service is registered under that name
} catch (MalformedURLException murle) {
    // The lookup URL is not well formed
}

In the implementation of getAccount(), the method returns a (local) reference to the remote service.

public Account getAccount(String accountName) {
    // Code to find the matching account
    AccountImpl ai = // return reference from search
    return ai;
}

When a method returns a local reference to an exported remote object, RMI does not return that object. Instead, it substitutes another object (the remote proxy for that service) in the return stream.

The following diagram illustrates how RMI method calls might be used to:

• Return a remote reference from Server to Client A
• Send the remote reference from Client A to Client B
• Send the remote reference from Client B back to Server

Notice that when the AccountImpl object is returned to Client A, the Account proxy object is substituted. Subsequent method calls continue to send the reference first to Client B and then back to Server. During this process, the reference continues to refer to one instance of the remote service.

It is particularly interesting to note that when the reference is returned to Server, it is not converted into a local reference to the implementation object. While this would result in a speed improvement, maintaining this indirection ensures that the semantics of using a remote reference is maintained.



Exercise

3. RMI Parameters



UnicastRemoteObject.exportObject (<remote_object>)

Exercise

4. RMI Client Callbacks



Distributing RMI Classes

To run an RMI application, the supporting class files must be placed in locations that can be found by the server and the clients.

For the server, the following classes must be available to its class loader:

• Remote service interface definitions
• Remote service implementations
• Skeletons for the implementation classes (JDK 1.1 based servers only)
• Stubs for the implementation classes
• All other server classes

For the client, the following classes must be available to its class loader:

• Remote service interface definitions
• Stubs for the remote service implementation classes
• Server classes for objects used by the client (such as return values)
• All other client classes

Once you know which files must be on the different nodes, it is a simple task to make sure they are available to each JVM's class loader.

Automatic Distribution of Classes

The RMI designers extended the concept of class loading to include the loading of classes from FTP servers and HTTP servers. This is a powerful extension, as it means that classes can be deployed in one place, or only a few places, and all nodes in an RMI system will be able to get the proper class files to operate.

RMI supports this remote class loading through the RMIClassLoader. If a client or server is running an RMI system and it sees that it must load a class from a remote location, it calls on the RMIClassLoader to do this work.

The way RMI loads classes is controlled by a number of properties. These properties can be set when each JVM is run:

java [ -D<PropertyName>=<PropertyValue> ]+ <ClassFile>

The property java.rmi.server.codebase is used to specify a URL. This URL points to a file:, ftp:, or http: location that supplies classes for objects that are sent from this JVM. If a program running in a JVM sends an object to another JVM (for example, as the return value from a method), that other JVM needs to load the class file for that object. When RMI sends the object via serialization, it embeds the URL specified by this property into the stream, alongside the object.

Note: RMI does not send class files along with the serialized objects.

If the remote JVM needs to load a class file for an object, it looks for the embedded URL and contacts the server at that location for the file.

When the property java.rmi.server.useCodebaseOnly is set to true, the JVM loads classes only from the locations given by the CLASSPATH environment variable or by its own java.rmi.server.codebase property, and ignores any codebase URL embedded in an incoming stream.
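Besides the -D command-line form, the same properties can be set from code with System.setProperty, provided this happens before any RMI classes read them. The following is a minimal sketch of that idea; the class name and the codebase URL are made up for illustration.

```java
public class CodebaseConfigDemo {

    // Hypothetical URL; a real deployment would point at its own class file server.
    static final String EXAMPLE_CODEBASE = "http://classes.example.com/rmi/";

    // Equivalent to passing the two -D options on the java command line.
    static void configure() {
        System.setProperty("java.rmi.server.codebase", EXAMPLE_CODEBASE);
        System.setProperty("java.rmi.server.useCodebaseOnly", "true");
    }

    public static void main(String[] args) {
        configure();
        System.out.println(System.getProperty("java.rmi.server.codebase"));
        System.out.println(Boolean.getBoolean("java.rmi.server.useCodebaseOnly")); // prints true
    }
}
```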

By using different combinations of the available system properties, a number of different RMI system configurations can be created.

Closed. All classes used by clients and the server must be located on the JVM and referenced by the CLASSPATH environment variable. No dynamic class loading is supported.

Server based. A client applet is loaded from the server's CODEBASE along with all supporting classes. This is similar to the way applets are loaded from the same HTTP server that supports the applet's web page.

Client dynamic. The primary classes are loaded by referencing the CLASSPATH environment variable of the JVM for the client. Supporting classes are loaded by the java.rmi.server.RMIClassLoader from an HTTP or FTP server on the network at a location specified by the server.

Server-dynamic. The primary classes are loaded by referencing the CLASSPATH environment variable of the JVM for the server. Supporting classes are loaded by the java.rmi.server.RMIClassLoader from an HTTP or FTP server on the network at a location specified by the client.

Bootstrap client. In this configuration, all of the client code is loaded from an HTTP or FTP server across the network. The only code residing on the client machine is a small bootstrap loader.

Bootstrap server. In this configuration, all of the server code is loaded from an HTTP or FTP server located on the network. The only code residing on the server machine is a small bootstrap loader.

The exercise for this section involves creating a bootstrap client configuration. Please follow the directions carefully as different files need to be placed and compiled within separate directories.

Exercise

5. Bootstrap Example

Firewall Issues

Firewalls are inevitably encountered by any networked enterprise application that has to operate beyond the sheltering confines of an intranet. Typically, firewalls block all network traffic except that intended for certain "well-known" ports.

Since the RMI transport layer opens dynamic socket connections between the client and the server to facilitate communication, the JRMP traffic is typically blocked by most firewall implementations. Fortunately, the RMI designers anticipated this problem, and a solution is provided by the RMI transport layer itself. To get across firewalls, RMI makes use of HTTP tunneling by encapsulating the RMI calls within an HTTP POST request.

Now, examine how HTTP tunneling of RMI traffic works by taking a closer look at the possible scenarios: the RMI client, the server, or both can be operating from behind a firewall. The following diagram shows the scenario where an RMI client located behind a firewall communicates with an external server.

In the above scenario, when the transport layer tries to establish a connection with the server, it is blocked by the firewall. When this happens, the RMI transport layer automatically retries by encapsulating the JRMP call data within an HTTP POST request. The HTTP POST header for the call is in the form:

http://hostname:port

If a client is behind a firewall, it is important that you also set the system property http.proxyHost appropriately. Since almost all firewalls recognize the HTTP protocol, the specified proxy server should be able to forward the call directly to the port on which the remote server is listening on the outside. Once the HTTP-encapsulated JRMP data is received at the server, it is automatically decoded and dispatched by the RMI transport layer. The reply is then sent back to the client as HTTP-encapsulated data.

The following diagram shows the scenario when both the RMI client and server are behind firewalls, or when the client proxy server can forward data only to the well-known HTTP port 80 at the server.



In this case, the RMI transport layer uses one additional level of indirection. This is because the client can no longer send the HTTP-encapsulated JRMP calls to arbitrary ports, as the server is also behind a firewall. Instead, the RMI transport layer places the JRMP call inside HTTP packets and sends those packets to port 80 of the server. The HTTP POST header is now in the form:

http://hostname:80/cgi-bin/java-rmi?forward=<port>

This causes the execution of the CGI script, java-rmi.cgi, which in turn invokes a local JVM, unbundles the HTTP packet, and forwards the call to the server process on the designated port. RMI JRMP-based replies from the server are sent back as HTTP REPLY packets to the originating client port, where RMI again unbundles the information and sends it to the appropriate RMI stub.

Of course, for this to work, the java-rmi.cgi script, which is included within the standard JDK 1.1 or Java 2 platform distribution, must be preconfigured with the path of the Java interpreter and located within the web server's cgi-bin directory. It is equally important for the RMI server to specify the host's fully-qualified domain name via a system property upon startup to avoid any DNS resolution problems, as:

java.rmi.server.hostname=host.domain.com

Note: Rather than making use of a CGI script for the call forwarding, it is more efficient to use a servlet implementation of the same. You should be able to obtain the servlet's source code from Sun's RMI FAQ.

It should be noted that notwithstanding the built-in mechanism for overcoming firewalls, RMI suffers a significant performance degradation imposed by HTTP tunneling. There are other disadvantages to using HTTP tunneling too. For instance, your RMI application will no longer be able to multiplex JRMP calls on a single connection, since it would now follow a discrete request/response protocol. Additionally, using the java-rmi.cgi script exposes a fairly large security loophole on your server machine, as the script can now redirect any incoming request to any port, completely bypassing your firewalling mechanism. Developers should also note that using HTTP tunneling precludes RMI applications from using callbacks, which in itself could be a major design constraint. Consequently, if a client detects a firewall, it can always disable the default HTTP tunneling feature by setting the property:

java.rmi.server.disableHttp=true


Distributed Garbage Collection

One of the joys of programming for the Java platform is not worrying about memory allocation. The JVM has an automatic garbage collector that will reclaim the memory from any object that has been discarded by the running program.

One of the design objectives for RMI was seamless integration into the Java programming language, which includes garbage collection. Designing an efficient single-machine garbage collector is hard; designing a distributed garbage collector is very hard.

The RMI system provides a reference counting distributed garbage collection algorithm based on Modula-3's Network Objects. This system works by having the server keep track of which clients have requested access to remote objects running on the server. When a reference is made, the server marks the object as "dirty" and when a client drops the reference, it is marked as being "clean."

The interface to the DGC (distributed garbage collector) is hidden in the stubs and skeletons layer. However, a remote object can implement the java.rmi.server.Unreferenced interface and get a notification via the unreferenced method when there are no longer any clients holding a live reference.
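A minimal sketch of the Unreferenced hook follows. SessionImpl is a hypothetical class name, and the object is never exported here, so the callback is invoked by hand in main; in a running RMI system the distributed garbage collector would make that call.

```java
import java.rmi.Remote;
import java.rmi.server.Unreferenced;

public class SessionImpl implements Remote, Unreferenced {

    private volatile boolean released = false;

    // Called by the RMI runtime once no client holds a live reference.
    @Override
    public void unreferenced() {
        released = true; // a real service might close files or connections here
    }

    public boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        SessionImpl session = new SessionImpl();
        session.unreferenced(); // invoked by hand; normally the DGC triggers it
        System.out.println(session.isReleased()); // prints true
    }
}
```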

In addition to the reference counting mechanism, a live client reference has a lease with a specified time. If a client does not refresh the connection to the remote object before the lease term expires, the reference is considered to be dead and the remote object may be garbage collected. The lease time is controlled by the system property java.rmi.dgc.leaseValue. The value is in milliseconds and defaults to 10 minutes.

Because of these garbage collection semantics, a client must be prepared to deal with remote objects that have "disappeared."

In the following exercise, you will have the opportunity to experiment with the distributed garbage collector.



Exercise

6. Distributed Garbage Collection

Serializing Remote Objects

When designing a system using RMI, there are times when you would like to have the flexibility to control where a remote object runs. Today, when a remote object is brought to life on a particular JVM, it will remain on that JVM. You cannot "send" the remote object to another machine for execution at a new location. RMI makes it difficult to have the option of running a service locally or remotely.

The very reason RMI makes it easy to build some distributed applications can make it difficult to move objects between JVMs. When you declare that an object implements the java.rmi.Remote interface, RMI will prevent it from being serialized and sent between JVMs as a parameter. Instead of sending the implementation class for a java.rmi.Remote interface, RMI substitutes the stub class. Because this substitution occurs in the RMI internal code, one cannot intercept this operation.

There are two different ways to solve this problem. The first involves manually serializing the remote object and sending it to the other JVM, which can be done in one of two ways. One is to create an ObjectInputStream and ObjectOutputStream connection between the two JVMs and explicitly write the remote object to the stream. The other is to serialize the object into a byte array and send the byte array as the return value of an RMI method call. Both of these techniques require that you code at a level below RMI, and this can lead to extra coding and maintenance complications.

The second way is to use a delegation pattern. In this pattern, you place the core functionality into a class that:

• Does not implement java.rmi.Remote
• Does implement java.io.Serializable

Then you build a remote interface that declares remote access to the functionality. When you create an implementation of the remote interface, instead of reimplementing the functionality, you allow the remote implementation to defer, or delegate, to an instance of the local version.

Now look at the building blocks of this pattern. Note that this is a very simple example. A real-world example would have a significant number of local fields and methods.

// Place functionality in a local object
public class LocalModel implements java.io.Serializable {
    public String getVersionNumber() {
        return "Version 1.0";
    }
}

Next, you declare a java.rmi.Remote interface that defines the same functionality:

interface RemoteModelRef extends java.rmi.Remote {
    String getVersionNumber() throws java.rmi.RemoteException;
}

The implementation of the remote service accepts a reference to the LocalModel and delegates the real work to that object:

public class RemoteModelImpl extends java.rmi.server.UnicastRemoteObject implements RemoteModelRef {
    LocalModel lm;

    public RemoteModelImpl(LocalModel lm) throws java.rmi.RemoteException {
        super();
        this.lm = lm;
    }

    // Delegate to the local model implementation
    public String getVersionNumber() throws java.rmi.RemoteException {
        return lm.getVersionNumber();
    }
}

Finally, you define a remote service that provides access to clients. This is done with a java.rmi.Remote interface and an implementation:

interface RemoteModelMgr extends java.rmi.Remote {
    RemoteModelRef getRemoteModelRef() throws java.rmi.RemoteException;
    LocalModel getLocalModel() throws java.rmi.RemoteException;
}

public class RemoteModelMgrImpl extends java.rmi.server.UnicastRemoteObject implements RemoteModelMgr {
    LocalModel lm;
    RemoteModelImpl rmImpl;

    public RemoteModelMgrImpl() throws java.rmi.RemoteException {
        super();
    }

    public RemoteModelRef getRemoteModelRef() throws java.rmi.RemoteException {
        // Lazy instantiation of delegatee
        if (null == lm) {
            lm = new LocalModel();
        }
        // Lazy instantiation of remote interface wrapper
        if (null == rmImpl) {
            rmImpl = new RemoteModelImpl(lm);
        }
        return ((RemoteModelRef) rmImpl);
    }

    public LocalModel getLocalModel() throws java.rmi.RemoteException {
        // Return a reference to the same LocalModel that exists
        // as the delegatee of the RMI remote object wrapper
        // Lazy instantiation of delegatee
        if (null == lm) {
            lm = new LocalModel();
        }
        return lm;
    }
}

Exercises

7. Serializing Remote Objects: Server 8. Serializing Remote Objects: Client

Mobile Agent Architectures

The solution to the mobile computing agent using RMI is, at best, a work-around. Other distributed Java architectures have been designed to address this issue and others. These are collectively called mobile agent architectures. Some examples are IBM's Aglets Architecture and ObjectSpace's Voyager System. These systems are specifically designed to allow and support the movement of Java objects between JVMs, carrying their data along with their execution instructions.

Alternate Implementations

This module has covered the RMI architecture and Sun's implementation. There are other implementations available, including:

• NinjaRMI: A free implementation built at the University of California, Berkeley. Ninja supports the JDK 1.1 version of RMI, with extensions.

• BEA Weblogic Server: A high performance, secure Application Server that supports RMI, Microsoft COM, CORBA, EJB (Enterprise JavaBeans), and other services.

• Voyager: ObjectSpace's Voyager product transparently supports RMI along with a proprietary DOM, CORBA, EJB, Microsoft's DCOM, and transaction services.

Additional Resources

Books and Articles

• Design Patterns, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (The Gang of Four)
• Sun's RMI FAQ
• RMI over IIOP
• RMI-USERS Mailing List Archive
• Implementing Callbacks with Java RMI, by Govind Seshadri, Dr. Dobb's Journal, March 1998

Copyright 1996-2000 jGuru.com. All Rights Reserved.


_______ 1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.

Chapter 14 Naming remote objects


Naming Remote Objects During the presentation of the RMI Architecture, one question has been repeatedly postponed: "How does a client find an RMI remote service? " Now you'll find the answer to that question. Clients find remote services by using a naming or directory service. This may seem like circular logic. How can a client locate a service by using a service? In fact, that is exactly the case. A naming or directory service is run on a well-known host and port number.


RMI can use many different directory services, including the Java Naming and Directory Interface (JNDI). RMI itself includes a simple service called the RMI Registry, rmiregistry. The RMI Registry runs on each machine that hosts remote service objects and accepts queries for services, by default on port 1099.
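The host, port, and service name that a client passes to the registry are all carried in an rmi: URL such as rmi://localhost/CalculatorService. The sketch below pulls such a URL apart with java.net.URI and falls back to the default registry port when none is given. It only illustrates the naming convention and is not the code path that java.rmi.Naming itself uses; RmiUrlDemo and its helper methods are hypothetical names.

```java
import java.net.URI;
import java.rmi.registry.Registry;

public class RmiUrlDemo {

    // Returns the port named in an rmi: URL, or the registry default (1099) if none is given.
    static int registryPort(String rmiUrl) {
        int port = URI.create(rmiUrl).getPort(); // -1 when the URL has no explicit port
        return port == -1 ? Registry.REGISTRY_PORT : port;
    }

    // Returns the service name, that is, the URL path without its leading slash.
    static String serviceName(String rmiUrl) {
        return URI.create(rmiUrl).getPath().substring(1);
    }

    public static void main(String[] args) {
        String url = "rmi://localhost/CalculatorService";
        System.out.println(registryPort(url)); // prints 1099
        System.out.println(serviceName(url));  // prints CalculatorService
    }
}
```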




Chapter 15 Using RMI Interfaces Implementation stub skeleton




A working RMI system is composed of several parts:

1 Interface definitions for the remote services
2 Implementations of the remote services
3 Stub and Skeleton files
4 A server to host the remote services
5 An RMI Naming service that allows clients to find the remote services
6 A class file provider (an HTTP or FTP server)
7 A client program that needs the remote services

In the next sections, you will build a simple RMI system in a step-by-step fashion. You are encouraged to create a fresh subdirectory on your computer and create these files as you read the text. To simplify things, you will use a single directory for the client and server code. By running the client and the server out of the same directory, you will not have to set up an HTTP or FTP server to provide the class files. (Details about how to use HTTP and FTP servers as class file providers will be covered in the section on Distributing and Installing RMI Software)

Assuming that the RMI system is already designed, you take the following steps to build a system:

1 Write and compile Java code for interfaces
2 Write and compile Java code for implementation classes
3 Generate Stub and Skeleton class files from the implementation classes
4 Write Java code for a remote service host program
5 Develop Java code for RMI client program
6 Install and run RMI system

1. Interfaces



The first step is to write and compile the Java code for the service interface. The Calculator interface defines all of the remote features offered by the service:
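A sketch of what such an interface could look like is given here, assuming it declares exactly the four methods that CalculatorImpl implements (add, sub, mul, and div on long values). Every method of a remote interface must declare java.rmi.RemoteException.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Remote interface: each method may fail with RemoteException because the call crosses JVMs.
public interface Calculator extends Remote {

    long add(long a, long b) throws RemoteException;

    long sub(long a, long b) throws RemoteException;

    long mul(long a, long b) throws RemoteException;

    long div(long a, long b) throws RemoteException;
}
```

Saved as Calculator.java, it would be compiled with javac like any other source file.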





Next, you write the implementation for the remote service. This is the CalculatorImpl class:

public class CalculatorImpl extends java.rmi.server.UnicastRemoteObject implements Calculator {

    // Implementations must have an explicit constructor
    // in order to declare the RemoteException exception
    public CalculatorImpl() throws java.rmi.RemoteException {
        super();
    }

    public long add(long a, long b) throws java.rmi.RemoteException {
        return a + b;
    }

    public long sub(long a, long b) throws java.rmi.RemoteException {
        return a - b;
    }

    public long mul(long a, long b) throws java.rmi.RemoteException {
        return a * b;
    }

    public long div(long a, long b) throws java.rmi.RemoteException {
        return a / b;
    }
}









Usage: rmic <options> <class names>

where <options> includes:
  -keep                Do not delete intermediate generated source files
  -keepgenerated       (same as "-keep")
  -g                   Generate debugging info
  -depend              Recompile out-of-date files recursively
  -nowarn              Generate no warnings
  -verbose             Output messages about what the compiler is doing
  -classpath <path>    Specify where to find input source and class files
  -d <directory>       Specify where to place generated class files
  -J<runtime flag>     Pass argument to the java interpreter
  -v1.1                Create stubs/skeletons for JDK 1.1 stub protocol version
  -vcompat (default)   Create stubs/skeletons compatible with both JDK 1.1 and Java 2 stub protocol versions
  -v1.2                Create stubs for Java 2 stub protocol version only


import java.rmi.Naming;

public class CalculatorServer {

    public CalculatorServer() {
        try {
            Calculator c = new CalculatorImpl();
            Naming.rebind("rmi://localhost:1099/CalculatorService", c);
        } catch (Exception e) {
            System.out.println("Trouble: " + e);
        }
    }

    public static void main(String args[]) {
        new CalculatorServer();
    }
}


import java.rmi.Naming;
import java.rmi.RemoteException;
import java.net.MalformedURLException;
import java.rmi.NotBoundException;

public class CalculatorClient {

    public static void main(String[] args) {
        try {
            // Look up the service registered by CalculatorServer
            Calculator c = (Calculator) Naming.lookup("rmi://localhost/CalculatorService");
            System.out.println( c.sub(4, 3) );
            System.out.println( c.add(4, 5) );
            System.out.println( c.mul(3, 6) );
            System.out.println( c.div(9, 3) );
        } catch (MalformedURLException murle) {
            System.out.println();
            System.out.println("MalformedURLException");
            System.out.println(murle);
        } catch (RemoteException re) {
            System.out.println();
            System.out.println("RemoteException");
            System.out.println(re);
        } catch (NotBoundException nbe) {
            System.out.println();
            System.out.println("NotBoundException");
            System.out.println(nbe);
        } catch (java.lang.ArithmeticException ae) {
            System.out.println();
            System.out.println("java.lang.ArithmeticException");
            System.out.println(ae);
        }
    }
}





>java CalculatorServer

It will start, load the implementation into memory and wait for a client connection.

1
9
18
3

That's it; you have created a working RMI system. Even though you ran the three consoles on the same computer, RMI uses your network stack and TCP/IP to communicate between the three separate JVMs. This is a full-fledged RMI system.
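The client's catch block for java.lang.ArithmeticException exists because div performs integer division on long values, which throws when the divisor is zero. The sketch below checks that arithmetic locally, without any RMI machinery; CalculatorArithmeticDemo is a hypothetical helper that simply repeats the expression used in CalculatorImpl.div.

```java
public class CalculatorArithmeticDemo {

    // Same expression CalculatorImpl.div evaluates: integer division on long values.
    static long div(long a, long b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(div(9, 3)); // prints 3
        System.out.println(div(7, 2)); // prints 3, the fractional part is discarded
        try {
            div(1, 0);
        } catch (ArithmeticException ae) {
            System.out.println("ArithmeticException"); // division by zero is rejected
        }
    }
}
```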

Chapter 17 Parameters in RMI


Parameters in RMI

Primitive Parameters

When a primitive data type is passed as a parameter to a remote method, the RMI system passes it by value. RMI will make a copy of a primitive data type and send it to the remote method. If a method returns a primitive data type, it is also returned to the calling JVM by value.


Object Parameters

When an object is passed to a remote method, the semantics change from the case of the single JVM. RMI sends the object itself, not its reference, between JVMs. It is the object that is passed by value, not the reference to the object. Similarly, when a remote method returns an object, a copy of the whole object is returned to the calling program.







UnicastRemoteObject.exportObject (<remote_object>)





Distributing RMI Classes

To run an RMI application, the supporting class files must be placed in locations that can be found by the server and the clients.

For the server, the following classes must be available to its class loader:

i Remote service interface definitions
ii Remote service implementations
iii Skeletons for the implementation classes (JDK 1.1 based servers only)
iv Stubs for the implementation classes
v All other server classes

For the client, the following classes must be available to its class loader:

i Remote service interface definitions
ii Stubs for the remote service implementation classes
iii Server classes for objects used by the client (such as return values)
iv All other client classes

Once you know which files must be on the different nodes, it is a simple task to make sure they are available to each JVM's class loader.

Chapter 20 Introduction to CORBA


What is CORBA? What does it do?

CORBA is the acronym for Common Object Request Broker Architecture, OMG's open, vendor-independent architecture and infrastructure that computer applications use to work together over networks. Using the standard protocol IIOP, a CORBA-based program from any vendor, on almost any computer, operating system, programming language, and network, can interoperate with a CORBA-based program from the same or another vendor, on almost any other computer, operating system, programming language, and network.

Some people think that CORBA is the only specification that OMG produces, or that the term "CORBA" covers all of the OMG specifications. Neither is true.

What is CORBA good for?

CORBA is useful in many situations. Because of the easy way that CORBA integrates machines from so many vendors, with sizes ranging from mainframes through minis and desktops to hand-helds and embedded systems, it is the middleware of choice for large (and even not-so-large) enterprises. One of its most important, as well as most frequent, uses is in servers that must handle large numbers of clients, at high hit rates, with high reliability. CORBA works behind the scenes in the computer rooms of many of the world's largest websites; ones that you probably use every day. Specializations for scalability and fault-tolerance support these systems. But it's not used just for large applications; specialized versions of CORBA run real-time systems, and small embedded systems.

How about a high-level technical overview?

CORBA applications are composed of objects, individual units of running software that combine functionality and data, and that frequently (but not always) represent something in the real world. Typically, there are many instances of an object of a single type - for example, an e-commerce website would have many shopping cart object instances, all identical in functionality but differing in that each is assigned to a different customer, and contains data representing the merchandise that its particular customer has selected. For other types, there may be only one instance. When a legacy application, such as an accounting system, is wrapped in code with CORBA interfaces and opened up to clients on the network, there is usually only one instance.

For each object type, such as the shopping cart that we just mentioned, you define an interface in OMG IDL. The interface is the syntax part of the contract that the server object offers to the clients that invoke it. Any client that wants to invoke an operation on the object must use this IDL interface to specify the operation it wants to perform, and to marshal the arguments that it sends. When the invocation reaches the target object, the same interface definition is used there to unmarshal the arguments so that the object can perform the requested operation with them. The interface definition is then used to marshal the results for their trip back, and to unmarshal them when they reach their destination.
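The idea that both sides marshal and unmarshal against one shared definition can be sketched in a few lines. This is not CORBA's actual wire encoding and uses no ORB classes: MarshalDemo, the operation name addItem, and its two arguments are made up for illustration, and plain java.io data streams stand in for the real marshalling layer. The point is only that the reader must consume the fields in the order the writer produced them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MarshalDemo {

    // Client side: write the operation name and arguments in the order the interface fixes.
    static byte[] marshal(String operation, long itemId, int quantity) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(operation);
            out.writeLong(itemId);
            out.writeInt(quantity);
        }
        return buffer.toByteArray();
    }

    // Server side: read the fields back in the same order and describe the request.
    static String unmarshal(byte[] request) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(request))) {
            String operation = in.readUTF();
            long itemId = in.readLong();
            int quantity = in.readInt();
            return operation + "(" + itemId + ", " + quantity + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] request = marshal("addItem", 42L, 2);
        System.out.println(unmarshal(request)); // prints addItem(42, 2)
    }
}
```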



The IDL interface definition is independent of programming language, but maps to all of the popular programming languages via OMG standards: OMG has standardized mappings from IDL to C, C++, Java, COBOL, Smalltalk, Ada, Lisp, Python, and IDLscript.

This separation of interface from implementation, enabled by OMG IDL, is the essence of CORBA - how it enables interoperability, with all of the transparencies we've claimed. The interface to each object is defined very strictly. In contrast, the implementation of an object - its running code, and its data - is hidden from the rest of the system (that is, encapsulated) behind a boundary that the client may not cross. Clients access objects only through their advertised interface, invoking only those operations that the object exposes through its IDL interface, with only those parameters (input and output) that are included in the invocation.

Figure 1 shows how everything fits together, at least within a single process: You compile your IDL into client stubs and object skeletons, and write your object (shown on the right) and a client for it (on the left). Stubs and skeletons serve as proxies for clients and servers, respectively. Because IDL defines interfaces so strictly, the stub on the client side has no trouble meshing perfectly with the skeleton on the server side, even if the two are compiled into different programming languages, or even running on different ORBs from different vendors.

In CORBA, every object instance has its own unique object reference, an identifying electronic token. Clients use the object references to direct their invocations, identifying to the ORB the exact instance they want to invoke (ensuring, for example, that the books you select go into your own shopping cart, and not into your neighbor's). The client acts as if it's invoking an operation on the object instance, but it's actually invoking on the IDL stub, which acts as a proxy. Passing through the stub on the client side, the invocation continues through the ORB (Object Request Broker) and the skeleton on the implementation side to reach the object, where it is executed.
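The path from client to stub to ORB to skeleton to object can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the class names (`ShoppingCart`, `CartStub`, `CartSkeleton`, `ToyOrb`), the string object references, and the use of JSON for marshaling are assumptions, not part of CORBA. A real ORB generates stubs and skeletons from IDL and marshals with its own wire format.

```python
import json


class ShoppingCart:
    """The object implementation, hidden behind its interface."""

    def __init__(self):
        self._items = []

    def add_item(self, name, quantity):
        self._items.append((name, quantity))
        return len(self._items)


class CartSkeleton:
    """Server-side proxy: unmarshals a request and calls the real object."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self._servant, request["operation"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ToyOrb:
    """Stand-in for the ORB: routes a request by object reference."""

    def __init__(self):
        self._skeletons = {}

    def register(self, object_ref, skeleton):
        self._skeletons[object_ref] = skeleton

    def invoke(self, object_ref, request_bytes):
        return self._skeletons[object_ref].dispatch(request_bytes)


class CartStub:
    """Client-side proxy: marshals the call and hands it to the ORB."""

    def __init__(self, orb, object_ref):
        self._orb = orb
        self._object_ref = object_ref

    def add_item(self, name, quantity):
        request = json.dumps(
            {"operation": "add_item", "args": [name, quantity]}
        ).encode("utf-8")
        reply = self._orb.invoke(self._object_ref, request)
        return json.loads(reply.decode("utf-8"))["result"]
```

In this sketch a client holding `CartStub(orb, "cart-alice")` calls `add_item` as if it were a local method; the object reference alone decides which cart instance receives the call.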

How do remote invocations work?

Figure 2 diagrams a remote invocation. In order to invoke the remote object instance, the client first obtains its object reference. (There are many ways to do this, but we won't detail any of them here. Easy ways include the Naming Service and the Trader Service.) To make the remote invocation, the client uses the same code that it used in the local invocation we just described, substituting the object reference for the remote instance. When the ORB examines the object reference and discovers that the target object is remote, it routes the invocation out over the network to the remote object's ORB. (Again we point out: for load-balanced servers, this is an oversimplification.)

How does this work? OMG has standardized this process at two key levels: First, the client knows the type of object it's invoking (that it's a shopping cart object, for instance), and the client stub and object skeleton are generated from the same IDL. This means that the client knows exactly which operations it may invoke, what the input parameters are, and where they have to go in the invocation; when the invocation reaches the target, everything is there and in the right place. We've already seen how OMG IDL accomplishes this. Second, the client's ORB and object's ORB must agree on a common protocol - that is, a representation to specify the target object, operation, all parameters (input and output) of every type that they may use, and how all of this is represented over the wire. OMG has defined this also - it's the standard protocol IIOP. (ORBs may use other protocols besides IIOP, and many do for various reasons. But virtually all speak the standard protocol IIOP for reasons of interoperability, and because it's required by OMG for compliance.)
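The idea that both ORBs must agree on how the target object, operation, and parameters are laid out on the wire can be sketched with a toy message format. This is explicitly not the IIOP message layout: the header fields, the restriction to 32-bit integer arguments, and the function names `encode_request` and `decode_request` are all assumptions made for illustration.

```python
import struct

# Hypothetical header: three unsigned 16-bit lengths in network byte order
# (object key length, operation name length, argument count).
HEADER = "!HHH"


def encode_request(object_key, operation, args):
    """Client ORB side: lay out target, operation and integer args as bytes."""
    op = operation.encode("ascii")
    header = struct.pack(HEADER, len(object_key), len(op), len(args))
    body = struct.pack("!%di" % len(args), *args)
    return header + object_key + op + body


def decode_request(message):
    """Server ORB side: recover the same fields using the same layout."""
    key_len, op_len, argc = struct.unpack_from(HEADER, message, 0)
    offset = struct.calcsize(HEADER)
    object_key = message[offset:offset + key_len]
    offset += key_len
    operation = message[offset:offset + op_len].decode("ascii")
    offset += op_len
    args = list(struct.unpack_from("!%di" % argc, message, offset))
    return object_key, operation, args
```

Because encoder and decoder share one layout, a request built by one side is read back unchanged by the other, whatever language or vendor produced it. That is the role the standard protocol plays between real ORBs.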

Although the ORB can tell from the object reference that the target object is remote, the client cannot. (The user may know this anyway from other knowledge - for instance, that all accounting objects run on the mainframe at the main office in Tulsa.) There is nothing in the object reference token that the client holds and uses at invocation time that identifies the location of the target object. This ensures location transparency - the CORBA principle that simplifies the design of distributed object computing applications.

Chapter 21 What is CORBA


What is CORBA?

CORBA, or Common Object Request Broker Architecture, is a standard architecture for distributed object systems. It allows a distributed, heterogeneous collection of objects to interoperate.

The Common Object Request Broker Architecture (CORBA) from the Object Management Group (OMG) provides a platform-independent, language-independent architecture for writing distributed, object-oriented applications. CORBA objects can reside in the same process, on the same machine, down the hall, or across the planet. The Java language is well suited to writing CORBA programs; features that account for this include the clear mapping from OMG IDL to the Java programming language and the Java runtime environment's built-in garbage collection.

The OMG

The Object Management Group (OMG) is responsible for defining CORBA. The OMG comprises over 700 companies and organizations, among them almost all the major vendors and developers of distributed object technology: platform, database, and application vendors as well as software tool and corporate developers.

Chapter 22 CORBA Architecture


CORBA Architecture


CORBA defines an architecture for distributed objects. The basic CORBA paradigm is that of a request for services of a distributed object. Everything else defined by the OMG is in terms of this basic paradigm.

The services that an object provides are given by its interface. Interfaces are defined in OMG's Interface Definition Language (IDL). Distributed objects are identified by object references, which are typed by IDL interfaces.

The figure below graphically depicts a request. A client holds an object reference to a distributed object. The object reference is typed by an interface. In the figure below the object reference is typed by the Rabbit interface. The Object Request Broker, or ORB, delivers the request to the object and returns any results to the client. In the figure, a jump request returns an object reference typed by the AnotherObject interface.

The ORB

The ORB is the distributed service that implements the request to the remote object. It locates the remote object on the network, communicates the request to the object, waits for the results, and, when they are available, communicates those results back to the client.

The ORB implements location transparency. Exactly the same request mechanism is used by the client and the CORBA object regardless of where the object is located. It might be in the same process with the client, down the hall or across the planet. The client cannot tell the difference.

The ORB implements programming language independence for the request. The client issuing the request can be written in a different programming language from the implementation of the CORBA object. The ORB does the necessary translation between programming languages. Language bindings are defined for all popular programming languages.

CORBA as a Standard for Distributed Objects



One of the goals of the CORBA specification is that clients and object implementations are portable. The CORBA specification defines an application programmer's interface (API) for clients of a distributed object as well as an API for the implementation of a CORBA object. This means that code written for one vendor's CORBA product could, with a minimum of effort, be rewritten to work with a different vendor's product. However, the reality of CORBA products on the market today is that CORBA clients are portable but object implementations need some rework to port from one CORBA product to another.

CORBA 2.0 added interoperability as a goal in the specification. In particular, CORBA 2.0 defines a network protocol, called IIOP (Internet Inter-ORB Protocol), that allows clients using a CORBA product from any vendor to communicate with objects using a CORBA product from any other vendor. IIOP works across the Internet, or more precisely, across any TCP/IP implementation.

Interoperability is more important in a distributed system than portability. IIOP is used in other systems that do not even attempt to provide the CORBA API. In particular, IIOP is used as the transport protocol for a version of Java RMI (so-called "RMI over IIOP"). Since EJB is defined in terms of RMI, it too can use IIOP. Various application servers available on the market use IIOP but do not expose the entire CORBA API. Because they all use IIOP, programs written to these different APIs can interoperate with each other and with programs written to the CORBA API.

CORBA Services

Another important part of the CORBA standard is the definition of a set of distributed services to support the integration and interoperation of distributed objects. As depicted in the graphic below, the services, known as CORBA Services or COS, are defined on top of the ORB. That is, they are defined as standard CORBA objects with IDL interfaces, sometimes referred to as "Object Services."

There are several CORBA services. The popular ones are described in detail in another module of this course. Below is a brief description of each:

Service: Description
Object life cycle: Defines how CORBA objects are created, removed, moved, and copied
Naming: Defines how CORBA objects can have friendly symbolic names
Events: Decouples the communication between distributed objects
Relationships: Provides arbitrary typed n-ary relationships between CORBA objects
Externalization: Coordinates the transformation of CORBA objects to and from external media
Transactions: Coordinates atomic access to CORBA objects
Concurrency Control: Provides a locking service for CORBA objects in order to ensure serializable access
Property: Supports the association of name-value pairs with CORBA objects
Trader: Supports the finding of CORBA objects based on properties describing the service offered by the object
Query: Supports queries on objects

CORBA Products

CORBA is a specification; it is a guide for implementing products. Several vendors provide CORBA products for various programming languages. The CORBA products that support the Java programming language include:

ORB: Description
The Java 2 ORB: Comes with Sun's Java 2 SDK. It is missing several features.
VisiBroker for Java: A popular Java ORB from Inprise Corporation. VisiBroker is also embedded in other products. For example, it is the ORB that is embedded in the Netscape Communicator browser.
OrbixWeb: A popular Java ORB from Iona Technologies.
WebSphere: A popular application server with an ORB from IBM.
Netscape Communicator: Netscape browsers have a version of VisiBroker embedded in them. Applets can issue requests on CORBA objects without downloading ORB classes into the browser; the classes are already there.
Various free or shareware ORBs: CORBA implementations for various languages are available for download on the web from various sources.

Providing detailed information about all of these products is beyond the scope of this introductory course. This course will just use examples from both Sun's Java 2 ORB and Inprise's VisiBroker 3.x for Java products.



CORBA vs. RMI

Code-wise, it is clear that RMI is simpler to work with, since the Java developer does not need to be familiar with the Interface Definition Language (IDL). In general, however, CORBA differs from RMI in the following areas:

• CORBA interfaces are defined in IDL and RMI interfaces are defined in Java. RMI-IIOP allows you to write all interfaces in Java (see RMI-IIOP).

• CORBA supports in and out parameters, while RMI does not, since local objects are passed by copy and remote objects are passed by reference.

• CORBA was designed with language independence in mind. This means that some of the objects can be written in Java, for example, and other objects can be written in C++, and yet they can all interoperate. CORBA is therefore an ideal mechanism for bridging islands between different programming languages. RMI, on the other hand, was designed for a single language where all objects are written in Java. Note, however, that with RMI-IIOP it is possible to achieve interoperability.

• CORBA objects are not garbage collected. As we mentioned, CORBA is language independent, and some languages (C++, for example) do not support garbage collection. This can be considered a disadvantage, since once a CORBA object is created it continues to exist until you get rid of it, and deciding when to get rid of an object is not a trivial task. RMI objects, on the other hand, are garbage collected automatically.

Chapter 24 Introduction to Wireless LAN


What is a wireless LAN?

Wireless LAN stands for Wireless Local Area Network. It is a flexible data communications system implemented to extend, or substitute for, a wired LAN. A wireless LAN uses radio frequency (RF) technology to transmit and receive data over the air, minimizing the need for wired connections. A WLAN enables data connectivity and user mobility.


INTRODUCTION TO WIRELESS LAN

WLAN is a flexible data communication system that can be used for applications in which mobility is necessary or desirable. Using electromagnetic waves, WLANs transmit and receive data over the air without relying on a physical connection. Current WLAN technology is capable of reaching a data rate of 11 Mbps. Overall, WLAN is a promising technology for the future communication market.



Wireless Local Area Networks

• The Point-to-Point Protocol (PPP) was designed to provide a dedicated line for users who need Internet access via a telephone line or a cable TV connection.
• A PPP connection goes through these phases: idle, establishing, authenticating (optional), networking, and terminating.
• At the data link layer, PPP employs a version of HDLC.
• The Link Control Protocol (LCP) is responsible for establishing, maintaining, configuring, and terminating links.
• Password Authentication Protocol (PAP) and Challenge Handshake Authentication Protocol (CHAP) are two protocols used for authentication in PPP.
• PAP is a two-step process. The user sends authentication identification and a password. The system determines the validity of the information sent.
• CHAP is a three-step process. The system sends a value to the user. The user manipulates the value and sends its result. The system verifies the result.
• Network Control Protocol (NCP) is a set of protocols to allow the encapsulation of data coming from network layer protocols; each set is specific for a network layer protocol that requires the services of PPP.
• Internetwork Protocol Control Protocol (IPCP), an NCP protocol, establishes and terminates a network layer connection for IP packets.
• The IEEE 802.11 standard for wireless LANs defines two services: basic service set (BSS) and extended service set (ESS). An ESS consists of two or more BSSs; each BSS must have an access point (AP).
• The physical layer methods used by wireless LANs include frequency-hopping spread spectrum (FHSS), direct sequence spread spectrum (DSSS), orthogonal frequency-division multiplexing (OFDM), and high-rate direct sequence spread spectrum (HR-DSSS).
• FHSS is a signal generation method in which repeated sequences of carrier frequencies are used for protection against hackers.
• One bit is replaced by a chip code in DSSS.
• OFDM specifies that one source must use all the channels of the bandwidth.
• HR-DSSS is DSSS with an encoding method called complementary code keying (CCK).
• The wireless LAN access method is CSMA/CA.
• The network allocation vector (NAV) is a timer for collision avoidance.
• The MAC layer frame has nine fields. The addressing mechanism can include up to four addresses.
• Wireless LANs use management frames, control frames, and data frames.
• Bluetooth is a wireless LAN technology that connects devices (called gadgets) in a small area.
• A Bluetooth network is called a piconet. Multiple piconets form a network called a scatternet.
• The Bluetooth radio layer performs functions similar to those in the Internet model's physical layer.
• The Bluetooth baseband layer performs functions similar to those in the Internet model's MAC sublayer.
• A Bluetooth network consists of one master device and up to seven slave devices.
• A Bluetooth frame consists of data as well as hopping and control mechanisms. A frame is one, three, or five slots in length, with each slot equal to 625 µs.
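The three-step CHAP exchange summarized in the list above can be sketched in a few lines of Python. The sketch follows the response calculation defined for CHAP with MD5 in RFC 1994 (a hash over the identifier, the shared secret, and the challenge). The function names, the 16-byte challenge length, and the example secret are assumptions for illustration; a real PPP implementation also handles packet framing and retries.

```python
import hashlib
import hmac
import os


def make_challenge(length=16):
    """Step 1: the system sends a random value to the user."""
    return os.urandom(length)


def chap_response(identifier, secret, challenge):
    """Step 2: the user hashes identifier + shared secret + challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify_response(identifier, secret, challenge, response):
    """Step 3: the system recomputes the hash and compares the two."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    shared_secret = b"example-secret"  # hypothetical, known to both sides
    challenge = make_challenge()
    answer = chap_response(1, shared_secret, challenge)
    print("accepted:", verify_response(1, shared_secret, challenge, answer))
```

Unlike PAP, the password itself never crosses the link; only a hash tied to a fresh challenge does, so a captured response is of no use against a later challenge.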

Chapter 25 How does WLAN work


How does WLAN work?

WLANs use radio, infrared and microwave transmission to transmit data from one point to another without cables. A WLAN therefore offers a way to build a Local Area Network without cables. This WLAN can then be attached to an already existing larger network, the Internet for example.

A wireless LAN consists of nodes and access points. A node is a computer or a peripheral (such as a printer) that has a network adapter, which in a WLAN is equipped with an antenna. Access points function as transmitters and receivers between the nodes themselves or between the nodes and another network. More on this later.

WLAN data transfer in itself is implemented by one of the following technologies:

• Frequency Hopping Spread Spectrum (FHSS)
• Direct Sequence Spread Spectrum (DSSS)
• Infrared (IR)

A brief discussion of each follows.

Frequency Hopping Spread Spectrum

Frequency Hopping Spread Spectrum (FHSS) uses a narrowband carrier that changes frequency in a pattern known to both transmitter and receiver. Properly synchronized, the net effect is to maintain a single logical channel. To an unintended receiver, FHSS appears to be short-duration impulse noise.
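The notion of a hopping pattern known to both transmitter and receiver can be illustrated with a small sketch. It is a simplification under stated assumptions: real FHSS radios follow standardized hopping sequences, whereas here both sides derive the pattern from a shared seed with Python's pseudo-random generator. The function name `hop_sequence` and the default of 79 channels are illustrative choices.

```python
import random


def hop_sequence(shared_seed, hops, channels=79):
    """Derive a channel-hopping pattern from a seed both sides know."""
    rng = random.Random(shared_seed)
    return [rng.randrange(channels) for _ in range(hops)]


if __name__ == "__main__":
    transmitter = hop_sequence(shared_seed=2024, hops=8)
    receiver = hop_sequence(shared_seed=2024, hops=8)
    # Both sides tune to the same channel at every hop, so the link
    # behaves like a single logical channel.
    print(transmitter == receiver)
```

A receiver that does not know the pattern is on the right channel only by chance, which is why the transmission looks like short bursts of noise to it.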

Direct Sequence Spread Spectrum

Direct Sequence Spread Spectrum (DSSS) generates a redundant bit pattern for each bit to be transmitted. This bit pattern is called a chip (or chipping code). The longer the chip, the greater the probability that the original data can be recovered (the more bandwidth required also). Even if one or more bits in the chip are damaged during transmission, statistical techniques can recover the original data without the need for retransmission. To an unintended receiver, DSSS appears as low-power wideband noise and is ignored by most narrowband receivers.
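The spreading and recovery described above can be sketched as follows. The sketch assumes an 11-chip Barker sequence as the chipping code and recovers each bit by counting how many received chips match the code (a majority vote). Real receivers correlate analog signal samples rather than clean 0/1 chips, and the function names are hypothetical.

```python
# 11-chip Barker sequence, written as 1/0 chips (assumed chipping code).
BARKER_11 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]


def spread(bits):
    """Replace each data bit by the chip code (1) or its inverse (0)."""
    chips = []
    for bit in bits:
        chips.extend(c if bit == 1 else 1 - c for c in BARKER_11)
    return chips


def despread(chips):
    """Recover each bit by majority vote over its block of chips."""
    n = len(BARKER_11)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        matches = sum(1 for c, ref in zip(block, BARKER_11) if c == ref)
        bits.append(1 if matches > n // 2 else 0)
    return bits


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    received = spread(data)
    for damaged in (0, 3, 12, 13, 30):  # flip a few chips in transit
        received[damaged] ^= 1
    print(despread(received) == data)
```

With 11 chips per bit, up to five damaged chips in a block still leave a majority pointing at the right bit, at the cost of sending eleven times as many symbols.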

Infrared Technology

Infrared (IR) systems use very high frequencies, just below visible light in the electromagnetic spectrum, to carry data. Like light, IR cannot penetrate opaque objects; it is either directed (line-of-sight) or diffuse technology. Inexpensive directed systems provide very limited range (3 ft) and are occasionally used in specific WLAN applications. High performance directed IR is impractical for mobile users and is therefore used only to implement fixed subnetworks. Diffuse (or reflective) IR WLAN systems do not require line-of-sight, but cells are limited to individual rooms.

Chapter 26 WLAN setups (Ad-hoc, infrastructure LAN)


WLAN setups

A WLAN can be set up in two main architectures: Ad-hoc (distributed control) and infrastructure LAN (centralized control).

The ad-hoc network (also called peer-to-peer mode) is simply a set of wireless stations that communicate directly with one another without using an access point or any connection to the wired network. For example, an ad-hoc network can be formed by two laptops, each with a wireless network interface card. There is no central controller; mobile terminals communicate with other terminals independently over peer-to-peer connections. The network may still include a gateway node to create an interface with a fixed network. This kind of setup can be very useful in a meeting where employees bring laptop computers together to communicate and share information even when no network is provided by the company. An ad-hoc network could also be set up in a hotel room, in an airport, or anywhere access to the wired network is barred.

Fig 1. Ad-hoc network setup. Reference: Designing a high performance Radio Local Area Network adapter. Juha Ala-Laurila. Master thesis 1997, Tampere University of Technology, p. 84.

The infrastructure LAN network consists of an arbitrary number of mobile terminals in addition to access points. The access points are located between mobile terminals and the fixed network. All data transmission is controlled and conveyed by the access points and they are also responsible for sharing resources between terminals. The range of an access point using radio frequencies is roughly 100 meters. The range varies widely with the geometry and other physical properties of the space in which it is used.



Fig 2. Infrastructure LAN network setup. Reference: Designing a high performance Radio Local Area Network adapter. Juha Ala-Laurila. Master thesis 1997, Tampere

Chapter 27 Uses of WLAN


What are some uses of wireless LANs?

A Wireless LAN is frequently used to augment and enhance a wired LAN network rather than as a replacement.

Applications made possible through the use of wireless LAN technology include:

• Hospital applications using wireless LAN capable hand-held or notebook computers deliver patient information instantly and securely to doctors and nurses.
• Small workgroups and audit teams can increase productivity due to quick network setup.
• Students, professors, and staff at universities, corporate training centers, and other schools can access the Internet, the college catalog, and actual course content.
• Network managers can use wireless LANs to reduce the overhead caused by moves, extensions to networks, and other changes.
• Installing networked computers in older buildings becomes easier by using wireless LAN as a cost-effective network infrastructure solution.
• Pre-configured wireless LAN setups need no local computer support and make trade show and branch office setups simple.
• Wireless LANs in warehouses can be used to retrieve and update information on centralized databases, thereby increasing productivity.
• Network managers, senior executives, and line managers can make quicker decisions because they have real-time information at their fingertips.

Chapter 28 Benefits of WLAN


Benefits of WLAN

What are the concrete benefits of WLAN over wired networks? While the most obvious is mobility, there are also advantages in building and maintaining a wireless network. Let us look at the benefits more closely:

Mobility

Mobility is a significant advantage of WLANs. Users can access shared resources anywhere in the organization without looking for a place to plug in. A wireless network allows users to be truly mobile as long as the mobile terminal is within the network coverage area.

Range of coverage

The distance over which RF and IR waves can communicate depends on product design (including transmitted power and receiver design) and the propagation path, especially in indoor environments. Interactions with typical building objects, such as walls, metal, and even people, can affect the propagation of energy, and thus also the range and coverage of the system. IR is blocked by solid objects, which imposes additional limitations. Most wireless LAN systems use RF, because radio waves can penetrate many indoor walls and surfaces. The range of a typical WLAN node is about 100 m. Coverage can be extended, and true freedom of mobility achieved, via roaming. This means placing access points so that their coverage areas overlap. The user can then move from the coverage area of one access point to another without noticing, while the connection between the node and an access point is maintained seamlessly.

Ease of use

WLAN is easy to use, and users need very little new information to take advantage of it. Because the WLAN is transparent to a user's network operating system, applications work in the same way as they do in wired LANs.

Installation Speed, Simplicity and Flexibility

Installation of a WLAN system can be fast and easy and can eliminate the need to pull cable through walls and ceilings. Furthermore, wireless LAN enables networks to be set up where wires might be impossible to install.

Scalability

Wireless networks can be designed to be extremely simple or complex. Wireless networks can support large numbers of nodes and large physical areas by adding access points to extend coverage.

Cost

Finally, the cost of installing and maintaining a WLAN is on average lower than the cost of installing and maintaining a traditional wired LAN, for two reasons. First, WLAN eliminates the direct costs of cabling and the labor associated with installing and repairing it. Second, because WLANs simplify moves, additions, and changes, the indirect costs of user downtime and administrative overhead are reduced.

Chapter 29 Restrictions and Problems with WLAN


Restrictions and potential problems with WLAN

Radio signal interference

Radio signal interference in WLAN systems can go two ways: the WLAN can cause interference to other devices operating in or near its frequency band or, conversely, other devices can interfere with WLAN operation, provided their signal is stronger. The result is a scrambled signal, which prevents the nodes from exchanging information with each other or with access points. WLANs using infrared technology generally experience line-of-sight problems. An object blocking the line between two WLAN units is very likely to interrupt the transmission of data.

Connection problem

TCP/IP provides reliable connections over wired LANs, but in a WLAN it is susceptible to losing connections, especially when the terminal is operating at the margin of WLAN coverage. Another connection-related issue is IP addressing. Wireless terminals can roam between access points in the same IP subnet, but connections are lost if the terminal moves from one IP subnet to another.
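Whether a move between two access points stays inside one IP subnet can be checked with Python's standard `ipaddress` module. The sketch below is illustrative only: the function name `same_subnet` and the example addresses are assumptions, and it models only the addressing condition, not the handover itself.

```python
import ipaddress


def same_subnet(interface_a, interface_b):
    """Return True if two address/prefix strings fall in the same subnet."""
    net_a = ipaddress.ip_interface(interface_a).network
    net_b = ipaddress.ip_interface(interface_b).network
    return net_a == net_b


if __name__ == "__main__":
    # Hypothetical access point addresses in CIDR notation.
    print(same_subnet("192.168.1.10/24", "192.168.1.200/24"))
    print(same_subnet("192.168.1.10/24", "192.168.2.10/24"))
```

When the check is true the terminal can keep its IP address while roaming, so open connections survive; when it is false the terminal needs an address in the new subnet, and connections bound to the old address are lost.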

Network security

Security is an important aspect of WLANs. It is difficult to restrict access to a WLAN physically, because radio signals can propagate outside the intended coverage area of a specific WLAN, for example an office building. Security measures against this problem include encryption, access control lists on the access points, and network identifier codes. The technical operation of WLANs also works against the intruder: frequency hopping and direct sequence operation make eavesdropping impossible for all but the most sophisticated.
