Unit 9
Distributed Applications Systems
Overview

A distributed system consists of several computers connected by a variety of transmission
media. Probably the best-known example is the World Wide Web. There are two kinds of
software associated with such systems: system software, which is used to manage and
administer the system (software that maintains a distributed file system, for example);
and application software, which is written to satisfy some commercial need. The
application software draws on facilities provided by the system software: for example, an
online retail site will use TCP/IP protocol facilities to carry out the process of sending
and receiving sales data.
This lesson concentrates on distributed application development and the technologies
used for implementation.
Introduction looks at the main features of distributed applications and the characteristics
of ecommerce systems. It describes the supporting infrastructure provided by the internet
and examines the structure and use of clients and servers. The introduction ends by
examining the different development paradigms that can be used for developing
distributed systems: message passing, distributed objects, event-based bus technologies
and space-based technologies.
Servers looks at the main features of the two most popular kinds of server found in
distributed application systems: web servers and database servers.
General issues considers such matters as the security of distributed systems and design
principles for distributed systems.
Lessons
1. Overview of the Distributed Application System
2. Servers
3. General issues
Lesson 1 – Overview of the Distributed Application System
Objectives:
At the end of this lesson you will be able to:
Understand what a distributed application system is
Understand the main features of distributed application systems and the
characteristics of e-commerce systems
Understand the supporting infrastructure provided by the internet and the
structure and use of clients and servers
Understand the different development paradigms used for developing
distributed application systems
What Is a Distributed Application System?
Distributed computing deals with hardware and software systems containing more than
one processing element or storage element, concurrent processes, or multiple programs,
running under a loosely or tightly controlled regime.
In distributed computing a program is split up into parts that run simultaneously on
multiple computers communicating over a network. Distributed computing is a form of
parallel computing, but parallel computing is most commonly used to describe program
parts running simultaneously on multiple processors in the same computer. Both types of
processing require dividing a program into parts that can run simultaneously, but
distributed programs often must deal with heterogeneous environments, network links of
varying latencies, and unpredictable failures in the network or the computers.
Distributed systems have many advantages, including the ability to connect remote users
with remote resources in an open and scalable way. When we say open, we mean each
component is continually open to interaction with other components. When we say
scalable, we mean the system can easily be altered to accommodate changes in the number
of users, resources and computing entities.
Thus, a distributed system can be much larger and more powerful, given the combined
capabilities of its distributed components, than a combination of stand-alone systems. But
this is not easy to achieve: for a distributed system to be useful, it must be reliable, which
is a difficult goal because of the complexity of the interactions between simultaneously
running components.
Distributed applications are based on the ability to send objects from one application
program to another and to allow one application program to invoke methods on objects
that are located in another application program. The processing of a user's interaction
with the database is realized in three levels: web browser, database client and database
server. The user level contains the web browser, which displays web pages and collects
information for processing. The middle level contains the web server and application
server, which are the client programs for the database. The lowest level contains the
database servers.
Features of Distributed Application System
The technologies of the distributed applications design
The interest in distributed applications is explained by the increased requirements placed
on modern software tools. The major requirements are: application scalability - the
capability to serve any number of clients effectively at the same time; application
resilience to client application errors and communication failures; transaction
reliability - the secure transition of a running system from one stable, consistent state to
another; long-term operation - non-stop running, 24 hours a day, 7 days a week (the
24x7 model); a high security level, which guarantees not only access control to different
data but also protection at all stages of operation; and rapid application development,
simplicity of maintenance and the possibility for programmers of medium qualification
to modify the applications.
At present, there are several established technologies for building static and dynamic
distributed applications that meet the requirements described above: socket
programming, RPC (Remote Procedure Call), DCOM (Microsoft Distributed
Component Object Model), CORBA (Common Object Request Broker Architecture)
and Java RMI (Java Remote Method Invocation). The most important of them are the
last three: DCOM, CORBA and Java RMI.
The DCOM technology is object-oriented and is supported by operating systems such
as Windows 98, Windows NT, Windows 2000, Sun Solaris, Digital UNIX and
IBM MVS. Its most important merit is the possibility of integrating applications
implemented in different programming systems.
The CORBA technology is part of OMA (Object Management Architecture),
developed to standardize the architecture and interaction interfaces of object-oriented
applications. The interfaces between CORBA objects are encoded using a special
interface definition language, IDL (Interface Definition Language). Such
interfaces can be implemented in any application programming language and connected
to CORBA applications. The standards also propose connecting CORBA
objects with DCOM objects through special CORBA-DCOM bridges.
A Java RMI application usually consists of a client and a server. Objects are
created on the server that can be transmitted through the network, or whose methods
are declared shared for remote invocation. On the client side are the
applications that use the remote objects. The distinguishing feature of RMI is the
possibility of transmitting either methods or objects through the network. This feature
ultimately provides mobility (portability).
Today, the Java RMI and CORBA technologies are the most flexible and effective for
creating distributed applications, and their features are closely related. The
major merit of CORBA is the IDL interface, which unifies the communication tools
between applications and provides interoperability with other applications. On the other
hand, Java RMI is a more flexible and powerful tool for distributed application
development on the Java platform, including the possibility of building mobile
applications.
The Java RMI technology includes two constituents: the Java language tools
and remote method invocation (RMI) on Java objects. The Java language tools allow
developers to create complex distributed network applications that possess a high level
of security and reliability, and support object-oriented programming, integrated
multithreading and platform independence. The RMI technology provides a set of tools
for accessing a remote object on a server through calls on a special stub object.
The Java RMI specification makes it possible:
1. To declare remote interfaces for classes whose methods can be called through the
network;
2. To create stub objects using a special compiler;
3. To get a full copy of a remote object, not only a reference to it;
4. To transmit objects in such a way that their behaviour is not changed
when they are transmitted to another virtual machine;
5. To register a server object in the RMI registry and to keep that registry
accessible with the help of a special background process; clients can
consult this registry when they are looking for the objects they need;
6. To perform marshalling and de-marshalling with the help of the serialization API,
which translates an object to a byte stream before transmission and then, after
receipt, back again;
7. To work over the IIOP protocol, which makes it possible to communicate with
CORBA objects (the IIOP protocol can transmit data of different types, including
structures and unions).
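Item 6 above describes marshalling: translating an object to a byte stream before transmission and back again afterwards. The idea can be sketched in Python, where `pickle` plays the role that Java serialization plays in RMI (the request object below is purely illustrative):

```python
import pickle

# A plain object to be "transmitted": marshalling turns it into bytes,
# de-marshalling reconstructs an equivalent copy on the receiving side.
request = {"method": "getBalance", "args": ["account-42"]}

wire_bytes = pickle.dumps(request)   # marshalling: object to byte stream
assert isinstance(wire_bytes, bytes)

received = pickle.loads(wire_bytes)  # de-marshalling: byte stream to object
print(received)                      # an equal but distinct copy of the request
```

The received object is equal to the original but is a separate copy, which is exactly what item 3 above promises for remote objects.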
Unlike TCP sockets, the RMI technology provides a programmable interface for working
with the network. This interface operates at a higher level: it is based on
method invocation and gives the impression that the remote object is being
operated on locally. RMI is more convenient and more natural than an interface based on
sockets; however, it requires Java programs to be running on both sides of a
connection. The network connection itself may nevertheless use the same TCP/IP
protocol.
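The stub-based remote invocation that RMI provides can be sketched in a language-neutral way with Python's standard `xmlrpc` modules; the `greet` method, host and port below are illustrative, and the proxy object plays the role of the RMI stub:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Server side: register a method so that remote callers can invoke it,
# much as an RMI server registers a remote object in the RMI registry.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]           # port 0 means "pick a free port"
server.register_function(lambda name: "Hello, " + name, "greet")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy acts like the RMI stub; the call looks local
# but actually travels over the network connection.
stub = ServerProxy(f"http://127.0.0.1:{port}")
result = stub.greet("world")
print(result)                             # Hello, world
server.shutdown()
```

As with RMI, the client never opens a socket explicitly; the method call on the stub hides the transport details.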
The network information technologies
There are three main parts in client-server technologies: the user interface, which displays
information, realizes the graphical user interface and forms requests to the server; the
functional logic, which implements the required computing and business rules; and the
database, which executes selections, modifies data and processes them in accordance with
the received commands.
Depending on how these components are arranged on client and server machines, there
can be 2-tier, 3-tier and n-tier client-server technologies for corporate systems.
The 3-tier technology with distributed services provides independent functional
relationships between the user interface, the functional logic and the database. The user
interface and the smaller part of the functional logic are located on client computers.
The bigger part of the functional logic is located on the application server. The database
is located on the database server. In the model with distributed services the applications
are independent; they interact through the network with the application server, which in
turn interacts with the database server on demand.
The object-oriented network technologies of distributed services combine the models of
distributed databases and distributed services. The software of such systems consists of
a set of object units which interact with each other through the computer network using
standard interfaces. This approach allows units to be reused many times and computer
resources to be spent more economically. In this technology each object, depending on
conditions, can act as either a client or a server. The object computing architecture,
based on distributed network services, represents a new and highly promising kind of
computer technology that is widely used in the design of distributed corporate systems.
The object-oriented approach is promising for creating dynamic web-oriented
applications. Closely tied to them is the problem of realizing "super thin" clients,
where the client application uses the browser environment for execution. The
effectiveness of this technology for corporate systems design comes from the
possibility of running a standard HTML browser in any operating environment. If
we take into account that the maintenance cost for one server is not comparable to
the maintenance cost for a thousand "thick" clients connected to it, we can
conclude that a successfully realized corporate system sharply reduces maintenance
overheads.
The component-oriented technologies
At present, for the development of corporate systems with distributed database
branches and remote services (functional logic), the most notable are integrated
technologies such as ActiveX/DCOM, RMI/CORBA, Enterprise JavaBeans/CORBA
and CORBA/J2EE. The distributed component-oriented ActiveX/DCOM technology
is intended for application server development, registration and management of
distributed program objects. Its main demerits are: first, the server
computer operates only under the Windows NT/2000 Server operating system; and
second, tools for coordinating multi-user access to several server operations
managed by transactions are completely absent.
The remote access technology RMI organizes interconnection with remote objects and
allows the development of high-quality Internet applications. It possesses all the merits
of the Java language environment: it provides object-oriented programming and
guarantees a high level of security and reliability, multithreading, multiplatform support
and independence from the operating system. RMI is built on the concept of translating
a call and its parameters into a byte stream for transmission through the network. The
backward operation, method invocation and transmission of the result back to the
client, is carried out on the server side.
At the base of the common object request broker architecture lies the idea of uniting
applications from many vendors in a common tooling environment, which lets this
technology operate in heterogeneous systems: the network can serve, at the same
time, computers of different types, operating under different operating systems and
serving applications written in different programming languages. The CORBA
technology provides development and maintenance of software for distributed
corporate systems with maximal convenience. Such software is called a CORBA
application.
Each object request broker (ORB) producer, which in practice is what implements
CORBA, can in principle propose its own transport protocol for data transmission.
In this case the expanded name of the technology, for example CORBA/IIOP,
reflects the name of the network protocol (here IIOP). With the aid of the IIOP protocol
any CORBA/IIOP application can interact with other CORBA/IIOP applications
independently of hardware, software and operating system producers.
Characteristics of E-commerce System
Electronic commerce, a subset of e-business, is the purchasing, selling, and exchanging
of goods and services over computer networks (such as the Internet) through which
transactions or terms of sale are performed electronically. Ecommerce can be broken into
four main categories: Business-to-Consumer (B2C), Business-to-Business (B2B),
Business-to-Government (B2G) and Consumer-to-Consumer (C2C). Ecommerce offers a
wider choice of services and types of transactions. The characteristics of
Ecommerce can be summarized as follows:
1. Business-oriented
Ecommerce is essentially business-oriented, as it is the purchasing, selling, and
exchanging of goods and services. Ecommerce expands market and brings more
customers. Online shopping is a more convenient way through which consumers get
what they want to buy. Regardless of the size of a business, Ecommerce means
opportunities to all.
2. Convenient service
In the unique Ecommerce environment, customers will no longer be confined by
geographical constraints in receiving services. The prominent feature of E-service is
convenience. Both consumers and businesses benefit from it.
3. System extendable
For Ecommerce, an extendable system is the guarantee of system stability. It is of
vital importance that the system be extendable when server traffic jams occur,
because even a two-minute reset can result in losing a huge number of customers.
4. Online safety
Online safety is the first priority of Ecommerce. Frauds, wiretapping, virus, and
illegal entries pose constant threats to Ecommerce. Therefore, an Internet safety
solution featuring end-to-end protection is called for. The solution may include
various protective measures such as encryption mechanisms, digital signature
mechanisms, firewalls, secure World Wide Web servers and antivirus protections.
5. Coordination
Ecommerce is a process of coordination between employees, customers,
manufacturers, suppliers and business partners. The traditional Ecommerce solution
can improve the internal coordination in a company, for example, via Emails. Thanks
to the booming Ecommerce, many new words and expressions are being created and
becoming familiar to the public. Typical examples include virtual businesses, virtual
banks, online shopping, online payment, online advertising and so on. In conclusion,
Ecommerce is displaying its charm to all.
Development Paradigm
1. Message passing
In computer science, message passing is a form of communication used in parallel
computing, object-oriented programming, and interprocess communication.
Communication is made by sending messages to recipients. Forms of messages
include function invocations, signals, and data packets. Prominent models of
computation based on message passing include the Actor model and the process
calculi, a diverse family of related approaches to formally modeling
concurrent systems.
Microkernel operating systems pass messages between the kernel and one or more
servers.
Distributed object and remote method invocation systems such as ONC RPC, CORBA,
Java RMI, DCOM, SOAP, .NET Remoting, QNX Neutrino RTOS, OpenBinder and D-
Bus are message passing systems. The term is also used in High
Performance Computing, where the Message Passing Interface (MPI) is the standard.
The concept of message passing is also used in Bayesian inference over graphical
models.
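A minimal sketch of the message-passing style, with two threads standing in for the separate machines of a distributed system; the only communication between sender and recipient is by placing messages in the recipient's queue:

```python
import queue
import threading

inbox = queue.Queue()     # the recipient's mailbox
replies = queue.Queue()   # the channel for answers back to the sender

def worker():
    # The recipient blocks until a message arrives, then acts on it.
    while True:
        msg = inbox.get()
        if msg is None:   # conventional shutdown message
            break
        replies.put(("ack", msg))

t = threading.Thread(target=worker)
t.start()

# The sender communicates only by sending messages: there is no shared state.
inbox.put("data packet 1")
inbox.put("data packet 2")
inbox.put(None)
t.join()

received = [replies.get(), replies.get()]
print(received)
```

Because all interaction goes through explicit messages, the same design works unchanged whether the two parties share a machine or sit at opposite ends of a network.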
2. Distributed objects
Distributed objects are software modules that are designed to work together, but
reside either in multiple computers connected via a network or in different processes
inside the same computer. One object sends a message to another object in a remote
machine or process to perform some task. The results are sent back to the calling
object.
3. Event-based bus technologies
In event-based (publish/subscribe) systems, components do not address each other
directly. Instead, they publish events to a shared software bus, and any component
that has subscribed to that kind of event is notified. Senders and receivers are thus
decoupled from one another.
4. Space-based technologies
Space-based systems communicate through a shared, associative data store (a tuple
space, as in Linda or JavaSpaces). Producers write entries into the space and
consumers take or read matching entries, which again decouples the cooperating
processes from one another.
Lesson 2 – Servers
Objectives:
At the end of this lesson you will be able to:
Understand the web servers
Understand the database servers.
Web servers
The term “Web server” is somewhat nebulous from a technical perspective, because it
can refer to multiple parts of a whole or the conglomeration of the parts. A Web server, in
its most basic form, is actually made up of three distinctly different but equally important parts:
A physical machine connected to the Internet
A network operating system (NOS) that runs on the machine and manages all basic
networking functionality
Software that processes incoming HTTP requests from HTTP clients
In a more complex system, the server may also run special software (database engines,
transaction processors, site and server management tools, FTP or mail services, etc.) and
may be connected to specialized hardware (caching servers, firewalls, telephony
equipment, etc.). A server may actually consist of several physical machines working
together to provide the appearance of a single point of entry. When designing a Web
solution, the business need for all these components must be taken into consideration.
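The third part listed above, the software that processes incoming HTTP requests, can be sketched with Python's standard `http.server` module (the handler, address and response body here are illustrative, not taken from any particular server):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Process the incoming HTTP request and send back a response.
        body = b"hello from the web server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# An HTTP client sends a request; the server software answers it.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    text = resp.read().decode()
print(text)
server.shutdown()
```

A production server such as Apache or IIS layers configuration, security and content management on top, but this request-response cycle is the core of what "Web server software" means.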
The two most popular Web servers for Internet hosts are Apache Web Server and
Microsoft Internet Information Server (IIS).
Apache HTTP Server
The Apache Web server was originally developed in 1995 by eight independent
developers who came together to create the Apache Project.
Using the source code for the NCSA HTTPd server, combined with numerous
patches and bug fixes, they founded the Apache Project as “a collaborative software
development effort aimed at creating a robust, commercial-grade, featureful and
freely-available source code implementation of an HTTP (Web) server.” Together
they founded the Apache Group and created the Apache Web server (“A PAtCHy
server”).
Today the project is jointly managed by a group of volunteers located around the
world who use the Internet and Web to communicate, plan and develop the server and
its related documentation. These volunteers are known as the Apache Group. In
addition, hundreds of users have contributed ideas, code, and documentation to the
project.
One of the principal strengths of Apache is that it is free. All executables, the source
code, updates, patches, fixes, frequently asked questions (FAQs), and available
documentation can be downloaded from the Apache Web site
(http://www.apache.org). The source code can be modified freely, so custom features
can be added as necessary. The strong freeware mentality that exists in the UNIX
community (the platform for which Apache was originally developed) has helped
drive the development of Apache to a level equal to or surpassing many commercial
packages.
The major disadvantage of the Apache Web server is that it is difficult to configure.
Unlike products developed by Windows-oriented vendors, Apache does not have a
rich graphical environment in which to operate. Changes to the server must be made
by modifying the configuration files. This task can be daunting to those without a
strong UNIX background, and mistakes, some of which can be devastating, are easy
to make.
The dependence on a command-line interface, however, can also be one of Apache’s
greatest strengths. No other Web server provides the operator with as much control
over how the program operates. When performance is of utmost importance, Apache
is normally the server of choice.
Another area where Apache lags behind the competition is in content support.
Apache comes with only basic Web page authoring tools, no content management
tools, a limited search engine, and no built-in SSL support. The open nature of the
product does, however, make the development of add-on modules by third parties
much easier than for most other Web servers. These add-on modules, many of which
are freely available over the Internet, make Apache the most extensible Web server
on the market.
Microsoft IIS
Microsoft Internet Information Server (IIS) is free to all licensed users of Windows
NT, and the two products are highly integrated. IIS 5.0 is now also included with
Windows 2000. This high level of integration makes IIS a good choice for companies
that already have a Windows NT network in operation. Companies that have
enterprise networks built on UNIX may not find IIS a very good choice because it
only runs on Windows NT and Microsoft has no plans to develop it for other
platforms.
One of the biggest selling features of IIS is its integration with a full line of other
Microsoft products. All of Microsoft’s BackOffice products, as well as FrontPage,
SQL Server, Proxy Server, Site Server, Systems Management Server, Certificate
Server, Transaction Server, Message Queue Server, Index Server, and Exchange
Server are designed to combine into a full enterprise networking solution.
Another advantage, or possible drawback, of using IIS is the support that it provides
for proprietary Microsoft Web services. IIS is one of the few Web servers to support
both Active Server Pages and FrontPage extensions. These features allow users of IIS
to add dynamic content to their sites in ways that are not possible when using other
Web servers. Using them, however, may ultimately limit growth and choice because
an all-Microsoft solution may be necessary to support the Web site.
Database servers
A database server is a computer program that provides database services to other
computer programs or computers, as defined by the client-server model. The term may
also refer to a computer dedicated to running such a program. Database management
systems frequently provide database server functionality, and some DBMSs (e.g.,
MySQL) rely exclusively on the client-server model for database access.
In a master-slave model, database master servers are central and primary locations of data
while database slave servers are synchronized backups of the master acting as proxies.
Typically, client applications access database servers over a network.
Database servers are gaining importance because of the increasing popularity of the
client/server architecture model in computing. Database servers store the database on a
dedicated computer system, allow it to be accessed concurrently, maintain the integrity of
the data, and handle transaction support and user authorization.
A database server divides an application into a front end and a back end, in accordance
with the client-server model. The front end runs on the user’s computer and displays
requested data. The back end runs on the server and handles tasks such as data analysis
and storage.
Implementation of a database server
A database server can be implemented in a straightforward manner as a separate node (on
a network) dedicated to running database-management software. This node provides an
interface to client nodes such that the same data is accessible to all nodes. The interface
allows users to submit requests to the database server and retrieve information. These
requests are typically made using a high-level query language such as SQL (Structured
Query Language).
The server handles any processor-intensive work such as data manipulation,
compilation, and optimization, and sends only the final results back to the client.
Database servers are typically made to run on a UNIX operating system.
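The division of labour described above can be illustrated with Python's built-in `sqlite3` module. SQLite is an embedded engine rather than a network server, but the roles are the same: the client submits a high-level SQL request and receives only the final results (the table and data below are made up for the example):

```python
import sqlite3

# The engine, standing in for the database server, holds the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)])

# The front end sends a high-level request; grouping, aggregation and
# sorting all happen on the engine side, and only the results come back.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer").fetchall()
print(rows)
conn.close()
```

With a true database server the `connect` call would name a remote host instead of an in-memory file, but the client code would otherwise look much the same.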
Benefits of using a database server
A database server allows users to store data in one central location.
It performs complex functions such as searching, sorting, and indexing on the server
itself. This reduces network traffic because fewer items need to be transferred
between the client and the server.
Because data is stored centrally, there is enhanced security.
A database server uses its own processing power to find requested data, rather than
sending the complete data to the client so that the client searches for the data, as is
done in a file server.
A database server allows concurrent access to data.
Lesson 3 – General Issues
Objectives:
At the end of this lesson you will be able to:
Understand the security for distributed systems
Understand the design principles for distributed systems
Security for distributed systems
Security involves protecting the system hardware and software both from internal attack
and from external attack (hackers). An internal attack normally involves uneducated users
causing damage, such as deleting important files or crashing systems. Attacks can also
come from internal fraud, where employees intentionally attack a system for their
own gain, or through some dislike of something within the organization. There are many
cases of users who have grudges against other users causing damage to systems by
misconfiguring them. These effects can be minimized if the system manager properly
protects the system. Typical actions are to limit the files that certain users can access and
also the actions they can perform on the system.
Most system managers have seen the following:
Users sending a file of the wrong format to the system printer (such as sending a
binary file). Another typical one is where there is a problem on a networked printer
(such as lack of paper), but the user keeps re-sending the same print job.
Users deleting the contents of sub-directories, or moving files from one place to
another (typically, these days, with the dragging of a mouse cursor). Regular backups
can reduce this problem.
Users deleting important system files (on a PC, these are normally AUTOEXEC.BAT
and CONFIG.SYS). This can be overcome by the system administrator protecting
important system files, such as making them read-only or hidden.
Users telling other people their user passwords or not changing a password from the
initial default one. This can be overcome by the system administrator forcing the user
to change their password at given time periods.
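Several of the protections above come down to file permissions. A minimal sketch (assuming a POSIX system) of making an important file read-only, as suggested for protecting system files:

```python
import os
import stat
import tempfile

# Protect an important file by removing write permission, as a system
# administrator would to guard against accidental deletion or modification.
fd, path = tempfile.mkstemp()   # a throwaway file stands in for a system file
os.close(fd)
os.chmod(path, stat.S_IRUSR)    # owner read-only (mode 0o400)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                # 0o400: readable, not writable

os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)   # restore write access for cleanup
os.remove(path)
```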
Security takes many forms, including:
Data protection
This is typically a concern where sensitive or commercially important information is
kept. It might include information databases, design files or source code files. One
method of reducing this risk is to encrypt important files with a password; another is to
encrypt data with a secret electronic key (files are encrypted with a commonly known
public key, and decrypted with a secret key, which is known only to the user who has
the rights to access the files).
Software Protection
This involves protecting all the software packages from damage or from being
misconfigured. A misconfigured software package can cause as much damage as a
physical attack on a system, because it can take a long time to find the problem.
Physical system protection
This involves protecting systems from intruders who might physically attack the
systems. Normally, important systems are locked in rooms and then within locked
rack-mounted cabinets.
Transmission protection
This involves guarding against a hacker tampering with a transmission connection. An
attack might involve tapping into a network connection or total disconnection. Tapping
can be avoided by many methods, including using optical fibers, which are almost
impossible to tap into (doing so would typically involve sawing through a cable
containing hundreds of fibers, each of which would have to be reconnected exactly as
before). Total disconnection can be avoided with underground cables, or its damage
reduced by having redundant paths (such as different connections to the Internet).
Using an audit log file
Many secure operating systems, such as Windows NT/2000, maintain an audit file: a
text file that the system keeps up to date and that records the actions of specific
users. It can include the dates and times at which a user logs into the system, the files
that were accessed, the programs that were run, the networked resources that were
used, and so on. By examining this file the system administrator can detect malicious
attacks on the system, whether by internal or external users.
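An audit log of the kind described can be sketched as follows; the record format and user names are illustrative, and an in-memory buffer stands in for the log file on disk:

```python
import datetime
import io

def audit(log, user, action):
    # Append one time-stamped record per user action, as an audit file does.
    stamp = datetime.datetime.now().isoformat()
    log.write(f"{stamp} {user} {action}\n")

log = io.StringIO()                 # stands in for the audit file on disk
audit(log, "alice", "LOGIN")
audit(log, "alice", "OPEN payroll.db")
audit(log, "bob", "LOGIN")

# The administrator can later filter the log for one user's actions.
alice_actions = [line for line in log.getvalue().splitlines()
                 if " alice " in line]
print(len(alice_actions))           # 2
```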
Distributed Application Design
Building a reliable system that runs over an unreliable communications network seems
like an impossible goal. We are forced to deal with uncertainty. A process knows its own
state, and it knows what state other processes were in recently. But the processes have no
way of knowing each other's current state. They lack the equivalent of shared memory.
They also lack accurate ways to detect failure, or to distinguish a local software/hardware
failure from a communication failure.
Distributed systems design is obviously a challenging endeavor. How do we do it when
we are not allowed to assume anything, and there are so many complexities? We start by
limiting the scope. We will focus on a particular type of distributed systems design, one
that uses a client-server model with mostly standard protocols. It turns out that these
standard protocols provide considerable help with the low-level details of reliable
network communications, which makes our job easier. Let's start by reviewing client-
server technology and the protocols.
In client-server applications, the server provides some service, such as processing database queries or sending out current stock prices. The client uses the service provided by the server, either displaying database query results to the user or making stock purchase recommendations to an investor. The communication that occurs between the client and the server must be reliable. That is, no data can be dropped and it must arrive on the client side in the same order in which the server sent it.
There are many types of servers we encounter in a distributed system. For example, file servers manage disk storage units on which file systems reside. Database servers house databases and make them available to clients. Network name servers implement a mapping between a symbolic name or a service description and a value such as an IP address and port number for a process that provides the service.
In distributed systems, there can be many servers of a particular type, e.g., multiple file
servers or multiple network name servers. The term service is used to denote a set of
servers of a particular type. We say that a binding occurs when a process that needs to
access a service becomes associated with a particular server which provides the service.
There are many binding policies that define how a particular server is chosen. For
example, the policy could be based on locality (a Unix NIS client starts by looking first
for a server on its own machine); or it could be based on load balance (a CICS client is
bound in such a way that uniform responsiveness for all clients is attempted).
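The binding policies described above can be sketched in a few lines of Python. This is an illustrative sketch only: the registry contents, host names, and load figures are hypothetical, and a real system would query a name service rather than an in-memory table. It shows a locality-first policy (as a Unix NIS client uses) falling back to least-loaded selection (in the spirit of a CICS-style balancer):

```python
# Hypothetical registry mapping a service name to candidate servers.
# Hosts and load figures are illustrative, not from a real deployment.
SERVICE_REGISTRY = {
    "name-service": [
        {"host": "localhost", "load": 0.90},
        {"host": "10.0.0.5",  "load": 0.20},
        {"host": "10.0.0.6",  "load": 0.55},
    ]
}

def bind(service, policy="locality"):
    """Associate a client with one server of a service (a binding).

    'locality' prefers a server on the local machine; 'load' picks
    the least-loaded server for uniform responsiveness.
    """
    servers = SERVICE_REGISTRY[service]
    if policy == "locality":
        for s in servers:
            if s["host"] == "localhost":
                return s
    # Fall back to (or explicitly request) load-balanced selection.
    return min(servers, key=lambda s: s["load"])

print(bind("name-service")["host"])          # locality: localhost
print(bind("name-service", "load")["host"])  # least loaded: 10.0.0.5
```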
A distributed service may employ data replication, where a service maintains multiple
copies of data to permit local access at multiple locations, or to increase availability when
a server process may have crashed. Caching is a related concept and very common in
distributed systems. We say a process has cached data if it maintains a copy of the data
locally, for quick access if it is needed again. A cache hit is when a request is satisfied
from cached data, rather than from the primary service. For example, browsers use
document caching to speed up access to frequently used documents.
Caching is similar to replication, but cached data can become stale. Thus, there may need
to be a policy for validating a cached data item before using it. If a cache is actively
refreshed by the primary service, caching is identical to replication. [1]
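A cache with a validation policy can be sketched as follows. This Python example uses a simple time-to-live check to decide whether a cached item is still usable; the fetch function and TTL value are illustrative stand-ins for a real primary service:

```python
import time

class TTLCache:
    """Tiny cache that validates entries by age before using them.

    An entry older than ttl seconds is treated as stale and refetched
    from the primary service. Names and the ttl value are illustrative.
    """
    def __init__(self, fetch, ttl=30.0):
        self.fetch = fetch      # function that queries the primary service
        self.ttl = ttl
        self.store = {}         # key -> (value, time cached)
        self.hits = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            value, cached_at = entry
            if time.monotonic() - cached_at < self.ttl:  # still fresh?
                self.hits += 1
                return value                             # cache hit
        value = self.fetch(key)                          # miss or stale
        self.store[key] = (value, time.monotonic())
        return value

cache = TTLCache(fetch=lambda url: f"<contents of {url}>")
cache.get("http://example.com/")   # miss: fetched from primary service
cache.get("http://example.com/")   # hit: served from local copy
print(cache.hits)                  # 1
```

Setting a very small ttl makes the cache behave like no cache at all, while an actively refreshed cache (ttl effectively infinite) behaves like replication, as noted above.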
As mentioned earlier, the communication between client and server needs to be reliable.
You have probably heard of TCP/IP before. The Internet Protocol (IP) suite is the set of
communication protocols that allow for communication on the Internet and most
commercial networks. The Transmission Control Protocol (TCP) is one of the core
protocols of this suite. Using TCP, clients and servers can create connections to one
another, over which they can exchange data in packets. The protocol guarantees reliable
and in-order delivery of data from sender to receiver.
The IP suite can be viewed as a set of layers, each layer having the property that it only
uses the functions of the layer below, and only exports functionality to the layer above. A
system that implements protocol behavior consisting of layers is known as a protocol
stack. Protocol stacks can be implemented either in hardware or software, or a mixture of
both. Typically, only the lower layers are implemented in hardware, with the higher
layers being implemented in software.
There are four layers in the IP suite:
1. Application Layer
The application layer is used by most programs that require network communication.
Data is passed down from the program in an application-specific format to the next
layer, then encapsulated into a transport layer protocol. Examples of applications are
HTTP, FTP or Telnet.
2. Transport Layer
The transport layer's responsibilities include end-to-end message transfer independent
of the underlying network, along with error control, fragmentation and flow control.
End-to-end message transmission at the transport layer can be categorized as either
connection-oriented (TCP) or connectionless (UDP). TCP is the more sophisticated
of the two protocols, providing reliable delivery. First, TCP ensures that the receiving
computer is ready to accept data. It uses a three-packet handshake in which both the
sender and receiver agree that they are ready to communicate. Second, TCP makes
sure that data gets to its destination. If the receiver doesn't acknowledge a particular
packet, TCP automatically retransmits it, typically up to three times. If necessary,
TCP can also split large packets into smaller ones so that data can travel reliably
between source and destination. TCP drops duplicate packets and rearranges packets
that arrive out of sequence.
UDP is similar to TCP in that it is a protocol for sending and receiving packets
across a network, but with two major differences. First, it is connectionless. This
means that one program can send off a load of packets to another, but that's the end of
their relationship. The second might send some back to the first and the first might
send some more, but there's never a solid connection. UDP is also different from TCP
in that it doesn't provide any sort of guarantee that the receiver will receive the
packets that are sent in the right order. All that is guaranteed is the packet's contents.
This means it's a lot faster, because there's no extra overhead for error-checking
above the packet level. For this reason, games often use this protocol. In a game, if
one packet for updating a screen position goes missing, the player will just jerk a
little. The other packets will simply update the position, and the missing packet -
although making the movement a little rougher - won't change anything.
Although TCP is more reliable than UDP, the protocol is still at risk of failing in
many ways. TCP uses acknowledgements and retransmission to detect and repair
loss. But it cannot overcome longer communication outages that disconnect the
sender and receiver for long enough to defeat the retransmission strategy. The normal
maximum disconnection time is between 30 and 90 seconds. TCP could signal a
failure and give up when both end-points are fine. This is just one example of how
TCP can fail, even though it does provide some mitigating strategies.
3. Network Layer
As originally defined, the Network layer solves the problem of getting packets across
a single network. With the advent of the concept of internetworking, additional
functionality was added to this layer, namely getting data from a source network to a
destination network. This generally involves routing the packet across a network of
networks, e.g. the Internet. IP performs the basic task of getting packets of data from
source to destination.
4. Link Layer
The link layer deals with the physical transmission of data, and usually involves
placing frame headers and trailers on packets for travelling over the physical network
and dealing with physical components along the way.
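The reliable, in-order delivery that TCP provides at the transport layer can be exercised directly with the standard socket API. The following Python sketch runs a tiny echo server in a background thread and connects a client to it on the loopback interface; the message content is illustrative, and port 0 asks the operating system to pick a free port:

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 0   # port 0: let the OS choose a free port

def serve(server_sock):
    conn, _addr = server_sock.accept()   # completes the TCP handshake
    with conn:
        data = conn.recv(1024)           # bytes arrive in order, none dropped
        conn.sendall(b"echo: " + data)

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind((HOST, PORT))
server_sock.listen(1)
port = server_sock.getsockname()[1]      # the port the OS actually picked
threading.Thread(target=serve, args=(server_sock,), daemon=True).start()

# The client side: connect, send, and rely on TCP for reliable delivery.
with socket.create_connection((HOST, port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

print(reply.decode())   # echo: hello
```

A UDP version would use SOCK_DGRAM and sendto/recvfrom instead, with no connection, no delivery guarantee, and no ordering guarantee, exactly as described above.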
Remote Procedure Calls
Many distributed systems were built using TCP/IP as the foundation for the
communication between components. Over time, an efficient method for clients to
interact with servers evolved called RPC, which means remote procedure call. It is a
powerful technique based on extending the notion of local procedure calling, so that the
called procedure may not exist in the same address space as the calling procedure. The
two processes may be on the same system, or they may be on different systems with a
network connecting them.
An RPC is similar to a function call. Like a function call, when an RPC is made, the
arguments are passed to the remote procedure and the caller waits for a response to be
returned. In the illustration below, the client makes a procedure call that sends a request
to the server. The client process waits until either a reply is received, or it times out.
When the request arrives at the server, it calls a dispatch routine that performs the
requested service, and sends the reply to the client. After the RPC call is completed, the
client process continues.
Threads are common in RPC-based distributed systems. Each incoming request to a
server typically spawns a new thread. A thread in the client typically issues an RPC and
then blocks (waits). When the reply is received, the client thread resumes execution.
A programmer writing RPC-based code does three things:
1. Specifies the protocol for client-server communication
2. Develops the client program
3. Develops the server program
The communication protocol is created by stubs generated by a protocol compiler. A stub
is a routine that doesn't actually do much other than declare itself and the parameters it
accepts. The stub contains just enough code to allow it to be compiled and linked.
The client and server programs must communicate via the procedures and data types
specified in the protocol. The server side registers the procedures that may be called by
the client and receives and returns data required for processing. The client side calls the
remote procedure, passes any required data and receives the returned data.
Thus, an RPC application uses classes generated by the stub generator to execute an RPC
and wait for it to finish. The programmer needs to supply classes on the server side that
provide the logic for handling an RPC request.
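The three-part division above (protocol, server program, client program) can be sketched with Python's built-in XML-RPC modules, which here play the role of a stub generator and its output. The port and procedure name are illustrative; the proxy object acts as the client stub, so the remote call looks like a local function call while the caller blocks waiting for the reply:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register the procedure that clients may call, then serve
# requests on a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy is the client stub. Arguments are marshalled,
# sent over the network, and the call blocks until the reply arrives.
client = ServerProxy(f"http://127.0.0.1:{port}/")
result = client.add(2, 3)
print(result)   # 5
```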
RPC introduces a set of error cases that are not present in local procedure programming.
For example, a binding error can occur when a server is not running when the client is
started. Version mismatches occur if a client was compiled against one version of a
server, but the server has now been updated to a newer version. A timeout can result from
a server crash, network problem, or a problem on a client computer.
Some RPC applications view these types of errors as unrecoverable. Fault-tolerant
systems, however, have alternate sources for critical services and fail-over from a
primary server to a backup server.
A challenging error-handling case occurs when a client needs to know the outcome of a
request in order to take the next step, after failure of a server. This can sometimes result
in incorrect actions and results. For example, suppose a client process requests a ticket-
selling server to check for a seat in the orchestra section of Carnegie Hall. If it's available,
the server records the request and the sale. But the request fails by timing out. Was the
seat available and the sale recorded? Even if there is a backup server to which the request
can be re-issued, there is a risk that the client will be sold two tickets, which is an
expensive mistake in Carnegie Hall.
Here are some common error conditions that need to be handled:
Network data loss resulting in retransmit: Often, a system tries to achieve 'at most
once' delivery. In the worst case, if duplicate transmissions occur, we try to
minimize any damage done by the data being received multiple times.
Server process crashes during RPC operation: If a server process crashes before it
completes its task, the system usually recovers correctly because the client will
initiate a retry request once the server has recovered. If the server crashes after
the task but before the RPC reply is sent, duplicate requests sometimes result due to
client retries.
Client process crashes before receiving response: Client is restarted. Server
discards response data.
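The duplicate-request hazard above (the double ticket sale) is commonly handled by tagging each logical request with a unique id so the server can recognize a retry and replay the original reply instead of executing again. This Python sketch is illustrative; the class and method names are hypothetical:

```python
import uuid

class TicketServer:
    """At-most-once execution: completed requests are remembered so a
    retransmitted duplicate does no further damage."""
    def __init__(self):
        self.completed = {}    # request_id -> reply from first execution
        self.seats_sold = 0

    def sell_seat(self, request_id):
        if request_id in self.completed:     # duplicate retry: replay reply
            return self.completed[request_id]
        self.seats_sold += 1                 # execute exactly once
        result = f"ticket-{self.seats_sold}"
        self.completed[request_id] = result
        return result

server = TicketServer()
req = str(uuid.uuid4())          # one id per logical request
first = server.sell_seat(req)    # original request
retry = server.sell_seat(req)    # client retry after a lost reply
print(first == retry, server.seats_sold)   # True 1
```

A real server would also need to persist the completed-request table, since an in-memory table is lost if the server process itself crashes.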
Fundamental Design Principles for Distributed Systems
We can define some fundamental design principles which every distributed system
designer and software engineer should know. Some of these may seem obvious, but it
will be helpful as we proceed to have a good starting list.
As Ken Arnold says: "You have to design distributed systems with the expectation of
failure." Avoid making assumptions that any component in the system is in a
particular state. A classic error scenario is for a process to send data to a process
running on a second machine. The process on the first machine receives some data
back and processes it, and then sends the results back to the second machine
assuming it is ready to receive. Any number of things could have failed in the interim
and the sending process must anticipate these possible failures.
Explicitly define failure scenarios and identify how likely each one might occur.
Make sure your code is thoroughly covered for the most likely ones.
Both clients and servers must be able to deal with unresponsive senders/receivers.
Think carefully about how much data you send over the network. Minimize traffic as
much as possible.
Latency is the time between initiating a request for data and the beginning of the
actual data transfer. Minimizing latency sometimes comes down to a question of
whether you should make many little calls/data transfers or one big call/data transfer.
The way to make this decision is to experiment. Do small tests to identify the best
compromise.
Don't assume that data sent across a network (or even sent from disk to disk in a rack)
is the same data when it arrives. If you must be sure, do checksums or validity checks
on data to verify that the data has not changed.
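A checksum check of the kind described above can be sketched with a standard hash function: the sender attaches a digest of the payload, and the receiver recomputes it to verify the data was not altered in transit. The payload here is illustrative:

```python
import hashlib

def with_checksum(payload: bytes):
    """Return the payload together with its SHA-256 digest."""
    return payload, hashlib.sha256(payload).hexdigest()

def verify(payload: bytes, checksum: str) -> bool:
    """Recompute the digest on arrival and compare."""
    return hashlib.sha256(payload).hexdigest() == checksum

data, digest = with_checksum(b"order: 3 units, part #1742")
assert verify(data, digest)              # intact data passes
corrupted = data.replace(b"3", b"8")     # a single changed byte...
print(verify(corrupted, digest))         # False: ...is detected
```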
Caches and replication strategies are methods for dealing with state across
components. We try to minimize stateful components in distributed systems, but it's
challenging. State is something held in one place on behalf of a process that is in
another place, something that cannot be reconstructed by any other component. If it
can be reconstructed it's a cache. Caches can be helpful in mitigating the risks of
maintaining state across components. But cached data can become stale, so there may
need to be a policy for validating a cached data item before using it.
If a process stores information that can't be reconstructed, then problems arise. One
possible question is, "Are you now a single point of failure?" I have to talk to you
now - I can't talk to anyone else. So what happens if you go down? To deal with this
issue, you could be replicated. Replication strategies are also useful in mitigating the
risks of maintaining state. But there are challenges here too: What if I talk to one
replica and modify some data, then I talk to another? Is that modification
guaranteed to have already arrived at the other? What happens if the network gets
partitioned and the replicas can't talk to each other? Can anybody proceed?
There are a set of tradeoffs in deciding how and where to maintain state, and when to
use caches and replication. It's more difficult to run small tests in these scenarios
because of the overhead in setting up the different mechanisms.
Be sensitive to speed and performance. Take time to determine which parts of your
system can have a significant impact on performance: Where are the bottlenecks and
why? Devise small tests you can do to evaluate alternatives. Profile and measure to
learn more. Talk to your colleagues about these alternatives and your results, and
decide on the best solution.
Retransmission is costly. It's important to experiment so you can tune the delay that
prompts a retransmission to be optimal.
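One common starting point for tuning that delay is exponential backoff: each failed attempt waits roughly twice as long as the last, up to a cap, so a congested network is not flooded with retries. The parameter values in this Python sketch are illustrative defaults to experiment with, not recommendations; optional jitter (pass a random.Random instance) spreads out retries from many clients:

```python
import random

def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6, jitter=None):
    """Return the delay in seconds before each retransmission attempt."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)   # grow, but cap the wait
        if jitter is not None:
            delay += jitter.uniform(0, delay)        # randomize to avoid lockstep retries
        delays.append(delay)
    return delays

print(backoff_delays())   # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
```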