Distributed Systems. Introduction to Distributed Systems Why do we develop distributed systems?...

Distributed Systems

Introduction to Distributed Systems• Why do we develop distributed systems?

– availability of powerful yet cheap microprocessors (PCs, workstations), continuing advances in communication technology.

• What is a distributed system?• A distributed system is a collection of independent

computers connected via a high speed computers and appear to the users of the system as a single system.

Examples of DS

– Network of workstations :Personal workstations + a pool of processors + single file system

– Distributed manufacturing system (e.g. automated assembly line) :Robots on the assembly line + Robots in the parts department

– Network of branch office computers : A large bank with hundreds of branch offices all over the world

Advantages of Distributed Systems over Centralized Systems

• Economics: a collection of microprocessors offer a better price/performance ratio than mainframes. Low price/performance ratio: cost effective way to increase computing power. microprocessors offer a better price/performance than mainframes

• Speed: a distributed system may have more total computing power than a mainframe. Ex. 10,000 CPU chips, each running at 50 MIPS. Not possible to build 500,000 MIPS single processor since it would require 0.002 nsec instruction cycle. Enhanced performance through load distributing. a distributed system may have more total computing power than a mainframe

Advantages of Distributed Systems over Centralized Systems

• Inherent distribution: Some applications are inherently distributed. Ex. a supermarket chain, co-operative work ( a group of people located far away stations, can work together and create join report) .

• Reliability: If one machine crashes, the system as a whole can still survive. Higher availability and improved reliability. Ex: Nuclear reactors, aircrafts.

• Incremental growth: Computing power can be added in small increments. Modular expandability i. e Gradual expansion is possible .

• Another deriving force: the existence of large number of personal computers, the need for people to collaborate and share information

Advantages of Distributed Systems over Independent PCs

– Data sharing: allow many users to access to a common data base.Ex: Airline/ Railway Reservation system

– Resource Sharing: expensive peripherals like color printers, Phototypesetters, storage devices can be shared in DS .

– Communication: make human-to-human communication easier, for example, by E-mail , chat.

– Flexibility: spread the workload over the available machines in the most cost effective way

Disadvantages of Distributed Systems

– Software: difficult to develop software , PL and applications for distributed systems.

– Network: saturation, lossy transmissions. the network can saturate or cause other problems i.e may loose messages, & can be overloaded

– Security: easy access also applies to secrete data.

Hardware Concepts of DS• MIMD (Multiple-Instruction Multiple-Data)

• Tightly Coupled versus Loosely Coupled MIMD Tightly coupled systems (multiprocessors)( pure DS )

o shared memoryo intermachine delay short, data rate high

Loosely coupled systems (multicomputers)( EX: LAN)o private memoryo intermachine delay long, data rate low

•

Bus versus Switched MIMD

• Bus: a single network, backplane, bus, cable or other medium that connects all machines. E.g., cable TV

• Switched: individual wires from machine to machine, with many different wiring patterns in use. Masg can be passed from m/c to m/c . The roots can be changed by using switch Ex: world wide telephonic system .

• Categories of MIMD( DS ) :– Multiprocessors (shared memory)

• Bus multiprocessor • Switched multiprocessor

– Multicomputers (private memory)

• Bus Multicomputers

• Switched Multicomputers

write-through cache:if a word is written to cache it should be written to memory as well i.e propagate write immediately

snoopy cache: whenever cache sees a write occuring to mem address present in cache it either removes that entry from cache or update it with new value this is called as Snoopy Cache .

Switched Multiprocessor

Switched Multiprocessor

• For Crossbar switch we need n CPUs & n Memories so n2 crossbar switches are needed .for large n it will be difficult.

• Solution is use Omega Network : – contains four 2*2 switches .– Each having 2 inputs and 2 outputs– Each switch can route either input to either

output.

Multicomputers• Bus-Based Multicomputers

– Each CPU has direct connection to its own local memory .

– easy to build communication volume much smaller. relatively slow speed LAN (10-100 Mbps,

compared to 300 Mbps and up for a backplane bus)

Bus Multicomputer

Switched MulticomputersSwitched Multicomputers:-Each CPU has direct and exclusive access to its

own, private memory.-interconnection networks: E.g., grid, hypercube- Grids : easy to understand and lay out on PCB . Best suited for 2 D nature problemsEx: Graph Theory .

-hypercube: n-dimensional cube: each CPU has n connections to other CPUs.- msg passing is easy becoz of nearest neighbors are connected .

Switched Multicomputer

Software Concepts

• Software more important for users

• Three types:1.Network Operating Systems

2.(True) Distributed Systems

3.Multiprocessor Time Sharing

Network Operating Systems loosely-coupled software on loosely-coupled hardware A network of workstations connected by LAN Here each user has a workstation for its exclusive use along with

Exclusive OS . All commands normally run locally. Sometimes user can log into another machine remotely

o rlogin machineo rcp machine1:file1 machine2:file2

o To avoid communication problems one approach is to provide a shared global file system accessible from all workstation .

Files servers: the file system supported by one or more machines is called as file servers .client and server model. Refer fig …

Clients mount directories on file servers Best known network OS:

o Sun’s NFS (network file servers) for shared file systems o A situation where each m/c has high degree of Autonomy and there

are a few system-wide requirements that os is called as NOS .

NFS• NFS Architecture

– Server exports directories– Clients mount exported directories

• NFS Protocols– For handling mounting– For read/write: no open/close, stateless

• NFS Implementation

(True) Distributed Systems tightly-coupled software on loosely-coupled hardware

provide a single-system image or a virtual uniprocessor

a single, global interprocess communication mechanism, process management, file system; the same system call interface everywhere

Ideal definition:

“ A distributed system runs on a collection of computers that do not have shared memory, yet looks like a single computer to its users.”

Multiprocessor Operating Systems

Tightly-coupled software on tightly-coupled hardware Examples: high-performance servers shared memory single run queue traditional file system as on a single-processor

system: central block cache

Design Issues of Distributed Systems

• Transparency

• Flexibility

• Reliability

• Performance

• Scalability

1. Transparency• How to achieve the single-system image, i.e., how

to make a collection of computers appear as a single computer. A system that realizes this goal is said to be transparent .

• Hiding all the distribution from the users as well as the application programs can be achieved at two levels:

1) hide the distribution from users .

2) at a lower level, make the system look transparent to programs.

1) and 2) requires uniform interfaces such as access to files, communication.

Types of transparency– Location Transparency: users cannot tell where hardware and

software resources such as CPUs, printers, files, data bases are located. The name of resource must not encode the location of resource , so machine1: prog.c or /machine1/proc.c are not acceptable .

– Migration Transparency: resources must be free to move from one location to another without their names changed.E.g., /usr/lee, /central/usr/lee

– Replication Transparency: OS can make additional copies of files and resources on its own without the users noticing.

– Concurrency Transparency: if the system is CT then The user will not notice the existence of another user . to achieve CT , one mechanism is to lock the resource automatically when one had started to use it , unlocking it when the access was finished . Lock and unlock for mutual exclusion.

– Parallelism Transparency: Automatic use of parallelism without having to program explicitly.

There are times when Users do not want complete transparency.

2. Flexibility • Make it easier to change• Monolithic Kernel: each machine should run a traditional

kernel that provides most services itself.• systems calls are trapped and executed by the kernel. All

system calls are served by the kernel, e.g., UNIX.• Adv: performance • Ex: today’s centralized operating system with n/w facilities

and remote services, Sprite os• Microkernel: provides minimal services.

1) IPC 2) some memory management 3) some low-level process management and scheduling 4) low-level i/oE.g.,Amoeba , Mach can support multiple file systems, multiple system interfaces. Most of DS has this kind of system .

3. Reliability • The goal behind designing DS is to make

them more reliable than single processor system.

• Aspects of reliability : – Availability: fraction of time the system is

usable. Another tool to improve availability is redundancy i.e key pieces of h/w and s/w can be replicated .

– Security : files and other resources must be protected from unauthorized users.

– Fault tolerance: need to mask failures, recover from errors.

4. Performance• Performance problem is compounded by the

fact that is communication which is essential in DS .

• Performance loss due to communication delays:– fine-grained parallelism: high degree of

interaction, not good for DS. – coarse-grained parallelism : best fit for DS

( jobs that involve large computations, less interactions and little data)

• Performance loss due to making the system fault tolerant.

5. Scalability• Systems grow with time or become obsolete.

Techniques that require resources linearly in terms of the size of the system are not scalable. e.g., broadcast based query won't work for large distributed systems.

• Examples of bottleneckso Centralized components: a single mail servero Centralized tables: a single URL address booko Centralized algorithms: routing based on

complete information, only decentralized algorithm should be used in Ds.

Characteristics of Decentralized Algorithm • No m/c has complete information about the

system state .

• m/c makes decision based on local information only

• Failure of one m/c does not ruin the algorithm

• There is no implicit assumption that a global clock exists.

Experiments & Assignments

1. Simulating NOS commands:

1. Implementation of Stack

2. Implementation of Queue.

2. RMI supporting Distributing Computing in JAVA:

1. Design a GUI based Calculator

2. Retrieve Time & Date function from server to Client .

3. Develop a Program for Client Chat Server.

4. Remote Objects for Database Access.

5. Amoeba, V-System, Mach , Chorus – Case study

6. Centralized & Distributed Algorithms for Clock synchronization.

7. Deadlock prevention & Deadlock detection algorithms.

8. Election algorithms for co-ordinator .

9. Load balancing Algorithms.

Distributed Systems. Introduction to Distributed Systems Why do we develop distributed systems?...

Documents

Transcript of Distributed Systems. Introduction to Distributed Systems Why do we develop distributed systems?...