CS 843 - Distributed Computing Systems Chapter 6: Operating System Support Chin-Chih Chang, [email protected] From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 3, © Addison-Wesley 2001

Page 1:

CS 843 - Distributed Computing Systems

Chapter 6: Operating System Support

Chin-Chih Chang, [email protected]

From Coulouris, Dollimore and Kindberg

Distributed Systems: Concepts and Design

Edition 3, © Addison-Wesley 2001

Page 2:

Introduction

• An important aspect of distributed systems is resource sharing.

• Applications (clients) and services (resource managers) use middleware for their interaction.

• Middleware provides remote invocation between objects or processes at the nodes of a distributed system.

Page 3:

Software and hardware service layers in distributed systems

• Applications, services
• Middleware
• Operating system
• Computer and network hardware

(The operating system and the hardware together form the platform.)

Page 4:

Operating System Layer

• How well can the requirements of middleware be met by the operating system?

• Those requirements include: efficient and robust access to physical resources, and the flexibility to implement resource-management policies.

• An operating system is software that controls access to the underlying resources – processors, memory, communications, and storage media.

Page 5:

Two Concepts of Operating Systems

• Network operating system (there are multiple system images): it has networking capability built in and can be used to access remote resources. A user uses rlogin or telnet to log in to another computer. It does not schedule processes across nodes. Examples: UNIX, MacOS, Windows.

• Distributed operating system (there is a single system image): it controls all nodes and transparently places new processes at suitable nodes.

Page 6:

Middleware and Network Operating System (NOS)

• There are no distributed operating systems in general use, for two reasons: users have already invested in applications that meet their current problem-solving needs, and users prefer to retain a degree of autonomy over their machines.

• The combination of middleware and a NOS provides an acceptable balance: some autonomy together with network-transparent resource access.

• A NOS enables users to run word processors and other stand-alone applications; middleware enables them to take advantage of services available in their distributed system.

Page 7:

Operating System Layer

• Users will only be satisfied if their middleware-OS combination has good performance.

• Middleware runs on a variety of OS-hardware combinations at a node.

• The OS running at a node provides its own abstractions of local hardware.

• Middleware utilizes these local resources to implement the remote invocation between objects or processes at the nodes.

Page 8:

Figure 6.1 System layers

• Applications, services
• Middleware
• OS: kernel, libraries & servers (OS1 at Node 1, OS2 at Node 2 – each providing processes, threads, communication, ...)
• Computer & network hardware

The OS and the hardware at each node together form the platform.

Page 9:

OS layer support for the middleware

• Figure 6.1 shows how the operating system layer at each of two nodes supports a common middleware layer.

• Kernels and the client and server processes are the chief components that manage resources and present clients with an interface to the resources. They require:

Encapsulation: they provide a useful interface to their resources.

Protection: resources are protected from illegitimate access – for example, files cannot be read by users without read permission.

Concurrent processing: clients may share resources and access them concurrently.

Page 10:

OS layer support for the middleware

• Clients access resources by making: an RMI to a server object, or a system call to a kernel.

• The following invocation-related tasks are performed:

Communication: operation parameters and results are passed over a network or within a computer.

Scheduling: when an operation is invoked, its processing must be scheduled within the kernel or server.

• Figure 6.2 shows the core OS functionality that we shall be concerned with.

Page 11:

Figure 6.2 Core OS functionality

Process manager
Thread manager
Communication manager
Memory manager
Supervisor

Page 12:

Core OS Functionality

• Process manager: Handles the creation of and operations upon processes.

• Thread manager: Thread creation, synchronization and scheduling

• Communication manager: Communication between threads attached to different processes on the same computer, or with remote processes.

• Memory manager: Management of physical and virtual memory.

• Supervisor: Dispatching of interrupts, system call traps and other exceptions.

Page 13:

Protection

• Shared resources require protection from illegitimate accesses.

• The threat to a system’s integrity comes not only from malicious code but also from benign code that contains errors.

• For example, suppose that open files have only two operations: read and write. Protection then involves:

Ensuring that file operations can be performed only by clients with the right to perform them.

Preventing a misbehaving client from performing operations outside this interface – for example, accessing the file pointer directly.

Page 14:

Protection

• We can protect resources from illegitimate invocations such as setFilePointerRandomly by the following methods:

We can use a type-safe programming language, such as Java or Modula-3. No module may access a target module unless it has a reference to it.

We can employ hardware support to protect modules from one another at the level of individual invocations. This protection mechanism needs to be built into a kernel.

Page 15:

Kernel and Protection

• The kernel sets up address spaces to protect itself and other processes.

• A process cannot access memory outside its address space.

• The process can safely transfer (switch) from a user level address space to the kernel’s address space via an interrupt or a system call trap.

• Programs pay a price for protection.

• Switching between address spaces may take many processor cycles.

Page 16:

Processes and Threads

• Nowadays, a process consists of an execution environment together with one or more threads.

• A thread is the operating system abstraction of an activity.

• An execution environment is the unit of resource management, which primarily consists of: an address space; thread synchronization and communication resources; open files and windows.

Page 17:

Processes and Threads

• Execution environments are expensive, but several threads can share them.

• The use of threads can be very helpful within servers, where concurrent processing of clients’ requests can reduce the tendency for servers to become bottlenecks.

• For example, one thread can process a client’s request while a second thread waits for a disk access to complete.

Page 18:

Address Spaces

• An address space is a unit of management of a process’ virtual memory. It is large (it can be up to 2^64 bytes) and consists of one or more regions.

• A region (Figure 6.3) is an area of contiguous virtual memory with the following properties: its extent (lowest virtual address and size); read/write/execute permissions for the process’ threads; whether it can be grown upwards or downwards.

Page 19:

Figure 6.3 Address space

From address 0 upwards: Text, Heap, Auxiliary regions; the Stack occupies the top of the address space (near 2^N) and grows downwards.

Page 20:

Address Spaces

• This model is page-oriented rather than segment-oriented.

• This address space model has three general regions:

A fixed text region contains code.

A heap contains statically initialized data and dynamically allocated variables, and extends towards higher addresses.

A stack contains the activation records of procedure calls (parameters, local variables) and extends towards lower addresses.

Page 21:

Address Spaces

• A shared memory region is a region that can be accessed by several processes.

• The uses of shared regions include the following:

Libraries: library code can be very large, and a separate copy in every process would waste considerable memory, so a single copy is shared.

Kernel: often the kernel code and data are mapped into every address space at the same location, so there is no need to switch address spaces during a system call or an exception.

Data sharing and communication: two processes might need to share data in order to cooperate on some task.

Page 22:

Creating new Processes

• Process creation in UNIX:

fork system call – creates a new process with an execution environment inherited from the caller (the parent process).

exec system call – replaces the process’ memory image with that of a new program.

wait system call – removes the parent process from the ready queue until a child terminates.

• Creating a new process has two aspects in a distributed system: choosing a target host, and creating the execution environment.

Page 23:

Choosing the Target Host

• On which node will the process be created? It depends on the policies used:

Transfer policy: determines whether to place the new process locally or remotely.

Location policy: determines which node the process should be created on. Based on several parameters, there are two types of policy:

o Static – based on a mathematical analysis of expected behaviour, without regard to the current state of the system.

o Adaptive – applies heuristics, taking account of unpredictable run-time factors such as current load, to make the location decision.
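As an illustration of an adaptive location policy, the sketch below (class, field and method names are invented for this example) picks the least-loaded node, combined with a simple transfer policy: the process stays local unless some remote node is lighter by at least a threshold.

```java
// Illustrative sketch of an adaptive location policy (not from the text):
// choose the least-loaded node, but keep the process local unless a remote
// node is lighter by at least 'threshold' (the transfer policy).
public class LocationPolicy {
    public static class Node {
        public final String name;
        public final double load;   // current load, e.g. run-queue length
        public Node(String name, double load) { this.name = name; this.load = load; }
    }

    public static String choose(Node local, Node[] remote, double threshold) {
        Node best = local;
        for (Node n : remote) {
            if (n.load < best.load) best = n;   // adaptive: uses run-time load
        }
        // Transfer policy: only move the process if the gain is significant.
        return (best != local && best.load + threshold <= local.load)
                ? best.name : local.name;
    }

    public static void main(String[] args) {
        Node local = new Node("local", 0.9);
        Node[] remote = { new Node("n1", 0.7), new Node("n2", 0.3) };
        System.out.println(choose(local, remote, 0.2));   // prints n2
    }
}
```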

Page 24:

Load-Sharing Systems

• Load-sharing systems may be:

Centralized – there is one central load manager.

Hierarchical – load managers are organized in a tree structure.

Decentralized – load managers exchange information directly.

• Algorithms may also be:

Sender-initiated – the node that requires a new process initiates the transfer decision.

Receiver-initiated – a node whose load is low advertises its availability.

• Transferring a process from one node to another is known as process migration.

Page 25:

Creating New Execution Environments

• Two methods determine the contents of a new execution environment:

Statically defined: the address space regions are initialized from an executable file or filled with zeros.

Dynamically defined: the address space is defined with respect to an existing execution environment.

o Under UNIX fork semantics, the newly created child process physically shares the parent’s text region, and has heap and stack that are copies of the parent’s.

Page 26:

Creating New Execution Environments

• When based on an existing execution environment, an inherited region may be: shared with the parent, or copied from the parent’s region.

• Mach and Chorus apply an optimization called copy-on-write when an inherited region is copied from the parent (Figure 6.4):

The inherited region is initially shared between the parent and child address spaces.

A page is physically copied only when one or other process attempts to modify it.

• Copy-on-write can also be used when copying large messages.
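The copy-on-write mechanism can be illustrated with a small simulation (illustrative Java, not kernel code; the class and its pages are invented for the example): the child shares the parent's page frames until its first write to a page, at which point only that page is physically copied.

```java
import java.util.Arrays;

// Illustrative simulation of copy-on-write: a child "address space"
// shares the parent's page frames until it writes, and only the touched
// page is then physically copied.
public class CopyOnWrite {
    public byte[][] frames;     // one frame per page, possibly shared
    public boolean[] shared;    // is this page still shared with the parent?

    public CopyOnWrite(byte[][] parentFrames) {
        frames = parentFrames.clone();           // share the frame references
        shared = new boolean[parentFrames.length];
        Arrays.fill(shared, true);
    }

    public void write(int page, int offset, byte value) {
        if (shared[page]) {                      // first write: copy the page
            frames[page] = frames[page].clone();
            shared[page] = false;
        }
        frames[page][offset] = value;
    }

    public static void main(String[] args) {
        byte[][] parent = { {1, 1, 1, 1}, {2, 2, 2, 2} };
        CopyOnWrite child = new CopyOnWrite(parent);
        child.write(0, 0, (byte) 9);             // page 0 copied; page 1 still shared
        System.out.println(parent[0][0]);        // prints 1: parent unaffected
        System.out.println(child.frames[1] == parent[1]); // prints true: still shared
    }
}
```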

Page 27:

Figure 6.4 Copy-on-write

a) Before write: process B’s region RB, copied from process A’s region RA, maps the same shared frames as RA via A’s and B’s page tables in the kernel.

b) After write: the frame of the modified page has been physically copied; the remaining frames are still shared.

Page 28:

Thread vs. Process

• A thread, also called a lightweight process (LWP), is a basic unit of CPU utilization.

• It comprises a thread ID, a program counter, a register set, and a stack.

• A traditional (heavyweight) process has a single thread of control.

• If the process has multiple threads of control, it can do more than one task at a time.

Page 29:

Single and Multithreaded Processes

Page 30:

Threads concept and implementation

A process comprises: 'text' (program code); a heap (dynamic storage, objects, global variables); system-provided resources (sockets, windows, open files); and, for each thread activation, an activation stack (parameters, local variables).

Page 31:

Motivation

• An application typically is implemented as a separate process with several threads of control.

• For example, a web browser might have one thread displaying images or text while another thread retrieves data from the network.

• It is more efficient to have a single process containing multiple threads serve these purposes.

• With this approach, the web-server process is multithreaded.

Page 32:

Figure 6.5 Client and server with threads

Client: Thread 1 generates results; Thread 2 makes requests to the server.

Server: an input-output thread receives and queues incoming requests, which are served by a pool of N worker threads.

Page 33:

Benefits

• Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.

• Resource Sharing: Threads share the memory and the resources of the process to which they belong.

• Economy: Allocating memory and resources for process creation is costly; creating a thread within an existing process is much cheaper.

• Utilization of MP (multiprocessor) Architectures: Each thread may be running in parallel on a different processor.

Page 34:

Threading Architectures

• Worker pool architecture (Figure 6.5): The server creates a fixed pool of worker threads. Benefit: easy to implement. Drawback: it is inflexible.

o Too few workers may limit throughput.
o High level of switching between the I/O and worker threads.

• Thread-per-request architecture (Figure 6.6a): The I/O thread spawns a new worker thread for each request; the worker destroys itself once it finishes the request. Benefit: the threads do not contend for a shared queue. Drawback: the overhead of thread creation and destruction.
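The worker-pool architecture above can be sketched as follows; this is a minimal illustration (queue contents, pool size and method names are invented), using a fixed pool of workers draining a shared request queue.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the worker-pool architecture: a fixed number of worker
// threads share one queue of pending requests.
public class WorkerPool {
    public static int serve(int nWorkers, int nRequests) throws InterruptedException {
        BlockingQueue<Integer> requests = new LinkedBlockingQueue<>();
        for (int i = 0; i < nRequests; i++) requests.add(i);  // queued by the I/O thread

        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        for (int w = 0; w < nWorkers; w++) {
            pool.execute(() -> {
                Integer req;
                while ((req = requests.poll()) != null) {
                    handled.incrementAndGet();   // "process" the request
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(4, 100));       // prints 100
    }
}
```

Note the drawback mentioned above: all workers contend for the single shared queue.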

Page 35:

Threading Architectures

• Thread-per-connection architecture (Figure 6.6b) – The I/O thread spawns a new worker thread for each connection, and destroys it when the client closes the connection.

• Thread-per-object architecture (Figure 6.6c) – The I/O thread spawns a new worker thread for each object. Benefit: lower thread-management overhead. Drawback: clients may be delayed while one worker thread has several outstanding requests but another thread has no work to do.

Page 36:

Figure 6.6 Alternative server threading architectures

a. Thread-per-request b. Thread-per-connection c. Thread-per-object

In each architecture the server process contains remote objects; requests arrive at an I/O thread and are handled by (a) a pool of worker threads, (b) per-connection threads, or (c) per-object threads.

• Threading of the server-side ORB in CORBA:

a. would be useful for a UDP-based service, e.g. the Network Time Protocol (NTP).
b. is the most commonly used – it matches the TCP connection model.
c. is used where the service is encapsulated as an object, e.g. there could be multiple shared whiteboards with one thread each. Since each object has only one thread, there is no need for thread synchronization within objects.

Page 37:

Threads within Clients

• Threads can be useful for clients as well as servers.

• Figure 6.5 shows a client process with two threads:

The first thread generates values to be passed to a server and puts them in a buffer.

The second thread reads the values from the buffer and performs the remote invocations.

• Multi-threaded clients are used in Web browsers: users experience fewer delays because the browser issues multiple concurrent requests while fetching a page.
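The two-thread client of Figure 6.5 can be sketched as below; the remote invocation is simulated here by summing the buffered values, and the names, values and sentinel convention are all invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the two-thread client of Figure 6.5: thread 1 generates
// values into a buffer; thread 2 takes them and "makes the remote
// invocation" (simulated by summing the values).
public class ThreadedClient {
    public static int run() throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(8);
        int[] sum = new int[1];

        // Thread 1: generates values and puts them in the buffer.
        Thread generator = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) buffer.put(i);
                buffer.put(-1);                  // sentinel: no more values
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        // Thread 2: reads from the buffer and "invokes the server".
        Thread invoker = new Thread(() -> {
            try {
                for (int v; (v = buffer.take()) != -1; ) sum[0] += v;
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        generator.start(); invoker.start();
        generator.join(); invoker.join();
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());               // prints 55
    }
}
```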

Page 38:

Threads vs. Multiple Processes

• Why are threads better than multiple processes? Threads are cheaper to create, and resource sharing can be achieved more efficiently.

• Figure 6.7 shows some of the main state components that must be maintained for execution environments and threads.

• A software interrupt is an event that causes a thread to be interrupted and the control is transferred to the event handler.

Page 39:

Figure 6.7 State associated with execution environments and threads

Execution environment:
• Address space tables
• Communication interfaces, open files
• Semaphores, other synchronization objects
• List of thread identifiers

Thread:
• Saved processor registers
• Priority and execution state (such as BLOCKED)
• Software interrupt handling information
• Execution environment identifier

Shared between them: pages of address space resident in memory; hardware cache entries.

Page 40:

Threads vs. Multiple Processes

• Summary:

Threads are cheaper to create.

Resource sharing can be achieved more efficiently.

Context switching to a different thread within the same process is cheaper than context switching between threads belonging to different processes.

Threads may share data more conveniently and efficiently.

However, threads within the same process are not protected from one another.

Page 41:

Thread Programming

• Threads are the form that concurrent programming takes in the field of operating systems. Thread programming involves the concepts of race conditions, critical sections, monitors, condition variables, and semaphores.

• Thread programming can be done:

With a threads library
o Mach operating system
o IEEE POSIX 1003.4a – pthread library

In a programming language
o Ada95
o Modula-3
o Java

Page 42:

Java Thread Programming

• Java provides methods for creating threads, destroying them and synchronizing them.

• The Java Thread class includes the constructor and management methods listed in Figure 6.8.

• The Thread and Object synchronization methods are in Figure 6.9.

• Thread lifetimes:

new – the thread is created in the SUSPENDED state.
start / run – the thread is made RUNNABLE and its run method executes.
destroy – the thread is terminated.

Page 43:

Figure 6.8 Java thread constructor and management methods

Thread(ThreadGroup group, Runnable target, String name)
Creates a new thread in the SUSPENDED state, which will belong to group and be identified as name; the thread will execute the run() method of target.

setPriority(int newPriority), getPriority()
Set and return the thread’s priority.

run()
A thread executes the run() method of its target object, if it has one, and otherwise its own run() method (Thread implements Runnable).

start()
Change the state of the thread from SUSPENDED to RUNNABLE.

sleep(int millisecs)
Cause the thread to enter the SUSPENDED state for the specified time.

yield()
Enter the READY state and invoke the scheduler.

destroy()
Destroy the thread.
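A minimal use of the constructor and management methods above, plus join() from Figure 6.9: create a thread with a Runnable target, start it, and join to wait for its termination. The class name, the computed value and the thread name are invented for the example.

```java
// Minimal use of the Thread API of Figures 6.8 and 6.9: construct a
// thread with a Runnable target, start it (SUSPENDED -> RUNNABLE in the
// slides' terms), and join() to wait for it to terminate.
public class ThreadLifecycle {
    static volatile int result = 0;

    public static int compute() throws InterruptedException {
        Runnable target = () -> result = 6 * 7;  // the run() method of the target
        Thread worker = new Thread(null, target, "worker"); // group may be null
        worker.start();                          // begins executing target.run()
        worker.join();                           // block until worker terminates
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(compute());           // prints 42
    }
}
```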

Page 44:

Figure 6.9 Java thread synchronization calls

thread.join(int millisecs)

Blocks the calling thread for up to the specified time until thread has terminated.

thread.interrupt()
Interrupts thread: causes it to return from a blocking method call such as sleep().

object.wait(long millisecs, int nanosecs)
Blocks the calling thread until a call made to notify() or notifyAll() on object wakes the thread, or the thread is interrupted, or the specified time has elapsed.

object.notify(), object.notifyAll()
Wakes, respectively, one or all of any threads that have called wait() on object.

Page 45:

Java Thread Programming

• Programs can manage threads in groups.

This is useful when several applications coexist on the same JVM. For security, one group may not be allowed to access the methods of another group.

Thread groups facilitate control of the relative priorities of threads. This is useful for browsers running applets, and for servers running servlets.

o A servlet is a server-side program that creates dynamic Web pages.
o A thread within an applet or servlet can only create new threads within its own group.

Page 46:

Thread Synchronization

• Programming a multi-threaded process has the following difficulties: the sharing of objects, and the techniques used for thread coordination and cooperation.

• Race conditions can occur when threads manipulate shared data; they are prevented by thread synchronization.

• Java provides the synchronized keyword for thread coordination under monitor control. The monitor guarantees that at most one thread can execute within it at any time.

• We can serialize threads’ actions by declaring an object’s methods as synchronized.
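A sketch of how the synchronized keyword prevents the race condition just described: several threads increment a shared counter whose methods are serialized by the monitor (the class and method names are invented for this example).

```java
// Sketch of using Java's synchronized keyword: increments from several
// threads are serialized by the monitor, so no updates are lost.
public class SyncCounter {
    private int count = 0;

    public synchronized void increment() { count++; } // one thread at a time
    public synchronized int get() { return count; }

    public static int run(int threads, int perThread) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 10000));       // prints 40000
    }
}
```

Without synchronized, the unprotected read-modify-write of count could interleave and some increments would be lost.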

Page 47:

Thread Synchronization

• Programming a multi-threaded process has the following difficulties: The sharing of objects The techniques used for thread coordination and

cooperation.

• The race condition can happen when threads manipulate shared data. This can be prevented by the thread synchronization.

Page 48:

Thread Synchronization

• Java provides the synchronized keyword for thread coordination under monitor control. The monitor guarantees that at most one thread can execute within it at any time.

• We can serialize threads’ actions by declaring an object’s methods as synchronized.

• Java allows threads to be blocked and woken up via arbitrary objects that act as condition variables: a thread that needs to wait for an event calls the wait method, and another thread calls the notify method to unblock it.

Page 49:

Java Thread Programming

• Thread scheduling:

Preemptive scheduling – a running thread may be suspended at any point to let another thread run.

Non-preemptive (coroutine) scheduling – a thread runs until it makes a call to the threading system, which then schedules another thread.

o Benefit: any section of code that does not contain a call to the threading system is automatically a critical section.
o Drawback: multiprocessors cannot be exploited. The yield method allows another thread to be scheduled.

• Java by default does not provide real-time scheduling, but real-time implementations exist.

Page 50:

Thread Implementations

• Thread implementations:

Kernel-level threading – Windows 2000/XP, Solaris, Mach and Chorus.
User-level threading – e.g. some POSIX pthread implementations.

• User-level threads have the following drawbacks:

The threads cannot take advantage of a multiprocessor.
A thread that takes a page fault blocks the entire process.
Threads within different processes cannot be scheduled according to a single scheme of relative prioritization.

Page 51:

Thread Implementations

• User-level threads have the following advantages:

Thread creation is less costly.
Thread scheduling can be customized or changed to suit particular applications.
Many more user-level threads can be supported than the kernel could provide.

• Hybrid approaches can gain some advantages of both:

user-level hints to the kernel scheduler – Mach
hierarchic threads – Solaris 2
event-based – SPIN, FastThreads (Figure 6.10)

Page 52:

Solaris 2 Threads

• Solaris 2 is a version of UNIX with support for threads at the kernel and user levels, SMP, and real-time scheduling.

• Solaris 2 implements the Pthread API and UI threads.

• Between user-level and kernel-level threads are lightweight processes (LWPs).

• User-level threads in a process are multiplexed onto LWPs; each LWP corresponds to a kernel thread.

• A bound user-level thread is permanently attached to an LWP; an unbound thread is not.

Page 53:

Solaris 2 Threads

Page 54:

Solaris Process

Page 55:

Figure 6.10 Scheduler activations

A. Assignment of virtual processors to processes: the kernel assigns virtual processors to processes A and B.

B. Events between the user-level scheduler and the kernel: P idle, P needed, P added; SA blocked, SA unblocked, SA preempted.

Key: P = processor; SA = scheduler activation

Page 56:

Communication and Invocation

• The following design issues are of concern: communication primitives; protocols and openness; measures to make communication efficient; support for high-latency and disconnected operation.

• Communication primitives are found in some research kernels. For example, Amoeba provides doOperation, getRequest and sendReply.


Communication and Invocation

• Middleware provides most high-level communication facilities, including:

o RPC/RMI

o event notification

o group communication

• Middleware is implemented over the socket abstraction found in all common operating systems.

• The principal reasons for using sockets are portability and interoperability.
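As a minimal illustration of the socket abstraction that middleware builds on, here is a loopback TCP echo exchange (the port is chosen by the OS; nothing here is specific to any particular middleware):

```python
import socket
import threading

# Server: accept one connection and echo what it receives.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]

def echo_once():
    conn, _ = server_sock.accept()
    data = conn.recv(1024)
    conn.sendall(data)                  # echo the bytes back unchanged
    conn.close()

threading.Thread(target=echo_once).start()

# Client: connect, send a request, read the echoed reply.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()
server_sock.close()
print(reply)  # b'ping'
```

The same code runs unchanged on any OS with a Berkeley-style socket layer, which is exactly the portability argument above.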


Communication and Invocation

• An operating system should provide standard protocols that enable internetworking between middleware implementations on different platforms.

• Protocols are normally arranged in a stack of layers. Most operating systems integrate their protocol layers statically.

• Dynamic protocol composition is a technique whereby a protocol stack can be composed on the fly to meet the requirements of a particular application.
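Dynamic protocol composition can be sketched as stacking small layer objects at run time; the layer names and interfaces below are invented for illustration:

```python
class Layer:
    """A protocol layer transforms data on the way down (send)
    and undoes the transformation on the way up (receive)."""
    def down(self, data: bytes) -> bytes: return data
    def up(self, data: bytes) -> bytes: return data

class Framing(Layer):
    """Prefix each message with a 2-byte length header."""
    def down(self, data): return len(data).to_bytes(2, "big") + data
    def up(self, data): return data[2:2 + int.from_bytes(data[:2], "big")]

class Xor(Layer):
    """Toy stand-in for an encryption layer."""
    KEY = 0x5A
    def down(self, data): return bytes(b ^ self.KEY for b in data)
    def up(self, data): return bytes(b ^ self.KEY for b in data)

def compose(*layers):
    """Build a (send, receive) pair from a stack chosen at run time."""
    def send(data):
        for layer in layers:             # top of the stack first
            data = layer.down(data)
        return data
    def receive(data):
        for layer in reversed(layers):   # unwind in reverse order
            data = layer.up(data)
        return data
    return send, receive

# Compose a stack "on the fly" to suit this application's needs.
send, receive = compose(Framing(), Xor())
wire = send(b"hello")
print(receive(wire))  # b'hello'
```

An application needing confidentiality adds the `Xor`-like layer; one that does not simply composes without it, without rebuilding the rest of the stack.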


Support for Communication and Invocation

• The performance of RPC and RMI mechanisms is critical for effective distributed systems.

• Figure 6.11 shows a case of a system call and a remote invocation.

• Typical times for a null procedure call (which measures a fixed overhead, the latency): a local procedure call takes less than 1 microsecond; a remote procedure call takes about 10 milliseconds. The time for a null RPC includes:

o The network time, involving about 100 bytes transferred at 100 megabits/sec, accounts for only 0.01 milliseconds.

o The remaining delays must be in OS and middleware latency, not communication time.
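The claim that the wire time is negligible can be checked directly from the figures above:

```python
# Network time for the ~100 bytes of a null RPC at 100 Mbit/s.
bits_on_wire = 100 * 8                # ~100 bytes transferred
bandwidth = 100e6                     # 100 megabits/second
network_time_ms = bits_on_wire / bandwidth * 1000

print(round(network_time_ms, 3))      # 0.008 ms, i.e. about 0.01 ms
# Against a ~10 ms null RPC, the wire accounts for well under 1%
# of the delay; the rest is OS and middleware latency.
```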


Figure 6.11 Invocations between address spaces: (a) a system call, with control transfer via a trap instruction across the user/kernel protection domain boundary; (b) RPC/RMI within one computer, with control transfer via privileged instructions between two user address spaces (User 1, User 2) through the kernel; (c) RPC/RMI between computers, with Thread 1 and Thread 2 in separate user address spaces communicating via Kernel 1, the network and Kernel 2.


Support for Communication and Invocation

• Figure 6.12 shows client delay against requested data size. The delay is roughly proportional to the size until the size reaches a threshold at about the network packet size, where the delay steps up.

• Factors affecting RPC/RMI performance:

o marshalling/unmarshalling

o data copying - from application to kernel space, across protocol layers, to communication buffers

o thread scheduling and context switching - including kernel entry

o protocol processing - for each protocol layer

o network access delays - connection setup and network latency
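The first factor, marshalling, can be made concrete with Python's struct module: a typed record is flattened into network byte order on the way out and reconstructed on the way in (the record layout here is invented for illustration):

```python
import struct

# Marshal a (request_id, opcode, value) record into big-endian
# ("network") byte order: unsigned int, unsigned short, double.
FMT = "!IHd"

def marshal(request_id: int, opcode: int, value: float) -> bytes:
    return struct.pack(FMT, request_id, opcode, value)

def unmarshal(data: bytes):
    return struct.unpack(FMT, data)

wire = marshal(7, 2, 3.5)
print(unmarshal(wire))  # (7, 2, 3.5)
```

Every invocation pays this pack/unpack cost on both sides, which is why marshalling appears first in the list of overheads.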


Figure 6.12 RPC delay against parameter size: delay grows with the requested data size (bytes), stepping up at multiples of the network packet size (around 1000 and 2000 bytes on the horizontal axis).


Implementation of invocation mechanisms

• Shared memory may be used for rapid communication between a user process and the kernel, or between user processes.

• Most invocation middleware (CORBA, Java RMI, HTTP) is implemented over TCP. TCP is chosen for its universal availability, unlimited message size and reliable transfer.

• Sun RPC (used in NFS) is implemented over both UDP and TCP, and generally works faster over UDP.


Implementation of Invocation Mechanisms

• Research systems have implemented much more efficient invocation protocols, e.g.:

o Firefly RPC (see www.cdk3.net/oss)

o Amoeba's doOperation, getRequest, sendReply primitives (www.cdk3.net/oss)

o Lightweight RPC [Bershad et al. 1990], described on pp. 237-9 and shown in Figure 6.13


Bershad's LRPC

• It uses shared memory for interprocess communication:

o while maintaining protection of the two processes

o arguments are copied only once (versus four times for conventional RPC)

• Client threads can execute server code via protected entry points only (uses capabilities).

• Up to 3 times faster for local invocations.
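The shared argument stack idea can be sketched in-process: a single buffer stands in for the A-stack mapped into both protection domains, so arguments are written once and read in place (the protection boundary, capabilities and the real kernel trap are of course absent from this toy):

```python
# A shared "A-stack", standing in for memory mapped into both the
# client's and the server's address spaces.
a_stack = bytearray(16)

def client_call(x: int, y: int) -> int:
    # 1. Copy the arguments once, directly onto the shared stack.
    a_stack[0:4] = x.to_bytes(4, "big")
    a_stack[4:8] = y.to_bytes(4, "big")
    server_entry()                      # 2-3. trap + upcall (simulated)
    # 5. Read the result in place - no further copying.
    return int.from_bytes(a_stack[8:12], "big")

def server_entry():
    # 4. The server reads the arguments in place and writes the
    # result back onto the same shared stack.
    x = int.from_bytes(a_stack[0:4], "big")
    y = int.from_bytes(a_stack[4:8], "big")
    a_stack[8:12] = (x + y).to_bytes(4, "big")

print(client_call(20, 22))  # 42
```

Conventional RPC would instead copy the arguments client stub → kernel message → server message → server stub, which is the four-copy path the bullet above refers to.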


Figure 6.13 A lightweight remote procedure call. The client and server stubs share an argument (A) stack across the kernel boundary: 1. copy args; 2. trap to kernel; 3. upcall to the server stub; 4. execute procedure and copy results; 5. return (trap).


Implementation of Invocation Mechanisms

• High latency is common in wireless environments.

• The technique for defeating high latency is asynchronous operation, which takes two forms: concurrent invocations and asynchronous invocations.

• Concurrent invocations: the middleware provides only blocking invocations, but the application spawns multiple threads to perform blocking invocations concurrently.

A web browser is an example: it fetches images concurrently.

Figure 6.14 shows the potential benefits of interleaving invocations between a client and a single server.
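The benefit of interleaving can be demonstrated with blocking calls issued from multiple threads; here a sleep stands in for the server's execution plus network latency:

```python
import threading
import time

LATENCY = 0.05                       # simulated per-invocation round trip

def blocking_invocation(results, i):
    time.sleep(LATENCY)              # stand-in for a blocking RPC
    results[i] = i * i

def serialized(n):
    results = [None] * n
    for i in range(n):
        blocking_invocation(results, i)
    return results

def concurrent(n):
    results = [None] * n
    threads = [threading.Thread(target=blocking_invocation, args=(results, i))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

t0 = time.time(); serialized(4); serial_time = time.time() - t0
t0 = time.time(); concurrent(4); conc_time = time.time() - t0

# Serialized takes ~4 x LATENCY; concurrent takes ~1 x LATENCY,
# because the waits overlap - just as a browser overlaps image fetches.
print(conc_time < serial_time)  # True
```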


Figure 6.14 Times for serialized and concurrent invocations. Each invocation passes through the same steps: the client marshals the arguments and sends; the server receives, unmarshals and executes the request, then marshals and sends the reply; the client receives, unmarshals and processes the results. With serialized invocations these steps run strictly one after another; with concurrent invocations the client marshals and sends the next request while the previous one is still in transmission or execution, so the total elapsed time is shorter.


Implementation of Invocation Mechanisms

• Asynchronous invocations: performed asynchronously with respect to the caller; the middleware or application does not block waiting for a reply to each invocation.

• Persistent asynchronous invocations: the system tries indefinitely to perform the invocation until it is known to have succeeded or failed, or until the application cancels the invocation.

An example is QRPC (Queued RPC). It queues outgoing invocation requests in a stable log while there is no network connection, and schedules their dispatch over the network to servers when there is a connection.
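QRPC-style behaviour can be sketched with an in-memory queue standing in for the stable log; real QRPC persists the log to stable storage and schedules dispatch by network cost, both of which this toy omits:

```python
from collections import deque

class QueuedRPC:
    """Queue invocations while disconnected; dispatch on reconnection."""

    def __init__(self):
        self.connected = False
        self.log = deque()            # stands in for a stable log on disk
        self.delivered = []

    def invoke(self, request):
        if self.connected:
            self._send(request)
        else:
            self.log.append(request)  # defer until a connection exists

    def reconnect(self):
        self.connected = True
        while self.log:               # drain queued requests in order
            self._send(self.log.popleft())

    def _send(self, request):
        self.delivered.append(request)  # stands in for network dispatch

q = QueuedRPC()
q.invoke("update A")                  # queued: no connection yet
q.invoke("update B")
q.reconnect()                         # both now dispatched, in order
print(q.delivered)  # ['update A', 'update B']
```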


Operating System Architecture

• An open distributed system should make it possible to (plug and play modules):

o run only the necessary software modules at each node

o allow a software module to be changed without affecting other facilities

o allow alternative modules to be used for the same service

o introduce new services without harming the integrity of existing ones


Operating System Architecture

• The separation of fixed resource management mechanisms from resource management policies has been a guiding principle in operating system design for a long time:

o The kernel would provide the most basic mechanisms upon which the general resource management tasks at a node are carried out.

o Server modules would be dynamically loaded as required.

• There are two key examples of kernel design: monolithic and microkernel approaches.


Operating System Architecture

• The UNIX operating system kernel has been called monolithic: the term suggests that all basic operating system functions are coded in a non-modular way.

• Microkernel design:

o the kernel provides only the most basic abstractions: address spaces, threads and local interprocess communication

o all other services are dynamically loaded

o clients access these system services using the kernel's message-based invocation mechanisms (Figure 6.15)

• The place of the microkernel is shown in Figure 6.16.
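Message-based access to dynamically loaded services can be sketched as a registry plus a tiny dispatch loop; the registration interface below is invented for illustration:

```python
# The "kernel" knows nothing about individual services; it only
# routes messages to whatever service modules are currently loaded.
services = {}

def load_service(name, handler):
    """Dynamically load (register) a service module."""
    services[name] = handler

def kernel_send(service_name, message):
    """The kernel's message-based invocation mechanism."""
    handler = services.get(service_name)
    if handler is None:
        return ("error", "no such service")
    return ("ok", handler(message))

# Load a service after "boot", without modifying the kernel itself.
load_service("echo", lambda msg: msg.upper())

print(kernel_send("echo", "hello"))   # ('ok', 'HELLO')
print(kernel_send("pager", "fault"))  # ('error', 'no such service')
```

Because the kernel only routes messages, a service can be added, replaced or debugged without touching the code that dispatches to it, which is the flexibility argument made on the next slide.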


Figure 6.15 Monolithic kernel and microkernel. Key: S1-S4 = servers; kernel code and data; dynamically loaded server programs. In the monolithic kernel the servers execute within the kernel; in the microkernel they run as dynamically loaded server programs outside it.


Figure 6.16 The role of the microkernel. The microkernel sits above the hardware and supports middleware via subsystems, such as language support subsystems and an OS emulation subsystem.


Advantages and Disadvantages of Microkernel

• Flexibility and extensibility:

o services can be added, modified and debugged

o a small kernel has fewer bugs

o protection of services and resources is still maintained

• Service invocation is expensive unless LRPC is used; services need extra system calls for access to protected resources.

• Different approaches and improvements have been made in microkernel design.


Summary

• The OS provides local support for the implementation of distributed applications and middleware:

o manages and protects system resources (memory, processing, communication)

o provides relevant local abstractions: files, processes, threads, communication ports


Summary

• Middleware provides general-purpose distributed abstractions: RPC, DSM (Distributed Shared Memory), event notification, streaming.

• Invocation performance is important, and can be optimized; for example, Firefly RPC and LRPC.

• Microkernel architecture provides flexibility. The KISS principle ('Keep it simple, stupid!') has resulted in the migration of many functions out of the OS.