Chapter 1 - Introduction

1.1 Introduction and Definition

before the mid-80s, computers were
- very expensive (hundreds of thousands or even millions of dollars)
- very slow (a few thousand instructions per second)
- not connected among themselves

after the mid-80s: two major developments
- cheap and powerful microprocessor-based computers appeared
- computer networks
  - LANs at speeds ranging from 10 to 1000 Mbps (now even 10 Gbps)
  - WANs at speeds ranging from 64 Kbps to gigabits/sec

consequence
- feasibility of using a large network of computers to work for the same application; this is in contrast to the old centralized systems where there was a single computer with its peripherals

Definition of a Distributed System

a distributed system is:
a collection of independent computers that appears to its users as a single coherent system, i.e., a single computer (Tanenbaum & Van Steen)

this definition has two aspects:
1. hardware: autonomous machines
2. software: a single system view for the users

Other Definitions

A distributed system is a system designed to support the development of applications and services which can exploit a physical architecture consisting of multiple, autonomous processing elements that do not share primary memory but cooperate by sending asynchronous messages over a communication network (Blair & Stefani)

A distributed system is one that stops you getting any work done when a machine you have never even heard of crashes (Leslie Lamport)

Why Distributed?

- Resource and Data Sharing: printers, databases, multimedia servers, ...
- Availability, Reliability: the loss of some instances can be hidden
- Scalability, Extensibility: the system grows with demand (e.g., extra servers)
- Performance: huge power (CPU, memory, ...) available
- Inherent distribution, communication: organizational distribution, e-mail, video

Problems of Distribution

- Concurrency, Security: clients must not disturb each other
- Privacy: e.g., when building a preference profile such as using cookies; unwanted communication such as spam
- Partial failure: we often do not know where the error is (e.g., RPC)
- Location, Migration, Relocation, Replication: clients must be able to find their servers
- Heterogeneity: hardware, platforms, languages, management

Characteristics of Distributed Systems

- differences between the computers and the ways they communicate are hidden from users
- users and applications can interact with a distributed system in a consistent and uniform way regardless of location
- distributed systems should be easy to expand and scale
- a distributed system is normally continuously available, even if there may be partial failures

1.2 Goals of a Distributed System

to support heterogeneous computers and networks and to provide a single-system view, a distributed system is often organized by means of a layer of software called middleware that extends over multiple machines

Figure: a distributed system organized as middleware; note that the middleware layer extends over multiple machines, and offers each application the same interface

Ack: most diagrams in all slides are taken from the textbook

a distributed system should
- easily connect users with resources (printers, computers, storage facilities, data, files, Web pages, ...); some of the reasons:
  - economics: sharing resources such as printers and high-speed computers
  - to collaborate and exchange information
  - groupware: software for collaborative editing, teleconferencing, etc.
  - e-commerce: buying and selling goods
- be transparent: hide the fact that the resources and processes are distributed across multiple computers
- be open
- be scalable

Transparency in a Distributed System

a distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent

different forms of transparency in a distributed system

Transparency   Description
Access         Hide differences in data representation (endianness, file naming, ...) and how a resource is accessed
Location       Hide where a resource is physically located; where is http://www.prenhall.com/index.html? (naming)
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use; e.g., mobile users using their wireless laptops and moving from place to place
Replication    Hide that a resource is replicated (for availability and performance); all replicas have the same name
Concurrency    Hide that a resource may be shared by several competitive users; a resource must be left in a consistent state, e.g., through locking
Failure        Hide the failure and recovery of a resource

but trying to achieve full distribution transparency may be impossible or may not be a good idea
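The Concurrency row above mentions locking as the way to leave a shared resource in a consistent state. A minimal sketch of that idea follows, using threads within a single process as stand-ins for competing users; the names SharedCounter and run_users are hypothetical and not from the slides.

```python
import threading


class SharedCounter:
    """A shared resource; the lock is hidden inside it, so users never see it."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one user at a time may update the resource.
        with self._lock:
            self.value += 1


def run_users(n_users=4, n_updates=1000):
    """Let several concurrent users update the same counter and return its final value."""
    counter = SharedCounter()

    def user():
        for _ in range(n_updates):
            counter.increment()

    threads = [threading.Thread(target=user) for _ in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because the lock lives inside the resource, each user simply calls increment() as if it were the only user, which is the point of concurrency transparency.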

Openness in a Distributed System

- a distributed system should be open
- we need well-defined interfaces
  - interoperability: components of different origin can communicate
  - portability: components work on different platforms
- another goal of an open distributed system is that it should be flexible and extensible; easy to configure the system out of different components; easy to add new components, replace existing ones; easier said than done
- an Open Distributed System is a system that offers services according to standard rules that describe the syntax and semantics of those services; e.g., protocols in networks
- standards - a necessity

Scalability in Distributed Systems

a distributed system should be scalable; there are three dimensions
- size: adding more users and resources to the system
- geographically: users and resources may be far apart
- administratively: should be easy to manage even if it spans many administrative organizations

but a scalable system may exhibit performance problems

in distributed systems, the services of an open system are often specified through interfaces, often described using an Interface Definition Language (IDL)
- an IDL specifies only syntax: the names of the functions, types of parameters, return values, possible exceptions, ...
- semantics are given in an informal way by means of natural language
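To make the interface idea concrete, here is a rough sketch in Python rather than in a real IDL. The service name FileService, its read operation, and the in-memory implementation are all assumptions made up for illustration. The abstract class fixes only the syntax (names, parameter types, return type, exception), while the semantics are stated informally in the docstring, as described above.

```python
from abc import ABC, abstractmethod


class FileService(ABC):
    """Hypothetical service interface: syntax only, no implementation."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes:
        """Informal semantics: return up to `length` bytes of `path` starting
        at `offset`; raise FileNotFoundError if the file does not exist."""


class InMemoryFileService(FileService):
    """One possible implementation; any other could be swapped in."""

    def __init__(self, files: dict):
        self._files = files

    def read(self, path: str, offset: int, length: int) -> bytes:
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path][offset:offset + length]
```

A client written against FileService works with any implementation that honours the interface, which is what interoperability and portability rely on.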

Concept                  Example
Centralized services     Single server for all users, mostly for security reasons
Centralized data         A single on-line telephone book
Centralized algorithms   Doing routing based on complete information

examples of scalability limitations: scalability problems leading to low performance

Scaling Techniques: how to solve scaling problems

- the problem is mainly performance, and arises as a result of limitations in the capacity of servers and networks (for geographical scalability with high latency and mostly unreliable links)
- three possible solutions: hiding communication latencies, distribution, and replication

a. Hide Communication Latencies

- try to avoid waiting for responses to remote service requests
- let the requester do other useful work, i.e., construct requesting applications that use only asynchronous communication instead of synchronous communication; when a reply arrives the application is interrupted
- good for batch processing and parallel applications, since independent tasks can be scheduled while another task is waiting for communication to complete; or use multithreading for non-parallel programs
- hiding communication latencies is not in general applicable to interactive applications
- for interactive applications, try to reduce communication; move part of the job to the client, e.g., filling in a form to access a database and checking the entries

Figure: (a) a server checking the correctness of field entries; (b) a client doing the job

- e.g., checking the completeness of mandatory fields
- shipping code is now supported in Web applications using Java Applets and ActiveX controls (with some security issues)
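The latency-hiding idea above (issue a request, do other useful work, collect the reply later) can be sketched with a worker thread. This is only an illustration: remote_request is a made-up stand-in that sleeps to imitate network latency, and no real network call is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_request(name, latency=0.05):
    """Hypothetical remote call; the sleep models the communication delay."""
    time.sleep(latency)
    return f"reply to {name}"


def client():
    """Issue a request asynchronously and overlap it with local work."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns immediately with a future; the request runs in the background.
        pending = pool.submit(remote_request, "lookup")
        # Useful local work proceeds while the request is in flight.
        local_result = sum(i * i for i in range(1000))
        # Block only at the point where the reply is actually needed.
        reply = pending.result()
    return local_result, reply
```

A synchronous version would call remote_request directly and sit idle for the whole latency before starting the local work.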

b. Distribution

- means splitting a component into smaller parts and spreading those parts across the system
- e.g., DNS - Domain Name System
- divide the name space into nonoverlapping zones
- for details, see later in Chapter 5 - Naming

Figure: an example of dividing the DNS name space into zones
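The zone idea can be illustrated with a toy model in which each zone stores only its own names and hands everything else to a child zone. This is a simplified sketch and not how real DNS servers are implemented; the Zone class, the example names, and the address 192.0.2.10 (from a range reserved for documentation) are all assumptions for illustration.

```python
class Zone:
    """A nonoverlapping part of the name space, managed on its own."""

    def __init__(self, name):
        self.name = name
        self.hosts = {}      # names owned by this zone -> address
        self.children = {}   # label -> delegated child zone

    def delegate(self, label):
        """Create a child zone responsible for everything under `label`."""
        child = Zone(f"{label}.{self.name}" if self.name else label)
        self.children[label] = child
        return child


def resolve(root, fqdn):
    """Follow delegations from the root, label by label from the right.

    Returns (address or None, list of zones visited)."""
    labels = fqdn.split(".")
    zone = root
    visited = [zone.name or "."]
    while labels and labels[-1] in zone.children:
        zone = zone.children[labels.pop()]
        visited.append(zone.name)
    # Whatever is left must be answered by the zone we ended up in.
    return zone.hosts.get(".".join(labels)), visited


def build_example_tree():
    """Hypothetical name space: root -> org -> example.org, which owns 'www'."""
    root = Zone("")
    org = root.delegate("org")
    example = org.delegate("example")
    example.hosts["www"] = "192.0.2.10"
    return root
```

No single zone holds the whole name space, so the lookup work is spread over several servers instead of concentrating on one.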

c. Replication

- replicate components across a distributed system to increase availability and for load balancing, leading to better performance
- replication is decided by the owner of a resource
- caching (a special form of replication) also reduces communication latency; decided by the user
- but caching and replication may lead to consistency problems (see Chapter 7 - Consistency and Replication)
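The consistency problem mentioned above shows up even in a toy setting. The sketch below assumes a single in-process Server object and a client-side cache; both classes are hypothetical. The cache saves remote reads but keeps returning an old value after the server copy changes, until the entry is invalidated.

```python
class Server:
    """Holds the authoritative copy of the data."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # counts how often the server is actually contacted

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        self.reads += 1
        return self.data[key]


class CachingClient:
    """Keeps local copies to avoid repeated remote reads."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.server.read(key)
        return self.cache[key]

    def invalidate(self, key):
        """Drop the local copy so the next read goes back to the server."""
        self.cache.pop(key, None)
```

For example, after the server changes a value the client still sees the old one:

```python
server = Server()
server.write("x", 1)
client = CachingClient(server)
client.read("x")        # fetched from the server and cached
server.write("x", 2)    # the authoritative copy changes
client.read("x")        # still the cached, now stale, value
client.invalidate("x")
client.read("x")        # fetched again, now up to date
```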

Pitfalls when Developing Distributed Systems

pitfalls arise from false assumptions made by first-time developers of distributed systems; these assumptions concern properties of distributed systems and do not come up in nondistributed applications

- The network is reliable (in practice it is not, making it difficult to achieve failure transparency)
- The network is secure
- The network is homogeneous
- The topology does not change
- Latency is zero
- Bandwidth is infinite
- Transport cost is zero
- There is one administrator

1.3 Types of Distributed Systems

Three types: distributed computing systems, distributed information systems, and distributed pervasive/embedded systems

1. Distributed Computing Systems

- used for high-performance computing tasks
- two types: cluster computing and grid computing

Cluster Computing

- a collection of similar workstations or PCs (homogeneous), closely connected by means of a high-speed LAN
- each node runs the same operating system
- used for parallel programming in which a single compute-intensive program is run in parallel on multiple machines

Figure: an example of a cluster computing system

a master node runs a middleware (containing libraries for parallel programs) and controls other compute nodes; it
- allocates tasks
- provides an interface to users
- etc.

Grid Computing

- “Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations” (Ian Foster)
- high degree of heterogeneity: no assumptions are made concerning hardware, operating systems, networks, administrative domains, security policies, etc.
- Globus is a software system for Grid Computing; read about the Globus Alliance at http://www.globus.org/

2. Distributed Information Systems

- problem: many networked applications with a problem of interoperability
- at the lowest level: wrap a number of requests into a single larger request and have it executed as a distributed transaction; all or none of the requests would be executed
- how to let applications communicate directly with each other, i.e., Enterprise Application Integration (EAI)

a. Transaction Processing Systems

- consider database applications
- special primitives are required to program transactions, supplied either by the underlying distributed system or by the language runtime system
- exact list of primitives depends on the type of application; procedure calls, ordinary statements, etc. can also be included

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

The Transaction Model

- the model for transactions comes from the world of business
- a supplier and a retailer negotiate on
  - price
  - delivery date
  - quality
  - etc.
- until the deal is concluded they can continue negotiating or one of them can terminate
- but once they have reached an agreement they are bound by law to carry out their part of the deal
- transactions between processes are similar to this scenario

e.g., assume the following banking operation
- withdraw an amount x from account 1
- deposit the amount x to account 2

what happens if there is a problem after the first activity is carried out?
- group the two operations into one transaction; either both are carried out or neither
- we need a way to roll back when a transaction is not completed
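The banking example can be tried out with Python's built-in sqlite3 module, shown here as a sketch on a single local in-memory database rather than a distributed system. The table layout, the starting balances, and the fail_after_withdraw switch (which simulates a problem between the two operations) are assumptions for illustration.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def transfer(conn, src, dst, amount, fail_after_withdraw=False):
    """Withdraw from src and deposit to dst as one transaction."""
    # The connection used as a context manager commits if the block
    # finishes normally and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        if fail_after_withdraw:
            raise RuntimeError("simulated problem after the withdrawal")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
```

When the simulated failure is triggered, the withdrawal is rolled back together with the rest of the transaction, so the money does not disappear from account 1.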

e.g., reserving a seat from Manchester to Lalibella through Heathrow and Addis Ababa (AA) Bole airports

(a) transaction to reserve three flights commits

BEGIN_TRANSACTION
  reserve Man → Heathrow;
  reserve Heathrow → Bole;
  reserve Bole → Lalibella;
END_TRANSACTION

(b) transaction aborts when third flight is unavailable

BEGIN_TRANSACTION
  reserve Man → Heathrow;
  reserve Heathrow → Bole;
  reserve Bole → Lalibella full ⇒
ABORT_TRANSACTION

properties of transactions, often referred to as ACID

1. Atomic: to the outside world, the transaction happens indivisibly; a transaction either happens completely or not at all; intermediate states are not seen by other processes

2. Consistent: the transaction does not violate system invariants; e.g., in an internal transfer in a bank, the amount of money in the bank must be the same as it was before the transfer (the law of conservation of money); this may be violated for a brief period of time, but the violation is not visible to other processes

3. Isolated or Serializable: concurrent transactions do not interfere with each other; if two or more transactions are running at the same time, the final result must look as though all transactions ran sequentially in some order

4. Durable: once a transaction commits, the changes are permanent; see later in Chapter 8 - Fault Tolerance

Classification of Transactions

a transaction could be flat, nested or distributed

Flat Transaction

- consists of a series of operations that satisfy the ACID properties
- simple and widely used but with some limitations
  - does not allow partial results to be committed or aborted, i.e., atomicity is also partly a weakness
  - in our airline reservation example, we may want to accept the first two reservations and find an alternative one for the last
  - some transactions may take too much time

Nested Transaction

- constructed from a number of subtransactions; it is logically decomposed into a hierarchy of subtransactions; the flight reservation can be split into three transactions, each accessing a different database
- the top-level transaction forks off children that run in parallel, on different machines, to gain performance or for programming simplicity
- each child may also execute one or more subtransactions
- permanence (durability) applies only to the top-level transaction; if the top-level transaction aborts, commits by children must be undone

Distributed Transaction

- a flat transaction that operates on data that are distributed across multiple machines
- problem: separate algorithms are needed to handle the locking of data and committing the entire transaction; see later in Chapter 8 for distributed commit

Figure: (a) a nested transaction; (b) a distributed transaction
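The rule that a child's commit only becomes permanent when the top-level transaction commits can be imitated on a single SQLite database with savepoints. This is an analogy under stated assumptions, not a real nested transaction system: everything runs sequentially on one machine, and the flights table, seat counts, and book_trip function are made up for illustration.

```python
import sqlite3


def make_db():
    """In-memory database with hypothetical seat counts per flight leg."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE flights (leg TEXT PRIMARY KEY, seats INTEGER)")
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?)",
        [("Man-Heathrow", 2), ("Heathrow-Bole", 1), ("Bole-Lalibella", 0)],
    )
    return conn


def seats(conn):
    return dict(conn.execute("SELECT leg, seats FROM flights"))


def book_trip(conn, legs, fail_top_level=False):
    """Reserve each leg in its own subtransaction; return the legs kept."""
    conn.execute("BEGIN")                        # top-level transaction
    booked = []
    for i, leg in enumerate(legs):
        conn.execute(f"SAVEPOINT leg{i}")        # start a child subtransaction
        conn.execute("UPDATE flights SET seats = seats - 1 WHERE leg = ?", (leg,))
        left = conn.execute(
            "SELECT seats FROM flights WHERE leg = ?", (leg,)
        ).fetchone()[0]
        if left < 0:
            conn.execute(f"ROLLBACK TO leg{i}")  # abort only this child
        else:
            booked.append(leg)
        conn.execute(f"RELEASE leg{i}")          # child ends; result is still tentative
    if fail_top_level:
        conn.execute("ROLLBACK")                 # undoes the children's work as well
        return []
    conn.execute("COMMIT")                       # only now do the changes become durable
    return booked
```

Unlike a flat transaction, a full third leg does not force the first two reservations to be dropped; yet if the top-level transaction aborts, the reservations made by the children are undone.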

b. Enterprise Application Integration

- how to integrate applications independently of their databases
- transaction systems rely on request/reply
- how can applications communicate with each other? by means of middleware

Figure: middleware as a communication facilitator in enterprise application integration

there are different communication models
- RPC (Remote Procedure Call)
- RMI (Remote Method Invocation)
- MOM (Message-Oriented Middleware)
- Stream-Oriented Communication
- Multicast Communication

see later in Chapter 4 - Communication

3. Distributed Pervasive Systems

- the distributed systems discussed so far are characterized by their stability; fixed nodes having high-quality connection to a network
- there are also mobile and embedded computing devices which are small, battery-powered, mobile, and with a wireless connection

three requirements for pervasive applications
- embrace contextual changes: a device is aware that its environment (location, identities of nearby people and objects, time of the day, season, temperature, etc.) may change all the time, e.g., by changing its network access point; hence its operations and services must be adapted to the current context
- encourage ad hoc composition: devices are used in different ways by different users
- recognize sharing as the default: devices join a system to access or provide information

examples of pervasive systems
- Home Systems that integrate consumer electronics
- Electronic Health Care Systems to monitor the well-being of individuals
- Sensor Networks

read pages 26 - 30

[ Diversion

Different approaches to distribution - Lost in the forest of distribution

Distributed system
- N autonomous computers (sites): n administrators, n data/control flows
- an interconnection network
- user view: one single (virtual) system
- (traditional) programmer view: client-server

Parallel system
- 1 computer, n nodes: one administrator, one scheduler, one power source
- memory: it depends
- programmer view: one single machine executing parallel codes; various programming models (message passing, distributed shared memory, …)

Cluster computing
- use of PCs interconnected by a (high performance) network as a parallel (cheap) machine

Network computing
- from LAN (cluster) computing to WAN computing
- set of machines distributed over a MAN/WAN that are used to execute parallel loosely coupled codes
- depending on the infrastructure, network computing comes in many flavours: grid computing, P2P, Internet computing, etc.

Grid computing
- “Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations” (Ian Foster)

Peer-to-peer computing
- a site is both client and server

- application: mostly file sharing, but also others like Internet Telephony (Skype)
- 2 approaches:
  - centralized management: Napster
  - distributed management: Gnutella, Kazaa

Internet Computing
- use of (idle) computers interconnected by the Internet for processing large throughput applications
- programmer view: a single master, n servants

Cloud Computing
- a general term for anything that involves delivering hosted services over the Internet
- a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction

]
