CSE 5306 Distributed Systems - CSE SERVICESranger.uta.edu/~dliu/courses/ds/1-intro.pdf ·...


CSE 5306 Distributed Systems

Course Introduction

Instructor and TA

• Dr. Donggang Liu @ CSE
  – Web: http://ranger.uta.edu/~dliu
  – Email: [email protected]
  – Phone: 817-2720741
  – Office: ERB 555
  – Office hours: Tue/Thu 12:30 PM to 2 PM

• TA for Section 1
  – Sarker Ahmed Rumee ([email protected])
  – Office hours: TBA

• TA for Section 3
  – Harshavardha Gorla ([email protected])
  – Office hours: TBA

Course Objective

Understand distributed systems!

...

Textbook and Prerequisites

• Textbook
  – Andrew S. Tanenbaum and Maarten Van Steen, Distributed Systems: Principles and Paradigms (2nd Edition)

• Prerequisites
  – CSE 3320: Operating Systems
  – CSE 4344: Computer Networks

Topics

• Architectures

• Processes

• Communication

• Naming

• Synchronization

• Consistency and replication

• Fault tolerance

• Security

• Distributed file systems

Expected Outcomes

• Gain a working understanding of distributed systems

• Be able to explain
  – the principles underlying the functioning of distributed systems
  – how these principles are applied in distributed systems, and what the problems and challenges are

• Understand and estimate the impact of different design choices and system features on distributed systems

Grading

• Tentative course work
  – Pop quiz (10%)
  – One midterm (30%)
    • March 19 (Tuesday after Spring Break)
  – One final exam (30%)
    • May 12
  – Projects (30%)

• The final grades are computed as follows
  – A: >=90%
  – B: >=75%
  – C: >=60%
  – D: >=40%
  – F: <40%

Course Policies

• Projects
  – All deadlines are firm
  – Late submissions will be accepted with a 10% reduction in grade for each day late
  – You must use a text editor (e.g., MS Word) to complete reports

• No makeup exams
  – If you miss only the midterm, your final exam will count for 45%
  – If you miss only the final, your midterm exam will count for 45%
  – If you miss both, you get 0% from exams

Submission and Confirmation

• Email your assignment and project submissions to the TA
  – Convert your submission to a ps or pdf file

• You should
  – ask for confirmation in your email, and
  – get a confirmation email from the TA
  – The confirmation is the only acceptable evidence that proves your submission

• Include “CSE 5306” in the subject line of every submission email

CSE 5306 Distributed Systems

Introduction

What is a Distributed System?

• A loose definition
  – A collection of independent computers that appears to its users as a single coherent system

• Characteristics
  – Autonomous components (i.e., computers)
  – A single coherent system
    • The differences between components, as well as the communication between them, are hidden from users
    • Users can interact in a uniform and consistent way regardless of where and when interaction takes place
  – Easy to expand and replace

• In some sense, a distributed system is an operating system that manages multiple computers connected via a network

Distributed System As a Middleware

A distributed system organized as middleware. The middleware layer extends over multiple machines, and offers each application the same interface.

Why Distributed Systems?

• Achieve something that cannot be easily done by a single computer
  – More computing power
  – More storage space
  – Pervasive computing
    • Anytime, anywhere computing

• Examples
  – Scientific computing on Grid or Cluster platforms
  – Peer-to-peer file sharing systems
  – www.google.com
  – Wireless sensor networks

Design Goals

• Resource accessibility
  – Easy to access and share resources

• Distribution transparency
  – Hide the fact that resources are distributed across the network

• Openness
  – The system should offer services according to standard rules that describe their syntax and semantics
  – Extensible: easy to add / replace components

• Scalability
  – Size scalable, geographically scalable, administratively scalable

Resource Accessibility

• Benefits
  – Economic, e.g., sharing costly devices such as printers and RAID
  – Encourage collaboration and exchange of information, e.g., Internet, Facebook, CVS version control

• Problems
  – Security, e.g., eavesdropping on connections, email spam, DDoS attacks

Distribution Transparency

• Access transparency
  – Hide the difference in data representation and how a resource is accessed

• Location transparency
  – Hide where a resource is physically located

• Migration transparency
  – Hide that a resource may be moved to another location

• Relocation transparency
  – Hide that a resource may be moved during access

• Replication transparency
  – Hide that a resource may be replicated at many locations

• Concurrency transparency
  – Hide that a resource may be shared by several competitive users

• Failure transparency
  – Hide the failure and recovery of a resource
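Several of these transparencies can be illustrated with a small client-side handle. The sketch below is a minimal, single-process illustration; the class names `Replica` and `TransparentResource` and the site names are hypothetical. The caller names only the resource, never a location (location transparency), does not know how many copies exist (replication transparency), and does not see a failed copy as long as another one answers (failure transparency).

```python
class ReplicaDown(Exception):
    """Raised when a copy of the resource cannot be reached."""


class Replica:
    """One copy of the resource at some (hypothetical) site."""

    def __init__(self, location, data, up=True):
        self.location = location
        self.data = data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ReplicaDown(self.location)
        return self.data[key]


class TransparentResource:
    """Client-facing handle: callers never pick a location or a replica."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def read(self, key):
        for replica in self._replicas:
            try:
                return replica.read(key)
            except ReplicaDown:
                continue  # mask the failure and try the next copy
        raise ReplicaDown("all replicas unavailable")


if __name__ == "__main__":
    resource = TransparentResource([
        Replica("site-a", {"x": 1}, up=False),  # this copy has failed
        Replica("site-b", {"x": 1}),
    ])
    print(resource.read("x"))  # the caller cannot tell which site answered
```

Full transparency is not always achievable or even desirable: once every replica is down, the failure has to surface to the caller, as the final `raise` shows.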


Openness

• Interoperability
  – Implementations from different manufacturers can work together by merely relying on the standard rules

• Portability
  – Applications from one distributed system can be executed on another distributed system that implements the same service

• Extensibility
  – Easy to add or replace components in the system

• Flexibility
  – Separating policy from mechanism
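Separating policy from mechanism can be sketched with a small cache. The storage and capacity enforcement (the mechanism) is fixed, while the decision about which entry to evict (the policy) is a pluggable function. The `Cache` class and both policy functions are hypothetical names chosen for this illustration, not taken from any particular system.

```python
def evict_oldest(keys, hits):
    """FIFO policy: evict the key that was inserted first."""
    return keys[0]


def evict_least_used(keys, hits):
    """LFU-style policy: evict the key with the fewest hits (ties: oldest)."""
    return min(keys, key=lambda k: hits[k])


class Cache:
    """Mechanism: stores entries and enforces capacity. It does not decide
    which entry to evict; that choice is delegated to the policy."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.data = {}   # insertion-ordered in Python 3.7+
        self.hits = {}

    def get(self, key):
        if key in self.data:
            self.hits[key] += 1
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = self.policy(list(self.data), self.hits)
            del self.data[victim]
            del self.hits[victim]
        self.data[key] = value
        self.hits.setdefault(key, 0)
```

Swapping `evict_oldest` for `evict_least_used` changes the behavior without touching `Cache`, which is the kind of flexibility this bullet refers to.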


Scalability

• Measured in three dimensions
  – Size scalable
    • Can easily add more users or resources to the system
  – Geographically scalable
    • Can easily handle users and resources that may lie far apart
  – Administratively scalable
    • Can easily manage the system even if it spans many independent administrative organizations

Size Scalability - Examples

• Centralized services
  – A single server for all users

• Centralized data
  – A single online telephone book

• Centralized algorithms
  – A routing algorithm that requires the knowledge of full network topology

Decentralized Algorithms

• Characteristics
  – No machine has complete information about the system state
  – Machines make decisions based only on local information
  – Failure of one machine does not ruin the algorithm
  – No implicit assumption about a global clock
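As one illustration of the first two characteristics, the sketch below lets every node estimate the system-wide average load while reading only its own value and those of its direct neighbors. It is a simulation under stated assumptions: the ring topology, the function names, and the sample loads are hypothetical, and the lock-step rounds are a simplification that a real deployment without a global clock would replace with asynchronous exchanges.

```python
def gossip_round(values, neighbors):
    """One round: every node averages its own value with the values of its
    direct neighbors. No node reads global state."""
    new_values = {}
    for node, own in values.items():
        local_view = [own] + [values[n] for n in neighbors[node]]
        new_values[node] = sum(local_view) / len(local_view)
    return new_values


def ring(n):
    """Hypothetical topology: node i is linked to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def run(values, neighbors, rounds):
    for _ in range(rounds):
        values = gossip_round(values, neighbors)
    return values


if __name__ == "__main__":
    load = {0: 10.0, 1: 0.0, 2: 4.0, 3: 6.0, 4: 0.0}
    print(run(load, ring(5), rounds=50))  # each node ends up near the mean
```

Because every node in the ring has the same number of neighbors, the values converge to the true mean; on an irregular topology this simple rule would converge to a degree-weighted average instead.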


Geographical Scalability

• Synchronous communication
  – Large network latency in wide-area networks
  – Building interactive applications is non-trivial

• Assumption of reliable communication
  – Wide-area networks are unreliable
  – E.g., locating a server through a single broadcast message

Administrative Scalability

• Conflicting policies with respect to
  – Resource usage
  – Management
  – Security

• Attacks from a foreign user

• Attacks from a foreign distributed system

Scaling Techniques - Hide Latency


The difference between letting (a) a server or (b) a client check forms as they are being filled.
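The effect of moving the checks to the client can be sketched by counting simulated round trips. Everything here is an assumption made for illustration: the `Server` class stands in for a remote machine, and the field names and validation rules are hypothetical.

```python
import re

# Hypothetical validation rules, known to both client and server.
RULES = {
    "name": re.compile(r"[A-Za-z ]+"),
    "zip": re.compile(r"\d{5}"),
    "year": re.compile(r"\d{4}"),
}


def is_valid(field, value):
    return RULES[field].fullmatch(value) is not None


class Server:
    """Stand-in for a remote server; counts simulated round trips."""

    def __init__(self):
        self.round_trips = 0

    def check_field(self, field, value):
        self.round_trips += 1
        return is_valid(field, value)

    def submit_form(self, form):
        self.round_trips += 1
        return all(is_valid(k, v) for k, v in form.items())


def fill_with_server_checks(server, form):
    """(a) Each field is sent to the server for checking as it is filled in."""
    results = [server.check_field(k, v) for k, v in form.items()]
    return all(results) and server.submit_form(form)


def fill_with_client_checks(server, form):
    """(b) Fields are checked locally; only the completed form is sent."""
    if not all(is_valid(k, v) for k, v in form.items()):
        return False
    return server.submit_form(form)
```

With three fields, approach (a) costs four round trips and approach (b) costs one. In a wide-area network each round trip can take a noticeable fraction of a second, so the difference is visible to the user. The server still re-checks the submitted form, since client-side checks cannot be trusted.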

Scaling Techniques - Distribution


DNS example: locating nl.vu.cs.flits
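The idea behind this example is that no single server holds the whole name space: each zone's server knows only its own entries and which server to ask next. The sketch below resolves the path nl, vu, cs, flits step by step. The server names and the address are made up for illustration (the address is taken from a range reserved for documentation), and real DNS adds caching, record types, and recursive resolvers that are not modeled here.

```python
# Hypothetical zone data: each name server knows only its own zone and
# which server is responsible for each child zone.
ZONES = {
    "root": {"nl": ("referral", "ns.nl")},
    "ns.nl": {"vu": ("referral", "ns.vu.nl")},
    "ns.vu.nl": {"cs": ("referral", "ns.cs.vu.nl")},
    "ns.cs.vu.nl": {"flits": ("address", "192.0.2.10")},
}


def resolve(path, zones=ZONES):
    """Iteratively resolve a list of labels, starting at the root server.
    Returns the address and the list of servers contacted."""
    server = "root"
    contacted = []
    for i, label in enumerate(path):
        contacted.append(server)
        record = zones[server].get(label)
        if record is None:
            raise KeyError(f"{label!r} is unknown to {server}")
        kind, value = record
        if kind == "address":
            if i != len(path) - 1:
                raise KeyError(f"{label!r} is a host, not a zone")
            return value, contacted
        server = value  # follow the referral to the next zone's server
    raise KeyError("path ended at a zone, not at a host")


if __name__ == "__main__":
    address, hops = resolve(["nl", "vu", "cs", "flits"])
    print(address, hops)
```

Each server handles only the lookups for its own zone, so the load and the data are spread over many machines instead of one central directory.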

Pitfalls

False assumptions that developers new to distributed systems often make:

• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Latency is zero
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator

Types of Distributed Systems

• Distributed computing systems
  – Cluster computing systems
  – Grid computing systems
  – Cloud computing systems

• Distributed information systems
  – Transaction processing systems
  – Enterprise application integration (exchange info via RPC or RMI)

• Distributed pervasive systems
  – Smart-home systems
  – Electronic healthcare systems (heart monitors, Body Area Network)
  – Wireless sensor networks

Cluster Computing Systems

• Hooking up a collection of simple computers (mostly homogeneous) via a high-speed network
  – To build a supercomputing platform

• Example: server clusters at banks, brokerages, etc.

Linux-based Beowulf architecture

Grid Computing Systems

• In contrast to cluster computing, grid computing
  – Has a high degree of heterogeneity
  – Makes no assumptions concerning hardware, OS, security, etc.

• Users and resources from different organizations are brought together to allow collaboration
  – Virtual Organization (VO)
    • Members belonging to the same VO have access to the resources that are provided to that VO

• Focus of the software design for grid computing
  – Provide access to resources from different administrative domains to only those users that belong to a specific VO

An Example Architecture (1)


Applications operate within a VO and make use of grid computing environment

Grid Middleware

An Example Architecture (2)

• Fabric layer
  – Provides interfaces to local resources at a specific site within a VO

• Resource layer
  – Manages a single resource, such as creating a process or reading data

• Connectivity layer
  – Transfers data between resources or accesses a resource from a remote location

• Collective layer
  – Provides access to multiple resources; typically consists of services for resource discovery, allocation, and scheduling

Cloud Computing Systems

• Computing resources (hardware and software) that are delivered as a service over a network

• Cloud computing providers offer their services according to three fundamental models
  – Infrastructure as a service (IaaS)
    • Basic infrastructure, clients need to install OS images
  – Platform as a service (PaaS)
    • Comes with OS, database, web servers
  – Software as a service (SaaS)
    • Comes with the software that the clients need

Distributed Information Systems

• Deal with the interoperability between networked applications

• Two forms of distributed information systems
  – Transaction Processing System (TPS)
    • Distributed transaction: either everything happens or nothing does
  – Enterprise Application Integration (EAI)
    • Separate process components from databases
    • Let applications communicate with each other

Transaction Processing Systems

• Properties of transactions
  – Atomic: to the outside world, each transaction happens indivisibly
  – Consistent: the transaction does not violate the system invariants
  – Isolated: concurrent transactions do not interfere with each other
  – Durable: once a transaction commits, the changes are permanent

• Primitives for transactions:
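The course textbook names five primitives: BEGIN_TRANSACTION, END_TRANSACTION, ABORT_TRANSACTION, READ, and WRITE. The sketch below shows one way they could map onto code, using a private write buffer that is applied to the store only on commit. It is a single-process illustration of atomicity only: the `Transaction` class and the `transfer` example are hypothetical, and isolation between concurrent transactions, durability, and distribution are not modeled.

```python
class Transaction:
    """Buffers writes privately and applies them to the store only on commit."""

    def __init__(self, store):
        self.store = store
        self.writes = {}
        self.active = True

    def read(self, key):                  # READ
        self._check_active()
        if key in self.writes:
            return self.writes[key]
        return self.store[key]

    def write(self, key, value):          # WRITE
        self._check_active()
        self.writes[key] = value

    def commit(self):                     # END_TRANSACTION
        self._check_active()
        self.store.update(self.writes)
        self.active = False

    def abort(self):                      # ABORT_TRANSACTION
        self.writes.clear()
        self.active = False

    def _check_active(self):
        if not self.active:
            raise RuntimeError("transaction already finished")


def begin_transaction(store):             # BEGIN_TRANSACTION
    return Transaction(store)


def transfer(store, src, dst, amount):
    """Move money between accounts: both updates happen, or neither does."""
    tx = begin_transaction(store)
    try:
        balance = tx.read(src)
        if balance < amount:
            raise ValueError("insufficient funds")
        tx.write(src, balance - amount)
        tx.write(dst, tx.read(dst) + amount)
        tx.commit()
        return True
    except (KeyError, ValueError):
        tx.abort()  # the store was never touched, so nothing to undo
        return False
```

A transfer to an unknown account fails after the debit has been buffered, yet the store is left unchanged, which is the "all or nothing" behavior described under Atomic.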


Nested Transactions


Realizing Transactions


Transaction Processing Monitor

Enterprise Application Integration

• Goal: link applications in a single organization together to simplify or automate the business process
  – Example: publish/subscribe systems

• Middleware as a communication facilitator (RPC, RMI, MOM)
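The publish/subscribe example can be sketched with a tiny in-process broker. This is an illustration only: the `Broker` class, the topic name, and the message shape are hypothetical, and real message-oriented middleware adds queuing, persistence, and delivery across machines.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for message-oriented middleware."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to be invoked for every message on the topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic and
        return how many subscribers received it."""
        handlers = self.subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("order.created", lambda m: print("billing saw", m))
    broker.subscribe("order.created", lambda m: print("shipping saw", m))
    broker.publish("order.created", {"id": 7})
```

The publisher never names the billing or shipping application, so applications can be added or removed without changing each other.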


Distributed Pervasive Systems

• Devices in a distributed pervasive system are often
  – Small, battery-powered, mobile, and with only limited wireless communication

• Three characteristics
  – Embrace contextual changes
    • I was a phone and now I am a web access device
    • A device must be aware that its environment may change
  – Encourage ad hoc composition
    • Many devices will be used differently by different users
  – Recognize sharing as the default
    • Easy to read, store, manage, and share information

Electronic Healthcare Systems (1)

• Questions to answer:
  – Where and how should monitored data be stored?
  – How can we prevent loss of crucial data?
  – What infrastructure is needed to generate and propagate alerts?
  – How can physicians provide online feedback?
  – How can extreme robustness of the monitoring system be realized?
  – What are the security issues and how can the proper policies be enforced?

Electronic Healthcare Systems (2)


Monitoring a person in a pervasive electronic health care system, using (a) a local hub or (b) a continuous wireless connection.

Wireless Sensor Networks (1)

• A network that consists of a large number of low-end sensor nodes
  – Each sensor can sense the physical environment and
  – talk to other sensor nodes via short-range radio communication

• Ideal candidate for applications that interact with environments
  – Infrastructure monitoring, battlefield surveillance

• Some design questions to answer
  – How to efficiently route data in the network?
  – How to aggregate the results to reduce the communication?
  – What to do when a network link fails?
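The aggregation question can be made concrete by counting radio transmissions on a routing tree. The sketch below compares forwarding every raw reading to the root with in-network aggregation, where each node sends one partial (sum, count) pair to its parent. The tree shape, node names, and readings are hypothetical, and link failures and message sizes are ignored.

```python
def raw_forwarding_cost(tree, node, depth=0):
    """Transmissions needed if every reading is forwarded unmodified,
    hop by hop, to the root: one per hop, so the sum of all node depths."""
    children = tree.get(node, [])
    return depth + sum(raw_forwarding_cost(tree, c, depth + 1) for c in children)


def aggregate(tree, readings, node):
    """In-network aggregation: each node combines its own reading with its
    children's partial results and passes a single (sum, count) pair upward.
    Returns (sum, count, transmissions used in this subtree)."""
    total, count, sent = readings[node], 1, 0
    for child in tree.get(node, []):
        s, c, m = aggregate(tree, readings, child)
        total += s
        count += c
        sent += m + 1  # one message from the child to this node
    return total, count, sent


if __name__ == "__main__":
    tree = {"sink": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
    readings = {"sink": 20, "a": 22, "b": 19, "c": 21, "d": 23, "e": 21}
    total, count, sent = aggregate(tree, readings, "sink")
    print("mean:", total / count, "messages with aggregation:", sent)
    print("messages without aggregation:", raw_forwarding_cost(tree, "sink"))
```

With aggregation the number of messages equals the number of tree links, while raw forwarding grows with the depth of the tree. This works directly for averages, sums, counts, minima, and maxima; queries such as the median need more than a fixed-size partial result.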


Wireless Sensor Networks (2)


Organizing a sensor network database, while storing and processing data only at the operator’s site

Wireless Sensor Networks (3)


Organizing a sensor network database, while storing and processing data only at the sensors.