CS514: Intermediate Course in Operating Systems

CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 2: August 29



Page 1: CS514: Intermediate Course in Operating Systems

CS514: Intermediate Course in Operating Systems

Professor Ken Birman

Ben Atkin: TA

Lecture 2: August 29

Page 2: CS514: Intermediate Course in Operating Systems

Overview of Lecture

• Fundamentals: terminology and components of a reliable distributed computing system

• Communication technologies and their properties

• Basic communication services

• Internet protocols

• End-to-end argument

Page 3: CS514: Intermediate Course in Operating Systems

Some terminology

• A program is the code you type in

• A process is what you get when you run it

• A message is used to communicate between processes. Arbitrary size.

• A packet is a fragment of a message that might travel on the wire. Variable size but limited, usually to 1400 bytes or less.

• A protocol is an algorithm by which processes cooperate to do something using message exchanges.
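The message/packet distinction can be sketched in a few lines of Python. This is only an illustration: the function names `fragment` and `reassemble` are made up for this example, and the 1400-byte limit is simply the figure quoted on the slide. It also assumes every packet arrives, in order, which a real network does not guarantee.

```python
MAX_PACKET_BYTES = 1400  # packet size limit quoted on the slide


def fragment(message: bytes, max_size: int = MAX_PACKET_BYTES) -> list[bytes]:
    """Split a message of arbitrary size into packets of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [message[i:i + max_size] for i in range(0, len(message), max_size)]


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the message, assuming every packet arrived and in order."""
    return b"".join(packets)


message = bytes(3000)            # a 3000-byte message
packets = fragment(message)      # packets of 1400, 1400 and 200 bytes
assert reassemble(packets) == message
```

A protocol in the sense of the last bullet would add the rules for what to do when one of those packets goes missing.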

Page 4: CS514: Intermediate Course in Operating Systems

More terminology

• A network is the infrastructure that links the computers, workstations, terminals, servers, etc.

  – It consists of routers

  – They are connected by communication links

• A network application is one that fetches needed data from servers over the network

• A distributed system is a more complex application designed to run on a network. Such a system has multiple processes that cooperate to do something.

Page 5: CS514: Intermediate Course in Operating Systems

A network is like a “mostly reliable” post office

Page 6: CS514: Intermediate Course in Operating Systems

Why isn’t it totally reliable?

• Links can corrupt messages

  – Rare in the high quality ones on the Internet “backbone”

  – More common with wireless connections, cable modems, ADSL

• Routers can get overloaded

  – When this happens they drop messages

  – As we’ll see, this is very common

• But protocols that retransmit lost packets can increase reliability
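A rough way to see why retransmission helps: if each send is lost with some probability, and losses are independent (an assumption made here for simplicity, not a claim from the lecture), the chance that at least one of several attempts gets through grows quickly. The function name below is hypothetical.

```python
def delivery_probability(loss: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent sends arrives."""
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # All attempts fail with probability loss ** attempts.
    return 1.0 - loss ** attempts


# With a 10% loss rate, each extra attempt cuts the failure chance tenfold.
for attempts in (1, 2, 3):
    print(attempts, delivery_probability(0.10, attempts))
```

In practice losses caused by an overloaded router tend to be bursty rather than independent, so this is an optimistic estimate.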

Page 7: CS514: Intermediate Course in Operating Systems

How do distributed systems differ from network applications?

• Distributed systems may have many components but are often designed to mimic a single, non-distributed process running at a single place.

• “State” is spread around in a distributed system

• A networked application is free-standing and centered around the user or computer where it runs (e.g. “web browser”). A distributed system is spread out and decentralized (e.g. “air traffic control system”).

Page 8: CS514: Intermediate Course in Operating Systems

What about the Web?

• Browser is independent: fetches data you request when you ask for it.

• Web servers don’t keep track of who is using them. Each request is self-contained and treated independently of all others.

  – Cookies don’t count: they sit on your machine

  – And the database of account info doesn’t count either… this is “ancient” history, nothing recent

• ... So the web has two network applications that talk to each other

  – The browser on your machine

  – The web server it happens to connect with… which has a database “behind” it

Page 9: CS514: Intermediate Course in Operating Systems

You and the Web

[Figure: a web browser with stashed cookies sends an HTTP request to the web servers, which sit in front of a database.]

• The cookie identifies this user and encodes past preferences

• Web servers are kept current by the database but usually don’t talk to it when your request comes in

Page 10: CS514: Intermediate Course in Operating Systems

You and the Web

Reply updates cookie

Web servers immediately forget the interaction

Page 11: CS514: Intermediate Course in Operating Systems

You and the Web

Purchase is a “transaction” on the database

Web servers have no memory of the interaction

Page 12: CS514: Intermediate Course in Operating Systems

Examples of Distributed Systems

• Air traffic control system with workstations for the controllers

• Banking/brokerage trading system that coordinates trading (risk management) at multiple locations

• Factory floor control system that monitors devices and replans work as they go on/offline

Page 13: CS514: Intermediate Course in Operating Systems

This course is about reliability

• We want to build distributed systems that can be relied upon to do the correct thing and to provide services according to the user’s expectations

• Not all systems need reliability

  – If a web site doesn’t respond, you just try again later

  – If you end up with two wheels of brie, well, throw a party!

• Reliability is a growing requirement in “critical” settings but these remain a small percentage of the overall market for networked computers

Page 14: CS514: Intermediate Course in Operating Systems

Reliability is a broad term

• Fault-Tolerance: remains correct despite failures

• High or continuous availability: resumes service after failures, doesn’t wait for repairs

• Performance: provides desired responsiveness

• Recoverability: can restart failed components

• Consistency: coordinates actions by multiple components, so they mimic a single one

• Security: authenticates access to data, services

• Privacy: protects identity, locations of users

Page 15: CS514: Intermediate Course in Operating Systems

“Failure” also has many meanings

• Halting failures: component simply stops

• Fail-stop: halting failures with notifications

• Omission failures: failure to send or receive a message

• Network failures: network link breaks

• Network partition: network fragments into two or more disjoint subnetworks

• Timing failures: action early/late; clock fails, etc.

• Byzantine failures: arbitrary, possibly malicious, behavior

Page 16: CS514: Intermediate Course in Operating Systems

Examples of failures

• My PC suddenly freezes up while running a text processing program. No damage is done. This is a halting failure.

• A network file server tells its clients that it is about to shut down, then goes offline. This is a fail-stop failure. (The notification can be trusted.)

• An intruder hacks the network and replaces some parts with fakes. This is a Byzantine failure.

Page 17: CS514: Intermediate Course in Operating Systems

More terminology

• A real-world network is what we work on. It has computers, links that can fail, and some problems synchronizing time. But this is hard to model in a formal way.

• An asynchronous distributed system is a theoretical model of a network with no notion of time

• A synchronous distributed system, in contrast, has perfect clocks and bounds on all events, like message passing.

Page 18: CS514: Intermediate Course in Operating Systems

Model we’ll use?

• Our focus is on real-world networks, halting failures, and extremely practical techniques

• The closest model is the asynchronous one; we use it to reason about protocols

  – Most often, employ asynchronous model to illustrate techniques we can actually implement in real-world settings

  – And usually employ the synchronous model to obtain impossibility results

  – Question: why not prove impossibility results in an asynchronous model, or use the synchronous one to illustrate techniques that we might really use?

Page 19: CS514: Intermediate Course in Operating Systems

ISO protocol layers: Oft-cited Standard

Application: The program using a communication connection

Presentation: Software to encode data into messages, and decode on reception

Session: Logic associated with guaranteeing end-to-end reliability and flow control, if desired

Transport: Software for fragmenting big messages into small packets

Network: Routing functionality, limited to small packets

Data-link: The protocol that represents packets on the wire

• ISO is tied to a TCP-style of connection

• Match with modern protocols is poor

• We are mostly at “layer 4” – session

Page 20: CS514: Intermediate Course in Operating Systems

Internet protocol suite

• Can be understood in terms of ISO

• Defines “addressing” standard, basic network layer (IP packets, limited to 1400 bytes), and session protocols (TCP, UDP, UDP-multicast)

• Includes standard “domain name service” that maps host names to IP addresses

• DNS itself is tree-structured and caches data
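A toy sketch of tree-structured name lookup with caching, in Python. The `Zone` and `CachingResolver` classes, the host name `www.example.edu`, and the address `192.0.2.10` are all hypothetical, chosen only for illustration. Real DNS also involves delegation between separate servers and expiry of cached entries, both of which are left out here.

```python
class Zone:
    """One node in the name tree: holds host records and child zones."""

    def __init__(self):
        self.records = {}    # host label -> IP address
        self.children = {}   # domain label -> Zone


class CachingResolver:
    """Walks the zone tree from the root and remembers earlier answers."""

    def __init__(self, root: Zone):
        self.root = root
        self.cache = {}
        self.tree_walks = 0  # how many lookups missed the cache

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]
        self.tree_walks += 1
        labels = name.split(".")
        zone = self.root
        # Walk from the rightmost label (top-level domain) downwards.
        for label in reversed(labels[1:]):
            zone = zone.children[label]
        address = zone.records[labels[0]]
        self.cache[name] = address
        return address


# Build a tiny tree: root -> edu -> example, holding the host "www".
root = Zone()
root.children["edu"] = Zone()
root.children["edu"].children["example"] = Zone()
root.children["edu"].children["example"].records["www"] = "192.0.2.10"

resolver = CachingResolver(root)
resolver.resolve("www.example.edu")   # walks the tree
resolver.resolve("www.example.edu")   # answered from the cache
```

An unknown name raises `KeyError` in this sketch; a real resolver would return a "no such name" answer instead.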

Page 21: CS514: Intermediate Course in Operating Systems

Major internet protocols

• TCP, UDP, FTP, Telnet

• Email: Simple Mail Transfer Protocol (SMTP)

• News: Network News Transfer Protocol (NNTP)

• DNS: Domain name service protocol

• NIS: Network information service (a.k.a. “YP”)

• SNMP: Protocol for talking to the management information base (MIB) on a computer

• NFS: Network file system protocol for UNIX

• X11: X-server display protocol

• Web: HyperText Transfer Protocol (HTTP), and SSL (one of the widely used security protocols)

Page 22: CS514: Intermediate Course in Operating Systems

Typical hardware options

• Ethernet: 10Mbit CSMA technology, limited to 1400 byte packets. Uses single coax cable.

• FDDI: twisted pair, self-repairing if cable breaks

• Bridged Ethernet: common in big LAN’s, ring with multiple ethernet segments

• Fast Ethernet: 100Mbit version of ethernet

• ATM: switching technology for fiber optic paths. Can run at 155Mbits/second or more. Very reliable, but mostly used in telephone systems.

Page 23: CS514: Intermediate Course in Operating Systems

Implications for reliability?

• Protocol designers have problems predicting the properties of local-area networks

• Latencies and throughput may vary widely even in a single installation

• Hardware properties differ widely; often, must assume the least-common-denominator

• Packet loss a minor problem in hardware itself

Page 24: CS514: Intermediate Course in Operating Systems

Technology trends

[Figure: chart of technology trends for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005 (vertical axis 0 to 700). Series: CPU MIPS, Memory MB, LAN Mbits, WAN Mbits, O/S overhead. Source: Scientific American, Sept. 1995]

Note tremendous growth in WAN speeds

Page 25: CS514: Intermediate Course in Operating Systems

Typical latencies (milliseconds)

[Figure: chart of typical latencies in milliseconds (logarithmic axis, 0.01 to 1000) for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005. Series: Disk I/O, Ethernet RPC, ATM roundtrip, WAN roundtrip.]

Note dramatic drop in LAN latencies over ATM

WAN, disk latencies are fairly constant due to physical limitations

Page 26: CS514: Intermediate Course in Operating Systems

O/S latency: the most expensive overhead on LAN communication!

[Figure: chart of O/S overhead as a percentage (vertical axis 0 to 40) for the periods 1985-1990 and 1995-2000.]

Page 27: CS514: Intermediate Course in Operating Systems

Broad observations

• A discontinuity is currently occurring in WAN communication speeds!

• Other performance curves are all similar

• Disks have “maxed out” and hence are looking slower and slower

• Memory of remote computers looks “closer and closer”

• O/S imposed communication latencies have risen in relative terms over past decade!

Page 28: CS514: Intermediate Course in Operating Systems

Implications?

• The revolution in WAN communication we are now seeing is not surprising and will continue

• Look for a shift from disk storage towards more use of access to remote objects “over the network”

• O/S overhead is already by far the main obstacle to low latency and this problem will seem worse and worse unless O/S communication architectures evolve in major ways.

Page 29: CS514: Intermediate Course in Operating Systems

More Implications

• Look for full motion video to the workstation by around 2005 or 2010

• Low LAN latencies: an unexploited “niche”

• One puzzle: what to do with extremely high data throughput but relatively high WAN latencies

• O/S architecture and whole concept of O/S must change to better exploit the “pool of memory” of a cluster of machines; otherwise, disk latencies will loom higher and higher

Page 30: CS514: Intermediate Course in Operating Systems

Reliability and performance

• Some think that more reliable means “slower”

  – Indeed, it usually costs time to overcome failure

  – For example, if a packet is lost probably need to resend it, and may need to solicit the retransmission

• But for many applications, performance is a big part of the application itself: too slow means “not reliable” for these!

• Reliable systems thus must look for highest possible performance

• ... but unlike unreliable systems, they can’t cut corners in ways that make them flakey but faster

Page 31: CS514: Intermediate Course in Operating Systems

Back to the internet: IP layer

• Addresses have a machine address and a “port” number. The port selects the application when a packet arrives. (If none is bound to that port, packet is dropped)

• Fixed maximum size of 1400 bytes

• Each machine can have multiple addresses

• Special “broadcast” address delivered to all

• “Class D” addresses used for multicast groups

• Running out of addresses, so tricks with addresses are increasingly common

Page 32: CS514: Intermediate Course in Operating Systems

IP gotcha’s

• IP messages are not authenticated in any way. The sender can lie about who it is, and can send to any host or port on the network

• A system can lie about its own machine address

• IP messages are not reliable. They can be (and are) dropped at many stages

• IP routing: basically static these days

Page 33: CS514: Intermediate Course in Operating Systems

IP multicast

[Figure: a sender sends to “123.45.87.51”; both blue machines accept IP address “123.45.87.51”.]

• Key insight: IP address is just an abstraction.

• Any machine can potentially “spoof” any other!

Page 34: CS514: Intermediate Course in Operating Systems

UDP protocol

• Lives above IP and looks much like IP

• Permits larger packets (fragments them into a burst of IP packets; if any is lost UDP packet will be dropped on receive side)

• Also can run in a multicast mode

• Most applications use UDP; IP layer is typically “reserved” for kernel-level applications
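The fragmentation rule for UDP has a simple consequence: the larger the UDP packet, the more IP packets it needs, and losing any one of them loses the whole thing. The sketch below works that out under two assumptions that are not from the lecture: each IP packet is lost independently with the same probability, and header bytes are ignored. The function names are hypothetical; the 1400-byte figure is the limit quoted on these slides.

```python
import math

IP_PACKET_BYTES = 1400  # per-packet limit quoted on the slides


def fragments_needed(datagram_bytes: int, packet_bytes: int = IP_PACKET_BYTES) -> int:
    """Number of IP packets needed to carry one UDP packet (headers ignored)."""
    return max(1, math.ceil(datagram_bytes / packet_bytes))


def datagram_survival(datagram_bytes: int, loss: float,
                      packet_bytes: int = IP_PACKET_BYTES) -> float:
    """Chance that every fragment arrives, so the UDP packet is delivered."""
    return (1.0 - loss) ** fragments_needed(datagram_bytes, packet_bytes)


# An 8192-byte UDP packet needs 6 fragments; compare it with a 1-fragment one.
print(fragments_needed(8192))
print(datagram_survival(1000, 0.01), datagram_survival(8192, 0.01))
```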

Page 35: CS514: Intermediate Course in Operating Systems

UDP loss rates can be very high!

• Hunt experimented with this (book reproduces some of his data)

• UDP is normally very reliable

• If sender overruns receiver, even briefly, 100% of data may be lost!

• Easy to provoke this problem even with source, dest on same machine!!!

• O/S makes no effort to detect or avoid loss!

Page 36: CS514: Intermediate Course in Operating Systems

TCP protocol

• Implemented over IP, considered “reliable”

• Supports a byte-stream model, like a pipe.

• Implemented using sliding-window protocol

• Many variations on protocol to optimize performance, window size, reduce header size, etc. We’ll focus on the “most basic” TCP protocol here and won’t look at the optimizations

Page 37: CS514: Intermediate Course in Operating Systems

TCP sliding window

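[Figure: TCP sliding window. The sender provides data, IP packets carry segments, and the receiver consumes data. The window has k “segments”.]

Sender window: m_{i+k}, m_{i+k-1}, ..., m_i

Receiver window: -, -, m_{i+k-2}, -, m_{i+k-3}, ..., m_i (a dash marks a segment that has not arrived)

The receiver replies with acks and nacks. The sender resends missing data.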

Page 38: CS514: Intermediate Course in Operating Systems

Observations

• Trick is to have a big enough window so that data flow is continuous. This permits sender, receiver to “match” their data rates

• Must retransmit as soon as possible but not so soon that duplicates get through (undesirable overhead)

• Channel “breaks” after many retries, i.e. after a crash or if the network gets “lossy” for a while
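A much simplified simulation of the sliding-window idea, in Python. It is not TCP: there are no timers, sequence-number wraparound, or byte streams, and the function name and the `lost_first_try` parameter are invented for this sketch. The assumption is that a segment's first transmission may be lost but its retransmission always arrives, which keeps the run deterministic.

```python
def sliding_window_transfer(segments, window, lost_first_try):
    """Send `segments` with at most `window` unacknowledged at a time.

    `lost_first_try` is a set of segment indices whose first transmission
    is dropped. Returns (segments as received, total transmissions).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    received = {}        # index -> segment held by the receiver
    sent_once = set()    # indices that have been transmitted at least once
    base = 0             # oldest segment not yet acknowledged
    transmissions = 0
    while base < len(segments):
        # Send, or resend, every missing segment inside the window.
        for i in range(base, min(base + window, len(segments))):
            if i in received:
                continue             # already acknowledged
            transmissions += 1
            first_try = i not in sent_once
            sent_once.add(i)
            if first_try and i in lost_first_try:
                continue             # dropped in the network
            received[i] = segments[i]
        # Acks let the window slide past the contiguous prefix that arrived.
        while base < len(segments) and base in received:
            base += 1
    return [received[i] for i in range(len(segments))], transmissions


data = list("abcdef")
delivered, sends = sliding_window_transfer(data, window=3, lost_first_try={1, 4})
assert delivered == data
```

With two segments lost once each, the transfer takes two extra transmissions, and the window only advances once the gap at its left edge has been filled.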

Page 39: CS514: Intermediate Course in Operating Systems

TCP failures: not “failstop”

• Many applications treat a broken TCP channel as a sign of remote crash

• But in fact, a TCP channel can easily break under network overload or due to other transient conditions!

• Applications would then believe destination to have failed when it is actually still running.

Page 40: CS514: Intermediate Course in Operating Systems

A basic insight

• In fact, there is no way to know if a message has reached its destination unless it is explicitly acknowledged

• And if it is, the sender of the ack. has no way to know if it was received!

• Distributed systems are always slightly out of sync! This will be a very big issue for us later

Page 41: CS514: Intermediate Course in Operating Systems

Most distributed systems use UDP or TCP

• Applications typically lack permission to use IP (otherwise could “break” TCP protocol)

• UDP multicast is hard to use because it isn’t always available and information on hardware layout of a LAN is not available in any standard format or from any standard service

• Under heavy load, both UDP and TCP begin to misbehave (high loss rates, broken channels, etc)

Page 42: CS514: Intermediate Course in Operating Systems

End-to-End argument

• Suppose an IP packet will take n hops to its destination, and can be lost with probability p on each hop

• Now, say that we want to transfer a file of k records that each fit in one IP (or UDP) packet

• Should we use a retransmission protocol running “end-to-end” or n TCP protocols in a chain?

Page 43: CS514: Intermediate Course in Operating Systems

End-to-End argument

[Figure: a chain of n hops from source to dest.]

Probability of successful transit: $(1-p)^n$

Expected packets lost: $k - k(1-p)^n$
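The two quantities on this slide are easy to evaluate for concrete values. The sketch below uses p, n and k as defined on the previous slide; the function names and the sample numbers (p = 0.001, n = 10, k = 1000) are chosen here purely for illustration.

```python
def transit_probability(p: float, n: int) -> float:
    """Chance one packet survives n hops, each losing it with probability p."""
    return (1.0 - p) ** n


def expected_lost(k: int, p: float, n: int) -> float:
    """Expected number of the k packets that fail to arrive."""
    return k - k * transit_probability(p, n)


# Small p: only about 1% of a 1000-record file needs to be resent end-to-end.
print(transit_probability(0.001, 10))
print(expected_lost(1000, 0.001, 10))

# Large p or n: the success probability heads towards 0.
print(transit_probability(0.2, 30))
```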

Page 44: CS514: Intermediate Course in Operating Systems

Saltzer et al. analysis

• If p is very small, the overhead of the n TCP protocols (excess bytes sent, lost CPU time, etc) will be higher than if we just send the whole file, then retransmit the missing records

• Generalization: low-level transport systems should focus on speed, not reliability; the application layer should worry about “properties” needed by the application

Page 45: CS514: Intermediate Course in Operating Systems

Example

• Suppose that 2% of file will be resent. Wouldn’t want to impose a 10% overhead on the net and slow the transfer down by 7% for this purpose!

• But the end-to-end argument would not apply if:

  – p or n is large, hence $(1-p)^n$ approaches 0

  – cost of recovery when a problem occurs is very high

  – reliability property is hard for users to implement

• Justify complex mechanisms against these points.

Page 46: CS514: Intermediate Course in Operating Systems

For next time

• Read chapters 1-3

• If you were in charge of extending the communication layer, what would you change or add?

• If computers can set their own addresses and fill out packets in any way they like, how can a system like UNIX possibly support user-id’s over a network? Or are user-id’s a big fake?