CS 603 Review April 24, 2002. Seminar Announcements Saurabh Bagchi, “Hierarchical Error Detection...
Seminar Announcements
• Saurabh Bagchi, “Hierarchical Error Detection in a Distributed Software Implemented Fault Tolerance (SIFT) Environment”– April 25, 10:30-11:30, MSEE 239
• Fabian E. Bustamante, “The Active Streams Approach to Adaptive Distributed Systems” – April 29, 10:30-11:30, CS 101
Review
• Why do we want distributed systems?– Scaling– Heterogeneity– Geographic Distribution
• What is a distributed system?– Transparency vs. Exposing Distribution
• Hardware Basics– Communication Mechanisms
Basic Software Concepts
• Hiding vs. Exposing
– Distribution – Distributed OS
– Location, but not distribution – Middleware
– None – Network OS
• Concurrency Primitives– Semaphores– Monitors
• Distributed System Models– Client-Server– Multi-Tier– Peer to Peer
Communication Mechanisms
• Shared Memory– Enforcement of single-system view– Delayed consistency: δ-Common Storage
• Message Passing– Reliability and its limits
• Stream-oriented Communications
• Remote Procedure Call
• Remote Method Invocation
RPC Mechanisms
• DCE
– Language / Platform Independent
– Implementation Issues:
• Data Conversion
• Underlying Mechanisms
– Fault Tolerance Approaches
• Java RMI
• SOAP
– Interoperable
– Language independent
– Transport independent (anything that moves XML)
Naming Requirements
• Disambiguate only
• Access resource given the name
• Build a name to find a resource
• Do humans need to use name?
• Static/Dynamic Resource
• Performance Requirements
Registry Example: X.500
• Goal: Global “white pages”
– Look up anyone, anywhere
– Developed by Telecommunications Industry
– ISO standard directory for OSI networks
• Idea: Distributed Directory
– Application uses Directory User Agent to access a Directory Access Point
• Basis for LDAP, ActiveDirectory
Directory Information Base (X.501)
• Tree structure
– Root is entire directory
– Levels are “groups”
• Country
• Organization
• Individual
• Entry structure
– Unique name: built from tree
– Attributes: Type/value pairs
– Schema enforces type rules
• Alias entries
X.500
• Directory Entry:
– Organization level – CN=Purdue University, L=West Lafayette
– Person level – CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor
• Directory Operations– Query, Modify
• Authorization / Access control– To directory– Directory as mechanism to implement for others
X.500 – Distributed Directory
• Directory System Agent
• Referrals
• Replication
– Cache vs. Shadow copy
– Access control
– Modifications at Master only
– Consistency
• Each entry must be internally consistent
• DSA giving copy must identify as copy
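The tree-structured DIB and name-building can be illustrated with a small sketch. This is plain Python, not actual X.500/LDAP machinery; the entry names and attribute values are invented for illustration:

```python
# Sketch of a hierarchical directory: entries live in a tree, and an
# entry's distinguished name is the path of naming attributes from the
# root. Referrals, aliases, and replication are omitted.

class Entry:
    def __init__(self, rdn, attributes=None):
        self.rdn = rdn                    # relative distinguished name, e.g. "C=US"
        self.attributes = attributes or {}
        self.children = {}

    def add(self, child):
        self.children[child.rdn] = child
        return child

def lookup(root, dn):
    """Resolve a distinguished name like 'C=US/O=Purdue/CN=Chris Clifton'."""
    node = root
    for rdn in dn.split("/"):
        node = node.children.get(rdn)
        if node is None:
            return None                   # no such entry
    return node.attributes

root = Entry("")                          # root is the entire directory
us = root.add(Entry("C=US"))              # country level
purdue = us.add(Entry("O=Purdue"))        # organization level
purdue.add(Entry("CN=Chris Clifton",      # individual level
                 {"SN": "Clifton", "TITLE": "Associate Professor"}))

title = lookup(root, "C=US/O=Purdue/CN=Chris Clifton")["TITLE"]
```

Each level of the name narrows the search to one subtree, which is what lets the directory be partitioned across Directory System Agents.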
Clock Synchronization
• Definition: All nodes agree on time– What do we mean by time?– What do we mean by agree?
• Lamport Definition: Events– Events partially ordered– Clock “counts” the order
Event-based definition (Lamport ’78)
• Define partial order of events
• A → B: A “happened before” B: smallest relation such that:
1. If A and B in same process and A occurs first, A → B
2. If A is the sending of a message and B is the receipt of that message, A → B
3. If A → B and B → C, then A → C
• Clock: C(x) is time x occurs:
– C(x) = Ci(x) where x runs on node i
– Clocks correct if ∀a,b: a → b ⇒ C(a) < C(b)
Lamport Clock Implementation
• Node i Increments Ci between any two successive events
• If event a is sending of a message m from i to j,– m contains timestamp Tm = Ci(a)– Upon receiving m, set Cj ≥ current Cj and > Tm
• Can now define total ordering: a ⇒ b iff
– Ci(a) < Cj(b), or
– Ci(a) = Cj(b) and Pi < Pj
What if we want “wall clock” time?
• Ci must run at correct rate: ∃κ << 1 such that | dCi(t)/dt – 1 | < κ
• Synchronized: small ε such that ∀i,j: | Ci(t) – Cj(t) | < ε
• Assume transmission time between μ and μ + ξ
• Algorithm: Upon receiving message m, set Cj(t) = max(Cj(t), Tm + μ)
• Theorem: Assume every τ seconds a message with unpredictable delay ξ is sent over every arc. Then for t ≥ t0 + τd (d the diameter of the network), ε ≈ d(2κτ + ξ)
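The receive rule is a one-liner; a sketch with an assumed minimum delay μ (times in milliseconds, values invented for illustration):

```python
# A node never lets its clock fall behind Tm + mu: the message was in
# transit at least mu ms, so the sender's clock is at least Tm + mu now.

MU = 10  # assumed minimum transmission delay, in milliseconds

def on_receive(local_clock, tm, mu=MU):
    """Adjust the local clock on receipt of a message timestamped tm."""
    return max(local_clock, tm + mu)

lagging = on_receive(5000, tm=5020)   # drifted-behind clock jumps forward
ahead   = on_receive(5050, tm=5020)   # clock already ahead is untouched
```

Clocks are only ever advanced, never set back, so monotonicity (and hence the happened-before ordering) is preserved.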
Clock Synchronization: Limits
• Best Possible: Delay Uncertainty– Actually ε(1 – 1/n)
• Synchronization with Faults– Faulty clock– Communication Failure– Malicious processor
• Worst case: Can only synchronize if < 1/3 processors faulty– Better if clocks can be authenticated
Process Synchronization
• Problem: Shared Resources– Model as sequential or parallel process– Assumes global state!
• Alternative: Mutual Exclusion when Needed– Coordinator approach– Token Passing– Timestamp
Mutual Exclusion
• Requirements– Does it guarantee mutual exclusion?– Does it prevent starvation?– Is it fair?– Does it scale?– Does it handle failures?
Mutual Exclusion: Colored Ticket Algorithm
• Goals:– Decentralized– Fair– Fault tolerant– Space Efficient
• Idea: Numbered Tickets– Next number gets resource– Problem: Unbounded Space– Solution: Reissue blocks
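The starting point (unbounded numbered tickets, next number gets the resource) can be sketched as below. This is the unbounded-counter version only; the colored-ticket algorithm's contribution is bounding the space by reissuing tickets in colored blocks, which this sketch does not show:

```python
# Ticket-based mutual exclusion: take the next ticket, wait until it
# is being served. FIFO service makes it fair and starvation-free,
# but the counters grow without bound (the problem noted above).

import itertools
import threading

class TicketLock:
    def __init__(self):
        self.next_ticket = itertools.count()   # ticket dispenser: 0, 1, 2, ...
        self.now_serving = 0
        self.cv = threading.Condition()

    def acquire(self):
        with self.cv:
            my = next(self.next_ticket)
            self.cv.wait_for(lambda: self.now_serving == my)

    def release(self):
        with self.cv:
            self.now_serving += 1              # serve the next ticket holder
            self.cv.notify_all()

lock, order = TicketLock(), []

def worker(i):
    lock.acquire()
    order.append(i)        # critical section: exclusive access
    lock.release()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
assert sorted(order) == [0, 1, 2]   # every worker got its turn
```

A shared-memory lock stands in here for the distributed setting; the ticket idea is the same, but distributing the dispenser is where the hard part (and the space bound) comes in.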
Multi-Resource Mutual Exclusion
• New Problem: Deadlock– Processes using all resources– Each needs additional resource to proceed
• Dining Philosophers Problem– Coordinated vs. truly distributed solutions
• Problems with deterministic solutions
• Probabilistic solution – Lehman & Rabin
– Starvation / fairness properties
Distributed Transactions
• ACID properties
• Issues:
– Commit Protocols
– Fault Tolerance
• Why is this enough?
• Failure Models and Limitations• Mechanisms:
– Two-phase commit– Three-phase commit
Two-Phase Commit (Lampson ’76, Gray ’79)
• Central coordinator initiates protocol– Phase 1:
• Coordinator asks if participants can commit• Participants respond yes/no
– Phase 2:• If all votes yes, coordinator sends Commit• Participants respond when done
• Blocks on failure– Participants must replace coordinator– If participant and coordinator fail, wait for recovery
• While blocked, transaction must remain Isolated– Prevents other transactions from completing
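The two phases reduce to a small piece of decision logic. A minimal sketch (message transport, timeouts, and the logging that makes recovery possible are all omitted; class and method names are invented):

```python
# Two-phase commit: phase 1 collects votes, phase 2 broadcasts the
# coordinator's decision. Commit happens only if every vote was yes.

class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):                 # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):          # phase 2: apply the decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    # Phase 1: coordinator asks whether each participant can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2: unanimous yes => Commit, otherwise Abort.
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision

ok  = two_phase_commit([Participant(), Participant()])
bad = two_phase_commit([Participant(), Participant(can_commit=False)])
```

The blocking problem is visible in the structure: a participant that has said "prepared" cannot decide alone, so if the coordinator dies between the phases it must hold its locks until someone tells it the outcome.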
Transaction Model
• Transaction Model– Global Transaction State– Reachable State Graph
• Local states potentially concurrent if a reachable global state contains both local states
– Concurrency set C(s) is all states potentially concurrent with s
• Sender set S(s) = {local states t | t sends m and s can receive m}
• Failure Model
– Site failure assumed when expected message not received in time
– Independent Recovery
Problems with 2-PC
• Blocking on failure– 3-PC as solution
• Theorems on recovery limits– Independent recovery: No two-site failure– Non-independent recovery
• Anything short of total failure okay• Recovery protocol for total failure
Data Replication
• Fault Tolerance– Hot backup– Catastrophic failure
• Performance– Parallelism– Decreased reliance on network
• Correctness criterion: Replication invisible– One-copy serializability (1SR)
Data Replication: How?
• Goal: Ensure one-copy serializability• Write-all solution: All copies identical
– Write goes to every site– Read from any site– Standard single-copy concurrency control– Guarantees 1SR
• Single-copy concurrency control gives serializable execution
• Equivalent to serial execution where all writes happen in one transaction
Problem: Site Failure
• Failure causes write to block
– Must maintain locks
– Clogs up entire system. Is this fault tolerance?
• What about “write all available”?– T0: w0[xA] w0[xB] w0[yC] c0
– B-fails– T1: r1[yC] w1[xA] c1
– B-recovers– T2: r2[xB] w2[yC] c2
• What is the serial equivalent order?
Solutions
• Validate availability on commit– Check if any failed writes now available– Check that all sites read or written still available– Enforces serializability for site failures
Doesn’t work with communication failures!
Formalisms for Relaxed consistency
• Goal: Relaxed consistency constraints– Meet application needs– Outperform true transparent replication
• How do we ensure constraints meet needs?– Formalisms to describe application needs– Methods to prove constraints adequate
Quasi-Copies (Alonso, Barbará, Garcia-Molina ’90)
• Data Caching
– Each site keeps copy of data likely to be used locally
– Propagation cost of writes high
• User-Defined Cache• Controlled Divergence
– Weak consistency constraints– Bounds on the differences between copies– User defines constraints
Assumptions
• Read-only copies– Updates sent to master copy– E.g., ORACLE Materialized View
• User Specified Coherency– Strict limits– “Hints”
• Example: Stock Purchase– Place order based on delayed price– Limit order to ensure price paid okay
Selection Conditions
• Identification clause– Select/Project Query
• Modifier Clause
– Add / drop from cache
– Compulsory or advisory cache
– Static / Dynamic: As new objects meet the identification clause, are they cached?
• Triggering delay on dynamic
Coherency Conditions
• Default (always enforced): Value was true once
• Delay W(x,α): Max time lag
• Version V(x): Number of updates
• Periodic P(x): Time for refresh
• Arithmetic A(x): Bounded Difference
• Combine conditions with logical operators
• Multi-object conditions
– Consistency conditions on a group
– Order of application in a group
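Checking the delay, version, and arithmetic conditions against a cached copy can be sketched as below. The condition names follow the slide; the dict-based representation and the numeric values are illustrative assumptions:

```python
# Quasi-copy coherency check: a cached copy is acceptable as long as
# every user-specified bound on its divergence from the master holds.

def is_coherent(master, copy, now, conditions):
    """master/copy are dicts with 'value', 'version', 'updated_at'."""
    for kind, bound in conditions:
        if kind == "delay" and now - copy["updated_at"] > bound:
            return False            # W(x, a): copy too stale in time
        if kind == "version" and master["version"] - copy["version"] > bound:
            return False            # V(x): too many updates behind
        if kind == "arithmetic" and abs(master["value"] - copy["value"]) > bound:
            return False            # A(x): values diverged too far
    return True                     # all (conjoined) conditions hold

master = {"value": 104.0, "version": 7, "updated_at": 90}
copy   = {"value": 100.0, "version": 5, "updated_at": 60}

conds = [("delay", 60), ("version", 3), ("arithmetic", 5.0)]
assert is_coherent(master, copy, now=100, conditions=conds)
# Tighten the arithmetic bound and the copy must be refreshed:
assert not is_coherent(master, copy, now=100, conditions=[("arithmetic", 2.0)])
```

A violated condition triggers a refresh (or blocks the read, if the condition is strict rather than a hint), which is how the system trades consistency for write-propagation cost.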
Remote Operation Mechanisms
• Client-Server Model
• Remote Procedure Call
• Problem: Remote site must already know what we want to do!
• Process consists of:
– Code
– Resources (files, devices, etc.)
– Execution (data, stack, registers, etc.)
• Fork copies everything – is this needed?
• Solution: Copy part of the process
So where are we?
• Models for Remote Processing
– Server: Request documented service
– RPC: Request execution of existing procedure
• What if operation we want isn’t offered remotely?
• Solution: Agents / Code Migration
Resource Binding

Process-to-Resource     Resource-to-Machine Binding
Binding                 Unattached       Fastened           Fixed
  Identifier            Move             Global Reference   Global Reference
  Value                 Copy Value       Global Reference   Global Reference
  Type                  Rebind Locally   Rebind Locally     Rebind Locally
DCOM – What is it?
• Start with COM – Component Object Model– Language-independent object interface
• Add interprocess communication
DCOM: Distributed COM
• Looks like COM to the client• Built on DCE RPC
– Extends to support full COM functionality
Locating Objects: Activation
• CoCreateInstance(Ex)(<CLSID>)– Interface pointer to uninitialized instance– Same as COM
• CoGetInstanceFromFile, FromStorage– Create new instance
• CoGetClassObject(<CLSID>)– Factory object that creates objects of <CLSID>– CoGetClassObjectFromURL
• Downloads necessary code from URL and instantiates• Can take server name as parameter
– Or default to server specified in DCOM configuration on client machine:
[HKEY_CLASSES_ROOT\APPID\{<appid-guid>}]
"RemoteServerName"="<DNS name>"
• Also store information in ActiveDirectory
DCOM vs. CORBA
CORBA:
• Single interface name
• Multiple inheritance
• Dynamic Invocation Interface
• C++-style Exception Handling
• Explicit and Implicit reference counts
• Implemented by ORB with replaceable services

DCOM:
• Distinction between Class and Instance Identifier
• Implement multiple interfaces
• Type libraries for on-demand marshaling
• 32-bit Error Code
• Explicit reference count only
• Implemented by many independent services
What is .NET?
• Language for distributed computation– C#, VB.NET, JScript
• Protocols– SOAP, HTTP
• Run-time environment– Common Language Runtime (CLR)– ActiveDirectory– Web Servers (ASP.NET)
COM/DCOM vs. .NET

DCOM:
• IDL
• Name, Monikers
• Registry / ActiveDirectory
• C++, Visual Basic
• DCE RPC
• DCOM network protocol (based on DCE standards)

.NET:
• Web Services Description Language (WSDL)
• DISCO (URI grammar)
• Universal Description, Discovery and Integration (UDDI)
• C#, VB.NET
• SOAP
• HTTP (presumed ubiquitous), SMTP (!?)
How .NET works
• Query UDDI directory to get service location
• Query service to get WSDL (interface specification)
• Build call (XML) based on WSDL spec
• Make call using SOAP
• Parse XML results based on WSDL spec
Jini: Java Middleware
• Tools to construct federation– Multiple devices, each with Java Virtual Machine– Multiple services
• Uses (doesn’t replace) Java RMI• Adds infrastructure to support distribution
– Registration– Lookup– Security
Service
• Basic “unit” of JINI system– Members provide services– Federate to share access to services
• Services combined to accomplish tasks
• Communicate using service protocol– Initial set defined– Add more on the fly
Infrastructure: Key Components
• RMI– Basic communication model
• Distributed Security System– Integrated with RMI– Extends JVM security model
• Discovery/join protocol– How to register and advertise services
• Lookup services
– Returns object implementing service (really a local proxy)
Programming Model
• Lookup
• Leasing– Extends Java reference with notion of time
• Events
– Extends JavaBeans event model
– Adds third-party transfer, delivery and timeliness guarantees, possibility of delay
• Transaction Interfaces
Jini Component Categories
• Infrastructure – Base features• Programming Model – How you use them• Services – What you build
Java / Jini Comparison
Failure Models
• Failure: System doesn’t give desired behavior– Component-level failure (can compensate)– System-level failure (incorrect result)
• Fault: Cause of failure (component-level)
– Transient: Not repeatable
– Intermittent: Repeats, but (apparently) independent of system operations
– Permanent: Exists until component repaired
• Failure Model: How the system behaves when it doesn’t behave properly
Failure Model (Flaviu Cristian, 1991)
• Dependency
– Proper operation of Database depends on proper operation of processor, disk
• Failure Classification– Type of response to failure
• Failure semantics
– State of system after given class of failure
• Failure masking
– High-level operation succeeds even if it depends on failed services
Failure Classification
• Correct
– In response to inputs, behaves in a manner consistent with the service specification
• Omission Failure– Doesn’t respond to input
• Crash: After first omission failure, subsequent requests result in omission failure
• Timing failure (early, late)– Correct response, but outside required time window
• Response failure– Value: Wrong output for inputs– State Transition: Server ends in wrong state
Crash Failure Types (based on recovery behavior)
• Amnesia
– Server recovers to predefined state independent of operations before crash
• Partial amnesia
– Some part of state is as before crash, rest to predefined state
• Pause
– Recovers to state before omission failure
• Halting
– Never restarts
Failure Semantics
• Specification for service must include
– Failure-free (normal) semantics
– Failure semantics (likely failure behaviors)
• Multiple semantics
– Combine to give (weaker) semantics
– Arbitrary failure semantics: weakest possible
• Choice of failure semantics
– Is class of failure likely? (Probability of type of failure)
– What is the cost of failure? Catastrophic?
Failure Masking
• Hierarchical failure masking
– Dependency: Higher level gets (at best) failure semantics of lower level
– Can compensate for lower-level failure to improve this
• Group Failure Masking
– Redundant servers
– Allows failure semantics of group to be higher than individuals’
• k-fault tolerant
– Group can mask k concurrent group member failures from client
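Group masking of response failures is, at bottom, majority voting. A minimal sketch (the `vote` helper is invented for illustration; with 2k+1 replicas it masks up to k wrong answers, while crash-only faults would need just k+1 replicas):

```python
# Group failure masking by majority vote: the client accepts the
# reply returned by a strict majority of the replica group.

from collections import Counter

def vote(replies):
    """Return the majority reply from a replica group, or fail."""
    value, count = Counter(replies).most_common(1)[0]
    if count <= len(replies) // 2:
        raise RuntimeError("no majority: too many faulty members")
    return value

# k = 1 faulty member out of 2k + 1 = 3 is masked from the client:
assert vote([42, 42, 17]) == 42
```

The group's failure semantics are stronger than any individual member's precisely because the client never sees a minority's wrong answer.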
Fault Tolerance
• A distributed program A is said to tolerate faults from a fault class F for an invariant P iff there exists a predicate T for which:
1. At any configuration where P holds, T also holds (i.e., P ⇒ T)
2. Starting from any state where T holds, if any actions of A or F are executed, the resulting state is always one in which T holds (i.e., T is closed in A and T is closed in F)
3. Starting from any state where T holds, every computation that executes actions from A alone eventually reaches a state where P holds
• If a program A tolerates faults from a fault class F for invariant P, we say that A is F-tolerant for P.
Forms of fault tolerance
• For each entry, determine:
– F: Fault class handled
– T: Set of states that can be reached

             Live         Not live
  Safe       Masking      Fail-safe
  Not safe   Nonmasking   None
Reliable Multicast
• Classes:
• Sender-initiated: Acknowledge all packets– Scales poorly in normal operation
• Receiver-initiated: Request missing packets– Sender doesn’t need receiver list– Scales poorly on failure (cascading failure?)
• Tree-based, Ring-based protocols
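The receiver-initiated idea can be sketched as below. This simulates the NACK/repair exchange with a direct read from a sender-side log; the class shape and names are invented for illustration:

```python
# Receiver-initiated reliable multicast: packets carry sequence
# numbers, and a receiver that sees a gap requests (NACKs) the
# missing packets instead of acknowledging every one.

class Receiver:
    def __init__(self, sender_log):
        self.sender_log = sender_log   # sender buffers sent packets for repair
        self.delivered = {}
        self.next_expected = 0

    def on_packet(self, seq, data):
        self.delivered[seq] = data
        # Gap detected: "NACK" every missing sequence number and repair.
        for missing in range(self.next_expected, seq):
            if missing not in self.delivered:
                self.delivered[missing] = self.sender_log[missing]
        self.next_expected = max(self.next_expected, seq + 1)

log = {0: "a", 1: "b", 2: "c"}         # sender's retransmission buffer
r = Receiver(log)
r.on_packet(0, "a")
r.on_packet(2, "c")                    # packet 1 was lost; receiver repairs it
assert [r.delivered[i] for i in range(3)] == ["a", "b", "c"]
```

The scaling trade-off from the slide is visible: the sender does no per-receiver bookkeeping in the normal case, but a widespread loss makes every receiver NACK at once.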
Disaster Recovery
• Problem: complete failure at single site– Must have multiple sites– Thus a distributed problem
• Two examples– Distributed Storage: Palladio
• Think wide-area RAID
– Distributed Transactions: Epoch algorithm
Epoch Algorithm (Garcia-Molina, Polyzois, and Hagmann 1990)
• 1-Safe backup
– No performance penalty
• Multiple transaction streams
– Use distribution to improve performance
• Multiple Logs
– Avoid single bottleneck
Algorithm Overview
• Idea: Transactions that can be committed together grouped into epochs
• Primaries write marker in log– Must agree when safe to write marker– Keep track of current epoch number– Master broadcasts when to end epoch
• Backups commit epoch when all backups have received marker
Correctness Criteria
• Atomicity: If any writes of a transaction appear at backup, all must appear
– If W(Tx, d) at backup, then every W(Tx, d′) exists at backup
• Consistency: If Ti → Tj at primary, then
– Local: Tj installed at backup ⇒ Ti installed at backup
– Mutual: If W(Ti, d) and W(Tj, d), then W(Ti, d) → W(Tj, d)
• Minimum Divergence: If Tj is at the backup and does not depend on a missing transaction, then it should be installed at the backup
Single-Mark Algorithm
• Problem: Is it locally safe to mark when broadcast received?
– Might be in the middle of a transaction
• Solution: Share epoch at commit
– Prepare to commit includes local epoch number
– If received number greater than local, end epoch
• At Backup: When all sites have the epoch-n marker, commit transactions where
– C(Ti) precedes marker n, or
– P(Ti) precedes marker n, local site is not coordinator, and coordinator has C(Ti) preceding marker n
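The backup-side commit rule can be sketched as follows. The dict representation (marker number per site, commit-record epoch per transaction) is an illustrative assumption, and the prepared-but-not-committed case is omitted:

```python
# Epoch algorithm, backup side: an epoch is committed only once every
# backup site has received the epoch-n marker, and only transactions
# whose commit record precedes that marker are installed.

def committable(epoch_n, sites):
    """sites: {site: {'marker': last epoch marker seen,
                      'commits': {txn: epoch of its commit record}}}."""
    # All backups must have received the marker for epoch n.
    if any(s["marker"] < epoch_n for s in sites.values()):
        return set()
    # Install transactions whose commit record lies before marker n.
    return {t for s in sites.values()
              for t, e in s["commits"].items() if e < epoch_n}

sites = {
    "A": {"marker": 2, "commits": {"T1": 1}},
    "B": {"marker": 2, "commits": {"T2": 1, "T3": 2}},
}
assert committable(2, sites) == {"T1", "T2"}   # T3 waits for epoch 3
```

Waiting for every site's marker is what gives atomicity across logs: no backup installs part of an epoch that another backup has not yet fully received.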
Test Basics
• Mechanics: Open book/notes– No electronic aids
• Two questions– Each multi-part– Will include scoring suggestions
• Underlying question: Do you understand the material?
– No need to regurgitate “best in literature” answer
– Reasonable self-designed solution fine
– Key: Do you really understand your answer?
• Can you build CORRECT distributed systems?