Design of Distributed Real-Time Systems
Ramani Arunachalam
Date posted: 21-Dec-2015
Case Study: MARS
● MARS (Maintainable Real-time System)
  – Distributed, fault-tolerant, hard real-time
  – Objectives
    ● Guaranteed timeliness
    ● Testability
    ● Maintainability
    ● Fault tolerance
    ● Systematic software development
  – Time-triggered architecture
Objectives
● Guaranteed timeliness
  – Based on resource adequacy at peak load
  – Statistical assurances are not enough
● Testability
  – The architecture should support testing of timeliness
● Maintainability
  – Needed to remedy hardware faults and design errors, and to respond to change requests
  – Localized consequences -> minimized effort
Objectives
● Fault tolerance
  – Redundancy
  – On-line maintenance
● Systematic software development
  – No "trial and error" integration
  – The OS guarantees predictable temporal behaviour
State View
● Time-triggered observation of states
  – Observe RT entities at predefined intervals
● Intelligent input/output
  – Observation grid
  – Intelligent sensor
    ● Preprocesses raw data from the input device
    ● Observes at a finer granularity, called the perception granularity
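Here is a minimal Python sketch of an intelligent sensor. It samples at the perception granularity and emits one preprocessed value per observation-grid point. The 10 ms and 100 ms periods are hypothetical, and so is averaging as the preprocessing step, since the slides do not say what preprocessing is done. `read_raw`, `observe` and `state_messages` are illustrative names, not part of MARS.

```python
from statistics import mean
from typing import Callable

# Hypothetical parameters, for illustration only.
PERCEPTION_MS = 10    # sampling period of the raw input device
OBSERVATION_MS = 100  # spacing of the observation grid


def observe(raw_samples: list[float]) -> float:
    """Preprocess the raw samples of one observation interval (assumed: mean)."""
    return mean(raw_samples)


def state_messages(read_raw: Callable[[int], float], duration_ms: int) -> list[tuple[int, float]]:
    """Sample at perception granularity; emit one (time, value) per grid point."""
    messages = []
    window: list[float] = []
    for t in range(0, duration_ms, PERCEPTION_MS):
        window.append(read_raw(t))
        # The next sampling instant is an observation point: close the window.
        if (t + PERCEPTION_MS) % OBSERVATION_MS == 0:
            messages.append((t + PERCEPTION_MS, observe(window)))
            window = []
    return messages
```

For example, `state_messages(lambda t: t / 10, 200)` returns `[(100, 4.5), (200, 14.5)]`. The rest of the system sees only the two preprocessed values, not the twenty raw samples.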
State View
● Intelligent actuator
  – Post-processes data from the computer system before sending it to the output device
● State messages
  – Produced at observation points
  – Minimal synchronization requirements
  – No need for buffer management
  – Unidirectional (from the RT entity)
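State messages can be sketched as a single-slot variable. Each new observation overwrites the previous one, and reading does not consume it, so no queues or buffer management are needed. This is a hypothetical Python illustration with made-up class names. The lock exists only to make the replacement atomic, and it is the only synchronization used.

```python
import threading
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Observation(Generic[T]):
    value: T
    timestamp: int  # time of the observation point


class StateMessage(Generic[T]):
    """Single-slot state variable: writes overwrite, reads do not consume."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # only makes the replacement atomic
        self._current: Optional[Observation[T]] = None

    def write(self, value: T, timestamp: int) -> None:
        with self._lock:
            self._current = Observation(value, timestamp)  # older value is lost

    def read(self) -> Optional[Observation[T]]:
        with self._lock:
            return self._current  # the value stays available for later reads
```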
Structure
● Clusters
  – Autonomous subsystems
  – Disjoint name spaces
  – Exchange state messages
  – Composed of fault-tolerant units (FTUs)
  – Real-time communication channel (TDMA)
● FTU
  – Composed of replicated components
  – Active and shadow components
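Under TDMA, each component of a cluster transmits only in its own statically assigned slot. The sender at any instant therefore follows from the global time alone. The sketch below assumes a hypothetical slot length, slot order and component names such as "A1". It is an illustration of TDMA slot ownership, not the MARS bus protocol itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaSchedule:
    """Static TDMA round: each component owns one slot per round (hypothetical)."""
    slot_order: tuple[str, ...]
    slot_length_us: int

    @property
    def round_length_us(self) -> int:
        return len(self.slot_order) * self.slot_length_us

    def owner(self, global_time_us: int) -> str:
        """Component allowed to transmit at the given global time."""
        offset = global_time_us % self.round_length_us
        return self.slot_order[offset // self.slot_length_us]

    def next_slot_start(self, component: str, global_time_us: int) -> int:
        """Earliest global time >= now at which `component`'s slot begins."""
        index = self.slot_order.index(component)
        round_start = global_time_us - global_time_us % self.round_length_us
        start = round_start + index * self.slot_length_us
        if start < global_time_us:
            start += self.round_length_us  # this round's slot already began
        return start
```

Because every node computes the same answer from a synchronized clock, no arbitration is needed at run time.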
Structure
● Component
  – Smallest replaceable unit
  – Fail-silent (correct results or none)
  – Terminates upon failure
● Task execution
  – Task: the software inside a component
  – Starts at a predefined time
  – Proceeds without any communication or synchronization
  – Execution time is deterministic
Operation
● Results of periodic tasks are sent as state messages
● Communication times are also predefined
● A real-time transaction is a sequence of processing and communication actions between a stimulus from the environment and the response to it
● Static scheduling (at compile time!)
● At run time, no surprises
● Modes (operating, emergency)
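Because scheduling is done off-line, the run-time dispatcher reduces to a table lookup. The sketch below is a minimal illustration under several assumptions:

- the per-mode tables, task names and the shared 50 ms cycle are all hypothetical;
- mode changes take effect only at a cycle boundary, which is also an assumption, since the slides only state that modes exist.

```python
from typing import Optional

# Hypothetical dispatch tables computed off-line: for each mode, the task
# started at each offset (ms) within the cycle.
SCHEDULES: dict[str, dict[int, str]] = {
    "operating": {0: "read_sensors", 20: "control_law", 40: "send_actuators"},
    "emergency": {0: "read_sensors", 10: "safe_shutdown"},
}
CYCLE_MS = 50  # hypothetical cycle length shared by both modes


class Dispatcher:
    def __init__(self, mode: str = "operating") -> None:
        self.mode = mode
        self.pending_mode: Optional[str] = None

    def request_mode_change(self, mode: str) -> None:
        if mode not in SCHEDULES:
            raise ValueError(f"unknown mode {mode!r}")
        # Assumed policy: switch only at the next cycle boundary.
        self.pending_mode = mode

    def tick(self, now_ms: int) -> Optional[str]:
        """Return the task to start at now_ms, if any: a pure table lookup."""
        offset = now_ms % CYCLE_MS
        if offset == 0 and self.pending_mode is not None:
            self.mode, self.pending_mode = self.pending_mode, None
        return SCHEDULES[self.mode].get(offset)
```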
Fault-tolerance
● Two levels of redundancy
  – Active redundancy at the FTU level
    ● If a component fails, the standby (shadow) component becomes active
  – Time redundancy at the component level
    ● Every task is executed twice and the results are compared
● TDMA monitor
  – Monitors temporal behaviour
  – Controls the output from the component
● Distributed clock synchronization
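Time redundancy at the component level can be sketched as follows: run the task twice on the same inputs and publish the result only if both runs agree, otherwise stay silent. This is a simplified illustration with hypothetical function names. It catches transient faults that corrupt one of the two runs. It does not catch permanent faults that corrupt both runs identically.

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")
R = TypeVar("R")


def run_time_redundant(task: Callable[[A], R], inputs: A) -> Optional[R]:
    """Execute `task` twice on the same inputs and compare the results.

    Returns the result if both executions agree. Otherwise returns None:
    the component stays silent rather than emit a possibly wrong value.
    """
    first = task(inputs)
    second = task(inputs)
    return first if first == second else None


# Usage: a deterministic task passes; a simulated transient fault in the
# second run is detected and the component emits nothing.
def double(x: int) -> int:
    return 2 * x


_runs = 0


def flaky_double(x: int) -> int:
    global _runs
    _runs += 1
    return 2 * x + (1 if _runs == 2 else 0)  # corrupts the 2nd execution only
```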
Fault-tolerance
● Replica determinism
  – All replicated components perform the same state changes at the same point in time
  – Reading the local clock is prohibited
  – All replicas must agree on when to change mode
● Component reintegration
  – i-state (initialization state) and h-state (history state)
  – Reintegration point: a point where the h-state is small
  – The new component receives the h-state at this point
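Here is a hypothetical sketch of choosing the reintegration point and transferring state. The joining component already holds its i-state locally, so only a copy of the active replica's h-state has to be sent. The point in the cycle with the smallest h-state is chosen. The h-state sizes and the `SchedulePoint` and `ComponentState` types are illustrative assumptions.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SchedulePoint:
    offset_ms: int      # position within the static schedule cycle
    h_state_bytes: int  # size of the h-state that would have to be transferred


@dataclass
class ComponentState:
    i_state: dict[str, Any]  # static data, already present on the new component
    h_state: dict[str, Any] = field(default_factory=dict)  # history state


def reintegration_point(points: list[SchedulePoint]) -> SchedulePoint:
    """Choose the point in the cycle where the h-state is smallest."""
    return min(points, key=lambda p: p.h_state_bytes)


def reintegrate(new: ComponentState, active: ComponentState) -> ComponentState:
    """At the reintegration point, copy the active replica's h-state."""
    return ComponentState(i_state=new.i_state, h_state=copy.deepcopy(active.h_state))
```

The deep copy makes the joined component independent: later changes in the active replica do not leak into it except through normal operation.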
Summary
● Maintenance
  – A failed component doesn't affect its FTU
  – On-line reintegration after repair
  – Change in software
    ● Does it fit in the current schedule?
    ● Otherwise, a new mode with a new schedule
● Summary
  – Strict separation of functionality, timeliness and dependability
  – Designed for temporal behaviour, so testing is simplified
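The maintenance question "Does it fit in the current schedule?" can be sketched as a gap search over the busy intervals of the static cycle. This simplified Python version, with the hypothetical name `find_gap`, treats a single processor timeline and ignores communication slots and precedence constraints. A result of `None` means a new mode with a new schedule is needed.

```python
from typing import Optional


def find_gap(busy: list[tuple[int, int]], cycle_ms: int, duration_ms: int) -> Optional[int]:
    """Earliest offset where a task of `duration_ms` fits into the cycle.

    `busy` holds the [start, end) intervals already reserved in the static
    schedule. Returns None if no gap is large enough, i.e. the change needs
    a new mode with a new schedule.
    """
    cursor = 0
    for start, end in sorted(busy):
        if start - cursor >= duration_ms:
            return cursor
        cursor = max(cursor, end)
    return cursor if cycle_ms - cursor >= duration_ms else None
```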
Delta-4 XPA
● Objectives
  – “A real-time system is not assured to meet deadlines outside operational envelope”
  – Bounded-demand school
    ● The operational envelope is predictable
    ● An impractical assumption for complex systems
  – Unbounded-demand school
    ● A complete definition of the operational envelope is not possible
    ● Graceful degradation if the system falls outside the envelope
  – XPA implements hard real-time behaviour but falls back to best-effort behaviour when required
Architecture
● Layers, from top to bottom:
  – Deltase
  – Group management layer
  – Time and group communication
  – Abstract network layer (physical + MAC + firmware)
● Network infrastructure
  – FDDI supports urgent traffic and has built-in fault tolerance
  – Token bus/ring provides media redundancy for availability
● Time
  – Internal time is maintained by a distributed time server
  – Clocks are synchronized to within tens of microseconds
  – External time: one of the standard time references
● Group communication
  – Services ranging from atomic multicast to datagram
  – Very fast services of varying reliability
Architecture
● Group communication
  – Distributed replication management
    ● BestEffortN: guaranteed delivery to N elements
    ● BestEffortTo: guaranteed delivery to named elements
    ● AtLeastN, AtLeastTo: guaranteed service even when the sender fails
● Group management
  – Distributed group manager object
  – Management and distribution of groups of objects
  – Incorporates knowledge of the various replication modes
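The delivery conditions of the named qualities can be written as simple predicates over the set of members that received a message. The slides do not describe how the AtLeast variants keep their guarantee after a sender crash. The relay step below, in which receivers forward the message to the rest of the group, is only an assumed mechanism for illustration. All function names are hypothetical.

```python
from typing import AbstractSet


def best_effort_n_ok(delivered: AbstractSet[str], n: int) -> bool:
    """BestEffortN: at least n group members received the message."""
    return len(delivered) >= n


def best_effort_to_ok(delivered: AbstractSet[str], named: AbstractSet[str]) -> bool:
    """BestEffortTo: every named member received the message."""
    return named <= delivered


def relay_after_sender_crash(delivered: AbstractSet[str], group: AbstractSet[str]) -> set[str]:
    """Assumed AtLeast* mechanism: if the sender crashes mid-transmission,
    any member holding the message forwards it to the remaining members."""
    return set(group) if delivered else set()
```

With relaying, a crash after reaching even one member still leads to full delivery. If no member got the message, there is nothing to relay.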
Architecture
● Application support environment (Deltase)
  – Client-server and producer-consumer interactions
  – Applications written using Deltase or converted using preprocessors
● Timeliness
  – What to do under overload conditions?
    ● Static off-line scheduling: too many possibilities
    ● On-line scheduling: can find feasible schedules if the system is not overloaded
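For on-line scheduling, a common feasibility test for ready jobs on one preemptive processor works as follows: simulate earliest-deadline-first order and check every completion time against its deadline. Failing the test indicates overload. This is a generic sketch of that test, not the Delta-4 XPA scheduler itself, and the job parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    remaining_ms: int  # remaining worst-case execution time
    deadline_ms: int   # absolute deadline


def edf_feasible(jobs: list[Job], now_ms: int) -> bool:
    """Feasibility of ready jobs on one preemptive processor.

    All jobs are assumed ready at now_ms; running them in earliest-deadline
    order, every job must complete by its deadline.
    """
    finish = now_ms
    for job in sorted(jobs, key=lambda j: j.deadline_ms):
        finish += job.remaining_ms
        if finish > job.deadline_ms:
            return False
    return True
```

When the test fails, the system has left the region in which it can guarantee deadlines. This is where XPA's fall-back to best-effort behaviour would apply.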
Timeliness
● The scheduling policy uses “precedence”
  – A combination of priority and earliest deadline
  – Few priority classes, to avoid unfairness
  – Within a priority class, earliest-deadline-first
● Design-time and run-time timeliness
  – Targetline: the instant chosen by the designer for provision of the service
  – Liveline and deadline: the earliest and latest times at which the service may be provided
  – Violations of these are detected at run time, and the actions to take are defined at design time
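The precedence rule (priority class first, earliest deadline within a class) maps naturally onto a heap keyed by class and deadline. The liveline/deadline check is a simple comparison. Several details are assumptions of this sketch:

- a larger class number means more important;
- ties keep FIFO order;
- `ReadyQueue`, `Timeliness` and `classify` are hypothetical names.

```python
import heapq
import itertools
from enum import Enum


class ReadyQueue:
    """Precedence order: priority class first, earliest deadline within a class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-break for equal keys

    def push(self, name: str, priority_class: int, deadline_ms: int) -> None:
        # Larger class = more important (assumption); negate it because
        # heapq is a min-heap.
        heapq.heappush(self._heap, (-priority_class, deadline_ms, next(self._seq), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]


class Timeliness(Enum):
    EARLY = "before liveline"
    ON_TIME = "within [liveline, deadline]"
    LATE = "after deadline"


def classify(service_ms: int, liveline_ms: int, deadline_ms: int) -> Timeliness:
    """Run-time check of a service instant against its liveline and deadline."""
    if service_ms < liveline_ms:
        return Timeliness.EARLY
    if service_ms > deadline_ms:
        return Timeliness.LATE
    return Timeliness.ON_TIME
```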
Preemption
● Leader-follower model for replication
  – Decisions are made by a privileged replica, i.e. the leader
  – Preemption point
    ● Point at which an interrupt will be served
  – A high-precedence message arrives for a process that is not currently running
    ● Increase the process's precedence to that of the message
    ● This causes the process to be scheduled
    ● These actions are propagated to the followers
    ● Followers perform identical operations
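Here is a hypothetical sketch of the preemption step. The leader raises the precedence of the target process to that of the message. It then makes every follower apply the identical decision. In this sketch propagation is a direct method call; in the real system it would travel as messages between nodes. A larger precedence value is assumed to mean more urgent, and the class names are illustrative.

```python
class Replica:
    """Scheduler state of one replica: process name -> precedence."""

    def __init__(self, precedence: dict[str, int]) -> None:
        self.precedence = dict(precedence)
        self.decisions: list[tuple[str, int]] = []

    def apply(self, process: str, new_precedence: int) -> None:
        """Carry out a scheduling decision (identical on every replica)."""
        self.precedence[process] = new_precedence
        self.decisions.append((process, new_precedence))


class Leader(Replica):
    def __init__(self, precedence: dict[str, int], followers: list[Replica]) -> None:
        super().__init__(precedence)
        self.followers = followers

    def on_message(self, process: str, msg_precedence: int, running: str) -> None:
        """Handle a message for `process` at a preemption point."""
        if process != running and msg_precedence > self.precedence[process]:
            # Raise the receiver to the message's precedence so it gets scheduled.
            self.apply(process, msg_precedence)
            # Followers do not decide on their own: they replay the decision.
            for follower in self.followers:
                follower.apply(process, msg_precedence)

    def next_to_run(self) -> str:
        return max(self.precedence, key=self.precedence.get)
```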
Desynchronization
● Followers must not drift too far from the leader
● Followers too fast
  – Reach the preemption point before the leader
  – Remain blocked until the leader notifies them
● Followers too slow
  – The leader timestamps its notifications (timestamp T)
  – If a follower has not executed the action by T + t_desync:
    ● A desynchronization event is raised
    ● Another follower takes over
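The desynchronization rules can be sketched as two checks:

- a follower may pass a preemption point only once it holds the leader's notification;
- a follower that has not executed the action by T + t_desync is flagged.

Selecting the follower that takes over is not modelled. The function names and time units are illustrative assumptions.

```python
from typing import Optional


def follower_may_proceed(notification_ts: Optional[int]) -> bool:
    """Too-fast follower: it blocks at the preemption point until the
    leader's notification (timestamp T) has arrived."""
    return notification_ts is not None


def is_desynchronized(notification_ts: int, executed_at: Optional[int],
                      now: int, t_desync: int) -> bool:
    """Too-slow follower: flagged if the action was not executed by T + t_desync."""
    limit = notification_ts + t_desync
    if executed_at is not None:
        return executed_at > limit
    return now > limit


def desync_events(executions: dict[str, Optional[int]], notification_ts: int,
                  now: int, t_desync: int) -> list[str]:
    """Followers for which a desynchronization event is raised."""
    return sorted(
        name for name, executed_at in executions.items()
        if is_desynchronized(notification_ts, executed_at, now, t_desync)
    )
```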
Summary
● Communication support using groups
  – Oriented to distributed computing
● Trade-offs between QoS and efficiency
  – The group manager uses atomic multicast for ordered delivery
  – Leader-follower uses reliable, non-ordered delivery
● Group management service
  – Executes the leader-follower protocol and detects replica failures
  – Clones the replica at another node