CIS 573 Software Engineering Carl A. Gunter Fall 1999.



Contact Information

Course Web Page: http://www.cis.upenn.edu/~cis573.
» Course announcements.
Lecturer: Carl Gunter.
» Office hour: Thursday, 12:30-1:30, 370 Moore, 898-9506
Graduate Assistant: Mike McDougall.
» Office hour: Wednesday, 1:00-2:00, Moore 057a, 898-8116

What will I learn in the course?

Software engineering generally.
Safety engineering as it offers lessons and ideas for software.
General principles for building safety critical software systems.
Techniques to achieve high confidence.
How to analyze accidents.

Pre-requisites

Interest in both software and the systems in which it is used.

Programming in Java.
Basic skill in mathematics; ability to learn some logic.

What am I Expected to Do?

Participate in classes.
Read designated materials.
Projects: individually or on a team.
Final Exam: assesses understanding of lectures, reading, and project presentations.

Do Silly Icons Help?

Yes!

Participation and Reading

Slides distributed on course web page.
Textbook: Leveson, Safeware: System Safety and Computers.
Other materials will be distributed.

Projects

Achieving confidence.
Verifying software.
Specifying software.
Coding from a specification.
Testing software.

Project Rules

No partner on first project.
Groups of two are allowed on all subsequent projects, but your partner must be different on each project.

Partners provide equal effort on a project.

Verification

Computer hardware and software can be mathematically described.

Hence, computers can be used to automate the verification of computer hardware and software systems.

Verification and Testing

Testing is like verification since each successfully-passed test is like a little theorem that has been proved about the implementation.

Verification has the capacity to cover large sets of cases exhaustively, eliminating the need for coverage conditions or statistical measures of confidence.
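The difference can be seen on a domain small enough to enumerate. The following is a minimal sketch, not taken from the course materials: the class `ExhaustiveCheck`, the function `abs`, and the sample inputs are all hypothetical. Enumeration stands in for verification here only in the sense that it covers every case; real verification tools reason symbolically rather than by brute force.

```java
// Hypothetical illustration: none of these names come from the course.
class ExhaustiveCheck {
    /** Implementation under scrutiny: absolute value of a byte, returned as a byte. */
    static byte abs(byte x) {
        return (byte) (x < 0 ? -x : x);
    }

    /** The property we want to hold: the result is never negative. */
    static boolean property(byte x) {
        return abs(x) >= 0;
    }

    /** Testing: each passing sample is a "little theorem" about one input. */
    static boolean passesSampleTests() {
        byte[] samples = {0, 1, -1, 42, -42, 127};
        for (byte s : samples) {
            if (!property(s)) {
                return false;
            }
        }
        return true;
    }

    /** Exhaustive check over all 256 byte values; returns a failing input or null. */
    static Integer findCounterexample() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            if (!property((byte) i)) {
                return i;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("sample tests pass: " + passesSampleTests());
        System.out.println("counterexample: " + findCounterexample());
    }
}
```

In this sketch the six sample tests all pass, yet the exhaustive check reports the input -128, whose negation does not fit in a byte.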

Verification by Reading

[Figure, from the IEEE Standard for Software Reviews and Audits. Labels: Testing, Simulation, Formal Verification, Walkthrough, Audit, Technical Review, Software Inspection, Management Review; Product, Project.]

Verification and Validation

Verification can be used to show that the software or hardware conforms to a rigorous description of its expected behavior.

It cannot show that the behavior described is the one the user wanted.

Verification: building the system right.
Validation: building the right system.

First Assignments

Reading: Chapters 1, 2, 3, 4 of Leveson.
Project: Dekker, correctness of mutual exclusion algorithms.
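For orientation, here is a minimal Java sketch of Dekker's algorithm for two threads. It is one common textbook formulation and not the project handout; the class and method names are hypothetical, and `Thread.yield()` is used only to keep the busy-wait polite. The flags and the turn variable are given volatile semantics, which the algorithm relies on.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of Dekker's algorithm for two threads with ids 0 and 1.
class Dekker {
    // wantsToEnter.get(i) == 1 means thread i wants the critical section.
    private final AtomicIntegerArray wantsToEnter = new AtomicIntegerArray(2);
    private volatile int turn = 0;
    private int counter = 0; // shared data, touched only inside the critical section

    void lock(int me) {
        int other = 1 - me;
        wantsToEnter.set(me, 1);
        while (wantsToEnter.get(other) == 1) {
            if (turn != me) {
                wantsToEnter.set(me, 0);     // back off while it is the other's turn
                while (turn != me) {
                    Thread.yield();
                }
                wantsToEnter.set(me, 1);     // try again
            }
        }
    }

    void unlock(int me) {
        turn = 1 - me;                       // hand priority to the other thread
        wantsToEnter.set(me, 0);
    }

    /** Two threads each increment the counter; returns the final count. */
    static int run(int iterations) {
        Dekker d = new Dekker();
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    d.lock(me);
                    d.counter++;             // critical section
                    d.unlock(me);
                }
            });
            threads[id].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return d.counter;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));
    }
}
```

A run that returns twice the iteration count is only a passed test, not a proof; arguing that no interleaving can put both threads in the critical section is the correctness question.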

Recommended for Fun and Profit

The Psychology of Everyday Things. Donald A. Norman, Basic Books, 1988.

Computer Related Risks. Peter G. Neumann, Addison Wesley, 1995. Drawn from the bulletin board: news:comp.risks.

Normal Accidents. Charles Perrow, Basic Books, 1984.

The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage. Clifford Stoll, Mass Market Paperback, 1995.

Bad Bytes: Why We Should Not Depend on Software. Lauren Ruth Wiener, Addison Wesley, 1993.

What is Risk?

Probability of Failure * Loss from Failure

Mitigate risk by increasing reliability or decreasing severity.
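The product above can be worked through with a small Java sketch. The class name and all numbers are hypothetical and chosen only to show the two mitigation routes: lowering the probability of failure (more reliability) or lowering the loss (less severity).

```java
// Hypothetical illustration of risk = probability of failure * loss from failure.
class Risk {
    static double risk(double probabilityOfFailure, double lossFromFailure) {
        if (probabilityOfFailure < 0.0 || probabilityOfFailure > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        return probabilityOfFailure * lossFromFailure;
    }

    public static void main(String[] args) {
        double baseline = risk(0.01, 1_000_000.0);      // assumed starting point
        double moreReliable = risk(0.001, 1_000_000.0); // failure made ten times rarer
        double lessSevere = risk(0.01, 100_000.0);      // loss made ten times smaller
        System.out.println("baseline:      " + baseline);
        System.out.println("more reliable: " + moreReliable);
        System.out.println("less severe:   " + lessSevere);
    }
}
```

Under this definition the two mitigations are interchangeable in their effect on the number, even though they are very different engineering activities.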

Risk and Opportunity

Many opportunities are held back by the expense of high risk.

Better assurance techniques break these barriers.

When is the risk low enough?

Risk and Opportunity Now

Fuel injection and anti-lock braking
Fly-by-wire aircraft and computer-controlled landings
Reduced time gaps between trains
Credit card purchases on the web and banking online
Online shareholder voting

Risk and Opportunity in the Future

Intelligent vehicles and highways

Electronic wallets

Genetically engineered organisms

Course Strategy

VV&T (verification, validation, and testing) is a technology for increasing confidence in a system.

Its most vigorous application is in areas where the cost of failure is high.

We will focus primarily on software of this kind.

“High Risk” Computer Systems

Safety critical
» Transportation and power systems
» Medical and emergency systems
Security critical
» Military systems
» Electronic commerce
Mission critical
» Key information systems
» Key control systems

Low Risk Systems

When is a system non-critical? It is subjective and depends on use.

Some software is not strongly backed by its maker. Here is a standard industry disclaimer:

The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you … assume the entire cost of all necessary servicing, repair or correction

Low Risk Systems continued

A refreshingly straightforward disclaimer:

We don't claim EasyFlow is good for anything---if you think it is, great, but it's up to you to decide. If EasyFlow doesn't work; tough. If you lose a million because EasyFlow messes up, it's you that's out the million, not us. If you don't like this disclaimer; tough. We reserve the right to do the absolute minimum provided by law, up to and including nothing.

Rigid Distinctions?

There are significant differences between the classes of high-risk systems.
» Analysis of energy for safety systems.
» Concept of an adversary in security systems.
But there are also many common themes.
» Reliability of components.
» Replication.
» Backup.
» Controlled failure modes.

Clayton Tunnel

[Figure labels: Tunnel, A, B, Needle Telegraph, Signal Man, Semaphore.]

Gerard Holzmann, Design and Validation of Computer Protocols

Signals
» In! Train in tunnel.
» Clear! Tunnel is free.
» OK? Has the train left the tunnel?

[Figure: message sequence, in order: In! OK? Clear! Buzz! In! In! OK? Clear!]
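The later part of the sequence shows two In! messages followed by a single Clear!. The sketch below is one possible reading and not something stated on the slides: it assumes that the signal man at A keeps only a yes/no belief about whether the tunnel is occupied, and that a Clear! is sent when one train leaves. All names are hypothetical.

```java
// Hypothetical model: A's one-bit belief versus the true number of trains.
class ClaytonTunnelModel {
    private int trainsInTunnel = 0;             // ground truth
    private boolean aBelievesOccupied = false;  // what signal man A can conclude

    /** A train enters and A signals In! */
    void in() {
        trainsInTunnel++;
        aBelievesOccupied = true;
    }

    /** One train leaves and B answers Clear! (assumed trigger). */
    void clear() {
        if (trainsInTunnel > 0) {
            trainsInTunnel--;
        }
        aBelievesOccupied = false;
    }

    /** Hazard: A believes the tunnel is free while a train is still inside. */
    boolean hazardous() {
        return trainsInTunnel > 0 && !aBelievesOccupied;
    }

    public static void main(String[] args) {
        ClaytonTunnelModel m = new ClaytonTunnelModel();
        m.in();
        m.in();
        m.clear();
        System.out.println("hazardous after In!, In!, Clear!: " + m.hazardous());
    }
}
```

In this model the three-message vocabulary has no way to say how many trains are meant, so one Clear! erases the record of both In! messages.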

Classes of Risks

Business Risk
» Inadequate consumer interest in product
» Standard for product controlled by competitor(s)
Project Risk
» Inadequate time
» Inappropriate personnel
Operational Failure Risk
» Unavailability
» Erroneous operation

Risk Factors

Appearance of new hazards
Increasing complexity
Increasing exposure
Increasing amounts of energy
Increasing automation of manual operations
Increasing centralization and scale
Increasing pace of technological change

Acceptable Risk

When is risk low enough?
Risk-benefit analysis does not resolve moral issues.
Often the people taking the risk are not the ones benefiting from the opportunity.
Can we walk away from technical opportunity?

Computers as Chameleon Machines

Role of Computers

Providing information or advice to a human operator upon request.

Interpreting data and displaying it to the controller, who makes the control decisions.

Issuing commands directly, but with a human monitor of the computer’s actions providing varying levels of input.

Eliminating the human from the control loop.

Terminology

Operator
Process (under computer control)
Control
Display
Sensor
Actuator

Four Roles for the Computer

Less Obvious Implications with Indirect Control

What Makes it Hard to Build Software?

Complexity of the functions required.
Conformity to existing artifacts and standards.
Changeability of functions required.
Invisibility of the software artifact, making it hard to model or visualize.

Fred Brooks, No Silver Bullet: Essence and Accident in Software Engineering

What Makes Software Different?

Software is primarily a design, with no manufacturing variation, wear, corrosion or aging aspects.

It has a much greater capacity to contain complexity.
It is perceived to be easy to change.
Software errors are systematic, not random.
It is intangible.

Motor Industry Software Reliability Association (MISRA), Development Guidelines for Vehicle Based Software.

The Curse of Flexibility

The Concept of Causality

The cause of an event is a set of conditions, each of which is necessary and which together are sufficient for the event to occur.

Individual conditions are called causal conditions or factors.
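The definition can be made concrete with a small Java sketch. The three conditions below (the familiar fire triangle) are a hypothetical illustration and do not come from the slides: the event occurs only when all conditions are present (together sufficient), and removing any single one prevents it (each necessary).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of necessary and jointly sufficient causal conditions.
class Causality {
    enum Condition { FUEL, OXYGEN, IGNITION_SOURCE }

    /** The event (a fire, in this example) occurs exactly when every condition is present. */
    static boolean eventOccurs(Set<Condition> present) {
        return present.containsAll(EnumSet.allOf(Condition.class));
    }

    /** Checks that dropping any one condition prevents the event. */
    static boolean eachConditionIsNecessary() {
        for (Condition c : Condition.values()) {
            Set<Condition> allButOne = EnumSet.complementOf(EnumSet.of(c));
            if (eventOccurs(allButOne)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all present: " + eventOccurs(EnumSet.allOf(Condition.class)));
        System.out.println("each necessary: " + eachConditionIsNecessary());
    }
}
```

Seen this way, singling out one factor as "the" cause is a choice made by the analyst; the definition itself treats all the conditions alike.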

Chemical Process Industry

Classification of hazards:
» fire
» explosion
» toxic release.
Factors influencing risk:
» Size of inventory
» Energy
» Time
» Intensity/distance relationship
» Exposure

Bhopal

In December 1984, a release of methyl isocyanate (MIC) from the Union Carbide chemical plant in Bhopal, India, resulted in the worst industrial accident in history.
MIC is used in pesticides.
Demand for MIC pesticides had dropped after 1981, so the plant was experiencing budgetary cutbacks.

Storage of MIC

[Figure: storage tanks 610, 611, and 619.]
Capacity: 60 tons. Limit: half full. Temperature: 0°C. Pressure: 3 psi.
Tank 610: 50 tons. Tank 611: 21 tons. Tank 619: 1 ton.

(thought to contain 20 tons)

(thought to contain 15 tons)

Backups

Vent gas scrubber.
Flare tower.
Water curtain.
Siren.

Events

10.30pm, December 2, 1984. A new worker was cleaning some valves.
11.00pm. Pressure was 10psi, temperature was 20°C.
11.30pm. Leak was discovered, workers noticed eye irritation.
12.40am. 40psi, 25°C, rumbling noise in tank, concrete casing cracked. Then 400°C, began release of 50,000 pounds of MIC gas.
12.50am-12.55am. Siren sounded when MIC seen escaping from vent stack.

Cause and Effect

2,000 to 3,000 people killed, 10,000 with permanent disabilities, 200,000 injured.

Blamed by management on 'human error'.
Masking the complexity of causal factors.

Over-simplification

Human error.
Technical failures.
Organizational factors.
Multiplicity of factors.
Legal financial responsibility.

Legal View

Cause in fact is established by evidence showing that a defendant’s act or omission was a necessary antecedent to plaintiff’s injury.

Legal (or proximate) cause is a device for limiting liability of a defendant to consequences bearing some reasonable relationship to the risks he or she created.

Legal Cause

Example: car is negligently driven, strikes another car which hits a lamp post, thereby causing a power outage in a region. Legal responsibility may be limited to only part of the total consequences.

Primary classes of limitation
» Unforeseen consequences
» Intervening causes

Root Causes

DC-10 Cargo Door

In March 1974 a Turkish Airlines DC-10 crashed near Paris, resulting in 346 deaths.

Flight control cables in the DC-10 are routed under the cabin floor rather than along the airframe.

Cargo hold depressurization could collapse cabin floor.
Improperly closed cargo door caused flight control cables to be cut.
Root cause: operator error?

Cabin Floor

Root Causes of Accidents

Flaws in Safety Culture
Ineffective Organizational Structure
Ineffective Technical Activities

Safety Culture

Discounting risk
Excessive reliance on redundancy
Unrealistic risk assessment
Ignoring high-consequence, low-probability events

Safety Culture, continued

Assuming risk decreases over time
Underestimating software-related risks
Low priority for safety
Ignoring warning signs
Flawed resolution of goals

Designing for Failure

Organizational Activities

Diffusion of responsibility and authority
Lack of independence and inadequate rank of safety personnel
Limited communication

Technical Activities

Superficial safety efforts
Ineffective risk control
Failure to eliminate basic design flaws
Basing safeguards on false assumptions

Technical Activities, continued

Complexity
Using safety devices to reduce safety margins
Inadequate collection and recording of information
Failure to use information
Failure to evaluate changes

Flixborough

In 1974 an explosion occurred at the Nypro Ltd. chemical works at Flixborough, killing 28 people working at the plant, including all 18 people in the control room.

The plant was making caprolactam, an intermediate product for manufacturing nylon.

The process used cyclohexane, a chemical with properties similar to gasoline.

The plant was under commercial pressure; competitors held the patent on a safer process for making caprolactam.

Events

Six reactors connected by 28-inch pipes were used.
An escape was detected in reactor 5.

A change was made to bypass the reactor using 20-inch pipe with a dogleg.

This appeared to work for two months but eventually escaping cyclohexane created a vapor cloud that was ignited by a discharge tower.

Causal Factors

Changes that were not reviewed.
Conflicting priorities.
Organizational structure.
Superficial safety activities.

Seveso

In 1976 a cloud of dioxin was produced by the ICMESA chemical factory in northern Italy and was washed by rain onto the town of Seveso. Numerous people were affected and a large region was contaminated.

Trichlorophenol is used to make bactericides and herbicides.

During processing, tetrachlorodibenzodioxin (dioxin) can be produced.
Dioxin is very toxic.

Changes were made in the production system to save money. The new process had increased risk of heat release and dioxin formation.

Events

The reaction and distillation cycle was started 10 hours later than usual on a Friday, and the reactor was left to run unattended for the weekend.

Heat increased to 450°-500° and created conditions for the production of dioxin.

A valve released a toxic cloud that was carried by rain into Seveso.

Effects were first noticed in burned vegetation and sores on children.

It took some time to recognize that dioxin from the factory was the cause and act on this.

Causal Factors

Changes that were not communicated or reviewed.
Discounting risk.
Ineffective safety measures.
» Inadequate warning.
» Slow analysis.
» Valve release unsafe.

Therac-25

Between June 1985 and January 1987, six patients received massive overdoses from a computer-controlled radiation therapy machine called the Therac-25.

History of the development of the device.
» AECL and CGR
» Electron and X-ray
» Dual mode electron accelerator, Therac 20 and 25

Therac-25 Facility (after Final CAP)

Upper Turntable Assembly

Operator Screen Layout

Hazard Analysis Assumptions

Programming errors have been reduced by extensive testing on a hardware simulator and under field conditions. Residual software errors are not included in the hazard analysis.

Program software does not degrade due to wear, fatigue, or reproduction process.

Computer execution errors are caused by faulty hardware components and by random errors induced by alpha particles and electromagnetic noise.

Events

Kennestone Regional Oncology Center, June 1985
Ontario Cancer Foundation, July 1985
Yakima Valley Memorial Hospital, December 1985
East Texas Cancer Center, March 1986
East Texas Cancer Center, April 1986

Code Blamed for the Tyler Accidents

Program written in PDP-11 assembly language using its own standalone realtime operating system.

Four major components:
» Stored data
» Scheduler
» Critical and non-critical tasks
» Interrupt services

Routines in Tyler Accidents

Pseudo-code for Key Routines

Datent
    if mode/energy specified then
    begin
        calculate table index
        repeat
            fetch parameter
            output parameter
            point to next parameter
        until all parameters set
        call Magnet
        if mode/energy changed then return
    end
    if data entry is complete then set Tphase to 3
    if data entry is not complete then
        if reset command entered then set Tphase to 0
    return

Magnet
    Set bending magnet flag
    repeat
        set next magnet
        call Ptime
        if mode/energy has changed then exit
    until all magnets are set
    return

Ptime
    repeat
        if bending magnet flag is set then
            if editing taking place then
                if mode/energy has changed then exit
    until hysteresis delay has expired
    Clear bending magnet flag
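In the pseudo-code, Ptime looks for edits only while the bending magnet flag is set, and it clears that flag at the end of its first call. The Java sketch below is a simplified simulation of that control flow and not the original PDP-11 code. It assumes that Ptime's check is the only place a mode/energy change is noticed while the magnets are being set; the class name, the number of magnets, and the timing of the edit are all hypothetical.

```java
// Hypothetical simulation of the Magnet/Ptime control flow from the pseudo-code.
class TheracFlagSketch {
    /**
     * @param magnets          number of magnets to set (assumed value, not from the slides)
     * @param editDuringMagnet index of the magnet whose delay the operator edits during,
     *                         or -1 if the operator makes no edit
     * @return true if the mode/energy change is noticed before Magnet returns
     */
    static boolean editDetected(int magnets, int editDuringMagnet) {
        boolean bendingMagnetFlag = true;            // Magnet: set bending magnet flag
        for (int m = 0; m < magnets; m++) {          // Magnet: set next magnet, call Ptime
            boolean editHappensNow = (m == editDuringMagnet);
            // Ptime: edits are examined only while the flag is set.
            if (bendingMagnetFlag && editHappensNow) {
                return true;                         // Ptime exits, then Magnet exits
            }
            bendingMagnetFlag = false;               // Ptime: clear flag after the delay
        }
        return false;                                // all magnets set, edit never seen
    }

    public static void main(String[] args) {
        for (int m = 0; m < 3; m++) {
            System.out.println("edit during magnet " + m + " detected: " + editDetected(3, m));
        }
    }
}
```

In this sketch only an edit made during the first magnet's delay is detected; an edit during any later delay goes unnoticed because the flag has already been cleared.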

Events, continued

FDA declares the Therac 25 defective, 2 May 1986

Yakima Valley Memorial Hospital, January 1987

Yakima Software Flaw

Causal Factors

Overconfidence in software.
Confusing reliability with safety.
Lack of defensive design.
Failure to address root causes.
Inadequate investigation or follow-up on accident reports.
Software reuse.
Safe versus friendly user interfaces.
Government and user oversight and standards.

Causal Factors, continued

Inadequate software engineering practices.
» Software specifications and documentation should not be an afterthought.
» Rigorous quality assurance needed.
» Design needs to be simple.
» Error detection needed from the beginning.
» More than system testing needed.
» Error messages and displays need careful design.