
Aerospace Mishaps and Lessons Learned

2004 MAPLD International Conference

Washington, D.C.

September 7, 2004


"... most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices."

Nancy Leveson in Safeware, 1995.


Seminar Program

Time   Speaker               Affiliation                  Mishap Title
9:00   Richard Katz          NASA Office of Logic Design  Introduction
9:15   Faith Chandler        NASA HQ                      Using Root-Cause Analysis to Understand Failures
10:00  Jonathan F. Binkley   Aerospace Corp.              The Space System Engineering Database (SSED)
10:45  BREAK
11:00  Owen Brown            DARPA                        Apollo 13 Mishap
12:00  Kathryn Anne Weiss    MIT                          An Analysis of Causation in Aerospace Accidents
12:45  LUNCH
1:30   Susan C. Lee          JHU/APL                      The Near Earth Asteroid Rendezvous (NEAR) Rendezvous Burn Anomaly
2:45   Rick Obenschain       NASA GSFC                    SEASAT: Lessons Learned and Not Learned
3:30   BREAK
3:45   Keith E. Van Tassel   NASA JSC                     STS-86/SAFER
4:30   Paul Cheng            Aerospace Corp.              Aerospace 100 Questions That Should Be Asked During Technical Reviews
5:15   Keith Avery           Mission Research Corp.       STRV-1c/1d Mishap
6:00   SESSION ENDS


Training vs. Education

• The NASA Office of Logic Design works to educate design engineers, not train them.

– Training promotes rote responses

– Education promotes thinking and the ability to adapt to and cope with new situations.

• Hence, MAPLD hosts seminars and not training sessions.


Design Seminars

• These case studies are real and are not contrived examples. Many of the leaders have firsthand knowledge of these mishaps.

• Contribute: Discuss the topics presented, disagree with them, present interesting cases you wish to share, additional lessons, or alternative viewpoints.

• Do not sit there quietly and expect to be treated like a cocker spaniel being trained and drilled to emit Pavlovian responses in response to stimuli (bell for dogs, donuts for engineers).


Material

• Material will be made available on

– CD-ROM

– Hardcopy

– klabs.org

• All material is in the public domain; you may use it as you wish.


I Was Reading AW&ST …

Aviation Week & Space Technology, August 23/30, 2004, pp. 29-30


Barto's Law: Every circuit is considered guilty until proven innocent.


A Recent Mishap (that gave me the idea for this seminar)


Background

• Popular single board computer

• Everything was working fine

• Ran vibration test

– Unpowered and unmonitored

• Subsequently failed to boot intermittently

– Testing at the manufacturer also showed intermittent failures, although at a lower rate than observed at the contractor.


Project’s Corrective Action

• Unit (S/N 031) pulled from the flight instrument

• New unit (S/N 034) installed in the flight instrument

• Repeated testing with the new unit was successful

• Signed off, ready for launch
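
The sign-off above rests on repeated successful tests of a unit whose predecessor failed only intermittently. As a rough illustration of how little a clean test run proves in that situation, the sketch below computes how many consecutive successful boots are needed before a given per-boot failure rate can be ruled out. It assumes independent boots with a fixed failure probability, which the source does not claim; the function names and example rates are hypothetical.

```python
import math


def boots_needed(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Consecutive clean boots needed to rule out a per-boot failure rate.

    Returns the smallest n such that a board with the given failure rate
    would pass n boots in a row with probability <= 1 - confidence.
    Assumption (not from the source): boots are independent trials.
    """
    if not 0.0 < max_failure_rate < 1.0:
        raise ValueError("max_failure_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_rate))


def chance_of_clean_run(failure_rate: float, boots: int) -> float:
    """Probability that a marginal board passes `boots` boots in a row."""
    return (1.0 - failure_rate) ** boots


# Hypothetical numbers, chosen only to show the scale of the problem.
print(boots_needed(0.01, 0.95))
print(round(chance_of_clean_run(0.01, 20), 3))
```

Under these assumptions, ruling out a 1% per-boot failure rate at 95% confidence takes 299 consecutive clean boots, and a board that fails 1% of the time still passes 20 boots in a row about 82% of the time.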


Risk Reduction Effort

• Reviewed problem/failure report

– No root cause or failure mechanism identified

– Conclusion of the Verification and Analysis Section stated:

“Each time there was a failure to boot, the power was cycled and the computer subsequently rebooted. The result of the testing at XXXXXX was that the most probable cause of the boot failure was a workmanship issue specific to SN034 and is not endemic to the XXXXXXXX computer and therefore does not affect SN031.”

– No direct or indirect evidence given in the “Verification and Analysis” section to support a workmanship issue.

– No analysis given to show that the workmanship problem was not systemic to all units. Since the unit is clearly marginal and it is difficult to make fail, it is not shown that other units have sufficient margin to support operation in all operating environments over the design life of the unit. …


Risk Reduction Effort

– Note: the “analyst” consistently remarks that after a failed boot the next power cycle results in correct operation of the board. Yet the board fails multiple times. This is evidence of the “PC mentality” seen in many Projects where, when there is a problem, the solution is to switch the power off and back on to “correct it.”

– Contractor and Project claimed repeatedly that the unit was troubleshot and nothing more could be done.


Let’s Take a Closer Look

• Examination of failures at manufacturer

– The failures reported were a result of test equipment; there were zero failures detected at the manufacturer

– Intermittent operation of the computer could not be supported. Suspicion of the electrical environment grows

– “What if” analysis results in a large number of possible failure mechanisms


Let’s Take a Closer Look

• Examination of troubleshooting at contractor

– Previously claimed fully troubleshot

– Examination shows that no oscilloscope probe ever touched the board

• Examined at interface points only

– Throughout organization “failures to boot” were routine

• Many failure reports written across many units.

– Contractor did not use available diagnostic signals and port to ascertain status of the CPU and computer


Troubleshooting Again

• Contractor fought hard to prevent further troubleshooting

– Stalled effort for many months

• Initial examination showed that the protection signals for the EEPROM memories did not behave as predicted by the analysis

– Contractor would not show the analysis

• Examination of diagnostic signals quickly showed that the CPU had halted


Troubleshooting Results

• Cause of failure determined

– Known issue with pipeline timing

– Software service routines not installed to handle all conditions

– Project previously had assured the independent review that software was installed to handle all conditions

• Did not fail at the manufacturer because the test software installed there properly handled the interrupt from the pipelining issue

• No support for “a workmanship issue specific to SN034 …”

• Flight software rewritten
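
The gap described above, service routines missing for some conditions in one software load but present in another, is the kind of thing a simple coverage check can expose. The following is a minimal sketch, not the project's actual tooling: the condition names, the handler tables, and the `missing_handlers` function are all hypothetical, and the source does not name the processor or its interrupt set.

```python
from typing import Callable, Dict, Iterable, List


def missing_handlers(
    conditions: Iterable[str],
    installed: Dict[str, Callable[[], None]],
) -> List[str]:
    """Return the conditions that have no service routine installed."""
    return sorted(c for c in set(conditions) if c not in installed)


def _service_routine() -> None:
    """Placeholder routine; a real one would service the condition."""


# Hypothetical list of conditions the CPU can raise.
CPU_CONDITIONS = ["reset", "timer", "bus_error", "pipeline_hazard"]

# Hypothetical flight load: one condition was never given a routine.
flight_load = {
    "reset": _service_routine,
    "timer": _service_routine,
    "bus_error": _service_routine,
}

# Hypothetical test load: same as flight, plus the missing routine.
test_load = dict(flight_load, pipeline_hazard=_service_routine)

print(missing_handlers(CPU_CONDITIONS, flight_load))  # ['pipeline_hazard']
print(missing_handlers(CPU_CONDITIONS, test_load))    # []
```

Running the same check against every software load that will execute on the hardware, test and flight alike, turns an assurance that "software was installed to handle all conditions" into something that can be verified.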


Lessons and Suggestions

• Problem/Failure Reports

– Examine original documents.

– Request and examine all related P/FRs from all units

• Provide direct evidence (at a minimum!) for determination of the cause of failure

– Intermittents after the vibration test led to the conclusion of a workmanship error; the “bad solder joint” was never identified

– “Failures” at the manufacturer reinforced the false conclusion as those “failures” were not examined in detail and were a result of a testing error.

• Do not conduct reviews in a board room with PowerPoint slides

– Pack up your oscilloscope and go into the lab
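
The suggestion to examine all related P/FRs from all units can be sketched as a grouping exercise: a symptom that shows up on several units is hard to attribute to workmanship on one of them. The example below is illustrative only. The record layout, the unit labels, and the symptom strings are made up for the sketch and are not data from the mishap.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Report = Tuple[str, str]  # (unit identifier, symptom text), a hypothetical layout


def units_by_symptom(reports: Iterable[Report]) -> Dict[str, Set[str]]:
    """Map each normalized symptom to the set of units that reported it."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for unit, symptom in reports:
        grouped[symptom.strip().lower()].add(unit)
    return dict(grouped)


def systemic_symptoms(reports: Iterable[Report], min_units: int = 2) -> List[str]:
    """Symptoms seen on at least `min_units` distinct units."""
    grouped = units_by_symptom(reports)
    return sorted(s for s, units in grouped.items() if len(units) >= min_units)


# Made-up reports, for illustration only.
example_reports = [
    ("unit-A", "Failure to boot"),
    ("unit-B", "failure to boot"),
    ("unit-B", "Connector damage"),
    ("unit-C", "Failure to boot "),
]

print(systemic_symptoms(example_reports))  # ['failure to boot']
```

The threshold of two units is an arbitrary choice for the sketch; the point is only that the comparison has to span every unit's reports, not just those of the unit under review.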


Enjoy your seminar!