Troubleshooting Methodology

Troubleshooting generally consists of the following steps. Different methodologies may call them by slightly different names, but the similarities are pretty obvious.

Transcript of trouble shooting methodology

8/7/2019 trouble shooting methodology

http://slidepdf.com/reader/full/trouble-shooting-methodology 1/36

Troubleshooting Methodology 

Investigation

Analysis

Implementation

Analysis

Brainstorm: Gather Hypotheses: What might have caused the problem?

Identify Likely Causes: Which hypotheses are most likely?

Test Possible Causes: Schedule the testing for the most likely hypotheses. Perform any non-disruptive testing immediately.

Implementation

Implement the Fix: Complete the repair.

Verify the Fix: Is the problem really fixed?

Document the Resolution: What did we do? Get a sign-off from the system owner.

Problem statement

• The problem statement must be broad enough to describe the problem, but narrow enough to focus the investigation. It should not contain value judgements. It should be a factual answer to the question "What is wrong?"

Problem description

• Gather all symptoms, including error messages, core dumps, descriptions of any service outages, and contrasting descriptions of what still works. We also need to identify the time of the incident as precisely as possible.

Identify Differences and Changes

• Identify differences between the faulted system and any similar working systems. Also identify any recent changes to the system.
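One way to make this comparison concrete is to diff a configuration snapshot of the faulted system against one from a similar working system. The following is only a minimal sketch: the settings shown and the `config_differences` helper are hypothetical and do not come from the slides.

```python
def config_differences(working: dict, faulted: dict) -> dict:
    """Return {setting: (working_value, faulted_value)} for every setting that differs."""
    differences = {}
    for key in sorted(set(working) | set(faulted)):
        w = working.get(key, "<missing>")
        f = faulted.get(key, "<missing>")
        if w != f:
            differences[key] = (w, f)
    return differences


# Hypothetical snapshots of two otherwise similar systems.
working = {"kernel": "5.4.0", "ntp_server": "10.0.0.1", "mtu": 1500}
faulted = {"kernel": "5.4.0", "ntp_server": "10.0.0.9", "mtu": 1500, "proxy": "on"}

for key, (w, f) in config_differences(working, faulted).items():
    print(f"{key}: working={w!r} faulted={f!r}")
```

Each reported difference is a candidate for the brainstorming step, not a confirmed cause.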

Brainstorm

• In this stage, we need to come up with as many explanations for the problem as we can. It is sometimes helpful (especially in a group setting) to use an Ishikawa diagram to organize our thoughts so that we don't leave any possibilities unconsidered.

Visual approach

• The Ishikawa diagram

Ishikawa

• Generate an Ishikawa diagram by drawing a "backbone" arrow pointing to the right at the problem statement. Then attach 4-6 "ribs," each of which represents a major broad category of items that may contribute to the problem. Each of our components should fit on one or another of these ribs.
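The same structure can be captured in plain data when no whiteboard is available. The sketch below is an illustration under assumptions: the problem statement, rib names, and components are invented, and `check_ishikawa` is a hypothetical helper that only checks the two rules stated above (4-6 ribs, every component on some rib).

```python
def check_ishikawa(ribs: dict, components: list) -> list:
    """Return the components not yet placed on any rib.

    Raises ValueError if the number of ribs is outside the 4-6 range.
    """
    if not 4 <= len(ribs) <= 6:
        raise ValueError(f"expected 4-6 ribs, got {len(ribs)}")
    placed = {item for items in ribs.values() for item in items}
    return [c for c in components if c not in placed]


# Hypothetical example: the backbone points at this problem statement.
problem = "Nightly backup fails"
ribs = {
    "Hardware": ["tape drive", "disk array"],
    "Software": ["backup agent"],
    "Network": ["switch", "firewall"],
    "Process": ["schedule change"],
}
components = ["tape drive", "backup agent", "firewall", "DNS"]

print(problem, "-> unplaced components:", check_ishikawa(ribs, components))
```

A non-empty result suggests that a rib is missing or that a component has not been considered yet.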

Identify Likely Causes

• We need to consider how likely each potential cause is. We should only eliminate hypotheses when they are absolutely disproven.

• For more complex problems, something like an Interrelationship Diagram may be useful in identifying which potential cause might be a root cause.

• Interrelationship Diagrams use boxes containing phrases describing the potential causes. Arrows between the potential causes demonstrate influence relationships between these issues. Each relationship can only have an arrow pointing in one direction. (Where the relationship's influence runs in both directions, the troubleshooters must decide which one is predominant.) Items with more "out" arrows than "in" arrows are causes. Items with more "in" arrows are effects.
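The arrow-counting rule can be sketched in a few lines. This is an illustration, not a prescribed tool: the example arrows are invented, the `classify` helper is hypothetical, and labelling ties as "undetermined" is an assumption, since the rule above does not say how to treat equal counts.

```python
from collections import Counter


def classify(arrows):
    """Classify each item from a list of (source, target) influence arrows."""
    out_count = Counter(src for src, _ in arrows)
    in_count = Counter(dst for _, dst in arrows)
    result = {}
    for item in sorted(set(out_count) | set(in_count)):
        outs, ins = out_count[item], in_count[item]
        if outs > ins:
            result[item] = "cause"
        elif ins > outs:
            result[item] = "effect"
        else:
            result[item] = "undetermined"  # assumption: ties are left open
    return result


# Hypothetical diagram: each pair reads "source influences target".
arrows = [
    ("disk full", "log writes fail"),
    ("disk full", "database stalls"),
    ("database stalls", "web timeouts"),
    ("log writes fail", "web timeouts"),
]

for item, label in classify(arrows).items():
    print(f"{item}: {label}")
```

Items labelled as causes are the first candidates to examine as a possible root cause.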

Test Possible Causes

• We need to perform testing in the least disruptive fashion possible. Data should be backed up if possible before testing proceeds.

• The best approach is to schedule testing of the most likely hypotheses immediately. Then start to perform any non-disruptive or minimally disruptive testing of the hypotheses.
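The ordering described above can be sketched as a small planning function. Everything here is illustrative: the `Hypothesis` record, the likelihood numbers, and the `plan_tests` helper are assumptions, and the likelihoods are subjective estimates rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    likelihood: float  # subjective estimate between 0 and 1 (an assumption)
    disruptive: bool   # True if testing it would interrupt service


def plan_tests(hypotheses):
    """Split hypotheses into (run_now, schedule), each ordered most likely first."""
    ranked = sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
    run_now = [h.description for h in ranked if not h.disruptive]
    schedule = [h.description for h in ranked if h.disruptive]
    return run_now, schedule


# Hypothetical hypotheses from a brainstorming session.
hypotheses = [
    Hypothesis("Expired certificate", 0.1, disruptive=False),
    Hypothesis("Failed disk", 0.6, disruptive=True),
    Hypothesis("Full filesystem", 0.3, disruptive=False),
]

run_now, schedule = plan_tests(hypotheses)
print("Test immediately:", run_now)
print("Schedule a window:", schedule)
```

No hypothesis is dropped from either list, in keeping with the rule that hypotheses are only eliminated once disproven.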

• In some cases, it may be possible to test the hypothesis directly in some sort of test environment. This may be as simple as running an alternative copy of a program without overwriting the original. Or it may be as complex as setting up a near copy of the faulted system in a test lab.
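For the simple case, working on a copy might look like the sketch below. It is an assumption-laden illustration: `make_test_copy` is a hypothetical helper, and the demo uses a stand-in configuration file in a temporary directory rather than a real program.

```python
import shutil
import tempfile
from pathlib import Path


def make_test_copy(original: Path) -> Path:
    """Copy `original` into a fresh temporary directory and return the copy's path."""
    scratch = Path(tempfile.mkdtemp(prefix="troubleshoot-"))
    copy = scratch / original.name
    shutil.copy2(original, copy)  # copies the contents and file metadata
    return copy


# Demo with a stand-in file; in practice this would be the suspect file.
demo_dir = Path(tempfile.mkdtemp())
original = demo_dir / "app.conf"
original.write_text("timeout=30\n")

copy = make_test_copy(original)
copy.write_text("timeout=60\n")  # experiment on the copy only
print("original still reads:", original.read_text().strip())
```

Because the experiment touches only the copy, the original can be left in service while the hypothesis is tested.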

• Depending on the situation, it may even be appropriate to test a hypothesis by directly applying the fix associated with it.

Implement the Fix

• The fix needs to be implemented in the least-disruptive, lowest-cost manner possible. Ideally, the fix should be performed in a way that will completely verify that the fix itself has resolved the problem.

Verify the Fix

• We need to check that the problem is resolved, and also that we have not introduced any new problems. Each service in our environment should have a test suite associated with it so that we can quickly eliminate the possibility that we have introduced a new problem.
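A per-service test suite can be as small as a set of named checks that each return True or False. The sketch below is hypothetical: `run_suite` and the check names are invented, and the lambdas are stand-ins for real probes such as a login attempt or a test query.

```python
def run_suite(checks: dict) -> list:
    """Run every check and return the names of those that failed or raised."""
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failures.append(name)
    return failures


# Stand-in checks for a hypothetical mail service.
mail_checks = {
    "smtp accepts connection": lambda: True,
    "test message delivered": lambda: True,
    "spam filter responds": lambda: False,
}

failed = run_suite(mail_checks)
print("failed checks:", failed if failed else "none")
```

Running the same suite before and after the fix makes it easier to tell a newly introduced problem from one that already existed.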

• Part of this verification should include a root-cause analysis to make sure that the real problem has been resolved. Band-Aid solutions are not really solutions.

Document the Fix

• Over time, the collection of data on resolved problems can become a valuable resource.

• It can be referenced to deal with similar problems.

• It can be used to track recurring problems over time, which can help with a root-cause analysis.

• Or it can be used to continue the troubleshooting process if it turns out that the problem was not really resolved after all.
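If resolutions are recorded in a consistent shape, spotting recurring problems becomes a simple count. The record fields (`system`, `symptom`, `fix`), the sample data, and the `recurring` helper below are all assumptions made for illustration.

```python
from collections import Counter


def recurring(records, threshold=2):
    """Return {(system, symptom): count} for problems seen at least `threshold` times."""
    counts = Counter((r["system"], r["symptom"]) for r in records)
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical resolution log.
records = [
    {"system": "db01", "symptom": "disk full", "fix": "purged old logs"},
    {"system": "web02", "symptom": "high latency", "fix": "restarted service"},
    {"system": "db01", "symptom": "disk full", "fix": "purged old logs"},
]

for (system, symptom), n in recurring(records).items():
    print(f"{system}: '{symptom}' resolved {n} times; the root cause may still be open")
```

A problem that keeps reappearing with the same fix is a hint that earlier resolutions addressed a symptom rather than the root cause.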

Last but not least

• The four-question method:

• What is the problem?

• What is the cause of the problem?

• What are the possible solutions to this problem?

• What is the best solution?

Thank you