Problem Management - Problem Investigation and Diagnosis

Problem Investigation and Diagnosis

The result of a problem investigation is a root cause diagnosis or a root cause analysis (RCA) report. A level of resources and skills appropriate to the problem should be applied to finding its resolution. A number of problem-solving techniques can help diagnose and resolve problems.

  • The CMS must be used to help determine the level of impact and to assist in pinpointing the point of failure.
  • The Known Error Database (KEDB) should be checked to find out whether the problem has occurred in the past; if so, a resolution should already be in place.
  • In chronological analysis, the events that triggered the problem are checked in chronological order to build a timeline of events. The purpose is to see which event triggered the next, and so on, or to rule out some possible events.
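As a minimal sketch of the chronological analysis described above, the following Python code sorts events from several sources into one timeline and records the gap since the previous event. The event tuples, source names and timestamps are hypothetical and serve only as an illustration.

```python
from datetime import datetime


def build_timeline(events):
    """Sort (timestamp, source, message) tuples and add the gap in seconds
    since the previous event."""
    ordered = sorted(events, key=lambda event: event[0])
    timeline = []
    previous = None
    for timestamp, source, message in ordered:
        gap = 0.0 if previous is None else (timestamp - previous).total_seconds()
        timeline.append((timestamp, source, message, gap))
        previous = timestamp
    return timeline


# Hypothetical events collected from different logs, deliberately out of order.
events = [
    (datetime(2024, 3, 1, 9, 15, 0), "application", "file transfer timeouts reported"),
    (datetime(2024, 3, 1, 9, 0, 0), "network", "controller link flapping"),
    (datetime(2024, 3, 1, 9, 5, 0), "server", "storage mount became read-only"),
]

for timestamp, source, message, gap in build_timeline(events):
    print(f"{timestamp:%H:%M:%S}  +{gap:>6.0f}s  {source:<12} {message}")
```

Reading the ordered list from top to bottom shows which event preceded which. An event that appears after the first symptom can usually be ruled out as its trigger.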

Pain Value Analysis takes a broader view of the impact of an incident or a problem on the business. Rather than analysing the number of incidents or problems of a particular type in a particular time interval, the technique focuses on in-depth analysis of the level of pain these incidents or problems have caused to the business. A formula to calculate the level of pain should take into account:

  • the number of people affected
  • the duration of the downtime caused
  • the cost to the business
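The text does not prescribe a specific formula, so the following Python sketch is only one possible way to combine the three factors. It assumes that pain can be approximated as people affected times hours of downtime times a cost per person-hour. The `Incident` fields, the function names and the cost rate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    category: str
    people_affected: int
    downtime_hours: float
    cost_per_person_hour: float  # assumed estimate of the business cost rate


def pain_value(incident):
    """Illustrative pain score: person-hours lost multiplied by a cost rate."""
    return (incident.people_affected
            * incident.downtime_hours
            * incident.cost_per_person_hour)


def pain_by_category(incidents):
    """Total pain per category, with the most painful category first."""
    totals = {}
    for incident in incidents:
        totals[incident.category] = totals.get(incident.category, 0.0) + pain_value(incident)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Ranking categories by total pain rather than by incident count can put a rare but costly outage ahead of a frequent minor one, which is the point of the technique.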

The Kepner and Tregoe method is used to investigate deeper-rooted problems. Kepner and Tregoe defined the following stages:

  • defining the problem
  • describing the problem in terms of identity, location, time (duration) and size (impact)
  • establishing possible causes
  • testing the most probable cause
  • verifying the true cause
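The stages above can be loosely illustrated in Python. In this sketch the problem is described along the four dimensions named in the second stage, and each candidate cause lists the facts it would be expected to produce. Ranking candidates by the number of matching facts is an assumption made for the example and is not part of the method as stated. All facts and causes shown are hypothetical.

```python
def rank_causes(observed, candidates):
    """Order candidate causes by how many observed facts each one accounts for."""
    scored = []
    for cause, predicted in candidates.items():
        matches = sum(1 for key, value in observed.items()
                      if predicted.get(key) == value)
        scored.append((cause, matches))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical problem description: identity, location, time and size.
observed = {
    "identity": "file transfers fail",
    "location": "branch office",
    "time": "since Monday",
    "size": "all users at the site",
}

# Hypothetical candidate causes and the facts each would be expected to produce.
candidates = {
    "faulty site router": {
        "identity": "file transfers fail",
        "location": "branch office",
        "size": "all users at the site",
    },
    "expired user certificate": {
        "identity": "file transfers fail",
        "size": "one user",
    },
}

# The top-ranked candidate is the one to test first; it still has to be verified.
most_probable = rank_causes(observed, candidates)[0][0]
```

The ranking only suggests which cause to test first. Verifying the true cause remains a separate final stage.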

Pareto Analysis is a technique for separating important potential causes from trivial issues. The following steps should be taken:

  1. Form a table listing the causes and their frequency as a percentage
  2. Arrange the rows in decreasing order of importance of the causes (the most important cause first)
  3. Add a cumulative percentage column to the table
  4. Create a bar chart with the causes, in order of their percentage of total
  5. Draw a horizontal line at 80% on the Y-axis and, where it meets the curve of cumulative percentages, drop a vertical line to the X-axis. The causes to the left of that point are the primary causes of the network failures tabulated below and should be targeted first.

Network failures

Causes               Percentage of total   Computation   Cumulative %
Network Controller   35                    0% + 35%      35
File corruption      26                    35% + 26%     61
Server OS            6                     61% + 6%      67
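
The tabulation steps can be sketched in Python as follows. The sketch uses the three rows listed in the table above; the function names and the default 80% threshold parameter are choices made for this illustration.

```python
def pareto_table(percentages):
    """Return (cause, percent, cumulative) rows, largest percentage first."""
    rows = []
    cumulative = 0
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    for cause, percent in ordered:
        cumulative += percent
        rows.append((cause, percent, cumulative))
    return rows


def primary_causes(rows, threshold=80):
    """Causes up to and including the one where the cumulative total reaches the threshold."""
    selected = []
    for cause, _percent, cumulative in rows:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# The three rows listed in the network failures table.
network_failures = {"Network Controller": 35, "File corruption": 26, "Server OS": 6}

for cause, percent, cumulative in pareto_table(network_failures):
    print(f"{cause:<20}{percent:>4}{cumulative:>6}")
```

The three listed rows accumulate to only 67%, so all of them fall below the 80% line and `primary_causes` returns every one. With a complete list of causes, the function stops at the first row whose cumulative total reaches the threshold.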
