Problem Investigation and Diagnosis
The result of a problem investigation will be a root cause diagnosis or an RCA (root cause analysis) report. An appropriate level of resources and skills should be applied to finding the resolution. A number of problem solving techniques can be used to help diagnose and resolve problems.
- The Configuration Management System (CMS) must be used to help determine the level of impact and to assist in pinpointing the point of failure.
- The Known Error Database (KEDB) should be checked to find out whether the problem has occurred in the past. If it has, a resolution should already be in place.
- In chronological analysis, the events that triggered the problem are checked in chronological order to build a timeline of events. The purpose is to see which event triggered the next one, and so on, or to rule out some possible events.
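As a minimal sketch of chronological analysis, the events collected during an investigation can be sorted by timestamp to produce the timeline. The event records, timestamps and the `build_timeline` helper below are hypothetical and only illustrate the idea.

```python
from datetime import datetime

# Hypothetical event records gathered from logs and monitoring tools;
# the timestamps and descriptions are invented for illustration.
events = [
    ("2024-03-01 09:17", "Users report application timeouts"),
    ("2024-03-01 09:02", "Network controller firmware update applied"),
    ("2024-03-01 09:10", "Packet loss alarm raised on core switch"),
]


def build_timeline(records):
    """Return the records sorted by timestamp, oldest first."""
    return sorted(
        records,
        key=lambda record: datetime.strptime(record[0], "%Y-%m-%d %H:%M"),
    )


timeline = build_timeline(events)
for stamp, description in timeline:
    print(stamp, description)
```

Reading the sorted output from top to bottom makes it easier to judge which event could have triggered the next one and which events can be ruled out.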
Pain Value Analysis takes a broader view of the impact of an incident or a problem on the business. Rather than analysing the number of incidents/problems of a particular type in a particular time interval, the technique focuses on an in-depth analysis of the level of pain these incidents/problems have caused to the business. A formula to calculate the level of pain should take into account:
- the number of people affected
- the duration of the downtime caused
- the cost to the business
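The text does not prescribe a specific formula, so the following is only one possible sketch: it assumes the three factors are simply multiplied together, with cost expressed per affected person per hour. The `Problem` class, the `pain_value` function and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    people_affected: int
    downtime_hours: float
    cost_per_person_hour: float  # assumed unit cost to the business


def pain_value(problem):
    """Assumed formula: people affected x downtime x unit cost."""
    return (
        problem.people_affected
        * problem.downtime_hours
        * problem.cost_per_person_hour
    )


# Invented example figures, for illustration only.
problems = [
    Problem("Payroll batch failure", 15, 8.0, 60.0),
    Problem("Email outage", 200, 1.5, 40.0),
]

# Rank problems so the one causing the most pain comes first.
ranked = sorted(problems, key=pain_value, reverse=True)
for problem in ranked:
    print(problem.name, pain_value(problem))
```

An organisation could equally weight the factors differently or add further ones; the point is that problems are ranked by business pain rather than by how often they occur.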
The Kepner and Tregoe method is used to investigate deeper-rooted problems. It defines the following stages:
- defining the problem
- describing the problem in terms of identity, location, time (duration) and size (impact)
- establishing possible causes
- testing the most probable cause
- verifying the true cause
Pareto Analysis is a technique for separating important potential causes from trivial issues. The following steps should be taken:
- Form a table listing the causes and their frequency as a percentage
- Arrange the rows in decreasing order of importance of the causes (the most important cause first)
- Add a cumulative percentage column to the table
- Create a bar chart with the causes, in order of their percentage of total
- Draw a horizontal line at 80% on the Y-axis. From the point where it intersects the cumulative percentage curve, drop a vertical line to the X-axis. The causes to the left of that line are the primary causes, here of the network failures in the example, and should be targeted first.
Network failures
Causes | Percentage of total (%) | Computation | Cumulative (%) |
---|---|---|---|
Network Controller | 35 | 0% + 35% | 35 |
File corruption | 26 | 35% + 26% | 61 |
Server OS | 6 | 61% + 6% | 67 |
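The table steps can be sketched in a few lines of Python. The example uses only the three rows shown in the network failures table (the remaining causes are not listed there, so the running total stops at 67%); the function names `pareto_table` and `primary_causes` are illustrative, not part of any standard tool.

```python
def pareto_table(percentages):
    """Sort causes by percentage (largest first) and add a running total."""
    rows = []
    cumulative = 0
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    for cause, pct in ordered:
        cumulative += pct
        rows.append((cause, pct, cumulative))
    return rows


def primary_causes(rows, threshold=80):
    """Collect causes from the top until the running total reaches the threshold."""
    selected = []
    for cause, _pct, cumulative in rows:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Percentages taken from the table above, deliberately given out of order.
network_failures = {
    "Server OS": 6,
    "Network Controller": 35,
    "File corruption": 26,
}

rows = pareto_table(network_failures)
for cause, pct, cumulative in rows:
    print(f"{cause:20} {pct:3} {cumulative:3}")
```

With a complete list of causes, `primary_causes(rows)` would return the causes that fall to the left of the 80% line on the chart.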