Purpose:

Identify and classify problems and their root causes and provide timely resolution to prevent recurring incidents. Provide recommendations for improvements.

Objective:

Increase availability, improve service levels, reduce costs, and improve customer convenience and satisfaction by reducing the number of operational problems.

Description:

<?>

Inputs:

  • Risk-related root causes
  • Criteria for problem registration
  • Problem log
  • Incident resolutions
  • Closed service requests and incidents
  • <?>

Outputs:

  • Problem classification scheme
  • Problem status reports
  • Problem register
  • Root causes of problems
  • Problem resolution reports
  • Known-error records
  • Proposed solutions to known errors
  • Closed problem records
  • Communication of knowledge learned
  • Problem resolution monitoring reports
  • Identified sustainable solutions
  • <?>

Controls:

<?>

Task Instructions:

Identify and Classify Problems

    1. Identify problems through the correlation of incident reports, error logs, and other problem identification resources. Determine priority levels and categorization to address problems in a timely manner based on business risk and service definition.

    2. Handle all problems formally with access to all relevant data, including information from the change management system and IT configuration/asset and incident details.

    3. Define appropriate support groups to assist with problem identification, root cause analysis, and solution determination to support problem management. Determine support groups based on pre-defined categories, such as hardware, network, software, applications, and support software.

    4. Define priority levels through consultation with the business to ensure that problem identification and root cause analysis are handled in a timely manner according to the agreed-on SLAs. Base priority levels on business impact and urgency.

    5. Report the status of identified problems to the service desk so customers and IT management can be kept informed.

    6. Maintain a single problem management catalog to register and report identified problems and to establish audit trails of the problem management processes, including the status of each problem (i.e., open, reopen, in progress, or closed).
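
The registration and prioritization steps above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the 3x3 impact/urgency matrix, the class names (Problem, ProblemRegister), and the field names are all assumptions made for the example. The actual priority levels should come from consultation with the business.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    REOPEN = "reopen"
    IN_PROGRESS = "in progress"
    CLOSED = "closed"


# Hypothetical matrix: (impact, urgency) -> priority, where 1 is highest.
# Real levels are agreed on with the business and tied to SLAs.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


@dataclass
class Problem:
    problem_id: str
    category: str          # e.g., hardware, network, software
    impact: str            # "high", "medium", or "low"
    urgency: str           # "high", "medium", or "low"
    status: Status = Status.OPEN
    history: list = field(default_factory=list)  # audit trail of status changes

    @property
    def priority(self) -> int:
        return PRIORITY_MATRIX[(self.impact, self.urgency)]

    def set_status(self, new_status: Status, note: str) -> None:
        # Record the transition before applying it so the trail is complete.
        self.history.append((self.status, new_status, note))
        self.status = new_status


class ProblemRegister:
    """Single catalog of problems, keyed by problem ID."""

    def __init__(self) -> None:
        self._problems = {}

    def register(self, problem: Problem) -> None:
        if problem.problem_id in self._problems:
            raise ValueError(f"duplicate problem ID: {problem.problem_id}")
        self._problems[problem.problem_id] = problem

    def status_report(self) -> list:
        """Return (id, priority, status) rows, highest priority first."""
        rows = [(p.problem_id, p.priority, p.status.value)
                for p in self._problems.values()]
        return sorted(rows, key=lambda row: (row[1], row[0]))
```

A status report of this shape is the kind of summary that could be passed to the service desk; the audit trail lives in each problem's history list.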

Investigate and Diagnose Problems

    1. Identify problems that may be known errors by comparing incident data with the database of known and suspected errors (e.g., those communicated by external vendors), and classify matching problems as known errors.

    2. Associate the affected configuration items with the established known error.

    3. Produce reports to communicate the progress in resolving problems and to monitor the continuing impact of problems not solved. Monitor the status of the problem-handling process throughout its life cycle, including input from change and configuration management.
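
As a rough illustration of comparing incident data with a known-error database, the sketch below matches an incident description against symptom keywords. The keyword-overlap approach, the KnownError structure, and the min_overlap threshold are assumptions chosen for simplicity; a real tool would likely use its own matching logic.

```python
from dataclasses import dataclass, field


@dataclass
class KnownError:
    error_id: str
    symptoms: frozenset                               # lower-case symptom keywords
    affected_cis: set = field(default_factory=set)    # configuration item IDs


def match_known_errors(incident_text, known_errors, min_overlap=2):
    """Return IDs of known errors sharing at least min_overlap keywords
    with the incident description, best match first."""
    words = set(incident_text.lower().split())
    matches = []
    for ke in known_errors:
        overlap = len(words & ke.symptoms)
        if overlap >= min_overlap:
            matches.append((overlap, ke.error_id))
    # Sort by descending overlap, then by ID for a stable order.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [error_id for _, error_id in matches]


def associate_ci(known_error, ci_id):
    """Link an affected configuration item to the known error."""
    known_error.affected_cis.add(ci_id)
```

For example, an incident described as "VPN gateway timeout after login" would match a known error whose symptoms include "vpn", "gateway", and "timeout", after which the affected configuration item can be linked with associate_ci.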

Raise Known Errors

    1. As soon as the root causes of problems are identified, create known-error records and develop suitable workarounds.

    2. Identify, evaluate, prioritize, and process (via change management) solutions to known errors based on a cost-benefit business case and business impact and urgency.

Resolve and Close Problems

    1. Close problem records either after confirmation of the successful elimination of the known error or after agreement with the business on an alternative way to handle the problem.

    2. Inform the service desk of the schedule for problem closure (e.g., the schedule for fixing the known errors, the possible workaround, or the fact that the problem will remain until the change is implemented) and of the consequences of the approach taken. Keep affected users and customers informed as appropriate.

    3. Throughout the resolution process, obtain regular reports from change management on progress in resolving problems and errors.

    4. Monitor the continuing impact of problems and known errors on services.

    5. Review and confirm the success of resolutions of major problems.

    6. Ensure that the knowledge learned from the review is incorporated into a service review meeting with the business customer.
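
The closure rule in step 1 of this list can be expressed as a simple guard. The record fields and the function name below are hypothetical and only illustrate the two closure conditions stated in the text.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    problem_id: str
    known_error_eliminated: bool = False        # elimination confirmed
    business_agreed_alternative: bool = False   # alternative handling agreed
    closed: bool = False


def close_problem(record: ProblemRecord) -> bool:
    """Close the record only if one of the two closure conditions holds.

    Returns True if the record is closed after the call.
    """
    if record.known_error_eliminated or record.business_agreed_alternative:
        record.closed = True
    return record.closed
```

A record that meets neither condition stays open, which keeps it visible in impact monitoring and progress reports.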

Perform Proactive Problem Management

    1. Capture problem information related to IT changes and incidents and communicate it to key stakeholders. This communication could take the form of reports to and periodic meetings amongst incident, problem, change, and configuration management process owners to consider recent problems and potential corrective actions.

    2. Ensure that process owners and managers from incident, problem, change, and configuration management meet regularly to discuss known problems and future planned changes.

    3. To enable the enterprise to monitor the total costs of problems, capture change efforts resulting from problem management process activities (e.g., fixes to problems and known errors) and report on them.

    4. Produce reports to monitor problem resolution against the business requirements and SLAs. Ensure the proper escalation of problems, e.g., escalation to a higher management level according to agreed-on criteria, contacting external vendors, or referring to the change advisory board to increase the priority of an urgent request for change (RFC) to implement a temporary workaround.

    5. To optimize the use of resources and reduce workarounds, track problem trends.

    6. Identify and initiate sustainable solutions (permanent fix) addressing the root cause, and raise change requests via the established change management processes.
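
Trend tracking and SLA monitoring from the steps above could be approximated as follows. This is a sketch under stated assumptions: problems are plain dictionaries with hypothetical keys (id, category, priority, opened, closed), and the resolution targets in SLA_TARGET_DAYS are invented placeholders rather than recommended values.

```python
from collections import Counter
from datetime import date


# Hypothetical resolution targets in days, per priority level.
SLA_TARGET_DAYS = {1: 5, 2: 10, 3: 30}


def trend_by_category(problems):
    """Count problems per (year-month opened, category)."""
    return Counter(
        (p["opened"].strftime("%Y-%m"), p["category"]) for p in problems
    )


def sla_breaches(problems, today):
    """Return IDs of problems whose age exceeds the target for their priority.

    Open problems (closed is None) are aged up to `today`.
    """
    breaches = []
    for p in problems:
        end = p["closed"] or today
        age_days = (end - p["opened"]).days
        if age_days > SLA_TARGET_DAYS[p["priority"]]:
            breaches.append(p["id"])
    return breaches
```

Breached problems are natural candidates for escalation according to the agreed-on criteria, and a rising count in one category may point to a root cause that warrants a permanent fix.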