IT incident management is a process within IT service management (ITSM) that aims to restore service operations after an issue is detected and to minimize the effects of that issue on the business and its end users.
The incident management process consists of various steps for service restoration, including but not limited to issue detection, incident creation, prioritization and classification, investigation and analysis, remediation, remediation verification, incident closure, and post-mortem analysis.
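The steps listed above can be read as an ordered lifecycle. The following is a minimal sketch of that idea, not a prescribed implementation: the `Stage` names and the `next_stage` helper are hypothetical, and the rule that a failed verification sends the incident back to investigation is an assumption made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order the text lists them (names are hypothetical)."""
    DETECTION = 1
    CREATION = 2
    CLASSIFICATION = 3   # prioritization and classification
    INVESTIGATION = 4    # investigation and analysis
    REMEDIATION = 5
    VERIFICATION = 6     # remediation verification
    CLOSURE = 7
    POST_MORTEM = 8


def next_stage(stage, verified=True):
    """Return the stage that follows `stage`, or None once the post-mortem is done.

    Assumption: if remediation cannot be verified, the incident returns
    to investigation instead of moving on to closure.
    """
    if stage is Stage.VERIFICATION and not verified:
        return Stage.INVESTIGATION
    if stage is Stage.POST_MORTEM:
        return None
    return Stage(stage.value + 1)
```

A real process may skip or repeat stages; the linear ordering here only mirrors the list in the text.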
Incidents are also typically divided into two buckets: normal incidents and major incidents.
Systems should be categorized by importance and have SLAs that define how long they can be unavailable before escalation. Impact and urgency determine whether the normal incident process or the major incident process is followed. When an SLA is exceeded, the organization has run out of time for experimentation and must move on to the IT service continuity / disaster recovery plan.
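As a rough illustration of the decision described above, the sketch below picks a process from impact and urgency and escalates once the allowed downtime is exceeded. Everything in it is an assumption for the sake of example: the `System` class, the 1 to 3 scoring scale, and the rule that an incident is "major" only when both scores are at the top of the scale. An organization would substitute its own matrix and SLA values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class System:
    name: str
    max_downtime: timedelta  # hypothetical SLA: allowed unavailability before escalation


def incident_process(impact, urgency):
    """Pick a process from impact and urgency, each scored 1 (low) to 3 (high).

    Assumed rule: only high impact combined with high urgency is a major incident.
    """
    return "major" if impact == 3 and urgency == 3 else "normal"


def next_action(system, downtime, impact, urgency):
    """Decide what to do for an ongoing incident on `system`."""
    if downtime > system.max_downtime:
        # SLA exceeded: stop experimenting and move to continuity / DR.
        return "invoke service continuity / disaster recovery plan"
    return f"follow {incident_process(impact, urgency)} incident process"


# Usage with made-up values:
billing = System("billing", max_downtime=timedelta(hours=1))
action = next_action(billing, timedelta(minutes=20), impact=3, urgency=3)
```

Here `action` evaluates to the major incident process, because the 20 minutes of downtime is still within the hypothetical one-hour SLA.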
Because the different levels of incident management trigger different processes for response, it is critical to define roles and responsibilities for execution. These roles and responsibilities define who will drive process improvement, report key performance indicators (KPIs), and execute and enforce process workflow. They also define lines of communication between the IT team, the rest of the organization, vendors, and third parties.
To the right is an example of the potential roles and responsibilities involved during a major incident. For more information, download our white paper, Streamlining the Major Incident Resolution Process: Define, Plan, Staff and Communicate.
Learn what IT professionals have to say about the state of incident management, and what challenges they face, especially when it comes to managing and responding to major IT incidents. This original research explores the tools, processes, and costs associated with incidents, as well as the most likely causes for downtime and outages.