Incident management and incident response have become focal points in highly technology-centric operational environments. Business organizations depend heavily on skilled people, well-defined processes, and reliable enterprise technology systems to advance and maintain their ascendancy. However, the use of technology comes with associated vulnerabilities. Some vulnerabilities are inherent to a specific organization, while others are part of the broader technology landscape. Unlike in the past, it is no longer a secret that sophisticated technology-related threats originate from state-sponsored entities as well as non-state entities. The speed, scale, and intensity of these threats make mitigation efforts more complex than ever.
Most organizations use incident management and incident response frameworks to overcome technology-related adverse effects and challenges. Incident management is a structured process of sequential steps that an organization takes to identify, analyze, and resolve critical incidents, minimize disruptions, and prevent recurrence. Incident response, by contrast, is the process an organization adopts to manage security incidents so as to limit damage and reduce recovery time and costs.
The recent cybersecurity-related attacks experienced by Ukraine and Taiwan exemplify the overt use of cybersecurity-related threats as a new dimension of warfighting. In the corporate world, many organizations experience cyber-attacks in the form of malware, phishing, man-in-the-middle attacks, denial-of-service attacks, Structured Query Language (SQL) injection, zero-day exploits, and Domain Name System (DNS) tunneling. Such exploitation has resulted in numerous reports of cryptocurrency theft, data breaches, data loss, and supply chain disruptions, creating ripple effects across the board.
The conventional wisdom in incident management, as in any other operational environment, is that it is neither possible to prevent all types of cybersecurity-related attacks nor to have contingencies for every one of them, given the dynamic nature of technological advancement. However, an enterprise risk management framework helps organizations reduce the number of cybersecurity-related attacks, the severity of damage, and business disruptions.
In this context, an incident management system capable of responding to an incident and mitigating its impact, while resolving critical issues to prevent recurrence, plays a vital role. Further, incident response enhancement activities such as rapid detection and timely communication with stakeholders about incidents help to minimize loss and disruptions. Such enhancements also help to mitigate technology and process-related weaknesses and restore information technology-related services within agreed service levels.
To achieve the desired effects in an incident response scenario, the organization needs to set up a highly skilled incident response team and design a response plan. A response team with a deep understanding of the organization and its information technology infrastructure is vital to executing functions as intended. As an integral component of incident response, the incident response plan serves as a blueprint with clearly defined stakeholders’ roles and responsibilities. The plan aims to promote rapid detection, effective containment, eradication of threats, and restoration of information technology-related services to a predetermined level. As a response enhancement capability, planners need to identify means of reporting incidents and disseminating incident-related information to stakeholders on a need-to-know basis. Because incident response is an ongoing process, it is also vital to identify reliable resources to alert and educate users on risks, security threats, and incident handling procedures. Such an undertaking helps incident managers to navigate turbulent situations in their favor.
A key component of the incident response framework is an effective, integrated threat analysis model that augments situational awareness and enables timely response to potential incidents. Threat analysis aims to identify accidental, internal, and external threats and vulnerabilities, and to assess the effectiveness of organizational security protocols. As a precursor, a thorough gap analysis helps to identify security gaps.
Given the complex nature of threats arising from their speed, scale, and intensity, proactive institutional incident response readiness is critical to detecting security breaches and recovering to a predetermined level in a timely manner. Security breaches involving leaks of sensitive data such as personal information, proprietary information, and research findings create operational disruptions, financial losses, and reputational damage. These occurrences draw public attention and greater regulatory scrutiny. Hence, organizational incident readiness aims to sense and detect threats early while linking them to known threat actors through exhaustive link analysis. Such practices enable planners to adapt threat prevention strategies early and execute response activities that speed up recovery from security breaches, while recording mission-critical information to prevent future incidents.
Even in a world of pervasive automation, the human factor still plays the most significant role in organizational success. However, as with technology, human work-related behavior has its own characteristics and limitations. In most instances, the failure to recognize human limitations has led to undesirable events. Such errors, generally known as human errors, are the leading cause of cybersecurity incidents across all organizations. A human error is usually a circumstance in which a planned activity fails to achieve its intended outcome, and it remains one of the unresolved problems in this context. Examples of human error in information technology include service misconfigurations, improper security patch management, leaving IT assets unattended, and failure to comply with standard operating procedures.
Given the gravity of human error, it is vital to identify and eliminate potential human errors in the work environment at the earliest possible opportunity. The best approach is to conduct firm-specific awareness programs with role-based training to develop the required skill sets. This serves as one of the best strategies to correct human errors while streamlining internal processes. The methodology needs to be realistic and hands-on while recognizing the factors that contribute to human errors. Well-established research and incident reviews have identified some of these contributory factors as personal distractions, fatigue, excessive workloads, a distracting working environment, poorly designed systems, and process design flaws. As a best practice, the organization needs to address those risk factors within the framework of incident management. Such an undertaking helps to eliminate preventable errors at the earliest possible time through strict compliance with standard operating procedures.
As mentioned above, incident management consists of an interconnected process known as the incident management life cycle. The life cycle is as given below:
Incident identification and logging – Every incident identified requires timely logging regardless of circumstances. For this purpose, the organization may use emails, firm-specific web forms, incident management systems, SMS, or any other approved firm-specific channel of communication.
Incident categorization – Incidents are categorized based on the type of disruption or enterprise functions such as technology or business process.
Incident prioritization – Essentially, this involves the organization's priority matrix stated in the specific incident response playbook. Priority is determined by two factors, impact and urgency. Impact is the extent to which the incident affects the business and its users. Urgency is measured in time, meaning how quickly a resolution is required.
Incident routing and assignment – At this stage, incidents are routed to the appropriate group with the relevant expertise to resolve the issue, either by an automated incident routing system or at the discretion of a service desk representative based on information provided by the customer.
Task creation and management – Based on the complexity and severity of the incident, it is prudent to subdivide the work into tasks according to technicians’ expertise, as appropriate, to meet the service level agreement.
Service level agreement management – The escalation process is expected to uphold the service level agreement (SLA) by ensuring timely incident response and resolution. In most cases, incident escalation helps to avoid an imminent SLA breach or to deal with a breach that has already occurred. The escalation process also ensures that priority cases are handled early.
Incident resolution – In this stage, the assigned technicians come up with a temporary workaround or a permanent solution for the reported issue.
Resolve and close incident – The incident is closed after the issue is resolved and the user acknowledges the resolution and is satisfied with it. Lastly, a post-incident review is initiated to better prepare the teams for future incidents and to enhance the incident management process.
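The prioritization and escalation steps in the life cycle above can be illustrated with a short sketch. It is only one possible reading of those steps: the three levels, the P1 to P5 labels, the resolution targets, and the function names `prioritize` and `needs_escalation` are all assumptions made for illustration, since an actual playbook defines its own matrix and service level targets.

```python
from datetime import datetime, timedelta

# Hypothetical levels and labels; a real playbook defines its own matrix.
LEVELS = ("high", "medium", "low")

# (impact, urgency) -> priority label, with P1 as the most critical.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P5",
}

# Assumed resolution targets per priority, for illustration only.
SLA_TARGETS = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=8),
    "P4": timedelta(hours=24),
    "P5": timedelta(hours=72),
}


def prioritize(impact: str, urgency: str) -> str:
    """Look up the priority for a given impact and urgency level."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"impact and urgency must be one of {LEVELS}")
    return PRIORITY_MATRIX[(impact, urgency)]


def needs_escalation(priority: str, logged_at: datetime, now: datetime,
                     warn_fraction: float = 0.8) -> bool:
    """Return True once the elapsed time has used up warn_fraction of the
    SLA target, so escalation can start before the breach occurs."""
    elapsed = now - logged_at
    return elapsed >= SLA_TARGETS[priority] * warn_fraction
```

For instance, `prioritize("high", "medium")` looks up the cell for a high-impact, medium-urgency incident, and `needs_escalation` can be polled periodically for open incidents. The 80 percent warning threshold is an arbitrary choice meant to catch an imminent breach rather than one that has already happened.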
Incident management is a demanding process that requires timely response, and key performance indicators (KPIs) are used to assess the efficiency of the incident response systems. Widely used Information Technology Infrastructure Library (ITIL) key performance indicators are:
Average resolution time – Average time taken to resolve an incident.
Average initial response time – Average time taken to make the first response to an incident.
Service level agreement compliance rate – The percentage of incidents resolved within the service level agreement.
First call resolution rate – The percentage of incidents resolved on the first call.
Number of repeated incidents – The number of identical incidents logged within a specific timeframe.
Reopen rates – The percentage of resolved incidents that were reopened.
Incident backlog – The number of incidents that are pending in the queue without a resolution.
Percentage of major incidents – The number of major incidents as a percentage of the total number of incidents.
Cost per ticket – The average expense for each ticket.
End user satisfaction rates – The percentage of end users or customers who were satisfied with the information technology services delivered to them.
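Several of the indicators listed above reduce to simple arithmetic over ticket records, as the following sketch suggests. The `Ticket` fields and the `kpis` function are hypothetical and do not correspond to any particular incident management product. Indicators that need data not modeled here, such as cost per ticket or satisfaction surveys, are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative."""
    opened: datetime
    first_response: datetime
    resolved: Optional[datetime]  # None while the ticket is still pending
    sla_target: timedelta         # agreed time to resolution
    reopened: bool = False
    major: bool = False


def kpis(tickets: list) -> dict:
    """Compute a subset of the KPIs from a list of Ticket records."""
    resolved = [t for t in tickets if t.resolved is not None]

    def mean_minutes(deltas) -> float:
        deltas = list(deltas)
        if not deltas:
            return 0.0
        return sum(d.total_seconds() for d in deltas) / 60 / len(deltas)

    def pct(count: int, total: int) -> float:
        return 100.0 * count / total if total else 0.0

    return {
        "avg_resolution_min": mean_minutes(
            t.resolved - t.opened for t in resolved),
        "avg_initial_response_min": mean_minutes(
            t.first_response - t.opened for t in tickets),
        "sla_compliance_pct": pct(
            sum(t.resolved - t.opened <= t.sla_target for t in resolved),
            len(resolved)),
        "reopen_rate_pct": pct(
            sum(t.reopened for t in resolved), len(resolved)),
        "backlog": len(tickets) - len(resolved),
        "major_incident_pct": pct(
            sum(t.major for t in tickets), len(tickets)),
    }
```

Passing a list of `Ticket` records to `kpis` returns a dictionary keyed by indicator name. Rates are computed over resolved tickets only, which is one reasonable convention among several, and an organization may define its denominators differently.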
As mentioned above, incident response management, a technical process, is an integral part of cybersecurity operations. Like incident management, incident response has its own process, known as the incident response life cycle. It contains six sequential stages with a well-defined workflow in the framework of incident response management. The incident response cycle consists of the following:
Preparation – During preparation, the organization creates an incident response management plan capable of handling firm-specific incidents.
Detection and analysis – The incident response analyst is responsible for collecting and analyzing data to find evidence to help identify the source, nature of the attack, and impact on the systems.
Containment – In this step, all possible methods are used to prevent the spread of malware or viruses and to block the Internet Protocol (IP) addresses of known threat actors.
Eradication – During this stage, action is taken to remove the threat from the operational systems and ensure security software is up to date to prevent future incidents.
Recovery – Once the external threat is eliminated, the recovery process starts to restore systems to predetermined service levels.
Post-event activity – During this stage, the organization needs to conduct a thorough review of the entire incident in three parts (pre-incident, during-incident, and post-incident activities) to understand the patterns and take the necessary steps to prevent recurrence. This review helps the organization identify security gaps and enhance security protocols and strategies to maintain operational resilience.
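Because the stages listed above are sequential, one way to picture the workflow is as an ordered set of states that an incident record moves through one step at a time. The sketch below is a minimal illustration under that assumption. The `Stage` and `IncidentResponse` names are hypothetical, and real response work often loops back (for example, from eradication to further analysis), which this simple model does not capture.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of the incident response life cycle, in order."""
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    POST_EVENT_ACTIVITY = 6


class IncidentResponse:
    """Hypothetical tracker that only allows moving to the next stage."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.stage = Stage.PREPARATION
        self.history = [Stage.PREPARATION]

    def advance(self) -> Stage:
        """Move to the next stage; refuse to go past the final one."""
        if self.stage is Stage.POST_EVENT_ACTIVITY:
            raise RuntimeError("incident is already in its final stage")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage
```

Calling `advance()` repeatedly walks an incident from preparation through post-event activity, and the `history` list gives the post-event review a simple record of which stages were completed.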
In conclusion, incident management and incident response management have inherent limitations related to people, processes, and technology. These limitations not only make organizations vulnerable to various forms of threats but also create uncertainty. In most cases, human error is the leading contributory factor in the undesirable situations that lead to cybersecurity-related incidents. Evidence suggests that user noncompliance with cybersecurity requirements paves the way for unauthorized access to vital systems and information. Even though such cases are highly preventable, organizations continue to face similar dilemmas, resulting in sensitive information ending up on the dark web or in failure to perform intended business functions.
As we see, the world is changing, and so are the threats. Therefore, organizations need to be proactive in dealing with threats and vulnerabilities. In this context, nothing can replace early detection of harmful elements and preemptive measures that accelerate incident response and minimize risks. As a best practice, organizations need to maintain situational awareness by incorporating current, industry-relevant threat intelligence into the incident response planning framework and by conducting a thorough evaluation of existing security measures to identify loopholes. Lastly, organizations achieve success by having multi-disciplinary experts working in a highly collaborative setup, such as a security operations center or command center environment, as the first line of defense to detect and eliminate threats faster. Such an environment enables effective communication between stakeholders, timely response to incidents, and the continuation of business as intended.