By Isaac Basque-Rice, EDR Security Engineer at Adarma and Cian Heasley, Threat Lead at Adarma
In the early hours of July 19, 2024, CrowdStrike released an update to the Falcon EDR platform, intending to make some minor changes to the behaviour of the Falcon Sensor, a lightweight malware detection and response agent deployed on millions of corporate Windows devices.
Unfortunately, this update triggered a logic error that caused system crashes and “Blue Screens of Death” (BSODs) across a sizeable proportion of the CrowdStrike customer base, including emergency services, airports and airlines, banks, retailers, and the NHS. Microsoft later estimated that around 8.5 million Windows devices were affected. The issue was discovered quickly, and within an hour and a half CrowdStrike had issued a corrected version of the file and taken steps to remediate the problem. By that point, however, the ripple effect of the error was already being felt globally.
At the beginning of the incident, remediation was a manual task performed by each organisation’s IT team, which could be a monumental undertaking for organisations with large numbers of affected devices. Since the incident, Microsoft has released a new recovery tool to streamline this process.
Now that CrowdStrike has remediated the issue and a route back to business as usual is clear, the Adarma EDR and Threat teams have put together an evaluation of the incident, including technical details, lessons learned, and an outline of similar previous incidents – CrowdStrike were not the first and are unlikely to be the last.
The file, C-00000291*.sys (or more specifically, C-00000291-00000000-00000031.sys), is a so-called “Channel File”, part of the Rapid Response Content that CrowdStrike use to fine-tune the behaviour of their sensors. Channel files are dynamic configuration files that define specific rules for the sensor, such as detection logic, communication settings and response actions, and they are updated often (on occasion multiple times a day). They are loaded during the Early Launch Anti-Malware (ELAM) phase of the Windows boot process, which starts early and loads any anti-malware drivers present on the system before other drivers. This allows anti-malware software to provide coverage that is as comprehensive as possible.
According to the Technical Details blog post CrowdStrike released, this file controls how Falcon “evaluates named pipe execution on Windows systems” (a named pipe is a method of inter-process communication). CrowdStrike deployed the updated file in response to recent changes in common Command and Control (C2) frameworks used by malicious actors. An error in this file triggered a system crash with the stop code PAGE_FAULT_IN_NONPAGED_AREA.
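To make “evaluating named pipes” a little more concrete, here is a minimal Python sketch of name-based matching. It is an illustration under stated assumptions, not CrowdStrike’s logic: `SUSPICIOUS_PIPE_PATTERNS` and `matching_patterns` are hypothetical, and real detection content considers far more than a pipe’s name. It relies only on two facts about Windows: named pipes live under the `\\.\pipe\` namespace, and pipe names are not case-sensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, for illustration only. Real C2 detection content is
# proprietary and looks at far more than the pipe's name.
SUSPICIOUS_PIPE_PATTERNS = (
    r"\\.\pipe\example_c2_*",
    r"\\.\pipe\*-beacon-??",
)


def matching_patterns(
    pipe_name: str, patterns: tuple[str, ...] = SUSPICIOUS_PIPE_PATTERNS
) -> list[str]:
    """Return every pattern that the full pipe name matches.

    Windows pipe names are not case-sensitive, so both sides are lowercased
    before a shell-style wildcard comparison.
    """
    name = pipe_name.lower()
    return [p for p in patterns if fnmatchcase(name, p.lower())]


if __name__ == "__main__":
    for pipe in (r"\\.\pipe\Example_C2_4821", r"\\.\pipe\svc-beacon-07", r"\\.\pipe\spoolss"):
        print(pipe, "->", matching_patterns(pipe) or "no match")
```

Wildcard matching keeps the sketch short; a channel file can express much richer criteria, which is exactly why a malformed one can do so much damage.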
It’s important to note that page faults are a normal part of computing. A page fault occurs when a program accesses memory that is not currently held in physical RAM; in the ordinary case, the operating system fetches the required data from disk and the program carries on, only slightly delayed. In this instance, however, it appears that the sensor, while processing the faulty file, tried to access memory that either did not exist or was not accessible. Because this happened inside a kernel-mode driver, Windows could not recover and the entire system crashed.
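This failure belongs to a familiar class of bug: reading past the end of the data you were given. CrowdStrike’s Post-Incident Review attributes it to a template type that defined 21 input fields while the sensor code supplied only 20 values. The hypothetical Python sketch below (the `Rule` type and function names are our own, not CrowdStrike’s) shows the defensive pattern of checking that content only references inputs that actually exist before evaluating it. Python would raise an `IndexError` rather than crash the machine, but a kernel-mode driver has no such safety net, which is why the check belongs up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical content rule: look for `pattern` in one input field."""
    field_index: int
    pattern: str


class ContentValidationError(ValueError):
    """Raised when content references an input the sensor does not supply."""


def validate_rules(rules: list[Rule], num_inputs: int) -> None:
    # Check every rule before evaluating any of them, so malformed content
    # is rejected as a whole instead of failing part-way through.
    for i, rule in enumerate(rules):
        if not 0 <= rule.field_index < num_inputs:
            raise ContentValidationError(
                f"rule {i} reads field {rule.field_index}, "
                f"but only {num_inputs} inputs are supplied"
            )


def evaluate(rules: list[Rule], inputs: list[str]) -> bool:
    """Return True if any rule's pattern occurs in the field it inspects."""
    validate_rules(rules, len(inputs))
    return any(rule.pattern in inputs[rule.field_index] for rule in rules)


if __name__ == "__main__":
    inputs = [""] * 20  # 20 values, valid indices 0-19
    inputs[3] = r"\\.\pipe\example_c2_4821"
    print(evaluate([Rule(3, "example_c2")], inputs))  # True
    try:
        evaluate([Rule(20, "anything")], inputs)  # a 21st field does not exist
    except ContentValidationError as err:
        print("rejected:", err)
```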
This file is read early in the boot process, when the Windows Early Launch Anti-Malware phase starts anti-malware drivers, including the CrowdStrike drivers that depend on it. Because these drivers and their supporting files, including the file that caused this incident, are loaded before Windows has finished starting up, affected machines could crash repeatedly during startup. This is at the root of the problem. Fixing this on an individual device involves booting into “safe mode”, a Windows mode that loads only the drivers essential for the basic operation of the operating system, and then deleting the file from the command line.
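CrowdStrike’s published workaround was to boot into Safe Mode or the Windows Recovery Environment, go to the `%WINDIR%\System32\drivers\CrowdStrike` directory and delete any file matching `C-00000291*.sys`. On real machines this was done with the `del` command; the Python sketch below only models the same logic so that it can be reviewed or tested, for example against a mounted disk image. The function name and the dry-run default are our own assumptions, and the directory is passed in rather than hard-coded.

```python
import os
from pathlib import Path

# File-name pattern from CrowdStrike's workaround guidance.
FAULTY_CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find (and, unless dry_run is True, delete) matching channel files.

    Returns the matching paths in sorted order, so a dry run can be reviewed
    before anything is removed. Only files directly inside driver_dir are
    considered; the search is not recursive.
    """
    matches = sorted(
        p for p in driver_dir.glob(FAULTY_CHANNEL_FILE_PATTERN) if p.is_file()
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    windir = Path(os.environ.get("WINDIR", r"C:\Windows"))
    target = windir / "System32" / "drivers" / "CrowdStrike"
    for path in remove_faulty_channel_files(target, dry_run=True):
        print("would delete:", path)
```

Defaulting to a dry run means the destructive step has to be requested explicitly, which matters when a script is being run by stressed staff across many machines.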
CrowdStrike have confirmed that this incident is unrelated to reports of null bytes in this or any other channel file. (A null byte is a byte with a value of zero; a program that encounters one where it does not expect it can fail with an error.)
For further information on the technical specifics of this incident, we recommend you read CrowdStrike’s Post-Incident Review.
While this might be the most significant global IT outage to date, it is not the first time something like this has happened. In late April 2010, McAfee released a faulty anti-virus definition update that caused its software to identify svchost.exe, a crucial Windows binary that hosts services run from DLLs, as malware.
McAfee’s anti-virus software blocked this process, resulting in boot loops and system crashes like this most recent CrowdStrike incident. This disruption impacted various institutions, including the University of Michigan’s medical school, police departments, jails, hospitals, and Australian supermarkets.
In August 2022, VMware published guidance about an update to its Carbon Black EDR solution that caused boot loops and BSODs on some Windows systems. VMware revealed that the problem was caused by updated threat research rulesets that had not triggered any issues during internal testing. Incidents like these will keep happening, so we must use this one to strengthen our cyber resilience and avoid similar outcomes in the future.
Human error continues to be one of the most significant weaknesses in cybersecurity. EDR and anti-virus products are maintained by people, and people are fallible. Faulty updates and unintended consequences happen, so how can we pinpoint when they have occurred and take the appropriate action? It’s important to ask yourself, and be able to answer, the following three questions:
- Do your IT team and technical stakeholders know where to check for reliable fixes for faulty software updates?
- Do you have good lines of communication with the vendors who provide you with technical security solutions?
- Do you have an up-to-date asset inventory listing where security controls are deployed?
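Answering the third question quickly is much easier if the inventory can be queried. As a minimal sketch, assuming a hypothetical record format (the `Asset` fields below are illustrative, not any particular product’s schema), the following Python lists every host running a given security control, with business-critical hosts first, so you know where to start when that control misbehaves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """Hypothetical inventory record."""
    hostname: str
    os: str
    business_critical: bool
    controls: dict[str, str] = field(default_factory=dict)  # control name -> version


def hosts_with_control(
    inventory: list[Asset], control: str, os_name: Optional[str] = None
) -> list[str]:
    """Hostnames running `control` (optionally on one OS), critical hosts first."""
    matches = [
        a for a in inventory
        if control in a.controls and (os_name is None or a.os == os_name)
    ]
    # False sorts before True, so `not business_critical` puts critical hosts first.
    matches.sort(key=lambda a: (not a.business_critical, a.hostname))
    return [a.hostname for a in matches]


if __name__ == "__main__":
    inventory = [
        Asset("web01", "windows", False, {"edr": "7.11"}),
        Asset("db01", "windows", True, {"edr": "7.11"}),
        Asset("mac01", "macos", False, {"edr": "7.10"}),
        Asset("kiosk01", "windows", True),
    ]
    print(hosts_with_control(inventory, "edr", os_name="windows"))
```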
Once you can answer yes to the questions above, you have a strong foundation on which to start building a solid contingency plan.
We recommend implementing a robust backup system that includes both full-system and incremental backups. Being able to restore systems to their pre-incident state can save significant manual effort when a problematic file has to be removed from many devices.
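To see why both backup types matter, consider how a restore works. In a conventional full-plus-incremental scheme, where each incremental captures only the changes since the previous backup, returning to a pre-incident state means restoring the most recent full backup taken before the incident, then every incremental taken after it, in order. The Python sketch below, with hypothetical names and timestamps, picks that chain. It assumes only “full” and “incremental” backups exist (differential backups would need different logic), and in practice your backup product does this for you.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str  # "full" or "incremental"
    taken_at: datetime


def restore_chain(backups: list[Backup], before: datetime) -> list[Backup]:
    """Backups to apply, in order, to recover the last state captured before `before`."""
    # Ignore anything taken at or after the incident: it may contain the fault.
    candidates = sorted(
        (b for b in backups if b.taken_at < before), key=lambda b: b.taken_at
    )
    full_positions = [i for i, b in enumerate(candidates) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup was taken before the requested time")
    # The latest full backup, followed by every incremental taken after it.
    return candidates[full_positions[-1]:]


if __name__ == "__main__":
    incident = datetime(2024, 7, 19, 4, 9)  # 04:09 UTC, when the faulty file was released
    history = [
        Backup("full", datetime(2024, 7, 14, 1, 0)),
        Backup("incremental", datetime(2024, 7, 15, 1, 0)),
        Backup("full", datetime(2024, 7, 17, 1, 0)),
        Backup("incremental", datetime(2024, 7, 18, 1, 0)),
        Backup("incremental", datetime(2024, 7, 19, 1, 0)),
        Backup("incremental", datetime(2024, 7, 19, 6, 0)),
    ]
    for b in restore_chain(history, before=incident):
        print(b.kind, b.taken_at.isoformat())
```

Frequent full backups shorten the chain (and the restore time); frequent incrementals keep the restore point close to the moment the incident began.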
Eliminate single points of failure. The CrowdStrike outage was so disruptive because so many devices shared a single common point of failure. To avoid this, organisations should build redundancy into business-critical software platforms and begin to diversify their systems and processes.
Develop, test, and implement a well-designed incident response plan. This plan is essential for assigning specific roles and responsibilities, establishing effective communication protocols, and outlining step-by-step procedures for a prompt and coordinated response. It provides a structured and organised approach to identifying, addressing, and recovering from cybersecurity incidents. This proactive approach minimises the impact of an incident and helps maintain business continuity and regulatory compliance. Regular drills and updates to the plan ensure that the organisation remains well prepared to tackle emerging threats confidently.
At the height of the incident, the National Cyber Security Centre reported an increase in related phishing activity. Within hours of the outage beginning, malicious actors had begun registering domains such as “crowdstrikebluescreen[.]com” and “crowdstrikefix[.]com” with the intention of tricking users into believing that they were legitimate sources of information associated with CrowdStrike and, ultimately, stealing information or money from unsuspecting and already stressed victims. This behaviour underscores the importance of telling users which official channels to rely on for information about business-critical processes. It also highlights how important it is to have security measures in place to mitigate these threats.
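Defenders can also watch for this kind of opportunistic registration. The Python sketch below is a deliberately simple heuristic, not a complete typosquatting detector: it re-fangs defanged indicators such as `crowdstrikebluescreen[.]com`, then flags any domain that contains the brand name (ignoring hyphens) but is neither an allow-listed official domain nor one of its subdomains. The allowlist and function names are illustrative assumptions; a real feed of newly registered domains would come from your threat intelligence sources.

```python
from typing import Iterable

# Illustrative allowlist; maintain your own list of official domains.
OFFICIAL_DOMAINS = frozenset({"crowdstrike.com"})


def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'example[.]com' back into a domain."""
    return indicator.strip().lower().replace("[.]", ".").rstrip(".")


def is_official(domain: str, official: frozenset[str]) -> bool:
    # Exact match or a true subdomain; 'evilcrowdstrike.com' is neither.
    return any(domain == d or domain.endswith("." + d) for d in official)


def flag_lookalikes(
    indicators: Iterable[str], brand: str, official: frozenset[str] = OFFICIAL_DOMAINS
) -> list[str]:
    """Return domains that mention `brand` but are not official domains."""
    brand = brand.lower()
    flagged = []
    for raw in indicators:
        domain = refang(raw)
        if brand in domain.replace("-", "") and not is_official(domain, official):
            flagged.append(domain)
    return flagged


if __name__ == "__main__":
    feed = ["crowdstrikebluescreen[.]com", "crowdstrikefix[.]com", "www.crowdstrike.com"]
    print(flag_lookalikes(feed, "CrowdStrike"))
```

Substring matching will produce false positives (any legitimate partner site mentioning the brand), so treat the output as a triage list rather than a block list.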
Adarma provides customised cybersecurity solutions to assist businesses in achieving future-ready cyber resilience. Our approach enables organisations to decrease cyber risks by implementing effective threat intelligence, exposure management, and detection and response capabilities. We offer tailored threat intelligence, technological solutions, and strategic consultations that cater to our customers’ specific security requirements and business goals. Our expertise guarantees a balanced approach between security and operational efficiency, safeguarding our customers’ most crucial infrastructure and data.
Discover our tailored services and find out why we are the preferred security partner for FTSE 350 firms.