First off, while this event is being regarded as the first successful cyberattack to disrupt the US power grid, there was NO IMPACT TO THE PHYSICAL FLOW OF ELECTRONS. Typically, when words like disruption or disturbance are used in the context of the power grid, somewhere a meter has stopped spinning or serious transients have been introduced that cause undesirable grid conditions. I don’t consider a communications failure a “grid disruption” since there is no impact to the physical grid. Also, this specific incident, now classified as an “attack”, could have been nothing more than an automated bot scanning WAN-facing IPs for the specific vulnerable firewall to exploit. Nevertheless, as always, events of this nature can serve as an excellent lessons-learned opportunity.
So What Happened: From 9:12 AM to 6:57 PM on March 5th 2019 a registered entity (this could be a power company or system operating company) experienced communication failures between a low-impact control center and multiple remote generation sites. The failures, each described as being “brief (i.e., less than five minutes)”, were the result of multiple firewall reboots over a period of roughly 10 hours. The vendor was called in to help diagnose the event and investigate the logs. Later that evening a patch was applied to one of the firewalls and tested in a non-critical environment. Afterwards, the firmware patch was applied at one of the remote generating sites and monitored throughout the night. After seeing no adverse effects from the patch, the entity patched the rest of the vulnerable assets.
Vulnerability Details: The root cause of this event is believed to be an externally exploitable WAN-facing firewall. The Cisco Adaptive Security Appliance (ASA) Web Services Denial of Service vulnerability CVE-2018-0296 was first published on June 6th 2018 and has an impact rating of 8.6. Specifically, the vulnerability allows an unauthenticated, remote attacker to cause an affected device to reload unexpectedly, resulting in a denial of service (DoS) condition. The vulnerability is due to a lack of proper input validation of the HTTP URL. An attacker could exploit this vulnerability by sending a crafted HTTP request to a vulnerable device. An exploit could allow the attacker to cause a DoS condition or an unauthenticated disclosure of information. This vulnerability applies to IPv4 and IPv6 HTTP traffic. Cisco has released software updates that address this vulnerability. There are no workarounds that address this vulnerability. Additional details are provided in the Cisco Advisory.
Known Exploits: An exploit intended to test for the presence of the CVE-2018-0296 vulnerability was made public on June 21st 2018. This ‘tool’, which is intended to be used for good, can instead be used for evil. The exploit code, which comes complete with command line instructions, is fewer than 58 lines of Python. The lines that actually exploit the vulnerability are just 2 lines of code, and they leverage the common Python requests package to make the HTTP connection. A user can run the script via the command line by simply typing ‘python cisco_asa.py <url>’ where url is the node you want to test (or attack). In addition to causing a denial of service, the exploit code will return the contents of the current directory being traversed on the vulnerable firewall, files in +CSCOE+, and active sessions, and will enumerate valid usernames. The Readme file of the GitHub repo housing the exploit also includes instructions on how to find vulnerable nodes using OSINT tools like Shodan and Censys. These tools can automatically return the IP addresses of exploitable firewalls.
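From the defender's side, the request pattern described above (a directory traversal that touches +CSCOE+) is something a log pipeline can flag. Here is a minimal sketch in Python, not a vendor signature. The log format (`source_ip method path`), the function names, and the heuristic itself are all my own assumptions for illustration.

```python
import re
from urllib.parse import unquote

# Assumption: a traversal sequence ("../" or "..\") appearing in the same
# request path as the +CSCOE+ directory is worth flagging. This is an
# illustrative heuristic, not an official detection rule.
TRAVERSAL = re.compile(r"\.\.[/\\]")
ASA_WEB_DIR = "+cscoe+"


def is_suspicious_path(raw_path: str) -> bool:
    """Return True if the path mixes directory traversal with +CSCOE+."""
    # Decode twice so single- and double-URL-encoded sequences are caught.
    path = unquote(unquote(raw_path))
    return bool(TRAVERSAL.search(path)) and ASA_WEB_DIR in path.lower()


def suspicious_sources(log_lines):
    """Yield (source_ip, path) for flagged lines.

    Assumes a hypothetical 'source_ip method path' line format.
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        src_ip, path = parts[0], parts[2]
        if is_suspicious_path(path):
            yield src_ip, path
```

Feeding this the firewall's web request log would give you the source IPs to hand to the SIEM or block at the edge. A real deployment would parse whatever format your device actually emits.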
Here are my preliminary thoughts on the event and key takeaways:
- There was no “Grid Disruption” – A grid disruption means a physical impact to the flow of electrons.
- Monitoring Firmware and Vulnerabilities – The firewall was externally exploitable because it had a known, unpatched vulnerability. The vendor advisory and public exploit code had both been available since June 2018, months before this event. This stresses the importance of knowing what assets are installed, the firmware version of those assets, and using a tool like VigilantGrid to track vulnerabilities and firmware patches. Such tools are capable of notifying key personnel that a vulnerability exists and a patch is available. Additionally, whichever tool you use should offer context on how critical the patch is for the power system and what the impact would be if the vulnerability were exploited.
- Monitoring for Suspicious Activity – Similar to monitoring for vulnerabilities and patches, the assets themselves can be made cyber-aware and can send event logs to a remote SIEM for analysis. In the case of this event, when the directory traversal exploit was initiated, the request likely could have been captured and sent as a Syslog message to the SIEM for analysis. Once the request is deemed suspicious, the SIEM sends an alert (text, email, automated phone call, etc.) to an individual, immediately identifying the cause of the communication failure and the source IP of the attacker. SIEMs like VigilantGrid are capable of this type of analysis and can save time when investigating an event.
- Network Security Monitoring Wouldn’t Have Detected the Intrusion – The vulnerable firewall was externally exploited. Most network security monitoring (NSM) tools are on the LAN side of a firewall and therefore don’t inspect communication that terminates at the firewall. One of the main reasons for installing an NSM tool on the LAN side of a firewall is to perform deep packet inspection. If placed on the WAN side, the payload information would be encrypted, making it impossible for the NSM tool to perform any type of deep packet inspection. Additionally, if placed on the WAN side, the NSM tool would likely trigger multiple false positive alarms in response to random drive-by scans. The fact that an NSM tool would be useless in detecting this type of attack stresses the importance of using and configuring assets to report events to a SIEM.
- Was the Grid Really Targeted? – If I were a betting man, my money would be on a bot or a script kiddie having carried out the attack. Given that this vulnerability has been exploited in the wild and that it affects a common back-office IT vendor device, it is highly unlikely that the bot or attacker knew it was targeting a power grid asset owner. Additionally, given the modularity of the exploit code posted to GitHub and how easy it is to find vulnerable nodes via OSINT, it is more likely a drive-by attack from a bot.
- Using Universal Firewalls for the Grid – Cisco firewalls and switches were originally designed for back office IT environments and are commonly used across multiple sectors (finance, healthcare, government, etc.). This means Cisco is one of the most targeted vendors on the market. WAN-facing assets can be identified as Cisco products, and known vulnerabilities and exploit code can then be leveraged to compromise the system. Sadly, this is a side effect of using widely adopted vendors and is something that should be considered when designing power system environments.
- Comms Failures Happen All the Time, Grid Still Works – For the most part, the power grid is tried and true. It is specifically designed to work in isolation and in the presence of a communication failure. Communication failures happen all the time. Additionally, most asset owners at the distribution level have zero remote control over their field assets. In this particular event the comm failure was between a control center and one or more remote generation sites. If a generation site is a power plant, there will be staff on site who are capable of locally controlling and monitoring it. These types of sites are manned 24×7, and if the control center wants to change a set value for the generator they can just pick up the phone and call the on-site operator.
- Goes Back to Engineering Design
- Whitelist Communication Links – Power system environments are extremely static, and all allowable communication links are known. This means that the network rules and access control lists can be restricted to allow only legitimate communication with the WAN-facing asset.
- Redundancy – In system protection engineering (relaying and automation), there are multiple requirements to eliminate single points of failure. This has caused many sleepless nights for engineers as they think about how a system could fail and how to engineer redundancy and resiliency into it. Some asset owners go to the extreme of having two separate protection relays from two separate vendors to protect the same piece of electrical equipment from damaging system faults. Do we need to start applying that same approach here? If there had been a redundant backup communication link that relied on a non-Cisco firewall, comms between the control center and the remote generation site would never have dropped.
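To make the whitelisting point above concrete, here is a small default-deny sketch in Python. It is only an illustration of the idea, not a firewall configuration. The networks use documentation address ranges, and the ports and the `ALLOWED_LINKS` and `is_allowed` names are placeholders I made up.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allowlist of (source network, destination port) pairs.
# Addresses come from documentation ranges and the ports are placeholders;
# a real list would come from the known, static set of links at the site.
ALLOWED_LINKS = [
    (ip_network("192.0.2.0/28"), 443),         # e.g. the control center subnet
    (ip_network("198.51.100.10/32"), 20000),   # e.g. a single remote peer
]


def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Default-deny: permit only traffic that matches an explicit entry."""
    src = ip_address(src_ip)
    return any(src in net and dst_port == port for net, port in ALLOWED_LINKS)
```

Because the set of legitimate links rarely changes, a list like this stays short, and a drive-by scanner coming from an arbitrary internet address never matches an entry.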
Thank you for reading and if you have any questions feel free to reach out at firstname.lastname@example.org.
Special thanks to @BlakeSobczak and @chrissistrunk for linking me back to the original articles.