Recently, I got a call from a firm that had been attacked by ransomware. With all servers infected, they refused to pay the ransom, deleted their VMs, and worked for five straight days on the restoration process. Despite an able team, the system was still running at only 10 percent of its potential. The situation was dire: the company was unable to purchase products to sell until their environment was operating normally.
My staff and I were called in to handle two basic tasks: resolve this problem and drastically improve the performance of their applications, and set up their high availability. Our 96-hour engagement proved to be a lesson not only in how to recover from an attack, but in how to employ rigorous and dedicated security practices to prevent one in the future.
I’ll say this upfront: this article is in no way a judgment that it’s easier to simply pay. While I loathe ransomware hackers, and hate the idea of anyone giving them money, whether to pay or not pay a ransomware demand must always be the choice of the individual company dealing with the attack. The solution is unique to the business – an algorithm rather than any overarching ethical policy.
You have to consider:
- Outcomes – There is no guarantee you will get your data back even if you pay. Nor is there any power-train warranty against future attacks. You can be attacked again within days, because the ransomware virus acts like fire, and the smoking embers can often be rekindled.
- Resources – How robust is your disaster recovery plan? Do you have sufficient readily accessible resources – either in-house or contracted outside resources – to enable that recovery?
- Time – How long do you have to restore your environment before your company suffers catastrophic losses which jeopardize everyone?
In this client’s case, they had a very adept CTO with a solid DR plan and a capable team. It still wasn’t easy.
Fixing the Problem
We came in with an ascetic’s conscience. We were clinical, weighing what was necessary to restore the business above what had been done in the past to suit company practices and habits. Because the servers were rebuilt from scratch and the hosting provider didn’t have a record of the configuration of the old virtual machines, we were forced to start our work without a baseline. When we started looking at the system, I took a few minutes to gather some performance metrics on the server, which are shown below.
We could immediately see the CPU load on the server was incredibly high. While talking with the client, we discovered there was a daily process that previously took about 20 minutes to complete. After the servers were rebuilt, that same process took more than 40 minutes, which left the company unable to purchase products to sell.
We went through the various settings on the server and adjusted them to follow best practices. This meant working with the hosting provider to ensure the VMs were set up for maximum performance, including:
- setting the vNUMA configuration correctly so it matched the host
- configuring the paravirtualized virtual SCSI driver on three virtual SCSI controllers
- spreading the hard drives on the VM across the paravirtualized virtual controllers.
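The idea behind spreading disks across controllers can be sketched in a few lines of Python. This is only an illustration of a round-robin layout, not the tooling we used on the engagement. The disk names and the `spread_disks` helper are hypothetical, and the actual assignment is made in the hypervisor's VM settings.

```python
from collections import defaultdict


def spread_disks(disks, controller_count=3):
    """Assign disks to controllers round-robin so no single controller carries all I/O.

    Hypothetical helper for planning a layout; it does not touch any hypervisor.
    """
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    layout = defaultdict(list)
    for index, disk in enumerate(disks):
        # Disk 0 -> controller 0, disk 1 -> controller 1, ... then wrap around.
        layout[index % controller_count].append(disk)
    return dict(layout)


if __name__ == "__main__":
    # Hypothetical disk names for a database VM.
    plan = spread_disks(["os", "data", "log", "tempdb", "backup"])
    for controller, assigned in sorted(plan.items()):
        print(f"SCSI controller {controller}: {', '.join(assigned)}")
```

With five disks and three controllers, the first two controllers each receive two disks and the third receives one. In practice you would likely weight the layout by I/O load rather than simple position, for example by keeping data and log files on different controllers.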
On top of this, we looked through the SQL Server instance and the code executed against it for performance tuning options, and discovered several indexing changes that could be made. We also adjusted instance settings, including:
- “max degree of parallelism”, set to no more than ½ of the number of CPU cores per virtual NUMA node
- “cost threshold for parallelism”, raised from 5 to 50.
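As a rough sketch of the two parallelism settings above, the Python below computes a “max degree of parallelism” value from the core count per virtual NUMA node and prints the matching `sp_configure` statements. The function names and the 8-core example are assumptions for illustration. Review the generated T-SQL before running it against any instance, because the right values depend on the workload.

```python
def recommended_maxdop(cores_per_numa_node: int) -> int:
    """Return half the cores in one virtual NUMA node, rounded down, minimum 1.

    The floor of 1 is deliberate: a value of 0 would tell SQL Server to use
    all available cores, which is the opposite of what the rule intends.
    """
    if cores_per_numa_node < 1:
        raise ValueError("cores_per_numa_node must be at least 1")
    return max(1, cores_per_numa_node // 2)


def parallelism_tsql(cores_per_numa_node: int, cost_threshold: int = 50) -> str:
    """Build the T-SQL that applies both settings (hypothetical helper)."""
    maxdop = recommended_maxdop(cores_per_numa_node)
    return "\n".join([
        # Both options are advanced options, so they must be exposed first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};",
        f"EXEC sp_configure 'cost threshold for parallelism', {cost_threshold};",
        "RECONFIGURE;",
    ])


if __name__ == "__main__":
    # Assumed example: a VM with 8 cores per virtual NUMA node.
    print(parallelism_tsql(cores_per_numa_node=8))
```

For the assumed 8-core node this yields a “max degree of parallelism” of 4 alongside the cost threshold of 50 mentioned above.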
Once completed, the system was performing much better. Average CPU load of the SQL Server was down from 50 percent to 11 percent and the CPU spikes were down from 100 percent to 46 percent, as shown in the graph below.
With the changes in place, the coveted 20-minute daily process was running in under a minute, the fastest it had ever run.
An Ounce of Prevention is Worth More Than a Pound of Cure
Paying or not paying the ransom doesn’t guarantee recovery either way. Recently, another company that did pay the ransom was unable to recover and 300,000 people lost their jobs. Preventative measures are the best vaccine you can employ.
- Ensure the normal accounts the admin team uses day-to-day have no access to the production servers. This is normally achieved by having separate “admin” accounts which do not have access to a user’s email account but do have access to the servers. In a perfect world, an “admin” account would have no ability to log on to any user’s desktop.
- Allow only very limited access from the workstations to the production servers. If possible, allow only remote desktop access to the production servers, and possibly access to services such as SQL Server.
- Prevent access from the production servers to the Internet unless specific access to specific websites is required. This prevents any virus, spyware, or botnet software from connecting to its command and control server to receive instructions. While this will likely make it more complicated for users to download software, it makes it incredibly hard for viruses to access the Internet.
These sorts of precautions prevent unauthorized software from gaining access to the production servers, even if the workstations are compromised. If the workstations were to be compromised, the servers would be able to continue to function, allowing business to continue.
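The outbound rule in the last bullet is a default-deny allow-list: nothing leaves the production servers unless it is explicitly permitted. The Python below is a minimal sketch of that logic only. The host names and the `is_egress_allowed` function are hypothetical, and real enforcement belongs in the firewall or proxy, not in a script.

```python
# Hypothetical allow-list; in practice this lives in firewall or proxy rules.
ALLOWED_HOSTS = frozenset({
    "updates.example.com",
    "licensing.example.com",
})


def is_egress_allowed(host: str, allowed=ALLOWED_HOSTS) -> bool:
    """Default deny: only exact, explicitly listed hosts may be reached."""
    # Normalize case and a trailing dot so "Updates.Example.com." still matches.
    normalized = host.strip().lower().rstrip(".")
    return normalized in allowed


if __name__ == "__main__":
    for candidate in ("updates.example.com", "c2.example.net"):
        verdict = "allow" if is_egress_allowed(candidate) else "deny"
        print(f"{candidate}: {verdict}")
```

The point of the design is that an unknown destination, such as a command and control server, is denied without anyone having to know about it in advance.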
Finally, routinely test your backups. Having proper backups and testing them regularly will tell you, before a catastrophic event, whether there is a problem with the restore process. It’s also a good idea to keep the documentation for bringing back critical infrastructure somewhere a catastrophic event like ransomware can’t reach (either a hard copy or shared online).
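One small piece of a restore test can be automated: checking that restored files match checksums recorded when the backup was taken. The sketch below assumes a manifest that maps relative file paths to SHA-256 hashes. That manifest format and the function names are assumptions, not part of any particular backup product. It verifies file contents only; a database restore would still need its own integrity check and an application-level test.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backup files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_dir, manifest):
    """Return the relative paths that are missing or whose checksum differs.

    `manifest` is an assumed format: {relative_path: expected_sha256_hex}.
    An empty result means every listed file was restored intact.
    """
    failures = []
    for relative_path, expected in manifest.items():
        restored = Path(restore_dir) / relative_path
        if not restored.is_file() or sha256_of(restored) != expected:
            failures.append(relative_path)
    return failures
```

Running a check like this on a schedule, against a restore made to an isolated location, turns "we have backups" into "we restored them last week and they matched".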
Ransomware Isn’t Going Away
Ransomware won’t be disappearing any time soon, nor will operator errors. Revisit your DR plans regularly and run security best-practice checks at random intervals, because however prepared you may be, ransomware is adapting, just as each company is adapting its ability to combat it. Be sure you have the resources to react with speed once attacked – getting NDAs signed with contractors can cost valuable hours that could otherwise be spent recovering and preventing losses.