Recently, I got a call from a firm that had been attacked by ransomware. With all of their servers infected, they refused to pay the ransom, deleted their VMs, and worked for five straight days on the restoration process. Despite an able team, the system was still running at only 10 percent of its potential. The situation was dire: the company was unable to purchase products to sell until their environment was operating normally.

My staff and I were called in to handle two basic tasks: resolve the problem by drastically improving the performance of their applications, and set up their high availability. Our 96-hour engagement proved to be a lesson not only in how to recover, but in how to employ rigorous, dedicated security practices to prevent such attacks in the future.

Background

I'll specify upfront that this article is in no way a judgment that it’s easier to simply pay. While we all loathe ransomware hackers, and I hate the idea of anyone giving them money, whether or not to pay a ransomware demand must always be the choice of the individual company dealing with the attack. The solution is unique to the business – an algorithm rather than any overarching ethical policy.

You have to consider:

  1. Outcomes – There is no guarantee you will get your data back even if you pay. Nor is there any power-train warranty against future attacks. You can be attacked again within days, because ransomware acts like fire: the smoking embers can often be rekindled.
  2. Resources – How robust is your disaster recovery plan? Do you have sufficient readily accessible resources – either in-house or contracted outside – to enable that recovery?
  3. Time – How long do you have to restore your environment before your company suffers catastrophic losses that jeopardize everyone?

In this client’s case, they had a very adept CTO with a solid DR plan and a capable team. It still wasn’t easy.

Fixing the Problem

We came in with an ascetic’s conscience. We were clinical, weighing what was necessary to restore the business against what had been done in the past simply because it suited established company practices and habits. Because the servers were rebuilt from scratch and the hosting provider didn’t have a record of the old virtual machines’ configuration, we were forced to start our work without a baseline. When we started looking at the system, I took a few minutes to gather some performance metrics on the server, which are shown below.

We could immediately see that the CPU load on the server was incredibly high. While talking with the client, we discovered there was a daily process that previously took about 20 minutes to complete. After the servers were rebuilt, that same process took more than 40 minutes, which crippled the company’s ability to purchase products to sell.
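CPU pressure like this can also be confirmed from inside SQL Server itself. The query below is only a sketch, not necessarily what we ran on the engagement; it reads the recent per-minute CPU samples that SQL Server’s scheduler monitor keeps in a ring buffer:

  -- Recent CPU utilization as recorded by the scheduler monitor
  -- (one sample per minute; sql_cpu_pct is the SQL Server process,
  --  other_cpu_pct is everything else running on the box).
  SELECT TOP (30)
         record.value('(./Record/@id)[1]', 'int') AS record_id,
         record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
         100
           - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
           - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS other_cpu_pct
  FROM (
         SELECT CONVERT(XML, record) AS record
         FROM sys.dm_os_ring_buffers
         WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
           AND record LIKE N'%<SystemHealth>%'
       ) AS rb
  ORDER BY record_id DESC;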

We went through the various settings on the server and adjusted them to match best practices, working with the hosting provider to ensure the VMs were set up for maximum performance. This included:

  • setting the vNUMA configuration correctly so it matched the host (see the sanity check after this list)
  • configuring the paravirtual SCSI (PVSCSI) driver on three virtual SCSI controllers
  • spreading the VM’s hard drives across the paravirtual controllers.
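After a vNUMA change like this, SQL Server itself will report the NUMA layout it actually sees. The following is a sketch of such a sanity check, not a record of the exact query we used; the scheduler count per node should line up with the vCPUs-per-node layout configured on the host:

  -- One row per NUMA node visible to SQL Server; scheduler counts
  -- should match the vCPU layout configured on the host.
  SELECT node_id,
         node_state_desc,
         memory_node_id,
         online_scheduler_count
  FROM sys.dm_os_nodes
  WHERE node_state_desc <> N'ONLINE DAC';  -- skip the dedicated admin connection node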

On top of this, we looked through the SQL Server instance and the code executed against it for performance-tuning opportunities and discovered several indexing changes that could be made. We also adjusted instance settings (see the sketch after this list), such as:

  • “max degree of parallelism,” so it was no more than half the number of CPU cores per virtual NUMA node
  • “cost threshold for parallelism,” raised from the default of 5 to 50.
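Both of these settings live in sp_configure. The values below are illustrative only; the MAXDOP value assumes (hypothetically) eight CPU cores per virtual NUMA node:

  -- Enable advanced options so the parallelism settings are visible.
  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;

  -- Illustrative only: half of an assumed 8-core virtual NUMA node.
  EXEC sp_configure 'max degree of parallelism', 4;

  -- Raise the cost threshold for parallelism from the default of 5 to 50.
  EXEC sp_configure 'cost threshold for parallelism', 50;
  RECONFIGURE;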

Once these changes were completed, the system was performing much better. The average CPU load of the SQL Server was down from 50 percent to 11 percent, and the CPU spikes were down from 100 percent to 46 percent, as shown in the graph below.

And the daily process that once took 20 minutes was now running in under a minute, the fastest it had ever run.

An Ounce of Prevention is Worth More Than a Pound of Cure

Neither paying nor refusing to pay the ransom guarantees recovery. Recently, another company that did pay the ransom was unable to recover, and 300,000 people lost their jobs. Preventative measures are the best vaccine you can employ.

  • Ensure the normal accounts used day-to-day by the admin team have no access to the production servers. This is normally achieved by having separate “admin” accounts which do not have access to the user’s email account but do have access to the servers. In a perfect world, an “admin” account would have no ability to log on to any user’s desktop.
  • Allow only very limited access from the workstations to the production servers. If possible, allow only remote desktop access to the production servers, and possibly access to specific services such as the SQL Server service.
  • Prevent access from the production servers to the Internet unless access to specific websites is required. This prevents any virus, spyware, or botnet software from connecting to its command-and-control server to receive instructions. While this will likely make it more complicated to download software onto the servers, it makes it incredibly hard for malware to reach the Internet.

These sorts of precautions prevent unauthorized software from gaining access to the production servers. Even if the workstations were compromised, the servers would be able to continue to function, allowing business to continue.

Finally, routinely test your backups. Having proper backups and testing them regularly will tell you before a catastrophic event whether there is a problem with the restore process. It’s also a good idea to keep the documentation for bringing back critical infrastructure stored somewhere a catastrophic event like ransomware can’t reach (either a hard copy or a copy shared online, outside the affected environment).
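A minimal test pass might look something like the sketch below; the database name, logical file names, and paths are placeholders. It verifies that the latest full backup is readable, then restores it to a test server under a different name and runs an integrity check against the copy:

  -- Confirm the most recent full backup is readable and its checksums are valid.
  RESTORE VERIFYONLY
  FROM DISK = N'\\backup-share\SalesDB\SalesDB_full.bak'   -- placeholder path
  WITH CHECKSUM;

  -- Restore it to a test server under a different name...
  RESTORE DATABASE SalesDB_RestoreTest
  FROM DISK = N'\\backup-share\SalesDB\SalesDB_full.bak'
  WITH MOVE N'SalesDB'     TO N'T:\RestoreTest\SalesDB_RestoreTest.mdf',   -- placeholder logical/physical names
       MOVE N'SalesDB_log' TO N'T:\RestoreTest\SalesDB_RestoreTest_log.ldf',
       REPLACE;

  -- ...and make sure the restored copy is actually consistent.
  DBCC CHECKDB (SalesDB_RestoreTest) WITH NO_INFOMSGS;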

Ransomware Isn’t Going Away

Ransomware won’t be disappearing any time soon, nor will operator errors. Revisit your DR plans regularly, and run security best-practice checks at random intervals. However prepared you may be, ransomware is adapting, just as each company is adapting in its ability to combat it. Be sure you have the resources to react with speed once attacked – negotiating NDAs with contractors can cost valuable hours that could otherwise be spent recovering and preventing losses.

ABOUT THE AUTHOR

Denny Cherry

Denny Cherry is a world-renowned author, speaker, and Microsoft MVP as well as the principal and founder of award-winning Gold Microsoft Partner Denny Cherry and Associates Consulting. DCAC assists companies with achieving Azure, SQL, and big data goals while finding ways to save on costs. With clients ranging from Fortune 50 corporations to small businesses, the commitment to each is the same: to provide a deft, high-speed IT environment.
