3-2-1-1-0 Rule for Backups: The New Gold Standard

Gold bars being pulled from a water bath

Revisiting An Old Friend

The 3-2-1 rule for backups has been around for decades. It's a rather simple principle you can find documentation on almost anywhere backups are discussed. The rules are as follows:

  • 3 copies of your data
  • 2 media types
  • 1 copy is off-site

3 Copies of Your Data

The principle is simple: you keep at least 3 copies of your data, consisting of the following:

  • 1 working copy (your production data)
  • 1 local backup copy for fast backup & restoration tasks
  • 1 off-site copy for disaster recovery

2 Media Types

This one is hotly debated and means different things depending on who you talk to. On the extreme end, some will say that it means using vastly different storage technologies, such as Hard Disk Drives (HDD) as one media type and Linear Tape-Open (LTO) as another. The thought is that different "media types" avoid a common mode of failure, such as a bug or virus that exploits a weakness in a shared operating system, file system, hard drive model, etc. By having completely different media types, it's presumed that no single mode of failure could corrupt all 3 of your data copies.

Others will take a less extreme approach, but at the very least everyone agrees that your backups cannot share the same storage media with one another. I have personally witnessed people configuring backups to a second partition of the very hard drive that their production data resides on. This is a clear breach of the second rule by anyone's standards. So, at the bare minimum, your 3 copies should be stored on 3 different "storage mediums."

1 Copy is Off-Site

This last one, being off-site, was initially intended for disaster recovery purposes, basically protecting against things like the building burning down. Depending on how you get your data off-site, it may not be the latest and greatest copy of your data. You may experience some data loss compared to your production environment if, for instance, you only send backups off-site in the evening hours. But the thought process is that it's better to have something than nothing.

The New Gold Standard

The 3-2-1 rule has been a great place to start, and it's amazing how many people don't follow it. But the new sheriff in town is the 3-2-1-1-0 rule. It builds on the old standard and closes the gaps opened up by a new threat landscape and the new technologies now in use. The new standard is as follows:

  • 3 copies of your data
  • 2 media types
  • 1 copy is off-site
  • 1 copy is off-line
  • Zero defects in your backups

The first 3 rules, as you can see, are the same. It's the last two that need some discussion.
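As a rough illustration, the five rules can be written down as a small checklist and evaluated against an inventory of your copies. This is only a sketch: the `BackupCopy` fields, the `check_3_2_1_1_0` function, and the sample inventory are all made up for this example, and deciding what really counts as a different "media type" or as "off-line" is still a judgment call, as discussed in the sections around this one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. Every field name here is illustrative."""
    name: str
    media: str                 # e.g. "ssd", "hdd", "lto", "object-storage"
    offsite: bool              # stored away from the production site
    offline: bool              # unreachable from production outside a backup job
    verified: bool             # last integrity check or restore test passed
    production: bool = False   # True for the working copy


def check_3_2_1_1_0(copies: List[BackupCopy]) -> Dict[str, bool]:
    """Return a pass/fail answer for each of the five rules."""
    backups = [c for c in copies if not c.production]
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 off-site": any(c.offsite for c in copies),
        "1 off-line": any(c.offline for c in copies),
        # Only the backups are tested; there must be at least one of them.
        "0 defects": bool(backups) and all(c.verified for c in backups),
    }


# Hypothetical inventory: a classic 3-2-1 setup with an always-connected,
# never-tested cloud copy.
copies = [
    BackupCopy("production", "ssd", offsite=False, offline=False,
               verified=True, production=True),
    BackupCopy("local-nas", "hdd", offsite=False, offline=False, verified=True),
    BackupCopy("cloud", "object-storage", offsite=True, offline=False,
               verified=False),
]

report = check_3_2_1_1_0(copies)
for rule, ok in report.items():
    print(f"{rule:14} {'pass' if ok else 'FAIL'}")
```

With this sample inventory the first three rules pass, while "1 off-line" and "0 defects" fail, which is exactly the gap between the old rule and the new one.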

1 Copy is Off-Line

The adage that "the only safe computer is one that's turned off" is more or less what this rule hints at. Depending on how you conduct your off-site backups, the off-site copy from the earlier rule may fulfill this rule too. In my best grandpa voice: "Back in the old days, we had to back up all of our data to big tape drives and someone would drive the tapes to the backup storage facility"... and with that particular off-site backup scheme you would also have fulfilled your "off-line backups."
Fast forward to today, and high-speed connections to almost anywhere in the world are available not just to big business, but to small businesses and even individuals. Furthermore, these data connections are often persistent, running over site-to-site VPNs, MPLS, SD-WAN, or some other site-bridging technology. If you are trying to protect against a building burning to the ground, these persistent connections won't usually present any problem. However, the threat landscape now includes ransomware, some of which is smart enough to seek out and destroy or disable your backups, and that turns those persistent connections into a real liability.

When Off-Line Isn't?

While in an ideal situation your backups would be truly "off-line"... aka "powered off"... the reality is that the off-line copy just needs to be controlled in such a fashion that it is inaccessible from the systems holding your other two copies. If, for instance, you send your data off to a cloud storage provider that you only have access to via API calls and only during the specified backup window, then for all intents and purposes those backups are "off-line" with respect to your production system, even though the remote servers are still clearly up and running. That's not to say they are invulnerable, though. There's a real possibility that an attacker with some understanding of your backup system could initiate the control session with that remote station and corrupt your backups. In fact, even if you use an LTO system, it's possible for an attacker to hijack the tape library and corrupt your backups.
So, short of actually powering a system down, or pulling drives or tapes and placing them on a shelf, how does one deal with this in an environment that demands as much automation and as little human intervention as possible? Immutable backups!

Immutable Backups

These are relatively new but help (can't stress "help" enough) keep the automation going while adding an extra layer of protection. With immutable backups, a lock is placed on the data when it is written to the storage device. The lock is similar to a time capsule that says "do not delete or modify until XYZ." Once that date has passed, the lock is lifted and the user or applications are free to do as they please (pruning or purging old backups). But until that time passes, not even the highest privileged user on the system has rights to delete or modify the data. Ransomware will therefore be stopped cold in its tracks from compromising your data, right? WRONG! Well... mostly wrong. While it can't alter or delete the files, it may still be able to reach into the underlying subsystems of the host device and flat out destroy the drive partitions which contain the data. It's not much in the way of a ransom at that point... but still a noteworthy issue for consideration (and bear in mind that the disgruntled employee is just as much a threat as a virus).
The point is to plan appropriately so that the most likely attack vectors in your environment are covered. Having an always-on connection between all of your backups is the worst-case scenario, with the highest risk of something like a ransomware attack being able to destroy everything. If you can't achieve a true "off-line" status with one of your data copies, there are a multitude of ways to achieve a pseudo-off-line status that enables automation while mitigating some of the more prevalent threat vectors.
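To make the "time capsule" lock described above a little more concrete, here is a minimal in-memory sketch of the behavior. It is a toy model under stated assumptions, not how any particular storage product implements immutability: the `ImmutableStore` class, its method names, and the `RetentionLockError` exception are all invented for illustration, and the current time is passed in explicitly so the example is repeatable.

```python
from datetime import datetime, timedelta, timezone


class RetentionLockError(Exception):
    """Raised when a locked object is modified or deleted too early."""


class ImmutableStore:
    """Toy stand-in for storage that enforces a retention lock."""

    def __init__(self):
        self._objects = {}  # name -> (data, locked_until)

    def __contains__(self, name):
        return name in self._objects

    def _check_unlocked(self, name, now):
        # There is deliberately no "admin override" argument: the lock
        # applies to every caller until the retention date has passed.
        if name in self._objects:
            locked_until = self._objects[name][1]
            if now < locked_until:
                raise RetentionLockError(
                    f"{name} is locked until {locked_until.isoformat()}"
                )

    def write(self, name, data, retain_for, now):
        """Store data and lock it for the duration given by retain_for."""
        self._check_unlocked(name, now)  # blocks overwriting a locked object
        self._objects[name] = (data, now + retain_for)

    def read(self, name):
        return self._objects[name][0]

    def delete(self, name, now):
        self._check_unlocked(name, now)
        del self._objects[name]


store = ImmutableStore()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store.write("monday.bak", b"payload", retain_for=timedelta(days=30), now=t0)

try:
    store.delete("monday.bak", now=t0 + timedelta(days=1))
    blocked = False
except RetentionLockError:
    blocked = True  # deletion inside the retention window is refused

store.delete("monday.bak", now=t0 + timedelta(days=31))  # pruning now allowed
```

Note what the model does not cover: the lock lives inside the store itself, so anything that can destroy the store (the drive partitions, in the real-world case) bypasses it entirely.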

Zero Defects In Your Backups

There's an old saying that goes "trust but verify." This final rule is basically an embodiment of that statement. You go to great lengths on hardware, software, planning, etc. But it's all for naught if, on the day you need your backups, you find out that they don't work. Depending on where you look, you'll find statistics quoting all kinds of shocking numbers. The short version of them all is simply that an amazing number of people never bother to test whether they can actually recover from their backups. Going hand in hand with that is the crazy number of backups that simply fail to work when they are finally called upon to do the very job for which so much time, money, and effort was spent.
Sometimes these failures are human error, such as backing up only the data partition of a server and forgetting to also back up the boot partition so that the server can be restored. Other times it can be as innocent as the underlying storage media undergoing "bit-rot" and rendering an otherwise good backup useless. Regardless of the failure mechanism, if your backups don't work in your moment of need, you've just purchased a ticket to be a part of another grim statistic... the number of companies that close their doors within X years of a significant data loss event (again, statistics vary but are nonetheless dismal).
There are several ways to go about verification. One is to do a complete read of your data and run a CRC or hash check. Others will add a partial or complete restore of the data. Some will go even further and not only restore the data, but also run a battery of tests against it in an isolated lab environment. As an example, Veeam Backup & Replication has a feature called SureBackup which can do this type of testing as an automated scheduled task. But regardless of what you use, the underlying point is simply: trust but verify!
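The simplest of these methods, a full read plus hash check, can be sketched in a few lines. The example below assumes the backup is just a directory tree of files and that you record a manifest of SHA-256 hashes at backup time; the function names (`sha256_of`, `build_manifest`, `verify`) are made up for this example, and real backup products use their own formats and tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

You would call `build_manifest` right after the backup is written, store the manifest somewhere other than the backup media, and run `verify` on a schedule. An empty list means every file still reads back with the hash it had at backup time. Keep in mind that this only catches corruption such as bit-rot; it says nothing about whether the backup contains everything needed for a restore, which is why restore tests go a step further.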

Summary

While the 3-2-1 rule has served as the gold standard of backups in the IT community for many years, there are inherent shortcomings in the limited list of requirements it provides. Solving some of those problems is possible through the implementation of the 3-2-1-1-0 rule. But bear in mind that although it is titled a "rule," it is really more of a "guideline," and simply following it is not a get-out-of-jail-free card. In fact, no plan, no matter how elaborate or well executed, will ever be 100% perfect. Any backup plan requires balancing the likely risks against cost and complexity. The 3-2-1-1-0 rule simply establishes a foundational set of questions you need to answer during the planning and ongoing implementation of any given solution.

Note: Post migrated from old Blogger website.