
Lessons Learned In Pain: Disk Defragmentation

The joke is that the definition of an expert is someone who has made every possible mistake in a very narrow field of study. Well... apparently I'm not an expert yet, because I just made a mistake this past week. Hopefully, by sharing this, someone else can learn from my mistake and avoid the pain I just went through.

Thankfully, this mistake wasn't an earth-shattering issue. More of an annoyance than anything, really. Still... not something I would want to go through again. So, without further ado... I share with you this "Lesson Learned In Pain: Disk Defragmentation".


As a Managed Service Provider (MSP), one of my services is to provide storage for off-site backups, and, as one would expect, I manage the servers those backups land on. One of the storage servers is winding down its service life but is not yet completely empty. There are still a handful of backups (most of them for company-internal use) that have either not aged off yet or have not been moved to another storage repository. The volume these backups are stored on uses the ReFS file system and takes advantage of Block Cloning to help save space.

This particular volume has 20TB of capacity, of which only 3TB is still in use. However, if you go into Windows File Explorer to the root of the drive, select all, then right-click and choose Properties... it reports nearly 6TB of data on the drive. And that's not an error. The reason for the discrepancy is that ReFS Block Cloning gave me roughly a 2-to-1 space savings for that particular set of backups (for what it's worth, many of my ReFS backup repositories are typically closer to 5-to-1). Here's a perfect example of a 5:1 space savings with a screenshot of the two property windows side by side. On the left, a 3TB volume (Drive G:) formatted with ReFS reports that only 2.47TB of the drive is in use. On the right are the file-level properties for the same volume (G:\), this time showing that 12.3TB of data is stored on the volume.


ReFS Block Clone Savings
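If you want to see why the two Properties windows disagree, here's a toy model of block cloning in plain Python. Nothing here is ReFS-specific; the class, the cluster size, and the file names are all invented for illustration. The idea is just this: files are lists of cluster references, and an identical cluster is only charged against physical space once, so the sum of all file sizes (what Explorer's "select all → Properties" adds up) can be far larger than what the volume actually consumes.

```python
import hashlib

CLUSTER = 4  # toy cluster size in "units" (real ReFS clusters are 4 KiB or 64 KiB)

class ToyVolume:
    """Minimal sketch of a block-cloning file system (not how ReFS is implemented)."""

    def __init__(self):
        self.files = {}     # file name -> list of cluster keys
        self.clusters = {}  # cluster key -> cluster bytes, stored only once

    def write(self, name, data):
        keys = []
        for i in range(0, len(data), CLUSTER):
            chunk = data[i:i + CLUSTER]
            key = hashlib.sha256(chunk).hexdigest()
            self.clusters.setdefault(key, chunk)  # identical cluster? reuse it (the "clone")
            keys.append(key)
        self.files[name] = keys

    def logical_size(self):
        # What "select all -> Properties" reports: every file counted at full size.
        return sum(len(keys) * CLUSTER for keys in self.files.values())

    def physical_size(self):
        # What the volume actually consumes: each unique cluster charged once.
        return len(self.clusters) * CLUSTER

vol = ToyVolume()
vol.write("full1.vbk", b"AAAABBBBCCCC")
vol.write("full2.vbk", b"AAAABBBBDDDD")  # differs from full1 only in its last cluster
print(vol.logical_size(), vol.physical_size())  # 24 units of "data", 16 units on disk
```

Two 12-unit "full backups" that share their first two clusters report 24 units of data while consuming only 16 units of disk; scale the same effect up and you get the 6TB-of-data-in-3TB-of-space situation on my volume.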


So, how is all of this a problem? Well... in my wisdom, I decided I was going to defragment the volume. While the Windows Drive Optimizer (formerly Disk Defragmenter) tool "works" (and it's all I had been using on this volume up until now), I've found through experience on other systems that it's not actually all that good at defragmenting, and in particular at consolidating free space. Since I was planning on compacting this volume, which benefits from having as much contiguous free space as possible, I wanted something better than the stock Windows defrag tool.

There are dozens, if not hundreds, of defragmentation tools on the market to choose from, but I decided to go with an old favorite of mine... O&O Defrag. I've been using that program on and off for over two decades with great success, so I figured, why not use it in this case too? Well... before jumping into the deep end of the pool, I did what I felt was sufficient due diligence to verify that it would work. An initial generic Internet search turned up nothing about block cloning and their software. I then searched the O&O website and their document repository without finding a single mention of ReFS, much less block cloning. So, I sent an email to O&O Software asking if their defrag utility was safe to use with ReFS, and in particular with block cloning. The reply I got back from one of their senior technical support engineers was a wee bit confusing (ESL or Google Translate, perhaps?... it's a German company), but in general the email said to go ahead and use the "SPACE" defragmentation profile... So... I did. (BTW, there are a bunch of profiles to pick from... defrag by name, last accessed, last modified, etc., each with a different optimization algorithm.)

Well... 45 hours later, the drive finally finished its defragmentation. And... that's when I noticed what it had done in the process. Apparently the O&O Defrag software knows how to deal with the ReFS file system in general terms... but NOT how to deal with Block Cloning within ReFS. As it moved data around on the drive during the defrag process... it fully rehydrated all the backup files, essentially wiping out all the block cloning savings. So, now the data on that volume takes up nearly 6TB of space! 😢
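My best guess at what happened, sketched as a toy model in plain Python (every name below is invented for illustration, and a real defragmenter works on on-disk extents, not Python lists): a clone-aware move would relocate a shared cluster once and fix up every file that references it, but a clone-unaware defrag simply reads each file and writes its data back out to fresh locations, leaving every file with its own private copy of clusters that used to be shared.

```python
# Toy "disk": a list of clusters. Files are lists of disk indices, and two
# files referencing the same index are sharing that cluster (block cloning).
disk = ["A", "B", "C"]
files = {
    "full1.vbk": [0, 1],        # shares clusters 0 and 1...
    "full2.vbk": [0, 1, 2],     # ...with this file
}

def physical_clusters():
    """Clusters actually consumed: each referenced index counted once."""
    return len({i for refs in files.values() for i in refs})

def naive_defrag():
    """Clone-unaware defrag: rewrite every file's data to fresh clusters."""
    global disk
    new_disk, new_files = [], {}
    for name, refs in files.items():
        new_refs = []
        for i in refs:
            new_disk.append(disk[i])        # each file gets a private copy,
            new_refs.append(len(new_disk) - 1)  # so the sharing is lost
        new_files[name] = new_refs
    disk = new_disk
    files.clear()
    files.update(new_files)

before = physical_clusters()  # 3 clusters consumed while the clones are shared
naive_defrag()
after = physical_clusters()   # 5 clusters consumed: the files are fully rehydrated
```

The "after" state is exactly what I ended up with: the same logical data, but every formerly shared cluster duplicated, which on my volume meant roughly doubling the space consumed.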

As stated in the beginning... this isn't an earth-shattering issue. More of an annoyance than anything, really. In hindsight, I could have done a "trust but verify" trial run, standing up a test environment and verifying the results before turning the tool loose on a production system. At the time, I didn't feel that was necessary because:

  1. I've used their product hundreds of times before on other systems without a single issue.
  2. O&O Defrag has been around for well over two decades, has won several awards, and was even certified by Microsoft for all of its "current NTFS-based operating systems" (sooo... not for Windows 98 and earlier, I guess???).
  3. I had an email confirmation from one of their senior technical support engineers giving me the thumbs up to go ahead with using the product on an ReFS volume with block cloning. 

In any case, I guess the ultimate moral of the story is "trust but verify." If it's the first time you're attempting something, it may be worth your while to set up a test environment and try it yourself before taking someone's word that it will work. In this case, I could honestly have lost all the data on this drive and not been any worse for wear. It's just old backups, after all. In the end, I still have all the data; I just managed to lose all the block cloning benefits that went with it.

📌A Final Note: I am not writing this article to disparage or discredit O&O Software in any way. As I've stated, I've used their products hundreds of times in the past with much delight. I have also written to them about my experience with the ReFS block cloning issue, and hopefully their development team will be able to adjust the software in the near future. The main point of this article is simply that you need to mind your P's and Q's when doing something new. This was my first time using the software with ReFS. I fully intend to keep using it, only now with a bit more knowledge of what it can and can't do.
