The 5 Principles of a Solid Backup System
Backups are one of those things you don't think about until disaster strikes. Whether it's a hardware failure, theft, ransomware, or fire, having a reliable backup system that you trust can save you from losing valuable data. I've recently been involved in making some serious changes to our backups. The previous system worked, but there were too many ways in which it could be better. I believe there are 5 core principles to proper backups:
- Simplicity
- Rotation/Pruning
- Monitoring/Visibility
- Testing
- Redundancy
Simplicity
Any semi-tech-savvy person should be able to figure out where the data is and create a copy of it.
We were using Bup to store our backups. It's a clever bit of software inspired by Git that uses deduplication and compression to save space, but that was part of the problem: it was totally opaque. You needed special software just to list your files, it took a long time to run, and while it's open-source, it was far from simple.
The new approach is simply to keep copies of files on disk; the only trick is to use hardlinks to save space. Hardlinks are a filesystem feature that allows multiple file names to point to the same data on disk, saving space while keeping files accessible like regular files, a bit like having two entries in an index that point to the same page.
Hardlinks have been supported on Linux and Windows since the 1990s. They are a fundamental feature of filesystems and aren't going to stop being supported; it's hard to say the same for any other backup solution out there. They also don't require the user to have any special knowledge: the files can be copied, moved, etc. just like regular files.
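You can see this for yourself in a couple of commands (the file names here are just for illustration):

```sh
echo "hello" > original.txt
# Create a second name for the same data - no special tooling required
ln original.txt hardlink.txt
# Both names show the same inode number; the data exists on disk only once
ls -li original.txt hardlink.txt
```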
Rotation/Pruning
Storing more than you need means longer backup/restore times and, while disks are cheap, they have a finite capacity. Storing PII for years? That is bad practice at best and a liability at worst.
Our backups were rarely rotated. Bup did support it (`bup prune-older`), but I didn't trust it to get it right, and it was hard to verify after the fact; it was a black box.
We were also unable to prune a subset of files from all historic backups. For example, we keep backups of client data, databases, assets, code, etc., and when we stop working with a client we have no reason to keep their data. It should be deleted, but it was stuck in the backups.
Careful consideration of the structure of your backups can make pruning a lot easier. In my experience, grouping your backups by ownership, for example per client, makes the most sense, rather than by database or service. It allows flexibility and makes discovering what data is stored much easier. It also makes it easier to do things like having a different rotation policy per client, or deleting a single client entirely.
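As a rough sketch of what this looks like on disk (the paths and client names are illustrative, not our actual layout):

```sh
# /backups/clientA/2024-06-01/
# /backups/clientA/2024-06-02/
# /backups/clientB/2024-06-02/
# When you stop working with clientA, pruning is one easy-to-verify step:
rm -rf /backups/clientA
```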
Monitoring/Visibility
Backups are one of those silent background tasks. Are they running? Did they back up 0 bytes? You need to know.
Things change, software updates break things, disks fill up, there are a million reasons why your backups could silently stop working correctly. Backups need constant and automated monitoring.
Every time one of our backup processes finishes it reports at least 4 metrics to Prometheus:
- Whether the backup succeeded
- How long it took to process
- The size of the backup
- The timestamp
Alerts are then sent if the backup failed to run, hasn't run in the last 24 hours, or if its size changed by more than ±10% from the average. The last one catches issues where the backup process succeeds but only copies a subset of the expected data.
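One common way to report metrics like these from a batch job is Prometheus's Pushgateway; the address, metric names, and values below are assumptions for illustration, not our exact setup:

```sh
# Push the run's metrics once the backup finishes (hypothetical names/values)
cat <<EOF | curl --data-binary @- http://pushgateway.internal:9091/metrics/job/backup/client/clientA
backup_success 1
backup_duration_seconds 342
backup_size_bytes 1073741824
backup_completed_timestamp $(date +%s)
EOF
```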
Testing
Until you've tested that your backups restore correctly, you don't have any backups at all.
There are horror stories of organisations losing large amounts of data after trying to restore from backups following an incident, only to find out that the backups which had been 'completing' were corrupt. How you go about testing is entirely dependent on what you're backing up, but remember: backups are your safety net, and they only work if they're set up correctly and tested. It's time well spent; future-you will thank now-you.
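As a minimal sketch for file-based backups (the paths are hypothetical): restore the latest copy into a scratch directory and compare it against the source:

```sh
# Restore the most recent backup somewhere disposable
rsync -a /backups/clientA/latest/ /tmp/restore-test/
# Compare against the live data; differences may just be changes made since
# the backup ran, but unexpected gaps or unreadable files are a red flag
diff -r /srv/clientA/data /tmp/restore-test && echo "restore OK"
```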
Redundancy
There is a well-known rule called the 3-2-1 rule of backups: 3 copies of your data, on 2 different media, with 1 offsite.
How well you need to protect your data depends on a combination of how bad it would be if you lost the data and how likely that is to happen. A cold offsite copy is a hard requirement in my opinion; cloud solutions are technically offsite, but it only takes one leaked password or phishing email and it's gone. A physical hard disk in a separate secure location is ideal.
Technical Notes
Some useful notes to bear in mind if you are creating your own backup solution:
- Be nice - Tools like `ionice` and `nice` are an easy way to prevent your backup processes from interfering with other workloads
- Rsync is your friend - It's battle-tested, well known, and can do partial transfers. It also natively supports creating hardlinked differential copies via the `--link-dest` option (see the sketch after this list)
- Make files immutable - If you're using an ext filesystem you can mark files as immutable with `chattr +i` (and clear the flag again with `chattr -i`), which helps further protect backups. It adds an extra step to rotation/pruning, but it's worth it to prevent accidents.
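Putting the first two notes together, here's a rough sketch of a nightly job; the paths, host, and naming scheme are assumptions, not a drop-in script:

```sh
#!/bin/sh
# Hypothetical nightly backup for a single client
SRC="backup@server.example:/srv/clientA/"
DEST="/backups/clientA"
TODAY=$(date +%F)

# Run at low CPU and I/O priority so other workloads aren't starved.
# --link-dest hardlinks any file unchanged since the previous copy, so each
# dated directory looks complete but only new data costs space.
# (On the very first run "latest" won't exist; rsync warns and does a full copy.)
nice -n 19 ionice -c 3 \
  rsync -a --link-dest="$DEST/latest" "$SRC" "$DEST/$TODAY"

# Point "latest" at the new copy, ready for the next run
ln -sfn "$DEST/$TODAY" "$DEST/latest"
```

One caveat when combining this with the immutability note: an immutable file cannot be hardlinked, so applying `chattr +i` to every file will fight `--link-dest` deduplication; it needs to be applied selectively.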