Things about backing up your data

You need to back up your data.

tl;dr: There is no such thing as “a backup”. Do all the things.

To understand why, and to choose the most appropriate way to back up your data, you need to know what your threat model is – what you are protecting against.

For the purposes of this discussion, we’ll assume that all data is contained in files. A file tree refers to all files and folders (subdirectories) underneath a root directory.

Some common scenarios:

  1. Your entire computer fails or is lost.
  2. A file is accidentally deleted.
  3. A file is corrupted through program crashes, power loss, or physical disk errors.
  4. You need to access your data on another device.
  5. Malware deletes, corrupts, or encrypts your files.

Some central tenets of backups:

  1. A backup is a copy of known good data that can be restored if needed.
  2. The best backup solutions involve not only multiple backups of each piece of data, but multiple kinds of backups. One backup is of course better than none, but many professionals consider a single backup to be little better than having no backups at all. A good backup strategy involves at least two copies, preferably more, with at least one of them in a different physical location.
  3. Data can be split, more or less, into three broad categories that will likely need to be treated differently – content data (your files), configuration data (preferences, encryption keys, configuration files), and caches (temporary data that can usually be reconstructed, though doing so may be time consuming, or that only matters to a running process).
  4. Metadata about files (permissions, modification times, etc.) may or may not be important. For content data it is generally not critical, but for some configuration files it can be. This may differ if you’re backing up data for multiple users at once.

Kinds of Backups:

Files can be individually copied one at a time. Generally this is the weakest kind of backup and is reserved for when you know you’re making potentially dangerous edits and may need to restore a single file.
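As a rough illustration of the single-file case, here is a minimal Python sketch that copies a file next to itself before a risky edit. The function name `copy_with_timestamp` and the `.bak` naming scheme are made up for this example, not a convention of any tool.

```python
import shutil
import time
from pathlib import Path


def copy_with_timestamp(path: str) -> Path:
    """Copy a file next to itself, adding a timestamp to the name."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: notes.txt -> notes.<timestamp>.bak.txt
    dest = src.with_name(f"{src.stem}.{stamp}.bak{src.suffix}")
    # copy2 copies the contents and tries to preserve metadata
    # such as the modification time.
    shutil.copy2(src, dest)
    return dest
```

Calling `copy_with_timestamp("notes.txt")` leaves the original untouched and returns the path of the copy, which lives on the same disk and so shares all of its risks.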

A file sync refers to making a single exact copy of a file tree. Syncs protect against the loss of an entire computer, subject to the last time they were synced. Syncs usually will not protect against file deletions or corruption, and they can be dangerous in that both deletions and corruption may propagate to the synced copy. Some file sync services may have a “trashcan” or “recently deleted files” list where you can restore files, usually within a certain time window (commonly somewhere between 7 and 60 days).
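To make the propagation risk concrete, here is a small one-way sync sketched in Python. It is an assumption-laden toy (the `mirror` function is hypothetical, and it ignores symlinks and files that change type), not how any particular sync service works.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(source: Path, target: Path) -> None:
    """Make target an exact copy of source (one-way sync)."""
    target.mkdir(parents=True, exist_ok=True)
    # Copy new or changed files from source to target.
    for src in source.rglob("*"):
        dst = target / src.relative_to(source)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() or not filecmp.cmp(src, dst, shallow=False):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    # Remove anything in target that no longer exists in source.
    # This is the step that propagates accidental deletions.
    # Reverse order visits children before their parent folders.
    for dst in sorted(target.rglob("*"), reverse=True):
        src = source / dst.relative_to(target)
        if not src.exists():
            if dst.is_dir():
                dst.rmdir()
            else:
                dst.unlink()
```

If a file is deleted or corrupted in the source, the next run of `mirror` faithfully deletes or overwrites the copy as well, which is exactly why a sync on its own is a weak backup.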

Incremental or versioned backups keep point-in-time copies of a file, so you can restore it as it existed at an earlier date. This usually provides the best protection against local data loss.
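The versioning idea can be sketched in a few lines of Python. This toy takes a full copy of the tree on every run, which is an assumption made for simplicity; real tools avoid storing unchanged files again. The names `snapshot` and `list_versions` are hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot(source: Path, backup_root: Path) -> Path:
    """Copy source into a new timestamped folder under backup_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / stamp
    # Avoid clobbering an earlier snapshot taken in the same second.
    n = 1
    while dest.exists():
        dest = backup_root / f"{stamp}-{n}"
        n += 1
    shutil.copytree(source, dest)
    return dest


def list_versions(backup_root: Path, relative_path: str) -> list[Path]:
    """Return every saved version of one file, oldest first."""
    return sorted(p / relative_path for p in backup_root.iterdir()
                  if (p / relative_path).is_file())
```

Because old snapshots are never modified, a file that is later deleted or corrupted in the source can still be pulled out of an earlier folder.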

RAID is not a backup, but levels with redundancy (such as RAID 1, 5, or 6) make it easier to survive the death of a single drive with less hassle, and to hot swap a failed drive.

Local vs. Offsite:

Your internet speed and the amount of data you have to back up will greatly affect the utility of online backups for you. This is one of the best use cases for fast upload speeds!
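A quick back-of-the-envelope calculation shows how much upload speed matters. The sketch below assumes the link runs at its full rated speed the whole time, which real connections rarely do; the 1 TB and 20 Mbps figures are made up for illustration.

```python
def upload_hours(data_gb: float, upload_mbps: float) -> float:
    """Estimate hours to upload data_gb gigabytes at upload_mbps megabits/s."""
    megabits = data_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    seconds = megabits / upload_mbps
    return seconds / 3600


# Hypothetical example: 1 TB of data on a 20 Mbps upload link.
print(f"{upload_hours(1000, 20) / 24:.1f} days")  # prints "4.6 days"
```

At ten times the upload speed the same initial backup takes about half a day, which is the difference between an online backup you will actually finish and one you won't.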

An external hard drive directly connected to the device you’re backing up will be the fastest and most useful connection, but also the most brittle – it can only be used for one device at a time, and it shares the same physical risks as your primary source. Local backups over the network are somewhat more convenient if you have more than one computer to back up, but are also at physical risk (flood, fire, theft, etc.). Local backups may still be vulnerable to malware.

Cloud backups are the slowest, but also distribute the most risk. They also offer a good level of protection against ransomware.

Scheduling and automation:

Backups can be performed manually (on-demand) or automated according to a schedule. Some kinds of manually-triggered backups can be acceptable if they’re part of a rigorous process (and even then I’d recommend that they be scripted but initiated manually), but for the most part, backups should be completely automated and not require any human intervention to happen. Anything that a human has to do, a human will forget to do.

Verification and Restores:

It can be tricky to know if your backups are working properly. A backup that is corrupted or otherwise unusable is worse than no backup, because you think you’re protected but you’re not. It’s sometimes possible to run a full verification to ensure that a backup is an exact copy of the original, but this can be very time consuming. The only real way to know for sure is to do a complete restore and check all of the files, but that may also be impossible to do in a practical way. In that case, spot checking individual files can help give you some more confidence.
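For a backup that is a plain copy of a file tree, a full verification can be sketched as a checksum comparison. This is a minimal Python example under that assumption; the `verify` and `file_digest` helpers are hypothetical, and many backup tools store data in their own formats where this approach does not apply directly.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(original: Path, backup: Path) -> list[str]:
    """List files under original that are missing or different in backup."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        copy = backup / rel
        if not copy.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src) != file_digest(copy):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list means every file in the original has an identical copy. Note that this reads every byte on both sides, so on a large tree it is slow, and a file that changed since the last backup will show up as "differs" even though nothing is wrong.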

What I do:

Obviously your setup and needs may be different, but here’s what I do. I am entirely on Macs and iOS:

  1. iCloud Documents and iCloud Photo Library. These are sync services that provide some level of protection, but they’re mostly for the convenience of being able to access my data across all of my machines.
  2. Google Drive and Dropbox: These are sync services with some limited versioning. I use these sparingly for occasional sync and sharing purposes.
  3. A lot of other services have their own cloud sync these days, and I make some use of them, but don’t depend on them for backups.
  4. Time Machine on a local USB drive. I use this as my primary backup in case I need to rebuild the entire machine or recover a lost or deleted file. It is a full automatic backup of the entire machine with very fine-grained version snapshots and good metadata support, and Apple’s setup process will read directly from it, making it the easiest way to do a full restore.
  5. Local Synology NAS with rsync. This is a sync service I use for making backup copies of specific media folders, automated with a script and run nightly. Synology has some other options for backups which I would probably use if I didn’t have everything else. It doesn’t have great support for file metadata, but that’s fine for what I use it for.
  6. Backblaze. I use this for offsite emergency backups. Restoration is a pain – individual critical files can be downloaded directly, but a full restore involves them shipping you a drive, which takes about two weeks. It doesn’t have great support for file metadata, but if I need to restore from this, things are bad anyway. This is reasonable insurance.

Make sure you keep your credentials/keys for encrypted offsite backup somewhere safe!

In general, the more backups you have and the less you have to think about them, the better.