Data hoarding is a truly unique experience. Just my two cents:
- RAID is not a backup. Don’t use RAID 5 unless you’re using a filesystem like ZFS that checksums your data. RAID 5 is vulnerable to the “write hole”: if power is lost mid-write, data and parity can end up out of sync, and without checksums that corruption goes unnoticed.
- Split your data into smaller, more manageable datasets so you can back each one up in a different way (external drives, cloud storage, etc.). You can then cap each dataset so it never exceeds the size of its backup target.
- Snapshots: use them. Filesystem snapshots make backups more manageable because you only send the data that changed between two snapshots, whereas something like rsync has to scan every file and may end up copying an entire file again.
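To make the snapshot point concrete, here is a rough sketch of one incremental backup cycle on ZFS. It only builds the command strings rather than running them. The dataset, snapshot, and target names (`tank/photos`, `backup/photos`, the dates) are made up for illustration, and the helper function is my own, not part of any ZFS tooling. It also assumes the previous snapshot already exists on both sides.

```python
import shlex


def incremental_send_commands(dataset, prev_snap, new_snap, target):
    """Build the shell commands for one incremental ZFS backup cycle.

    All names passed in are placeholders; nothing is executed here.
    """
    old = f"{dataset}@{prev_snap}"
    new = f"{dataset}@{new_snap}"
    # 1. Take a new snapshot of the dataset.
    snapshot = ["zfs", "snapshot", new]
    # 2. Send only the changes between the old and new snapshot (-i),
    #    and pipe them into the backup pool.
    send = ["zfs", "send", "-i", old, new]
    receive = ["zfs", "receive", target]
    return [
        shlex.join(snapshot),
        f"{shlex.join(send)} | {shlex.join(receive)}",
    ]


if __name__ == "__main__":
    for cmd in incremental_send_commands(
        "tank/photos", "2024-01-01", "2024-01-02", "backup/photos"
    ):
        print(cmd)
```

Review the printed commands before running them yourself; the receive side can just as well sit behind `ssh` on another machine.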
I use ZFS and have found that zstd compression works pretty well for getting extra use out of your disks. Unless you have a lot of RAM and some special metadata NVMe disks, though, don’t use deduplication, as it will seriously hurt performance.
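For reference, the settings above boil down to a few `zfs` property commands. This is a small sketch that just assembles them as argument lists; the dataset name `tank/media` is a placeholder and the helper is hypothetical. It assumes an OpenZFS version recent enough to support `compression=zstd`.

```python
def compression_commands(dataset, algorithm="zstd"):
    """Return zfs argument lists for the compression advice above.

    The dataset name is a placeholder; nothing is executed here.
    """
    return [
        # Enable compression for newly written data on this dataset.
        ["zfs", "set", f"compression={algorithm}", dataset],
        # Keep deduplication off (this is also the default).
        ["zfs", "set", "dedup=off", dataset],
        # Read back the achieved compression ratio.
        ["zfs", "get", "-H", "-o", "value", "compressratio", dataset],
    ]


if __name__ == "__main__":
    for argv in compression_commands("tank/media"):
        print(" ".join(argv))
```

Note that changing the compression property only affects data written afterwards; existing blocks stay as they are.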
Now, if you aren’t using a FOSS system like TrueNAS and are instead on an off-the-shelf box like a QNAP, QNAP’s Hybrid Backup Sync has a really elegant solution for policy-based differential backups to Backblaze B2 storage. Not only does this give you a copy of your data, you also get immutable point-in-time archives of it.
Good luck in your data hoarding endeavors!