Ask HN: How do you deal with data backups in servers?

4 points | posted 12 hours ago
by atomicnature

Item id: 44491029

3 Comments

codegeek

7 hours ago

Some rules for backups that you must follow:

1. Backups must be stored offsite on a separate server (obvious, but surprisingly some people miss this)

2. Backups must be tested frequently. If you cannot test a backup, you don't have a backup.

3. Frequency depends on the criticality of your data, your contract/SLA with your customers, etc. Ideally, you should have point-in-time restore (PITR) going back a certain number of hours/days/weeks

4. Make sure you have notifications for backup failures. If a backup fails, you must be notified so you can correct it manually.

5. Bonus: Have a backup reconciliation script that runs in addition to the regular jobs and reconciles all backups for a certain period (see the sketch after this list).
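A minimal sketch of such a reconciliation script, assuming one dated tarball per host per day under a path like /backups/<host>/<YYYY-MM-DD>.tar.gz (the layout, host names, and paths here are made up for illustration):

    #!/usr/bin/env python3
    """Minimal backup reconciliation sketch (hypothetical layout and paths).

    Assumes one backup file per host per day under BACKUP_ROOT/<host>/<YYYY-MM-DD>.tar.gz.
    Reports any host/day in the lookback window that is missing or empty so that
    a human (or an alerting hook) can follow up.
    """
    from datetime import date, timedelta
    from pathlib import Path

    BACKUP_ROOT = Path("/backups")   # assumption: where the offsite copies land
    HOSTS = ["db01", "web01"]        # assumption: hosts expected to have backups
    LOOKBACK_DAYS = 7                # reconcile the last week of backups

    def expected_dates(days: int) -> list[date]:
        today = date.today()
        return [today - timedelta(days=i) for i in range(1, days + 1)]

    def reconcile() -> list[str]:
        problems = []
        for host in HOSTS:
            for day in expected_dates(LOOKBACK_DAYS):
                backup = BACKUP_ROOT / host / f"{day.isoformat()}.tar.gz"
                if not backup.exists():
                    problems.append(f"MISSING: {backup}")
                elif backup.stat().st_size == 0:
                    problems.append(f"EMPTY:   {backup}")
        return problems

    if __name__ == "__main__":
        issues = reconcile()
        for line in issues:
            print(line)
        # Non-zero exit lets cron/systemd mail or alert on failure (rule 4).
        raise SystemExit(1 if issues else 0)

Run it from cron or a systemd timer; the non-zero exit ties back to rule 4 (notify on failure).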

Bender

11 hours ago

For me, both professionally and personally, having a manifest of all data that is neither part of the OS nor committed to a git repo (i.e., excluding code artifacts that are restored by code deployment) clearly defines what needs to be backed up. This must be tested routinely by restoring only what exists in the role-based manifest, following the role-based procedure, and doing QA testing on the restored nodes. Procedures will vary by role, but there must be a manifest that defines which directories contain live data. Each role must have its own clearly defined procedure for data restoration, and the role must be defined in the manifest. So, for example, DBAs will be responsible for writing the role-based procedure for primary and secondary databases. Ideally, role-based data should be neatly contained in a corporate-specific directory structure, meaning that every role could in theory be restored to a single node, without overlapping ports, for stand-alone QA testing on a developer laptop.
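As a rough illustration of what a role-based manifest and its restore check could look like (the manifest format, role names, and directory paths below are assumptions for the sketch, not a prescription):

    #!/usr/bin/env python3
    """Sketch of a role-based backup manifest check (hypothetical format and paths).

    The manifest maps each role to the directories that hold live data for that role.
    After a test restore, this verifies that every listed directory exists and is
    non-empty on the restored node, which is the bare minimum before QA testing.
    """
    import json
    import sys
    from pathlib import Path

    # Example manifest, normally kept in its own file/repo (assumed structure):
    EXAMPLE_MANIFEST = {
        "database-primary": ["/srv/acme/postgres/data", "/srv/acme/postgres/wal_archive"],
        "web-frontend": ["/srv/acme/uploads", "/srv/acme/tls"],
    }

    def verify_role(manifest: dict[str, list[str]], role: str) -> list[str]:
        """Return a list of problems found in the given role's restored data."""
        problems = []
        for directory in manifest.get(role, []):
            path = Path(directory)
            if not path.is_dir():
                problems.append(f"{role}: missing directory {path}")
            elif not any(path.iterdir()):
                problems.append(f"{role}: directory {path} restored but empty")
        return problems

    if __name__ == "__main__":
        # Usage: verify_restore.py <role> [manifest.json]
        role = sys.argv[1]
        manifest = (json.loads(Path(sys.argv[2]).read_text())
                    if len(sys.argv) > 2 else EXAMPLE_MANIFEST)
        issues = verify_role(manifest, role)
        print("\n".join(issues) or f"{role}: all manifest directories present and non-empty")
        raise SystemExit(1 if issues else 0)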

Personally, I also like to have a local snapshot of live/ephemeral data using rsnapshot so that I can quickly get a node back in service, assuming the backup volume (accessible only by root) has not been tainted or tampered with. OSSEC is one of the many tools that can checksum data and alert on tampering. AuditD with well-written rules is also useful for real-time monitoring. Anti-tampering is an entire topic by itself.
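The checksum-and-compare idea behind such tools can be sketched in a few lines. A toy example, assuming a hypothetical rsnapshot target at /.snapshots and a baseline file at /var/lib/backup-baseline.json (a real deployment should rely on OSSEC/AuditD rather than a script like this):

    #!/usr/bin/env python3
    """Toy checksum baseline for a backup volume (paths are hypothetical).

    This is only the core idea behind file integrity checking: record SHA-256
    hashes of everything under the snapshot directory once, then compare on
    later runs and report anything changed, added, or removed.
    """
    import hashlib
    import json
    from pathlib import Path

    SNAPSHOT_DIR = Path("/.snapshots")            # assumption: rsnapshot target, root-only
    BASELINE = Path("/var/lib/backup-baseline.json")

    def hash_tree(root: Path) -> dict[str, str]:
        hashes = {}
        for path in sorted(root.rglob("*")):
            if path.is_file():
                hashes[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
        return hashes

    def main() -> int:
        current = hash_tree(SNAPSHOT_DIR)
        if not BASELINE.exists():
            BASELINE.write_text(json.dumps(current, indent=2))
            print("baseline recorded")
            return 0
        baseline = json.loads(BASELINE.read_text())
        changed = [p for p in current if p in baseline and baseline[p] != current[p]]
        added = [p for p in current if p not in baseline]
        removed = [p for p in baseline if p not in current]
        for label, paths in (("CHANGED", changed), ("ADDED", added), ("REMOVED", removed)):
            for p in paths:
                print(f"{label}: {p}")
        return 1 if (changed or added or removed) else 0

    if __name__ == "__main__":
        raise SystemExit(main())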

I like to keep these concepts outside of configuration management tools, but design them so they can be easily pulled into such tools. This makes replacing a tool much easier. So if, for example, one's company wants to switch from Chef to Ansible for whatever reason, the process is already a well-known known, allowing a quick, semi-automated migration.

penis123429

an hour ago

Google the "3-2-1 backup rule": three copies of your data, on two different types of media, with one copy offsite.

should be easy