Databases

20 min read

Postgres backups that actually restore

Most teams have backups. Far fewer have backups they have ever restored. This playbook is the small set of habits that means a 3am incident does not become a six-figure incident: daily pg_dump, point-in-time recovery via WAL, off-site to R2, and a monthly restore drill that takes 20 minutes.

Why this matters

For a solo founder, a lost database is the company. Six months of customer records, paid invoices, support history, and product state, gone in the time it takes to type the wrong DROP. For an enterprise the number is bigger but the shape is the same: regulatory fines, a board meeting, a 9KB press release that uses the word "incident" four times. The numbers differ by three zeros; the pit in your stomach is identical.

I once watched a team recover from a corrupted volume by emailing a six month old SQL dump from the founder's laptop, because nobody had checked the nightly job in eleven weeks. They got most of it back. Most. The customers they lost in the gap never came back. The fix is not heroics at 3am, it is fifteen minutes of cron and one calendar invite a month. That is the entire playbook.

A backup you have never restored is not a backup. It is a hope, written to disk.

Three layers of backup

One layer is not a strategy. Three layers cover three completely different failure modes, and they cost almost nothing to run together.

Layer one, the logical dump. A daily pg_dump in custom format. This is your "give me the orders table from yesterday" tool. Human readable schema, portable across Postgres versions, easy to restore a single table from. Cheap, slow on huge databases, but if your database fits in a few gigabytes it is the simplest insurance you can buy.

Layer two, the physical WAL archive. Continuous archiving of write-ahead log segments to object storage. This is your "give me the state at 14:32:01, just before the bad migration ran" tool. Point-in-time recovery (PITR) is the difference between losing 24 hours of data and losing 30 seconds.

Layer three, off-site. Same files, different building. Cloudflare R2 in a different region from your primary, with a lifecycle rule that ages out old segments after 90 days. If the datacentre burns, this is what you restore from. If you skip this layer, the other two are still tied to the fate of one provider.

Daily pg_dump via cron

The dump runs at 03:15 on the database host, owned by a dedicated postgres-backup user with read-only access. Schema goes into its own file so I can grep it without unpacking a multi-gigabyte archive. Data goes into --format=custom, which is compressed and supports parallel restore.

bash
#!/usr/bin/env bash
# /usr/local/bin/pg-daily-backup.sh
set -euo pipefail

BACKUP_ROOT=/var/backups/postgres
STAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
DEST="${BACKUP_ROOT}/${STAMP}"
DB=app_production
PGUSER=postgres-backup

mkdir -p "${DEST}"

# Schema only, plain SQL so it diffs cleanly
pg_dump --host=/var/run/postgresql --username="${PGUSER}" \
  --schema-only --no-owner --no-privileges \
  --file="${DEST}/schema.sql" "${DB}"

# Data + schema, custom format, parallel friendly
pg_dump --host=/var/run/postgresql --username="${PGUSER}" \
  --format=custom --compress=9 --no-owner --no-privileges \
  --jobs=1 \
  --file="${DEST}/full.dump" "${DB}"

# Checksum so we can detect bitrot later
sha256sum "${DEST}/full.dump" > "${DEST}/full.dump.sha256"
sha256sum "${DEST}/schema.sql" > "${DEST}/schema.sql.sha256"

# Hand off to the off-site step
/usr/local/bin/pg-offsite-upload.sh "${DEST}"

# Retention: keep 14 days locally
find "${BACKUP_ROOT}" -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} \;

The cron entry. One line, runs as root, logs to syslog so you actually see failures.

bash
# /etc/cron.d/postgres-backup
15 3 * * * root /usr/local/bin/pg-daily-backup.sh 2>&1 | logger -t pg-backup

Continuous WAL archiving

pg_dump is a snapshot. WAL archiving is the film between snapshots. Every time Postgres finishes a 16 MB write-ahead log segment, it hands the file to an archive command that ships it off the host. Restore the last full base backup, replay the WAL up to a moment in time, you are back to that exact second.

I use wal-g because it is a single Go binary with first-class S3-compatible support, which means R2 works natively. pgBackRest is the heavier, more featureful alternative; pick it if you need delta backups or multi-host orchestration.

ini
# /etc/postgresql/16/main/postgresql.conf
wal_level = replica
archive_mode = on
archive_command = '/usr/local/bin/wal-g wal-push %p'
archive_timeout = 60          # force a segment switch at least every minute
max_wal_senders = 3

wal-g reads its config from environment variables. Drop them in a file owned by postgres, mode 600, and load it from the systemd unit.

bash
# /etc/postgresql/wal-g.env, mode 600, owned by postgres
WALG_S3_PREFIX=s3://sarmalinux-pg-wal/app_production
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com
AWS_REGION=auto
AWS_S3_FORCE_PATH_STYLE=true
WALG_COMPRESSION_METHOD=brotli

Take a weekly base backup so the WAL chain stays short. A four-hour-long PITR is rarely what you want at 3am.

bash
# /etc/cron.d/wal-g-basebackup
30 2 * * 0 postgres /usr/local/bin/wal-g backup-push /var/lib/postgresql/16/main 2>&1 | logger -t wal-g

Off-site to Cloudflare R2

The WAL stream already lives in R2 thanks to wal-g. The daily logical dump also needs to leave the host. Same R2 endpoint, different bucket, configured via the AWS CLI because R2 is S3 compatible at the wire level.

bash
#!/usr/bin/env bash
# /usr/local/bin/pg-offsite-upload.sh
set -euo pipefail

SRC="$1"
STAMP=$(basename "${SRC}")
BUCKET=sarmalinux-pg-dumps
ENDPOINT=https://<account-id>.r2.cloudflarestorage.com

export AWS_ACCESS_KEY_ID=$(cat /etc/postgresql/r2-access-key)
export AWS_SECRET_ACCESS_KEY=$(cat /etc/postgresql/r2-secret-key)

aws s3 cp --endpoint-url "${ENDPOINT}" --recursive \
  "${SRC}" "s3://${BUCKET}/${STAMP}/"

# Notify on failure (the set -e above means we only get here on success)
curl -fsS -X POST "https://api.telegram.org/bot${TG_TOKEN}/sendMessage" \
  -d "chat_id=${TG_CHAT}" \
  -d "text=pg backup ${STAMP} uploaded to R2" >/dev/null || true

The lifecycle rule lives in the R2 dashboard or in a one shot CLI call. Anything older than 90 days is deleted; anything older than 30 days transitions to infrequent access if you ever care about pennies.

json
{
  "Rules": [
    {
      "ID": "expire-old-dumps",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 90 }
    }
  ]
}

Verification on every backup

A successful upload is not a successful backup. Three checks run inside the same cron window: the SHA256 you wrote earlier is re-computed after upload, the dump is opened with pg_restore --list to prove the file parses, and the schema is loaded into a throwaway Docker Postgres to prove it is syntactically valid. Any failure pings Telegram.

bash
#!/usr/bin/env bash
# /usr/local/bin/pg-verify-backup.sh
set -euo pipefail

SRC="$1"
DUMP="${SRC}/full.dump"
SCHEMA="${SRC}/schema.sql"

fail() {
  curl -fsS -X POST "https://api.telegram.org/bot${TG_TOKEN}/sendMessage" \
    -d "chat_id=${TG_CHAT}" -d "text=BACKUP VERIFY FAILED: $1" >/dev/null
  exit 1
}

# 1. Checksum still matches
sha256sum --check "${DUMP}.sha256" >/dev/null || fail "checksum mismatch on ${DUMP}"

# 2. pg_restore can read the table of contents
pg_restore --list "${DUMP}" >/dev/null || fail "pg_restore --list failed on ${DUMP}"

# 3. Schema loads into a throwaway container
CID=$(docker run -d --rm -e POSTGRES_PASSWORD=x postgres:16-alpine)
sleep 4
docker cp "${SCHEMA}" "${CID}:/schema.sql"
docker exec -u postgres "${CID}" psql -d postgres -f /schema.sql >/dev/null \
  || { docker kill "${CID}" >/dev/null; fail "schema replay failed"; }
docker kill "${CID}" >/dev/null

echo "verified ${SRC}"

The monthly restore drill

This is the habit that separates teams who have backups from teams who have a recovery strategy. First Sunday of the month, 20 minute calendar invite, non-negotiable. Spin up a fresh container, restore yesterday's dump, run a smoke query, compare row counts against production. If anything is off, you have a month to fix it before you need it.

bash
#!/usr/bin/env bash
# /usr/local/bin/pg-restore-drill.sh
# Run on the first Sunday of every month, manually, with a stopwatch.
set -euo pipefail

LATEST=$(ls -1dt /var/backups/postgres/*/ | head -n1)
DUMP="${LATEST}/full.dump"

echo "== Restore drill, source: ${DUMP} =="
START=$(date +%s)

# 1. Fresh empty Postgres
CID=$(docker run -d --rm -p 55432:5432 -e POSTGRES_PASSWORD=x postgres:16-alpine)
sleep 5

# 2. Restore in parallel
docker cp "${DUMP}" "${CID}:/full.dump"
docker exec -u postgres "${CID}" \
  pg_restore --dbname=postgres --create --jobs=4 /full.dump

# 3. Smoke query: row counts on the tables that matter
docker exec -u postgres "${CID}" psql -d app_production -c "
  SELECT 'orders' AS t, count(*) FROM orders
  UNION ALL SELECT 'users', count(*) FROM users
  UNION ALL SELECT 'invoices', count(*) FROM invoices;
"

# 4. Compare with prod (you keep yesterday's snapshot in a file)
diff -u /var/backups/postgres/last-counts.txt <(
  docker exec -u postgres "${CID}" psql -d app_production -At -c "
    SELECT count(*) FROM orders;
    SELECT count(*) FROM users;
    SELECT count(*) FROM invoices;
  "
) || echo "WARNING: row counts drifted, investigate"

END=$(date +%s)
echo "== Drill complete in $((END - START))s =="
docker kill "${CID}" >/dev/null

Write the elapsed time into a log. If it ever creeps past 20 minutes, your dataset has outgrown plain pg_dump and it is time to lean harder on WAL plus base backups.

RTO and RPO, honestly

RTO, recovery time objective: how long are you allowed to be down. RPO, recovery point objective: how much data you are willing to lose. Big shops write whole documents about these. For a solo project the honest answer is usually "RTO one hour, RPO one day of WAL". That is what this stack gives you out of the box: a base backup restores in minutes, the WAL replay catches up to within the last archived segment (which, thanks to archive_timeout = 60, is at most a minute old).

If you need tighter RPO than that, you are in streaming replication territory and the playbook gets longer. For most apps under a million users, "one hour, one day" is the right balance between cost and paranoia. Write it down. Put it in the runbook. Now the question at 3am is not "what do we do", it is "are we within budget".

Pitfalls

pg_dump --data-only that loses sequences

A data-only dump does not include sequence values. Restore it and your next insert hits a duplicate key. Always dump schema and data together, or explicitly include `--include-sequences` semantics by dumping both.

Restoring to the wrong Postgres version

A custom-format dump from Postgres 16 will refuse to restore on 14, and may restore on 15 with subtle differences. Pin your verification container to the exact version of production, and upgrade them together.

No compression on the wire

Uploading a multi-gigabyte uncompressed SQL file to R2 every night will eat your bandwidth and your egress budget. Use `--format=custom --compress=9`; the difference is often 10x.

Single point of failure on the backup host

If the backup script runs on the database server, and the database server is the thing that died, your verification is now blocked too. Run the verify and drill scripts from a separate machine that pulls from R2.

Never testing the restore

The most common failure mode by a mile. The drill is the entire point of this playbook. A backup you have never restored is exactly as useful as no backup at all, with the added downside of false confidence.

Wrap up

Three layers, four scripts, one calendar invite. None of it is clever. All of it is the difference between a quiet Tuesday and the worst week of your year. Set it up once, watch the Telegram pings arrive every morning, run the drill on the first Sunday, and forget about it. When something does go wrong (and at some point something always does), the recovery is a documented twenty minutes instead of a documented apology.

Want this done for you?

If you would rather skip the YAK shave and have someone who has done this fifty times set it up properly, that is what I do for a living.

Start a project