Tuesday, April 23, 2013

PostgreSQL Backup Script III (encrypted with shared password)

Backing up every database on every version and each running cluster on a host in a single script. And also encrypting the backupfiles on-the-fly and generating a script for retrieving the encrypted backupfiles from the backup-host... :)
In my previous blogentry I didn't explain any of the prerequisites for using the script. So I'll try to cover these below.
I'm having a postgresql database running in a remote location (backend for some webapps/websites) - and I wanted to expand my backup-script, so that I could download my (encrypted) backupfiles from this location over plain http.
So I have a linuxbox (httpserver) that I use as my backup-host. On this box I create a backupuser:
root@httpserver:~#useradd -m <backupusername>
So now I have a user that is locked (no password) with a home in /home/<backupusername>. I want to only be able to logon with a key, so this is ok. But it's probably easier to give the backupuser a temporary password, so that i can use ssh-copy-id to transfer the key. So lets do that:
root@httpserver:~#passwd <backupusername>

Then on my postgresql linuxbox (logged in as postgres), I generate a ssh-key pair (and do this with no password!):
postgres@postgresqlserver:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.
The key fingerprint is:.....
And then transfer this key to the backuphost:
postgres@postgresqlserver:~$ ssh-copy-id <backupusername>@httpserver
<backupusername>@httpserver's password:
Now try logging into the machine, with "ssh '<backupusername>@httpserver'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
And if I try to log on:
postgres@postgresqlserver:~$ ssh <backupusername>@httpserver
I shouldn't be prompted for a password.
Back on the backuphost - I now can remove the backupusers password, so that it's not possible to connect without the ssh-key. One way to do this is to lock the password.
root@httpserver:~#passwd -l <backupusername>
What happens is that a '!' is placed in front of the password-hash in /etc/shadow - and so it is possible to unlock the password at a later time. (So to keep it clean, just open /etc/shadow and remove the password-hash and leave just the '!').
Create a directory ($BACKUP_HOST_PATH) for the backups with the appropriate (only) write-access for the backupuser and read-access for the http-server.
On the postgresql-server (logged in as postgres) I create a cronscript directory e.g. in ~/cron-scripts/daily/ and save the script as backup_all_postgresql_databases. Make it executable:
postgres@postgresqlserver:~$chmod u+x backup_all_postgresql_databases
Create a file crontab.entries (run the script every day at midnight):
# m h dom mon dow command
0 0 * * * /var/lib/postgresql/cron-scripts/daily/backup_all_postgresql_databases
Load the crontab.entries file.
postgres@postgresqlserver:~#crontab crontab.entries

Since the number of databases, clusters and versions can vary, it comes in handy to just create a list of the backup files that are to be downloaded from the backuphost (httpserver).
And with those words I leave you with the script. Enjoy!
#!/bin/bash
# PostgreSQL backup script (encrypt with shared key)
# Run as postgres user

BACKUP_HOST=replace-with-your-backup-host
BACKUP_HOST_USER=replace-with-your-backup-host-user
BACKUP_HOST_PATH=replace-with-your-backup-host-/path/to/backup/of/postgresql
PASSWORD_FILE=replace-with-your-password-file # make sure that ONLY postgres user has read access to file (chmod 400)
CIPHER=aes-256-cbc # Or choose another - See: man enc
DOWNLOAD_LIST_FILE=replace-with-your-download-list-file
#Using a temporary file to collect list of backupfiles to be downloaded - Don't know if there is a way to use a variable when using pipes in loops (i.e transfering variables to subshells)
DOWNLOAD_LIST_TMP_FILE=/tmp/pg-download-list.tmp
DOWNLOAD_FROM=http://$BACKUP_HOST$BACKUP_HOST_PATH #The download url could be this, but probably isn't, so change it appropriately

# For all running PostgreSQL clusters
awk -- '{ print $1"/"$2 }' <(pg_lsclusters --no-header | grep online) | while read cluster;
do

# Create cluster-path on backuphost
ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "mkdir -p $BACKUP_HOST_PATH/$cluster";

# Global stuff from cluster: roles etc.
pg_dumpall
--cluster $cluster --globals-only | openssl $CIPHER -e -salt -pass file:$PASSWORD_FILE | ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "dd of=$BACKUP_HOST_PATH/$cluster/globals.sql.$CIPHER" > /dev/null 2>&1;
echo "wget $DOWNLOAD_FROM/$cluster/globals.sql.$CIPHER" >> $DOWNLOAD_LIST_TMP_FILE;

# And then each database (except templates)
psql --cluster $cluster --no-align --tuples-only --command="SELECT datname from pg_database WHERE NOT datistemplate" | while read databasename;
do pg_dump --cluster $cluster --format=c -- $databasename | openssl $CIPHER -e -salt -pass file:$PASSWORD_FILE | ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "dd of=$BACKUP_HOST_PATH/$cluster/$databasename.sqlc.$CIPHER" > /dev/null 2>&1;
echo "wget $DOWNLOAD_FROM/$cluster/$databasename.sqlc.$CIPHER" >> $DOWNLOAD_LIST_TMP_FILE; done;

done;

cat $DOWNLOAD_LIST_TMP_FILE | openssl $CIPHER -e -salt -pass file:$PASSWORD_FILE | ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "dd of=$BACKUP_HOST_PATH/$DOWNLOAD_LIST_FILE" > /dev/null 2>&1;

rm $DOWNLOAD_LIST_TMP_FILE;