Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint
Share this Page URL
Help

4. Backing Up > 40. Automate Data Dumps for PostgreSQL Databases

Automate Data Dumps for PostgreSQL Databases

Building your own backup utility doesn’t have to be scary.

PostgreSQL is a robust, open source database server. Like most database servers, it provides utilities for creating backups. PostgreSQL’s primary tools for creating backup files are pg_dump and pg_dumpall. However, if you want to automate your database backup processes, these tools have a few limitations:

  • pg_dump dumps only one database at a time.

  • pg_dumpall dumps all of the databases into a single file.

  • pg_dump and pg_dumpall know nothing about multiple backups.

These aren’t criticisms of the backup tools—just an observation that customization will require a little scripting. Our resulting script will backup multiple systems, each to their own backup file.

Creating the Script

This script uses Python and its ability to execute other programs to implement the following backup algorithm:

  1. Change the working directory to a specified database backup directory.

  2. Rename all backup files ending in .gz so that they end in .gz.old. Existing files ending in .gz.old will be overwritten.

  3. Clean up and analyze all PostgreSQL databases using its vacuumdb command.

  4. Get a current list of databases from the PostgreSQL server.

  5. Dump each database, piping the results through gzip, into its own compressed file.

Why Python? My choice is one of personal preference; this task is achievable in just about any scripting language. However, Python is cross-platform and easy to learn, and its scripts are easy to read.

The Code

#!/usr/local/bin/python

# /usr/local/bin/pg2gz.py

# This script lists all PostgreSQL
# databases and pipes them separately
# through gzip into .gz files.

# INSTRUCTIONS
# 1.  Review and edit line 1 to reflect the location
#     of your python command file.
# 2.  Redefine the save_dir variable (on line 22) to
#     your backup directory.
# 3.  To automate the backup process fully, consider
#     scheduling the regular execution of this script
#     using cron.

import os, string

# Redefine this variable to your backup directory.
# Be sure to include the slash at the end.
save_dir = '/mnt/backup/databases/'

# Rename all *.gz backup files to *.gz.old.
curr_files = os.listdir(save_dir)
for n in curr_files:
        if n[len(n)-2:] =  = 'gz':
                os.popen('mv ' + save_dir + n + " " + save_dir + n + '.old')
        else:
                pass

# Vacuum all databases
os.popen('vacuumdb -a -f -z')

# 'psql -l' produces a list of PostgreSQL databases.
get_list = os.popen('psql -l').readlines( )

# Exclude header and footer lines.
db_list = get_list[3:-2]

# Extract database names from first element of each row.
for n in db_list:
        n_row = string.split(n)
        n_db = n_row[0]

        # Pipe database dump through gzip
        # into .gz files for all databases
        # except template*.
        if n_db =  = 'template0':
                pass
        elif n_db =  = 'template1':
                pass
        else:
                os.popen('pg_dump ' + n_db + ' | gzip -c > ' + save_dir + 
                          n_db + '.gz')

Running the Hack

The script assumes that you have a working installation of PostgreSQL. You’ll also need to install Python, which is available through the ports collection or as a binary package. The Python modules used are installed by default.

Double-check the location of your Python executable using:

% which python
/usr/local/bin/python

and ensure the first line of the script reflects your location. Don’t forget to make the script executable using chmod +x.

On line 22 of the script, redefine the sav_dir variable to reflect the location of your backup directory. As is, the script assumes a backup directory of /mnt/backup/databases/.

You’ll probably want to add the script to the pgsql user’s crontab for periodic execution. To schedule the script for execution, log in as pgsql or, as the superuser, su to pgsql. Once you’re acting as pgsql, execute:

% crontab -e

to open the crontab file in the default editor.

Given the following crontab file, /usr/local/bin/pg2gz.py will execute at 4 AM every Sunday.

# more /var/cron/tabs/pgsql
SHELL=/bin/sh
PATH=/var/cron/tabs:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

#minute    hour    mday    month    wday     command
0          4       *       *        0        /usr/local/bin/pg2gz.py

See Also

  • Safari Books Online
  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint