Andy's Debian BackupPC HOWTO

Contents

  1. Introduction
  2. Installation
  3. Configuration
  4. Off Site Backup
    1. Using TAR Archives
    2. Using DD to Clone the Partition to an External Disk
    3. Using RAID1 with an External Disk
    4. Removable Media
  5. Direct Restore Procedure
  6. TAR Restore Procedure
  7. Frequently Asked Questions

Introduction

This HOWTO discusses the setup of an automatic backup system using Debian Sarge and BackupPC.

BackupPC runs on a server and can perform regular automatic backups of Windows, Linux and probably also MacOS-X systems. It can back up local clients across your network, or remote clients across WAN connections using SSH.

BackupPC makes highly efficient use of your disk space. It uses compression and a clever file pooling system to minimise the storage requirements associated with storing backups of multiple clients, and retaining several historical backups.

BackupPC can make highly efficient use of your network by using RSync (optionally over SSH) to transfer only the differences in files across the network. It can also use SMB (the standard Windows file sharing protocol) and TAR.

BackupPC includes an easy-to-use web-based interface. Users can access their own backups through the interface, and can restore files directly to the client, or download their backups or individual files as ZIP or TAR archives.

The schedule by which backups are made is highly customisable, but by default full backups are made weekly and incrementals daily. BackupPC will email you to let you know if a backup failed to take place for any reason.

Installation

Install BackupPC:

# aptitude install backuppc rsync libfile-rsync-perl libfile-rsyncp-perl par2 bzip2

You must change the password for the backuppc user of the web interface, which is stored in Apache's htpasswd file:

# htpasswd /etc/backuppc/htpasswd backuppc

Configuration

BackupPC's main configuration file is found at /etc/backuppc/config.pl. You should consider this file to contain BackupPC's default settings, since we will create additional configuration files to override settings on a per-client basis.

Adding a New Client

Create a client configuration file named after the hostname of the client you want to back up:

# ee /etc/backuppc/examplehostname

You can override almost any default setting on a per-host basis within the host's configuration file, but a typical configuration would be as follows:

# Minimum period in days between full and incremental backups:
$Conf{FullPeriod} = 6.97;
$Conf{IncrPeriod} = 0.97;

# Number of full and incremental backups to keep:
$Conf{FullKeepCnt} = 2;
$Conf{IncrKeepCnt} = 6;
# Note that additional fulls will be kept for as long as is necessary
# to support remaining incrementals.

# Transport to use to back up the client [smb|rsync|rsyncd|tar|archive]:
$Conf{XferMethod} = 'rsync';

# The file system path or the name of the rsyncd module to back up when
# using rsync/rsyncd:
$Conf{RsyncShareName} = '/';

# If this is defined only these files/paths will be included in the backup:
$Conf{BackupFilesOnly} = undef;

# These files/paths will be excluded from the backup:
$Conf{BackupFilesExclude} = ['/proc', '/sys', '/dev', '/cdrom', '/media', '/floppy', '/mnt', '/var/lib/backuppc', '/lost+found'];

# Level of verbosity in Xfer log files:
$Conf{XferLogLevel} = 1;

# Commands to run for client backups:
# Note the use of SSH's -C option. This enables compression in SSH.
$Conf{RsyncClientCmd} = '$sshPath -C -x -l root -o PreferredAuthentications=publickey $host $rsyncPath $argList+';

# Commands to run for client direct restores:
# Note the use of SSH's -C option. This enables compression in SSH.
$Conf{RsyncClientRestoreCmd} = '$sshPath -C -q -x -l root $host $rsyncPath $argList+';

# Compression level to use on files.  0 means no compression. See notes
# in main config file before changing after backups have already been done.
$Conf{CompressLevel} = 3;

Now add the newly configured client to BackupPC's hosts file:

# ee /etc/backuppc/hosts
#
# The first non-comment non-empty line gives the field names and should
# not be edited!!
#
host        dhcp    user    moreUsers     # <--- do not edit this line
#farside    0       craig   jill,jeff     # <--- example static IP host entry
#larson     1       bill                  # <--- example DHCP host entry
examplehostname 0		backuppc
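If you add clients from a script rather than by hand, a small helper can append the entry only when the host isn't already listed. This is a minimal sketch, assuming the Debian hosts file path used above; the function name add_backuppc_host is hypothetical, and it writes the same host, dhcp and user columns as the example entry.

```shell
#!/bin/sh
# Sketch: append a client to BackupPC's hosts file unless it is already listed.
# The file path is a parameter so the helper can be tried on a copy first.
# add_backuppc_host is a hypothetical name, not part of BackupPC.

add_backuppc_host() {
    hosts_file=$1     # e.g. /etc/backuppc/hosts
    host=$2           # must match the per-host config file name
    dhcp=${3:-0}      # 0 = resolvable name, 1 = DHCP client
    user=${4:-backuppc}

    # Already present as the first field of a line? Comment lines such as
    # "#farside" never match because their first field starts with '#'.
    if awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$hosts_file"; then
        echo "$host already present in $hosts_file"
        return 0
    fi
    printf '%s\t%s\t%s\n' "$host" "$dhcp" "$user" >> "$hosts_file"
}
```

For example: add_backuppc_host /etc/backuppc/hosts examplehostname, then restart BackupPC.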

And finally restart BackupPC so your changes take effect:

# /etc/init.d/backuppc restart

Backups should now occur for this client, and the client should also now appear within the web-based interface. You can initiate manual backups within the interface.

Off Site Backup

Several ways of handling off-site backup when using BackupPC are discussed below.

TAR Archives using an 'archive' host

The simplest way to handle off-site backup is to use BackupPC's archive host feature to produce tar archives, and to then copy these archives to removable media.

BackupPC is configured to create tar archives by creating a new host with its XferMethod set to archive. The name of the host should reflect the type of archives it will be configured to create, for example dvdarchive or tapearchive.

Create a new host configuration file as shown below:

# ee /etc/backuppc/dvdarchive
# Set this client's XferMethod to archive to make it an archive host:
$Conf{XferMethod} = 'archive';

# The path on the local file system where archives will be written:
$Conf{ArchiveDest} = '/mnt/archives';

# the type and level of compression used on the archive:
$Conf{ArchiveComp} = 'gzip';
$Conf{CompressLevel} = 1;

# The amount of parity data to create for the archive using the par2 utility.
# In some cases, corrupted archives can be recovered from parity data.
$Conf{ArchivePar} = 0;

# A size in megabytes at which to split the archive into parts.
# This is useful where the file size of the archive might exceed the
# capacity of the removable media. For example, specify 650 or 700 if you are using CDs.
$Conf{ArchiveSplit} = 650;

# The full command to run to create archives:
$Conf{ArchiveClientCmd} = '$Installdir/bin/BackupPC_archiveHost'
        . ' $tarCreatePath $splitpath $parpath $host $backupnumber'
        . ' $compression $compext $splitsize $archiveloc $parfile *';
$Conf{ArchivePreUserCmd}  = undef;
$Conf{ArchivePostUserCmd} = undef;

# Logging verbosity:
$Conf{XferLogLevel} = 1;
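Before choosing an ArchiveSplit value it can help to know how many pieces, and so how many discs, an archive will need. The sketch below is plain integer arithmetic under a hypothetical function name; it takes sizes in whole megabytes and ignores any par2 parity files created when ArchivePar is non-zero.

```shell
#!/bin/sh
# Sketch: number of ArchiveSplit-sized pieces needed for an archive.
# archive_parts is a hypothetical helper; sizes are whole megabytes.

archive_parts() {
    size_mb=$1     # size of the compressed archive
    split_mb=$2    # value of $Conf{ArchiveSplit}
    # Integer ceiling division, so a final partial piece is still counted.
    echo $(( (size_mb + split_mb - 1) / split_mb ))
}
```

For example, archive_parts 2000 650 prints 4: three full 650 MB pieces plus one partial piece.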

Now add the newly configured archive host to BackupPC's hosts file:

# ee /etc/backuppc/hosts
#
# The first non-comment non-empty line gives the field names and should
# not be edited!!
#
host        dhcp    user    moreUsers     # <--- do not edit this line
#farside    0       craig   jill,jeff     # <--- example static IP host entry
#larson     1       bill                  # <--- example DHCP host entry
dvdarchive  0       backuppc

And finally restart BackupPC so your changes take effect:

# /etc/init.d/backuppc restart

The archive host will now appear within the web-based interface along with all your other hosts, except that instead of allowing you to start backups this host's interface will allow you to create archives of your other hosts.

Archives you create in this way will contain the most recent backup of a host, not a full set of historical backups. It's a good idea to cycle your removable media in such a way that you maintain a certain number of historical backups, and so that you are never in the situation where you are over-writing your most current backup with the new one.

Once the archive process has completed, the tar archive can be copied to removable media of one of the types suggested below and taken off site.

Archives can be created automatically using a cron job.
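As one way to do that, the sketch below writes an /etc/cron.d entry that starts a weekly archive. It assumes BackupPC 3.x, which ships a BackupPC_archiveStart command taking the archive host, the requesting user and the hosts to archive; the BackupPC 2.1 package in Sarge may not include it. The install path is Debian's /usr/share/backuppc/bin, and the function name and schedule are illustrative only.

```shell
#!/bin/sh
# Sketch: write a cron.d entry that starts a weekly archive.
# Assumes BackupPC 3.x (BackupPC_archiveStart) installed under Debian's
# /usr/share/backuppc/bin. write_archive_cron is a hypothetical helper.

write_archive_cron() {
    cron_file=$1                     # e.g. /etc/cron.d/backuppc-archive
    archive_host=${2:-dvdarchive}
    client=${3:-examplehostname}
    # Fields: minute hour day-of-month month day-of-week user command.
    # 03:00 every Saturday, run as the backuppc user.
    printf '0 3 * * 6 backuppc /usr/share/backuppc/bin/BackupPC_archiveStart %s backuppc %s\n' \
        "$archive_host" "$client" > "$cron_file"
}
```

Make sure the media for ArchiveDest (here /mnt/archives) is mounted before the job runs, otherwise the archive lands in the underlying directory.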

Using DD to Clone a Partition

It's not practical to simply copy BackupPC's pool from one partition to another using cp, because following all the hard links would take far too long. It's much more efficient to clone the partition sector by sector using dd.

Simply unmount the partition holding BackupPC's pool, dd it to a partition on the external disk, and then remount the pool partition.
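The steps above can be sketched as a shell function. Device names and the mount point are placeholders and the function names are hypothetical. BackupPC is stopped first because the pool partition cannot be unmounted while the server is using it, and the final mount assumes the pool has an entry in /etc/fstab. Setting DRY_RUN=1 prints the commands instead of running them, which is worth doing first: dd will overwrite whatever partition you give it as the destination without asking.

```shell
#!/bin/sh
# Sketch of cloning the pool partition with dd (hypothetical helper names).
# DRY_RUN=1 prints each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clone_pool() {
    src=$1                    # partition holding the pool, e.g. /dev/sdb1
    dst=$2                    # external partition, at least as large as $src
    mnt=${3:-/var/lib/backuppc}

    run /etc/init.d/backuppc stop || return 1
    run umount "$mnt" || return 1
    # Sector-by-sector copy; a large block size avoids needlessly slow I/O.
    run dd if="$src" of="$dst" bs=4M
    status=$?
    # Remount (via /etc/fstab) and restart even if dd failed.
    run mount "$mnt"
    run /etc/init.d/backuppc start
    return $status
}
```

After sourcing the functions, DRY_RUN=1 clone_pool /dev/sdb1 /dev/sdc1 shows the command sequence; drop DRY_RUN=1 to run it for real.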

If you have another machine capable of accepting the external disk, it would be a good idea to connect the disk there and test that it works.

Using RAID1 with an External Disk

A RAID 1 array is a mirrored array; each disk has a full copy of the data.

Build a RAID 1 array with three member disks (yes, a RAID 1 array can have three members), where one member is on external storage and the others are internal to the machine.

Then break the mirror, move the external disk off site, replace it with a new disk, and initiate a rebuild of the array.

Allow your RAID1 to hold another device with:

# mdadm --grow --raid-devices=3 /dev/mdX

Then you can:

# mdadm --add /dev/mdX /dev/sdXX

(Use a partition here, not the whole disk.)

And watch 'cat /proc/mdstat' to see when it's done with the sync, then:

# mdadm --fail /dev/mdX /dev/sdXX
# mdadm --remove /dev/mdX /dev/sdXX

Then disconnect the disk. A RAID1 array keeps running as long as at least one member device works.
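The rotation can be scripted along the same lines. In this sketch /dev/md0 and /dev/sdc1 are placeholder names, the function names are hypothetical, and DRY_RUN=1 prints the mdadm commands rather than running them. The wait simply polls /proc/mdstat until no resync or recovery is reported, so it will also wait for any other array that happens to be rebuilding.

```shell
#!/bin/sh
# Sketch of the RAID1 rotation described above (hypothetical helper names).
# DRY_RUN=1 prints the mdadm commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Poll /proc/mdstat until no resync or recovery is reported on any array.
wait_for_sync() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ (wait for resync/recovery to finish)"
        return 0
    fi
    sleep 10    # give the rebuild a moment to start after --add
    while grep -qsE 'resync|recovery' /proc/mdstat; do
        sleep 60
    done
}

# One-off: let an existing two-disk mirror hold a third member.
grow_to_three() {
    run mdadm --grow --raid-devices=3 "$1"
}

# Each rotation: add the freshly connected external partition, wait for it
# to sync, then fail and remove it so the disk can be taken off site.
rotate_external() {
    md=$1      # e.g. /dev/md0
    part=$2    # e.g. /dev/sdc1 (a partition, not the whole disk)
    run mdadm --add "$md" "$part" || return 1
    wait_for_sync
    run mdadm --fail "$md" "$part" || return 1
    run mdadm --remove "$md" "$part"
}
```

Run grow_to_three once, then rotate_external /dev/md0 /dev/sdc1 each time a fresh external disk is connected.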

Removable Media

Note that SATA hot-plug is not yet supported in libata. There is, however, coldplug support: unmount the volume, remove the SATA driver module from the kernel, remove or add disks, and modprobe the module again. When the new disk is recognized you can mount the volume again.
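The coldplug procedure can be sketched as follows. The mount point and the driver module name are placeholders (lsmod shows which driver your controller uses), and the function name is hypothetical. It only works if no other mounted file system, including the root file system, sits on a disk handled by the same driver, since the module cannot be removed while it is in use.

```shell
#!/bin/sh
# Sketch of a SATA coldplug disk swap (hypothetical helper name).
# DRY_RUN=1 prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

coldplug_swap() {
    mnt=$1       # e.g. /mnt/external
    module=$2    # your controller's driver, e.g. sata_sil (placeholder)

    run umount "$mnt" || return 1
    run modprobe -r "$module" || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ (swap the disks now)"
    else
        printf 'Swap the disks, then press Enter: '
        read -r _
    fi
    run modprobe "$module" || return 1
    # Give the kernel a moment to detect the new disk before mounting.
    run sleep 5
    run mount "$mnt"
}
```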

Note that the actual capacity of a tape may not be the same as the tape's advertised capacity. Many tape drives include a compression function at the hardware level and the advertised capacity of its tapes may account for this. If your tar archive has already been put through GZIP then it won't compress any further.

Note that I have not included external hard disks with a USB1 interface, only those with USB2 or Firewire interfaces. The original USB specification supports relatively slow transfer speeds, making it unsuitable for the transfer of very large files.



Direct Restore Procedure

The following procedure is suitable for performing a full restore directly from BackupPC's web-interface.

Base Debian install

Begin with a base Debian installation. Either install in the normal manner (from disk or across the network) or, if you use a serial console, re-image the server.

Prepare the Target Host

Log in to the target host over SSH:

# ssh target.example.com

Install your favourite editor:

# aptitude install ee

Next backup the base-install configuration. You might need to re-instate a couple of these files once you have restored your file system from backup.

# cd /
# tar zcvf /root/etc.tar.gz /etc

Configure the hosts file, adding the DNS name for your backup box as a minimum:

# ee /etc/hosts

Enable public key authentication in SSH:

# ee /etc/ssh/sshd_config

To do so, simply uncomment the lines below:

PubkeyAuthentication yes
AuthorizedKeysFile     %h/.ssh/authorized_keys
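Rather than editing the file by hand, the two directives can be uncommented with sed. A minimal sketch, assuming GNU sed (for -i) as shipped with Debian; the function name is hypothetical. It only removes the leading # and leaves the values as they are, so check afterwards that PubkeyAuthentication reads yes.

```shell
#!/bin/sh
# Sketch: uncomment PubkeyAuthentication and AuthorizedKeysFile in place.
# Requires GNU sed for -i. enable_pubkey_auth is a hypothetical helper.

enable_pubkey_auth() {
    config=${1:-/etc/ssh/sshd_config}
    # Strip a leading '#' (and any spaces after it) from the two directives only.
    sed -i \
        -e 's/^#[[:space:]]*\(PubkeyAuthentication[[:space:]]\)/\1/' \
        -e 's/^#[[:space:]]*\(AuthorizedKeysFile[[:space:]]\)/\1/' \
        "$config"
}
```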

And then create the authorized keys file, just as it was on the backed-up host:

# ee /root/.ssh/authorized_keys
# chmod 600 /root/.ssh/authorized_keys
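The same step as a small helper, assuming the BackupPC server's public key has already been copied to the target (on Debian the backuppc user's home directory is /var/lib/backuppc, so an RSA key would typically be /var/lib/backuppc/.ssh/id_rsa.pub). The function name and arguments are hypothetical. The modes matter: with sshd's default StrictModes setting, keys are ignored if ~/.ssh or authorized_keys is writable by group or others.

```shell
#!/bin/sh
# Sketch: install a public key for a user with safe permissions.
# install_authorized_key is a hypothetical helper.

install_authorized_key() {
    home_dir=$1       # /root on the target host
    pubkey_file=$2    # the public key copied from the BackupPC server
    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"
    # Append rather than overwrite, and skip a key that is already present.
    if ! grep -qxF -f "$pubkey_file" "$home_dir/.ssh/authorized_keys" 2>/dev/null; then
        cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
    fi
    chmod 600 "$home_dir/.ssh/authorized_keys"
}
```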

Restart SSH:

# /etc/init.d/ssh restart

Install RSync:

# aptitude install rsync libfile-rsync-perl

Prepare the Backup Client

Remove any existing host key entries identifying the target host, since the host key will change:

# ee /root/.ssh/known_hosts
# ee /var/lib/backuppc/.ssh/known_hosts

Test that the backuppc user can SSH to the target host as root:

# sudo -u backuppc ssh -o PreferredAuthentications=publickey root@target.example.com

Restore the File System

Restore the following parts of the file system from the BackupPC web-interface to the target host:

/bin
/etc
/home
/lib
/opt
/sbin
/tmp
/usr
/var

If the kernel from the backup is compatible with the hardware you are restoring to, you might also want to restore it. Otherwise, simply omit the parts of the file system shown below and stick with the kernel installed during the Debian installation.

/boot
/sys
/initrd

Note that we haven't restored:

/root

It's up to you to do this after this process is complete, if necessary.

Pre-Reboot Configuration Changes

It may be necessary to make some configuration changes to the freshly restored host before you reboot.

First we need to extract the backup we made of the base-configuration:

# cd /root
# tar zxvf etc.tar.gz

The partition and file system schema of your restore target host might not match that of the backed-up host, so re-instate the fstab created during the Debian installation:

# cp /root/etc/fstab /etc/fstab

Similarly, the network configuration of the backed-up host might not be suitable for the restore target host, so re-instate the network configuration created during the Debian installation:

# cp /root/etc/network/interfaces /etc/network

For me it was also necessary to re-instate the files below, but you probably won't need to.

# cp /root/etc/profile /etc
# cp /root/etc/inittab /etc
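The copies above can be wrapped in a helper that takes the saved and live /etc directories as arguments, which makes it easy to rehearse on scratch copies first. The function name is hypothetical; pass only the files your restore actually needs.

```shell
#!/bin/sh
# Sketch: copy selected files from the saved base-install /etc back into place.
# reinstate_base_config is a hypothetical helper.

reinstate_base_config() {
    saved_etc=$1    # e.g. /root/etc
    live_etc=$2     # e.g. /etc
    shift 2
    for f in "$@"; do
        if [ -f "$saved_etc/$f" ]; then
            # -p keeps the base install's mode and timestamps.
            cp -p "$saved_etc/$f" "$live_etc/$f"
        else
            echo "missing: $saved_etc/$f" >&2
        fi
    done
}
```

For example: reinstate_base_config /root/etc /etc fstab network/interfaces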

I also deleted the mail in Postfix's mail queue, since I knew that this mail had already been delivered and didn't want to deliver duplicates:

# postsuper -d ALL

TAR Restore Procedure

The following procedure is suitable for performing a full restore from a TAR archive that was created using BackupPC's archive host feature.

Base Debian install

Strictly speaking it's not necessary to begin with a base Debian install, but there are a couple of reasons why you might want to:

Install Debian Sarge to the target host in the normal manner, or re-image the server using your serial console.

Prepare the target host

Connect to the target host:

# ssh target.example.com

Install your favourite editor:

# aptitude install ee

Configure hosts file:

# ee /etc/hosts

Make a backup of your base-install configuration, in case you need to restore anything that gets over-written by your backed-up file system:

# cd /
# tar zcvf /root/etc.tar.gz /etc

Restore the File System

If your tar archive is stored on a CD or DVD, you can simply mount the media and extract directly from it to the file system (a tape is read from its device node rather than mounted):

# mount /dev/cdrom /cdrom
# cd /
# tar --numeric-owner -zxvpf /cdrom/host.1.tar.gz
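If the archive host was configured with ArchiveSplit, as in the dvdarchive example, the archive will be in several pieces rather than a single host.1.tar.gz. Concatenating the pieces in order recreates the original compressed stream, which tar can read from standard input. This is a sketch with a hypothetical function name; the piece names are an assumption (split's default aa, ab, ... suffixes), so check what your archive host actually wrote. A shell glob such as host.1.tar.gz.* sorts those suffixes into the right order.

```shell
#!/bin/sh
# Sketch: extract an archive that was split into pieces.
# extract_split_archive is a hypothetical helper.

extract_split_archive() {
    dest=$1    # directory to extract into, / for a full restore
    shift      # remaining arguments: the pieces, in order
    # cat rejoins the pieces into one gzip stream; tar reads it from stdin.
    cat "$@" | tar --numeric-owner -zxpf - -C "$dest"
}
```

If the pieces span several discs, copy them into one directory first, then run for example: extract_split_archive / /mnt/archives/host.1.tar.gz.*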

Or, if your tar archive is located on an NFS export, first install the necessary NFS client packages, then mount the export and extract from it directly to the file system:

# aptitude install nfs-common
# mkdir /mnt/archives
# mount nfs.example.com:/home/archives /mnt/archives
# cd /
# tar --numeric-owner -zxvpf /mnt/archives/host.1.tar.gz

To restore to a remote server, you could simply copy the tar archive across using scp and then extract it as shown above. However, while the restore target host may have enough storage space for the extracted file system, it may not have enough for the extracted file system AND the tar archive. One solution is to stream the tar archive from the backup server through SSH and extract it on the restore target as it arrives, as shown below:

# ssh target.example.com "cd /; tar --numeric-owner -zxvpf -" < /home/archives/host.1.tar.gz

Pre-Reboot Configuration Changes

It may be necessary to make some configuration changes to the freshly restored host before you reboot.

First we need to extract the backup we made of the base-configuration:

# cd /root
# tar zxvf etc.tar.gz

The partition and file system schema of your restore target host might not match that of the backed-up host, so re-instate the fstab created during the Debian installation:

# cp /root/etc/fstab /etc/fstab

Similarly, the network configuration of the backed-up host might not be suitable for the restore target host, so re-instate the network configuration created during the Debian installation:

# cp /root/etc/network/interfaces /etc/network

For me it was also necessary to re-instate the files below, but you probably won't need to.

# cp /root/etc/profile /etc
# cp /root/etc/inittab /etc

I also deleted the mail in Postfix's mail queue, since I knew that this mail had already been delivered and didn't want to deliver duplicates:

# postsuper -d ALL

Left Over Files

You might notice that some packages from the Debian base install which had been removed from the backed-up host are still present after you complete the restore from TAR. For instance, you might have replaced exim4 with postfix, yet when you reboot exim tries to start.

This is because the file system restore process only creates and over-writes files. It doesn't remove files from the restore target that were not present in the backup.

You cannot use aptitude to remove these packages, because the package database (/var/lib/dpkg) has been restored from backup: as far as aptitude and dpkg are concerned, the package is not installed.
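To find such leftovers, you can ask dpkg which package owns each file; dpkg -S exits with a non-zero status when no package in its (restored) database claims a path. The sketch below lists unowned paths. The function name is hypothetical, and OWNER_CMD is an assumption added only so the function can be tried without dpkg; it defaults to dpkg -S.

```shell
#!/bin/sh
# Sketch: print the given paths that no installed package claims.
# find_unowned and OWNER_CMD are hypothetical; OWNER_CMD defaults to "dpkg -S".

find_unowned() {
    for f in "$@"; do
        # Left unquoted on purpose so "dpkg -S" splits into command + option.
        if ! ${OWNER_CMD:-dpkg -S} "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}
```

In the exim/postfix case, find_unowned /etc/init.d/* would be expected to list /etc/init.d/exim4.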

None of this should present any real world problems. Your restore should still succeed and leave you with a fully functional server, but it is a little untidy. And so I describe one process that worked for me in the case of the exim/postfix example given above. Your mileage may vary.

Download .deb packages for the packages you want to get rid of:

# wget http://ftp.us.debian.org/debian/pool/main/e/exim4/exim4_4.50-8_all.deb
# wget http://ftp.us.debian.org/debian/pool/main/e/exim4/exim4-base_4.50-8_i386.deb
# wget http://ftp.us.debian.org/debian/pool/main/e/exim4/exim4-config_4.50-8_all.deb
# wget http://ftp.us.debian.org/debian/pool/main/e/exim4/exim4-daemon-light_4.50-8_i386.deb

Install the packages through dpkg. Note the use of --force-depends throughout, and of --force-conflicts and --force-overwrite where necessary. The latter will cause files belonging to the postfix package to be over-written (ouch!).

# dpkg --force-depends --unpack exim4_4.50-8_all.deb
# dpkg --force-depends --unpack exim4-base_4.50-8_i386.deb
# dpkg --force-depends --unpack --force-conflicts exim4-config_4.50-8_all.deb
# dpkg --force-depends --unpack --force-conflicts --force-overwrite exim4-daemon-light_4.50-8_i386.deb

Purge the packages you just installed to remove all files relating to the packages:

# dpkg --purge exim4 exim4-base exim4-config exim4-daemon-light

Fix postfix by forcing the package to be reinstalled. This should re-instate the files that were over-written when forcing the install of exim4-daemon-light.

# aptitude reinstall postfix

Bingo!

Further Reading

 

Also see my Backup Research Notes.

Frequently Asked Questions

How can I backup a whole Windows disk over SMB?
All Windows NT-based operating systems (NT, 2000, XP Pro) are configured by default to share the entire C drive as "C$". This is a special share used for various administrative functions, one of which is to grant access to backup operators.

Create a new domain user specifically for backups, then add this user to the built-in "Backup Operators" group. You now have backup capability for any directory on any computer in the domain in one easy step. This avoids using administrator accounts and grants the user permission to do exactly what you want and no more. For additional security, you may wish to deny this user the ability to log on to computers in the default domain policy.
How can I schedule precisely when a backup takes place?
You can control the BackupPC server by running BackupPC_serverMesg from cron. BackupPC_serverMesg is used to send a message to the BackupPC server telling it to start a backup of a given host. This is the preferred method of starting a backup at a definite time.
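A sketch that prints such a cron entry is below. The arguments to BackupPC_serverMesg backup, in the form commonly used from cron, are the host's address (the host name works for hosts that aren't on DHCP), the BackupPC host name, the user requesting the backup, and 1 for a full or 0 for an incremental backup. The path is Debian's and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print a cron.d line asking the BackupPC server to back up a host
# at a fixed time. servermesg_cron_line is a hypothetical helper.
# "backup" arguments: host address, host name, requesting user, 1=full/0=incr.

servermesg_cron_line() {
    minute=$1
    hour=$2
    host=$3
    full=${4:-0}
    # Fields: minute hour day-of-month month day-of-week user command
    echo "$minute $hour * * * backuppc /usr/share/backuppc/bin/BackupPC_serverMesg backup $host $host backuppc $full"
}
```

Append its output to a file in /etc/cron.d, for example: servermesg_cron_line 0 2 examplehostname 0 >> /etc/cron.d/backuppc-schedule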

Another option is to set up a small backup window. With a window from 2am to 5am, for example, 99% of the time the backup runs at 2am as expected. Every now and then it runs at 3am, which means it missed the 2am run for some reason. This approach seems more resilient.
How much storage will I require?
Consider the quantity of data you want to backup and the rate at which you expect this data to grow in size. Then consider the effect of compression and BackupPC's pooling system on your storage requirements.
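As a back-of-the-envelope starting point, the sketch below assumes the pool holds one compressed copy of the data plus, for each retained backup, a compressed copy of whatever changed since the previous one, with identical files pooled at no extra cost. This is an illustrative model, not BackupPC's own accounting, and the function name and inputs are hypothetical.

```shell
#!/bin/sh
# Rough pool-size sketch (illustrative model, not BackupPC's accounting).
# estimate_pool_gb is a hypothetical helper; all inputs are whole numbers.

estimate_pool_gb() {
    data_gb=$1       # data to back up today
    saving_pct=$2    # expected compression saving, e.g. 40
    kept=$3          # backups retained, e.g. FullKeepCnt + IncrKeepCnt
    change_pct=$4    # percentage of the data that changes between backups
    compressed=$(( data_gb * (100 - saving_pct) / 100 ))
    # One compressed copy, plus the changed share for each retained backup.
    echo $(( compressed * (100 + kept * change_pct) / 100 ))
}
```

With the example client configuration (FullKeepCnt 2 plus IncrKeepCnt 6, so 8 backups), 100 GB of data, a 40% compression saving and 5% change per backup, estimate_pool_gb 100 40 8 5 prints 84. Add headroom for growth on top of that.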
How well will my data compress?
There is no point in compressing data which has already been compressed in some way. JPEG images, MPEG video and ZIP files for instance will not compress any further. When backing up this type of data you might as well disable compression and save your processor some work.
For other types of data you might expect to see a 40%-50% reduction in size, as a rough guide.
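You can get a feel for your own data by compressing a representative sample. This sketch uses the gzip program at level 3 to mirror $Conf{CompressLevel} = 3; BackupPC itself compresses through zlib rather than the gzip program, so treat the figure as an approximation. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate the compression saving (percent) for a sample file.
# compress_saving_pct is a hypothetical helper; gzip -3 approximates
# BackupPC's CompressLevel 3.

compress_saving_pct() {
    orig=$(wc -c < "$1")
    if [ "$orig" -eq 0 ]; then
        echo 0
        return 0
    fi
    comp=$(gzip -3 -c "$1" | wc -c)
    # Integer percentage; negative when gzip's overhead exceeds any saving.
    echo $(( 100 - comp * 100 / orig ))
}
```

Already-compressed files such as JPEGs typically report a saving near zero, or slightly below zero because of gzip's own overhead.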
How will BackupPC's pooling system affect my storage requirements?
Duplication occurs within the data you are backing up when you want to store multiple historical backups for a target host, where a file is found within more than one of those backups, and where the file has not changed between those backups.
Duplication also occurs where you want to backup more than one target host and where an identical file is found on more than one of those hosts.
BackupPC uses a clever pooling system involving hard links to avoid the need to store duplicate files. The degree to which this affects your storage requirements will depend upon the amount of duplication occurring within the data you want to back up.
How frequently do I want to perform backups?
The short answer is that you should perform backups as often as possible. Consider how frequently your data changes, the importance of having an up to date copy of the data and the impact of performing backups on your network.
If you are backing up a small quantity of frequently changing data, you may want to go so far as to perform several backups a day. For others weekly backups may be sufficient, but daily backups are a good idea.
How many historical backups do I want to keep?
It's a good idea to have backups going as far back in time as you can, in case you don't immediately recognise the need to restore data from backup.
Imagine something occurs which affects the integrity of the data on the target host; perhaps a hacker gains access to the system. You may not immediately realize this has occurred, and it's important to be able to restore to a point in time when you can rely on the integrity of your data. This may be several days or even weeks ago.
In short, store as many historical backups as you can.
What hardware should I use for my backup server?
Use the most powerful hardware you can afford.
If you intend to use compression then ensure your system has a powerful processor.
If you intend to use RSync as a transfer method then it's especially important to ensure you have lots of physical memory. RSync's memory usage increases in direct proportion to the number of files to be transferred and can grow to over a gigabyte when performing a backup of a target host with many thousands of files.
RAID is also a good idea, primarily for its redundancy, but also for its performance benefits.
What file system should I store my backups on?
Don't use the EXT2 or EXT3 file systems. You will soon run into trouble when backing up lots of files, as you run out of free inodes. If you have a UPS then use ReiserFS. If you don't have a UPS then you may find Reiser prone to corruption when power fails; in this case XFS may be a better choice.
Use LVM to enable you to grow your file system as necessary.
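To see how close a file system is to running out of inodes, df -i reports inode usage alongside block usage. A minimal sketch, assuming the Debian pool location /var/lib/backuppc; the function name is hypothetical. File systems without a fixed inode table, such as ReiserFS, may show - instead of a percentage.

```shell
#!/bin/sh
# Sketch: print the inode usage (IUse%) of the file system holding a path.
# inode_use is a hypothetical helper; defaults to Debian's pool location.

inode_use() {
    # -P (POSIX output) keeps each file system on a single line;
    # with -i the fifth column is IUse%.
    df -iP "${1:-/var/lib/backuppc}" | awk 'NR == 2 { print $5 }'
}
```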
How will I go about restoring data should I need to?
Individual files or directories can be restored directly through the web interface, either by the systems administrator or by users themselves. For bare metal restore procedures, see the Direct Restore Procedure and TAR Restore Procedure sections above.
What operating system should I use for my backup server?
Any good Linux distribution will suffice, but this HOWTO is orientated towards Debian Sarge.
What transfer method should I use to perform backups?
Use RSync over SSH where possible. RSync is highly efficient in terms of network bandwidth since it transfers only the difference between files. SSH adds encryption which makes it safe to backup sensitive data over WAN connections.
Alternatives include SMB, TAR and RSyncD. Remember that SMB is not able to transfer extended file system metadata such as file system permissions, so this information will be missing from your backups. RSyncD is simply a daemonized rsync running on the client being backed up and listening for connections. SMB and RSyncD do not use encryption and so are not suitable for use over WAN connections.

Caveats

Hard links are not backed up (at the time of writing; the next version should support them).