Zimbra Backup Replication with RSYNC
June 27, 2010

An Overview of Zimbra Backups

Creating a viable replica of a Zimbra backup can be a challenging task, and the key to doing it right is to understand how Zimbra backups are structured and scheduled. The RSYNC tool is an excellent choice for an offsite replica of your Zimbra mailstore backup, because it has options that handle this structure, hard links in particular. The first thing to consider is how you are backing up Zimbra in the first place. Here is a typical crontab entry detailing when and how zmbackup is called.

0 1 * * 1 /opt/zimbra/bin/zmbackup -f -a all
0 1 * * 0,2-6 /opt/zimbra/bin/zmbackup -i
0 0 * * * /opt/zimbra/bin/zmbackup -del 3m

Translated to English, this states that Zimbra is going to:

  • Line 1: Do a full backup of all accounts at 1:00am on Monday
  • Line 2: Do an incremental backup at 1:00am on Sun, Tues, Weds, Thurs, Fri and Sat
  • Line 3: Get rid of any backups that are older than 3 months, and check for this every day at midnight

Now, when Zimbra does its backup job, by default it puts the backup files in /opt/zimbra/backup. If you look in this directory, you will see something similar to this:

drwxr-xr-x   5 zimbra zimbra    4096 Jun 27 01:24 .
drwxr-xr-x  54 root   root      4096 Mar 28 02:23 ..
-rw-r-----   1 zimbra zimbra 1750611 Jun 27 01:24 accounts.xml
drwx------   2 root   root     16384 Apr 26  2009 lost+found
drwxr-x--- 148 zimbra zimbra   12288 Jun 27 01:24 sessions
drwxr-x---  13 zimbra zimbra    4096 Jun 27 01:24 tmp

The accounts.xml file contains vital information about the backups. Zimbra requires this information in the case of a restore and without this file, your backup will be useless. The information in this file includes:

  • The translation between the Zimbra UUID for a person and their account name
  • Where to look for the most recent full backup of an account

In your sessions directory, you should see the actual backup directories:

drwxr-x---   5 zimbra zimbra  4096 Jun 24 01:23 full-20100624.052333.282
drwxr-x---   5 zimbra zimbra  4096 Jun 25 01:20 full-20100625.052028.170
drwxr-x---   5 zimbra zimbra  4096 Jun 26 01:30 full-20100626.053024.490
drwxr-x---   5 zimbra zimbra  4096 Jun 27 01:24 full-20100627.052414.328
drwxr-x---   5 zimbra zimbra  4096 Mar 27 02:05 incr-20100327.050647.501
drwxr-x---   5 zimbra zimbra  4096 Mar 31 01:25 incr-20100331.050704.213
drwxr-x---   5 zimbra zimbra  4096 Apr  1 01:21 incr-20100401.050711.294
drwxr-x---   5 zimbra zimbra  4096 Apr  2 01:27 incr-20100402.050621.529
...

Zimbra automatically uses hard links between backup sets to reference message files that have not changed since the last backup. Because each message for a user is stored only once, the total size of your /opt/zimbra/backup/ directory is kept to a minimum. However, you must consider these hard links when designing your backup solution. Otherwise, you can easily make this mistake:

  • If you blindly copy this data to another directory without considering the hard links, it will balloon in size at the target location, because the hard links between full backups will not be preserved and each full backup will be the full size of your mailstore. These backups should still be viable, but it would be a huge waste of space.
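The effect is easy to reproduce on a small scale. The following is a minimal sketch, not anything taken from Zimbra itself: the directory names full-1 and full-2 and the file name msg are made up for illustration, and everything happens in a throwaway temporary directory. It hard-links one file into two "backup sets", copies the tree without preserving links, and then checks whether the two names still point at the same file.

```shell
# Scratch directory for the demo; nothing here touches a real Zimbra install.
demo=$(mktemp -d)
mkdir -p "$demo/src/full-1" "$demo/src/full-2"

# One message stored once, then hard-linked into a second backup set.
# (full-1, full-2 and msg are hypothetical names for illustration.)
echo "message body" > "$demo/src/full-1/msg"
ln "$demo/src/full-1/msg" "$demo/src/full-2/msg"

# A plain recursive copy does not preserve the link between the two names.
cp -R "$demo/src" "$demo/plain"

# same_file prints "yes" when both paths refer to one file on disk.
same_file() {
    if [ "$1" -ef "$2" ]; then echo yes; else echo no; fi
}

src_shared=$(same_file "$demo/src/full-1/msg" "$demo/src/full-2/msg")
plain_shared=$(same_file "$demo/plain/full-1/msg" "$demo/plain/full-2/msg")

echo "source stores the message once: $src_shared"
echo "plain copy stores the message once: $plain_shared"

rm -rf "$demo"
```

In the source tree the message should exist once on disk, while the plain copy should hold two independent files. Multiply that by every unchanged message in every full backup and the copy quickly outgrows the original.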

If you want true point-in-time recovery, you need to copy everything in /opt/zimbra/backup/ while preserving hard links. You should not pick and choose what to copy.

If you are hard-pressed for something quick, a minimal emergency backup of Zimbra would include only the most recent full backup directory in /opt/zimbra/backup/sessions/, along with the necessary accounts.xml file, while otherwise preserving the /opt/zimbra/backup/ directory structure. However, this is not recommended. Zimbra intends for everything in /opt/zimbra/backup/ to be replicated and managed as a whole. If you start picking it apart, you definitely will not have point-in-time recovery, and you run the risk of not getting everything that Zimbra needs to do a full restore. If you choose to pick apart the Zimbra backup scheme, make sure you understand entirely how it all works first. Also, if you use auto-group mode and replicate only the most recent full backup from a list of many available, an incomplete replica is all but guaranteed.

Replication with RSYNC

Zimbra stores individual messages in individual files. For small Zimbra installations, this is not a large management problem. For very large Zimbra installs, it can be challenging to work with. In fact, it's almost impossible to do this with any version of RSYNC before version 3. For more about this, see the post I did a few years ago. Since then, RSYNC has introduced a life-saving feature: incremental recursion. It is only available when both ends of the connection run RSYNC v3, where it is the default behavior of "--recursive". Instead of building the complete file list before copying anything, RSYNC starts the transfer while it is still traversing the directories, so the initial transfer from sources with millions of files begins very quickly. Depending on the size of your backups, and with RSYNC v2, it could take days to even start transferring files from a Zimbra backup directory.
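Since the fast start depends on RSYNC v3, it may be worth checking the version on each machine involved before relying on it. Here is a small sketch that pulls the major version out of the first line of `rsync --version`. The function names are my own, and the parsing assumes the usual "rsync  version X.Y.Z  protocol version N" banner, so treat it as a starting point and not a guarantee for every build.

```shell
# Extract the major version from the first line of `rsync --version`.
# Assumes a banner like: "rsync  version 3.0.7  protocol version 30"
rsync_major() {
    printf '%s\n' "$1" |
        sed -n 's/^rsync  *version v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

# Succeeds only when the banner reports major version 3 or later.
is_rsync_v3() {
    major=$(rsync_major "$1")
    [ -n "$major" ] && [ "$major" -ge 3 ]
}

# Usage: check the local binary, if one is installed.
if command -v rsync >/dev/null 2>&1; then
    banner=$(rsync --version | head -n 1)
    if is_rsync_v3 "$banner"; then
        echo "local rsync is v3 or later"
    else
        echo "local rsync is older than v3"
    fi
fi
```

Run the same check on the Zimbra server as well, since an old RSYNC on either end is enough to lose the benefit.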

So, the two important switches that you need to pass to your RSYNC v3 process to create a fast, viable backup are:

  • --recursive This takes advantage of the recursive logic in RSYNC v3 (it is already implied by -a, so spelling it out is harmless)
  • -H This will preserve hard links, and keep the size of your /opt/zimbra/backup/ replica under control

And here it is put into an RSYNC command:

rsync -avzH --exclude="lost+found" --recursive --delete -e ssh zimbra@zimbra.server.com:/opt/zimbra/backup /my/local/replica/ > /my/local/replica.log

This assumes that you have an SSH key pair set up. Remember that RSYNC simply copies data from the source to the target. If you mistakenly delete your /opt/zimbra/backup/sessions/ directory or make some other horrible mistake, it will replicate that mistake. It is good practice to turn off your RSYNC replication before making any major changes, as this replica may be your only safeguard.
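Because RSYNC will faithfully replicate a mistake, a quick look at the replica after each run can at least tell you when something essential has gone missing. The sketch below checks only for the two things this article calls out: a non-empty accounts.xml and at least one full-* directory under sessions. The function name is my own, and the path assumes the command above, which places the copy in /my/local/replica/backup. It does not prove that a restore would succeed.

```shell
# Return 0 if the directory has the pieces the article calls essential.
replica_looks_sane() {
    dir=$1
    # accounts.xml must exist and be non-empty.
    [ -s "$dir/accounts.xml" ] || return 1
    # At least one full backup session directory must exist.
    for d in "$dir"/sessions/full-*; do
        [ -d "$d" ] && return 0
    done
    return 1
}

# Usage with the replica path produced by the rsync command above.
if replica_looks_sane /my/local/replica/backup; then
    echo "replica has accounts.xml and at least one full backup"
else
    echo "WARNING: replica is missing accounts.xml or full backups" >&2
fi
```

A check like this only notices a problem after it has already been copied, so it complements turning replication off before major changes; it does not replace it.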

Now, about scheduling. Run your RSYNC task after Zimbra typically completes its full and incremental backups, and leave enough time for RSYNC to finish before the next Zimbra backup starts. Depending on the size of your mailstore, this may mean staggering your Zimbra and RSYNC replica operations so that each occurs every other day.
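If you run the replication from cron, one way to keep a slow run from colliding with the next scheduled one is a simple lock. This is a sketch of that idea and not part of Zimbra or RSYNC: the lock path and the function name are assumptions, and it relies on mkdir failing when the directory already exists.

```shell
# Hypothetical lock path; pick a location writable by the user running cron.
LOCKDIR=${LOCKDIR:-/tmp/zimbra-replica.lock}

# Run a command only if no other run currently holds the lock.
run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous replication still running, skipping" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# Usage from a cron-driven script (shown commented out):
# run_exclusive rsync -avzH --exclude="lost+found" --recursive --delete \
#     -e ssh zimbra@zimbra.server.com:/opt/zimbra/backup /my/local/replica/
```

Note that if the script is killed mid-run, the lock directory stays behind and has to be removed by hand before the next run will start.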

Good luck with your Zimbra backup replication!

One Comment
Benjamin Smith August 1st, 2010

Seriously? Replication boils down to copying a backup directory? No instructions on hot standby or performance clustering, or anything like that?

GREAT topic for discussion, but I sure wish there was some more information…
