Zimbra Backup Replication with RSYNC
June 27, 2010
An Overview of Zimbra Backups
Creating a viable replica of a Zimbra backup can be a challenging task, and the key to doing it right is to understand how Zimbra backups are structured and scheduled. The RSYNC tool is an excellent choice for an offsite replica of your Zimbra mailstore backup, as it provides options to deal with the structure. The first thing to consider is how you are backing up Zimbra in the first place. Here is a typical crontab entry detailing when and how zmbackup is called.
0 1 * * 1 /opt/zimbra/bin/zmbackup -f -a all
0 1 * * 0,2-6 /opt/zimbra/bin/zmbackup -i
0 0 * * * /opt/zimbra/bin/zmbackup -del 3m
Translated to English, this states that Zimbra is going to:
- Line 1: Do a full backup of all accounts at 1:00am on Monday
- Line 2: Do an incremental backup at 1:00am on Sun, Tues, Weds, Thurs, Fri and Sat
- Line 3: Get rid of any backups that are older than 3 months, and check for this every day at midnight
Now, when Zimbra does its backup job, by default it puts the backup files in /opt/zimbra/backup. If you look in this directory, you will see something similar to this:
drwxr-xr-x   5 zimbra zimbra    4096 Jun 27 01:24 .
drwxr-xr-x  54 root   root      4096 Mar 28 02:23 ..
-rw-r-----   1 zimbra zimbra 1750611 Jun 27 01:24 accounts.xml
drwx------   2 root   root     16384 Apr 26  2009 lost+found
drwxr-x--- 148 zimbra zimbra   12288 Jun 27 01:24 sessions
drwxr-x---  13 zimbra zimbra    4096 Jun 27 01:24 tmp
The accounts.xml file contains vital information about the backups. Zimbra requires this information in the case of a restore and without this file, your backup will be useless. The information in this file includes:
- The translation between the Zimbra UUID for a person and their account name
- Where to look for the most recent full backup of an account
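Since a replica without accounts.xml is useless, a quick sanity check on the copy can catch the worst failures early. The following is only a minimal sketch: check_replica is a made-up helper name, not a Zimbra tool, and it checks only that the file and the sessions directory are present, not that their contents are valid.

```shell
#!/bin/sh
# Sketch: sanity-check a replica of /opt/zimbra/backup.
# check_replica is a hypothetical helper name, not part of Zimbra.
check_replica() {
    replica=$1
    # accounts.xml must exist and be non-empty
    if [ ! -s "$replica/accounts.xml" ]; then
        echo "missing or empty: $replica/accounts.xml" >&2
        return 1
    fi
    # the sessions directory holds the full-* and incr-* backup sets
    if [ ! -d "$replica/sessions" ]; then
        echo "missing directory: $replica/sessions" >&2
        return 1
    fi
    return 0
}

# Example call (the path is an assumption about where your replica lives):
#   check_replica /my/local/replica/backup || echo "replica looks incomplete"
```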
In your sessions directory, you should see the actual backup directories:
drwxr-x--- 5 zimbra zimbra 4096 Jun 24 01:23 full-20100624.052333.282
drwxr-x--- 5 zimbra zimbra 4096 Jun 25 01:20 full-20100625.052028.170
drwxr-x--- 5 zimbra zimbra 4096 Jun 26 01:30 full-20100626.053024.490
drwxr-x--- 5 zimbra zimbra 4096 Jun 27 01:24 full-20100627.052414.328
drwxr-x--- 5 zimbra zimbra 4096 Mar 27 02:05 incr-20100327.050647.501
drwxr-x--- 5 zimbra zimbra 4096 Mar 31 01:25 incr-20100331.050704.213
drwxr-x--- 5 zimbra zimbra 4096 Apr  1 01:21 incr-20100401.050711.294
drwxr-x--- 5 zimbra zimbra 4096 Apr  2 01:27 incr-20100402.050621.529
...
For message files that have not changed since the last backup, Zimbra automatically uses hard links between backup sets instead of writing a second copy. Because each message is stored only once, this keeps the total size of your /opt/zimbra/backup/ directory to a minimum. However, you must consider these hard links when designing your backup solution. Otherwise, you can easily make this mistake:
- If you blindly copy this data to another directory without considering the hard links, it will balloon in size at the target location, because the hard links between full backups will not be preserved and each full backup will be the full size of your mailstore. These backups should still be viable, but it would be a huge waste of space.
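One rough way to see whether hard links survived a copy is to compare how many file names a tree contains with how many distinct inodes back them. The sketch below assumes file names without embedded newlines; both helper names are invented for illustration.

```shell
#!/bin/sh
# Sketch: compare the number of file names with the number of distinct inodes.
# Both helper names are hypothetical. Assumes no newlines in file names.
count_file_names() {
    find "$1" -type f | wc -l | tr -d ' '
}

count_unique_inodes() {
    # ls -i prints the inode number first; sort -u collapses hard-linked names
    find "$1" -type f -exec ls -i {} + | awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

# Example (paths are assumptions):
#   count_file_names    /my/local/replica/backup/sessions
#   count_unique_inodes /my/local/replica/backup/sessions
```

If the two numbers are nearly equal on the replica but far apart on the source, the hard links were most likely lost in the copy.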
If you want true point-in-time recovery, you need to copy everything in /opt/zimbra/backup/ while preserving hard links. You should not pick and choose what to copy.
If you were pressed to do something quick, a minimum emergency backup of Zimbra would include only the most recent full backup directory in /opt/zimbra/backup/sessions/, while otherwise preserving the /opt/zimbra/backup/ directory structure, along with the necessary accounts.xml file. However, this is not recommended. Zimbra intends for everything in /opt/zimbra/backup/ to be replicated and managed as a whole. If you start picking it apart, you definitely will not have point-in-time recovery, and you run the risk of not getting everything that Zimbra needs to do a full restore. If you choose to pick apart the Zimbra backup scheme, make sure you understand entirely how it all works first. Also, if you use auto-group mode and replicate only the most recent full backup from a list of many available, an incomplete replica is all but guaranteed.
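For the emergency case only, finding the most recent full backup directory can be sketched as follows. It assumes the session names keep the pattern seen in the listing above (full- followed by a date and time stamp), so that a plain sort puts them in chronological order; latest_full_session is a made-up name.

```shell
#!/bin/sh
# Sketch: print the name of the newest full-* directory under a sessions dir.
# latest_full_session is a hypothetical helper name.
latest_full_session() {
    sessions_dir=$1
    # Names look like full-20100627.052414.328, so sorting by name
    # also sorts by time (an assumption based on the listing).
    for d in "$sessions_dir"/full-*; do
        [ -d "$d" ] && basename "$d"
    done | sort | tail -n 1
}

# Example:
#   latest_full_session /opt/zimbra/backup/sessions
```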
Replication with RSYNC
Zimbra stores individual messages in individual files. For small Zimbra installations, this is not a large management problem. For very large Zimbra installs, it can be challenging to work with. In fact, it's almost impossible to do this with any version of RSYNC before version 3. For more about this, see the post I did a few years ago. Since then, RSYNC v3 has introduced a life-saving feature: incremental recursion. The "--recursive" switch itself exists in older versions, but only in RSYNC v3 (and only when both ends run v3) does it build the file list incrementally as it traverses directories, so RSYNC can start the initial transfer from sources with millions of files very quickly. Depending on the size of your backups, with RSYNC v2 it could take days to even start transferring files from a Zimbra backup directory.
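Because the speed-up depends on RSYNC version 3, it is worth checking the version on both machines before relying on it. A small sketch, assuming the first line of `rsync --version` has the usual "rsync  version X.Y.Z  protocol version N" shape; rsync_major_version is an invented helper name.

```shell
#!/bin/sh
# Sketch: extract the major version from `rsync --version` output on stdin.
# rsync_major_version is a hypothetical helper name.
rsync_major_version() {
    # First line is assumed to look like:
    #   rsync  version 3.0.7  protocol version 30
    # Some builds prefix the number with "v", which is stripped here.
    awk 'NR == 1 { v = $3; sub(/^v/, "", v); split(v, parts, "."); print parts[1] }'
}

# Example (run on both the source and the target machine):
#   rsync --version | rsync_major_version
```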
So, the 2 important switches that you need to pass to your RSYNC v3 process to create a fast, viable backup are:
- --recursive This takes advantage of the incremental recursion logic in RSYNC v3 (the -a switch already implies it)
- -H This will preserve hard links, and keep the size of your /opt/zimbra/backup/ replica under control
And here it is put into an RSYNC command:
rsync -avzH --exclude="lost+found" --recursive --delete -e ssh zimbra@zimbra.server.com:/opt/zimbra/backup /my/local/replica/ > /my/local/replica.log
This assumes that you have an SSH key pair set up. Remember that RSYNC simply copies data from the source to the target, and with --delete it also removes files from the target that no longer exist on the source. If you mistakenly delete your /opt/zimbra/backup/sessions/ directory or make some other horrible mistake, it will replicate that mistake. It is good practice to turn off your RSYNC replication before making any major changes, as this replica may be your only safeguard.
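One cautious habit is to do a dry run first and look at how many files RSYNC would delete before letting it touch the replica. The sketch below counts the "deleting ..." lines that RSYNC prints in verbose mode; count_deletions is an invented name, and what counts as too many deletions is left to you.

```shell
#!/bin/sh
# Sketch: count how many removals a dry run reports.
# count_deletions is a hypothetical helper name.
count_deletions() {
    # With -v and --delete, rsync reports each removal as "deleting <path>".
    awk '/^deleting / { n++ } END { print n + 0 }'
}

# Example, using the same host and paths as the command above plus --dry-run:
#   rsync -avzH --exclude="lost+found" --recursive --delete --dry-run -e ssh \
#       zimbra@zimbra.server.com:/opt/zimbra/backup /my/local/replica/ | count_deletions
```

RSYNC also has a --max-delete=NUM option that caps the number of deletions in a single run, which can serve as a second line of defense.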
Now, about scheduling. Run your RSYNC task after Zimbra typically completes its full and incremental backups, and leave enough time to ensure that the RSYNC run completes before the next Zimbra backup starts. Depending on the size of your mailstore, this may mean staggering your Zimbra and RSYNC replica operations so that each occurs every other day.
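If a replica run might still be going when cron fires the next one, a simple lock keeps the two from overlapping. This is a sketch only: run_with_lock and the lock path are made up, it relies on mkdir being atomic, and it guards against two RSYNC runs colliding, not against a Zimbra backup that is still in progress.

```shell
#!/bin/sh
# Sketch: run a command only if no other run currently holds the lock.
# run_with_lock and the lock path below are hypothetical.
run_with_lock() {
    lock_dir=$1
    shift
    # mkdir fails if the directory already exists, so it works as a lock
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "lock $lock_dir is held; skipping this run" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Example call from a script started by cron:
#   run_with_lock /tmp/zimbra-replica.lock \
#       rsync -avzH --exclude="lost+found" --recursive --delete -e ssh \
#       zimbra@zimbra.server.com:/opt/zimbra/backup /my/local/replica/
```

A stale lock left behind by a crashed run has to be removed by hand in this simple form.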
Good luck with your Zimbra backup replication!
Seriously? Replication boils down to copying a backup directory? No instructions on hot standby or performance clustering, or anything like that?
GREAT topic for discussion, but I sure wish there was some more information…