My home backup system

After having spent the last five years feeling guilty, I now, finally, have my laptop backing up the data I care about to another machine on my network. Here’s how I did it. This is a relatively long and complicated process, but it means that it all happens automatically and by magic, and I don’t ever have to interact with it, which is what I want.

The first component I needed was some backup space: a machine on my network that I could send the backups to. I did look at online backup space (Amazon’s S3 and similar) like all the cool kids, but I just can’t get on with it, and I resent paying because I’m a cheapskate. So, it was to be a box on my network. Now, there are useful NAS machines around, which just get plugged in and automatically export their disc space (normally as a Windows share, with Samba), and I looked at those too (there’s the Terastation, etc, etc). However, I needed an always-on server for another purpose anyway, so I decided to go with a real machine. A machine cobbled together out of the Big Box Of Machine Bits, of course.

Setting up the server

It’s got two disc drives in it, and I divided the first disc into two partitions, one with 1GB and the other with all the rest. Install Ubuntu Linux 6.10 Edgy, server edition, on the 1GB partition. (I actually installed dapper and then upgraded it to edgy, for that bleeding edge greatness; at this writing, edgy is only at RC stage.) After that, we want to take all the remaining space on the machine (one big partition on disc 1, and all of disc 2) and make them one big block of disc space; this is what LVM, the Logical Volume Manager, is for. Note that all this stuff can be done with proper GUI tools, but I don’t have a GUI on this machine because it’s a server and I’m trying to conserve disc space. This bit’s also from memory, so be very careful and don’t just slavishly follow it.

# First, make the partition available to LVM, by 
# making it a "physical volume". This is LVM-speak for 
# "a bit of a disk that I can use"
pvcreate /dev/hda3 # the big partition on the first disc
pvcreate /dev/hdb # and all of the second disc

# Now, create a "volume group". This is LVM-speak for 
# "a big block of disc space all managed together"
vgcreate volumegroup /dev/hda3 /dev/hdb

# Next, create a "logical volume". LVM-speak: "something
# that looks like a disc drive, so you can mount it"
# First, find out how big it can be
vgdisplay | grep "Total PE"
  Total PE              11833
# now create the logical volume at that size
lvcreate -l 11833 volumegroup -n logical1

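# You now have a device /dev/volumegroup/logical1
# which you can treat as if it were a disc
# It needs a filesystem before it can be mounted
# (ext3 is an assumption; use whichever you prefer)
mkfs.ext3 /dev/volumegroup/logical1
# Create a dir to put it in
mkdir /space
# and add it to /etc/fstab so it gets mounted. Add the line:
/dev/volumegroup/logical1 /space auto   defaults        0       0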

After that complex little bit (again, if you aren’t tight like me, do it with the GUI, it’s easier), you will have a directory /space on the machine with loads of space in it. Install openssh-server and rsync, because we’ll need them later.

Rotating backups

The way I want my backups to work is as follows. Every night, each machine on my network should connect, and send everything that’s changed since yesterday. When I look on the backup server, there should be a folder for each machine, and inside that a folder per day. Each day’s folder should look like a complete backup, but if a file hasn’t changed since yesterday it shouldn’t take up any more disc space. So, the folder structure should look, say, like this:

/space
  /stuart
    /2006-10-24
      /folder1
        /file1
        /file2
        /newfile1
      /folder2
        /file3
    /2006-10-23
      /folder1
        /file1
        /file2
      /folder2
        /file3

and the 2006-10-24 folder should have all the files in it but only take up as much space as newfile1. Complicated, but part of the reason I specified this is because I know it’s possible. (The main reason, of course, is that I’m tight and want to save disc space.) Making this happen involves two stages: making a hardlink tree, and using rsync.

The hardlink tree

If you can get over how much this sounds like something out of an Enid Blyton book, it’s a cool technique. I’m not going to explain hardlinks and inodes and things like that here, because there are many other descriptions elsewhere. Suffice to say that, if you have a folder, you can make a duplicate of that folder with cp -al folder newfolder, and that duplicate will look the same and be full of real files but take up hardly any extra disc space (just the directories and the links themselves). My nightly backup therefore needs to do the following:

  1. Copy last night’s backup to a new folder, named for the current date
  2. Change the data in the new folder to look like my laptop, so it’s got all yesterday’s data but with any changes I’ve made today
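To see the hardlink trick for yourself before trusting your backups to it, here is a small sketch that runs entirely in a throwaway temp directory. The folder and file names are just for illustration, and it assumes GNU cp (which is what Ubuntu has) for the -al options.

```shell
# Demonstration only: everything happens in a throwaway temp directory
demo=$(mktemp -d)

mkdir -p "$demo/folder/folder1"
echo "hello" > "$demo/folder/folder1/file1"

# -a recurses and preserves attributes, -l makes hardlinks instead of copying data
cp -al "$demo/folder" "$demo/newfolder"

# -ef is true when both names refer to the same file (same device and inode)
if [ "$demo/folder/folder1/file1" -ef "$demo/newfolder/folder1/file1" ]; then
  same_file=yes
else
  same_file=no
fi
echo "file1 shared between the two trees: $same_file"

# du counts a hardlinked file only once per run, so the second
# tree shows up as little more than its directories
du -sh "$demo/folder" "$demo/newfolder"
```

When you’re done poking at it, rm -rf "$demo" tidies up.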

The issue here is: how do you know what last night’s backup is called? I’ve solved this by making sure there’s a symbolic link called current which always points to the most recent backup. So, the above process actually becomes:

  1. Copy the current folder to a new folder, named for the current date
  2. Change the data in the new folder to look like my laptop, so it’s got all yesterday’s data but with any changes I’ve made today
  3. Change the current link so it points to the newly created most recent backup

The script that does this is stored in /space/begin-backup, made executable with chmod +x /space/begin-backup, and looks like this:

#!/bin/bash

PERSON=$1
BROOT=/space

if [ -z "$PERSON" ]; then
  echo You must pass the name of a backup dir
  exit 1
fi

PDIR=$BROOT/$PERSON

# If person dir doesn't exist, create it
if [ ! -d "$PDIR" ]; then mkdir "$PDIR"; fi

# If there's no current dir, create an empty one and link it
if [ ! -d "$PDIR/current" ]; then
  mkdir "$PDIR/first"
  ln -s first "$PDIR/current"
fi

# ISO 8601 timestamp, to the second, used as the new folder's name
DT=$(date -Iseconds)

# Hardlink-tree the existing recent dir; stop here if the copy fails
cp -al "$(readlink -f "$PDIR/current")" "$PDIR/$DT" || exit 1
# and link current to the new hardlink tree
rm "$PDIR/current"
ln -s "$DT" "$PDIR/current"

We’ll come back to how you run this in a minute.

Rsync

The “change the data in the new folder to look like my laptop” bit is done with rsync, which is complex but brilliantly clever. In essence, rsync is like copy (or cp), except that it compares the source and the destination and only sends the changes over. On my laptop, I can do

rsync -avz --delete -e ssh \
    /some/folder/to/back/up \
    myserver:/space/stuart/current/

and that will copy /some/folder/to/back/up over to the server. Importantly, if that folder is already in the backup space, in the current folder (because we backed it up yesterday) then it’ll only copy the changes over. This is why we make sure that there’s a folder called current with the contents of last night’s backup! Exactly how we run this rsync command we’ll come on to in a minute. Patience, Iago.
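A fair worry at this point: if today’s folder and yesterday’s folder are hardlinks to the same files, won’t updating a file today change yesterday’s copy too? It doesn’t, because by default rsync doesn’t write into the existing file; it builds the new version in a temporary file and then renames that over the old name (unless you ask for --inplace, so don’t). The sketch below doesn’t run rsync at all; it imitates that write-then-rename step with mv in a temp directory, and the folder names are only for illustration.

```shell
# Simulation only: mv stands in for rsync's "write a temp file, then rename" step
demo=$(mktemp -d)
mkdir "$demo/2006-10-23"
echo "old contents" > "$demo/2006-10-23/file1"

# Tonight's rotation: a hardlink copy of last night's folder
cp -al "$demo/2006-10-23" "$demo/2006-10-24"

# A changed file arrives: the new data goes into a temporary file,
# which is then renamed over the name in the new folder
echo "new contents" > "$demo/2006-10-24/.file1.tmp"
mv "$demo/2006-10-24/.file1.tmp" "$demo/2006-10-24/file1"

yesterday=$(cat "$demo/2006-10-23/file1")
today=$(cat "$demo/2006-10-24/file1")
echo "yesterday: $yesterday"
echo "today:     $today"

# By contrast, writing in place would have changed both days at once:
#   echo "new contents" > "$demo/2006-10-24/file1"
```

The rename only swaps which data the name in today’s folder points at, so yesterday’s folder keeps its link to the old data.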

Choosing what gets backed up

I don’t want to back up everything. I don’t have the space, and to be frank I have a lot of crap lying around on my machine. So I need a very easy way of tagging something for backups. This is a perfect use of emblems; I can “tag” a file or a folder in the file manager with a special “backup” emblem, and that should indicate to my backup process that that file or folder wants to be included in the backup. Ubuntu doesn’t have a backup emblem included by default, but adding one is easy, and explained in the docs. Pick yourself an image (I use this little tape) and add it as an emblem, and then go through your machine and add it to every file or folder that needs backing up. (This will, if applied to a folder, back up everything inside it. If you need it to back up only some of the stuff inside it then you’ll have to not apply it to the folder. Yes, this is awkward, but I don’t need to do that.) Applying emblems is also in the documentation; a quick way if you’re doing this a lot is to pop up the Edit > Backgrounds and Emblems window and just repeatedly drag your new backup emblem to everything.

SSH with no password

One final preparation step: in order that the backup can run without me being around, I need to be able to make an ssh connection from my laptop to the server without entering a password. I’m not going to describe how to do this because there are plenty of guides out there on the web.
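For the impatient, though, here is a minimal sketch of one common way to do it with OpenSSH. It is not necessarily how I set mine up. The key file name id_backup is made up for this example, servername is a placeholder for your backup server, and it assumes ssh-copy-id is installed on the laptop (if it isn’t, append the .pub file to ~/.ssh/authorized_keys on the server by hand).

```shell
# Sketch only: "servername" and the key file name are placeholders
KEYFILE="$HOME/.ssh/id_backup"
SERVER="servername"

setup_backup_key() {
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  # Generate a key pair with an empty passphrase (-N ""), unless one exists already
  if [ ! -f "$KEYFILE" ]; then
    ssh-keygen -t rsa -N "" -f "$KEYFILE" || return 1
  fi
  # Add the public key to ~/.ssh/authorized_keys on the server;
  # this asks for the server password one last time
  ssh-copy-id -i "$KEYFILE.pub" "$SERVER" || return 1
  # Check it worked: BatchMode makes ssh fail rather than prompt for a password
  ssh -i "$KEYFILE" -o BatchMode=yes "$SERVER" true
}

# Run by hand once, from the laptop:
#   setup_backup_key && echo "passwordless login works"
```

For a plain ssh servername to pick this key up, either add an IdentityFile line for that host to ~/.ssh/config or use the default key name instead. Bear in mind that a key with no passphrase lets anyone who gets hold of the file log in to the server as you.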

Make it so

Now, finally, after lots of setup, it’s time to actually make it all happen. To summarise, then, to do a backup, we need to:

  1. Run, on the server, the copy-last-night’s-backup script
  2. Get the list of all the files with the backup emblem
  3. Use rsync to copy all those files into the new backup folder on the server

To get the list, we can use my findemblem.py script (and you thought I just wrote it for fun!). The final script, dobackup.sh, which actually does the work, just does the above steps, and looks like this:

#!/bin/bash

# Do backups to the rsync server
# You must have already set up a passphraseless ssh key to the ssh server
# so that "ssh servername" just logs you in.

BK=$(dirname "$0")
BKNAME=stuart

# First, tell it to clock over the backup
ssh servername /space/begin-backup $BKNAME

# Now, do the backup
# (read -r with an empty IFS keeps odd filenames intact)
python "$BK/findemblem.py" backup | while IFS= read -r fn; do
  rsync -avzq --delete -e ssh "$fn" \
    "servername:/space/$BKNAME/current"
done

All that remains now is to schedule this script to run every night, by editing your tasklist with crontab -e and adding the line below, which runs it at 4:40 every morning:

40  4  *   *   *     /full/path/to/dobackup.sh

And, lo and behold, you have overnight backups. All done and dusted. Phew.
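One thing to watch with the rsync calls in dobackup.sh: each tagged item lands in current under its last path component only, so two tagged folders with the same name in different places end up at the same destination (and with --delete, the second will wipe out the first’s files). rsync’s -R (--relative) option recreates the full source path on the server instead. The sketch below doesn’t run rsync; it only works out where each item would land, and the tagged paths in it are invented for illustration.

```shell
# Hypothetical tagged paths, one per line, as a script like findemblem.py might print them
tagged="/home/stuart/work/notes
/home/stuart/personal/notes
/home/stuart/photos"

dest_plain=""
dest_relative=""
while IFS= read -r fn; do
  # Without -R, the item is placed under the destination by its last path component
  dest_plain="$dest_plain current/$(basename "$fn")"
  # With -R (--relative), the full source path is recreated under the destination
  dest_relative="$dest_relative current$fn"
done <<EOF
$tagged
EOF

echo "without -R:$dest_plain"
echo "with -R:   $dest_relative"
```

Here both notes folders collide at current/notes without -R, while with -R they stay apart (and restoring is easier too, because every file’s original location is visible in the backup).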

11 thoughts on “My home backup system”

  1. Will says:

    Did you consider LVM snapshots? (http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html) If so, why did you go with massive trees of hardlinks instead?

  2. sil says:

    I did briefly consider them, but, well, I don’t understand them. For example: does the snapshot space get deducted from the space in your LVM, or do you need space *outside* the LVM to put them in? The documentation seems to pretty much assume that you already know about snapshots, and I don’t…

  3. [...] aquarius, did you know about rsnapshot before doing this? It seems to do pretty much everything you want. I’ve been using it for several years, and currently back up two local and twelve remote machines every four hours. I’m very happy with it. [...]

  4. pel says:

    Hi,

    Very cool! I’ve been doing basically the same thing for years – but I must say that the use of emblems is just awesome!

    I did some modification to your dobackup.sh to handle remote and local backups (or both).

    #!/bin/bash

    # Do backups to local media or/and an rsync server
    # You must have already set up a passphraseless ssh key to the ssh server
    # so that "ssh servername" just logs you in.

    BK=$(dirname $0)
    BKNAME=$USER
    #BKROTATE=begin-backup
    BKROTATE="rotate-backup"
    BKREMOTEHOST=
    BKREMOTEDIR=
    BKLOCALDIR=/media/LACIE/backup
    BKEMBLEMNAME="Backup"

    RSYNCFLAGS="-avzq --delete"

    #Rotate the backup

    ## Try to rotate remote backup location
    if [ -n "$BKREMOTEHOST" ]; then
      ssh $BKREMOTEHOST $BKREMOTEDIR/$BKROTATE $BKNAME
      if [ $? != 0 ]; then
        echo "Failed to connect to backup host (${BKREMOTEHOST})."
      else
        ### Now, do the backup
        $BK/findemblem.py $BKEMBLEMNAME | while read fn; do
          rsync $RSYNCFLAGS -e ssh "$fn" $BKREMOTEHOST:$BKREMOTEDIR/$BKNAME/current
        done
      fi
    fi

    ## Try to rotate local backup location
    if [ -n "$BKLOCALDIR" -a -d "$BKLOCALDIR" ]; then
      $BKLOCALDIR/$BKROTATE $BKNAME

      ### Now, do the backup
      $BK/findemblem.py $BKEMBLEMNAME | while read fn; do
        rsync $RSYNCFLAGS "$fn" $BKLOCALDIR/$BKNAME/current
      done
    fi

    ## If no backup destination is specified, abort
    if [ -z "$BKREMOTEHOST" -a -z "$BKREMOTEDIR" -a -z "$BKLOCALDIR" ]; then
      echo "No backup destination specified."
      exit 1
    fi

  5. pel says:

    Ok, that came out completely garbled.

    I put it here http://www.update.uu.se/~peterl/tmp/dobackup.sh.txt
    if someone wants it.

  6. pel says:

    Oh, and a final note – you probably have some good reason for not using -R (use relative pathnames), but it really is a good idea and makes it much easier to restore from backup :)

  7. [...] Today I saw a fun idea that involved using Nautilus emblems to mark what should be backed up, so now I have nice backup tapes here and there in my file manager. I have tinkered a bit with this and that; maybe it should be polished into something more general and packageable? [...]

  8. Unless I missed something when I read this, this backup method won’t create proper incremental backups. The reason being that each new backup contains hard links to all the files in the previous one. That’s fine when you create/delete a file, but when you edit an existing file rsync will update that file with the new content. Since every copy of that file in each day’s backup is a hard link to the same file, they’ll all be updated.

  9. sil says:

    astopy: you have missed something, and it’s this: when rsync needs to change a file, it unlinks it first. :)

  10. Ah, thanks. I had no idea :)

  11. [...] Anyway, this means that something I’ve been thinking of doing for a while leaps up my priority list. One of the 6 computers is a server: an old tower case machine with 90GB of storage in it made up of scrounged hard drives I had lying around. I use it as the server for my home backup system, and it does a good job. However, since I can now play media across the network (exciting! it is!), I could use it as the storage place for all our films and TV and music and games and whatnot. However, it’s old and a bit noisy, so it’d be nice to get rid of it. I still need whatever I replace it with to be an actual server, though, not just a NAS-style stack o’ discs, so I don’t want a Buffalo Terastation or similar — it not only needs to run the rsync server but I also plan on having it be used for BitTorrenting stuff I’d like to watch and to be always on for downloading things. [...]
