Research Data Backup

From ACENET

These instructions are for members of Memorial University.

Memorial University provides a service called Research Data Backup to all researchers on campus. The service is implemented and maintained through the Labnet system.

Basic Outline

Anything you put in your home directory on Labnet is automatically backed up daily as part of the Research Data Backup service. You can access up to approximately 6 months of incremental versions of your documents and data through the Labnet 'webtools' interface. The number of incremental versions available is not guaranteed, as it depends on overall system usage. If you have a Labnet account, you can back up data to this service right now, using rsync or scp directly from the placentia head node to either garfield.cs.mun.ca or ganymede.pcglabs.mun.ca.

This is all you need to know if you only need up to 50G of data backed up. If you need more than 50G of space, then a special partition will have to be created for you, and this partition will not be backed up incrementally (i.e. you will have one copy of your data).
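To see which side of the 50G line a directory falls on before copying it, a check along the following lines may help. It is only a sketch: it assumes "50G" means 50 GiB and that the disk usage reported by du is a fair proxy for the space needed on Labnet, neither of which is stated on this page. The function name fits_in_backup is a hypothetical helper, not part of any Labnet tool.

```shell
#!/bin/sh
# Assumption: the 50G limit is 50 GiB, expressed here in KiB to match du -k.
LIMIT_KB=$((50 * 1024 * 1024))

# fits_in_backup DIR: exit status 0 if DIR uses no more than LIMIT_KB.
fits_in_backup() {
    used_kb=$(du -sk "$1" | awk '{print $1}')
    [ "$used_kb" -le "$LIMIT_KB" ]
}

# When run as a script with a directory argument, report on that directory.
if [ "$#" -gt 0 ]; then
    if fits_in_backup "$1"; then
        echo "$1 fits within 50G"
    else
        echo "$1 is larger than 50G"
    fi
fi
```

Saved as, say, check_size.sh, it could be run on placentia as: sh check_size.sh your_local_data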

How To Get Access

If you already have a Labnet account, you are ready to go. If you do not have a Labnet account, go to the Labnet 'webtools' page to create one: click on 'Labnet Account Generation' in the left menu. You will need your MUN Login credentials; your Labnet user name and password will be the same as your MUN Login ones. If you do not have MUN Login credentials, go to this web page to get them activated first, then return to the Labnet 'webtools' page.

NOTE
If you use the so-called IMAP Email service for your MUN email (i.e. you connect to mail.mun.ca with your email client to get your MUN email) or you use my.mun.ca then you have already activated these credentials. If you do not know your password, you can change it on the MUNLogin page, but this will also change your password for your email client.

Once you have activated your Labnet account, test it by logging on to garfield.cs.mun.ca or ganymede.pcglabs.mun.ca via ssh from the placentia head node.

If you have a small amount of data, less than or equal to 50G, then you can simply rsync or scp your data to garfield.cs.mun.ca or ganymede.pcglabs.mun.ca. See below for some examples.

NOTE
Whether you already have a Labnet account or have to activate one, please inform us that you are going to use this backup service, using the subject line MUN RDB Access, so that we can ensure that your Labnet home directory gets moved to the appropriate file system. Please include your Labnet login ID in the message. Further, this service operates on an 'honour' basis, insofar as no quotas are imposed. So, if you are using the service in the 50G-or-less category and know that you are going to need more, please contact support so that we can arrange appropriate storage for you.

If you have more than 50G of data, please contact support and request MUN RDB Large DATA Access, with details of how much space you need. Availability depends on University resources. However, the University has made a commitment to this service, so if your full request cannot be accommodated immediately, it should be possible in the future. Again, storage above 50G cannot be accommodated in the normal Labnet incremental backup system: you will have one copy of your data.

Examples

Create a directory in your Labnet home directory called, e.g., placentia_backup. Log on to garfield.cs.mun.ca or ganymede.pcglabs.mun.ca to do this. (Of course you can call it whatever you like, but you'll need to make the corresponding change wherever placentia_backup appears below.)

Let us further assume that you have a single directory on Placentia you wish to back up, called your_local_data.

Manual backup

To back up your_local_data manually, run the following command on the placentia head node (assuming you are in the directory immediately above your_local_data):

rsync -e ssh -aSx ./your_local_data <labnet_userid>@garfield.cs.mun.ca:placentia_backup/

This will create a sub-directory on the Labnet system called placentia_backup/your_local_data. You will be prompted for your password when you do this. If you use -aRSx instead of -aSx, the source path exactly as you typed it on the command line is recreated under the destination directory. For example, giving the source as /home/<user>/your_local_data produces placentia_backup/home/<user>/your_local_data on Labnet.
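The difference between the two flag sets can be tried out locally before anything is sent to Labnet. The sketch below copies a sample directory into two scratch destinations under a temporary directory; the names demo, plain, relative and result.txt are made up for the illustration, and the whole demonstration is skipped if rsync is not installed.

```shell
#!/bin/sh
# Local demonstration of -aSx versus -aRSx; nothing is sent to Labnet.
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/home/user/your_local_data" "$demo/plain" "$demo/relative"
    echo "sample" > "$demo/home/user/your_local_data/result.txt"

    # Without -R: only the last component of the source path is recreated.
    (cd "$demo/home/user" && rsync -aSx ./your_local_data "$demo/plain/")

    # With -R: the source path as typed is recreated under the destination.
    (cd "$demo" && rsync -aRSx home/user/your_local_data "$demo/relative/")

    # List what ended up in each destination.
    find "$demo/plain" "$demo/relative" -type f
fi
```

The first copy should end up as plain/your_local_data/result.txt and the second as relative/home/user/your_local_data/result.txt, both under the temporary directory.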

You can do this manually as often as you like. The data will be backed up daily in an incremental manner on the Labnet system. So you do not need to worry about keeping track of versions of the data yourself, unless you expect to need to have versions going back longer than about 6 months.

Read the rsync man page for more information about how it works and how to use it for more complex cases (multiple source directories, etc.). It's possible to get fairly sophisticated, with exclude patterns, multiple sources, incremental copies, etc. If you go that route, it is probably a good idea to script it rather than trying to do it all on the command line. For some examples of rsync scripting, see here. There are many resources on the Internet for help with understanding and scripting rsync.
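As a starting point for such a script, here is a minimal sketch that backs up several source directories at once and skips some file patterns. Everything configurable in it is an assumption made for illustration: the user name labnet_userid, the second source directory more_data, the exclude patterns, and the helper names run and backup. By default it only prints the rsync command it would run; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Placeholders: replace with your own Labnet login ID and directory names.
LABNET_USER=${LABNET_USER:-labnet_userid}
LABNET_HOST=${LABNET_HOST:-garfield.cs.mun.ca}
DEST_DIR=${DEST_DIR:-placentia_backup}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the command, 0 = really run it

# run CMD...: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# backup SRC...: copy each source directory into DEST_DIR on Labnet,
# skipping editor backup files (*~) and temporary files (*.tmp).
backup() {
    run rsync -e ssh -aSx \
        --exclude='*~' --exclude='*.tmp' \
        "$@" "$LABNET_USER@$LABNET_HOST:$DEST_DIR/"
}

backup "$HOME/your_local_data" "$HOME/more_data"
```

rsync also has its own -n (--dry-run) option, which contacts the remote side and reports what would be transferred without copying anything; adding it to the flags above is another way to preview a run.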

Automatic backup

The above process can be made to happen automatically on a recurring schedule, but in order to do so first you must set up passwordless ssh to Labnet. If the backup happens automatically in the middle of the night, you won't be there to supply your Labnet password!

1. Append the contents of ~/.ssh/id_rsa.pub from placentia to the file ~/.ssh/authorized_keys in your Labnet home directory (create the file if it does not exist; do not overwrite any keys already in it). If you've done this correctly, the above rsync command will now work without prompting you for a password.

2. Set up a cron job on placentia to do this for you automatically. Running...

crontab -e

...will open an editor (probably vi). Add a line like this:

0 22 * * * /usr/bin/rsync -e ssh -aSx /home/<acenet_username>/your_local_data <labnet_username>@garfield.cs.mun.ca:placentia_backup/

This will run the command for you automatically at 10pm each night. It is basically the same command you would run manually, but with full path names specified. It's a good idea to use full path specifications for everything in a cron table. See man 5 crontab for information on the meaning of the first part of this line, and how to set up, e.g., weekly rather than daily backups.

3 (optional). If you would rather not use your 'internal' placentia ssh key, then instead of using id_rsa.pub in step 1 above, create a separate key pair (in your home directory on placentia):

mkdir .sshremote
chmod 700 .sshremote
ssh-keygen -t rsa -f .sshremote/placentia  # Hit enter when asked for a pass phrase

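This creates the private key .sshremote/placentia and the public key .sshremote/placentia.pub. Append the contents of .sshremote/placentia.pub to ~/.ssh/authorized_keys in your Labnet home directory, as in step 1. Then replace -e ssh above with -e "ssh -i .sshremote/placentia". In the cron table, give the full path to the key, e.g. /home/<acenet_username>/.sshremote/placentia.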
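The key installation and the cron entry from the steps above can be sketched as two small shell functions. This is an illustration under assumptions, not a Labnet-provided tool: add_key and cron_line are hypothetical names, add_key is meant to be run on Labnet after the public key file has been copied there (for example with scp), and the user names and paths are placeholders.

```shell
#!/bin/sh
# add_key PUBFILE AUTHFILE: append the public key in PUBFILE to AUTHFILE
# unless that exact line is already present, then restrict permissions.
add_key() {
    key=$(cat "$1") || return 1
    mkdir -p "$(dirname "$2")"
    touch "$2"
    grep -qxF -- "$key" "$2" || printf '%s\n' "$key" >> "$2"
    chmod 600 "$2"
}

# cron_line SCHEDULE KEY SRC DEST: print a crontab entry that runs the
# backup with full path names and an explicit ssh identity file.
cron_line() {
    printf '%s /usr/bin/rsync -e "ssh -i %s" -aSx %s %s\n' "$1" "$2" "$3" "$4"
}

# Nightly at 10pm, as in step 2.
cron_line '0 22 * * *' /home/acenet_username/.sshremote/placentia \
    /home/acenet_username/your_local_data \
    labnet_username@garfield.cs.mun.ca:placentia_backup/

# Weekly instead: 10pm on Sundays (day-of-week field 0 means Sunday).
cron_line '0 22 * * 0' /home/acenet_username/.sshremote/placentia \
    /home/acenet_username/your_local_data \
    labnet_username@garfield.cs.mun.ca:placentia_backup/
```

On Labnet, add_key placentia.pub ~/.ssh/authorized_keys would install the key, and running it a second time leaves the file unchanged. The lines printed by cron_line are what you would paste into the editor opened by crontab -e on placentia.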