Storage System
- Main page: User Guide
The storage implemented by ACEnet is Sun's SAM-QFS, a hierarchical storage system using RAID 5 technology. The objective of a hierarchical storage system is large storage volume at low cost, while not sacrificing speed of access. The objectives of the RAID 5 technology are speed and fault tolerance.
Although there are features available in SAM-QFS that allow for backup and recovery of user data, SAM-QFS is not configured as a backup system at ACEnet! ACEnet endeavours to protect users from the effects of hardware failures, but we do not protect from accidental overwriting or deletion.
Because ACEnet does not provide backup services, users are strongly encouraged to make off-site (or multi-site) copies of their critical data, and to observe their home institution's data storage policies. Some institutions offer network backup facilities which you might be able to take advantage of.
Layout
There are four areas of disk space available to the user on most ACEnet clusters: two areas are part of the Permanent Storage, and two others are in Temporary Storage. The general outline of the ACEnet storage system is given below.
- Permanent Storage system on every cluster
| Name | Location | Function | Resource type |
|---|---|---|---|
| Main (home) | /home/<username> | critical data and code | shared |
| Global Scratch (working dir) | /globalscratch/<username> | working data, large volumes of data | shared |
On Glooscap, Global Scratch does not exist.
- Temporary Storage system on every cluster
| Name | Location | Function | Resource type |
|---|---|---|---|
| No-quota Scratch | /nqs/<username> | temporary data, large data | shared |
| Local Scratch | /scratch/tmp | temporary data, fast read/write access data | node-local |
Permanent Storage
Main Storage
- Home Directory
Main storage is your personal and permanent space for research-critical data and code. This is where you should put your data prior to and after computations, and where you should keep source code and executables. It is located in /home/<username>, where your username will replace <username>. When you log in, this is your current working directory. You may create whatever subdirectories you like here. The Main storage is a networked storage shared among all compute nodes via NFS (Network File System).
- How it works
- Data is first written to SATA disk storage. Eight hours after a file's last modification, a copy is made to the tape library. After 2 months of inactivity (the file has been neither modified nor read), the copy on the SATA disk is released and the inode points directly to the tape cartridge; from then on the file resides only on tape. In addition, if usage of the SATA-based filesystem reaches the "high water mark", currently defined as 80% full, the system looks for files that have been copied to tape but not yet released and releases them from the SATA disks, continuing until usage drops to the "low water mark", 50% full.
Global Scratch
- Working Directory
Global Scratch is found in /globalscratch/<username>. It is designed as a storage volume for working data and data that is required by computations. The name 'scratch' is a historical leftover and does not imply that this is temporary storage. We do not delete or "clean" any user data from this volume. It enjoys the same level of protection as Main Storage. Global Scratch is considerably larger than Main Storage and is shared among all compute nodes via NFS. You may already have a scratch symlink in your home directory pointing to this location; if not, you can create one yourself with the following commands:
$ cd
$ ln -s /globalscratch/<username> scratch
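The symlink step above can also be written defensively, so that it does nothing if the link is already there and fails cleanly where the target is missing. This is only a sketch: the function name `make_scratch_link` is hypothetical and is not an ACEnet-provided tool.

```shell
#!/bin/bash
# make_scratch_link TARGET LINK
# Create LINK as a symlink to TARGET unless something named LINK already exists.
# (Hypothetical helper for illustration; not part of the ACEnet environment.)
make_scratch_link() {
    local target=$1 link=$2
    if [ -L "$link" ] || [ -e "$link" ]; then
        echo "$link already exists; leaving it untouched" >&2
        return 0
    fi
    if [ ! -d "$target" ]; then
        echo "$target not found; not creating a link" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Typical use on a cluster that has Global Scratch:
#   make_scratch_link "/globalscratch/$USER" "$HOME/scratch"
```

On Glooscap, where Global Scratch does not exist, the call simply returns a non-zero status instead of creating a dangling link.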
- How it works
- Data is first written to the fast (FC) storage. Three days after a file's last modification, a copy is made to the slower SATA disk storage, and six days after the last modification a copy is made to the tape library. After two weeks of inactivity (the file has been neither modified nor read), the copy on the fast (FC) disk is released and the inode points directly to the copy on the SATA disk. After 8-16 weeks of inactivity, the copy on the SATA disk is recycled and the inode points only to the copy on tape; from then on the file resides only on tape. In addition, if usage of the FC-based filesystem reaches the "high water mark", currently defined as 80% full, the system looks for files that have been copied but not yet released and releases them from the FC disks, continuing until usage drops to the "low water mark", 50% full.
Quotas
Storage quotas are implemented at all clusters except Courtenay. These are the default values:
| Location | Limit type | Brasdor | Fundy | Mahone | Placentia | Glooscap |
|---|---|---|---|---|---|---|
| /home/<username> | per user quota | 47 GB | 47 GB | 13 GB | 13 GB | 61 GB |
| /globalscratch/<username> | per user quota | 238 GB | 238 GB | 238 GB | 238 GB | -- |
The quota covers both "online" (disk) and "offline" (tape) storage.
Your usage and limit information can be found with the command quota.
You can also use du to determine how much online space your files occupy:
$ du -h --max-depth=1 /home/<username>/
$ du -h --max-depth=1 /globalscratch/<username>/
Living within your quota:
- The du command will report only those files that have not been released to the second-level disks or tapes (see "How it works" above), while the quota is set for all of your files.
- You can see your offline usage with quota, or with the web app https://webmo.ace-net.ca/uqs/login.pl.
- The disk allocation unit (DAU) ranges from 32 KB to 2.7 MB at different sites and storage areas. The DAU is the smallest unit of disk that a file can occupy, so this number can strongly affect your total storage usage if you have a large number of files smaller than the DAU.
- In current versions of SAM-QFS, a file that is newly created or appended to may have a much larger footprint than the DAU for about 30 seconds after the creation or extension. If your application creates a large number of small files very rapidly, you might find that you have to introduce a delay into the process to avoid running over quota.
- Where two numbers are specified for the DAU in the table below, in the form X (Y) KB, the first 8 blocks of a file will be X KB each, and the rest of the blocks will be Y KB each. For example, a 97 KB file at Brasdor in /home will occupy 8*4+64+64=160 KB of disk space. This dual-DAU scheme saves disk space when working with many small files.
| Location | Brasdor | Fundy | Mahone | Placentia | Glooscap |
|---|---|---|---|---|---|
| /home | 4 (64) KB | 2.7 MB | 4 (32) KB | 4 (64) KB | 4 (64) KB |
| /globalscratch | 1.5 MB | 1.6 MB | 192 KB | 2.6 MB | -- |
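The dual-DAU arithmetic described above can be sketched as a small shell function. It assumes whole-kilobyte file sizes and the "first 8 blocks small, remainder large" rule stated in the list; the function name `dau_footprint_kb` is hypothetical, and the result is an estimate rather than what SAM-QFS itself would report.

```shell
#!/bin/bash
# dau_footprint_kb SIZE_KB SMALL_DAU_KB LARGE_DAU_KB
# Estimate the on-disk footprint (in KB) of a file under a dual DAU,
# where the first 8 blocks are SMALL_DAU_KB each and all later blocks
# are LARGE_DAU_KB each. (Hypothetical helper for illustration.)
dau_footprint_kb() {
    local size=$1 small=$2 large=$3
    local small_cap=$(( 8 * small ))   # space covered by the 8 small blocks
    if [ "$size" -le "$small_cap" ]; then
        # Fits in the small blocks: round up to a whole number of them.
        echo $(( (size + small - 1) / small * small ))
    else
        # All 8 small blocks, plus the remainder rounded up to large blocks.
        local rest=$(( size - small_cap ))
        echo $(( small_cap + (rest + large - 1) / large * large ))
    fi
}

# The example from the text: a 97 KB file in /home at Brasdor, DAU 4 (64) KB.
dau_footprint_kb 97 4 64   # prints 160
```

A 10 KB file with the same DAU would come out at 12 KB (three 4 KB blocks), which is the saving the dual scheme is meant to provide for small files.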
Temporary Storage
No-quota Scratch
No-quota Scratch (NQS) is temporary network storage that has no per-user quota limit, but gets cleaned periodically to get rid of old files. It's available at /nqs/<username>/ on every cluster to users who have requested access to it.
- Note
- If you want to use NQS, you should contact support stating that you understand the terms of use and would like NQS turned on. Also, if you want notification for when files will be deleted, please let us know that you want notification turned on and tell us where you want those emails to be sent.
NQS is designed to allow you to store large amounts of data on a temporary basis, for example, files generated and consumed during a single job that cannot be stored on Main Storage or Global Scratch due to the per-user quotas. It is not a hierarchical storage system; it consists only of disk drives. Because no quotas are enforced, there is an irreducible risk that the filesystem will fill up. Should that occur, existing data on /nqs may be unrecoverable. This means it is unsuitable for storage of critical data. Long-term storage of data, critical or not, is also not appropriate, since this increases the risk of the filesystem filling up during its intended use.
You are expected to delete your files from /nqs once the associated job or jobs are complete. Technical staff also reserve the right to delete files manually in the event of a manifest risk of a fill-up emergency.
To ensure that these guidelines are followed and /nqs/ stays usable for its intended purpose, files which have not been accessed for 31 days are automatically deleted. The deletion routine will notify you seven days in advance of removing any of your files if you keep a file named /home/username/.nqs in your home directory with these contents:
U_EMAIL=user@some.address.foo
U_QUIET=no
| Mahone | Brasdor | Placentia | Glooscap | Courtenay | Fundy |
|---|---|---|---|---|---|
| 13T | 13T | 13T | 2.7T | 2T | 13T |
If you want to check how much space is used or available in NQS then use the following command:
$ df -h /nqs/$USER
To examine the last access time of your files:
$ ls -lu /nqs/$USER # in the given directory
$ ls -luR /nqs/$USER # in subdirectories too, recursively
To find, recursively, files that have not been accessed for more than a given number of days, e.g. 24:
$ find /nqs/$USER -type f -atime +24
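Building on the find command above, the following sketch lists stale files together with their last access date and size, oldest first, which can help decide what to copy elsewhere before the automatic deletion. It assumes GNU find (for -printf); the function name `nqs_stale_files` and the default of 24 days are illustrative choices, not part of the NQS tooling.

```shell
#!/bin/bash
# nqs_stale_files DIR [DAYS]
# List regular files under DIR whose last access is more than DAYS days old
# (default 24), as "access-date <TAB> size-in-bytes <TAB> path", oldest first.
# Requires GNU find for -printf. (Hypothetical helper for illustration.)
nqs_stale_files() {
    local dir=$1 days=${2:-24}
    find "$dir" -type f -atime +"$days" \
        -printf '%AY-%Am-%Ad\t%s\t%p\n' | sort
}

# Typical use:
#   nqs_stale_files "/nqs/$USER" 24
```

Note that reading a file updates its access time, so simply listing the files this way does not reset the 31-day clock, but opening them does.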
Local Scratch
- Main page: Local Scratch
Each compute node has its own disk (or in some cases, solid state memory) which is not shared with other compute nodes. We refer to this as local disk. If it is used to store temporary files for an individual job, then we refer to that as "local scratch storage".
Local scratch is not organized consistently across all clusters and hosts. In most cases it is found in /scratch/tmp, but there are some hosts where /scratch/tmp doesn't exist. Grid Engine provides an environment variable TMPDIR which points to a local disk location which always exists, hence
$ cd $TMPDIR
should always succeed inside your submission script.
The size of local scratch space varies from cluster to cluster and from host to host. In particular the X6440 "Blade Servers" introduced in late 2009 have small local scratch. You may wish to have your job script check the size before it decides where to write scratch files, in order to avoid "File system full" errors. Here's a script fragment that prints the available space in $TMPDIR in kilobytes:
$ df --block-size=1024 $TMPDIR | awk 'END {print $4}'
Parallel users will want to be even more careful, since available space may vary from host to host within a single job.
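A job script could use the df fragment above to choose a scratch location before writing anything. The sketch below is one way to do that; the function names `avail_kb` and `pick_scratch`, the 5000000 KB threshold and the fallback to /nqs in the usage comment are assumptions made for illustration only.

```shell
#!/bin/bash
# avail_kb DIR
# Print the free space, in kilobytes, of the filesystem holding DIR.
# -P keeps each filesystem on one line so that column 4 is always "Available".
avail_kb() {
    df -P -k "$1" | awk 'END {print $4}'
}

# pick_scratch NEEDED_KB DIR...
# Print the first existing DIR with at least NEEDED_KB free; fail if none has.
# (Hypothetical helper for illustration.)
pick_scratch() {
    local needed=$1 dir
    shift
    for dir in "$@"; do
        if [ -d "$dir" ] && [ "$(avail_kb "$dir")" -ge "$needed" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Typical use inside a submission script (threshold and fallback are examples):
#   workdir=$(pick_scratch 5000000 "$TMPDIR" "/nqs/$USER") || {
#       echo "not enough scratch space" >&2; exit 1; }
```

This checks only the host the script runs on; a parallel job would need the same check on every host it uses.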
$TMPDIR is unique to each job, and Grid Engine deletes the directory at the termination of a job. We strongly recommend that you use $TMPDIR if you want to use node-local disk. If you do not use $TMPDIR we recommend that you
- check for the existence of /scratch/tmp;
- create a subdirectory with your username, /scratch/tmp/$USER, or Grid Engine job number, /scratch/tmp/$JOB_ID;
- ensure at the end of the job that the directory is cleaned up and deleted.
For a parallel job you should do this for each host in $PE_HOSTFILE.
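For a serial job that does not use $TMPDIR, the three recommendations above might look like the following sketch. It combines the username and job number into one path (/scratch/tmp/$USER/$JOB_ID), which is a choice made here rather than a site convention, and the function name `setup_local_scratch` is hypothetical. It handles only the host the script runs on; repeating it for each host in $PE_HOSTFILE is not covered.

```shell
#!/bin/bash
# setup_local_scratch [BASE] [JOB]
# Check that BASE (default /scratch/tmp) exists, create BASE/<user>/<job>
# and print its path. JOB defaults to $JOB_ID, or the shell's PID outside
# Grid Engine. (Hypothetical helper for illustration.)
setup_local_scratch() {
    local base=${1:-/scratch/tmp}
    local job=${2:-${JOB_ID:-$$}}
    local user=${USER:-$(id -un)}
    if [ ! -d "$base" ]; then
        echo "$base does not exist on $(uname -n)" >&2
        return 1
    fi
    local dir="$base/$user/$job"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# Typical use at the top of a submission script:
#   workdir=$(setup_local_scratch) || exit 1
#   trap 'rm -rf "$workdir"' EXIT    # clean up when the script exits
#   cd "$workdir"
```

The EXIT trap runs when the script ends normally or on most catchable signals, but not if the job is killed outright, so occasional manual checks of the directory remain worthwhile.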
If you write output files to Local Scratch, your script should ensure that they are copied to Main Storage at the end of the job. If you write temporary files to Local Scratch, please ensure that they are deleted at the end of the job. You should also manually patrol your Local Scratch directories to ensure that the space is not occupied by outdated files from failed or finished jobs.
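The copy-back step can be sketched as a function that removes the scratch copy only after the copy to permanent storage has succeeded. The function name `stage_out` and the destination path in the usage comment are hypothetical.

```shell
#!/bin/bash
# stage_out SRC DEST
# Copy everything in SRC (including hidden files) into DEST, preserving
# timestamps and permissions, then delete SRC only if the copy succeeded.
# (Hypothetical helper for illustration.)
stage_out() {
    local src=$1 dest=$2
    mkdir -p "$dest" || return 1
    if cp -pr "$src"/. "$dest"/; then
        rm -rf "$src"
    else
        echo "copy to $dest failed; leaving $src in place" >&2
        return 1
    fi
}

# Typical use at the end of a submission script (destination is an example):
#   stage_out "$workdir" "$HOME/results/$JOB_ID"
```

Keeping the scratch copy on failure means a full or over-quota destination does not cost you the job's output, at the price of having to clean up by hand afterwards.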