Storage System

Legacy documentation

This page describes a service provided by a retired ACENET system. Most ACENET services are now provided by national systems; please visit https://docs.computecanada.ca for current documentation.


Main page: User Guide

File storage at ACENET is implemented with one of two file system technologies: Lustre (on Fundy, Mahone, and Placentia) or NFS (on Glooscap).

Changes

In early 2016 ACENET replaced a variety of aging storage hardware and software at Mahone, Fundy, and Placentia in order to ensure data continuity. Lustre was also introduced at that time, and the following changes affecting users were made:

  • The /globalscratch file system was merged with /home. Files formerly in /globalscratch/$USER should now be found in /home/$USER/scratch, and /home/$USER/scratch is no longer a symbolic link but a real subdirectory. Scripts or programs which explicitly refer to /globalscratch will have to be edited.
  • The quota command is now a wrapper around the lfs quota command. The appearance of its output has changed somewhat, but your quota standing is now available immediately.
  • Quotas have been adjusted to reflect the merged filesystems, and a new file count quota has been imposed with a default quota of 180,000 files per user.
  • The tape layer of the old storage systems has not been replaced, for reasons of cost. While file restoration after deletion or other forms of accidental loss was never officially supported, such recovery is now practically impossible in every case.
  • Red Hat Enterprise Linux 6 (RHEL6) is now the default operating system on all four clusters. This simplifies job submission for users who have been obliged to specify os=RHEL6 for the last year or so, and certain applications that were previously only available on RHEL6, such as MATLAB, can now run on any node.

Policies

Policy document: ACENET Data Policies

Backup

ACENET does not provide backup services. Our filesystems are built with RAID redundancy to protect your data from loss due to hardware failure, but we do not protect you from accidentally deleting or changing your own files. Users are therefore strongly encouraged to make off-site (or multi-site) copies of their critical data. Source code and other such key files should be managed with a version control tool such as Git, Subversion, Mercurial, or CVS.
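
One simple way to keep an off-site copy is to mirror a directory to a machine at your home institution with rsync. This is only an illustrative sketch; the remote host name and directory names are placeholders:

$ rsync -av /home/$USER/myproject/ username@backup.example.edu:myproject-backup/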

You should also be aware of your home institution's data storage policies and follow them. Some institutions offer network backup facilities which you may be able to use; MUN users, for example, can take advantage of MUN's RDB system for backing up data from Placentia.

Archiving

ACENET does not provide permanent data archiving.

Data retention policy

Data stored in expired accounts is subject to deletion after a grace period of 4 months.

Layout

There are three types of disk space available to the user on most ACENET clusters: one is Permanent Storage and the other two are Temporary Storage. The general outline of the ACENET storage system is given below.

Permanent Storage system on every cluster
Name       Location           Function                 Resource type
Home Dir   /home/<username>   critical data and code   network

Temporary Storage system on every cluster
Name               Location           Function                                  Resource type
No-quota Scratch   /nqs/<username>    temporary data, large data                network
Local Scratch      /scratch/tmp       temporary data, fast read/write access   node-local

Permanent Storage

Main storage (your home directory) is your personal and permanent space for research-critical data and code. This is where you should put your data before and after computations, and where you should keep source code and executables. It is located in /home/<username>, where <username> is replaced by your username. When you log in, this is your current working directory, and you may create whatever subdirectories you like here. Main storage is networked storage shared among all compute nodes, via Lustre (or via NFS, the Network File System, at Glooscap).

Quotas

Storage quotas are implemented on all clusters. The default quota values (soft limits) are given in the table below. The hard limits are 5-10% higher (except at Glooscap). The grace period for exceeding a soft limit is one week.

Location           Limit type       Fundy     Mahone    Placentia   Glooscap
/home/<username>   bytes per user   150 GB    155 GB    75 GB       61 GB
/home/<username>   files per user   180,000   180,000   180,000     no limit

Your usage and limit information can be found with the command quota. You can also use du to determine how much space your files occupy:

$ du -h --max-depth=1 /home/$USER/
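
Since a file-count quota also applies, you may wish to count the files under your home directory. One way to do this (it may take a while on a large directory tree):

$ find /home/$USER -type f | wc -l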

DAU

The table below gives the Disk Allocation Unit (DAU) sizes on the ACENET clusters. Where two numbers are specified for the DAU, as in X (Y) KB, the first 8 blocks of a file are X KB each and all subsequent blocks are Y KB each.

Location   Fundy   Mahone   Placentia   Glooscap
/home      4 KB    4 KB     4 KB        4 (64) KB
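
Because space is allocated in DAU-sized blocks, even a small file occupies at least one full DAU. You can compare a file's apparent size with the space actually allocated to it; for example (the file name is a placeholder):

$ ls -l somefile     # apparent size, in bytes
$ du -k somefile     # allocated space, in KB (a multiple of the DAU)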

Temporary Storage

No-quota Scratch

No-quota Scratch (NQS) is temporary network storage that has no per-user quota limit, but gets cleaned periodically to get rid of old files. It's available at /nqs/<username>/ on every cluster to users who have requested access to it.

Note
If you want to use NQS, you should contact support stating that you understand the terms of use and would like NQS turned on. Also, please let us know if you want to be notified when files are scheduled for deletion, and if so where you want those emails to be sent.

NQS is designed to allow you to store large amounts of data on a temporary basis, for example, files generated and consumed during a single job that cannot be stored on Main Storage due to the per-user quotas. Because no quotas are enforced on NQS, there is an irreducible risk that the filesystem will fill up. Should that occur, existing data on /nqs may be unrecoverable, which makes it unsuitable for storage of critical data. Long-term storage of data, critical or not, is also inappropriate, since it increases the risk of the filesystem filling up during its intended use.

You are expected to delete your files from /nqs once the associated job or jobs are complete. Technical staff also reserve the right to delete files manually if there is a manifest risk of the filesystem filling up.

To ensure that these guidelines are followed and /nqs stays usable for its intended purpose, files which have not been accessed for 31 days are automatically deleted. The deletion routine will notify you seven days in advance of removing any of your files if you keep a file named .nqs in your home directory (/home/<username>/.nqs) with these contents:

U_EMAIL=user@some.address.foo
U_QUIET=no
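
For example, you could create the file as follows, substituting your own email address:

$ cat > /home/$USER/.nqs <<EOF
U_EMAIL=user@some.address.foo
U_QUIET=no
EOF
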
Size and DAU of /nqs at each site
        Fundy   Mahone   Placentia   Glooscap
size    12 T    13 T     12 T        19 T
DAU     4 KB    4 KB     4 KB        4 KB

If you want to check how much space is used or available in NQS then use the following command:

$ df -h /nqs/$USER
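
Note that df reports usage for the filesystem as a whole. To see how much space your own files occupy on /nqs, you can use du as before:

$ du -sh /nqs/$USER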

To examine the last access time of your files:

$ ls -lu /nqs/$USER     # in the given directory
$ ls -luR /nqs/$USER    # in subdirectories too, recursively

To find, recursively, files which have not been accessed for some period, e.g. the last 24 days:

$ find /nqs/$USER -type f -atime +24
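
Once you have confirmed that such files are no longer needed, GNU find can also remove them directly. Use this with care: files deleted from /nqs cannot be recovered.

$ find /nqs/$USER -type f -atime +24 -delete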

Local Scratch

Main page: Local Scratch

Each compute node has its own disk (or in some cases, solid state memory) which is not shared with other compute nodes. We refer to this as local disk. If it is used to store temporary files for an individual job, then we refer to that as "local scratch storage".

Local scratch has an advantage over network storage: it is not prone to slowing down when the cluster load is high. If your application does a high volume of input/output, then using local scratch might result in more predictable run times. However, local scratch is more complicated to use than network storage. If you are willing to invest some effort into learning how to use node-local disk in general, and the specifics of ACENET's node-local scratch in particular, then please read Local Scratch.
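
As a minimal sketch of the general idea (not ACENET's prescribed method; see Local Scratch for the specifics), a job script might stage its data through node-local disk like this. The working-directory layout, program name, and file names are placeholders:

# Create a per-job working directory on node-local disk ($$ is the shell's PID)
WORKDIR=/scratch/tmp/$USER.$$
mkdir -p $WORKDIR

# Stage input from network storage, then run on the fast local disk
cp /home/$USER/input.dat $WORKDIR/
cd $WORKDIR
./my_program input.dat > output.dat

# Copy results back to permanent storage and clean up
cp output.dat /home/$USER/
rm -rf $WORKDIR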