Data Storage

Dashboard

HPCC cluster users can check their home and bigdata storage usage from the Dashboard Portal.

Home

Home directories are where you start each session on the HPC cluster and where your jobs start when running on the cluster. This is usually where you keep the scripts and other files you are actively working on. This space is very limited: the home storage quota is 20 GB per user account.

Path: /rhome/username
User Availability: All Users
Node Availability: All Nodes
Quota Responsibility: User

Bigdata

Bigdata is an area where large amounts of storage can be made available to users. A lab purchases bigdata space separately from access to the cluster. This space is then made available to the lab via a shared directory and individual directories for each user.

Lab Shared Space

This directory can be accessed by the lab as a whole.

Path: /bigdata/labname/shared
User Availability: Labs that have purchased space
Node Availability: All Nodes
Quota Responsibility: Lab

Individual User Space

This directory can be accessed by specific lab members.

Path: /bigdata/labname/username
User Availability: Labs that have purchased space
Node Availability: All Nodes
Quota Responsibility: Lab

Non-Persistent Space

Frequently, there is a need for faster temporary storage. For example, activities like the following fall under this category:

1. Writing a significant amount of intermediate data during a job
2. Accessing a dataset from a faster medium than bigdata or the home directories
3. Writing out lock files

These types of activities are well suited to fast non-persistent spaces. Below are the filesystems available on the HPC cluster that are best suited for them.

Temporary Space

This is a standard space available on all Linux systems. Please be aware that it is limited to the amount of free disk space on the node you are running on.

Path: /tmp
User Availability: All Users
Node Availability: All Nodes
Quota Responsibility: N/A

SSD Backed Space

This space is much faster than the persistent spaces (/rhome, /bigdata), but slower than RAM-based storage.

Path: /scratch
User Availability: All Users
Node Availability: All Nodes
Quota Responsibility: N/A
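
As an illustration of using a non-persistent space for intermediate data, the sketch below creates a private working directory for a job and removes it when the script exits. It is a minimal example under stated assumptions, not a site-provided script: SCRATCH_BASE, WORKDIR, and step1.out are hypothetical names, and the fallback to /tmp is only there so the script still runs where /scratch is not writable.

```shell
#!/bin/bash
# Sketch: keep a job's intermediate files in a private directory on fast storage.
# SCRATCH_BASE is a hypothetical variable; it falls back to /tmp when /scratch
# does not exist or is not writable.
SCRATCH_BASE="${SCRATCH_BASE:-/scratch}"
[ -d "$SCRATCH_BASE" ] && [ -w "$SCRATCH_BASE" ] || SCRATCH_BASE="/tmp"

# mktemp -d creates a uniquely named directory, so concurrent jobs do not collide.
WORKDIR="$(mktemp -d "$SCRATCH_BASE/${USER:-job}_XXXXXX")"

# Remove the directory when the script exits, whether it succeeds or fails.
cleanup() {
    rm -rf -- "$WORKDIR"
}
trap cleanup EXIT

# Placeholder for the real work: write intermediate output into WORKDIR.
echo "intermediate result" > "$WORKDIR/step1.out"
ls -l "$WORKDIR"
```

Because the space is non-persistent, copy anything you need to keep back to your home or bigdata directory before the script ends.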

RAM Space

This type of space takes away from physical memory but allows extremely fast access to the files located on it. When submitting a job, you will need to include the space your files occupy here in the job's memory request. For example, a 1 GB dataset placed in this space will consume at least 1 GB of RAM.

Path: /dev/shm
User Availability: All Users
Node Availability: All Nodes
Quota Responsibility: N/A
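
Before copying a dataset into a RAM-backed or node-local space, it can help to confirm that it fits. The following sketch compares the size reported by du with the free space reported by df. The function name fits_in and the DATASET path are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# fits_in PATH TARGET_DIR
# Succeeds when PATH (a file or directory) is smaller than the free space
# on the filesystem that holds TARGET_DIR. Sizes are compared in KiB.
fits_in() {
    local need avail
    need="$(du -sk "$1" | cut -f1)"                   # size of the data in KiB
    avail="$(df -Pk "$2" | awk 'NR==2 {print $4}')"   # free space in KiB
    [ "$need" -lt "$avail" ]
}

# Hypothetical dataset location; replace with your own path.
DATASET="${DATASET:-/bigdata/labname/username/dataset}"

if [ -e "$DATASET" ] && fits_in "$DATASET" /dev/shm; then
    echo "Dataset fits in /dev/shm"
else
    echo "Dataset is missing or does not fit in /dev/shm"
fi
```

A successful check only reflects free space at that moment; the memory request for the job still has to cover whatever you place in /dev/shm.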

Usage and Quotas

To quickly check your usage and quota limits:

check_quota home
check_quota bigdata

To get the usage of your current directory, run the following command:

du -sh .

To calculate the size of each subdirectory, run:

du -sch *
du -sch .[!.]* * | sort -h # includes hidden files and directories

This may take some time to complete; please be patient.
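
When you are close to a quota, it is often enough to see only the largest entries. The sketch below wraps du in a small helper that lists the biggest items directly under a directory, hidden ones included, largest first. The function name largest_dirs is hypothetical, sizes are printed in KiB, and the find options used assume the GNU tools found on typical Linux systems.

```shell
#!/bin/bash
# largest_dirs [DIR] [N]
# Print the N largest entries directly under DIR (default: current directory,
# 10 entries), including hidden files and directories. Sizes are in KiB.
largest_dirs() {
    local dir="${1:-.}" count="${2:-10}"
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Example: the five largest entries in the current directory.
largest_dirs . 5
```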

For more information on your home directory, please see the Linux Basics Orientation.

Automatic Backups

The cluster does create backups; however, users should still periodically copy their critical data to a separate storage device. The cluster is a production system for research computations with a very expensive high-performance storage infrastructure. It is not a data archiving system.

Home backups are created daily and kept for one week. Bigdata backups are created weekly and kept for one month.

Home and bigdata backups are located under the following respective directories:

/rhome/.snapshots/
/bigdata/.snapshots/

The individual snapshot directories have names with numerical values in epoch time format. The higher the value, the more recent the snapshot.

To view the exact time at which each snapshot was taken, run the following commands:

mmlssnapshot home
mmlssnapshot bigdata
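
Since the snapshot directory names are epoch values, they can also be converted to readable dates directly in the shell. The sketch below assumes that each name consists only of digits giving seconds since 1970-01-01 UTC and that GNU date is available; names that do not match are skipped. The function name snapshot_time and the SNAP_ROOT variable are hypothetical.

```shell
#!/bin/bash
# snapshot_time EPOCH
# Print the UTC time for a snapshot name given as seconds since the epoch.
# Relies on the -d option of GNU date.
snapshot_time() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# List every snapshot under SNAP_ROOT with its timestamp.
SNAP_ROOT="${SNAP_ROOT:-/rhome/.snapshots}"
if [ -d "$SNAP_ROOT" ]; then
    for snap in "$SNAP_ROOT"/*/; do
        name="$(basename "$snap")"
        case "$name" in
            ''|*[!0-9]*) continue ;;   # skip names that are not purely numeric
        esac
        printf '%s  %s\n' "$name" "$(snapshot_time "$name")"
    done
fi
```

Set SNAP_ROOT to /bigdata/.snapshots to list the bigdata snapshots instead.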
Last modified July 9, 2023: manual update (30a591830)