Dashboard

HPCC cluster users are able to check on their home and bigdata storage usage from the Dashboard Portal.

Home

Home directories are where you start each session on biocluster and where your jobs start when running on the cluster. This is usually where you place the scripts and various things you are working on. This space is very limited. Please remember that the home storage space quota per user account is 20 GB.

Path /rhome/
User Availability All Users
Node Availability All Nodes
Quota Responsibility User

Bigdata

Bigdata is an area where large amounts of storage can be made available to users. A lab purchases bigdata space separately from access to the cluster. This space is then made available to the lab via a shared directory and individual directories for each user.

Lab Shared Space This directory can be accessed by the lab as a whole.

Path /bigdata//shared
User Availability Labs that have purchased space.
Node Availability All Nodes
Quota Responsibility Lab

Individual User Space This directory can be accessed by specific lab members.

Path /bigdata//
User Availability Labs that have purchased space.
Node Availability All Nodes
Quota Responsibility Lab

Non-Persistent Space

Frequently, there is a need to do things like, output a significant amount of intermediate data during a job, access a dataset from a faster medium than bigdata or the home directories or write out lock files. These types of things are well suited to the use of non-persistent spaces. Below are the filesystems available on biocluster.

RAM Space This type of space takes away from physical memory but allows extremely fast access to the files located on it. When submitting a job you will need to factor in the space your job is using in RAM as well. For example, if you have a dataset that is 1G in size and use this space, it will take at least 1G of RAM.

Path /dev/shm
User Availability All Users
Node Availability All Nodes
Quota Responsibility N/A

Temporary Space This is a standard space available on all Linux systems. Please be aware that it is limited to the amount of free disk space on the node you are running on.

Path /tmp
User Availability All Users
Node Availability All Nodes
Quota Responsibility N/A

SSD Backed Space This space is much faster than the standard temporary space, but slower than using RAM based storage.

Path /scratch
User Availability All Users
Node Availability All Nodes
Quota Responsibility N/A

Usage and Quotas

To quickly check your usage and quota limits:

check_quota home
check_quota bigdata

To get the usage of your current directory, run the following command:

du -sh .

To calculate the sizes of each separate sub directory, run:

du -shc *

This may take some time to complete, please be patient.

For more information on your home directory, please see the Linux Basics Orientation.

Automatic Backups

The cluster does create backups however it is still advantageous for users to periodically make copies of their critical data to a separate storage device. The cluster is a production system for research computations with a very expensive high-performance storage infrastructure. It is not a data archiving system.

Home backups are created daily and kept for one month. Bigdata backups are created weekly and kept for one month.

Home and bigdata backups are located under the following respective directories:

/rhome/.snapshots/
/bigdata/.snapshots/

The individual snapshot directories have names with numerical values in epoch time format. The higher the value the more recent the snapshot.

To view the exact time of when each snapshot was taken execute the following commands:

mmlssnapshot home
mmlssnapshot bigdata