Premise Software Usage


Major Software Packages

Slurm

Slurm is the batch system used to schedule jobs on the compute nodes. Each Slurm command supports a --help option and has a Unix man page.


Common commands include:

sbatch
Submit a script to be run when the required resources are available (see the example after this list).
scancel
Cancel a job currently running or waiting to be run.
squeue
Shows your jobs currently running or waiting to be run.
sacct
Shows your jobs that have completed or failed to run.
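
As a quick illustration, a typical submit-and-monitor cycle looks like the sketch below. The script name and job ID are placeholders, not real Premise examples:

[rea@premise ~]$ sbatch myjob.slurm        # submit the batch script
Submitted batch job 12345
[rea@premise ~]$ squeue -u $USER           # list your pending and running jobs
[rea@premise ~]$ scancel 12345             # cancel the job if it is no longer needed
[rea@premise ~]$ sacct -j 12345            # review the job after it has finished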

Modules

Used to select specific software packages and exact versions. See the official module man page for more information.

Common commands include:

module avail
Provides a list of modules available on this cluster
module load X
Load the package X into the current shell's environment. If more than one version is available, a specific one can be selected as X/version (a short example follows this list).
module list
Display the list of packages currently loaded in this shell.
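
For example, to check what is installed and then load MATLAB (shown here without a version, so the default is selected):

[rea@premise ~]$ module avail
[rea@premise ~]$ module load matlab
[rea@premise ~]$ module list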

MATLAB

Users of MATLAB on the Premise compute cluster should not run it graphically on the Premise head node. Unlike on your desktop, MATLAB jobs must be submitted to the Slurm job queue. A helper script, sMATLAB.py, has been created to submit your MATLAB .m scripts for you.

Run your MATLAB script with:

[rea@premise ~]$ module load matlab
[rea@premise ~]$ sMATLAB.py matlabscriptname.m

Use "sMATLAB.py --help" to describe available options and defaults.

Adding the "--verbose" option to sMATLAB.py displays both the Slurm sbatch command line and helper job script that is being generated for you. This could be used as a starting point for users wishing to create their own Slurm scripts.
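
For reference, a hand-written script of this kind might look like the sketch below. This is not the exact script that sMATLAB.py generates; the job name, MATLAB script name, and resource requests are placeholders:

#!/bin/bash
#SBATCH --job-name=matlab-example    # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24           # Premise nodes have 24 cores

module load matlab
# Run MATLAB non-interactively; myscript.m is a placeholder file name.
matlab -nodisplay -nosplash -r "run('myscript.m'); exit"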

Note that MATLAB does not automatically use all the cores on a node, nor does it split a job across multiple nodes for you; these features must be coded into your scripts.

Here is an example MATLAB script that uses "parfor" to distribute work across all the cores on one node. Using MATLAB on more than one node is not supported. Premise nodes currently all have 24 cores.

parpool(str2num(getenv('SLURM_JOB_CPUS_PER_NODE')));   % workers=24 cores per Premise node
tic   % start timer
ticBytes(gcp);   % time should include distribution transfers
n = 1024;
A = zeros(n);
parfor (i = 1:n)   % Distribute these "n" iterations over workers in parpool.
    A(i,:) = (1:n) .* sin(i*2*pi/1024);
end
tocBytes(gcp)   % timer should include collection transfers
toc   % stop & display elapsed time.
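
Saved as, for example, parfor_example.m (a placeholder name), the script above can be submitted with the helper described earlier:

[rea@premise ~]$ module load matlab
[rea@premise ~]$ sMATLAB.py parfor_example.m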

Installed Modules

COLSA Overview

Information specific to COLSA usage of the Premise cluster is listed in the sections below. If you have any questions regarding COLSA usage of Premise, or would like to schedule a training session, please contact Toni Westbrook.

Slurm Templates

A selection of templates is available to use as a foundation for your Slurm scripts. These are especially recommended for any MPI-compatible software, such as MAKER or ABySS. All templates may be found on Premise in the following directory:

/mnt/lustre/hcgs/shared/slurm-templates

These templates may be copied into your home directory, and then modified. Each script is heavily commented to aid in changing specific parameters relevant to your job, such as ensuring allocation of high memory nodes. The four available templates are as follows:

threaded.slurm
This template is suitable for software that runs as a single process with multiple threads on a single node, which represents the majority of bioinformatics software installed on Premise (a simplified sketch of this style follows the list).
parallel.slurm
This template is designed for executing multiple, low-thread count jobs concurrently on a single node. Two styles of parallel execution are shown in the template, including a method of spawning a process for each file in a directory. This could be used for scenarios such as simultaneously running an instance of an application for each FASTQ file.
abyss.slurm
This template ensures MPI is loaded correctly for ABySS jobs that use multiple nodes, but is also suitable for single node use.
maker.slurm
This template ensures MPI is loaded correctly for MAKER jobs that use multiple nodes, but is also suitable for single node use. MAKER will not function properly using the MPI instructions in the GMOD tutorials; please use this template instead.
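
To illustrate the threaded style mentioned above, a heavily simplified script might look like the sketch below. This is not a copy of threaded.slurm; the job name, resource requests, and application command are placeholders, and the real template on Premise contains the recommended settings and comments:

#!/bin/bash
#SBATCH --job-name=threaded-example   # placeholder job name
#SBATCH --nodes=1                     # single node, single process
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24            # match this to your application's thread count

module load linuxbrew/colsa

# Placeholder command: replace with your application, keeping its thread
# option consistent with --cpus-per-task above.
my-application --threads 24 input.fastq > output.txt

In practice, start from the real template, for example by copying /mnt/lustre/hcgs/shared/slurm-templates/threaded.slurm into your home directory and editing that copy.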

Reference Databases

A number of reference databases are downloaded regularly and indexed for a selection of popular alignment tools. These may be found on Premise in the following directory:

/mnt/lustre/hcgs/shared/databases

When making use of these files, please do not copy them into your home directory. Instead, either point your aligner at them directly or symbolically link to these files. As some of these FASTA files produce especially large indexes that can take over a week to build, please make use of this shared directory to avoid allocating compute resources unnecessarily.
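
For example, to symbolically link a shared database into a project directory instead of copying it (the database name below is a placeholder; list the shared directory to see what is actually available):

[rea@premise ~]$ ls /mnt/lustre/hcgs/shared/databases
[rea@premise ~]$ ln -s /mnt/lustre/hcgs/shared/databases/<database-name> ~/project/reference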

Group Shared Directories

Each PI or group on Premise has a shared folder located in the group's directory, for example:

/mnt/lustre/macmaneslab/shared

These directories are intended for large or numerous files shared between multiple users in the group, such as sequences, references, and software. This avoids each user keeping a separate copy of the same file, reducing disk space consumption and simplifying version management.

Monitoring Jobs

While a job is executing, it can be helpful to monitor metrics like CPU and memory usage to ensure that threading and other parameters have been specified correctly. Because running utilities like top directly only monitors the head node, we have developed slurm-monitor, which shows top output as if connected to the relevant compute nodes. Usage is as follows:

slurm-monitor <job ID>

Note - for jobs that span multiple nodes, the active node may be cycled within slurm-monitor using the [ and ] keys.
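
For example, after finding a job's ID with squeue, the job can be watched like this (the job ID is a placeholder):

[rea@premise ~]$ squeue -u $USER
[rea@premise ~]$ slurm-monitor 12345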

Personal Anaconda Virtual Environments

The Anaconda installation on Premise accommodates both system-wide and user-specific virtual environments. Personal environments allow the user to install specific versions of libraries per application, often necessary for bioinformatics pipelines.

To begin working with any Anaconda environment, load the Anaconda environmental module:

module load anaconda/colsa

Note - it will be necessary to unload the "linuxbrew/colsa" module before loading the anaconda module. After the module is loaded, Anaconda environments may be activated and deactivated with the following commands, respectively:

conda activate <environment name>
conda deactivate
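
Putting the module steps together, a typical start-of-session sequence looks like this (substitute the name of your own environment):

module unload linuxbrew/colsa
module load anaconda/colsa
conda activate <environment name>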

While an Anaconda environment is active, any software or libraries installed within the virtual environment will be available, and any new software installations using the "conda" utility will be installed within the active environment. For a list of bioinformatics software available through the Bioconda channel, see their website.

The following example outlines connecting to the Bioconda channel, creating an environment cloned from the recommended settings, and adding Python 2.7 and samtools 0.1.18 to it:

# Note - setting up these channels only needs to be done once per user
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# The following creates an Anaconda environment with the recommended
# configuration and adds software to it
module load anaconda/colsa
conda create --name test-pipeline --clone template
conda activate test-pipeline
conda install python=2.7 samtools=0.1.18
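
Once created, the environment can be quickly checked to confirm that the requested versions are active (running samtools with no arguments prints its usage and version):

conda activate test-pipeline
python --version
samtools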

Installed Software Packages

A number of bioinformatics related packages and programming language interpreters are pre-installed on Premise and ready to use. These are available in either the linuxbrew/colsa or anaconda/colsa modules, as listed below. We are also happy to install any missing software package; feel free to send us an email with the link to the software.

Prior to using any software on Premise, the corresponding environmental module must be loaded:

module load <module name>
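
For example, to make the Linuxbrew-provided packages available:

module load linuxbrew/colsa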

The following packages are available in the linuxbrew/colsa module:

The following packages are available in the anaconda/colsa module:

Note: The Anaconda environment name of each package below is indicated within parentheses. See the Personal Anaconda Virtual Environments section above for instructions on how to activate an Anaconda environment.

NH-INBRE Support

Research supported by New Hampshire-INBRE through an Institutional Development Award (IDeA), P20GM103506, from the National Institute of General Medical Sciences of the NIH.

For more information on New Hampshire-INBRE, please visit nhinbre.org.