COLSA Premise Usage

COLSA Overview

Information specific to COLSA usage of the Premise cluster is listed in the sections below. If you have any questions regarding COLSA usage of Premise, or would like to schedule a training session, please contact Toni Westbrook.

Slurm Templates

A selection of templates is available to use as a foundation for your Slurm scripts. These are especially recommended for any MPI-compatible software, such as MAKER or ABySS. All templates may be found on Premise in the following directory:

/mnt/lustre/hcgs/shared/slurm-templates

These templates may be copied into your home directory, and then modified. Each script is heavily commented to aid in changing specific parameters relevant to your job, such as ensuring allocation of high memory nodes. The four available templates are as follows:

threaded.slurm
This template is suitable for software that runs as a single process with multiple threads on a single node, which represents the majority of bioinformatics software installed on Premise.
parallel.slurm
This template is designed for executing multiple low-thread-count jobs concurrently on a single node. Two styles of parallel execution are shown in the template, including a method of spawning a process for each file in a directory. This could be used, for example, to run an instance of an application simultaneously for each FASTQ file.
abyss.slurm
This template ensures MPI is loaded correctly for ABySS jobs that use multiple nodes, but is also suitable for single node use.
maker.slurm
This template ensures MPI is loaded correctly for MAKER jobs that use multiple nodes, but is also suitable for single node use. MAKER will not function properly using the MPI instructions in the GMOD tutorials; please make use of this template instead.
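As a rough illustration of the most common case, a threaded-style script typically resembles the sketch below. The job name, CPU count, module, and application are placeholder assumptions; the commented threaded.slurm template on Premise should be used as the actual starting point.

```shell
#!/bin/bash
# Sketch of a threaded-style Slurm script: one process, many threads,
# one node. All values below are illustrative placeholders.

#SBATCH --job-name=example-threaded
#SBATCH --nodes=1               # single node
#SBATCH --ntasks=1              # single process
#SBATCH --cpus-per-task=24      # threads available to the application
#SBATCH --output=example-%j.out

module purge
module load linuxbrew/colsa

# Match the application's thread count to the Slurm allocation
# (application name and flag are hypothetical)
some-aligner --threads "$SLURM_CPUS_PER_TASK" input.fastq > output.sam
```

A script like this would be submitted with sbatch, e.g. `sbatch threaded.slurm`.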

Reference Databases

A number of reference databases are downloaded and indexed regularly for a selection of popular alignment tools. These may be found on Premise in the following directory:

/mnt/lustre/hcgs/shared/databases

When making use of these files, please do not copy them into your home directory. Instead, either point your aligner at them directly, or symbolically link to them. Some of these FASTA files produce especially large indexes that take over a week to build, so please use this shared directory to avoid unnecessarily allocating compute resources.
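Symbolic linking can be sketched as follows. The paths here are mock stand-ins so the commands can be tried anywhere; on Premise you would link to files under /mnt/lustre/hcgs/shared/databases instead.

```shell
# Create a mock "shared" directory and working directory for the demo
mkdir -p /tmp/demo/shared /tmp/demo/work
touch /tmp/demo/shared/reference.fasta    # stand-in for a shared FASTA

# Link from the working directory; no data is duplicated on disk
ln -sf /tmp/demo/shared/reference.fasta /tmp/demo/work/reference.fasta

# The aligner can read the link exactly as if it were the original file
readlink /tmp/demo/work/reference.fasta
```

The same pattern applies to any shared index: the link costs essentially no disk space and always reflects the current shared copy.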

Group Shared Directories

Each PI or group on Premise has a shared folder located in the group's directory, for example:

/mnt/lustre/macmaneslab/shared

These directories are intended for large or numerous files shared between multiple users in the group, such as sequences, references, and software. This avoids copying the same file to multiple users, reducing disk space consumption and simplifying version management.

Monitoring Jobs

While a job is executing, it can be helpful to monitor metrics like CPU and memory usage to ensure that threading and other parameters have been specified correctly. Running utilities like top directly will only monitor the head node, so we have developed slurm-monitor, which shows top as if connected to the relevant compute nodes. Usage is as follows:

slurm-monitor <job ID>

Note - for jobs that span multiple nodes, the active node may be cycled within slurm-monitor using the [ and ] keys.

Personal Anaconda Virtual Environments

The Anaconda installation on Premise accommodates both system-wide and user-specific virtual environments. Personal environments allow you to install specific versions of libraries per application, which is often necessary for bioinformatics pipelines.

To begin working with any Anaconda environment, load the Anaconda environment module:

module load anaconda/colsa

Note - it will be necessary to unload the "linuxbrew/colsa" module before loading the anaconda module. After the module is loaded, Anaconda environments may be activated and deactivated with the following commands, respectively:

source activate <environment name>
source deactivate

While an Anaconda environment is active, any software or libraries installed within the virtual environment will be available, and any new software installations using the "conda" utility will be installed within the active environment. For a list of bioinformatics software available through the Bioconda channel, see their website.

The following example outlines connecting to the Bioconda channel, creating an environment that clones the recommended settings, and adding Python 2.7 and samtools 0.1.18 to the environment:

# Note - setting up these channels only needs to be done once per user
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# The following creates an Anaconda environment with the recommended
# configuration and adds software to it
module load anaconda/colsa
conda create --name test-pipeline --clone template
source activate test-pipeline
conda install python=2.7 samtools=0.1.18

Installed Software Packages

A number of bioinformatics-related packages and programming language interpreters are pre-installed on Premise and ready to use. These are available in either the linuxbrew/colsa or anaconda/colsa module, as listed below. We are also happy to install any missing software package; feel free to send us an email with a link to the software.

Prior to using any software on Premise, the corresponding environment module must be loaded:

module load <module name>

The following packages are available in the linuxbrew/colsa module:

The following packages are available in the anaconda/colsa module (environment name indicated within parentheses):

NH-INBRE Support

Research supported by New Hampshire-INBRE through an Institutional Development Award (IDeA), P20GM103506, from the National Institute of General Medical Sciences of the NIH.

For more information on New Hampshire-INBRE, please visit nhinbre.org.