Information specific to COLSA usage of the Premise cluster is listed in the
sections below. If you have any questions regarding COLSA usage of
Premise, or would like to schedule a training session, please contact
A selection of templates is available to use as a foundation for your Slurm
scripts. These are especially recommended for any MPI-compatible
software, such as MAKER or ABySS. All templates may be found on Premise
in the following directory:
These templates may be copied into your home directory, and then modified.
Each script is heavily commented to aid in changing specific parameters
relevant to your job, such as ensuring allocation of high memory nodes. The
four available templates are as follows:
This template is suitable for software that runs as a single
process with multiple threads on a single node, which represents the majority
of bioinformatics software installed on Premise.
This template is designed for executing multiple, low-thread count jobs
concurrently on a single node. Two styles of parallel execution are
shown in the template, including a method of spawning a process for each file
in a directory. This could be used for scenarios such as simultaneously
running an instance of an application for each FASTQ file.
This template ensures MPI is loaded correctly for ABySS jobs that use
multiple nodes, but is also suitable for single node use.
This template ensures MPI is loaded correctly for MAKER jobs that use
multiple nodes, but is also suitable for single node use. MAKER
will not function properly using the MPI instructions in the GMOD
tutorials; please use this template instead.
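The per-file style of parallel execution described for the second template can be sketched roughly as follows. This is a minimal illustration, not the contents of the Premise template itself: INPUT_DIR, process_one, and run_all are hypothetical names, and the line count inside process_one is only a stand-in for a real application.

```shell
#!/bin/bash
# Sketch: run one background process per FASTQ file, then wait for all of them.
# INPUT_DIR, process_one, and run_all are hypothetical placeholders,
# not names taken from the Premise templates.

INPUT_DIR="${INPUT_DIR:-.}"

# Stand-in for the real per-file command (for example, a trimmer or QC tool).
process_one() {
    local fastq="$1"
    # Count lines as placeholder work; write the result next to the input.
    wc -l < "$fastq" > "${fastq%.fastq}.count"
}

run_all() {
    local dir="$1"
    local f
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        process_one "$f" &        # launch in the background
    done
    wait                          # block until every background job has finished
}

run_all "$INPUT_DIR"
```

When adapting a pattern like this inside a Slurm job, the number of files multiplied by the threads each process uses should not exceed the CPUs requested for the job.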
A number of reference databases are downloaded and indexed regularly
for a selection of popular alignment tools. These may be found on
Premise in the following directory:
When making use of these files, please do not copy them into your home
directory. Instead, either direct your aligner to use them directly, or
symbolically link to these files. As some of these FASTA files create
especially large indexes that take over a week to create, please make use of
this shared directory to avoid unnecessarily allocating compute resources.
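Symbolically linking to a shared file, rather than copying it, might look like the sketch below. The function name link_reference and the paths in the usage note are hypothetical; substitute the actual shared database path on Premise.

```shell
#!/bin/bash
# Sketch: link to a shared reference file instead of copying it.
# link_reference is a hypothetical helper, not a tool installed on Premise.

link_reference() {
    local target="$1"     # file inside the shared database directory
    local link_dir="$2"   # directory in your own space where the link should live
    mkdir -p "$link_dir"
    # -s creates a symbolic link; -f replaces an existing link of the same name.
    ln -sf "$target" "$link_dir/$(basename "$target")"
}
```

For example, `link_reference "$SHARED_DB/genome.fasta" "$HOME/project/refs"` would create a link named genome.fasta under the project directory, where SHARED_DB is an assumed variable holding the shared directory path. The link occupies almost no disk space and always points at the current shared copy.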
Group Shared Directories
Each PI or group on Premise has a shared folder located in the group's
directory, for example:
These directories are intended for large or numerous files shared between
multiple users in the group, such as sequences, references, software, etc.
This avoids copying the same file to multiple users, reducing disk space
consumption and simplifying version management.
While executing a job, to ensure that threading and other parameters have
been specified correctly, it can be helpful to monitor metrics like CPU and
memory usage. Because running utilities like top directly will only monitor
the head node, we have developed slurm-monitor, which shows top output as
if connected to the relevant compute nodes. Usage is as follows:
slurm-monitor <job ID>
Note - for jobs that span multiple nodes, the active node may be cycled within
slurm-monitor using the [ and ] keys.
Personal Anaconda Virtual Environments
The Anaconda installation on Premise accommodates both system-wide and
user-specific virtual environments. Personal environments allow the user to
install specific versions of libraries per application, often necessary for
To begin working with any Anaconda environment, load the Anaconda environment
module:
module load anaconda/colsa
Note - it will be necessary to unload the "linuxbrew/colsa" module before
loading the anaconda module. After the module is loaded, Anaconda environments
may be activated and deactivated with the following commands, respectively:
While an Anaconda environment is active, any software or libraries installed
within the virtual environment will be available, and any new software
installations using the "conda" utility will be installed within the
active environment. For a list of bioinformatics software available through
the Bioconda channel, see their
Connecting to the Bioconda channel, creating an environment, cloning
recommended settings, and adding Python 2.7 and samtools 0.1.18
to the environment is outlined in the following example:
# Note - setting up these channels only needs to be done once per user
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
# The following creates an Anaconda environment with the recommended
# configuration and adds software to it
module load anaconda/colsa
conda create --name test-pipeline --clone template
source activate test-pipeline
conda install python=2.7 samtools=0.1.18
Installed Software Packages
A number of bioinformatics-related packages and programming language
interpreters are pre-installed on Premise and ready to use. These are available
in either the linuxbrew/colsa or anaconda/colsa modules, as listed below.
We are also happy to install any missing software package; feel free to send
us an email with the link to the software.
Prior to using any software on Premise, the corresponding environment
module must be loaded:
module load <module name>
The following packages are available in the linuxbrew/colsa module:
ABySS 2.0.2 -- Paired-end short read sequence assembler.
AMPtk 0.9.2 -- Process amplicon data using USEARCH/VSEARCH.
ARAGORN 1.2.36 -- Detection of tRNA and tmRNA in nucleotide sequences.
Augustus 3.2.2 -- Gene prediction in eukaryotic genomic sequences.
BamTools 2.4.0 -- Toolkit for working with BAM (Binary Alignment Map) data.