Submit a script to be run when the required resources are available.
scancel
Cancel a job currently running or waiting to be run.
squeue
Shows your jobs currently running or waiting to be run.
sacct
Shows your jobs that have completed or failed to run.
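As a rough illustration of how these commands fit together, the following sketch writes a minimal batch script and submits it. The file name hello.slurm, the job name, and the resource values are placeholders chosen for this example, not Premise defaults. The submit step only runs where sbatch is installed.

```shell
#!/bin/bash
# Sketch only: file name, job name, and resource requests are illustrative.
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

echo "Running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID (plus the cluster name on multi-cluster setups)
    job_id=$(sbatch --parsable hello.slurm)
    squeue -u "$USER"          # your jobs running or waiting to run
    sacct -j "$job_id"         # accounting record for this job
    # scancel "$job_id"        # uncomment to cancel the job
else
    echo "sbatch not found; hello.slurm was written but not submitted"
fi
```

The %j in the output file name is replaced by Slurm with the job ID, so repeated submissions do not overwrite each other's output.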
modules
Used to select specific software packages and exact versions. See the
official module man page for more
information.
Common commands include:
module avail
Provides a list of modules available on this cluster
module load X
Load the package X into the current shell's environment. If more than one version is available, select one by specifying X/version.
module list
Display the list of packages currently loaded in this shell.
MATLAB
Users of MATLAB on the Premise compute cluster should not run it
graphically on the Premise head node. Unlike running on your desktop,
MATLAB jobs must be submitted to the Slurm job queue. A helper script
has been created to submit your MATLAB .m scripts for you.
Use "sMATLAB.py --help" to list the available options and defaults.
Adding the "--verbose" option to sMATLAB.py displays both the Slurm
sbatch command line and the helper job script that is generated for you.
This could be used as a starting point for users wishing to create
their own Slurm scripts.
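For readers who want a sense of what a hand-written script might look like, here is a minimal sketch. It is not the script that sMATLAB.py generates. The file names matlab-job.slurm and myscript.m, the job name, and the time limit are hypothetical placeholders; the module name is taken from the Installed Modules list, and the 24-core request assumes one full Premise node.

```shell
#!/bin/bash
# Sketch only, not the output of sMATLAB.py. myscript.m, the job name,
# and the time limit are hypothetical placeholders.
cat > matlab-job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=matlab-example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --time=01:00:00

module load matlab/matlab-r2016a
# -nodisplay and -nosplash keep MATLAB from opening any graphics.
# The try/catch makes MATLAB exit with a non-zero status if the script fails.
matlab -nodisplay -nosplash -r "try, run('myscript.m'), catch e, disp(getReport(e)), exit(1), end, exit(0)"
EOF

echo "Wrote matlab-job.slurm; submit it with: sbatch matlab-job.slurm"
```

The -r option is used here because it is available in older MATLAB releases such as R2016a.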
Note that MATLAB does not automatically use all the cores on a node,
or split a job across multiple nodes for you. These features must be
coded into your scripts. You may find the following links helpful:
Here is an example MATLAB script utilizing "parfor" to iterate work
across all the cores on one node. Using MATLAB on more than one node is not
supported. Premise nodes currently all have 24 cores.
parpool(str2num(getenv('SLURM_JOB_CPUS_PER_NODE')));  % workers=24 cores per Premise node
tic                                     % start timer
ticBytes(gcp);                          % time should include distribution transfers
n = 1024;
A = zeros(n);
parfor (i = 1:n)                        % Distribute these "n" iterations over workers in parpool.
    A(i,:) = (1:n) .* sin(i*2*pi/1024);
end
tocBytes(gcp)                           % timer should include collection transfers
toc                                     % stop & display elapsed time.
Installed Modules
namd -- Parallel molecular dynamics simulation of biomolecular systems. website
namd/NAMD_2.12_Linux-x86_64-ibverbs-smp(default)
namd/NAMD_2.12_Linux-x86_64-multicore-CUDA
namd/NAMD_2.12_Linux-x86_64-ibverbs-smp-CUDA
namd/namd-2.11-ibverbs-smp
namd/namd-2.11-ibverbs-smp-CUDA
gromacs/gromacs-5.1.2 -- package to perform molecular dynamics. website
vmd -- Molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics. website
matlab/matlab-r2016a -- Math and graphing. website
COLSA Overview
Information specific to COLSA usage of the Premise cluster is listed in the
sections below. If you have any questions regarding COLSA usage of
Premise, or would like to schedule a training session, please contact
Toni Westbrook.
Slurm Templates
A selection of templates is available to use as a foundation for your Slurm
scripts. These are especially recommended for any MPI-compatible
software, such as MAKER or ABySS. All templates may be found on Premise
in the following directory:
/mnt/lustre/hcgs/shared/slurm-templates
These templates may be copied into your home directory, and then modified.
Each script is heavily commented to aid in changing specific parameters
relevant to your job, such as ensuring allocation of high memory nodes. The
four available templates are as follows:
threaded.slurm
This template is suitable for software that runs as a single
process with multiple threads on a single node, which represents the majority
of bioinformatics software installed on Premise.
parallel.slurm
This template is designed for executing multiple, low-thread count jobs
concurrently on a single node. Two styles of parallel execution are
shown in the template, including a method of spawning a process for each file
in a directory. This could be used for scenarios such as simultaneously
running an instance of an application for each FASTQ file.
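The per-file style can be sketched roughly as follows. This is not the contents of parallel.slurm: the function name process_fastq_dir, the *.fastq pattern, and the use of "wc -l" as a stand-in for the real application are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: run one background process per FASTQ file in a directory,
# then wait for all of them. "wc -l" stands in for the real application.
process_fastq_dir() {
    local dir=$1 out=$2
    local f
    mkdir -p "$out"
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue    # no matching files: skip the literal pattern
        wc -l < "$f" > "$out/$(basename "$f").count" &    # one process per file
    done
    wait    # block until every background process has finished
}

# Demo with two tiny placeholder files in a temporary directory
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$demo/a.fastq"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo/b.fastq"
process_fastq_dir "$demo" "$demo/results"
cat "$demo/results/a.fastq.count" "$demo/results/b.fastq.count"
```

This loop starts every process at once, so it fits best when the number of files does not exceed the cores allocated to the job. With more files than cores, a tool that caps concurrency, such as xargs with its -P option, may be a better fit.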
abyss.slurm
This template ensures MPI is loaded correctly for ABySS jobs that use
multiple nodes, but is also suitable for single node use.
maker.slurm
This template ensures MPI is loaded correctly for MAKER jobs that use
multiple nodes, but is also suitable for single node use. MAKER
will not function properly using the MPI instructions in the GMOD
tutorials; please make use of this template instead.
Reference Databases
A number of reference databases are downloaded and indexed regularly
across a selection of popular alignment tools. These may be found on
Premise in the following directory:
/mnt/lustre/hcgs/shared/databases
When making use of these files, please do not copy them into your home
directory. Instead, either direct your aligner to use them directly, or
symbolically link to these files. As some of these FASTA files create
especially large indexes that take over a week to create, please make use of
this shared directory to avoid unnecessarily allocating compute resources.
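Linking rather than copying might look like the sketch below. The helper name link_reference and the file example.fasta are hypothetical, and the demo uses a temporary directory as a stand-in for the shared database directory so that it can run anywhere.

```shell
#!/bin/bash
# Sketch only: link to a shared reference file instead of copying it.
link_reference() {
    local src=$1 dest_dir=$2
    mkdir -p "$dest_dir"
    # -s creates a symbolic link; -f replaces an existing link of the same name
    ln -sf "$src" "$dest_dir/$(basename "$src")"
}

# Demo: a temporary directory stands in for the shared database directory.
# On Premise the source would live under /mnt/lustre/hcgs/shared/databases;
# example.fasta is a hypothetical file name.
shared=$(mktemp -d)
work=$(mktemp -d)
printf '>seq1\nACGT\n' > "$shared/example.fasta"
link_reference "$shared/example.fasta" "$work/refs"
ls -l "$work/refs"
```

The link occupies almost no space in your own directory, and reads through it go to the single shared copy.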
Group Shared Directories
Each PI or group on Premise has a shared folder located in the group's
directory, for example:
/mnt/lustre/macmaneslab/shared
These directories are intended for large or numerous files shared between
multiple users in the group, such as sequences, references, software, etc.
This avoids copying the same file to multiple users, alleviating disk space
consumption and version management.
Monitoring Jobs
While executing a job, to ensure that threading and other parameters have
been specified correctly, it can be helpful to monitor metrics like CPU and
memory usage. While directly running utilities like top will only monitor
the head node, we have developed slurm-monitor, which will show top as
if connected to the relevant compute nodes. Usage is as follows:
slurm-monitor <job ID>
Note - for jobs that span multiple nodes, the active node may be cycled within
slurm-monitor using the [ and ] keys.
Personal Anaconda Virtual Environments
The Anaconda installation on Premise accommodates both system-wide and
user-specific virtual environments. Personal environments allow the user to
install specific versions of libraries per application, often necessary for
bioinformatics pipelines.
To begin working with any Anaconda environment, load the Anaconda environmental
module:
module load anaconda/colsa
Note - it will be necessary to unload the "linuxbrew/colsa" module before
loading the anaconda module. After the module is loaded, Anaconda environments
may be activated and deactivated with the following commands, respectively:
conda activate <environment name>
conda deactivate
While an Anaconda environment is active, any software or libraries installed
within the virtual environment will be available, and any new software
installations using the "conda" utility will be installed within the
active environment. For a list of bioinformatics software available through
the Bioconda channel, see their
website.
Connecting to the Bioconda channel, creating an environment, cloning
recommended settings, and adding Python 2.7 and samtools 0.1.18
to the environment is outlined in the following example:
# Note - setting up these channels only needs to be done once per user
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
# The following creates an Anaconda environment with the recommended
# configuration and adds software to it
module load anaconda/colsa
conda create --name test-pipeline --clone template
conda activate test-pipeline
conda install python=2.7 samtools=0.1.18
Installed Software Packages
A number of bioinformatics related packages and programming language
interpreters are pre-installed on Premise and ready to use. These are available
in either the linuxbrew/colsa or anaconda/colsa modules, as listed below.
We are also happy to install any missing software package; feel free to send
us an email with the link to the software.
Prior to using any software on Premise, the corresponding environmental
module must be loaded:
module load <module name>
The following packages are available in the linuxbrew/colsa module:
ABySS 2.1.0 -- Paired-end short read sequence assembler. website
agrep 0.8.0 -- Grep with approximate and weighted matches. website
USEARCH 9.2.64 -- Search and clustering toolkit. website
VCFtools 0.1.15 -- Toolkit for working with VCF files. website
Velvet 1.2.10 -- Short read sequence assembler. website
VolcanoFinder 1.0 -- Detect events of adaptive introgression. website
VSEARCH 2.9.0 -- USEARCH alternative for nucleotide alignment. website
The following packages are available in the anaconda/colsa module:
Note: The Anaconda environment name of each package below is indicated within parentheses. Please see the Personal Anaconda Virtual Environments section above for instructions on how to activate an Anaconda environment.
admixture v1.3.0 (admixture-1.3.0) -- ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. website
AlphaFold 2.3.1 (alphafold-2.3.1) -- AI system developed by Google DeepMind that predicts a protein’s 3D structure from its amino acid sequence. website
antiSMASH 7.1.0 (antismash-7.1.0) -- Antibiotics and secondary metabolite analysis. website
bcftools v1.17 (bcftools-1.17) -- BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. website
bcl2fastq 2.20.0 (bcl2fastq-2.20.0) -- Convert and demultiplex BCL files to FASTQ files. website
Beagle 3.3 (beagle-3.3) -- Package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. website
Beagle v5.2_21 (beagle-5.2_21) -- Beagle is a software package for phasing genotypes and for imputing ungenotyped markers. website
Blast v2.14.0 (blast-2.14.0) -- BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit. website
Blobtools 1.1.1 (blobtools-1.1.1) -- Visualization, QC, and taxonomic partitioning. website
BRAKER 3.0.3 (braker-3.0.3) -- A combination of GeneMark-ET R2 and AUGUSTUS R3, R4, that uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes. website
BRAKER 3.0.8 (braker-3.0.8) -- A combination of GeneMark-ET and AUGUSTUS that uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes. website
Busco v5.4.4 (busco-5.4.4) -- Assessment of assembly completeness using Universal Single Copy Orthologs. website
Busco v5.4.7 (busco-5.4.7) -- Assessment of assembly completeness using Universal Single Copy Orthologs. website
BUSCO 5.beta (busco-5.beta) -- Assessment of genome assemblies, gene sets, and transcriptome completeness. website
Centromere Seeker 2.1 (centromere-seeker-2.1) -- Bash script to search for centromeric repeat patterns in long sequence data, using several current tools (trf and R). website
CG Pipeline (cg-pipeline) -- Genome assembly and annotation pipeline. website
COLONY 2.0.6.8 (colony-2.0.6.8) -- Assign sibship and parentage via maximum likelihood. website
Cutadapt 3.2 (cutadapt-3.2) -- Trim adapters and primers from read sequences. website
dammit 1.2 (dammit-1.2) -- De novo transcriptome annotator. website
Dedalus 3.0.2 (dedalus-3.0.2) -- A flexible framework for solving partial differential equations using modern spectral methods. website
gmx_MMPBSA 1.6.2 (gmx_mmpbsa-1.6.2) -- gmx_MMPBSA is a new tool based on AMBER's MMPBSA.py aiming to perform end-state free energy calculations with GROMACS files. website
GTDB-Tk 1.5.1 (gtdbtk-1.5.1) -- Assign taxonomy to bacterial and archaeal genomes. website
GTDB-Tk 2.1.0 (gtdbtk-2.1.0) -- Assign taxonomy to bacterial and archaeal genomes. website
HAL 2.1 (hal-2.1) -- Store and index multiple genome alignments and ancestral reconstructions. website
HiCAT 1.1.0 (hicat-1.1.0) -- HiCAT is a generalized computational tool based on the hierarchical tandem repeat mining (HTRM) method to automatically process centromere annotation. website
HTSeq 0.13.5 (htseq-0.13.5) -- High-throughput sequencing data analysis tools. website
HyPhy 2.5.26 (hyphy-2.5.26) -- Hypothesis testing using phylogenies. website
HyPhy 2.5.59 (hyphy-2.5.59) -- An open-source software package for comparative sequence analysis using stochastic evolutionary models. website
I-TASSER 5.2 (i-tasser-5.2) -- I-TASSER Suite is a package of standalone computer programs, developed for high-resolution protein structure prediction, refinement, and structure-based function annotations. website
I-TASSER-MTD 1.0 (i-tasser-mtd-1.0) -- I-TASSER-MTD is a hierarchical protocol to predict structures and functions of multi-domain (MTD) proteins. website
IMP 2.18.0 (imp-2.18.0) -- IMP's broad goal is to contribute to a comprehensive structural characterization of biomolecules ranging in size and complexity from small peptides to large macromolecular assemblies, by integrating data from diverse biochemical and biophysical experiments. website
IMP 2.20.1 (imp-2.20.1) -- IMP's broad goal is to contribute to a comprehensive structural characterization of biomolecules ranging in size and complexity from small peptides to large macromolecular assemblies, by integrating data from diverse biochemical and biophysical experiments. website
InStrain 1.5.3 (instrain-1.5.3) -- Analysis of co-occurring populations in metagenomes. website
InterProScan 5.69.101 (interproscan-5.69.101) -- Align and characterize sequences against InterPro databases. website
ipyrad v0.9.90 (ipyrad-0.9.90) -- Interactive assembly and analysis of RAD-seq data sets. website
IQ-TREE 2.0.3 (iqtree-2.0.3) -- Maximum-Likelihood inference of phylogeny. website
IQ-TREE 2.2.6 (iqtree-2.2.6) -- Maximum-Likelihood inference of phylogeny. website
JAGS 4.3.2 (jags-4.3.2) -- Just Another Gibbs Sampler (with R installed). website
Juicer 1.8.9 (juicer-1.8.9) -- Pipeline for processing Hi-C datasets. website
Julia 1.8.2 (julia-1.8.2) -- Julia programming language. website
Jupyter 20190324 (jupyter-20190324) -- Jupyter lab environment with Python and R kernels. website
KAT (kat-2.4.2) -- Analyze hashes and sequence files from the Jellyfish package. website
Kraken v2.1.2 (kraken-2.1.2) -- The second version of the Kraken taxonomic sequence classification system. website
kSNP 3.1 (ksnp-3.1) -- SNP discovery and annotation for whole genomes. website
MetaLAFFA 1.0.1 (metalaffa-1.0.1) -- MetaLAFFA is a pipeline for annotating shotgun metagenomic data with abundances of functional orthology groups. website
ngsLD 1.2.0 (ngsld-1.2.0) -- Estimate pairwise linkage disequilibrium (LD) taking the uncertainty of genotype's assignation into account. website
NgsRelate 2.0 (ngsrelate-2.0) -- Infer relatedness, inbreeding coefficients and many other summary statistics for pairs of individuals from low coverage Next Generation Sequencing (NGS) data. website
ngsTools (ngstools-20190326) -- Population genetics analysis (includes ANGSD and NgsRelate). website
NMRPipe 10.4 (nmrpipe-10.4) -- NMR spectroscopic data processing and analysis. website
Nullarbor 2.0.20181010 (nullarbor-2.0.20181010) -- Generate public health microbiology reports. website
OPERA 2.0.6 (opera-2.0.6) -- Assembly of paired-end/long reads. website
ORCA 5.0.4 (orca-5.0.4) -- A powerful and versatile quantum chemistry software package. website
Oyster River Protocol 2.3.3 (orp-2.3.3) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.0.0 (orp-20180828) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.1.1 (orp-20190215) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.2.8 (orp-20191014) -- Transcriptome assembly pipeline. website
Pacbio Tools (pacbio-20190801) -- Pacbio suite of tools, including Falcon. website
PALADIN 1.5.0 (paladin-1.5.0) -- Nucleotide alignment and UniProt reporting against annotated nucleotide/transcriptome/protein references. website
PALADIN 1.6.0 (paladin-1.6.0) -- Nucleotide alignment and UniProt reporting against annotated nucleotide/transcriptome/protein references. website
PASTA 1.8.2 (pasta-1.8.2) -- Estimate alignments and ML trees from unaligned sequences. website
PCAngsd 1.2 (pcangsd-1.2) -- Estimates the covariance matrix and individual allele frequencies for low-depth next-generation sequencing (NGS) data in structured/heterogeneous populations. website
PHAST 1.5 (phast-1.5) -- Phylogenetic Analysis with Space/Time Models. website
PhyloAcc 2.3.3 (phyloacc-2.3.3) -- Bayesian estimation of lineage specific substitution rates in conserved non-coding regions while accounting for phylogenetic discordance. website
Plink 2.00a3LM (plink-2.00a3LM) -- Whole genome data analysis toolset. website
pmx (pmx-4.1.3) -- A versatile (bio-) molecular structure manipulation package with some additional functionalities. website
PopLDdecay 3.42 (poplddecay-3.42) -- A fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. website
proovread 2.14.1 (proovread-2.14.1) -- Correct PacBio and Illumina reads. website
Shapeit 4.2.2 (shapeit-4.2.2) -- Fast and accurate method for estimation of haplotypes (phasing). website
SMC++ 1.15.5 (smcpp-1.15.5) -- SMC++ infers population history from whole-genome sequence data. website
Smudgeplot 0.2.5 (smudgeplot-0.2.5) -- Inference of ploidy and heterozygosity structure using whole genome sequencing data. website
Snippy 3.2 (snippy-3.2) -- Bacterial SNP calling and core genome alignments. website
Spaceranger v2.0.1 (spaceranger-2.0.1) -- Space Ranger is delivered as a single, self-contained tar file that can be unpacked anywhere on your system. website
Split Pipe 1.3.1 (spipe-1.3.1) -- Allows analysis of Parse Biosciences single-cell RNA sequencing data. website
Tassel 5.2.40 (tassel-5.2.40) -- Genotyping by Sequencing (GBS) pipeline. website
Trans-ABySS 2.0.1 (transabyss-2.0.1) -- De novo assembly of RNA-Seq data. website
treemix 1.13 (treemix-1.13) -- TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations. website
WRF 4.5.1 (wrf-4.5.1) -- Weather Research & Forecasting Model (WRF). website
NH-INBRE Support
Research supported by New Hampshire-INBRE through an Institutional Development Award (IDeA), P20GM103506, from the National Institute of General Medical Sciences of the NIH.
For more information on New Hampshire-INBRE, please visit nhinbre.org.