Premise Software Usage

Major Software Packages

slurm

The batch system is used to schedule jobs to run on the compute nodes. Each command supports a --help option and has a Unix man page.

More information can be found in the official Slurm documentation.

Common commands include:

sbatch: Submit a script to be run when the required resources are available.
scancel: Cancel a job currently running or waiting to be run.
squeue: Shows your jobs currently running or waiting to be run.
sacct: Shows your jobs that have completed or failed to run.
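
The commands above can be combined into a short session. The following is a minimal sketch, not a site-provided script: the file name hello.slurm, the job name, and the resource values are illustrative assumptions, and the submission step is skipped when sbatch is not on the PATH.

```shell
#!/bin/bash
# Write a minimal job script (file name and resource values are illustrative).
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out
echo "Running on $(hostname)"
EOF

# Submit and inspect the job only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.slurm      # queue the script
    squeue -u "$USER"       # jobs still waiting or running
    sacct                   # jobs that have completed or failed
    # scancel <job ID>      # cancel a job using the ID printed by sbatch
else
    echo "sbatch not found; run this on the Premise head node"
fi
```

The %j in the --output file name is replaced by the job ID, so each run writes to its own log file.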

modules

Used to select specific software packages and exact versions. See the official module man page for more information.

Common commands include:

module avail: Provides a list of modules available on this cluster.
module load X: Load the package X into the current shell's environment. If more than one version is available, a specific version is selected with X/version.
module list: Display the list of packages currently loaded in this shell.
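
A typical sequence might look like the sketch below. The module name is taken from the Installed Modules list on this page; the guard around the commands is only there so the snippet does nothing harmful in a shell where the module command is not defined.

```shell
#!/bin/bash
# Module name taken from the "Installed Modules" list on this page.
MODULE_NAME=matlab/matlab-r2016a

if type module >/dev/null 2>&1; then
    module avail matlab            # list only the matlab modules
    module load "$MODULE_NAME"     # load one exact version
    module list                    # confirm what is loaded in this shell
    module unload "$MODULE_NAME"   # remove it from this shell again
else
    echo "The module command is not available in this shell"
fi
```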

MATLAB

Users of MATLAB on the Premise compute cluster should not run it graphically on the Premise head node. Unlike on your desktop, MATLAB jobs must be submitted to the Slurm job queue. A helper script has been created to submit your MATLAB .m scripts for you.

Run your MATLAB script with:

[rea@premise ~]$ module load matlab
[rea@premise ~]$ sMATLAB.py matlabscriptname.m

Use "sMATLAB.py --help" to list the available options and their defaults.

Adding the "--verbose" option to sMATLAB.py displays both the Slurm sbatch command line and helper job script that is being generated for you. This could be used as a starting point for users wishing to create their own Slurm scripts.
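
For users who prefer to write their own script, the sketch below shows one possible shape. It is not the script that sMATLAB.py generates: the file name matlab-job.slurm, the job name, and the resource requests are assumptions, and the -r invocation with try/catch is one common way to make MATLAB exit when the script fails.

```shell
#!/bin/bash
# Write a hand-made Slurm script for a MATLAB job (all names are illustrative).
cat > matlab-job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=matlab-job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --output=matlab-job-%j.out

module load matlab
# Run without a display; exit with a non-zero status if the script errors.
matlab -nodisplay -nosplash -r "try, run('matlabscriptname.m'); catch e, disp(getReport(e)); exit(1); end; exit(0)"
EOF

# Syntax-check the generated script before submitting it.
bash -n matlab-job.slurm && echo "Submit with: sbatch matlab-job.slurm"
```

Requesting a single task with 24 CPUs on one node matches the 24-core Premise nodes described on this page; reduce --cpus-per-task for scripts that do not use a parallel pool.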

Note that MATLAB does not automatically use all the cores on a node, or split a job across multiple nodes for you. These features must be coded into your scripts. You may find the following links helpful:

https://people.sc.fsu.edu/~jburkardt/examples/slurm/slurm.html Especially the "MATLAB_COMMANDLINE" link halfway down the page.
https://www.chpc.utah.edu/documentation/software/matlab.php
https://hcc-docs.unl.edu/display/HCCDOC/Submitting+Matlab+Jobs
https://rcc.uchicago.edu/docs/software/environments/matlab/
https://github.com/haught/matlab-slurm/blob/master/communicatingJobWrapper.sh

Here is an example MATLAB script that uses "parfor" to spread loop iterations across all the cores on one node. Using MATLAB on more than one node is not supported. Premise nodes currently all have 24 cores.

parpool(str2double(getenv('SLURM_JOB_CPUS_PER_NODE')));   % one worker per allocated core (24 per Premise node)
tic   % start wall-clock timer
ticBytes(gcp);   % start counting bytes transferred to and from the workers
n = 1024;
A = zeros(n);
parfor i = 1:n   % distribute these n iterations over the workers in the pool
    A(i,:) = (1:n) .* sin(i*2*pi/n);
end
tocBytes(gcp)   % display the bytes transferred during the parfor loop
toc   % stop the timer and display elapsed time

Installed Modules

namd -- Parallel molecular dynamics simulation of biomolecular systems. website
- namd/NAMD_2.12_Linux-x86_64-ibverbs-smp(default)
- namd/NAMD_2.12_Linux-x86_64-multicore-CUDA
- namd/NAMD_2.12_Linux-x86_64-ibverbs-smp-CUDA
- namd/namd-2.11-ibverbs-smp
- namd/namd-2.11-ibverbs-smp-CUDA
gromacs/gromacs-5.1.2 -- package to perform molecular dynamics. website
vmd -- Molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics. website
- vmd/vmd-1.9.2-opengl(default)
- vmd/vmd-1.9.2-text
spades -- Genome Assembler. website
- spades/spades-3.8.1
- spades/spades-3.7.1
mpi -- Message Passing Interface wikipedia
- mpi/mpich-x86_64
- mpi/mvapich2-x86_64
- mpi/openmpi-x86_64
matlab/matlab-r2016a -- Math and graphing. website

COLSA Overview

Information specific to COLSA usage of the Premise cluster is listed in the sections below. If you have any questions regarding COLSA usage of Premise, or would like to schedule a training session, please contact Toni Westbrook.

Slurm Templates

A selection of templates is available to use as a foundation for your Slurm scripts. These are especially recommended for any MPI-compatible software, such as MAKER or ABySS. All templates may be found on Premise in the following directory:

/mnt/home/hcgs/shared/slurm-templates

These templates may be copied into your home directory, and then modified. Each script is heavily commented to aid in changing specific parameters relevant to your job, such as ensuring allocation of high memory nodes. The four available templates are as follows:

threaded.slurm: This template is suitable for software that runs as a single process with multiple threads on a single node, which represents the majority of bioinformatics software installed on Premise.
parallel.slurm: This template is designed for executing multiple low-thread-count jobs concurrently on a single node. Two styles of parallel execution are shown in the template, including a method of spawning a process for each file in a directory. This could be used for scenarios such as simultaneously running an instance of an application for each FASTQ file.
abyss.slurm: This template ensures MPI is loaded correctly for ABySS jobs that use multiple nodes, but is also suitable for single node use.
maker.slurm: This template ensures MPI is loaded correctly for MAKER jobs that use multiple nodes, but is also suitable for single node use. MAKER will not function properly using the MPI instructions in the GMOD tutorials; please use this template instead.
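
The "one process per file" style mentioned for parallel.slurm can be sketched as follows. This is not the content of the template itself, only an illustration of the general pattern: the scratch directory, the placeholder FASTQ files, and the use of wc as a stand-in for a real application are all assumptions.

```shell
#!/bin/bash
# Demo input: three small placeholder FASTQ files in a scratch directory.
WORKDIR=$(mktemp -d)
for name in a b c; do
    printf '@read1\nACGT\n+\nIIII\n' > "$WORKDIR/sample_$name.fastq"
done

# Launch one background process per FASTQ file (wc stands in for a real tool).
for fq in "$WORKDIR"/*.fastq; do
    wc -l < "$fq" > "${fq%.fastq}.count" &
done

# Block until every background process has finished.
wait

ls "$WORKDIR"/*.count
```

Inside a real job script, the number of files processed at once should not exceed the number of CPUs requested from Slurm, otherwise the processes compete for the same cores.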

Reference Databases

A number of reference databases are downloaded and indexed regularly for a selection of popular alignment tools. These may be found on Premise in the following directory:

/mnt/home/hcgs/shared/databases

When making use of these files, please do not copy them into your home directory. Instead, either point your aligner at them directly or create symbolic links to them. As some of these FASTA files produce especially large indexes that take over a week to build, please use this shared directory to avoid consuming compute resources unnecessarily.
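
Symbolic linking can be wrapped in a small helper, sketched below. The function name link_reference and the example paths in the comments are hypothetical; list the shared directory to see which files actually exist.

```shell
#!/bin/bash
# link_reference SOURCE_PATH LINK_DIR
# Create LINK_DIR if needed and place a symbolic link to SOURCE_PATH inside it.
# SOURCE_PATH should be an absolute path so the link resolves from anywhere.
link_reference() {
    local source_path=$1
    local link_dir=$2
    mkdir -p "$link_dir"
    ln -s -- "$source_path" "$link_dir/$(basename "$source_path")"
}

# Example call (the file name is a placeholder, not a real database file):
# link_reference /mnt/home/hcgs/shared/databases/<file> "$HOME/project/refs"
```

The link takes up almost no space in your home directory, while the aligner reads the shared copy.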

Group Shared Directories

Each PI or group on Premise has a shared folder located in the group's directory, for example:

/mnt/home/macmaneslab/shared

These directories are intended for large or numerous files shared between multiple users in the group, such as sequences, references, software, etc. This avoids copying the same file to multiple users, reducing disk space consumption and simplifying version management.

Monitoring Jobs

While a job is executing, it can be helpful to monitor metrics such as CPU and memory usage to confirm that threading and other parameters have been specified correctly. Running utilities like top directly only monitors the head node, so we have developed slurm-monitor, which shows top as if you were connected to the relevant compute nodes. Usage is as follows:

slurm-monitor <job ID>

Note - for jobs that span multiple nodes, the active node may be cycled within slurm-monitor using the [ and ] keys.

Personal Anaconda Virtual Environments

The Anaconda installation on Premise accommodates both system-wide and user-specific virtual environments. Personal environments allow the user to install specific versions of libraries per application, often necessary for bioinformatics pipelines.

To begin working with any Anaconda environment, load the Anaconda environment module:

module load anaconda/colsa

Note - it will be necessary to unload the "linuxbrew/colsa" module before loading the anaconda module. After the module is loaded, Anaconda environments may be activated and deactivated with the following commands, respectively:

conda activate <environment name>
conda deactivate
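
Put together, switching from the linuxbrew module to Anaconda might look like the following sketch. The module names come from this page; conda env list is used here only to show which environments can be passed to conda activate, and the guard keeps the snippet harmless where the module command is not defined.

```shell
#!/bin/bash
ANACONDA_MODULE=anaconda/colsa

if type module >/dev/null 2>&1; then
    module unload linuxbrew/colsa    # must not be loaded alongside Anaconda
    module load "$ANACONDA_MODULE"
    conda env list                   # show the environments that can be activated
else
    echo "The module command is not available in this shell"
fi
```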

While an Anaconda environment is active, any software or libraries installed within the virtual environment will be available, and any new software installations using the "conda" utility will be installed within the active environment. For a list of bioinformatics software available through the Bioconda channel, see their website.

Connecting to the Bioconda channel, creating an environment, cloning recommended settings, and adding Python 3.12 and samtools 1.22 to the environment is outlined in the following example:

# Note - setting up these channels only needs to be done once per user
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# The following creates an Anaconda environment with the recommended
# configuration and adds software to it
module load anaconda/colsa
conda create --name test-pipeline --clone template
conda activate test-pipeline
conda install python=3.12 samtools=1.22

Installed Software Packages

A number of bioinformatics-related packages and programming language interpreters are pre-installed on Premise and ready to use. These are available in either the linuxbrew/colsa or anaconda/colsa modules, as listed below. We are also happy to install any missing software package; feel free to send us an email with the link to the software.

Prior to using any software on Premise, the corresponding environment module must be loaded:

module load <module name>

anaconda/colsa

The following packages are available in the anaconda/colsa module:

Note: The Anaconda environment name of each package below is indicated within parentheses. See the Personal Anaconda Virtual Environments section above for instructions on how to activate an Anaconda environment.

admixture v1.3.0 (admixture-1.3.0) -- ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. website
AGAT 0.5.1 (agat-0.5.1) -- GFF analysis tools. website
AlphaFold 2.3.1 (alphafold-2.3.1) -- AI system developed by Google DeepMind that predicts a protein’s 3D structure from its amino acid sequence. website
AlphaLink2 1.1.1 (alphalink2-1.1.1) -- Integrating crosslinking MS data into Uni-Fold-Multimer. website
antiSMASH 7.1.0 (antismash-7.1.0) -- Antibiotics and secondary metabolite analysis. website
bcftools v1.17 (bcftools-1.17) -- BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. website
bcl2fastq 2.20.0 (bcl2fastq-2.20.0) -- Convert and demultiplex BCL files to FASTQ files. website
Beagle 3.3 (beagle-3.3) -- Package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. website
Beagle v5.2_21 (beagle-5.2_21) -- Beagle is a software package for phasing genotypes and for imputing ungenotyped markers. website
Blast v2.14.0 (blast-2.14.0) -- BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit. website
Blobtools 1.1.1 (blobtools-1.1.1) -- Visualization, QC, and taxonomic partitioning. website
BRAKER 3.0.3 (braker-3.0.3) -- A combination of GeneMark-ET and AUGUSTUS that uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes. website
BRAKER 3.0.8 (braker-3.0.8) -- A combination of GeneMark-ET and AUGUSTUS that uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes. website
Busco v5.4.4 (busco-5.4.4) -- Assessment of assembly completeness using Universal Single Copy Orthologs. website
Busco v5.4.7 (busco-5.4.7) -- Assessment of assembly completeness using Universal Single Copy Orthologs. website
BUSCO 5.beta (busco-5.beta) -- Assessment of genome assemblies, gene sets, and transcriptome completeness. website
BWA-meth 0.2.2 (bwameth-0.2.2) -- BS-Seq read aligner. website
Cactus 2.9.0 (cactus-2.9.0) -- Reference free whole-genome multiple alignment. website
Cactus 2.9.0 (cactus-2.9.0-with-dill) -- Reference free whole-genome multiple alignment. website
Cactus 20200408 (cactus-20200408) -- Reference free whole-genome multiple alignment. website
Centromere Seeker 2.1 (centromere-seeker-2.1) -- bash script to search for centromeric repeat patterns in long sequence data, using several current tools (trf and R). website
CG Pipeline (cg-pipeline) -- Genome assembly and annotation pipeline. website
CitcomSVE 3.0 (citcomsve-3.0) -- A finite element package for modeling terrestrial planetary viscoelastic deformation in response to tidal and surface loads. website
COLONY 2.0.6.8 (colony-2.0.6.8) -- Assign sibship and parentage via maximum likelihood. website
Cutadapt 3.2 (cutadapt-3.2) -- Trim adapters and primers from read sequences. website
dammit 1.2 (dammit-1.2) -- De novo transcriptome annotator. website
Dedalus 3.0.2 (dedalus-3.0.2) -- A flexible framework for solving partial differential equations using modern spectral methods. website
DeepFRI 0.0.1 (deepfri-0.0.1) -- Deep functional residue identification. website
Diamond 2.1.9 (diamond-2.1.9) -- Accelerated BLAST compatible local sequence aligner. website
DRAM 1.2.2 (dram-1.2.2) -- Annotation of metagenomic assembled genomes. website
EDTA 2.2.0 (edta-2.2.0) -- Automated whole-genome de-novo TE annotation and benchmarking the annotation performance of TE libraries. website
EEMS 0.0.0.9000 (eems-0.0.0.9000) -- Method for analyzing and visualizing spatial population structure from geo-referenced genetic samples. website
eggnog-mapper 2.1.12 (eggnog-mapper-2.1.12) -- Fast genome-wide functional annotation through orthology assignment. website
Fastp v0.23.2 (fastp-0.23.2) -- FASTQ preprocessor with full features (QC/adapters/trimming/filtering/splitting...). website
FEniCS 2019.1.0 (fenics-2019.1.0) -- A collection of free software for automated, efficient solution of differential equations. website
FFmpeg 7.0.2 (ffmpeg-7.0.2) -- Cross-platform solution to record, convert and stream audio and video. website
Foldseek 5.53465f0 (foldseek-5.53465f0) -- fast and accurate protein structure search. website
FUSTr 20200224 (fustr-20200224) -- Detect protein families under positive selection. website
G-PhOCS 1.3.2 (g-phocs-1.3.2) -- Generalized Phylogenetic Coalescent Sampler for inferring ancestral population sizes, divergence times, and migration rates from genomic loci. website
gatk 4.5.0.0 (gatk-4.5.0.0) -- Genome Analysis Toolkit. website
Genomescope 2.0 (genomescope-2.0) -- Reference-free profiling of polyploid genomes. website
get_homologues 3.0.7 (get_homologues-3.0.7) -- Pan-genome analysis. website
GetOrganelle 1.7.4.1 (getorganelle-1.7.4.1) -- Organelle genome assembly. website
gmx_MMPBSA 1.6.2 (gmx_mmpbsa-1.6.2) -- gmx_MMPBSA is a new tool based on AMBER's MMPBSA.py aiming to perform end-state free energy calculations with GROMACS files. website
GTDB-Tk 1.5.1 (gtdbtk-1.5.1) -- Assign taxonomy to bacterial and archaeal genomes. website
GTDB-Tk 2.1.0 (gtdbtk-2.1.0) -- Assign taxonomy to bacterial and archaeal genomes. website
Guidance 2.02 (guidance-2.02) -- . website
HAL 2.1 (hal-2.1) -- Store and index multiple genome alignments and ancestral reconstructions. website
HiCAT 1.1.0 (hicat-1.1.0) -- HiCAT is a generalized computational tool based on the hierarchical tandem repeat mining (HTRM) method to automatically process centromere annotation. website
HOOMD-blue 3.11.0 (hoomd-3.11.0) -- General-purpose particle simulation toolkit. website
HOOMD-blue 4.4.1 (hoomd-4.4.1) -- General-purpose particle simulation toolkit. website
HTSeq 0.13.5 (htseq-0.13.5) -- High-throughput sequencing data analysis tools. website
HyPhy 2.5.26 (hyphy-2.5.26) -- Hypothesis testing using phylogenies. website
HyPhy 2.5.59 (hyphy-2.5.59) -- An open-source software package for comparative sequence analysis using stochastic evolutionary models. website
I-TASSER 5.2 (i-tasser-5.2) -- I-TASSER Suite is a package of standalone computer programs, developed for high-resolution protein structure prediction, refinement, and structure-based function annotations. website
I-TASSER-MTD 1.0 (i-tasser-mtd-1.0) -- I-TASSER-MTD is a hierarchical protocol to predict structures and functions of multi-domain (MTD) proteins. website
IMP 2.18.0 (imp-2.18.0) -- IMP's broad goal is to contribute to a comprehensive structural characterization of biomolecules ranging in size and complexity from small peptides to large macromolecular assemblies, by integrating data from diverse biochemical and biophysical experiments. website
IMP 2.20.1 (imp-2.20.1) -- IMP's broad goal is to contribute to a comprehensive structural characterization of biomolecules ranging in size and complexity from small peptides to large macromolecular assemblies, by integrating data from diverse biochemical and biophysical experiments. website
InStrain 1.5.3 (instrain-1.5.3) -- Analysis of co-occurring populations in metagenomes. website
InterProScan 5.69.101 (interproscan-5.69.101) -- Align and characterize sequences against InterPro databases. website
ipyrad v0.9.90 (ipyrad-0.9.90) -- Interactive assembly and analysis of RAD-seq data sets. website
IQ-TREE 2.0.3 (iqtree-2.0.3) -- Maximum-Likelihood inference of phylogeny. website
IQ-TREE 2.2.6 (iqtree-2.2.6) -- Maximum-Likelihood inference of phylogeny. website
IQ-TREE 2.3.6 (iqtree-2.3.6) -- Maximum-Likelihood inference of phylogeny. website
JAGS 4.3.2 (jags-4.3.2) -- Just Another Gibbs Sampler (with R installed). website
Juicer 1.8.9 (juicer-1.8.9) -- Pipeline for processing Hi-C datasets. website
Julia 1.8.2 (julia-1.8.2) -- Julia programming language. website
Jupyter 20190324 (jupyter-20190324) -- Jupyter lab environment with Python and R kernels. website
KAT (kat-2.4.2) -- Analyze hashes and sequence files from the Jellyfish package. website
Kraken v2.1.2 (kraken-2.1.2) -- The second version of the Kraken taxonomic sequence classification system. website
kSNP 3.1 (ksnp-3.1) -- SNP discovery and annotation for whole genomes. website
LAMMPS 2023.11.21 (lammps-2023.11.21) -- Large-scale Atomic/Molecular Massively Parallel Simulator. website
MafFilter 1.3.1 (maffilter-1.3.1) -- Genome alignment processor and analysis. website
MAJIQ 2.1 (majiq-2.1) -- Detect, quantify, and visualize local splicing variations. website
MAKER 3.01.02 (maker-3.01.02) -- Annotation of prokaryotic/eukaryotic genomes. website
MaSuRCA 3.1.3 (masurca-3.1.3) -- Short read and mixed short/long read assembler. website
MaSuRCA 3.2.6 (masurca-3.2.6) -- Short read and mixed short/long read assembler. website
MaSuRCA 3.3.1 (masurca-3.3.1) -- Short read and mixed short/long read assembler. website
Megan 6.25.9 (megan-6.25.9) -- A tool for studying the taxonomic content of a set of DNA reads. website
MEME 5.3.0 (meme-5.3.0) -- Motif-based sequence analysis. website
MetaLAFFA 1.0.1 (metalaffa-1.0.1) -- MetaLAFFA is a pipeline for annotating shotgun metagenomic data with abundances of functional orthology groups. website
MITE-Hunter 2011 (mite_hunter-2011) -- Discover miniature inverted-repeat transposable elements. website
MitoZ 2.4-alpha (mitoz-2.4) -- Mitochondrial genome assembly and annotation. website
MitoZ 3.6 (mitoz-3.6) -- MitoZ: A toolkit for assembly, annotation, and visualization of animal mitochondrial genomes. website
Modeller 10.4 (modeller-10.4) -- Comparative modeling by satisfaction of spatial restraints. website
Mugsy 1.2.3 (mugsy-1.2.3) -- Multiple whole genome aligner. website
MultiQC 1.10.1 (multiqc-1.10.1) -- Aggregate bioinformatics results across many samples. website
MUMPS 5.7.3 (mumps-5.7.3) -- A parallel sparse direct solver. website
NASP 1.0.02 (nasp-1.0.2) -- Pipeline for performing variant calling. website
ngopt 2016.08.25 (ngopt-20160825) -- Illumina MiSeq sequence assembly pipeline. website
ngsLD 1.2.0 (ngsld-1.2.0) -- Estimate pairwise linkage disequilibrium (LD) taking the uncertainty of genotype's assignation into account. website
NgsRelate 2.0 (ngsrelate-2.0) -- Infer relatedness, inbreeding coefficients and many other summary statistics for pairs of individuals from low coverage Next Generation Sequencing (NGS) data. website
ngsTools (ngstools-20190326) -- Population genetics analysis (includes ANGSD and NgsRelate). website
NMRPipe 10.4 (nmrpipe-10.4) -- NMR spectroscopic data processing and analysis. website
Nullarbor 2.0.20181010 (nullarbor-2.0.20181010) -- Generate public health microbiology reports. website
OpenFoam v12 (openfoam-v12) -- Open-source computational fluid dynamics (CFD) toolbox. website
OPERA 2.0.6 (opera-2.0.6) -- Assembly of paired-end/long reads. website
ORCA 5.0.4 (orca-5.0.4) -- A powerful and versatile quantum chemistry software package. website
Oyster River Protocol 2.3.3 (orp-2.3.3) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.0.0 (orp-20180828) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.1.1 (orp-20190215) -- Transcriptome assembly pipeline. website
Oyster River Protocol 2.2.8 (orp-20191014) -- Transcriptome assembly pipeline. website
OrthoFinder 2.5.2 (orthofinder-2.5.2) -- Phylogenetic orthology inference. website
OrthoFinder 2.5.5 (orthofinder-2.5.5) -- Phylogenetic orthology inference. website
OrthoFinder 3.0.1b1 (orthofinder-3.0.1b1) -- Phylogenetic orthology inference. website
OrthoFinder 3.1.0 (orthofinder-3.1.0) -- Phylogenetic orthology inference. website
Pacbio Tools (pacbio-20190801) -- Pacbio suite of tools, including Falcon. website
PALADIN 1.5.0 (paladin-1.5.0) -- Nucleotide alignment and UniProt reporting against annotated nucleotide/transcriptome/protein references. website
PALADIN 1.6.0 (paladin-1.6.0) -- Nucleotide alignment and UniProt reporting against annotated nucleotide/transcriptome/protein references. website
PASTA 1.8.2 (pasta-1.8.2) -- Estimate alignments and ML trees from unaligned sequences. website
PCAngsd 1.2 (pcangsd-1.2) -- Estimates the covariance matrix and individual allele frequencies for low-depth next-generation sequencing (NGS) data in structured/heterogeneous populations. website
Pear 0.9.6 (pear-0.9.6) -- paired-end read merger. website
Perl 5.32.x (perl-5.32) -- Perl programming language. website
PHAST 1.5 (phast-1.5) -- Phylogenetic Analysis with Space/Time Models. website
PhyloAcc 2.3.3 (phyloacc-2.3.3) -- Bayesian estimation of lineage specific substitution rates in conserved non-coding regions while accounting for phylogenetic discordance. website
Pixy 1.2.11 (pixy-1.2.11) -- Unbiased estimation of nucleotide diversity within and between populations. website
Plink 2.00a3LM (plink-2.00a3LM) -- Whole genome data analysis toolset. website
pmx (pmx-4.1.3) -- A versatile (bio-) molecular structure manipulation package with some additional functionalities. website
PopLDdecay 3.42 (poplddecay-3.42) -- a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. website
proovread 2.14.1 (proovread-2.14.1) -- Correct PacBio and Illumina reads. website
PSORTb 3.0.0 (psortb-3.0.0) -- Subcellular localization prediction. website
Python 3.9.x (python-3.9) -- Python programming language. website
QIIME2 2017.10 (qiime2-2017.10) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.12 (qiime2-2017.12) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.2 (qiime2-2017.2) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.6 (qiime2-2017.6) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.7 (qiime2-2017.7) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.8 (qiime2-2017.8) -- Pipeline for performing microbiome analysis. website
QIIME2 2017.9 (qiime2-2017.9) -- Pipeline for performing microbiome analysis. website
QIIME2 2018.2 (qiime2-2018.2) -- Pipeline for performing microbiome analysis. website
QIIME2 2018.4 (qiime2-2018.4) -- Pipeline for performing microbiome analysis. website
QIIME2 2019.1 (qiime2-2019.1) -- Pipeline for performing microbiome analysis. website
QIIME2 2019.4 (qiime2-2019.4) -- Pipeline for performing microbiome analysis. website
QIIME2 2020.2 (qiime2-2020.2) -- Pipeline for performing microbiome analysis. website
QIIME2 2024.10.1 (qiime2-2024.10.1) -- Pipeline for performing microbiome analysis. website
QIIME2 2024.10.26 (qiime2-2024.10.26) -- Pipeline for performing microbiome analysis. website
Qualimap v2.2.1 (qualimap-2.2.1) -- Quality control of alignment sequencing data and its derivatives. website
R 4.0.x (r-4.0) -- R programming language. website
R 4.2.x (r-4.2) -- R programming language. website
RAxML-NG 1.0.1 (raxmlng-1.0.1) -- Phylogenetic tree inference tool using maximum-likelihood. website
RAxML-NG 1.0.3 (raxmlng-1.0.3) -- Phylogenetic tree inference tool using maximum-likelihood. website
Rcorrector v1.0.5 (rcorrector-1.0.5) -- kmer-based error correction method for RNA-seq data. website
Redundans 2.0.1 (redundans-2.0.1) -- Assist assembly of heterozygous/polymorphic genomes. website
RepeatModeler 2.0.4 (repeatmodeler-2.0.4) -- RepeatModeler is a de-novo repeat family identification and modeling package. website
RepeatProfiler (repeatprofiler-20201223) -- Tool for repetitive DNA dynamics using low-coverage, short-read data. website
rMATS 4.1.1 (rmats-4.1.1) -- Detect differential alternative splicing events. website
Sabre (sabre-20210224) -- Barcode demultiplexing tool. website
Salmon 1.5.2 (salmon-1.5.2) -- Quantify transcript expression from RNA-Seq data. website
Satsuma2 20161123 (satsuma2-20161123) -- Long sequence/whole genome alignment. website
SDA 20200116 (sda-20200116) -- Segmental Duplication Assembler. website
SEPP 4.3.10 (sepp-4.3.10) -- Phylogenetic placement (SEPP/TIPP/UPP). website
SequenceTools 1.4.1 (sequencetools-1.4.1) -- Pileup caller and Eigenstrat file tools. website
Shapeit 4.2.2 (shapeit-4.2.2) -- Fast and accurate method for estimation of haplotypes (phasing). website
SMC++ 1.15.5 (smcpp-1.15.5) -- SMC++ infers population history from whole-genome sequence data. website
Smudgeplot 0.2.5 (smudgeplot-0.2.5) -- Inference of ploidy and heterozygosity structure using whole genome sequencing data. website
Snippy 3.2 (snippy-3.2) -- Bacterial SNP calling and core genome alignments. website
SnpEff 5.2 (snpeff-5.2) -- Genetic variant annotation and effect prediction toolbox. website
Spaceranger v2.0.1 (spaceranger-2.0.1) -- Space Ranger is delivered as a single, self-contained tar file that can be unpacked anywhere on your system. website
Split Pipe 1.3.1 (spipe-1.3.1) -- Allows analysis of Parse Biosciences single-cell RNA sequencing data. website
SQLite 3.44.2 (sqlite-3.44.2) -- Implements a self-contained, zero-configuration, SQL database engine. website
Stacks 2.4 (stacks-2.4) -- Pipeline for building loci from short-read sequences (CLI only). website
Stacks 2.5 (stacks-2.5) -- Pipeline for building loci from short-read sequences (CLI only). website
Stacks 2.67 (stacks-2.67) -- Pipeline for building loci from short-read sequences (CLI only). website
STAR 2.7.10b (star-2.7.10b) -- ultrafast universal RNA-seq aligner. website
Supernova 2.1.1 (supernova-2.1.1) -- De novo assembly of 10X Genomics Linked-Reads. website
table2asn 1.24.426 (table2asn-1.24.426) -- Create annotated genome submission from GFF3 file. website
Tassel 5.2.40 (tassel-5.2.40) -- Genotyping by Sequencing (GBS) pipeline. website
Trans-ABySS 2.0.1 (transabyss-2.0.1) -- De novo assembly of RNA-Seq data. website
treemix 1.13 (treemix-1.13) -- TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations. website
TreePL 1.0 (treepl-1.0) -- Phylogenetic penalized likelihood. website
Trinity v2.15.0 (trinity-2.15.0) -- Trinity assembles transcript sequences from Illumina RNA-Seq data. website
Trinotate 3.1.1 (trinotate-3.1.1) -- Transcriptome annotation. website
vcf2gwas v0.8.8 (vcf2gwas-0.8.8) -- Python API for comprehensive GWAS analysis using GEMMA. website
vcftools v0.1.16 (vcftools-0.1.16) -- A set of tools written in Perl and C++ for working with VCF files. website
Whippet 0.8 (whippet-0.8) -- RNA-seq quantification. website
WRF 4.5.1 (wrf-4.5.1) -- Weather Research & Forecasting Model (WRF). website
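
To use one of the packages listed above, activate the environment named in its parentheses. The sketch below uses the blast-2.14.0 environment from the list as an example; checking the version with blastn is only an illustration, and the guard keeps the snippet harmless where the module command is not defined.

```shell
#!/bin/bash
# Environment name copied from the parentheses in the package list above.
ENV_NAME=blast-2.14.0

if type module >/dev/null 2>&1; then
    module load anaconda/colsa
    conda activate "$ENV_NAME"
    blastn -version              # confirm the tool from this environment is on the PATH
    conda deactivate
else
    echo "The module command is not available in this shell"
fi
```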

linuxbrew/colsa

The following packages are available in the linuxbrew/colsa module:

ABySS 2.1.0 -- Paired-end short read sequence assembler. website
agrep 0.8.0 -- Grep with approximate and weighted matches. website
AdapterRemoval 2.3.1 -- Read trimming tool. website
Albacore 2.3.4 -- Nanopore basecaller. website
ALLPATHS-LG 52488 -- Short read sequence assembler. website
AMPtk 1.2.5 -- Process amplicon data using USEARCH/VSEARCH. website
ARAGORN 1.2.38 -- Detection of tRNA and tmRNA in nucleotide sequences. website
Ariba 2.13.3 -- Antimicrobial resistance identification by assembly. website
ART 2016-06-05 -- NGS read simulator. website
Astral 5.6.3 -- Estimate unrooted species trees from unrooted genes trees. website
Augustus 3.2.2 -- Gene prediction in eukaryotic genomic sequences. website
BamTools 2.4.0 -- Toolkit for working with BAM (Binary Alignment Map) data. website
Barrnap 0.9 -- Rapid ribosomal RNA predictor. website
BayeScan 2.0 -- Detecting natural selection from population-based genetic data. website
BCFtools 1.9.1 -- Tools for variant calling and manipulating VCFs/BCFs. website
BedTools 2.29.0 -- Toolkit for performing a variety of genomics analysis tasks. website
BEAST 1.8.4 -- Bayesian inference of phylogeny. website
BEAST-2 2.6.2 -- Bayesian inference of phylogeny. website
BFC r181 -- Illumina read error correction. website
Bioawk 20110810 -- AWK with extensions for biological data formats. website
BLASR 1.3.1 -- PacBio nucleotide alignment against nucleotide references. website
BLAST Suite 2.5.0 -- Nucleotide/Protein alignment against nucleotide/protein references. website
BLAT 36.1 -- Nucleotide/Protein alignment against nucleotide/protein references. website
Bowtie 1.1.2 -- Nucleotide (<50 bp) alignment against nucleotide references. website
BUSCO 3.0.0 -- Assessment of genome assemblies, gene sets, and transcriptome completeness. website
BWA 0.7.15 -- Nucleotide alignment against nucleotide references. website
CAFE 4.2.1 -- Analyze changes to gene family size for evolutionary inference. website
Canu 1.4 -- High-noise, single-molecule (PacBio RS II / Oxford Nanopore MinION) assembler. website
CD-HIT 4.6 -- Nucleotide and protein cluster creation and comparison. website
Centrifuge 1.0.3-beta -- Classification of DNA reads from microbial samples. website
cis-Metalysis 1.3 -- Analysis of annotations associated with dysregulated gene sets. website
ClonalFrameML 1.11 -- Inference of recombination in bacterial genomes. website
CONCOCT 0.4.0 -- Metagenomic binning by kmer frequency and coverage. website
CRISPResso 1.0.8 -- Analysis of CRISPR-Cas9 genome editing outcomes. website
Cufflinks 2.2.1 -- Transcriptome assembly and differential expression analysis. website
Cutadapt 1.14 -- Read trimming tool. website
DIAMOND 0.9.17 -- Nucleotide/Protein alignment against protein references. website
Distruct 1.1 -- Plot population structure. website
DIYABC 2.1.0 -- Approximate Bayesian inference on population history. website
DiscovarDeNovo r52488 -- Assembly of Illumina reads. website
E-MEM 1.0.0 -- Compute MEMs (maximal exact matches) between large genomes. website
EMBOSS 6.6.0.0 -- Toolkit containing multiple applications for molecular biology analysis. website
EPA-ng 0.3.5 -- Phylogenetic placement of sequences. website
ExaBayes 1.5 -- Bayesian inference of phylogeny for computer clusters. website
Exonerate 2.2.0 -- Nucleotide and protein alignment against nucleotide and protein references. website
FastANI 1.1 -- Alignment-free computation of average nucleotide identity. website
fastGEAR 2018-07-02 -- Identify population genetic structure and recombinations. website
FastQC 0.11.5 -- Quality control reporting tool for sequence data. website
FASTQ-Tools 0.8 -- FASTQ related utilities. website
FastViromeExplorer 2017-11-03 -- Identify virus abundances in metagenomic data. website
FASTX-Toolkit 0.0.14 -- Collection of tools for short-read FASTA/FASTQ files. website
Filtlong 0.2.0 -- Quality filtering for long reads. website
Flash 1.2.11 -- Merge paired-end reads. website
Flye 2.3.5 -- Long and noisy read de-novo assembler. website
FUNGuild 1.1 -- Parsing OTUs into functional guilds. website
GATK 4.0.11.0 -- Toolkit for variant discovery and genotyping. website
Gblocks 0.91b -- Eliminates poorly aligned positions and divergent regions. website
GKaKs 1.3 -- Genome level calculation of Ka/Ks. website
Gnuplot 5.2 -- Graphing utility. website
Go 1.11.4 -- Go programming language. website
Gubbins 2.3.4 -- Bacterial phylogeny inference unbiased by recombination. website
Guppy 3.0.3 -- Nanopore basecaller. website
HiDe 2012-06-09 -- Inference of HGT highways. website
HISAT2 2.2.1 -- Alignment of reads to populations of human genomes. website
HMMER 3.2.1 -- Homolog search and sequence alignment using hidden Markov models. website
iBPP 2.1.3 -- Bayesian species delimitation integrating genes and traits. website
Infernal 1.1.2 -- Inference of RNA alignments. website
IQ-TREE 1.6.12 -- Maximum-Likelihood inference of phylogeny. website
ITSx 1.0.11 -- Detection and extraction of ITS1 and ITS2 sequences. website
Jellyfish 2.2.6 -- K-mer counting and analysis. website
Kallisto 0.43.1 -- Quantify abundances of transcripts from RNA-Seq data. website
KaKs Calculator 2.0 -- Calculates nonsynonymous and synonymous substitution rates. website
KinFin 1.0.3 -- Taxon-aware analysis of protein clusters. website
LAGAN 2.0 -- Alignment toolkit. website
LDhat 2018-02-19 -- Estimate recombination rates from population genetics data. website
LEfSe 1.0.0 -- OTU and gene visualization. website
LINKS 1.8.6 -- Long Interval Nucleotide K-mer Scaffolder. website
LoRDEC 0.9 -- Long read error correction. website
MAFFT 7.305b -- Nucleotide/Protein multiple sequence alignment. website
Maker 2.31.9 -- Annotation of prokaryotic/eukaryotic genomes. website
Mash 2.1 -- Genome and metagenome distance estimation via minhash. website
Mauve 2015-02-13 -- Multiple whole genome alignment. website
Mcorr 2018-03-14 -- Infer recombination rate using correlation profile. website
MEGAHIT 1.1.3 -- De-novo metagenomic assembler. website
MetaBAT 2.13 -- Framework for reconstructing genomes from metagenomes. website
MetaCRAST 2018-01-22 -- Find CRISPR arrays in unassembled metagenomes. website
MinCED 0.4.0 -- Find CRISPRs in shotgun DNA sequences or full genomes. website
Miniasm 0.3-r179 -- Assembler for noisy, long reads. website
Minimap 2.10 -- Long read nucleotide alignment against nucleotide references. website
MCL 14-137 -- Unsupervised cluster algorithm for graphs (Markov Cluster Algorithm). website
MrBayes 3.2.6 -- Bayesian inference of phylogeny. website
MrsFAST 3.4.0 -- Micro-read substitution-only alignment. website
MUMmer 3.23 -- Rapid genome alignment. website
MUSCLE 3.8.1551 -- Nucleotide/Protein multiple sequence alignment. website
Myriads 1.2 -- P-value based dependence detection. website
NanoFilt 2.2.0 -- Filtering and trimming of long read sequences. website
Nanopolish 0.10.2 -- Signal-level analysis of Oxford Nanopore sequences. website
OCaml 4.07.1 -- OCaml programming language. website
OrthoFinder 2.3.3 -- Phylogenetic orthology inference. website
Pacbio-Patch 2015-10-13 -- Improve assemblies via PacBio long reads. website
PALADIN-plugins 1.0.2 -- Pipeline for PALADIN-related analysis. website
PAML 4.9c -- Phylogenetic analysis using maximum likelihood. website
Panaroo 1.2.2 -- Pangenome analysis pipeline. website
Pandoc 2.1.1 -- Markup file converter for many formats. website
PartitionFinder 2.1.1 -- Partition scheme and model selection for phylogenetic analysis. website
PEAR 0.9.10 -- Illumina Paired-end read merger. website
Peregrine 0.1.5.3 -- Accurate long read assembler. website
Perl 5.24.0 -- Perl programming language. website
PGDSpider 2.1.1.5 -- Convert between population genetics file formats. website
PhyloBayes 1.8 -- Phylogenetic reconstruction using mixture models. website
PhyloTreePruner 2015-09-18 -- Trim suspected paralogs from trees. website
Picard 2.18.1 -- Toolkit for manipulating bioinformatics files. website
Pilon 1.22 -- Draft assembly improvement and variant calling. website
Plink 1.07 -- Whole genome data analysis toolset. website
Porechop 0.2.3 -- Oxford Nanopore adapter trimmer. website
PopPUNK 1.1.5 -- Population partitioning. website
PRANK 140603 -- Probabilistic multiple sequence alignment for closely related sequences. website
ProbCons 1.12 -- Protein multiple sequence alignment. website
PRODIGAL 2.6.3_1 -- Microbial gene finder. website
Prokka 1.14 -- Annotation of prokaryotic genomes. website
PSMC 0.6.5 -- Infer population size history from diploid sequence. website
Python 2 2.7.13 -- Python programming language. website
Python 3 3.6.0 -- Python programming language. website
QUAST 4.5 -- Quality assessment of genome assemblies. website
QuorUM 1.0.0 -- Illumina read error correction. website
R 3.3.3 -- R programming language. website
RAxML 8.2.10 -- Maximum-Likelihood inference of phylogeny. website
Rcorrector 1.0.2 -- Illumina RNA-seq error correction. website
Repeatmasker 4.0.7 -- DNA repeat and low complexity detection. website
RevBayes 1.0.11 -- Phylogenetic modeling, simulation, and inference. website
Roary 3.8.0 -- Pan genome pipeline. website
Salmon 0.8.2 -- Quantify abundances of transcripts from RNA-Seq data. website
samblaster 0.1.24 -- SAM duplicate marking and read extraction. website
SAMtools 1.5 -- Toolkit for working with SAM files. website
seqtk 1.2-r94 -- Tool for processing FASTA/FASTQ files. website
Shannon 0.0.2 -- RNA-Seq assembler. website
Skewer 0.2.2 -- Adapter trimmer for paired-end reads. website
SGA 0.10.15 -- Sequence assembler using string graphs. website
SNAP 2013-11-29 -- Semi-HMM-based gene prediction tool. website
SOAPaligner 2.21 -- Short read aligner. website
SPAdes 3.13.1 -- Small genome assembly toolkit. website
SPLITREAD 0.1 -- Detecting INDELs within coding regions. website
SpreaD3 0.9.7.1 -- Analyze and visualize pathogens from phylogeny. website
STAR 2.7.5c -- Spliced Transcripts Alignment. website
stringMLST 0.4.1 -- k-mer-based tool for multilocus sequence typing. website
Structure 2.3.4 -- Population structure inference. website
Stubb 2.1 -- Probabilistic detection of Regulatory Modules. website
SweepFinder2 20160916 -- Detect recent selective sweeps. website
Taxtastic 0.8.11 -- Build and maintain reference packages. website
Tophat 2.1.1 -- Spliced read mapper for RNA-Seq. website
Transdecoder 3.0.1 -- Identify candidate coding regions within transcripts. website
Transfuse 0.5.0 -- Merge transcriptomic assemblies. website
Transrate 1.0.3 -- De-novo transcriptome assembly quality analysis. website
TreSpEx 1.1 -- Detection of misleading phylogenetic signals. website
Trimal 1.4.rev15 -- Automated alignment trimming in phylogeny. website
Trimmomatic 0.39 -- Illumina read trimming tool. website
Trinity 2.4.0 -- Illumina RNA-seq read sequence assembler. website
USEARCH 9.2.64 -- Search and clustering toolkit. website
VCFtools 0.1.15 -- Toolkit for working with VCF files. website
Velvet 1.2.10 -- Short read sequence assembler. website
VolcanoFinder 1.0 -- Detect events of adaptive introgression. website
VSEARCH 2.9.0 -- USEARCH alternative for nucleotide alignment. website