USNH Premise Cluster

Table of contents

USNH Premise Cluster
- Overview of the Premise cluster
- Usage

Overview of the Premise cluster

The Premise High-Performance Computing (HPC) cluster is a collection of USNH servers dedicated to research-related computational analysis.

Funding

This cluster was funded by what is commonly called the "Condo Compute Model".

For Free?

The Premise core infrastructure is provided for USNH researchers. The Premise core infrastructure includes: racks, power distribution, cooling, network connectivity, file storage, and six servers. This shared infrastructure has been funded by ET&S and REEO.

A "shared" job queue is available for all Premise users. The Technology Governance Committee regularly reviews usage to ensure equitable availability of all HPC resources.

Buy-in

Your budget should include "HPC buy-in" funding to satisfy your project's minimum needs. The "Description of Hardware" section below defines three standard node configurations: base ($20k), hi-ram ($25k), and gpu ($30k+). Contact RCC for current pricing for your proposal budget. Your grant retains ownership of any hardware you purchase.

Owners are provided a restricted job queue with priority scheduling on any hardware they own. When no owner priority work exists, "shared" queue jobs may be scheduled on the idle hardware. Shared jobs that are already running on owner hardware are allowed to complete in a "reasonable amount of time", so some priority jobs may have to wait before they start.

Citation & Proposal language

Please acknowledge the use of Premise in your papers like this:

Computations were performed on Premise, a central, shared HPC cluster at USNH supported by the Research Computing Center and PIs who have contributed compute nodes.

You may find the following narrative description of Premise useful when writing proposals:

USNH operates "Premise.sr.unh.edu" (http://Premise.sr.unh.edu), an HPC cluster with 70 compute nodes connected using HDR InfiniBand networking. Each node has at least 24 CPU cores and 128GB of RAM; 26 nodes have GPUs, and 29 nodes have 384GB or more of RAM. The entire cluster shares 225TB of standard storage, 777+TB of PixStor storage, and 160TB of ZFS storage. Premise is a central, shared cluster based on the "buy-in" model, in which funded researchers contribute to the node count. Four nodes are funded centrally for researchers without funding. Central funding also purchased infrastructure including cabinets, power, cooling, and storage. The cluster is hosted and administered by Research Computing Center staff and is located in the Morse Hall Lenharth Data Center. Unused Premise compute is available equally to all USNH research projects; node owners are provided a priority job queue to ensure the funding sponsor's work takes priority.

Description of Hardware

The Premise cluster is an HPC system made up of:

- A login node, "premise.sr.unh.edu"
- 70 compute nodes connected using 100Gb/s HDR InfiniBand networking
- All compute nodes have at least 24 cores
  - 30 nodes have 24 cores per node from dual 12 core CPUs
  - 16 nodes have 32 cores per node from dual 16 core CPUs
  - 11 nodes have 40 cores per node from dual 20 core CPUs
  - 1 node has 48 cores per node from dual 24 core CPUs
  - 1 node has 64 cores per node from dual 32 core CPUs
  - 6 nodes have 112 cores per node from dual 56 core CPUs
  - 2 nodes have 64 cores per node from quad 16 core CPUs
  - 1 node has 32 cores per node from quad 8 core CPUs
  - 1 node has 32 cores per node from a single 32 core CPU
  - 1 node has 24 cores per node from a single 24 core CPU
- All compute nodes have at least 128GB of main memory
  - 22 compute nodes have 128GB of main memory
  - 5 compute nodes have 192GB of main memory
  - 14 compute nodes have 256GB of main memory
  - 6 compute nodes have 384GB of main memory
  - 9 compute nodes have 512GB of main memory
  - 8 compute nodes have 768GB of main memory
  - 4 compute nodes have 1TB of main memory
  - 2 compute nodes have 2TB of main memory
- 26 compute nodes have NVIDIA GPUs
  - 9 compute nodes have NVIDIA K80 GPUs
  - 1 compute node has an NVIDIA V100 GPU
  - 15 compute nodes have NVIDIA A100 GPUs
  - 1 compute node has an NVIDIA H100 GPU
- Shared cluster storage
  - 225TB+ of usable storage
  - 777TB+ of usable PixStor storage
  - 160TB of uncompressed ZFS archive/backup storage

See purchase history for more detail.
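As a quick consistency check, the core and memory breakdowns in the hardware list can be tallied with a few lines of Python. This is only a sketch: the tuples are transcribed by hand from the list, the variable names are illustrative, and 1TB and 2TB are assumed to mean 1024GB and 2048GB.

```python
# (node count, cores per node), transcribed from the hardware list
CORE_GROUPS = [
    (30, 24), (16, 32), (11, 40), (1, 48), (1, 64),
    (6, 112), (2, 64), (1, 32), (1, 32), (1, 24),
]

# (node count, memory per node in GB); 1TB/2TB assumed to be 1024/2048 GB
MEMORY_GROUPS = [
    (22, 128), (5, 192), (14, 256), (6, 384),
    (9, 512), (8, 768), (4, 1024), (2, 2048),
]


def total_nodes(groups):
    """Sum the node counts across all groups."""
    return sum(count for count, _ in groups)


def total_amount(groups):
    """Sum count * per-node quantity (cores or GB) across all groups."""
    return sum(count * per_node for count, per_node in groups)


core_nodes = total_nodes(CORE_GROUPS)
memory_nodes = total_nodes(MEMORY_GROUPS)
total_cores = total_amount(CORE_GROUPS)
total_memory_gb = total_amount(MEMORY_GROUPS)
# Nodes with 384GB or more of main memory
large_memory_nodes = sum(count for count, gb in MEMORY_GROUPS if gb >= 384)

print(f"nodes (by core list):   {core_nodes}")
print(f"nodes (by memory list): {memory_nodes}")
print(f"total cores:            {total_cores}")
print(f"total memory (GB):      {total_memory_gb}")
print(f"nodes with >= 384GB:    {large_memory_nodes}")
```

Both breakdowns should account for the same number of compute nodes; if a node is added to one list but not the other, the two node totals will disagree.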

CPU performance only

Premise has 70 compute nodes with CPUs that have differing core counts, for a total of 2,672 cores. The theoretical peak performance of an individual node ranges from almost 500 GFlops to over 7 TFlops.

(Total CPU performance) 
  = ( 1 nodes) * ( 48 cores) * (2.80 GHz) * (16 Flops/cycle) =  2150 GFlops
  + ( 1 nodes) * ( 64 cores) * (2.80 GHz) * (16 Flops/cycle) =  2867 GFlops
  + ( 2 nodes) * ( 64 cores) * (2.50 GHz) * ( 8 Flops/cycle) =  2560 GFlops
  + (26 nodes) * ( 24 cores) * (2.50 GHz) * ( 8 Flops/cycle) = 12480 GFlops
  + ( 1 nodes) * ( 24 cores) * (2.50 GHz) * (16 Flops/cycle) =   960 GFlops
  + ( 4 nodes) * ( 24 cores) * (2.60 GHz) * (32 Flops/cycle) =  7987 GFlops
  + ( 2 nodes) * ( 40 cores) * (2.10 GHz) * (32 Flops/cycle) =  5376 GFlops
  + ( 5 nodes) * ( 32 cores) * (2.80 GHz) * (32 Flops/cycle) = 14336 GFlops
  + ( 2 nodes) * ( 32 cores) * (3.20 GHz) * (16 Flops/cycle) =  3277 GFlops
  + ( 9 nodes) * ( 40 cores) * (2.50 GHz) * (32 Flops/cycle) = 28800 GFlops
  + ( 6 nodes) * (112 cores) * (2.00 GHz) * (32 Flops/cycle) = 43008 GFlops
  + (11 nodes) * ( 32 cores) * (2.50 GHz) * (32 Flops/cycle) = 28160 GFlops
  = (151962 GFlops) 
  = (151.96 TFlops)
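The CPU total above is the sum of nodes * cores * clock * Flops/cycle over each row. The following Python sketch recomputes it from the same inputs so the figure can be refreshed when nodes are added; the row tuples are transcribed from the table and the names are illustrative.

```python
# (nodes, cores per node, clock in GHz, Flops per cycle per core)
CPU_ROWS = [
    (1, 48, 2.80, 16),
    (1, 64, 2.80, 16),
    (2, 64, 2.50, 8),
    (26, 24, 2.50, 8),
    (1, 24, 2.50, 16),
    (4, 24, 2.60, 32),
    (2, 40, 2.10, 32),
    (5, 32, 2.80, 32),
    (2, 32, 3.20, 16),
    (9, 40, 2.50, 32),
    (6, 112, 2.00, 32),
    (11, 32, 2.50, 32),
]


def row_gflops(nodes, cores, ghz, flops_per_cycle):
    """Theoretical peak of one table row in GFlops."""
    return nodes * cores * ghz * flops_per_cycle


per_row_gflops = [row_gflops(*row) for row in CPU_ROWS]
total_gflops = sum(per_row_gflops)

# Peak of a single node in each row (node count left out)
per_node_gflops = [cores * ghz * fpc for _, cores, ghz, fpc in CPU_ROWS]

for row, gflops in zip(CPU_ROWS, per_row_gflops):
    print(f"{row[0]:>2} nodes x {row[1]:>3} cores: {gflops:8.0f} GFlops")
print(f"total: {total_gflops:.0f} GFlops = {total_gflops / 1000:.2f} TFlops")
print(f"per-node range: {min(per_node_gflops):.0f} to {max(per_node_gflops):.0f} GFlops")
```

These are theoretical peak figures; they assume every core sustains its listed Flops/cycle at the listed clock, which real workloads rarely achieve.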

GPU performance only

Nine of the compute nodes of the Premise cluster each contain two NVIDIA K80 GPU cards. One compute node contains a single NVIDIA V100 GPU card. Fifteen compute nodes each contain a single NVIDIA A100 GPU card. One compute node contains a single NVIDIA H100 GPU card.

(Total GPU performance)
  = ( 9 nodes) * (2 k80  GPU/node) * (1.87 TFlops/GPU) =  33.66 TFlops
  + ( 1 nodes) * (1 v100 GPU/node) * (7    TFlops/GPU) =   7.00 TFlops
  + (15 nodes) * (1 a100 GPU/node) * (9.7  TFlops/GPU) = 145.50 TFlops
  + ( 1 nodes) * (1 h100 GPU/node) * (30   TFlops/GPU) =  30.00 TFlops
  = (216.16 TFlops)

If leveraged, the Tensor Cores on the 15 NVIDIA a100 GPUs and the 1 h100 GPU add a further (9.8 * 15 = 147) and (30 * 1 = 30) TFlops, 177 TFlops in total, to the maximum theoretical performance.

CPUs + GPUs

(Combined Performance)
  = (Total CPU performance) + (Total GPU performance + Tensor Core performance)
  = (151.96 TFlops) + (216.16 TFlops + 177.00 TFlops)
  = (545.12 TFlops)
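The GPU and combined totals follow the same pattern: nodes * GPUs per node * TFlops per GPU, plus the Tensor Core contribution, plus the CPU total. A minimal Python sketch of that arithmetic is shown here; the per-GPU figures and the CPU total are copied from this page, and the dictionary and function names are illustrative.

```python
# GPU model -> (nodes, GPUs per node, TFlops per GPU), copied from this page
GPU_ROWS = {
    "k80": (9, 2, 1.87),
    "v100": (1, 1, 7.0),
    "a100": (15, 1, 9.7),
    "h100": (1, 1, 30.0),
}

# Additional TFlops per GPU when Tensor Cores are leveraged
TENSOR_EXTRA_PER_GPU = {"a100": 9.8, "h100": 30.0}

# CPU total in TFlops, taken from the CPU performance section
CPU_TOTAL_TFLOPS = 151.96


def gpu_total(rows):
    """Sum nodes * GPUs per node * TFlops per GPU over all GPU models."""
    return sum(nodes * gpus * tflops for nodes, gpus, tflops in rows.values())


def tensor_total(rows, extra_per_gpu):
    """Sum the Tensor Core contribution for the models that have one."""
    return sum(
        rows[model][0] * rows[model][1] * extra
        for model, extra in extra_per_gpu.items()
    )


gpu_tflops = gpu_total(GPU_ROWS)
tensor_tflops = tensor_total(GPU_ROWS, TENSOR_EXTRA_PER_GPU)
combined_tflops = CPU_TOTAL_TFLOPS + gpu_tflops + tensor_tflops

print(f"GPU total:      {gpu_tflops:.2f} TFlops")
print(f"Tensor Cores:   {tensor_tflops:.2f} TFlops")
print(f"CPU + GPU peak: {combined_tflops:.2f} TFlops")
```

Keeping the inputs in one table makes it easy to update the totals when GPU nodes are added or retired.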

Usage

Premise is managed by USNH Research Computing Center staff. Please email all requests to: rcc.support@unh.edu. The focus of the Premise HPC cluster is to support USNH research. If you're seeking academic services, please contact the RCC for other options.

Utilize Premise for Research

Establish a Premise account

Create a Premise account by emailing USNH Research Computing Center staff at: rcc.support@unh.edu

Getting started

HPC software is most often field specific. You probably have a better idea of where to look for relevant software tools in your field than we do, but you are welcome to ask RCC what we might know.

If you bring your own source code or use common Linux tools, the tools you need may already exist on the cluster. Some software packages may be available as "modules" (more information available below).

Connecting to Premise

Premise is accessible via SSH, so you will need an SSH client to connect to it remotely. Further details will be provided during the Premise overview session.

Disk quotas on Premise

Upon creation, each PI and their group share 2TB of file storage. Additional storage can be purchased. Rates can be found on the RCC site. You can contact RCC via email at: rcc.support@unh.edu

At any time you can check your team's quota by typing:

mmlsquota --block-size auto -g <group> mmfs1
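If you check quotas from a script, the command above can be assembled programmatically. The Python sketch below is one hedged way to do that: the helper names and the group name "mylab" are hypothetical, the flags and the "mmfs1" file system name are copied from the command above, and the function that actually runs the command only works on a machine where mmlsquota is installed (for example, the Premise login node).

```python
import shlex
import subprocess

# File system name as it appears in the command above
FILESYSTEM = "mmfs1"


def build_quota_command(group, filesystem=FILESYSTEM):
    """Return the mmlsquota argument list for a group (hypothetical helper)."""
    if not group or any(ch.isspace() for ch in group):
        raise ValueError("group must be a non-empty name without whitespace")
    return ["mmlsquota", "--block-size", "auto", "-g", group, filesystem]


def show_quota(group):
    """Run the quota command and return its text output.

    Only meaningful where mmlsquota exists, e.g. on the Premise login node.
    """
    result = subprocess.run(
        build_quota_command(group),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "mylab" is a placeholder; substitute your own group name
command = build_quota_command("mylab")
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with unusual group names.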

How do I run my program?

The RCC provides every new Premise user with an overview session. This session takes around an hour and covers an overview of Premise, connecting to the cluster, interacting with Slurm, running software, transferring files, and other considerations specific to the user's research work. Further support is always available via email: rcc.support@unh.edu

Visualize Current Usage

RCC chose XDMoD to visualize Premise usage. The Premise XDMoD webpage may be viewed when on campus or using the VPN.

Premise Software Usage

For Premise software usage, please see here.