Software

Environment Modules

Environment modules provide a convenient way to use most of the software installed on the Mantis cluster. They modify your shell environment dynamically, allowing you to load and unload different software packages, compilers, and libraries without conflicts. They work by setting the necessary environment variables (such as PATH, LD_LIBRARY_PATH, and MANPATH).

Below is a list of commonly used commands, where <module> is the target module.

Command                   Description
module avail              Display all available modules.
module load <module>      Load a specific module into your environment.
module list               Show the list of currently loaded modules.
module unload <module>    Unload a specific module.
module purge              Unload all loaded modules.

The Mantis cluster uses the Lmod implementation of environment modules.
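In practice these commands are often combined at the top of a batch script so that a job starts from a known, clean environment. The sketch below wraps that pattern in a small helper. The function name load_job_modules and the module names in the example line are hypothetical; run module avail to find the real names and versions on Mantis.

```bash
#!/bin/bash
# Hypothetical helper: start from a clean environment, then load each module given.
load_job_modules() {
    # Under Lmod, `module` is a shell function; stop early if it is not defined.
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; is Lmod initialized in this shell?" >&2
        return 1
    fi
    module purge                       # drop anything inherited from the login shell
    local m
    for m in "$@"; do
        module load "$m" || return 1   # stop at the first module that fails to load
    done
    module list                        # record the loaded modules in the job output
}

# Example (module names are placeholders, not actual Mantis modules):
# load_job_modules gcc/12.2.0 samtools/1.17
```

Calling module list at the end is optional, but it leaves a record in the job's output file of exactly which versions the job used.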

Compiling software yourself

Some software is easy to install. See, for example, psmc, a classic piece of software in population genetics.

To install it, all you need to do is run the following commands (the instructions are in the README.md file):

# clone the repository
git clone https://github.com/lh3/psmc.git
cd psmc
make; (cd utils; make)

This should create an executable binary. Some software is slightly more complicated to build, but still manageable for novice to intermediate users.
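Once psmc is built, the binary lives inside the cloned repository, so you either call it by its full path or add that directory to your PATH. A minimal sketch, assuming the repository was cloned into $HOME/psmc (adjust the path if you cloned it elsewhere); add_tool_dir is a hypothetical helper name, not part of psmc:

```bash
#!/bin/bash
# Hypothetical helper: prepend a directory of self-compiled tools to PATH,
# refusing missing directories and skipping ones already on PATH.
add_tool_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$dir:"*) ;;                       # already on PATH; leave it alone
        *) export PATH="$dir:$PATH" ;;
    esac
}

# Assumed clone location; the main binary and the utils/ directory are both added.
# add_tool_dir "$HOME/psmc"
# add_tool_dir "$HOME/psmc/utils"
# command -v psmc    # should now print the path to the binary
```

The change only lasts for the current shell; to make it permanent, put the function and the calls (or an equivalent export PATH=... line) in your ~/.bashrc.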

Other software, however, can be very difficult to install because of a complex chain of dependencies. In such cases we often rely on containerization with Singularity or on isolated environments created with Conda.

Singularity

Reproducing environments and managing dependencies is a difficult problem. Containerization is a powerful solution: it lets users package software and its dependencies into a single, portable unit. Singularity is a container platform that is well suited to HPC environments and can also use Docker images (Docker is the other major containerization platform; it is not available on Mantis).

Singularity is in your PATH by default so it can be run without loading a module.

To get an existing container from a repository, you use the command singularity pull <URL>. For example:

singularity pull https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0

This creates the file fastqc:0.12.1--hdfd78af_0, a Singularity image (some image files have the extension .sif). It contains the fastqc package.

To run a command (e.g. fastqc --help) inside the container, particularly in SLURM batch scripts, you will typically use singularity exec <container> <command>:

singularity exec fastqc:0.12.1--hdfd78af_0 fastqc --help
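Putting singularity exec into a SLURM batch script might look like the sketch below. It assumes the image pulled above sits in the directory you submit from and that FASTQ files are passed as arguments; the job name, resource requests, the fastqc_results directory, and the run_fastqc function are illustrative choices, not requirements. FastQC's --threads and --outdir options are used to match the CPU request and collect the reports in one place.

```bash
#!/bin/bash
#SBATCH --job-name fastqc
#SBATCH --qos general
#SBATCH --cpus-per-task 2
#SBATCH --mem 4gb
#SBATCH --output %x_%j.out

# Image file created by `singularity pull` (assumed to be in the submission directory)
image="fastqc:0.12.1--hdfd78af_0"

# Run FastQC inside the container on every file passed as an argument
run_fastqc() {
    mkdir -p fastqc_results
    singularity exec "$image" \
        fastqc --threads "${SLURM_CPUS_PER_TASK:-1}" --outdir fastqc_results "$@"
}

# Only run when the script is given input files
if [ "$#" -gt 0 ]; then
    run_fastqc "$@"
fi
```

Submit it with, for example, sbatch fastqc_job.sh sample_R1.fastq.gz sample_R2.fastq.gz (the script and file names are placeholders).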

You can also start up a shell inside the container and explore interactively with singularity shell <container>.

singularity shell fastqc:0.12.1--hdfd78af_0

See here for a more in-depth tutorial.

Conda

Conda is a package and environment manager that allows users to install and run software in isolated environments. It is particularly useful for software with complicated dependencies that may span multiple languages.

When users install software with Conda, the software and its dependencies are downloaded from one or more repositories, known as channels.

Warning

Some conda commands use a lot of CPU, which can negatively impact other users on the login nodes. Please refrain from running conda commands such as conda install, conda update, or conda create on login nodes. Instead, use an interactive session or a batch job to run these commands. Long-running, resource-intensive processes on login nodes may be terminated without notice.

Installing Conda

Conda is distributed in many ways (e.g. Anaconda, Miniconda), but we ask our users to install the Miniforge distribution.

Miniforge is a lightweight, minimal installation of Conda. Importantly, Miniforge uses only the conda-forge channel by default; conda-forge is free and community-maintained. Other distributions are configured to use the defaults channel. Although defaults is not behind a paywall, it is operated by Anaconda Inc., a for-profit company, and is not free to use.

Start interactive session

To get the Miniforge distribution of Conda, first start an interactive session:

srun --qos=general --mem=8gb --pty bash

Download miniforge installer

wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"

There is no need to substitute anything in the URL. The command as given will download the installation script.
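The two $(uname ...) substitutions are what make the command work unchanged: uname reports the operating system and uname -m the CPU architecture, and together they select the matching installer from the latest Miniforge release. The snippet below only prints the URL the command resolves to, so you can see which file will be downloaded; on a typical Linux x86_64 machine it ends in Miniforge3-Linux-x86_64.sh.

```bash
#!/bin/bash
# Reconstruct the download URL used by the wget command above.
os=$(uname)        # kernel name, e.g. "Linux"
arch=$(uname -m)   # hardware architecture, e.g. "x86_64" or "aarch64"
url="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-${os}-${arch}.sh"
echo "$url"
```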

Run the Miniforge Installation Script

bash Miniforge3.sh
  1. Agree to the license terms (yes)
  2. Choose the installation path (the default is fine)
  3. Important: when asked whether to initialize conda, select “yes”

Activate Conda

This will happen automatically when you start a new interactive shell session, but to load it into your current session, run:

source ~/.bashrc
Note

~/.bashrc is only sourced in interactive sessions, such as when you are logged in to a login node or have an interactive job running. To use Conda in non-interactive sessions (e.g. batch jobs), you need to take an extra step. One option is to add source ~/.bashrc to your batch job scripts, which recreates your usual environment on the compute node. Another option is to source the Conda setup script explicitly in your batch job scripts with source ~/miniforge3/etc/profile.d/conda.sh (assuming you installed Miniforge to the default location, ~/miniforge3), or with source <miniforge path>/etc/profile.d/conda.sh, substituting <miniforge path> with the appropriate path if you installed it elsewhere. The second option can be preferable if you want more control over what is loaded in your batch job environment.
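As a sketch of the second option, the batch script below sources conda.sh and activates an environment whose name is passed as the first argument. It assumes Miniforge is in the default ~/miniforge3; since sbatch passes your submission environment to the job by default, you can point it elsewhere by setting MINIFORGE_DIR when you submit. The job name, resources, script name, and the activate_env function are illustrative.

```bash
#!/bin/bash
#SBATCH --job-name conda_job
#SBATCH --qos general
#SBATCH --mem 4gb
#SBATCH --output %x_%j.out

# Assumed Miniforge location (the installer's default); override by setting
# MINIFORGE_DIR when running sbatch.
MINIFORGE_DIR="${MINIFORGE_DIR:-$HOME/miniforge3}"

# Hypothetical helper: load Conda's shell function and activate one environment.
activate_env() {
    local conda_sh="$MINIFORGE_DIR/etc/profile.d/conda.sh"
    if [ ! -f "$conda_sh" ]; then
        echo "conda.sh not found at $conda_sh" >&2
        return 1
    fi
    source "$conda_sh"      # defines `conda` without reading the rest of ~/.bashrc
    conda activate "$1"
}

if [ "$#" -eq 0 ]; then
    echo "usage: sbatch conda_job.sh <environment name>" >&2
else
    activate_env "$1" || exit 1
    echo "active environment: $CONDA_DEFAULT_ENV"
    # Replace this with your own commands; they now see the environment's software.
fi
```

Submit it with sbatch conda_job.sh <environment name>, or MINIFORGE_DIR=/path/to/miniforge3 sbatch conda_job.sh <environment name> for a non-default installation.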

Base Environment

By default, conda creates a “base” environment when it is installed. This environment contains the core conda packages and tools. It is generally recommended to avoid installing additional packages into the base environment to keep it clean and stable. You can prevent the base environment from being activated automatically when you start a new shell session by running the following command:

conda config --set auto_activate false

This only needs to be run one time. After running this command, the base environment will not be activated automatically in new shell sessions.

Using Conda

It is recommended to create separate environments for different projects or sets of packages to avoid conflicts and to avoid installing packages into the base environment. Below are the basic commands for creating environments, activating them, and installing packages.

Creating an Environment

Once Miniforge is installed, you can create environments with:

conda create -n <environment name> 

Replace <environment name> with the desired name for your environment.

Confirm changes when prompted.

Activating & Deactivating an Environment

Once you have created an environment, you can activate it with:

conda activate <environment name>

Replace <environment name> with the name of the environment you want to activate.

Once the environment is activated, you can install or use packages within it.

To see a list of all your environments, run:

conda env list

To deactivate the current environment and return to the previously active one (or to your plain shell if no other environment was active), run:

conda deactivate

Installing Packages

In order to install packages into a Conda environment, you first need to activate the environment (see above). Once activated, packages can be installed with:

conda install <package name>

Some packages may require specifying a channel with the -c <channel name> argument. For example, to install a package from the bioconda channel, you would run:

conda install -c bioconda <package name>

To see a list of installed packages in the current environment, run:

conda list
Warning

Please avoid using the defaults channel or any other channel that is not free. conda-forge and bioconda are free, community-maintained channels that should meet the needs of most users.

Mamba

The mamba command, a faster drop-in replacement for conda, is also installed with Miniforge. We encourage using mamba in place of conda for any of the commands above. No changes to the syntax are necessary; simply replace conda with mamba.
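For example, creating an environment and installing packages into it can be done with a single mamba create command, as sketched below. The helper name create_env and the example environment and package names are hypothetical. The -y flag skips the confirmation prompt, and channels given with -c are searched in the order listed, so conda-forge takes priority over bioconda.

```bash
#!/bin/bash
# Hypothetical helper: create an environment and install packages in one step.
create_env() {
    if [ -z "${1:-}" ]; then
        echo "usage: create_env <environment name> [package ...]" >&2
        return 1
    fi
    local name="$1"
    shift
    # -y: no confirmation prompt; channels are searched in the order given
    mamba create -y -n "$name" -c conda-forge -c bioconda "$@"
}

# Example (run from an interactive session or batch job, not a login node):
# create_env qc fastqc multiqc
# conda activate qc
```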

Jupyter

Jupyter notebooks provide an interactive environment for data analysis and visualization. You can run Jupyter notebooks on the Mantis cluster using a SLURM job to launch a Jupyter server and then connect to it from your local machine via SSH tunneling.

Installation

We recommend installing Jupyter using Conda (see above). Once Conda is installed, you can install Jupyter into a Conda environment with:

conda activate <environment name>
conda install jupyter

Launching a Jupyter Server

To launch a Jupyter server on Mantis, create a SLURM batch script (e.g., jupyter.sh) with the following content. It can be copied and pasted without modification; change the #SBATCH directives only if you want different resource requests:

#!/bin/bash
#SBATCH --job-name jupyter
#SBATCH --qos general 
#SBATCH --cpus-per-task 1
#SBATCH --mem 8gb
#SBATCH --time 0-6:00:00
#SBATCH --output %x_%j.out

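# Conda environment with Jupyter installed, passed as the first argument to sbatch
conda_env=${1:?usage: sbatch jupyter.sh <environment name>}

# Pick a random port and generate a one-time login token
port=$(shuf -i8000-9999 -n1)
user=$(whoami)
token=$(openssl rand -hex 16)

# Batch jobs do not read ~/.bashrc on their own, so load Conda explicitly
source ~/.bashrc
conda activate "$conda_env"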

echo -e "

# Execute in local terminal to create SSH tunnel:
ssh -N -L $port:localhost:$port $user@$(hostname -s).cam.uchc.edu

# URL to enter into your browser:
http://localhost:$port/?token=$token

" > jupyter_connection.txt

jupyter lab --no-browser --port=$port --ip=localhost --IdentityProvider.token=$token

Submit the job with:

sbatch jupyter.sh <environment name>

Replace <environment name> with the name of the Conda environment where Jupyter is installed.

Connecting to the Jupyter Server

After the job starts, look in the jupyter_connection.txt file (written to the directory you submitted the job from) for connection details. To connect:

  1. Start the CAM VPN using Pulse Secure if you are not connected to the UCHC Secure WiFi network. Note that you will need to use the mantis-submit.cam.uchc.edu host if you want a simultaneous terminal session on Mantis while connected to the CAM VPN.
  2. Open a terminal on your local machine and run the command shown in the jupyter_connection.txt file to create an SSH tunnel.
  3. Open a web browser and navigate to the URL provided in the jupyter_connection.txt file.
Warning

Please remember to close your Jupyter server properly and terminate the SLURM job when you are finished, to free up resources on the cluster. Intentionally leaving Jupyter servers running for extended periods when they are not being used is not acceptable, and users will receive warnings for doing so.
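One way to find and stop the job from a terminal on Mantis is sketched below. It relies on the job name jupyter set by the #SBATCH --job-name line in the script above; stop_jupyter is a hypothetical helper name, and it cancels every job with that name that belongs to you.

```bash
#!/bin/bash
# Hypothetical helper: cancel all of your SLURM jobs named "jupyter".
stop_jupyter() {
    local ids
    # -h: no header, -u: your jobs only, -n: filter by job name, -o %i: job IDs only
    ids=$(squeue -h -u "${USER:-$(whoami)}" -n jupyter -o %i)
    if [ -z "$ids" ]; then
        echo "no jupyter jobs found"
        return 0
    fi
    echo "cancelling job(s):" $ids
    scancel $ids    # unquoted on purpose: one argument per job ID
}
```

If you already know the job ID (shown by squeue -u $USER), scancel <job id> cancels that single job.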

Request Software Installation

Users can request the installation of software by submitting a software request. We can create global environment modules, help you compile software in your home directory, or, if needed, help you get a Singularity container running or set up Conda.

Commercial Software

Together with the HPC admins, we help manage the use of software for which we have commercial licenses.

IPA

Ingenuity Pathway Analysis (IPA) from Qiagen allows you to “quickly visualize and understand complex ’omics data and perform insightful data analysis and interpretation by placing your experimental results within the context of biological systems”.

Request an account here.

Log in here

Some training materials

Geneious

Geneious is a software package which allows you to run many common bioinformatic workflows in a graphical user interface. It runs on your local machine, not Mantis. Some features:

  • Reference mapping of sequencing data.
  • Alignment and phylogeny estimation.
  • Visualization of the data and analysis.
  • Searching NCBI databases.

We have a floating license with 10 seats (10 users can simultaneously run it).

How to Access Geneious

  1. Request an account here.

  2. Once the request has been approved, connect to the UCHC VPN using Pulse Secure:

    UConn Health CAM VPN Tutorial

  3. Use the following URL to download the client for your computer:
    http://geneious.cam.uchc.edu:8080/GeneiousServer/clients.jsp

  4. Select “Return to Home Page” and select the “Download the Geneious Server bundled plugin file” link on the home page to download the correct plugins.

Installation and Login

  1. Double-click the Geneious Prime installation file to open it, and follow the prompts to accept the license agreement and download.

  2. Leave the defaults for the Select Destination Directory, Select Start Menu Folder, and Select File Associations windows, clicking “Next >” after each one.

  3. Click “Finish” to complete the installation.

  4. Open the Geneious Prime application and click “Activate a License” on the pop-up message.

  5. Click “License server”, enter geneious.cam.uchc.edu next to Server, and leave the port as the default. Enter your email address and click “OK”.

  6. You will get a notification that the license has been obtained from the FLEXnet server.

  7. Install the plugins by double-clicking the GeneiousServerbundle.gplugin file.

  8. Exit Geneious once installation is complete, then reopen it to begin analyzing data.

Virtual Machines

If you need to run something that requires a virtual machine, you can request one here.