Software
Environment Modules
Environment modules provide a convenient way of using most of the installed software on the Mantis cluster. They modify your shell environment dynamically, allowing you to load and unload different software packages, compilers, and libraries without conflicts. They work by setting the necessary environment variables (such as PATH, LD_LIBRARY_PATH, and MANPATH).
Below is a list of commonly used commands, where <module> is the target module.
| Module Command | Description |
|---|---|
| module avail | Display all available modules. |
| module load <module> | Load a specific module into the environment. |
| module list | Show a list of currently loaded modules. |
| module unload <module> | Unload a specific module. |
| module purge | Unload all loaded modules. |
The Mantis cluster uses the Lmod implementation of environment modules.
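To make the mechanism concrete, the sketch below imitates by hand what loading and unloading a module does to PATH. The install prefix /opt/example-tool/1.0 is hypothetical, and real modules typically adjust several variables at once, so treat this as an illustration of the idea rather than a description of how Lmod is implemented.

```shell
# Hypothetical install prefix, used purely for illustration.
prefix=/opt/example-tool/1.0

# Remember the current value so the change can be undone later.
saved_path=$PATH

# Roughly what "module load" does: put the tool's bin directory first.
PATH="$prefix/bin:$PATH"
first_entry=${PATH%%:*}
echo "First PATH entry after load: $first_entry"

# Roughly what "module unload" does: restore the previous value.
PATH=$saved_path
echo "PATH restored: $([ "$PATH" = "$saved_path" ] && echo yes || echo no)"
```

Because the shell searches PATH from left to right, the prepended directory takes precedence over any other copy of the same program while the module is loaded.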
Compiling software yourself
Some software is easy to install. See, for example, psmc, a classic piece of software in population genetics.
To install it, run the following commands (the instructions are in the README.md file):
# clone the repository
git clone https://github.com/lh3/psmc.git
cd psmc
make; (cd utils; make)
This should create an executable binary. Some software is slightly more complicated, but still manageable by novice to intermediate users.
Some software, however, can be very difficult to install because of a complex chain of dependencies. In those cases we often rely on containerization with Singularity or use isolated environments created with Conda.
Singularity
Reproducing environments and managing dependencies is a difficult problem. Containerization is a powerful solution, allowing users to package software and its dependencies into a single, portable unit. Singularity is a container platform that is well-suited for HPC environments and can additionally use Docker images (Docker is the other major containerization platform and is unavailable on Mantis).
Singularity is in your PATH by default so it can be run without loading a module.
To get an existing container from a repository, you use the command singularity pull <URL>. For example:
singularity pull https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0
This will create the file fastqc:0.12.1--hdfd78af_0, a Singularity image (some image files will have the extension .sif). It contains the package fastqc.
To run a command (e.g. fastqc --help) inside the container, particularly when submitting SLURM batch scripts, you will typically use singularity exec <container> <command>:
singularity exec fastqc:0.12.1--hdfd78af_0 fastqc --help
You can also start a shell inside the container and explore interactively with singularity shell <container>:
singularity shell fastqc:0.12.1--hdfd78af_0
See here for a more in-depth tutorial.
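Since singularity exec is the form typically used in batch jobs, here is a minimal sketch of a SLURM script that wraps it. The job name, memory request, and the run_in_container helper are illustrative assumptions, not site requirements. The image is the fastqc file from the pull example and is assumed to sit in the directory the job is submitted from.

```shell
#!/bin/bash
#SBATCH --job-name fastqc_help
#SBATCH --qos general
#SBATCH --mem 2gb
#SBATCH --output %x_%j.out

# Hypothetical helper: run a command inside an image, failing early
# with a clear message if the image file is missing.
run_in_container() {
    local image=$1
    shift
    if [ ! -f "$image" ]; then
        echo "Container image not found: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}

# Image created earlier with "singularity pull"; record the exit status.
run_in_container "fastqc:0.12.1--hdfd78af_0" fastqc --help && status=0 || status=$?
echo "Container command finished with status $status"
```

Checking for the image file first gives a clearer error in the job's output file than the one produced when the path is simply wrong.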
Conda
Conda is a package and environment manager that allows users to install and run software in isolated environments. It is particularly useful for software with complicated dependencies that may span multiple languages.
When users install software with conda, the software and its dependencies are downloaded from one or more repositories, known as channels.
Some conda commands use a lot of CPU, which can negatively impact other users on the login nodes. Please refrain from running conda commands such as conda install, conda update, or conda create on login nodes. Instead, use an interactive session or a batch job to run these commands. Any long-running, resource-intensive processes running on login nodes may be terminated without notice.
Installing Conda
Conda is distributed in many ways (e.g. Anaconda, Miniconda), but we ask our users to install the Miniforge distribution.
Miniforge is a lightweight, minimalist installation of Conda. Importantly, Miniforge only uses the conda-forge channel by default. conda-forge is free and community-maintained. Other distributions are configured to use the defaults channel. While defaults is not protected by a paywall, it is operated by Anaconda Inc., a for-profit entity, and is not free to use.
Start interactive session
To get the Miniforge conda distribution, first start an interactive session.
srun --qos=general --mem=8gb --pty bash
Download Miniforge installer
wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
There is no need to substitute anything in the URL. The command as given will download the installation script.
Run the Miniforge Installation Script
bash Miniforge3.sh
- Agree to license terms (yes)
- Choose installation path (default is fine)
- Important: Choose whether to initialize conda (select “yes”)
Activate Conda
This will happen automatically when you start a new interactive shell session, but to load it into your current session, run:
source ~/.bashrc
~/.bashrc is only sourced within interactive sessions, such as when you are logged in to a login node or have an interactive job running. If you want to use Conda in non-interactive sessions (e.g. batch jobs), you will need to take an extra step. One option is to add source ~/.bashrc to your batch job scripts, which will recreate your typical environment on a compute node. Another option is to explicitly source the conda setup script in your batch job scripts with source ~/miniforge3/etc/profile.d/conda.sh (assuming you installed Miniforge to the default location of ~/miniforge3) or source <miniforge path>/etc/profile.d/conda.sh, substituting <miniforge path> with the appropriate path if you did not install to the default location. The second option can be desirable if you want more control over what is loaded in your batch job environment.
Base Environment
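As a sketch of the second option, a batch script might load conda explicitly as shown below. The job settings, the load_conda helper, and the environment name my_env are placeholders chosen for illustration. The only path taken from the text is the default Miniforge location.

```shell
#!/bin/bash
#SBATCH --job-name conda_job
#SBATCH --qos general
#SBATCH --mem 8gb
#SBATCH --output %x_%j.out

# Hypothetical helper: source conda's setup script from a given
# Miniforge install prefix, or report that it could not be found.
load_conda() {
    local prefix=$1
    local conda_sh="$prefix/etc/profile.d/conda.sh"
    if [ ! -f "$conda_sh" ]; then
        echo "conda setup script not found: $conda_sh" >&2
        return 1
    fi
    source "$conda_sh"
}

# Default install location; change this if Miniforge lives elsewhere.
if load_conda "$HOME/miniforge3"; then
    # "my_env" is a placeholder for one of your own environments.
    conda activate my_env
    conda list
fi
```

Sourcing only conda.sh keeps aliases, prompts, and other personal settings from ~/.bashrc out of the job, which can make batch runs easier to reproduce.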
By default, conda creates a “base” environment when it is installed. This environment contains the core conda packages and tools. It is generally recommended to avoid installing additional packages into the base environment to keep it clean and stable. You can prevent the base environment from being activated automatically when you start a new shell session by running the following command:
conda config --set auto_activate false
This only needs to be run one time. After running this command, the base environment will not be activated automatically in new shell sessions.
Using Conda
It is recommended to create separate environments for different projects or sets of packages to avoid conflicts and to avoid installing packages into the base environment. Below are the basic commands for creating environments, activating them, and installing packages.
Creating an Environment
Once Miniforge is installed, you can create environments with:
conda create -n <environment name>
Replace <environment name> with the desired name for your environment.
Confirm changes when prompted.
Activating & Deactivating an Environment
Once you have created an environment, you can activate it with:
conda activate <environment name>
Replace <environment name> with the name of the environment you want to activate.
Once activated you can install or use packages within that environment.
To see a list of all your environments, run:
conda env list
To deactivate the current environment and return to the previously active one (the base environment, if it is activated), run:
conda deactivate
Installing Packages
In order to install packages into a Conda environment, you first need to activate the environment (see above). Once activated, packages can be installed with:
conda install <package name>
Some packages may require specifying a channel with the -c <channel name> argument. For example, to install a package from the bioconda channel, you would run:
conda install -c bioconda <package name>
To see a list of installed packages in the current environment, run:
conda list
Please avoid using the defaults channel, or any channel that is not free. conda-forge and bioconda are free, community-maintained channels that should fit the needs of most users.
Mamba
The mamba command, a faster drop-in replacement for conda, will also be installed with Miniforge. We encourage the use of mamba in place of any conda commands. No changes to the syntax are necessary; simply replace conda with mamba.
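The commands from this section can be combined into a small helper, sketched below. It uses mamba when it is on the PATH and falls back to conda otherwise. The function names pick_frontend and make_env are hypothetical, and the helper installs with -n <environment name> rather than activating first, which is an alternative to the activate-then-install steps described above. Run it in an interactive session or batch job, not on a login node.

```shell
# Hypothetical helper: print "mamba" if available, otherwise "conda".
pick_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    elif command -v conda >/dev/null 2>&1; then
        echo conda
    else
        return 1
    fi
}

# Hypothetical helper: create an environment and install one package
# into it from the conda-forge and bioconda channels.
# Usage: make_env <environment name> <package name>
make_env() {
    local env_name=$1 package=$2 frontend
    frontend=$(pick_frontend) || {
        echo "Neither mamba nor conda was found on the PATH." >&2
        return 1
    }
    "$frontend" create -y -n "$env_name" &&
        "$frontend" install -y -n "$env_name" -c conda-forge -c bioconda "$package"
}

# Report which front end would be used; call make_env yourself when ready.
echo "Front end: $(pick_frontend || echo none)"
```

For example, make_env qc_env fastqc would create an environment named qc_env and install fastqc into it. Both names are arbitrary.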
Jupyter
Jupyter notebooks provide an interactive environment for data analysis and visualization. You can run Jupyter notebooks on the Mantis cluster using a SLURM job to launch a Jupyter server and then connect to it from your local machine via SSH tunneling.
Installation
We recommend installing Jupyter using Conda (see above). Once Conda is installed, you can install Jupyter into a Conda environment with:
conda activate <environment name>
conda install jupyter
Launching a Jupyter Server
To launch a Jupyter server on Mantis, create a SLURM batch script (e.g., jupyter.sh) with the following content. It can be copied and pasted without modification unless you want to change the resource requests by editing the #SBATCH directives:
#!/bin/bash
#SBATCH --job-name jupyter
#SBATCH --qos general
#SBATCH --cpus-per-task 1
#SBATCH --mem 8gb
#SBATCH --time 0-6:00:00
#SBATCH --output %x_%j.out
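# Name of the Conda environment that contains Jupyter (first script argument)
conda_env=${1}
# Pick a random port and a random access token for this session
port=$(shuf -i8000-9999 -n1)
user=$(whoami)
token=$(openssl rand -hex 16)
# Make conda available in this non-interactive job, then activate the environment
source ~/.bashrc
conda activate "$conda_env"
# Write the connection details to a file in the job's working directory
echo -e "
# Execute in local terminal to create SSH tunnel:
ssh -N -L $port:localhost:$port $user@$(hostname -s).cam.uchc.edu
# URL to enter into your browser:
http://localhost:$port/?token=$token
" > jupyter_connections.txt
# Start the Jupyter server on the compute node
jupyter lab --no-browser --port=$port --ip=localhost --IdentityProvider.token=$token
Submit the job with: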
sbatch jupyter.sh <environment name>
Replace <environment name> with the name of the Conda environment where Jupyter is installed.
Connecting to the Jupyter Server
After the job starts, look in the jupyter_connections.txt file for connection details. In order to connect:
- Start the CAM VPN using Pulse Secure if you are not connected to the UCHC Secure WiFi network. Note that you will need to use the mantis-submit.cam.uchc.edu host if you want to have a simultaneous terminal session on Mantis while connected to the CAM VPN.
- Open a terminal on your local machine and run the command shown in the jupyter_connections.txt file to create an SSH tunnel.
- Open a web browser and navigate to the URL provided in the jupyter_connections.txt file.
Please remember to properly close your Jupyter server and terminate the SLURM job when you are finished to free up resources on the cluster. Intentionally leaving Jupyter servers running for extended periods when they are not being used is not acceptable, and users will receive warnings for doing so.
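One way to check for and stop a leftover server is sketched below, assuming the job was submitted with the job name jupyter as in the batch script above. squeue and scancel are standard SLURM commands. The helper function names are made up for this example.

```shell
# Hypothetical helpers built on the standard SLURM commands.

# List your own jobs that carry the given job name.
list_jobs_by_name() {
    command -v squeue >/dev/null 2>&1 || {
        echo "squeue not found; run this on the cluster." >&2
        return 1
    }
    squeue --user="$(whoami)" --name="$1"
}

# Cancel your own jobs that carry the given job name.
cancel_jobs_by_name() {
    command -v scancel >/dev/null 2>&1 || {
        echo "scancel not found; run this on the cluster." >&2
        return 1
    }
    scancel --user="$(whoami)" --name="$1"
}

# Show any Jupyter jobs that are still queued or running.
list_jobs_by_name jupyter || true
```

If the listing shows a job you no longer need, cancel_jobs_by_name jupyter ends it. To stop a single job instead, pass the job ID from the squeue output to scancel.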
Request Software Installation
Users can request the installation of software by submitting a software request. We can create global environment modules, help you compile something in your home directory, or, if needed, help you get a Singularity container running or set up Conda.
Commercial Software
Together with the HPC admins we help manage the use of software for which we have commercial licenses.
IPA
Ingenuity Pathway Analysis (IPA) from Qiagen allows you to “quickly visualize and understand complex ’omics data and perform insightful data analysis and interpretation by placing your experimental results within the context of biological systems”.
Request an account here.
Log in here
Geneious
Geneious is a software package which allows you to run many common bioinformatic workflows in a graphical user interface. It runs on your local machine, not Mantis. Some features:
- Reference mapping of sequencing data.
- Alignment and phylogeny estimation.
- Visualization of the data and analysis.
- Searching NCBI databases.
We have a floating license with 10 seats (10 users can simultaneously run it).
How to Access Geneious
Request an account here.
Once the request has been approved, connect to the UCHC VPN using Pulse Secure.
Use the following URL to download the client for your computer:
http://geneious.cam.uchc.edu:8080/GeneiousServer/clients.jsp
Select “Return to Home Page” and then select the “Download the Geneious Server bundled plugin file” link on the home page to download the correct plugins.
Installation and Login
Double-click the Geneious Prime installation file to open it and follow the prompts to accept the license agreement and download.
Leave the defaults for the Select Destination Directory, Select Start Menu Folder, and Select File Associations windows, clicking “Next >” after each one.
Click “Finish” to complete the installation.
Open the Geneious Prime application and click “Activate a License” on the pop-up message.
Click “License server”, enter geneious.cam.uchc.edu next to Server, and leave the port as default. Enter your email address and click “OK”.
You will get a notification that the license has been obtained from the FLEXnet server.
Install the plugins by double-clicking the GeneiousServerbundle.gplugin file.
Exit Geneious once installation is complete, then re-open it to begin analyzing data.
Virtual Machines
If you need to run something that requires a virtual machine, you can request one here.