Tools for Single Cell Analysis
Kerry Cobb
Objectives & Learning Goals
Brief overview of tools used in single cell analysis
Python
Versatile and popular object-oriented programming language
Expressive & readable syntax
Extensive library ecosystem
Heavily used in data science and machine learning
R
Language created for statistical computing
Extensive library ecosystem for statistical analysis
Heavily used in bioinformatics and statistics
Excellent for data visualization
Python vs R
R has a lead over Python in existing statistical analysis libraries
Python is more popular for machine learning
Python is more versatile for general programming tasks
Python scales better to larger datasets
Python vs R
Python packages for single cell analysis are growing at a higher rate than R packages
Zappia & Theis 2021
Python vs R
Recommendation:
Get comfortable with both
R will stay relevant for a long time
Python's popularity is growing rapidly
Particularly in machine learning
Challenge:
Different syntax
It's not actually that different
Different package ecosystems
We have a solution for that
Switching between languages
We have a solution for that too!
Mamba
A cross-platform package manager developed for Python
Can install non-Python packages as well
Reimplementation of conda
Uses the same package repositories as conda
Much faster
Create isolated environments
Install packages
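The two steps above might look like the following at the command line. This is a minimal sketch, not the course's actual setup: the environment name `sc-analysis`, the Python version, and the package list are illustrative assumptions. The commands are wrapped in functions so that nothing is created or installed until you call them.

```shell
# Hypothetical environment name, chosen for illustration only
ENV_NAME="sc-analysis"

# Create an isolated environment containing Python
# (the version pin is an assumption; use whatever your project needs)
create_env() {
    mamba create -y -n "$ENV_NAME" -c conda-forge python=3.11
}

# Install packages into that environment without activating it
# (the package list is an example, not a requirement of the course)
install_packages() {
    mamba install -y -n "$ENV_NAME" -c conda-forge scanpy anndata jupyterlab
}
```

Calling `create_env` and then `install_packages` builds the environment. Afterwards, `mamba activate sc-analysis` switches into it, assuming your shell has been initialized for mamba.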
Jupyter
Interactive notebooks for data analysis
Can run Python, R, and many other languages
Supports rich media output
Great for exploratory data analysis and visualization
Start Jupyter Notebook
sbatch jupyter.sh <mamba env>
anndata