Small variant detection

In this workshop, we teach students the basics of detecting short germline variants (shorter than approximately 50 bp), such as single-nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs), against a reference genome. The topics we cover are listed in the schedule below.
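The size-based definition of a small variant can be made concrete with a short example. This minimal Python sketch labels a biallelic REF/ALT pair as it would appear in a VCF record. The function name `classify_variant`, the `MNP` and `large` labels, and the hard 50 bp cutoff are illustrative assumptions, not part of the workshop materials. Real callers and tools draw these boundaries in their own ways.

```python
# Approximate small-variant size limit taken from the text (assumption: a hard cutoff).
SMALL_VARIANT_MAX_LEN = 50


def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic REF/ALT pair written as in a VCF record.

    In VCF, indels carry one shared anchor base, so REF="A", ALT="ATT"
    is a 2 bp insertion: the size is the difference in allele lengths.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        return "MNP"  # equal-length substitution spanning several bases
    if length_change < SMALL_VARIANT_MAX_LEN:
        return "INDEL"
    return "large"  # outside the small-variant range (structural-variant territory)


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATT"), ("CTG", "C"), ("AC", "GT")]:
        print(ref, alt, classify_variant(ref, alt))
```

Splitting on allele-length difference mirrors the SNP/INDEL distinction in the text. The explicit `MNP` case keeps equal-length multi-base substitutions from being mislabeled as indels.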

This workshop focuses on whole-genome sequencing (WGS) data, but many of the concepts apply equally well to exome sequencing and other probe-capture methods. It does not cover somatic variants, structural variants, or pangenomes, nor does it go into detail about the laboratory methods themselves. It also covers little downstream analysis of genotype data, because our participants' research aims are extremely diverse.

See this site for our workshop schedule and registration link.

See this page for our general short-form workshop approach.

Schedule

  • Prior to the workshop
  • Day 1
    • Intro to genetic variation and variant detection.
    • Initial data QC steps.
    • Exploring alignment data in IGV.
  • Day 2
    • Introduction to variant callers and models.
    • Variant detection with GATK and Freebayes.
    • Introduction to post-processing variant calls.
  • Day 3
    • Variant QC
      • Filtering variant call sets.
      • Comparing variant call sets.
    • Annotating variant call sets.
    • Strategies for parallelizing variant calling and genotyping.

Data

We work through ddRAD-seq data from start to finish, using an unpublished study of the population genetics of Arctic grayling (a salmonid fish). The dataset contains more than 500 samples from 28 populations, which gives us the opportunity to demonstrate several ways to speed up analysis through parallelization on an HPC cluster.
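One common way to parallelize calling across many samples is to split the reference into fixed-size regions, run one job per region, and then merge the per-region call sets. The sketch below assumes you already have contig lengths (for example, from a FASTA index). It builds 1-based, inclusive `contig:start-end` region strings of the form GATK's `-L` option accepts. The function `make_regions` and the contig lengths are hypothetical, and this illustrates one strategy rather than the workshop's own pipeline. Check your caller's documentation for its region syntax and coordinate convention, because not every tool uses 1-based, inclusive coordinates.

```python
from typing import Dict, List


def make_regions(contig_lengths: Dict[str, int], chunk_size: int) -> List[str]:
    """Split each contig into consecutive 1-based, inclusive 'contig:start-end' regions.

    The last region on a contig is shorter when the contig length is not a
    multiple of chunk_size; regions never cross contig boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            regions.append(f"{contig}:{start}-{end}")
            start = end + 1
    return regions


if __name__ == "__main__":
    # Hypothetical contig lengths; real values would come from the reference index.
    contigs = {"chr1": 250, "chr2": 120}
    for region in make_regions(contigs, 100):
        print(region)  # each string could be handed to a separate HPC job
```

Fixed-size chunks keep the per-job workload roughly even. For sparse ddRAD loci, you might instead group regions by the number of loci they contain, so that no single job is much slower than the rest.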