Variant calling and quality filters

What is the call status of a variant?

Variant calling is the process by which a software program (the variant caller) identifies variants from sequence data. There are different quality metrics associated with each variant which can be used in subsequent steps of the pipeline to assign it a call status. The call status of a variant can be:

  • PASS: all the quality metrics are above the thresholds (i.e. the variant has passed all quality filters).
  • FAIL: the variant has not passed all quality filters.

Quality filters

The quality filters used for germline and somatic analyses are different since we use different variant callers.

Quality filters in germline analyses

The variant caller algorithm used in germline analyses is Sentieon's DNAscope. We apply the following quality filters after the variant calling step:

  • Coverage: number of reads aligned against the variant position. The coverage threshold is different depending on the chemistry of the assay:
    • WGS samples or capture-based sequencing samples (e.g. exome, gene panels): all variants with a coverage lower than 8 reads will be considered as FAIL.
    • Amplicon samples: all variants with a coverage lower than 100 reads will be considered as FAIL.
  • Quality: the quality score is an internal score calculated by the variant caller algorithm. It can be used to estimate how confident we are that the variant caller has correctly identified a variation in a given genomic position.
    • Single sample analyses: variants with a QUAL lower than 100 are assigned a FAIL call status. The QUAL is the Phred-scaled probability that a REF/ALT polymorphism exists at this site given the sequencing data.
    • Multisample analyses (couple, family trio or generic multisample): we use the GQ (genotype quality), which represents the Phred-scaled confidence that the genotype assignment (GT) is correct. All variants with a GQ lower than 20 will be marked as FAIL. Please bear in mind that the GQ is assigned per sample. For example, a variant called in a trio analysis will have three different GQs, one per sample. The variant might have a GQ below the threshold in one of the samples and a GQ above it in the others. In that case, the variant will be marked as "Failed/Not genotyped" in the sample where it had a low GQ and PASS in the others.
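
The germline rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not the pipeline's actual code: the function name, the assay labels ("wgs", "capture", "amplicon") and the choice to check coverage per sample in multisample analyses are assumptions. The sketch also returns a plain FAIL where the platform displays "Failed/Not genotyped".

```python
from typing import Optional

# Thresholds quoted in the text above; the dictionary keys are hypothetical labels.
MIN_COVERAGE = {"wgs": 8, "capture": 8, "amplicon": 100}
MIN_QUAL = 100  # single sample analyses
MIN_GQ = 20     # multisample analyses, evaluated per sample


def germline_call_status(coverage: int, assay: str,
                         qual: Optional[float] = None,
                         gq: Optional[float] = None) -> str:
    """Return "PASS" or "FAIL" for one variant in one sample.

    Pass `qual` for a single sample analysis or `gq` for a multisample one.
    """
    if (qual is None) == (gq is None):
        raise ValueError("provide exactly one of qual or gq")
    if coverage < MIN_COVERAGE[assay]:
        return "FAIL"
    if gq is not None:
        return "PASS" if gq >= MIN_GQ else "FAIL"
    return "PASS" if qual >= MIN_QUAL else "FAIL"


# Trio example: the same variant gets one status per sample.
trio_gq = {"proband": 45, "mother": 12, "father": 60}
trio_status = {sample: germline_call_status(30, "wgs", gq=gq)
               for sample, gq in trio_gq.items()}
# trio_status == {"proband": "PASS", "mother": "FAIL", "father": "PASS"}
```

In the trio example, the mother's GQ of 12 is below the threshold of 20, so the variant fails in her sample only.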

Quality filters in somatic analyses

We use Sentieon’s TNhaplotyper2 as the variant caller. TNhaplotyper2 is designed to behave like GATK’s Mutect2 and, like Mutect2, has associated filtering tools which are applied to the variants found by the caller. These filters are then used to decide whether a variant should be marked as PASS or FAIL. If a variant fails any of the filters in the “FAIL” column of the table below, it will be marked as FAIL. Failing a filter in the “PASS” column does not, on its own, cause the variant to be marked as FAIL.

PASS             | FAIL
clustered_events | map_qual
duplicate        | base_qual
fragment         | contamination
multiallelic     | weak_evidence
n_ratio          | low_allele_frac
orientation      | normal_artifact
position         | panel_of_normals
slippage         | strand_bias
haplotype        |
germline         |
strict_strand    |

Somatic VCF filters that can be present on variants with a PASS call status:

  • clustered_events: multiple events are present on the same haplotype as the variant, which is indicative of a false-positive call.
  • duplicate: the alternate allele is overrepresented by apparent sequencing duplicates.
  • fragment: a large difference is observed in the median fragment length for reads supporting the reference and alternate alleles.
  • multiallelic: the mutation occurs at a multiallelic site.
  • n_ratio: too many 'N' bases at the site.
  • orientation: the variant is likely an artifact due to orientation bias.
  • position: the allele is close to the ends of the reads.
  • slippage: the variant is likely an artifact due to polymerase slippage.
  • haplotype: variant is on the same haplotype as other filtered variants.
  • germline: there is evidence that the variant is germline.
  • strict_strand: evidence for the alternate allele is not significant in both directions.

Somatic VCF filters that cause a variant to be marked as FAIL:

  • map_qual: the median mapping quality of reads supporting the alternate allele is too low.
  • base_qual: the median base quality of bases supporting the alternate allele is too low.
  • contamination: the alternate allele is present due to contamination.
  • weak_evidence: the mutation does not have significant support above noise.
  • low_allele_frac: the variant allele fraction is below the threshold.
  • normal_artifact: the variant is likely an artifact in the normal sample.
  • panel_of_normals: the site is present in the panel of normals.
  • strand_bias: evidence for the alternate allele comes from only one read direction.
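
As an illustration of the rule described in this section, the following Python sketch derives a somatic call status from the FILTER column of a VCF record. It assumes the filter names are stored as a semicolon-separated string, as in standard VCF files. The function name is hypothetical, and treating "." (no filters applied) as PASS is an assumption rather than documented behaviour.

```python
# Filters that cause a FAIL call status, as listed above.
FAIL_FILTERS = frozenset({
    "map_qual", "base_qual", "contamination", "weak_evidence",
    "low_allele_frac", "normal_artifact", "panel_of_normals", "strand_bias",
})


def somatic_call_status(filter_field: str) -> str:
    """Map a VCF FILTER value such as "germline;weak_evidence" to PASS or FAIL."""
    # "PASS" and "." carry no filter names, so they are ignored here.
    names = {name for name in filter_field.split(";")
             if name and name not in {"PASS", "."}}
    # Any filter from the FAIL list is enough to fail the variant.
    return "FAIL" if names & FAIL_FILTERS else "PASS"


print(somatic_call_status("clustered_events;germline"))       # PASS
print(somatic_call_status("clustered_events;weak_evidence"))  # FAIL
```

A variant flagged only by filters outside the FAIL list (for example clustered_events and germline) keeps a PASS call status, while a single filter from the FAIL list is sufficient to mark it as FAIL.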

Call status variant filtering

When launching a germline/somatic analysis from FASTQ, the user will have two options:

  • Variant list will contain all variants: the variant table will contain all variants called by the variant caller, including variants with both PASS and FAIL call status.
  • Variant list will contain only those variants that pass quality filters: the variant table will contain only variants having a call status of PASS.

Variants can be filtered by their call status using the dynamic filters feature. The "Call Status" filter allows the user to filter variants based on the following criteria:

  • Call Status: PASS, FAIL or anything.
  • Allelic balance: proportion of reads supporting the alternative allele.
  • Coverage: number of reads aligned against the variant position.
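
To make the three criteria concrete, here is a minimal Python sketch of such a filter applied to a list of variants. It is not the platform's implementation: the Variant class, its field names and the use of minimum thresholds for allelic balance and coverage are assumptions made for the example. Allelic balance is computed as the reads supporting the alternative allele divided by the coverage, following the definition above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Variant:
    name: str
    call_status: str  # "PASS" or "FAIL"
    alt_reads: int    # reads supporting the alternative allele
    coverage: int     # reads aligned against the variant position

    @property
    def allelic_balance(self) -> float:
        # Proportion of reads supporting the alternative allele.
        return self.alt_reads / self.coverage if self.coverage else 0.0


def apply_call_status_filter(variants: Iterable[Variant],
                             call_status: Optional[str] = None,
                             min_allelic_balance: float = 0.0,
                             min_coverage: int = 0) -> List[Variant]:
    """Keep variants meeting all criteria; call_status=None means "anything"."""
    return [v for v in variants
            if (call_status is None or v.call_status == call_status)
            and v.allelic_balance >= min_allelic_balance
            and v.coverage >= min_coverage]


variants = [
    Variant("var1", "PASS", alt_reads=15, coverage=30),
    Variant("var2", "FAIL", alt_reads=2, coverage=6),
    Variant("var3", "PASS", alt_reads=5, coverage=50),
]
kept = apply_call_status_filter(variants, "PASS",
                                min_allelic_balance=0.2, min_coverage=10)
print([v.name for v in kept])  # ['var1']
```

Here var2 is excluded by its FAIL call status and var3 by its allelic balance of 0.1, so only var1 remains.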