Variant calling local reassembly

Variants reported in the variant table are sometimes not visible in the alignment displayed in IGV or JBrowse. The reason is an intermediate step of the variant calling process called local reassembly.

Variant calling algorithms

We use Sentieon algorithms for variant calling. Sentieon is a toolkit analogous to GATK (the Genome Analysis Toolkit) but built on a highly optimized backend (Kendig et al., 2019). We call germline variants with Sentieon DNAscope and somatic variants with Sentieon TNHaplotyper2.

Sentieon & GATK local reassembly

Sentieon and GATK reassemble the read data and determine candidate haplotypes as a prelude to variant calling. In this intermediate step, named local reassembly, the program builds an intermediate alignment in which the reads are locally realigned. The algorithm calls variants from this intermediate alignment and derives values such as allele frequency and read depth from it.

For some genomic regions, the full sequence alignment does not match the intermediate alignment constructed during local reassembly. This is why some variants reported in the variant table are not visible when the alignment is opened in IGV or JBrowse.
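The general idea of reassembling reads into candidate haplotypes can be illustrated with a toy example. The sketch below is not Sentieon's or GATK's implementation; it is a minimal de Bruijn style graph walk, and the function names, the sequences, and the k-mer size are all made up for illustration. Real callers add steps that are omitted here, such as graph pruning and read-to-haplotype likelihood scoring.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers observed right after it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_haplotypes(graph, start, end, max_len):
    """Walk every path from the start node to the end node and spell it out."""
    haplotypes = []
    stack = [(start, start)]  # (current node, sequence spelled so far)
    while stack:
        node, seq = stack.pop()
        if node == end:
            haplotypes.append(seq)
            continue
        if len(seq) >= max_len:  # guard against cycles in the graph
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, seq + nxt[-1]))
    return sorted(haplotypes)


# Hypothetical reference window and three reads that carry a "CA" insertion.
k = 4
reference = "ATGCCGTAAGCTTGAC"
reads = ["ATGCCGTACA", "CGTACAAGCT", "CAAGCTTGAC"]

# The reference is threaded through the graph together with the reads.
graph = build_graph([reference] + reads, k)
haplotypes = enumerate_haplotypes(
    graph,
    start=reference[:k - 1],
    end=reference[-(k - 1):],
    max_len=len(reference) + 10,
)

for hap in haplotypes:
    label = "reference" if hap == reference else "candidate"
    print(label, hap)
```

With this toy input, the walk yields the reference haplotype and one candidate haplotype carrying the insertion. The candidate is reconstructed from k-mers alone, independently of where a mapper originally placed each read, which is why the reassembled view can differ from the alignment shown in a genome browser.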

What is the purpose of performing local reassembly during the variant calling?

Methods based on local reassembly depend less on the prior mapping of sequence reads and, as a result, have higher sensitivity and specificity in indel calling (Rimmer et al., 2014).

Can I access the intermediate alignment?

No. The intermediate alignment file is not available for download or visualization. The algorithm creates this file temporarily during variant calling, which is why it is not displayed alongside the other alignment in the results.

Should I consider the variant a true positive or an artifact?

A variant found under these circumstances should be treated like any other variant called by the variant calling pipeline. Every call is a candidate true positive, and other factors should be considered when evaluating whether a variant is real (e.g. quality of the call, read depth, allele frequency). Our validated variant calling pipeline has demonstrated high sensitivity and specificity; however, as with any other NGS pipeline, false positives can occur. Even if the variant is not a variant calling or mapping artifact, other types of artifacts (such as sequencing errors) should not be overlooked.
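As a rough illustration of weighing these factors together, the sketch below flags a variant for closer manual review. The function name, the dictionary keys, and all threshold values are hypothetical placeholders, not settings or recommendations of our pipeline; appropriate cut-offs depend on the assay and on whether the call is germline or somatic.

```python
def review_flags(variant, min_qual=30.0, min_depth=20, min_af=0.05):
    """Return the reasons a variant deserves a closer look (empty list if none).

    `variant` is a plain dict with the hypothetical keys "qual", "depth"
    and "af". The default thresholds are illustrative placeholders only.
    """
    flags = []
    if variant["qual"] < min_qual:
        flags.append("low_quality")
    if variant["depth"] < min_depth:
        flags.append("low_depth")
    if variant["af"] < min_af:
        flags.append("low_allele_frequency")
    return flags


# Example values, invented for illustration.
print(review_flags({"qual": 50.0, "depth": 100, "af": 0.40}))
print(review_flags({"qual": 10.0, "depth": 5, "af": 0.01}))
```

A flag does not mean the variant is an artifact, and an unflagged variant is not guaranteed to be real. The flags only point to calls where the supporting evidence is thin.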

References

Kendig, K. I. et al. Sentieon DNASeq Variant Calling Workflow Demonstrates Strong Computational Performance and Accuracy. Front. Genet. 10, 736 (2019).

Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).