Swift Biosciences's Accel-Amplicon Plus 57G Pan-Cancer Profiling Panel
In a nutshell, the T790M variant has an allele fraction (AF) of 0.009, just under 1%, in the BAM files VarSome Clinical produces from the Swift read data. This is very low and right at the limit of what a variant caller can detect. When calling variants, the caller performs a local realignment of the reads around the region under investigation and then tries to identify the haplotypes supported by the resulting pileup. Because this variant is such a faint signal, even the tiniest change in the pileup can cause it to be missed.
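To make the fragility of such a low AF concrete, here is a minimal arithmetic sketch. The depth and read counts are invented purely for illustration (they are not taken from the actual analysis), and `allele_fraction` is a hypothetical helper, not part of any caller. With these made-up numbers, losing just three supporting reads drops the AF from 0.009 to roughly 0.0075.

```python
def allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads covering a position that support the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


# Hypothetical pileup (illustrative numbers only): 18 supporting reads
# out of 2000 reproduces an AF of 0.009.
baseline = allele_fraction(18, 2000)

# Suppose a small change in read selection removes 3 of the supporting reads.
after_loss = allele_fraction(15, 1997)

print(f"baseline AF:          {baseline:.4f}")
print(f"AF after losing 3:    {after_loss:.4f}")
```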
Specifically, the variant was called in analysis A but not in analysis B because the latter was run with an “intervals” option that restricts the caller to the regions defined in the assay’s BED file. We added this option to the Swift assays because Swift doesn't want variants that fall outside the target regions to be shown: Swift assays are very specific, and anything outside the target regions will almost certainly be an artifact.
We spoke to the developers of our variant caller (Sentieon), and they confirmed that, when running with this intervals option, a read is included as long as at least one of its bases overlaps a target region. This is quite permissive, but it still changes the set of reads in the pileup, and in cases such as this, where the variant is so close to the limit of detection, even a tiny change in the reads included may cause it to be missed, as indeed happened here.
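The inclusion rule described above can be sketched as a simple interval-overlap test. This is only an illustration of the rule as it was explained to us, not Sentieon's actual implementation. The function name `read_is_included` and the coordinates are hypothetical, and the sketch assumes 0-based, half-open intervals, as used in BED files.

```python
def read_is_included(read_start: int, read_end: int,
                     region_start: int, region_end: int) -> bool:
    """Return True if the read shares at least one base with the target region.

    All coordinates are assumed to be 0-based and half-open: [start, end).
    """
    return read_start < region_end and region_start < read_end


# Hypothetical target region covering positions 100..199.
region = (100, 200)

# A read whose first base is the last base of the region: kept.
print(read_is_included(199, 349, *region))

# A read starting immediately after the region: dropped.
print(read_is_included(200, 350, *region))
```

Under this rule, a read that overlaps the target by a single base is kept, while one that starts immediately after the target is dropped, so the pileup near the region edges can differ from an unrestricted run.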
We did manage to rescue the variant by padding the assay’s BED file with 10 nt on either side of each target region. This is all well and good, but it opens up the possibility of displaying off-target (albeit only marginally off-target) variants and, more importantly, it isn't really a long-term solution. While we're happy to add such padding to Swift assays, the main issue is that sometimes a variant will simply have too low an AF to be detected. The fact that it was found when the intervals option was removed simply reflects how sensitive this sort of faint signal is to small changes in the pileup. So while we can rescue this particular variant by changing the intervals, the underlying issue will remain, and down the road we may miss another variant with AF <= 1%. On the other hand, it's questionable whether we should consider variants with such low support to be true positives.
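For reference, padding a BED file by a fixed number of bases can be sketched as follows. This is a minimal illustration rather than the code we actually ran: `pad_bed_lines` is a hypothetical helper, the example coordinates are invented, and the sketch assumes a tab-separated BED with at least three columns. It clamps starts at 0 but does not clamp ends to chromosome lengths or merge intervals that overlap after padding.

```python
def pad_bed_lines(lines, pad=10):
    """Yield BED lines with each interval widened by `pad` bases on both sides."""
    for line in lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and header/comment lines through unchanged.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            yield stripped
            continue
        fields = stripped.split("\t")
        start = max(0, int(fields[1]) - pad)  # BED starts are 0-based; never go below 0
        end = int(fields[2]) + pad            # not clamped: chromosome lengths are unknown here
        yield "\t".join([fields[0], str(start), str(end)] + fields[3:])


# Invented example intervals, not real assay targets.
example = [
    "chr1\t1000\t1100\thypothetical_target",
    "chr1\t5\t50",
]
padded = list(pad_bed_lines(example))
for row in padded:
    print(row)
```

Extra columns such as the target name are carried over untouched, so the padded file can be used in place of the original one.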
On the whole, I would argue that not finding this variant is a reasonable result given how little support there is for it. This is the sort of call that will be greatly affected by any small change in sequencing quality or any perturbation in the underlying data. There will always be such borderline cases, and I would caution against trying too hard to rescue this particular one, since that might lead us to overfit our pipeline and could result in more false positives in unknown samples.
Charles Chapple, Head of Bioinformatics @ VarSome
February 17, 2020