VarSome Clinical accepts VCF files for SNP/INDEL and CNV annotation. You can upload VCFs containing only SNPs/INDELs, only CNVs, or both types of variants.
If you upload mixed VCFs, they will be divided into two files: one file to annotate SNPs/small INDELs (*filtered.vcf.gz) and one file to annotate CNVs (*cnv.vcf.gz).
Required format for SNPs/INDELs annotation
VCFs containing SNPs and small INDELs can be used to launch a somatic or germline analysis (Launch analysis > New analysis > Germline/Somatic analysis from VCF).
VCFs uploaded for SNP/small INDEL analysis must meet the following requirements:
- Are compliant with the VCF standard.
- Include only specific SNVs and INDELs. In order to annotate a variant, we need to know exactly what that variant is, so we cannot handle cases where the variant's sequence isn't specified. For example, we cannot annotate "NON_REF" variants:
#CHROM POS ID REF ALT
chr1 10052 . C <NON_REF>
Or variants with an "N" in the ALT field:
#CHROM POS ID REF ALT
chr22 30998425 . C CTTTTTNT
- Include a valid genotype (GT) field for each variant entry.
- The files should contain the variants found in a real human sample. We expect a maximum of around 4 or 5 million variants in a sample.
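As an illustration of the requirements above, the following Python sketch checks a single tab-separated VCF data line before upload. It is not VarSome Clinical's own validation: the function name check_small_variant_line is hypothetical, and the rule that every ALT allele may contain only A, C, G and T is an assumption that is stricter than the VCF standard (it also flags symbolic alleles such as CNV entries, so it is meant for SNP/INDEL-only files).

```python
import re

# Assumption: only fully specified A/C/G/T sequences count as annotatable.
_BASES = re.compile(r"^[ACGT]+$", re.IGNORECASE)
# Genotypes such as 0/1, 1|0, ./. or a single haploid allele index.
_GENOTYPE = re.compile(r"^(\d+|\.)([/|](\d+|\.))*$")


def check_small_variant_line(line):
    """Return a list of problems found in one VCF data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["fewer than 10 columns, so there is no sample genotype"]
    problems = []
    # ALT may list several comma-separated alleles; check each one.
    for alt in fields[4].split(","):
        if not _BASES.match(alt):
            problems.append(f"ALT allele {alt!r} is not a fully specified sequence")
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        problems.append("FORMAT column has no GT key")
    else:
        sample = fields[9].split(":")
        index = format_keys.index("GT")
        genotype = sample[index] if index < len(sample) else ""
        if not _GENOTYPE.match(genotype):
            problems.append(f"genotype {genotype!r} is not well formed")
    return problems
```

Calling the function on each non-header line of a file and printing any non-empty result gives a quick list of entries that would need to be removed or fixed.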
Required format for CNVs annotation
VCFs containing CNVs (deletions and duplications) can be used to launch a CNV subanalysis from VCF.
VCFs uploaded for CNV annotation must meet the following requirements:
- Are compliant with the VCF standard.
- Include duplications and/or deletions where the type of copy number variant is shown in the ALT field:
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
Example of an accepted VCF with CNVs:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1
chr12 133040735 . C <DUP> . PASS SVTYPE=DUP;SVLEN=140;END=133040875 GT:CN 0/1:1.50
chr12 133049934 . G <DEL> . PASS SVTYPE=DEL;SVLEN=78;END=133050012 GT:CN 0/1:0.50
- According to the VCF Specification, the CNV category should not be used when a more specific category can be applied.
##ALT=<ID=CNV,Description="Copy Number Variant">
Therefore, the following VCF format is not accepted:
chrX 133559227 . G <CNV> . . SVTYPE=CNV;SVLEN=140;END=133559366 GT:FC:CN 0/1:-1.82:1.10
- Include a valid genotype (GT) field for each variant entry.
- Do not include other types of structural variants (SVs), such as large chromosomal rearrangements (e.g. inversions, translocations) or gene fusions. These types of SVs are currently not supported.
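The CNV requirements above can be sketched as a small Python check on one tab-separated data line. This is only an illustration under stated assumptions, not the platform's own validation: the function name check_cnv_line and the constant ACCEPTED_CNV_ALTS are hypothetical, and the sketch treats <DEL> and <DUP> as the only accepted ALT values.

```python
# Assumption: only these two symbolic ALT values are accepted for CNVs.
ACCEPTED_CNV_ALTS = {"<DEL>", "<DUP>"}


def check_cnv_line(line):
    """Return a list of problems found in one CNV data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["fewer than 10 columns, so there is no sample genotype"]
    problems = []
    alt = fields[4]
    if alt == "<CNV>":
        # The generic category is rejected when DEL or DUP could be used.
        problems.append("ALT is the generic <CNV>; use <DEL> or <DUP> instead")
    elif alt not in ACCEPTED_CNV_ALTS:
        # Covers inversions, translocations and any other ALT value.
        problems.append(f"ALT {alt} is not a deletion or duplication")
    if "GT" not in fields[8].split(":"):
        problems.append("FORMAT column has no GT key")
    return problems
```

Applied to the accepted example rows above, the function returns an empty list; applied to the <CNV> row, it reports the generic ALT value.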
❗ Tip: checking the format of a VCF file
Ensuring that your VCF file is structured correctly and ready to be uploaded to VarSome Clinical is a recommended practice that could facilitate your analyses and save valuable time.
An easy way to check that your VCF file is valid is to try to run a bcftools command on it. Bcftools, a set of utilities that manipulate VCF files, is very sensitive to malformed VCFs, so it will fail if the file doesn't conform to the standard.
After installing bcftools according to its installation instructions, run the following command, where file.vcf is your input VCF file:
bcftools norm -m -any -NO v file.vcf
The bcftools norm command can check that REF alleles match the reference, split multiallelic sites into multiple rows, or recover multiallelics from multiple rows; with the -m -any option used here, it splits multiallelic sites. If the fields in your file are complete, the command will run without errors. However, if it comes across a non-compliant field like the following,
chr1 16366632 . CC GC,GT 193.02 PASS AB=0.5;
the command will fail. In the row above, the allelic balance (AB) field is incomplete: this is a multiallelic site with two ALT alleles in a single row, so two values are expected, but only one is given. bcftools reports this with an error message:
Error: wrong number of fields in INFO/AB at chr1:16366632, expected 2, found 1
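To make the cause of this error concrete, here is a minimal Python sketch of the same kind of count check. It assumes that the file header declares AB with Number=A (one value per ALT allele), which is not shown in the row above, and that header attributes appear in the usual ID, Number order. The function names number_a_info_ids and find_count_mismatches are hypothetical, and the sketch does not reproduce how bcftools itself works.

```python
import re


def number_a_info_ids(header_lines):
    """Collect the IDs of INFO fields declared with Number=A."""
    ids = set()
    for line in header_lines:
        # Assumes the common attribute order: ID first, then Number.
        match = re.match(r"##INFO=<ID=([^,]+),Number=A,", line)
        if match:
            ids.add(match.group(1))
    return ids


def find_count_mismatches(data_line, per_alt_ids):
    """Return (field, expected, found) for each Number=A field with a wrong count."""
    fields = data_line.rstrip("\n").split("\t")
    n_alt = len(fields[4].split(","))
    mismatches = []
    for entry in fields[7].split(";"):
        if "=" not in entry:
            continue  # flags and empty entries carry no values
        key, value = entry.split("=", 1)
        if key in per_alt_ids:
            found = len(value.split(","))
            if found != n_alt:
                mismatches.append((key, n_alt, found))
    return mismatches
```

For the row above, with AB declared as Number=A, the second function reports that AB should have 2 values but has 1, matching the numbers in the bcftools message.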
Other tools for VCF validation, which can also be used to locate other types of errors (e.g. a malformed or missing header):
- https://github.com/EBIvariation/vcf-validator
- http://vcftools.sourceforge.net/perl_module.html#vcf-validator
Another quick test is simply to check whether a standard program such as bcftools reads the file without errors.
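For the specific case of a malformed or missing header, a very small Python sketch can already catch the most common problems. It only checks two things required by the VCF standard: that the first line is a ##fileformat declaration and that the #CHROM line starts with the eight mandatory, tab-separated columns. The function name check_header is hypothetical, and the sketch is no substitute for a full validator.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_header(lines):
    """Return a list of header problems for the lines of a VCF file."""
    problems = []
    # The standard requires the file format declaration on the first line.
    if not lines or not lines[0].startswith("##fileformat=VCF"):
        problems.append("first line is not a ##fileformat declaration")
    column_lines = [line for line in lines if line.startswith("#CHROM")]
    if not column_lines:
        problems.append("no #CHROM column header line")
    else:
        columns = column_lines[0].rstrip("\n").split("\t")
        # A header separated by spaces instead of tabs fails this comparison.
        if columns[:8] != REQUIRED_COLUMNS:
            problems.append("column header does not start with the 8 mandatory columns")
    return problems
```

The function takes the file content as a list of lines, for example the result of readlines() on an uncompressed VCF, and returns an empty list when both checks pass.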