Requirements for submitted VCF files

VarSome Clinical accepts VCF files containing only CNVs or a mix of CNVs and other SVs. The VCFs may contain the following types of variants:
    •  CNVs: deletions and duplications
    •  Insertions
    •  Inversions
    •  Breakends 
    •  Repeat expansions

    For more information about SV annotation from VCF files, please refer to the document SV annotation (from VCF).

    Users can also upload an optional alignment (BAM) file for the VCF sample, which is used to visualize the coverage of the variants in the VCF file.

    The VCF files should conform to the VCF standard, regardless of the sequencing platform.

    Required format for SNP/INDEL annotation

    VCFs containing SNPs and small INDELs can be used to launch a somatic or germline analysis (Launch analysis > New analysis > Germline/Somatic analysis from VCF).

    VCFs uploaded to analyze SNPs and small INDELs must meet the following requirements:

    1. Be compliant with the VCF standard.
    2. Include only fully specified SNVs and INDELs. To annotate a variant, we need to know exactly what that variant is, so we cannot handle variants whose sequence isn't specified. For example, we cannot annotate "NON_REF" variants:
      #CHROM    POS      ID     REF      ALT
      chr1 10052 . C   <NON_REF>     

      Or variants with an "N" in the ALT field:

      #CHROM      POS       ID    REF     ALT
      chr22  30998425  .     C    CTTTTTNT
    3. Include a valid genotype (GT) field for each variant entry.
    4. Contain the variants found in a real human sample. We expect at most around 4 to 5 million variants per sample.
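    As an illustration of the SNV/INDEL requirements above, the following Python sketch checks one tab-separated data line for symbolic or N-containing ALT alleles and for a usable GT field. It is a hypothetical pre-upload check written for this article, not the validator VarSome Clinical uses; the function name snv_indel_problems and its messages are assumptions. It is deliberately strict, so other alleles allowed by the VCF standard (such as the spanning-deletion allele *) are reported too.

```python
import re

# Allele indices separated by "/" (unphased) or "|" (phased); "." is missing.
GT_PATTERN = re.compile(r"^(\.|\d+)([/|](\.|\d+))*$")
# A fully specified ALT sequence: A/C/G/T only, so "N" is rejected.
ALT_SEQ = re.compile(r"^[ACGTacgt]+$")


def snv_indel_problems(line: str) -> list[str]:
    """Return the problems found in one tab-separated VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["fewer than 10 columns: FORMAT and a sample column are needed for GT"]
    chrom, pos, alt = fields[0], fields[1], fields[4]
    site = f"{chrom}:{pos}"
    problems = []

    alts = alt.split(",")
    for allele in alts:
        if allele.startswith("<"):
            problems.append(f"{site}: symbolic ALT {allele} has no explicit sequence")
        elif not ALT_SEQ.match(allele):
            problems.append(f"{site}: ALT {allele} is not a plain A/C/G/T sequence")

    # The VCF specification requires GT, when present, to be the first FORMAT key.
    if fields[8].split(":")[0] != "GT":
        problems.append(f"{site}: GT is missing or is not the first FORMAT key")
        return problems

    for sample in fields[9:]:
        gt = sample.split(":")[0]
        if not GT_PATTERN.match(gt):
            problems.append(f"{site}: malformed genotype {gt!r}")
            continue
        # Index 0 is REF; indices 1..len(alts) refer to the ALT alleles.
        indices = [int(a) for a in re.split(r"[/|]", gt) if a != "."]
        if any(i > len(alts) for i in indices):
            problems.append(f"{site}: genotype {gt} refers to a missing ALT allele")
    return problems


print(snv_indel_problems("chr22\t30998425\t.\tC\tCTTTTTNT\t.\t.\t.\tGT\t0/1"))
# Prints: ['chr22:30998425: ALT CTTTTTNT is not a plain A/C/G/T sequence']
```

    To apply it to a whole file, one could pass every line that does not start with # to the function and count the records along the way, keeping in mind the expected maximum of around 4 to 5 million variants per sample.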

    Required format for SV annotation

    VCFs containing CNVs (deletions and duplications) and other SVs (insertions, inversions and breakends) can be used to launch an SV sub-analysis from VCF. 

    VCFs uploaded to annotate SVs must meet the following requirements:

    1. Be compliant with the VCF standard.
    2. Include deletions and/or duplications, with the type of copy number variant shown in the ALT field, and/or insertions, inversions and breakends:
      ##ALT=<ID=DEL,Description="Deletion">

      ##ALT=<ID=DUP,Description="Duplication">

      ##ALT=<ID=INS,Description="Insertion">

      ##ALT=<ID=INV,Description="Inversion">

      ##ALT=<ID=BND,Description="Breakend">

      Example of an accepted VCF with CNVs and other SVs:

      #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE1
      chr12    133040735    .    C    <DUP>    .    PASS    SVTYPE=DUP;SVLEN=140;END=133040875    GT:CN    0/1:1.50
      chr12    133049934    .    G    <DEL>    .    PASS    SVTYPE=DEL;SVLEN=78;END=133050012    GT:CN    0/1:0.50
      chrX    127845659    .    N    <INV>    60    GT    IMPRECISE;SVTYPE=INV;SVLEN=59;END=127845718;SUPPORT=5    GT:GQ:DR:DV    0/0:36:31:5
      chr1    4939486 .        T       <INS>   406     PASS    END=4939486;SVTYPE=INS;CIPOS=0,11;CIEND=0,11;HOMLEN=11;HOMSEQ=GATATCAATAT;LEFT_SVINSSEQ=GATATCAATATTCTCCATATGACTTCAGTGTCCTCCATATGACATCAATATCCTCCATATGATGTCAATATCTTC;RIGHT_SVINSSEQ=GTATGATGTCAATATCCTCCATATGATGTCAACATCATCCATATGATTTCAGTGTCCTCCGTATGATGTCAATGTCCTCCATAA     GT:FT:GQ:PL:PR:SR       1/1:PASS:39:459,42,0:0,2:0,22
      chr6    9500791        .    N       N[chr14:82674769[       60.0    PASS    PRECISE;SVTYPE=BND;SUPPORT=10;RNAMES=575d2f99-ea85-4dd3-a8f6-905e82d20947,ec3d723e-a783-4ee6-8fd2-ce44b4dddbf7,9c0f1346-ce9d-4d1a-b35b-9a09fda784d1,5ed8794a-9c95-4dc8-90e5-ca76641d6fd6,784251f9-d4f1-4782-ae3b-cca7c87abe95,62559b4a-d107-4f9e-8cff-e2762b068338,6b2fa5e8-6baa-4be6-a1e6-e8a7d0373cde,c452c1f0-9dd9-496b-bd92-812ce84919e1,b7a4d531-de79-45ec-84b6-8967b927c7ec,a0984676-e8c5-4314-b9a2-b7919ded8ccd;COVERAGE=0,0,40,40,40;STRAND=-;AF=0.25;CHR2=chr14;STDEV_POS=0;ANN=N[CHR14:82674769[|transcript_ablation|HIGH|LINC02301|LINC02301|transcript|NR_146650.1|pseudogene||t(6%3B14)(%3B)(n.*12363966)|t(6%3B14)(%3BNR_146650.1:null)|||||      GT:GQ:DR:DV           0/1:16:30:10
    3. Do not use the generic CNV type. According to the VCF specification, the CNV category should not be used when a more specific category (such as DEL or DUP) can be applied:

      ##ALT=<ID=CNV,Description="Copy Number Variant">

      Therefore, records such as the following are not accepted:

      chrX	133559227	.	G	<CNV>	.	.	SVTYPE=CNV;SVLEN=140;END=133559366	GT:FC:CN	0/1:-1.82:1.10
    4. Include a valid genotype (GT) field for each variant entry.
    5. Do not include other types of SVs, such as large chromosomal rearrangements (e.g. inversions, translocations) or gene fusions. These types of SVs are currently not supported.
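    The SV requirements above mostly concern the ALT column. The Python sketch below, a hypothetical helper named classify_sv written for this article (not VarSome's own code), sorts a data line into accepted, rejected or unsupported using only the ALT value and the presence of GT. The accepted set mirrors the list above; subtyped alleles allowed by the VCF specification, such as <DUP:TANDEM>, are reported as unsupported here only because the sketch does not handle them.

```python
import re

# Symbolic ALT alleles listed as accepted above; the generic <CNV> is not.
ACCEPTED_SYMBOLIC = {"<DEL>", "<DUP>", "<INS>", "<INV>"}
# Breakend ALT forms from the VCF specification: t[p[ and t]p] ...
BND_AFTER = re.compile(r"^[ACGTNacgtn]+(?P<br>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=br)$")
# ... and [p[t and ]p]t, where p is the mate chromosome and position.
BND_BEFORE = re.compile(r"^(?P<br>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=br)[ACGTNacgtn]+$")


def classify_sv(line: str) -> tuple[str, str]:
    """Return (status, detail) for one tab-separated VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10 or fields[8].split(":")[0] != "GT":
        return ("rejected", "no GT field")
    alt = fields[4]
    if alt == "<CNV>":
        return ("rejected", "generic <CNV>; use <DEL> or <DUP> instead")
    if alt in ACCEPTED_SYMBOLIC:
        return ("accepted", alt.strip("<>"))
    for pattern in (BND_AFTER, BND_BEFORE):
        match = pattern.match(alt)
        if match:
            return ("accepted", f"BND mate {match['chrom']}:{match['pos']}")
    return ("unsupported", f"ALT {alt!r} is not an SV allele handled by this sketch")


for record in [
    "chr12\t133040735\t.\tC\t<DUP>\t.\tPASS\tSVTYPE=DUP;SVLEN=140;END=133040875\tGT:CN\t0/1:1.50",
    "chrX\t133559227\t.\tG\t<CNV>\t.\t.\tSVTYPE=CNV;SVLEN=140;END=133559366\tGT:FC:CN\t0/1:-1.82:1.10",
    "chr6\t9500791\t.\tN\tN[chr14:82674769[\t60.0\tPASS\tSVTYPE=BND\tGT\t0/1",
]:
    print(classify_sv(record))
# Prints:
# ('accepted', 'DUP')
# ('rejected', 'generic <CNV>; use <DEL> or <DUP> instead')
# ('accepted', 'BND mate chr14:82674769')
```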


    ❗ Tip: checking the format of a VCF file

    Checking that your VCF file is correctly structured before uploading it to VarSome Clinical is recommended practice: it can help your analyses run smoothly and save valuable time.

    An easy way to check that your VCF file is valid is to run a bcftools command on it. bcftools, a set of utilities for manipulating VCF files, is strict about malformed VCFs, so it will fail if the file doesn't conform to the standard.

    After installing bcftools according to its installation instructions, run the following command, where file.vcf is your input VCF file:

    bcftools norm -m -any -N -O v file.vcf

    The bcftools norm command can perform several operations: check that REF alleles match the reference (when a reference FASTA is supplied with -f), split multiallelic sites into multiple rows, or join biallelic rows back into multiallelic sites. The command above splits multiallelic sites (-m -any) without normalizing indels (-N) and prints uncompressed VCF (-O v). If the file is well formed, the command runs without errors. However, if it comes across a non-compliant field like the following,

    chr1    16366632        .       CC      GC,GT   193.02  PASS    AB=0.5;

    the command will fail. In the row above, the allele balance (AB) field is incomplete: this is a multiallelic site with two ALT alleles in a single row, so two AB values (one per ALT allele) are expected. bcftools reports this with an error message:

    Error: wrong number of fields in INFO/AB at chr1:16366632, expected 2, found 1
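    The failure is easier to understand by looking at what splitting involves: each per-allele INFO value has to be assigned to its own row, which is impossible when a value is missing. The Python sketch below is a simplified illustration of that step, not the bcftools implementation; split_multiallelic and number_a_keys are hypothetical names, and treating AB as a per-ALT-allele (Number=A) field is an assumption inferred from the "expected 2" in the error message.

```python
def split_multiallelic(line: str, number_a_keys: set[str]) -> list[str]:
    """Split one multiallelic VCF line into one line per ALT allele (hypothetical helper).

    Only INFO keys in number_a_keys (assumed Number=A in the header) are
    split; other INFO entries are copied unchanged. Sample columns are
    copied as-is, so genotypes are NOT recoded in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alts = fields[0], fields[1], fields[4].split(",")
    info_items = [item for item in fields[7].split(";") if item]  # drop empty trailing item
    rows = []
    for idx, alt in enumerate(alts):
        new_info = []
        for item in info_items:
            key, _, value = item.partition("=")
            if key in number_a_keys:
                values = value.split(",")
                if len(values) != len(alts):
                    raise ValueError(
                        f"wrong number of fields in INFO/{key} at {chrom}:{pos}, "
                        f"expected {len(alts)}, found {len(values)}"
                    )
                new_info.append(f"{key}={values[idx]}")
            else:
                new_info.append(item)
        info = ";".join(new_info) or "."
        rows.append("\t".join(fields[:4] + [alt] + fields[5:7] + [info] + fields[8:]))
    return rows


try:
    split_multiallelic("chr1\t16366632\t.\tCC\tGC,GT\t193.02\tPASS\tAB=0.5;", {"AB"})
except ValueError as err:
    print(f"Error: {err}")
# Prints: Error: wrong number of fields in INFO/AB at chr1:16366632, expected 2, found 1
```

    The error text is modelled on the bcftools message above so the two can be compared. A real normalizer must also recode the sample genotypes for each new row, which this sketch leaves untouched.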

    Other VCF validation tools can be used to locate other types of errors (e.g. a malformed or missing header).

    In short, a quick test is to check whether a standard program such as bcftools reads the file without reporting errors.
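    Header problems can also be spotted with a few structural rules from the VCF specification. The Python sketch below, using a hypothetical helper named header_problems, checks the ##fileformat line, the #CHROM column header and the column count of each data line of an uncompressed VCF. It covers only a small part of the specification and is no substitute for a full validator.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def header_problems(lines: list[str]) -> list[str]:
    """Return basic structural problems for the lines of an uncompressed VCF (hypothetical helper)."""
    problems = []
    if not lines or not lines[0].startswith("##fileformat=VCFv"):
        problems.append("line 1: expected a ##fileformat=VCFv4.x line")
    header_idx = next((i for i, text in enumerate(lines) if text.startswith("#CHROM")), None)
    if header_idx is None:
        problems.append("missing #CHROM column header line")
        return problems
    # Every line before the column header must be a ## meta-information line.
    for n, text in enumerate(lines[:header_idx], start=1):
        if not text.startswith("##"):
            problems.append(f"line {n}: meta-information lines must start with ##")
    columns = lines[header_idx].rstrip("\n").split("\t")
    if columns[:8] != REQUIRED_COLUMNS:
        problems.append("the first 8 header columns must be " + ", ".join(REQUIRED_COLUMNS))
    if len(columns) > 8 and columns[8] != "FORMAT":
        problems.append("column 9 must be FORMAT when sample columns are present")
    # Data lines must have as many tab-separated columns as the header.
    for n, text in enumerate(lines[header_idx + 1:], start=header_idx + 2):
        if text.startswith("#"):
            problems.append(f"line {n}: header line after #CHROM")
        elif text.strip() and len(text.rstrip("\n").split("\t")) != len(columns):
            problems.append(f"line {n}: expected {len(columns)} tab-separated columns")
    return problems


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n",
]
print(header_problems(example))
# Prints: ['line 1: expected a ##fileformat=VCFv4.x line']
```

    For a real file, the lines could be read with open("file.vcf").readlines(), or with gzip.open(path, "rt") for a compressed .vcf.gz file.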