Accepted input files

VarSome Clinical accepts either of the following input file types for analysis:

  • FASTQ files, from Illumina or MGI sequencers only.

We expect files that conform to Illumina's or MGI's naming conventions.
When providing paired-end FASTQ files, we require that reads are properly coordinated between them. Paired-end reads provided in a single FASTQ file are not accepted.

For Illumina paired-end files, we consider two files to be a pair when their names are exactly the same except for the read number.
For example:
SampleName_S1_L001_R1_001.fastq.gz and
SampleName_S1_L001_R2_001.fastq.gz.
We accept files in which the read number is specified alone (for example SN1234_S1_L001_1.fastq.gz and SN1234_S1_L001_2.fastq.gz) or with an “R” before the number (for example SN5678_S1_L001_R2.fastq.gz and SN5678_S1_L001_R1.fastq.gz). For further details on naming conventions, please refer to Illumina's documentation.
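
To check file names locally before uploading, the pairing rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the matching logic VarSome Clinical actually runs: the regular expression, the "#" placeholder, and the function names split_read_number and find_pairs are all hypothetical.

```python
import re

# Assumption: the read number is the last "_1"/"_2" or "_R1"/"_R2" token,
# optionally followed by a three-digit chunk such as "_001", then ".fastq.gz".
READ_TOKEN = re.compile(r"_(R?)([12])((?:_\d{3})?\.fastq\.gz)$")


def split_read_number(filename):
    """Return (name with the read number masked, read number), or None."""
    m = READ_TOKEN.search(filename)
    if m is None:
        return None
    # Replace only the read digit with "#", keeping any "R" prefix, so that
    # two mates of the same pair produce an identical key.
    key = filename[:m.start()] + "_" + m.group(1) + "#" + m.group(3)
    return key, int(m.group(2))


def find_pairs(filenames):
    """Return sorted (read 1 file, read 2 file) tuples for complete pairs."""
    groups = {}
    for name in filenames:
        parsed = split_read_number(name)
        if parsed is None:
            continue  # not recognised as a read 1 / read 2 file
        key, read = parsed
        groups.setdefault(key, {})[read] = name
    return sorted((g[1], g[2]) for g in groups.values() if 1 in g and 2 in g)


if __name__ == "__main__":
    print(find_pairs([
        "SN5678_S1_L001_R2.fastq.gz",
        "SN5678_S1_L001_R1.fastq.gz",
        "SN1234_S1_L001_1.fastq.gz",  # mate missing, so not reported
    ]))
```

In this sketch, a file named with "R1" and a file named with a bare "2" are not treated as a pair, because their names differ in more than the read number.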

For MGI paired-end files, we will parse the files as follows:
[flow cell ID]_[lane ID]_[barcode ID]_(optional_id)_[read 1/2].fastq.gz
We accept the read number specified alone (for example, 12345_L02_48_1.fastq.gz and 12345_L02_48_2.fastq.gz) or with an “R” before the number (for example, 6789_L02_56_R1.fastq.gz and 6789_L02_56_R2.fastq.gz).
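
The MGI pattern above can be approximated with a regular expression. The sketch below is a hypothetical illustration, not VarSome Clinical's own parser: the name parse_mgi_name and the group names are invented here, and it assumes that the flow cell ID, lane ID, and barcode ID contain no underscores.

```python
import re

# Assumption: the first three fields contain no "_"; the optional ID may.
MGI_NAME = re.compile(
    r"^(?P<flow_cell>[^_]+)_(?P<lane>[^_]+)_(?P<barcode>[^_]+)"
    r"(?:_(?P<optional_id>.+?))?"       # optional ID, absent in most names
    r"_R?(?P<read>[12])\.fastq\.gz$"    # read number, with or without "R"
)


def parse_mgi_name(filename):
    """Split an MGI-style FASTQ name into its fields, or return None."""
    m = MGI_NAME.match(filename)
    if m is None:
        return None
    fields = m.groupdict()
    fields["read"] = int(fields["read"])
    return fields


if __name__ == "__main__":
    for name in ("12345_L02_48_1.fastq.gz", "6789_L02_56_R2.fastq.gz"):
        print(name, parse_mgi_name(name))
```

Two files would then belong to the same pair when every field except "read" is equal.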

In cases where there are more than two paired-end files per sample, all the paired reads should have the following naming structure:

E12345_34_4321_L001_R1_001.fastq.gz 
E12345_34_4321_L001_R2_001.fastq.gz
E12345_34_4321_L002_R1_001.fastq.gz
E12345_34_4321_L002_R2_001.fastq.gz
E12345_34_4321_L003_R1_001.fastq.gz
E12345_34_4321_L003_R2_001.fastq.gz
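
When a sample is split across several files like this, it can help to confirm that every lane has both an R1 and an R2 file before uploading. The following is a minimal sketch that assumes names follow exactly the structure shown above (a sample prefix, then L plus three digits, then R1 or R2, then 001); the pattern and the function name lanes_missing_a_mate are hypothetical and not part of VarSome Clinical.

```python
import re
from collections import defaultdict

# Assumption: <sample>_L<3-digit lane>_R<1|2>_001.fastq.gz, as in the list above.
LANE_NAME = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def lanes_missing_a_mate(filenames):
    """Return sorted (sample, lane) tuples that lack an R1 or an R2 file."""
    reads_seen = defaultdict(set)
    for name in filenames:
        m = LANE_NAME.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        reads_seen[(m["sample"], m["lane"])].add(m["read"])
    return sorted(key for key, reads in reads_seen.items() if reads != {"1", "2"})


if __name__ == "__main__":
    files = [
        "E12345_34_4321_L001_R1_001.fastq.gz",
        "E12345_34_4321_L001_R2_001.fastq.gz",
        "E12345_34_4321_L002_R1_001.fastq.gz",
    ]
    print(lanes_missing_a_mate(files))  # lane 002 has no R2 file
```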

  • VCF files that conform to the VCF standard, regardless of sequencing platform. Users may also upload an alignment BAM file for the VCF sample, which can be used to visualize the coverage of the variants provided in the VCF file.
    Please check this link for more details about the requirements for submitted VCFs.