Unique Molecular Identifiers (UMIs) are short nucleotide sequences attached to individual molecules during library preparation. They are used in the sequencing process:
- to detect and quantify unique mRNA transcripts
- and/or to detect low-frequency variants, typically found in somatic samples.
For further information, you may refer to Illumina.
VarSome Clinical can process data from assays that use UMIs. Sequencing data with UMIs can be provided in one of two ways.
Specific FASTQ file with UMIs
Upload three FASTQ input files and VarSome Clinical's pipeline will recognize the data automatically:
- files R1 and R3 contain the reads
- file R2 contains the UMIs
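As an illustration of the three-file layout, the sketch below walks the R1, R2 and R3 files in parallel and attaches each UMI to its read pair. It is not VarSome Clinical's code: the function names `fastq_records` and `pair_reads_with_umis` are hypothetical, and it assumes the three files list the same reads in the same order.

```python
import io
from itertools import islice

def fastq_records(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ text stream."""
    while True:
        block = list(islice(handle, 4))  # a FASTQ record spans four lines
        if len(block) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in block)
        # Keep only the read name: drop the leading '@' and any trailing comment.
        yield header[1:].split()[0], seq, qual

def pair_reads_with_umis(r1, r2_umi, r3):
    """Attach the UMI from the R2 stream to the matching R1/R3 read pair.

    Assumption: all three streams contain the same reads in the same order.
    """
    for (n1, s1, _), (n2, umi, _), (n3, s3, _) in zip(
        fastq_records(r1), fastq_records(r2_umi), fastq_records(r3)
    ):
        if not (n1 == n2 == n3):
            raise ValueError(f"read names out of sync: {n1}, {n2}, {n3}")
        yield n1, umi, s1, s3

# Tiny in-memory example standing in for the three uploaded files.
r1 = io.StringIO("@read1 1:N:0\nACGTACGT\n+\nIIIIIIII\n")
r2 = io.StringIO("@read1 2:N:0\nTTAGGC\n+\nIIIIII\n")
r3 = io.StringIO("@read1 3:N:0\nGGCCTTAA\n+\nIIIIIIII\n")
paired = list(pair_reads_with_umis(r1, r2, r3))
print(paired)  # [('read1', 'TTAGGC', 'ACGTACGT', 'GGCCTTAA')]
```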
UMIs derived from the R2 file
Upload two FASTQ files (R1 and R2) and VarSome Clinical will automatically treat the first N nucleotides of each read in the R2 file as the UMI sequence.
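The idea of taking the first N nucleotides of an R2 read as its UMI can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: `split_umi` and `umi_length` are hypothetical names, the value 6 is an arbitrary example for N, and the sketch assumes the bases after the UMI are kept as the template read.

```python
def split_umi(sequence, quality, umi_length):
    """Split a read into its UMI prefix and the remaining template bases."""
    if umi_length < 0 or umi_length > len(sequence):
        raise ValueError("umi_length must be between 0 and the read length")
    umi = sequence[:umi_length]
    # Assumption: the remaining bases and their qualities form the template read.
    return umi, sequence[umi_length:], quality[umi_length:]

umi, template, template_qual = split_umi(
    "ACGTTGCATTGGCCAA", "IIIIIIIIIIIIIIII", umi_length=6
)
print(umi, template)  # ACGTTG CATTGGCCAA
```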
Pipeline setup
In both cases, the assay must first be set up correctly.
Beyond that, no additional steps are needed when uploading the data and launching the analysis.
Background
- VarSome Clinical aligns the reads to produce a preliminary BAM file.
- It tags that BAM file using the UMI data.
- It extracts the reads, which should now be deduplicated, from the BAM file.
- It aligns those reads to the genome again to produce the final BAM file.
- Finally, it calls variants.
VarSome Clinical uses fgbio tools for UMI processing.
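To show why tagging reads with UMIs enables deduplication, the sketch below groups reads that share an alignment position and a UMI, then collapses each group into one consensus read by majority vote. It is a deliberately simplified illustration and not fgbio's algorithm: the names `group_by_umi` and `consensus` are hypothetical, UMIs are matched exactly, and all reads in a group are assumed to have the same length.

```python
from collections import Counter, defaultdict

def group_by_umi(reads):
    """Group reads that share the same alignment start and the same UMI."""
    families = defaultdict(list)
    for chrom, pos, umi, seq in reads:
        families[(chrom, pos, umi)].append(seq)
    return families

def consensus(seqs):
    """Take the most common base at each position across equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

reads = [
    ("chr1", 100, "AACG", "ACGTAC"),
    ("chr1", 100, "AACG", "ACGTAC"),
    ("chr1", 100, "AACG", "ACTTAC"),  # same molecule, one erroneous base
    ("chr1", 100, "GGTA", "ACGAAC"),  # same start, different original molecule
]
deduplicated = {key: consensus(seqs) for key, seqs in group_by_umi(reads).items()}
print(deduplicated)
# {('chr1', 100, 'AACG'): 'ACGTAC', ('chr1', 100, 'GGTA'): 'ACGAAC'}
```

In this example the three reads carrying UMI AACG collapse into a single read and the isolated error is voted out, while the read carrying UMI GGTA is kept as a separate molecule even though it starts at the same position.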