Unique Molecular Identifiers (UMI / UMIs)

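Unique Molecular Identifiers are short nucleotide sequences that tag individual molecules before amplification, so that reads originating from the same molecule can be recognized. They are used in sequencing: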

  • to detect and quantify unique mRNA transcripts
  • and/or to detect low-frequency variants, typically found in somatic samples.

For further information, you may refer to Illumina.

VarSome Clinical's data processing pipeline supports assays that use UMIs. Sequencing data with UMIs can be provided in one of two ways.

    Dedicated FASTQ file with UMIs

    Upload three FASTQ input files; VarSome Clinical's pipeline recognizes their roles automatically:

    • files R1 and R3 with reads
    • file R2 as the UMI file
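
    Before uploading, it can be useful to confirm locally that the three files describe the same reads in the same order. The following is a minimal sketch of such a check, not part of VarSome Clinical; the function names and the example records are hypothetical, and it assumes uncompressed, well-formed four-line FASTQ records.

```python
def read_names(fastq_text):
    """Return the read name (first token after '@') of each 4-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    # In a well-formed FASTQ, header lines sit at positions 0, 4, 8, ...
    return [line[1:].split()[0] for line in lines[0::4]]


def in_sync(r1_text, r2_text, r3_text):
    """True if all three files list the same reads in the same order."""
    names = [read_names(text) for text in (r1_text, r2_text, r3_text)]
    return names[0] == names[1] == names[2]


# Hypothetical one-record files: R1 and R3 carry reads, R2 carries the UMI.
r1 = "@read1 1:N:0\nACGTACGT\n+\nIIIIIIII\n"
r2 = "@read1 2:N:0\nAACCGGTT\n+\nIIIIIIII\n"
r3 = "@read1 3:N:0\nTTGCATGC\n+\nIIIIIIII\n"
print(in_sync(r1, r2, r3))
```

    For real data the files are usually gzip-compressed, so the text would first need to be read with Python's gzip module.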

    UMIs derived from the R2 file

    Upload two FASTQ files (R1 and R2); VarSome Clinical automatically treats the first N nucleotides of each R2 read as the UMI sequence.
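
    To illustrate what "the first N nucleotides" means for a single R2 read, here is a minimal sketch. It is not VarSome Clinical's implementation; the function name, the example read, and the value N = 8 are assumptions chosen for illustration, since the actual N depends on the assay.

```python
def split_umi(sequence, qualities, umi_length):
    """Split one read into (umi, remaining_sequence, remaining_qualities)."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality strings must have equal length")
    if umi_length > len(sequence):
        raise ValueError("read is shorter than the UMI length")
    # The UMI is the leading umi_length bases; the rest is the actual read.
    return sequence[:umi_length], sequence[umi_length:], qualities[umi_length:]


# Hypothetical 15-base R2 read with an assumed UMI length of 8.
umi, template, template_quals = split_umi("ACGTTGCAGGCTAAC", "I" * 15, umi_length=8)
print(umi, template)
```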

    Pipeline setup

    In both cases, the assay first needs to be correctly set up.

    Once that is done, no additional steps are needed when uploading the data and launching the analysis.


    VarSome Clinical uses fgbio tools for UMI processing.
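
    The general idea behind UMI processing is that reads sharing the same mapping position and the same UMI are taken to come from one original molecule. The sketch below illustrates that idea only. It is not fgbio's algorithm or VarSome Clinical's pipeline, the (chromosome, start, UMI) tuples are hypothetical, and it uses exact matching, whereas real tools typically also tolerate sequencing errors in the UMI.

```python
from collections import defaultdict


def count_molecules(reads):
    """Group reads by (chromosome, start, UMI) and count the reads in each group.

    reads: iterable of (chromosome, start, umi) tuples.
    Returns a dict mapping each distinct tuple to its number of reads.
    """
    families = defaultdict(int)
    for chromosome, start, umi in reads:
        families[(chromosome, start, umi)] += 1
    return dict(families)


# Hypothetical aligned reads: two are PCR duplicates of the same molecule.
reads = [
    ("chr1", 100, "AACCGGTT"),
    ("chr1", 100, "AACCGGTT"),  # same position and UMI: same molecule
    ("chr1", 100, "TTGGCCAA"),  # same position, different UMI: new molecule
    ("chr2", 500, "AACCGGTT"),  # same UMI, different position: new molecule
]
families = count_molecules(reads)
print(len(families), "unique molecules from", len(reads), "reads")
```

    Counting the groups rather than the raw reads is what allows unique transcripts to be quantified, and requiring agreement among the reads of a group is what helps separate low-frequency variants from amplification or sequencing errors.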

    Assay-specific instructions