How is frequency calculated for CNV calls?

We use the gnomAD database to get the population frequencies for a given CNV. Depending on the type of variant, the frequencies are calculated as follows:

- Deletions: we use gnomAD variants that fully encompass the given variant.
- Duplications in coding regions: we compare at the gene level and use the gnomAD variants that encompass the same coding genes as the given variant.
- Duplications in non-coding regions: we use gnomAD variants that cover at least 85% of the variant region.
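
The two coordinate-based rules (deletions and non-coding duplications) can be illustrated with a minimal sketch. It is not the production implementation: the `Interval` type, the function names, and the half-open coordinate convention are assumptions made for illustration, and "cover at least 85%" is read here as the fraction of the sample CNV's length that the gnomAD variant overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical CNV region; start is inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def deletion_matches(sample: Interval, gnomad: Interval) -> bool:
    """Deletion rule: the gnomAD variant must fully encompass the sample CNV."""
    return (
        gnomad.chrom == sample.chrom
        and gnomad.start <= sample.start
        and gnomad.end >= sample.end
    )


def noncoding_dup_matches(
    sample: Interval, gnomad: Interval, min_fraction: float = 0.85
) -> bool:
    """Non-coding duplication rule (assumed reading): the gnomAD variant
    must cover at least min_fraction of the sample CNV's length."""
    length = sample.end - sample.start
    if length <= 0:
        return False
    return overlap_bp(sample, gnomad) / length >= min_fraction
```

For example, a sample deletion at 1:1000-2000 would match a gnomAD deletion at 1:900-2100 under this sketch, but not one at 1:1100-2100, because the latter does not contain the sample's first 100 bp.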

Why are frequencies calculated differently for gains vs losses?

gnomAD reports common structural variants; we currently retrieve information only for deletions and duplications.

For deletions, we consider a gnomAD CNV only when it fully encompasses the sample's annotated CNV. Even if the deletion reported in gnomAD is larger than the sample's CNV, we can assume that the sample's CNV is represented in the gnomAD population. Thus, if the gnomAD CNV is reported as a benign loss, the sample's CNV is most likely benign as well, because it lies within that region. Conversely, if the sample's CNV overlaps a pathogenic gnomAD CNV, the sample's CNV is most likely pathogenic, because it contains a region whose loss results in pathogenicity.

For duplications, the approach depends on the genomic location of the CNV. For duplications in coding regions, we compare the sample's CNV and the gnomAD CNVs at the gene level: we consider that two CNVs may have an equivalent effect only if they encompass the same coding genes.
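
The gene-level comparison for coding duplications could look roughly like the sketch below. The function name and the gene symbols are hypothetical, and "the same coding genes" is interpreted here as exact equality of the two gene sets; the actual matching rule may differ.

```python
from typing import Iterable


def coding_dup_matches(
    sample_genes: Iterable[str], gnomad_genes: Iterable[str]
) -> bool:
    """Assumed rule: a gnomAD duplication is comparable to the sample
    duplication only if both encompass exactly the same coding genes."""
    sample_set = frozenset(sample_genes)
    gnomad_set = frozenset(gnomad_genes)
    # A duplication with no coding genes is handled by the non-coding rule,
    # so an empty gene set never matches here.
    return bool(sample_set) and sample_set == gnomad_set
```

Under this reading, a sample duplication spanning `GENE_A` and `GENE_B` (placeholder symbols) is compared only with gnomAD duplications spanning exactly those two genes, regardless of how the breakpoints differ.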