ExpansionHunter's catalogue creator

Create a custom variant catalogue for ExpansionHunter. You can either convert a BED file containing any STR loci into a variant catalogue or manually select, from the list, the known pathogenic loci you wish to target in your analysis. The same catalogue can also be generated from the command line (see API instructions).

Convert a BED file to variant catalogue

You can convert a BED file, or a tab- or space-delimited text file, containing up to approximately 1 million loci into a variant catalogue. Each locus must be on its own line, with the columns in this order: 1) chromosome, 2) STR start position, 3) STR end position, 4) repeat unit, and 5) locus ID (optional). If the locus ID is missing, an ID is generated from the reference coordinates.

Example content of a regions file:
chrX  67545316  67545385  GCA
chr4  3074876   3074933   CAG  HTT
chr9  69037286  69037304  GAA  FXS
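
The conversion described above can be sketched in Python. This is a minimal illustration, not the tool's own implementation. The function name `bed_to_catalogue` is hypothetical, the fallback ID format (`chrom_start_end`) is an assumption, since the page only says the ID is based on the reference coordinates, and coordinates are passed through unchanged. The output keys (`LocusId`, `LocusStructure`, `ReferenceRegion`, `VariantType`) follow the ExpansionHunter variant catalogue JSON format as generally documented; check them against the version you run.

```python
import json


def bed_to_catalogue(lines):
    """Convert BED-like lines into ExpansionHunter-style catalogue entries."""
    catalogue = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        chrom, motif = fields[0], fields[3].upper()
        start, end = int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"start must be smaller than end: {line!r}")
        # Assumption: fallback ID built from the coordinates; the real
        # tool may use a different format.
        locus_id = fields[4] if len(fields) > 4 else f"{chrom}_{start}_{end}"
        catalogue.append(
            {
                "LocusId": locus_id,
                "LocusStructure": f"({motif})*",
                "ReferenceRegion": f"{chrom}:{start}-{end}",
                "VariantType": "Repeat",
            }
        )
    return catalogue


regions = """\
chrX 67545316 67545385 GCA
chr4 3074876 3074933 CAG HTT
chr9 69037286 69037304 GAA FXS
"""

catalogue = bed_to_catalogue(regions.splitlines())
print(json.dumps(catalogue, indent=4))
```

The first locus has no ID column, so the sketch falls back to the coordinate-based ID; the other two keep the IDs given in the file.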

 
Available loci
Loci in the catalogue
  • ABCD3 (GCC in 5′ UTR)
  • AFF2 (CCG in 5′ UTR)
  • AR (CAG in Coding)
  • ARX_1 (GCN in Coding)
  • ARX_2 (GCN in Coding)
  • ATN1 (CAG in Coding)
  • ATXN1 (CAG in Coding)
  • ATXN10 (ATTCT in Intron)
  • ATXN2 (CAG in Coding)
  • ATXN3 (CAG in Coding)
  • ATXN7 (CAG in Coding)
  • ATXN8OS (CTG in 3′ UTR)
  • BEAN1 (TGGAA in Intron)
  • C9ORF72 (GGGGCC in Intron)
  • CACNA1A (CAG in Coding)
  • CBL (CCG in 5′ UTR)
  • CNBP (CCTG in Intron)
  • COMP (GAC in Coding)
  • CSTB (CCCCGCCCCGCG in Promoter)
  • DAB1 (ATTTC in Intron)
  • DIP2B (CGG in 5′ UTR)
  • DMD (GAA in Intron)
  • DMPK (CTG in 3′ UTR)
  • EIF4A3 (TCGGCAGCGGCACAGCGAGG in 5′ UTR)
  • FGF14 (GAA in Intron)
  • FMR1 (CGG in 5′ UTR)
  • FOXL2 (GCN in Coding)
  • FXN (GAA in Intron)
  • GIPC1 (GGC in 5′ UTR)
  • GLS (GCA in 5′ UTR)
  • HOXA13_1 (GCN in Coding)
  • HOXA13_2 (GCN in Coding)
  • HOXA13_3 (GCN in Coding)
  • HOXD13 (GCN in Coding)
  • HTT (CAG in Coding)
  • JPH3 (CTG in 3′ UTR)
  • LRP12 (CGG in 5′ UTR)
  • MARCHF6 (TTTCA in Intron)
  • NIPA1 (GCG in 5′ UTR)
  • NOP56 (GGCCTG in Intron)
  • NOTCH2NLC (GGC in 5′ UTR)
  • NUTM2B-AS1 (CGG in Noncoding transcript)
  • PABPN1 (GCG in Coding)
  • PHOX2B (GCN in Coding)
  • PPP2R2B (CAG in 5′ UTR)
  • PRDM12 (GCC in Coding)
  • PRNP (CCTCATGGTGGTGGCTGGGGGCAG in Coding)
  • RAPGEF2 (TTTCA in Intron)
  • RFC1 (AAGGG in Intron)
  • RFC1 (ACAGG in Intron)
  • RILPL1 (GGC in 5′ UTR)
  • RUNX2 (GCN in Coding)
  • SAMD12 (TTTCA in Intron)
  • SOX3 (GCN in Coding)
  • STARD7 (ATTTC in Intron)
  • TBP (CAG in Coding)
  • TBX1 (GCN in Coding)
  • TCF4 (CTG in Intron)
  • THAP11 (CAG in Coding)
  • TNRC6A (TTTCA in Intron)
  • VWA1 (GGCGCGGAGC in Coding)
  • XYLT1 (GGC in 5′ UTR)
  • YEATS2 (TTTCA in Intron)
  • ZFHX3 (GGC in Coding)
  • ZIC2 (GCN in Coding)
  • ZIC3 (GCC in Coding)
Reference genome
Chromosome naming
Extended analysis

* While off-target regions enable genotyping of alleles longer than the fragment length, there is also a chance of obtaining overestimated genotypes (which may vary depending on the locus). Additionally, you should ensure that there are no other expansions of the same repeat unit in the genome. Overall, interpret your results with caution and visualise reads using the REViewer tool.

Known issues with using the catalogue and ExpansionHunter (v4.0.2 & v5.0.0):
• The ARX_1 and ARX_2 tracts, as well as the HOXA13_1, HOXA13_2 and HOXA13_3 tracts, are too close to each other, and false-positive results are often observed for these loci.
• Genotypes returned for Replaced and Nested repeat types also include the non-pathogenic repeats. You can use STRipy to determine whether the pathogenic motif is present in your sample.
• Long homozygous alleles (longer than the fragment length, on average above 400-500 bp) are likely to be called as heterozygous, with one allele overestimated and the other underestimated (see figure C on this plot).
Additional issues with ExpansionHunter v4.0.1 or earlier: genotyping of the AR, ATXN1 and TCF4 loci is limited to approximately the read length.
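
As a convenience, the locus-specific issues listed above can be checked against a set of selected loci before running an analysis. The following is only a sketch: the names `CLOSE_TRACTS`, `READ_LENGTH_LIMITED`, `parse_version` and `known_issue_warnings` are hypothetical, and the locus sets are copied from the notes above rather than taken from any ExpansionHunter or STRipy API.

```python
# Loci named in the known-issues notes above.
CLOSE_TRACTS = {"ARX_1", "ARX_2", "HOXA13_1", "HOXA13_2", "HOXA13_3"}
# Only affected in ExpansionHunter v4.0.1 or earlier.
READ_LENGTH_LIMITED = {"AR", "ATXN1", "TCF4"}


def parse_version(text):
    """Turn a string such as 'v4.0.2' into a comparable tuple (4, 0, 2)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def known_issue_warnings(locus_ids, eh_version):
    """Return one warning string per known issue affecting the given loci."""
    is_old = parse_version(eh_version) <= (4, 0, 1)
    warnings = []
    for locus in locus_ids:
        if locus in CLOSE_TRACTS:
            warnings.append(
                f"{locus}: neighbouring tracts are close together; "
                "false-positive results are often observed"
            )
        if is_old and locus in READ_LENGTH_LIMITED:
            warnings.append(
                f"{locus}: genotyping is limited to approximately the "
                f"read length in ExpansionHunter {eh_version}"
            )
    return warnings


for message in known_issue_warnings(["HTT", "ARX_1", "AR"], "v4.0.1"):
    print(message)
```

The sketch does not cover the Replaced/Nested repeat or long homozygous allele issues, which depend on the sample rather than on the locus name.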