In silico detection of variable number tandem repeats associated with Alzheimer’s disease from short-read sequencing data

Abstract

Motivation: Alzheimer’s disease (AD) is a highly prevalent disease whose genetic risk factors remain largely unknown. One class of potential genetic risk factors is tandem repeat expansions, which have been associated with over 40 diseases, most of which affect the nervous system. To date, only one variable number tandem repeat (VNTR) expansion, in the ABCA7 gene, has been linked to AD. Detecting VNTRs from short-read data is challenging, so many VNTRs remain unidentified. We hypothesize that many more VNTR expansions associated with an increased risk of AD remain to be discovered.
Results: We created a pipeline that overcomes two common limitations of VNTR detection: the need for a predefined set of repeats and the restriction of detectable VNTR sizes by read length. We performed a genome-wide search for VNTRs with a motif size ≥ 7 bp whose repeat size variation is associated with AD. We detected 71 VNTR expansions and 1242 contractions, including expansions in the genes ADAMTSL3, ARHGEF10, DIP2C, EVC2, GRM8, MPPED1 and PID1, as well as an expansion in the SCIMP gene close to a well-known AD single nucleotide polymorphism (SNP). To our knowledge, our pipeline is one of very few that detect VNTRs exceeding read length without a predefined set of repeats. It detects both previously reported and novel VNTRs, yielding a promising set of VNTRs associated with AD.