Long-Short Term Memory Model for chromosomal aberration detection in Non-Invasive Prenatal Testing
L.P. van Ruyven (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Tom Mokveld – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
D.M.J. Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Julián Urbano – Graduation committee member (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
In 1997 it was discovered that fragments of DNA circulate freely in blood plasma and that, during pregnancy, this DNA originates from both the mother and the fetus. This circulating free DNA has made it possible to test for chromosomal aberrations in the fetus through non-invasive methods, thereby avoiding the 1 in 100 chance of miscarriage associated with invasive procedures. Since then, multiple methods have been developed to detect chromosomal abnormalities with increasing accuracy and decreasing cost. The current state-of-the-art method, WISECONDOR, builds a within-sample reference set and uses it to calculate a z-score over a sliding window to determine whether an aberration is present. Here, we introduce a deep learning approach to non-invasive prenatal testing (NIPT) in the form of a Long Short-Term Memory (LSTM) model, which takes a sequence of GC-normalized read counts per genomic bin and outputs, for each bin, the label healthy or aberrated. To evaluate both WISECONDOR and the proposed model, data are simulated and multiple experiments are set up to test the influence of specific aspects of NIPT. The comparison shows that the LSTM model is still too inconsistent in its performance, owing to its sensitivity to the initialization of the weights and to the composition of the training set.
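To make the sliding-window z-score idea concrete, the following is a minimal sketch in plain Python. It is not WISECONDOR's implementation: the function names, the single shared reference set, the use of Stouffer's method to combine per-bin z-scores, and the toy read counts are all assumptions chosen for illustration.

```python
import math
import statistics


def bin_z_scores(counts, reference_bins):
    """Z-score of every bin against reference bins taken from the same sample.

    Assumption: one shared reference set is used for all bins; a real
    within-sample method would typically select references per target bin.
    """
    reference = [counts[i] for i in reference_bins]
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)  # sample standard deviation
    return [(c - mu) / sigma for c in counts]


def windowed_z_scores(z_scores, window):
    """Combine per-bin z-scores over a sliding window (Stouffer's method)."""
    n_windows = len(z_scores) - window + 1
    return [
        sum(z_scores[start:start + window]) / math.sqrt(window)
        for start in range(n_windows)
    ]


def aberrant_windows(counts, reference_bins, window=3, threshold=3.0):
    """Return start indices of windows whose combined |z| exceeds the threshold."""
    z = bin_z_scores(counts, reference_bins)
    combined = windowed_z_scores(z, window)
    return [start for start, score in enumerate(combined) if abs(score) > threshold]


# Toy GC-normalized read counts: bins 6-8 carry a simulated gain.
toy_counts = [100, 102, 98, 101, 99, 100, 130, 131, 129, 100, 101, 99]
toy_reference = [0, 1, 2, 3, 4, 5]  # hypothetical bins assumed to be unaffected
flagged = aberrant_windows(toy_counts, toy_reference)
```

Windows that overlap the simulated gain receive a large combined z-score, while windows made up only of unaffected bins stay near zero. The threshold of 3.0 is an arbitrary illustrative choice, not a value taken from the thesis.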