Exploring deep learning to improve allelic peak calling in forensic DNA analysis

Abstract

When a trace DNA sample is processed at the Netherlands Forensic Institute, an STR electropherogram can be created. An analyst uses this electropherogram, together with analysis software, to identify the peaks that signify DNA. The resulting DNA profile is then used in the interpretation process, which can include comparison to the reference DNA profile of a person of interest. The software currently used for profile analysis is threshold-based, and the process involves the intervention of trained analysts. To further automate (allelic) peak identification in STR electropherograms, and to increase efficiency and uniformity, neural networks were studied and applied. Previous work by Duncan Taylor and David Powers provided a proof of concept using a simple fully connected neural network. Following a literature review, the U-net architecture was selected for this thesis. Training U-net on electropherograms proved successful, achieving 95% accuracy on the per-pixel labels. However, translating the per-pixel output into alleles proved more difficult than expected, so an upper bound on the score was calculated. This upper bound came close to an analyst’s performance, demonstrating the potential of the method.
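The gap between per-pixel accuracy and allele-level results can be illustrated with a small sketch. It is not the thesis's pipeline. It assumes a one-dimensional binary mask per trace (1 = pixel labelled as part of an allelic peak), and the names `pixel_accuracy` and `label_runs` and the toy data are hypothetical. Grouping contiguous positive pixels into candidate peak regions is only one naive way to move from pixels to peaks. It shows how a prediction that scores well per pixel may still produce a different number of peak regions than the ground truth.

```python
def pixel_accuracy(pred, truth):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(p == t for p, t in zip(pred, truth))
    return matches / len(truth)


def label_runs(mask):
    """Return (start, end) pairs, end exclusive, for each contiguous run of 1s."""
    runs, start = [], None
    for i, value in enumerate(mask):
        if value and start is None:
            start = i                      # a run of positive pixels begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended at the previous pixel
            start = None
    if start is not None:
        runs.append((start, len(mask)))    # run extends to the end of the trace
    return runs


# Toy data (assumed, not from the thesis): two true peak regions.
truth = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
# The prediction differs in only two pixels, yet the first region is split.
pred = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0]

print(pixel_accuracy(pred, truth))  # 10 of 12 pixels agree
print(label_runs(truth))            # two regions
print(label_runs(pred))             # three regions
```

In this toy case two mislabelled pixels are enough to turn two candidate regions into three, so a pixel-level score alone does not determine how many peaks a naive grouping step would report.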