W.W. Büthker
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
1
Peaks on Trial
Deep Learning for Allelic Peak Classification in Forensic DNA Electropherograms
Allele calling is a critical step in forensic DNA analysis, and better automation could increase throughput and consistency in casework. However, few studies systematically compare machine-learning architectures for allele calling, and limited high-quality training data constrain progress.
We designed and evaluated a peak-based allele-calling pipeline that makes peak-level classifications rather than profile-level segmentations. The pipeline comprises three models (the Peak Model, the Autoencoder Model, and the Combined Model), which we evaluated on datasets with ground-truth annotations and on datasets with imperfect analyst annotations. We compared performance against state-of-the-art baselines.
The Combined Model outperformed DNANet on NFI research data with ground-truth annotations (pixel F1 0.934 vs. 0.923; p = 0.001). Ablation experiments showed that each component contributed to performance. Autoencoder pretraining improved accuracy when training data were scarce (fewer than 1,000 DNA profiles). Error analysis further indicated that small peaks are harder than medium and high peaks to classify as allelic (true DNA) or artefactual.
Overall, peak-based classification improves allele-calling performance over current models and clarifies key failure regimes, bringing fully automated allele calling closer to forensic deployment.
Due to the increasing popularity of various types of sensors in traffic management, it has become significantly easier to collect data on traffic flow. However, the integrity of these data sets is often compromised by missing values resulting from sensor failures, communication errors, and other malfunctions. This study investigates the effect of missing data on the performance of Long Short-Term Memory (LSTM) models in traffic flow prediction and assesses strategies to handle these missing values. By actively removing values from a complete data set, we evaluate three strategies for handling the resulting gaps: dropping null values, replacing them with zero, and linear interpolation. We show that LSTM models are surprisingly resilient to missing data, with little impact on prediction accuracy when up to 40% of the data are missing, irrespective of the strategy used. For higher proportions of missing data, dropping null values leads to significant performance degradation, while zero-filling and interpolation maintain predictive accuracy. This paper provides insights into the choice of missing data handling strategies in time-series prediction tasks, demonstrating the potential of LSTM models for traffic forecasting under less-than-ideal data conditions.
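The three handling strategies named in this abstract can be illustrated with a minimal sketch. It is not the study's preprocessing code: the function names, the toy flow values, and the choice to fill leading or trailing gaps with the nearest observed value are all assumptions made for illustration.

```python
from typing import List, Optional

# A series of flow readings where None marks a missing value (assumed encoding).
Series = List[Optional[float]]


def drop_missing(values: Series) -> List[float]:
    """Strategy 1: drop null values, which shortens the series."""
    return [v for v in values if v is not None]


def zero_fill(values: Series) -> List[float]:
    """Strategy 2: replace every missing value with zero."""
    return [0.0 if v is None else v for v in values]


def interpolate_linear(values: Series) -> List[float]:
    """Strategy 3: fill interior gaps on a straight line between neighbours.

    Assumption: gaps at the start or end copy the nearest observed value,
    since there is no second neighbour to interpolate towards.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("cannot interpolate a series with no observed values")
    filled = list(values)
    # Leading and trailing gaps: copy the nearest observed value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: step evenly from the left neighbour to the right one.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical readings (vehicles per interval) with three values removed.
observed: Series = [120.0, None, None, 150.0, None, 130.0]

dropped = drop_missing(observed)
zeroed = zero_fill(observed)
interpolated = interpolate_linear(observed)
```

Dropping values changes the length and spacing of the series, whereas zero-filling and interpolation preserve the time axis, which may be one reason the strategies behave differently at high missing rates.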