T.O. Mokveld | TU Delft Repository

Evaluating the effectiveness of pre-operative diagnosis of ovarian cancer using minimally invasive liquid biopsies by combining serum human epididymis protein 4 and cell-free DNA in patients with an ovarian mass

Journal article (2024) - Duco H.K. Gaillard, Pien Lof, Erik A. Sistermans, Tom Mokveld, Hugo Mark Horlings, Constantijne H. Mom, Marcel J.T. Reinders, Frédéric Amant, Daan Van Den Broek, Lodewyk F.A. Wessels

Objective: To assess the feasibility of scalable, objective, and minimally invasive liquid biopsy-derived biomarkers such as cell-free DNA copy number profiles, human epididymis protein 4 (HE4), and cancer antigen 125 (CA125) for pre-operative risk assessment of early-stage ovarian cancer in a clinically representative and diagnostically challenging population and to compare the performance of these biomarkers with the Risk of Malignancy Index (RMI).

Methods: In this case-control study, we included 100 patients with an ovarian mass clinically suspected to be early-stage ovarian cancer. Of these 100 patients, 50 were confirmed to have a malignant mass (cases) and 50 had a benign mass (controls). Using WisecondorX, an algorithm used extensively in non-invasive prenatal testing, we calculated the benign-calibrated copy number profile abnormality score. This score represents how different a sample is from benign controls based on copy number profiles. We combined this score with HE4 serum concentration to separate cases and controls.

Results: Combining the benign-calibrated copy number profile abnormality score with HE4, we obtained a model with a significantly higher sensitivity (42% vs 0%; p<0.002) at 99% specificity as compared with the RMI that is currently employed in clinical practice. Investigating performance in subgroups, we observed especially large differences in the advanced stage and non-high-grade serous ovarian cancer groups.

Conclusion: This study demonstrates that cell-free DNA can be successfully employed to perform pre-operative risk of malignancy assessment for ovarian masses; however, results warrant validation in a more extensive clinical study. ...
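The abstract reports sensitivity at a fixed 99% specificity. The following is a minimal sketch of how such a figure can be computed from per-sample scores, assuming a simple empirical definition in which the threshold is set on the control scores alone. The function name, the toy scores, and the "score above threshold means positive" convention are all assumptions for illustration and are not taken from the study. Under this definition, a cohort of 50 controls allows no false positives at 99% specificity, because a single false positive already lowers specificity to 98%.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, target_specificity):
    """Return (threshold, sensitivity, specificity) for an empirical cut-off.

    A sample is called positive when its score is strictly above the threshold.
    The threshold is the lowest control score that still keeps the number of
    false positives within what the target specificity allows.
    """
    controls = sorted(control_scores)
    n = len(controls)
    # Number of controls that may be called positive without dropping
    # below the target specificity (small epsilon guards against float error).
    max_false_positives = math.floor((1 - target_specificity) * n + 1e-9)
    threshold = controls[n - 1 - max_false_positives]
    sensitivity = sum(s > threshold for s in case_scores) / len(case_scores)
    specificity = sum(s <= threshold for s in controls) / n
    return threshold, sensitivity, specificity


# Toy data (hypothetical): 50 control scores and 4 case scores.
toy_controls = [i / 100 for i in range(50)]
toy_cases = [0.1, 0.3, 0.6, 0.8]
print(sensitivity_at_specificity(toy_cases, toy_controls, 0.99))
```

With only 50 controls, the threshold in this sketch equals the highest control score, so the reported sensitivity depends heavily on the most extreme benign sample.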

DNA comparisons in genomics

A reference-based perspective

Doctoral thesis (2023) - T.O. Mokveld, M.J.T. Reinders, Z. Al-Ars

Genomics is a field devoted to understanding the differences in genetics between populations, individuals, and even within individuals. By constantly comparing and contrasting data from diverse sources, genomics can refine our understanding of life and identify new ways to improve our lives. However, this often presents technical and biological challenges that require careful consideration of what is compared, in what context, and what might be present. In this thesis I contribute to resolving these challenges in three different domains:

In genomic data analysis, analysts often compare and contrast new genomic data to an established reference to reduce costs. However, this approach biases comparisons in favor of population-specific genetics since such references encode only a fraction of the genetics of a given population. To address this bias, I propose a method that accounts for population variability by integrating it directly into the comparison process. This integration shrinks the contrast between sample and reference, bringing the comparison closer to a personalized one, so that samples are treated the same way regardless of their underlying population. The method improves genome characterization and simplifies downstream analyses that rely on these comparisons. As a result, a more accurate portrayal of the genetics of a given population as a whole is obtained.

In non-invasive sequencing-based prenatal testing, we rely on circulating cell-free DNA from maternal plasma to detect pathogenic variants that may affect the fetus. A healthy baseline, which describes the normative state, is generally required to determine the presence of such variants. However, because this DNA is a mixture of maternal DNA and a much smaller proportion of fetal DNA, it remains difficult to disentangle the two, primarily because of biological and technical biases. While this bias can partially be mitigated by changing the baseline, and thus contrasting within the individual DNA mixture rather than against a divergent population of mixtures, further improvements are still needed. I present a generalized framework in which the signal-to-noise ratio can be further improved by fully exploiting the information in sequencing data, allowing for more robust predictions at even earlier stages of pregnancy.

The composition of the gut ecosystem can have short- and long-term effects on our health. It is therefore important to understand how it is formed and how a healthy balance can be maintained for as long as possible to preserve our health. To do this, ecosystems must be stratified and compared based on health indices. I show in sharply contrasting Dutch subpopulations that we can obtain valuable characteristics of divergent health states by comparing the gut ecosystems of centenarians with those of Alzheimer's patients. However, significant efforts are required to enable these comparisons due to the many organisms present and the technological limitations in measuring them, both of which introduce bias at all levels. ...

A comprehensive performance analysis of sequence-based within-sample testing NIPT methods

Journal article (2023) - T.O. Mokveld, Z. Al-Ars, Erik A. Sistermans, M.J.T. Reinders

Background

Non-Invasive Prenatal Testing is often performed by utilizing read coverage-based profiles obtained from shallow whole genome sequencing to detect fetal copy number variations. Such screening typically operates on a discretized binned representation of the genome, where (ab)normality of bins of a set size is judged relative to a reference panel of healthy samples. In practice such approaches are too costly given that for each tested sample they require the resequencing of the reference panel to avoid technical bias. Within-sample testing methods utilize the observation that bins on one chromosome can be judged relative to the behavior of similarly behaving bins on other chromosomes, allowing the bins of a sample to be compared among themselves, avoiding technical bias.

Results

We present a comprehensive performance analysis of the within-sample testing method Wisecondor and its variants, using both experimental and simulated data. We introduced alterations to Wisecondor to explicitly address and exploit paired-end sequencing data. Wisecondor was found to yield the most stable results across different bin size scales while producing more robust calls by assigning higher Z-scores at all fetal fraction ranges.

Conclusions

Our findings show that the most recent available version of Wisecondor performs best.
...
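The within-sample idea described in the Background can be sketched as follows. This is a toy illustration and not the Wisecondor implementation: the function names, the squared-distance similarity measure, and the data are assumptions made for the example. For a target bin, reference bins are chosen on other chromosomes based on how similarly they behave across a panel of samples; the target bin of a new sample is then scored against those reference bins within that same sample.

```python
from statistics import mean, stdev


def pick_reference_bins(panel, chroms, target, n_refs):
    """Choose the n_refs bins on other chromosomes that behave most like `target`.

    panel:  list of samples, each a list of normalized per-bin coverage values
    chroms: chroms[i] is the chromosome name of bin i
    """
    def distance(other):
        # Squared Euclidean distance between two bins across the panel.
        return sum((sample[target] - sample[other]) ** 2 for sample in panel)

    candidates = [i for i in range(len(chroms)) if chroms[i] != chroms[target]]
    return sorted(candidates, key=distance)[:n_refs]


def within_sample_z(sample, ref_bins, target):
    """Z-score of the target bin relative to its reference bins in the same sample."""
    values = [sample[i] for i in ref_bins]
    return (sample[target] - mean(values)) / stdev(values)


# Hypothetical toy data: six bins on three chromosomes, three panel samples.
chroms = ["1", "1", "2", "2", "3", "3"]
panel = [
    [1.00, 0.80, 1.01, 0.70, 0.98, 1.30],
    [1.10, 0.75, 1.09, 0.72, 1.12, 1.25],
    [0.90, 0.82, 0.91, 0.69, 0.88, 1.35],
]
refs = pick_reference_bins(panel, chroms, target=0, n_refs=2)

# A new sample in which bin 0 has elevated coverage.
query = [1.30, 0.80, 1.00, 0.70, 1.02, 1.30]
print(refs, within_sample_z(query, refs, target=0))
```

Because the comparison happens inside a single sample, run-specific technical effects that shift all similarly behaving bins together largely cancel out. The toy uses two reference bins only to keep the data small; a spread estimated from so few bins is unreliable.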

WisecondorFF

Improved Fetal Aneuploidy Detection from Shallow WGS through Fragment Length Analysis

Journal article (2022) - Tom Mokveld, Zaid Al-Ars, Erik A. Sistermans, Marcel Reinders

In prenatal diagnostics, NIPT screening utilizing read coverage-based profiles obtained from shallow WGS data is routinely used to detect fetal CNVs. From this same data, fragment size distributions of fetal and maternal DNA fragments can be derived, which are known to differ and are often used to infer fetal fractions. We argue that the fragment size has the potential to aid in the detection of CNVs. By integrating, in parallel, fragment size and read coverage in a within-sample normalization approach, it is possible to construct a reference set encompassing both data types. This reference then allows the detection of CNVs within queried samples, utilizing both data sources. We present a new methodology, WisecondorFF, which improves sensitivity, while maintaining specificity, relative to existing approaches. WisecondorFF increases the robustness of detected CNVs, and can reliably detect them even at lower fetal fractions (<2%). ...
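As a rough illustration of the two data types mentioned above, the sketch below derives a read-coverage count and a fragment-size summary per genomic bin from the same set of paired-end fragments. It does not reproduce how WisecondorFF normalizes or combines these signals, which the abstract does not specify. The function name, the input tuple layout, and the 150 bp cut-off for "short" fragments are assumptions chosen for the example.

```python
from collections import defaultdict


def bin_fragments(fragments, bin_size, short_cutoff=150):
    """Summarize fragments per bin as (fragment count, fraction of short fragments).

    fragments: iterable of (chrom, start, length) tuples, one per sequenced
               fragment, where length is the inferred fragment (insert) size.
    Returns a dict keyed by (chrom, bin index).
    """
    counts = defaultdict(int)
    short = defaultdict(int)
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        counts[key] += 1
        if length < short_cutoff:  # assumed, illustrative cut-off
            short[key] += 1
    return {key: (n, short[key] / n) for key, n in counts.items()}


# Hypothetical fragments: (chromosome, start position, fragment length in bp).
toy_fragments = [
    ("chr1", 10, 140),
    ("chr1", 50, 170),
    ("chr1", 120, 145),
    ("chr2", 5, 166),
]
print(bin_fragments(toy_fragments, bin_size=100))
```

Both per-bin values come from a single pass over the same alignments, which is why adding fragment size as a second signal requires no additional sequencing.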

CHOP

Haplotype-aware path indexing in population graphs

Journal article (2020) - Tom Mokveld, Jasper Linthorst, Zaid Al-Ars, Henne Holstege, Marcel Reinders

The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries on paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets by applying it to a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project. ...
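The haplotype constraint can be illustrated with a small sketch. It is not the CHOP implementation, and the graph encoding and function name are assumptions for the example. Each node carries a sequence and the set of haplotypes that pass through it; a path is extended only while at least one haplotype is shared by all of its nodes, so substrings that no observed haplotype contains are never enumerated.

```python
def haplotype_kmers(nodes, edges, k):
    """Collect all k-mers spelled by paths that at least one haplotype follows.

    nodes: {node_id: (sequence, frozenset of haplotype names)}
    edges: {node_id: [successor node ids]}
    Assumes every node is traversed by at least one haplotype.
    """
    kmers = set()

    def extend(node_id, text, haps):
        if len(text) >= k:
            kmers.add(text[:k])
            return
        for nxt in edges.get(node_id, ()):
            nxt_seq, nxt_haps = nodes[nxt]
            shared = haps & nxt_haps
            if shared:  # prune paths that no haplotype supports
                extend(nxt, text + nxt_seq, shared)

    # Start a k-mer at every position of every node.
    for node_id, (seq, haps) in nodes.items():
        for offset in range(len(seq)):
            extend(node_id, seq[offset:], haps)
    return kmers


# Hypothetical graph with two SNP bubbles and three haplotypes:
# h1 = ACGACGG, h2 = ACGATGG, h3 = ACTATGG
ALL = frozenset({"h1", "h2", "h3"})
toy_nodes = {
    1: ("AC", ALL),
    2: ("G", frozenset({"h1", "h2"})),
    3: ("T", frozenset({"h3"})),
    4: ("A", ALL),
    5: ("C", frozenset({"h1"})),
    6: ("T", frozenset({"h2", "h3"})),
    7: ("GG", ALL),
}
toy_edges = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7]}
print(sorted(haplotype_kmers(toy_nodes, toy_edges, 3)))
```

In this toy graph the path through nodes 3, 4, and 5 spells "TAC", but no haplotype takes both the T allele at the first site and the C allele at the second, so that 3-mer is excluded. An unconstrained enumeration would include it, and such unsupported combinations multiply with every additional variant site within reach of a k-mer.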