Counting Empirical Cluster Sizes Of Identical COVID-19 Genetic Sequences

Master Thesis (2024)
Author(s)

S.B. van der Niet (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Komjáthy – Mentor (TU Delft - Applied Probability)

G Jongbloed – Graduation committee member (TU Delft - Statistics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2024
Language
English
Graduation Date
02-07-2024
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis aims to improve existing models that infer parameters describing the spread of a virus from the distribution of empirical cluster sizes of identical genetic sequences. A recently popular approach assumes that each individual cluster can be modeled as a Bienaymé-Galton-Watson process, so that the distribution of empirical cluster sizes equals the law of the final size $\widetilde{Y}_\infty$ of the branching process. Using the theory of general branching processes counted by characteristics, we show that the empirical cluster size distribution $C^\alpha$ stochastically dominates $\widetilde{Y}_\infty$ because the branching process grows exponentially. Assuming that the underlying branching tree is either a Bienaymé-Galton-Watson process or an age-dependent process, we show that the mean of the empirical cluster size distribution yields a (strongly) consistent estimator of the mutation probability $\nu$. For both branching models, we compute $P(C^\alpha=n)$ for $n=1,2$. We conjecture that $P(C^\alpha=n)$ does not depend on the underlying model and that it can be expressed as a function of the mean of the offspring distribution $X$ and the probability mass function of $\mathrm{Bin}(X, 1-\nu)$. We also consider an extension of the model in which the mutation probability is sampled from a distribution $\nu$ for each cluster. Under this assumption, we show that the empirical mean of the cluster sizes estimates the quantity $\int \nu^{-1}(r)\,dr$. Finally, we show that $\nu$ can still be estimated by the empirical mean of the cluster sizes when the population is divided into a finite number of types with inhomogeneous offspring distributions.
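The two objects compared in the abstract can be illustrated with a small Python sketch. This is not the thesis's construction, which uses general branching processes counted by characteristics. The sketch assumes a toy setup in which every child independently carries a new mutation with probability $\nu$. Under that assumption, the clonal cluster of a founder is itself a Bienaymé-Galton-Watson tree with offspring law $\mathrm{Bin}(X, 1-\nu)$. The law of its final size $\widetilde{Y}_\infty$ then follows from the hitting-time theorem, $P(\widetilde{Y}_\infty = n) = P(S_n = n-1)/n$. The sketch also grows a supercritical tree generation by generation and records the cluster sizes. In that simulation, the mean cluster size is the number of individuals divided by the number of clusters, and it tends to $1/\nu$. The function names, the offspring law on $\{0,1,2,3\}$, and the generation-based stopping rule are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def thinned_pmf(offspring_pmf, nu):
    """pmf of Bin(X, 1 - nu): number of children that keep the parent's sequence."""
    q = [0.0] * len(offspring_pmf)
    for x, px in enumerate(offspring_pmf):
        for k in range(x + 1):
            q[k] += px * math.comb(x, k) * (1 - nu) ** k * nu ** (x - k)
    return q


def final_size_pmf(q, n_max):
    """P(Y = n) for n = 1..n_max, where Y is the total progeny of a BGW tree with
    offspring pmf q, via the hitting-time theorem P(Y = n) = P(S_n = n - 1) / n,
    with S_n a sum of n i.i.d. draws from q. If the tree is supercritical, the
    returned probabilities sum to less than 1 (mass at Y = infinity)."""
    pmf = {}
    conv = [1.0]  # pmf of S_0 = 0
    for n in range(1, n_max + 1):
        new = [0.0] * (len(conv) + len(q) - 1)
        for i, a in enumerate(conv):
            for j, b in enumerate(q):
                new[i + j] += a * b
        conv = new[:n_max]  # indices >= n_max are never needed later
        pmf[n] = conv[n - 1] / n if n - 1 < len(conv) else 0.0
    return pmf


def simulate_cluster_sizes(offspring_pmf, nu, n_target, rng):
    """Toy model (assumption): grow a BGW tree generation by generation until it
    has at least n_target individuals. Each child mutates with probability nu and
    then starts a new cluster; otherwise it joins its parent's cluster.
    Returns a Counter mapping cluster id -> cluster size."""
    values = list(range(len(offspring_pmf)))
    sizes = Counter({0: 1})  # the root founds cluster 0
    generation = [0]         # cluster ids of the current generation
    next_id, total = 1, 1
    while generation and total < n_target:
        children = []
        for cid in generation:
            k = rng.choices(values, weights=offspring_pmf)[0]
            for _ in range(k):
                if rng.random() < nu:
                    child_cid, next_id = next_id, next_id + 1
                else:
                    child_cid = cid
                sizes[child_cid] += 1
                children.append(child_cid)
        total += len(children)
        generation = children
    return sizes


if __name__ == "__main__":
    X = [0.0, 0.25, 0.5, 0.25]  # hypothetical offspring pmf on {0,1,2,3}, mean 2
    nu = 0.1
    sizes = simulate_cluster_sizes(X, nu, 100_000, random.Random(1))
    mean_size = sum(sizes.values()) / len(sizes)
    print("mean cluster size:", mean_size, "plug-in nu estimate:", 1 / mean_size)
    y = final_size_pmf(thinned_pmf(X, nu), 2)
    print("P(Y=1), P(Y=2):", y[1], y[2])
```

For $n=1,2$, the hitting-time formula reduces to $P(\widetilde{Y}_\infty=1)=q_0$ and $P(\widetilde{Y}_\infty=2)=q_0q_1$, where $q$ is the pmf of $\mathrm{Bin}(X,1-\nu)$. The sketch makes no attempt to compute $P(C^\alpha=n)$, which depends on the characteristic used in the thesis.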
