Counting Empirical Cluster Sizes Of Identical COVID-19 Genetic Sequences
S.B. van der Niet (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. Komjáthy – Mentor (TU Delft - Applied Probability)
G. Jongbloed – Graduation committee member (TU Delft - Statistics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This thesis aims to improve existing models that infer parameters describing the spread of a virus from the distribution of empirical cluster sizes of identical genetic sequences. A recently popular approach assumes that each cluster can be modeled as a Bienaymé-Galton-Watson process, with the distribution of empirical cluster sizes equal to the law of the final size $\widetilde{Y}_\infty$ of the branching process. Using the theory of general branching processes counted by characteristics, we demonstrate that the empirical cluster size distribution $C^\alpha$ stochastically dominates $\widetilde{Y}_\infty$ due to the exponential growth of the branching process. Under the assumption that the underlying branching tree follows either a Bienaymé-Galton-Watson process or an age-dependent process, we show that the mean of the empirical cluster size distribution yields a (strongly) consistent estimator of the probability of mutation $\nu$. For both branching models, we compute $P(C^\alpha=n)$ for $n=1,2$. We conjecture that $P(C^\alpha=n)$ is independent of the underlying model and that it can be expressed as a function of the mean of the offspring distribution $X$ and the probability mass function of $\mathrm{Bin}(X, 1-\nu)$. We also consider an extension of the model in which the probability of mutation is sampled from a distribution $\nu$ for each cluster. We show that under this assumption the empirical mean of the cluster sizes estimates the quantity $\int \nu^{-1}(r)\,\mathrm{d}r$. Finally, we show that $\nu$ can still be estimated by the empirical mean of the cluster sizes when the population is divided into a finite number of types with inhomogeneous offspring distributions.
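As a rough illustration of why the mean cluster size carries information about $\nu$, the following Python sketch simulates a toy Bienaymé-Galton-Watson tree in which every newborn independently mutates with probability `nu` and thereby founds a new cluster. The offspring weights, the number of generations, the pooling over several independent trees, and the choice to count every individual born up to a fixed generation are all assumptions made for this sketch; they are not taken from the thesis, and its definition of $C^\alpha$ may differ. In this toy setting, the number of clusters equals the number of roots plus the number of mutants, so the mean cluster size is close to $1/\nu$ once the population is large.

```python
import random
from collections import Counter


def simulate_cluster_sizes(nu, offspring_weights, generations, rng):
    """Simulate one toy BGW tree and return the list of its cluster sizes.

    Assumption of this sketch: offspring_weights[k] is the weight of
    having k children, and all individuals born up to `generations`
    are observed.
    """
    sizes = Counter({0: 1})      # the root founds cluster 0
    current = [0]                # cluster ids of the current generation
    next_id = 1
    values = range(len(offspring_weights))
    for _ in range(generations):
        following = []
        for cluster in current:
            k = rng.choices(values, weights=offspring_weights)[0]
            for _ in range(k):
                if rng.random() < nu:
                    child = next_id      # mutation: a new cluster starts
                    next_id += 1
                else:
                    child = cluster      # identical sequence: same cluster
                sizes[child] += 1
                following.append(child)
        current = following
        if not current:                  # the tree went extinct
            break
    return list(sizes.values())


def estimate_nu(nu, offspring_weights, generations, trees, seed):
    """Pool clusters over independent trees; return (nu_hat, mean size)."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(trees):
        pooled.extend(
            simulate_cluster_sizes(nu, offspring_weights, generations, rng)
        )
    mean_size = sum(pooled) / len(pooled)
    return 1.0 / mean_size, mean_size


if __name__ == "__main__":
    # Hypothetical parameters: offspring mean 1.5, true nu = 0.2.
    nu_hat, mean_size = estimate_nu(0.2, [0.2, 0.3, 0.3, 0.2], 18, 20, seed=1)
    print(f"mean cluster size: {mean_size:.3f}, estimated nu: {nu_hat:.3f}")
```

Pooling several trees is only a convenience to reduce the influence of trees that die out early; the estimate `1 / mean_size` is simply the fraction of observed individuals that founded a cluster.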