Counting Empirical Cluster Sizes Of Identical COVID-19 Genetic Sequences

Master Thesis (2024)

Author(s)

S.B. van der Niet (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Komjáthy – Mentor (TU Delft - Applied Probability)

G. Jongbloed – Graduation committee member (TU Delft - Statistics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Branching processes, Malthusian parameter, Infinite alleles model, Reproduction number, Genetic clusters

To reference this document use:

https://resolver.tudelft.nl/uuid:1fd53459-97cd-46fd-86fc-25a1c6a240fc

More Info

Publication Year

2024

Language

English

Graduation Date

02-07-2024

Awarding Institution

Delft University of Technology

Programme

Applied Mathematics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis aims to enhance existing models that infer the parameters governing the spread of a virus from the distribution of empirical cluster sizes of identical genetic sequences. A recently popular approach assumes that each individual cluster can be modeled as a Bienaymé-Galton-Watson process, with the distribution of empirical cluster sizes equal to the law of the final size $\widetilde{Y}_\infty$ of the branching process. By employing the theory of general branching processes counted by characteristics, we demonstrate that the empirical cluster size distribution $C^\alpha$ stochastically dominates $\widetilde{Y}_\infty$ due to the exponential growth of the branching process. Under the assumption that the underlying branching tree follows either a Bienaymé-Galton-Watson process or an age-dependent process, we show that the mean of the empirical cluster size distribution yields a (strongly) consistent estimator of the mutation probability $\nu$. For both branching models, we compute $P(C^\alpha=n)$ for $n=1,2$. We conjecture that $P(C^\alpha=n)$ is independent of the underlying model and that it can be expressed as a function of the mean of the offspring distribution $X$ and the probability mass function of $\mathrm{Bin}(X, 1-\nu)$. We also consider an extension of the model in which the probability of mutation is sampled for each cluster from a distribution $\nu$, and show that under this assumption the empirical mean of the cluster sizes estimates the quantity $\int \nu^{-1}(r) dr$. Finally, we show that $\nu$ can still be estimated by the empirical mean of the cluster sizes when the population is divided into a finite number of types with inhomogeneous offspring distributions.
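The link between the mean cluster size and $\nu$ described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the thesis: offspring counts are Poisson, the tree is observed after a fixed number of generations rather than at a fixed time, and the names `poisson` and `simulate_cluster_sizes` as well as all parameter values are hypothetical. It relies on a simple counting heuristic: every newborn founds a new cluster with probability $\nu$, so on a large tree the number of clusters divided by the number of individuals should be close to $\nu$.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson(mean) variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cluster_sizes(mean_offspring, nu, generations, ancestors=50, seed=0):
    """Grow a branching population and return the sizes of its clusters.

    Assumptions of this sketch (not from the thesis): Poisson offspring,
    observation after a fixed number of generations, and `ancestors`
    initial individuals that each start their own cluster.
    A child keeps its parent's cluster label unless it mutates
    (probability nu), in which case it founds a new cluster.
    """
    rng = random.Random(seed)
    sizes = [1] * ancestors          # sizes[k] = individuals counted in cluster k
    current = list(range(ancestors)) # cluster label of each living individual
    for _ in range(generations):
        next_generation = []
        for label in current:
            for _ in range(poisson(rng, mean_offspring)):
                if rng.random() < nu:
                    sizes.append(1)  # mutation: new cluster of size 1
                    next_generation.append(len(sizes) - 1)
                else:
                    sizes[label] += 1
                    next_generation.append(label)
        current = next_generation
    return sizes


sizes = simulate_cluster_sizes(mean_offspring=2.0, nu=0.3, generations=8, seed=1)
mean_size = sum(sizes) / len(sizes)  # empirical mean cluster size
nu_hat = 1.0 / mean_size             # clusters per individual
print(f"clusters={len(sizes)}, mean size={mean_size:.3f}, nu_hat={nu_hat:.3f}")
```

In this sketch the reciprocal of the empirical mean cluster size serves as the estimate of $\nu$. The ancestors introduce a small upward bias that shrinks as the tree grows.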

Files

Master_thesis_Sjoerd_van_der_N... (pdf)

(pdf | 0.517 MB)

License info not available