Optimizing strains in Metabolic Engineering: comparative analysis of β-Conditional Variational Auto-encoder and Probabilistic PCA for synthetic data generation

Bachelor Thesis (2024)
Author(s)

U.D. Kirbeyi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

P.H. van Lent – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Doruk Kirbeyi
Publication Year
2024
Language
English
Graduation Date
02-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research compares Probabilistic Principal Component Analysis (PPCA) and β-Conditional Variational Auto-encoder (β-CVAE) models for synthetic dataset generation. We analyze how well each model reproduces datasets that mirror the distribution of the original data, which comes from a hypothetical pathway kinetic model based on an E. coli strain, simulated with varied parameter settings within a specified range. Constructing such kinetic models requires significant prior investment in acquiring accurate details about the distinct mechanisms governing each reaction and its parameters, which pushes us to find alternative ways to generate data that guide metabolic engineering processes. This paper looks for a viable option among compression algorithms that reduce dimensionality. The PPCA model captures overarching patterns with commendable fidelity, though we identify areas for refinement in reproducing specific data points. In contrast, the β-CVAE model exhibits higher fidelity, precision, and consistency, positioning it as a robust choice for data generation tasks. This study was constrained by time and by the specificity of the model architectures and the dataset. These limitations underscore the need for continued exploration and refinement in generative modeling. Opportunities lie in refining VAE, CVAE, and β-CVAE models with varied hyperparameters and different architectures, to increase applicability across diverse datasets in metabolic engineering.
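To make the PPCA side of the comparison concrete, the following is a minimal sketch of how a PPCA model can be fitted and then sampled to produce synthetic data. It uses the standard closed-form maximum-likelihood solution for PPCA and is not taken from the thesis; the function names `fit_ppca` and `sample_ppca`, the latent dimension `q`, and the data layout (one sample per row) are assumptions made for illustration.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit (hypothetical helper).

    X is an (n_samples, n_features) array and q < n_features is the
    assumed latent dimension.
    """
    mu = X.mean(axis=0)
    # Sample covariance of the data (features in columns).
    S = np.cov(X, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loading matrix: leading eigenvectors scaled by sqrt(lambda_i - sigma2).
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic samples x = W z + mu + noise (hypothetical helper)."""
    q = W.shape[1]
    z = rng.standard_normal((n, q))  # latent codes z ~ N(0, I)
    noise = np.sqrt(sigma2) * rng.standard_normal((n, mu.shape[0]))
    return z @ W.T + mu + noise
```

Under these assumptions, calling `fit_ppca` on the simulated kinetic-model data and passing the result to `sample_ppca` with a `numpy.random.default_rng` generator would yield synthetic samples whose mean and covariance approximate those of the original data. A β-CVAE replaces the linear map with learned, conditioned neural networks, which is one plausible reason it can reproduce individual data points more faithfully.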

Files

License info not available