Optimizing strains in Metabolic Engineering: comparative analysis of β-Conditional Variational Auto-encoder and Probabilistic PCA for synthetic data generation

Bachelor Thesis (2024)

Author(s)

U.D. Kirbeyi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Thomas Abeel – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.H. van Lent – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Hanjalic – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Synthetic Data Generation; Generative Models; β-Conditional-VAE; Metabolic Engineering; Strain Optimization

To reference this document use

https://resolver.tudelft.nl/uuid:7f9959a8-2290-4258-b82c-0b80d373a2f7

Publication Year

2024

Language

English

Graduation Date

02-02-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research explores dataset generation through the lens of two dimensionality-reduction models: Probabilistic Principal Component Analysis (PPCA) and the β-Conditional Variational Auto-encoder (β-CVAE). We compare how well each reproduces datasets that mirror the distribution of the original data, which comes from a hypothetical pathway kinetic model based on an E. coli strain, simulated with parameter settings varied within a specified range. Constructing such kinetic models requires a significant prior investment in acquiring accurate details about the distinct mechanism governing each reaction and its parameters, which pushes us to find alternative ways to generate data that can guide metabolic engineering processes. This paper looks for a viable option in compression algorithms that reduce dimensionality. The PPCA model captures the overarching patterns of the data with good fidelity, though it leaves room for improvement in reproducing specific data points. In contrast, the β-CVAE model exhibits higher fidelity, precision, and consistency, making it a robust choice for data generation tasks. This study was constrained by time and by the specificity of both the model architectures and the dataset. These limitations call for continued exploration and refinement of generative models. Opportunities lie in refining VAE, CVAE, and β-CVAE models with varied hyperparameters and different architectures, to increase their applicability across diverse datasets in metabolic engineering.
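To make the PPCA baseline more concrete, the sketch below fits PPCA with the closed-form maximum-likelihood solution of Tipping and Bishop (1999) and then draws synthetic points from the fitted model as x = Wz + μ + ε, with z ~ N(0, I) and ε ~ N(0, σ²I). It is a minimal illustration assuming NumPy and a plain numeric data matrix (rows are samples), not the thesis's implementation; the function names `fit_ppca` and `sample_ppca` and the latent dimension `q` are hypothetical.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA (hypothetical helper, not from the thesis).

    X: array of shape (n_samples, d); q: latent dimension with q < d.
    Returns the mean mu (d,), loading matrix W (d, q) and noise variance sigma2.
    """
    mu = X.mean(axis=0)
    # ML sample covariance: bias=True divides by n rather than n - 1.
    S = np.cov(X, rowvar=False, bias=True)
    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance = average variance in the d - q discarded directions.
    sigma2 = eigvals[q:].mean()
    # W = U_q (Lambda_q - sigma2 I)^(1/2), taking the arbitrary rotation R = I.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic samples x = W z + mu + eps from a fitted PPCA model."""
    q = W.shape[1]
    z = rng.standard_normal((n, q))  # latent codes z ~ N(0, I_q)
    eps = rng.normal(scale=np.sqrt(sigma2), size=(n, mu.shape[0]))  # isotropic noise
    return z @ W.T + mu + eps
```

For example, `mu, W, sigma2 = fit_ppca(X, q=2)` followed by `sample_ppca(mu, W, sigma2, 1000, np.random.default_rng(0))` yields 1000 synthetic rows. The closed form is used here because it needs only one eigendecomposition; an EM-based fit would reach the same maximum-likelihood solution.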

Files

Ugur_Doruk_Kirbeyi_CSE3000_Fin... (pdf)

(pdf | 0.644 Mb)

License info not available