Synthetic data generation for the optimization of strains in metabolic engineering using latent space representations derived from a Conditional Variational Autoencoder

Bachelor Thesis (2024)

Author(s)

N.M. Alwani (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.E.P.M.F. Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

P.H. van Lent – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Combinatorial Optimization; Metabolic Flux Analysis; Variational Autoencoder (VAE); Probabilistic analysis; Principal Component Analysis (PCA)

To reference this document use:

https://resolver.tudelft.nl/uuid:0f0fbe65-257d-491d-9fe5-5c1b3864dfd4

More Info

Publication Year

2024

Language

English

Graduation Date

02-02-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study investigates the application of generative models to synthetic data generation for pathway optimization experiments in metabolic engineering. Conditional Variational Autoencoders (CVAEs) use neural networks and latent variable distributions to generate new, plausible data samples. We adapt this model by conditioning the training process on the target flux to improve performance.
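To make the conditioning step concrete, here is a minimal sketch of the data flow in a CVAE, assuming a simple design in which the condition (here, a scalar target flux) is concatenated both to the encoder input and to the latent code before decoding. The layer sizes, the `tanh` activations, and all function names are hypothetical, and the weights are random and untrained: the block only illustrates shapes, the reparameterization trick, and how new samples are generated for a chosen target flux, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D features per pathway configuration, C = 1 condition
# (target flux), K latent dimensions, H hidden units.
D, C, K, H = 8, 1, 2, 16

def linear(n_in, n_out):
    """Randomly initialised (untrained) affine layer, returned as (W, b)."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

enc_hidden = linear(D + C, H)
enc_mu, enc_logvar = linear(H, K), linear(H, K)
dec_hidden = linear(K + C, H)
dec_out = linear(H, D)

def dense(x, layer, act=None):
    W, b = layer
    out = x @ W + b
    return np.tanh(out) if act == "tanh" else out

def encode(x, y):
    # The condition y is concatenated to the input before encoding.
    h = dense(np.concatenate([x, y], axis=1), enc_hidden, "tanh")
    return dense(h, enc_mu), dense(h, enc_logvar)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y):
    # The same condition is concatenated to the latent code before decoding.
    h = dense(np.concatenate([z, y], axis=1), dec_hidden, "tanh")
    return dense(h, dec_out)

def generate(target_flux, n):
    """Draw n synthetic configurations conditioned on one target flux value."""
    z = rng.standard_normal((n, K))          # sample from the prior N(0, I)
    y = np.full((n, C), target_flux)
    return decode(z, y)

# Toy forward pass on random inputs, then conditional generation.
x = rng.random((4, D))
y = rng.random((4, C))
mu, logvar = encode(x, y)
x_hat = decode(reparameterize(mu, logvar), y)
samples = generate(0.8, 5)
```

Concatenation is the simplest way to condition both networks; once trained, fixing `y` at a desired flux and varying only `z` is what lets the decoder propose new configurations aimed at that flux.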

Additionally, Probabilistic Principal Component Analysis (PPCA) was selected as a baseline model for comparison. Both models are used to learn the underlying latent space, in order to test the hypothesis that a type of Variational Autoencoder (VAE) can learn a reduced-dimensional latent space for configurations of a kinetic pathway model. A dataset of 5000 hypothetical configurations of a kinetic pathway model was used to extract relationships between the elements of the pathway.
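As a reference point for the baseline, the sketch below fits PPCA with its standard closed-form maximum-likelihood solution (Tipping and Bishop): the noise variance is the mean of the discarded covariance eigenvalues, and the loading matrix is built from the leading eigenvectors. New samples are then drawn as x = Wz + μ + ε. The function names are hypothetical, and the toy data (5000 rows, to mirror the dataset size only) is random and does not represent the kinetic pathway configurations used in the thesis.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA with q latent dimensions."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # sample covariance (1/N)
    eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    d = X.shape[1]
    # Noise variance: average of the d - q discarded eigenvalues.
    sigma2 = max(eigvals[q:].mean(), 0.0) if q < d else 0.0
    # Loadings W = U_q (Lambda_q - sigma2 I)^(1/2), taking the rotation R = I.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def sample_ppca(mu, W, sigma2, n, rng):
    """Generate x = W z + mu + eps with z ~ N(0, I_q), eps ~ N(0, sigma2 I_d)."""
    z = rng.standard_normal((n, W.shape[1]))
    eps = rng.normal(0.0, np.sqrt(sigma2), (n, W.shape[0]))
    return z @ W.T + mu + eps

# Toy data: a 3-dimensional signal embedded in 10 dimensions plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3)) @ rng.normal(size=(3, 10))
X += 0.1 * rng.normal(size=(5000, 10))

mu, W, sigma2 = fit_ppca(X, q=3)
X_syn = sample_ppca(mu, W, sigma2, 1000, rng)
```

This also illustrates why the latent size matters for PPCA: if `q` is smaller than the true signal dimension, the leftover structure is absorbed into the isotropic noise term `sigma2` instead of being modeled.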

The results indicate that PPCA can model the underlying distribution of the dataset when the latent space is large enough. However, the standard CVAE may struggle to capture the underlying distribution, resulting in an entangled latent space. The study suggests that a $\beta$-CVAE, which weights the regularization (KL-divergence) term of the objective function by a factor $\beta$, could achieve a better balance between reconstruction and regularization during training, offering improved prospects for generating cost-efficient kinetic pathways for combinatorial pathway optimization experiments.
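The balance mentioned above can be written as a loss of the form reconstruction + β · KL. The sketch below assumes a squared-error reconstruction term and a diagonal Gaussian posterior measured against a standard normal prior, for which the KL divergence has the closed form −½ Σ (1 + log σ² − μ² − σ²). Setting β = 1 recovers the ordinary (C)VAE objective; β > 1 puts more weight on matching the prior, which is the mechanism associated with more disentangled latent spaces in β-VAE. The reconstruction term and function names are assumptions for illustration, not the thesis's exact objective.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)

def beta_cvae_loss(x, x_hat, mu, logvar, beta):
    """Batch mean of squared-error reconstruction plus beta-weighted KL."""
    recon = np.sum((x - x_hat) ** 2, axis=1)   # assumed Gaussian likelihood
    kl = kl_to_standard_normal(mu, logvar)
    return np.mean(recon + beta * kl)

# Perfect reconstruction, posterior mean shifted by 1 in one latent dimension.
x = np.array([[0.2, 0.5, 0.1]])
mu = np.array([[1.0, 0.0]])
logvar = np.zeros((1, 2))
loss_standard = beta_cvae_loss(x, x, mu, logvar, beta=1.0)
loss_beta4 = beta_cvae_loss(x, x, mu, logvar, beta=4.0)
```

Because the conditioning variable enters only through the encoder and decoder inputs, the same loss applies unchanged to the conditional model; only β needs tuning.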

Files

RP_paper_Final_3_.pdf

(pdf | 0.937 Mb)

License info not available