Synthetic data generation for the optimization of strains in metabolic engineering using generative adversarial networks

Bachelor Thesis (2024)
Author(s)

M.W. Jarosz (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

P.H. van Lent – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Alan Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Marcin Jarosz
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research investigates the use of Generative Adversarial Networks (GANs) and probabilistic Principal Component Analysis (PPCA) to generate synthetic data for pathway optimization in metabolic engineering. The study compares the two generative models with respect to how they can be applied, the quality of the generated data relative to experimental data, and their overall efficiency. The dataset comprises 5000 parameter configurations of kinetic models that simulate a hypothetical pathway. Constructing kinetic models traditionally requires extensive domain knowledge, a burden that a data-driven approach may alleviate. Results indicate that both models, evaluated with different latent-space sizes, perform well in modeling the underlying latent space of the data. However, GANs with the right set of parameters perform better, as evidenced by lower KL divergence and better visual structure in the generated data. The findings highlight the potential of GANs to outperform probabilistic PCA and offer insights toward more cost-effective and streamlined strain optimization in metabolic engineering. Overall, this research advocates further investigation of GANs' capabilities as a potentially powerful tool for synthetic data generation in metabolic engineering.
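As an illustration of the PPCA baseline and the KL-divergence comparison mentioned above, the following is a minimal sketch and not the thesis implementation. It assumes the closed-form maximum-likelihood PPCA fit and a histogram-based, per-feature KL estimate; the abstract does not specify how KL divergence was computed. The function names (fit_ppca, sample_ppca, marginal_kl) and the toy data are hypothetical, and the toy data is a random placeholder rather than the kinetic-model dataset.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit with q latent dimensions (q < d)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                 # noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic rows: x = W z + mu + noise, with z ~ N(0, I)."""
    z = rng.standard_normal((n, W.shape[1]))
    noise = rng.standard_normal((n, W.shape[0])) * np.sqrt(sigma2)
    return z @ W.T + mu + noise


def marginal_kl(real, synth, bins=30, eps=1e-9):
    """Histogram estimate of KL(real || synth), averaged over features.

    Assumption: only marginal distributions are compared, so dependencies
    between features are ignored.
    """
    kls = []
    for j in range(real.shape[1]):
        lo = min(real[:, j].min(), synth[:, j].min())
        hi = max(real[:, j].max(), synth[:, j].max())
        p, _ = np.histogram(real[:, j], bins=bins, range=(lo, hi))
        q, _ = np.histogram(synth[:, j], bins=bins, range=(lo, hi))
        p = (p + eps) / (p + eps).sum()         # smooth and normalize
        q = (q + eps) / (q + eps).sum()
        kls.append(np.sum(p * np.log(p / q)))
    return float(np.mean(kls))


# Placeholder data: 5000 rows, 6 features with 2-dimensional latent structure.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2)) @ rng.standard_normal((2, 6))
X += 0.1 * rng.standard_normal((5000, 6))

mu, W, sigma2 = fit_ppca(X, q=2)
X_synth = sample_ppca(mu, W, sigma2, n=5000, rng=rng)
print("mean marginal KL(real || synthetic):", marginal_kl(X, X_synth))
```

Under these assumptions, samples from a trained GAN generator could be passed to marginal_kl in place of X_synth, so both models are scored against the same reference data with the same estimator.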
