Synthetic data generation for the optimization of strains in metabolic engineering using generative adversarial networks
M.W. Jarosz (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
P.H. van Lent – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Alan Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Abstract
This research investigates the use of Generative Adversarial Networks (GANs) and probabilistic Principal Component Analysis (PPCA) to generate synthetic data for pathway optimization in metabolic engineering. The study compares the two generative models with respect to how they can be applied, how the quality of the generated data compares to experimental data, and how efficient they are overall. The dataset comprises 5000 parameter configurations of kinetic models that simulate a hypothetical pathway. Constructing kinetic models traditionally requires acquiring complex scientific knowledge, a burden that a data-driven approach may alleviate. Results indicate that both models, evaluated with several latent-space sizes, perform well at modeling the underlying latent structure of the data. However, GANs with a suitable set of parameters perform better, as evidenced by lower KL divergence and superior visual structure in the generated data. The findings highlight the potential of GANs to outperform probabilistic PCA, offering valuable insights for more cost-effective and streamlined strain optimization in metabolic engineering. Overall, this research advocates further investigation of GANs' capabilities in metabolic engineering as a potentially powerful tool for synthetic data generation.
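To make the evaluation described in the abstract more concrete, the following is a minimal sketch of the PPCA side of the comparison: fit PPCA in closed form, draw synthetic samples, and score them with a KL divergence estimate. It is not the code used in the thesis. The toy data stands in for the 5000 kinetic-model parameter configurations, the histogram-based per-feature KL estimator is an assumption (the abstract does not say how KL divergence was computed), and all function names (`fit_ppca`, `sample_ppca`, `histogram_kl`) are hypothetical. A GAN's samples could be scored with the same `histogram_kl` function.

```python
import numpy as np


def fit_ppca(X, n_latent):
    """Closed-form maximum-likelihood PPCA fit; requires n_latent < n_features."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    sigma2 = eigvals[n_latent:].mean()                  # mean discarded variance
    scale = np.sqrt(np.maximum(eigvals[:n_latent] - sigma2, 0.0))
    W = eigvecs[:, :n_latent] * scale                   # loading matrix (d x q)
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n_samples, rng):
    """Draw samples x = mu + W z + noise, with z ~ N(0, I) and isotropic noise."""
    z = rng.standard_normal((n_samples, W.shape[1]))
    noise = np.sqrt(sigma2) * rng.standard_normal((n_samples, W.shape[0]))
    return mu + z @ W.T + noise


def histogram_kl(real, synth, bins=30, eps=1e-9):
    """Mean per-feature KL(real || synth) from histograms on shared bins.

    Assumption: a simple marginal estimator chosen for illustration only.
    """
    kls = []
    for j in range(real.shape[1]):
        lo = min(real[:, j].min(), synth[:, j].min())
        hi = max(real[:, j].max(), synth[:, j].max())
        p_hist, _ = np.histogram(real[:, j], bins=bins, range=(lo, hi))
        q_hist, _ = np.histogram(synth[:, j], bins=bins, range=(lo, hi))
        p = (p_hist + eps) / (p_hist + eps).sum()       # smoothed probabilities
        q = (q_hist + eps) / (q_hist + eps).sum()
        kls.append(np.sum(p * np.log(p / q)))
    return float(np.mean(kls))


# Toy stand-in for the dataset: 5000 samples, 6 features, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((5000, 2))
mixing = rng.standard_normal((2, 6))
real = latent @ mixing + 0.1 * rng.standard_normal((5000, 6))

mu, W, sigma2 = fit_ppca(real, n_latent=2)
synth = sample_ppca(mu, W, sigma2, n_samples=5000, rng=rng)
kl = histogram_kl(real, synth)
print(f"mean per-feature KL(real || synthetic): {kl:.4f}")
```

Varying `n_latent` in this sketch mirrors the abstract's comparison across latent-space sizes: a lower KL value indicates synthetic marginals that sit closer to the real ones, although a marginal estimator like this one does not capture dependencies between features.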