Synthetic data generation for the optimization of strains in metabolic engineering using generative adversarial networks

Abstract

This research investigates the use of Generative Adversarial Networks (GANs) and probabilistic Principal Component Analysis (PPCA) to generate synthetic data for pathway optimization in metabolic engineering. The study compares the two generative models, addressing how they can be applied, how the quality of the generated data compares with that of experimental data, and how efficient each approach is. The dataset comprises 5000 parameter configurations of kinetic models that simulate a hypothetical pathway. Constructing kinetic models traditionally requires acquiring complex scientific knowledge, a burden that a data-driven approach may alleviate. Results indicate that both models, tested with latent spaces of different sizes, perform well in modeling the underlying latent structure of the data. However, GANs with a suitable set of parameters perform better, as evidenced by lower Kullback-Leibler (KL) divergence and better visual structure in the generated data. The findings highlight the potential of GANs to outperform probabilistic PCA, offering valuable insights for more cost-effective and streamlined strain optimization in metabolic engineering. Overall, this research advocates further investigation of GANs' capabilities in metabolic engineering as a potentially powerful tool for synthetic data generation.
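
To make the comparison criterion concrete, the following is a minimal sketch of how a PPCA baseline could generate synthetic samples and how those samples could be scored against the original data with a KL divergence estimate. It is an illustration under stated assumptions, not the procedure used in the study: the stand-in data matrix `X`, the latent size `q=2`, the helper names `fit_ppca`, `sample_ppca`, and `histogram_kl`, and the histogram-based per-feature KL estimator are all hypothetical choices. The PPCA fit uses the standard closed-form maximum-likelihood solution; the GAN side is omitted for brevity.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA with q latent dimensions (q < n_features)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                      # noise variance: mean discarded eigenvalue
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic rows from x = W z + mu + noise, with z ~ N(0, I)."""
    z = rng.standard_normal((n, W.shape[1]))
    noise = np.sqrt(sigma2) * rng.standard_normal((n, W.shape[0]))
    return mu + z @ W.T + noise

def histogram_kl(real, synth, bins=30, eps=1e-9):
    """Mean per-feature KL(real || synth), estimated from histograms on shared bins."""
    kls = []
    for j in range(real.shape[1]):
        lo = min(real[:, j].min(), synth[:, j].min())
        hi = max(real[:, j].max(), synth[:, j].max())
        p, _ = np.histogram(real[:, j], bins=bins, range=(lo, hi))
        s, _ = np.histogram(synth[:, j], bins=bins, range=(lo, hi))
        p = (p + eps) / (p + eps).sum()              # smooth empty bins, then normalize
        s = (s + eps) / (s + eps).sum()
        kls.append(np.sum(p * np.log(p / s)))
    return float(np.mean(kls))

rng = np.random.default_rng(0)

# Stand-in data (assumption): 5000 rows with 6 features driven by 2 latent factors.
# This is NOT the kinetic-model dataset described in the abstract.
latent = rng.standard_normal((5000, 2))
mixing = rng.standard_normal((2, 6))
X = latent @ mixing + 0.1 * rng.standard_normal((5000, 6))

mu, W, sigma2 = fit_ppca(X, q=2)
X_synth = sample_ppca(mu, W, sigma2, n=5000, rng=rng)
kl = histogram_kl(X, X_synth)
print(f"mean per-feature KL(real || synthetic): {kl:.4f}")
```

In a setup like this, samples produced by a trained GAN generator could be passed to the same `histogram_kl` function, so that both models are scored on an equal footing; a lower value indicates generated marginals closer to those of the original data. A per-feature estimate ignores dependencies between features, which is one reason a visual inspection of the generated data remains a useful complement.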