Optimizing strains in Metabolic Engineering: comparative analysis of β-Conditional Variational Auto-encoder and Probabilistic PCA for synthetic data generation

Abstract

This research compares Probabilistic Principal Component Analysis (PPCA) and the β-Conditional Variational Auto-encoder (β-CVAE) as generators of synthetic datasets. We assess how well each model reproduces the distribution of the original data, which come from a hypothetical pathway kinetic model based on an E. coli strain, simulated with varied parameter settings within a specified range. Constructing such kinetic models requires significant prior investment in acquiring accurate details about the distinct mechanisms governing each reaction and its parameters, which pushes us to find alternative ways to generate data that guide metabolic engineering processes. This paper looks for a viable option in compression algorithms that reduce dimensionality. The PPCA model captures overarching patterns with good fidelity, although it leaves room for improvement in reproducing specific data points. In contrast, the β-CVAE model exhibits higher fidelity, precision, and consistency, positioning it as a robust choice for data generation tasks. This study was constrained by time and by the specificity of the model architectures and the dataset. These limitations underscore the need for continued exploration and refinement in generative modeling. Opportunities lie in refining VAE, CVAE, and β-CVAE models with varied hyperparameters and different architectures, to increase applicability across diverse datasets in metabolic engineering.
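
For readers unfamiliar with the β-weighted objective that gives β-VAE-style models their name, the following is a minimal sketch of its usual form: a reconstruction term plus β times the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The abstract does not state the loss used in this study, so the squared-error reconstruction term, the function names, and the default β value are illustrative assumptions. In a conditional model, the encoder and decoder would additionally receive a condition vector; the loss itself keeps this shape.

```python
import math

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Squared-error reconstruction plus beta-weighted KL term.

    Assumptions (not taken from the paper): squared-error reconstruction,
    diagonal Gaussian posterior, standard normal prior, beta=4.0 default.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)

# Hypothetical values, for illustration only.
loss = beta_vae_loss(x=[1.0, 2.0], x_hat=[1.0, 1.0],
                     mu=[1.0, 0.0], log_var=[0.0, 0.0], beta=2.0)
print(loss)
```

Setting β to 1 recovers the standard VAE objective; larger values put more weight on keeping the latent distribution close to the prior, at some cost in reconstruction accuracy.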