Protein Structure and Sequence Co-Design through Graph-Based Generative Diffusion Modeling

Master Thesis (2024)
Author(s)

M.H. Bhuradia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

H. Jamali-Rad – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Amelia Villegas Morcillo – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Marcel J. T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

J.W. Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
24-06-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Proteins are fundamental biological macromolecules essential for cellular structure, enzymatic catalysis, and immune defense, which makes the generation of novel proteins crucial to advances in medicine, biotechnology, and materials science. This study explores protein design with deep generative models, specifically Denoising Diffusion Probabilistic Models (DDPMs). Whereas traditional methods typically design either the protein structure or the sequence in isolation, recent work increasingly favors co-design, which addresses both simultaneously. We propose a novel method that uses Equivariant Graph Neural Networks (EGNNs) within the diffusion framework to co-design protein structures and sequences, and we modify the EGNN architecture to improve its ability to learn complex patterns in the data. Experimental results show that our approach effectively generates high-quality protein sequences, although producing plausible protein backbones and achieving strong sequence-structure correlation remain open challenges.
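The abstract does not describe how the EGNN was modified. The sketch below therefore shows only the standard EGNN layer of Satorras et al. (2021), which is the building block such a denoiser starts from, and it should not be read as the thesis architecture. Several parts of it are assumptions made for illustration. Single random tanh layers stand in for the learned networks `phi_e`, `phi_x`, and `phi_h`. The helper names (`linear`, `random_affine`, `EGNNLayer`, `rotate_z`) are hypothetical. The toy inputs treat node features as one-hot residue types and coordinates as C-alpha positions. The script ends by checking that rotating and translating the input leaves the features unchanged and moves the output coordinates the same way.

```python
import math
import random


def linear(w, b, v):
    """Affine map W v + b, with w given as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def random_affine(n_out, n_in, rng):
    """Random weights standing in for trained parameters (toy assumption)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
    return w, b


class EGNNLayer:
    """One E(n)-equivariant message-passing layer (after Satorras et al., 2021).

    h: list of node feature vectors (here: hypothetical one-hot residue types)
    x: list of 3D coordinates (here: hypothetical C-alpha positions)
    """

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w_e, self.b_e = random_affine(hidden_dim, 2 * feat_dim + 1, rng)  # phi_e
        self.w_x, self.b_x = random_affine(1, hidden_dim, rng)  # phi_x
        self.w_h, self.b_h = random_affine(feat_dim, feat_dim + hidden_dim, rng)  # phi_h

    def __call__(self, h, x):
        n = len(h)
        assert n >= 2, "need at least two nodes"
        new_h, new_x = [], []
        for i in range(n):
            m_i = [0.0] * len(self.b_e)  # aggregated message for node i
            shift = [0.0, 0.0, 0.0]      # accumulated coordinate update for node i
            for j in range(n):
                if j == i:
                    continue
                diff = [a - b for a, b in zip(x[i], x[j])]
                d2 = sum(c * c for c in diff)  # squared distance: E(n)-invariant
                # m_ij = phi_e(h_i, h_j, ||x_i - x_j||^2) sees only invariant inputs
                m_ij = [math.tanh(v) for v in linear(self.w_e, self.b_e, h[i] + h[j] + [d2])]
                # phi_x(m_ij) is a scalar weight on the relative vector x_i - x_j
                w_ij = linear(self.w_x, self.b_x, m_ij)[0]
                shift = [s + c * w_ij for s, c in zip(shift, diff)]
                m_i = [a + b for a, b in zip(m_i, m_ij)]
            # x_i' = x_i + C * sum_j (x_i - x_j) * phi_x(m_ij), with C = 1 / (n - 1)
            new_x.append([xc + s / (n - 1) for xc, s in zip(x[i], shift)])
            # h_i' = h_i + phi_h(h_i, m_i): residual feature update
            upd = linear(self.w_h, self.b_h, h[i] + m_i)
            new_h.append([a + math.tanh(u) for a, u in zip(h[i], upd)])
        return new_h, new_x


def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


# Toy input: four residues with one-hot types over three hypothetical classes.
h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
x = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.5, 0.0], [4.0, 5.0, 3.0]]
layer = EGNNLayer(feat_dim=3, hidden_dim=8)
h_out, x_out = layer(h, x)

# Rotate and translate the input, then run the same layer again.
theta, t = 0.7, [1.0, -2.0, 0.5]
x_moved = [[a + b for a, b in zip(rotate_z(p, theta), t)] for p in x]
h_out2, x_out2 = layer(h, x_moved)

# Features should be invariant; coordinates should transform like the input.
max_h_err = max(abs(a - b) for u, v in zip(h_out, h_out2) for a, b in zip(u, v))
expected_x = [[a + b for a, b in zip(rotate_z(p, theta), t)] for p in x_out]
max_x_err = max(abs(a - b) for u, v in zip(x_out2, expected_x) for a, b in zip(u, v))
print(max_h_err, max_x_err)
```

The messages see coordinates only through squared distances, and coordinates are updated only along relative vectors. As a result, a rotation or translation of the input carries through unchanged to the output. This property is what makes an EGNN a natural candidate for denoising backbone coordinates and residue features jointly in a diffusion model. The same check should hold for any choice of weights.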
