G.J. Admiraal

3 records found

Advancing protein design is crucial for breakthroughs in medicine and biotechnology. Traditional approaches for protein sequence representation often rely solely on the 20 canonical amino acids, limiting the representation of non-canonical amino acids and residues that undergo post-translational modifications. This work explores discrete diffusion models for generating novel protein sequences using the all-atom chemical representation SELFIES. By encoding the atomic composition of each amino acid in the protein, this approach expands the design possibilities beyond standard sequence representations. Using a modified ByteNet architecture within the discrete diffusion D3PM framework, we evaluate the impact of this all-atom representation on protein quality, diversity, and novelty, compared to conventional amino acid-based models. To this end, we develop a comprehensive assessment pipeline to determine whether generated SELFIES sequences translate into valid proteins containing both canonical and non-canonical amino acids. Additionally, we examine the influence of two noise schedules within the diffusion process—uniform (random replacement of tokens) and absorbing (progressive masking)—on generation performance. While models trained on the all-atom representation struggle to consistently generate fully valid proteins, the successfully generated proteins show improved novelty and diversity compared to their amino acid-based model counterparts. Furthermore, the all-atom representation achieves structural foldability results comparable to those of amino acid-based models. Lastly, our results highlight the absorbing noise schedule as the most effective for both representations. Data and code are available at https://github.com/Intelligent-molecular-systems/All-Atom-Protein-Sequence-Generation. ...
Advancing protein design is crucial for breakthroughs in medicine and biotechnology, yet traditional approaches often fall short by representing protein sequences using only the 20 canonical amino acids. This thesis explores discrete diffusion models for generating novel protein sequences with an all-atom representation, specifically SELFIES, a widely used molecular string representation. This all-atom approach considers the atomic composition of each amino acid in the protein, enabling the inclusion of non-canonical amino acids and post-translational modifications. Using a modified ByteNet architecture and the D3PM framework, we compare the effects of this all-atom representation with those of the standard amino acid representation on the quality, diversity, and novelty of the generated proteins. Additionally, we examine how a uniform or absorbing noise process affects the results. While models trained on the all-atom representation struggle to generate fully valid proteins consistently, the proteins that are successfully generated show improved novelty and diversity. Moreover, the all-atom representation achieves structural reliability results from OmegaFold comparable to those of the amino acid models. Lastly, our results show that an absorbing noise schedule is the most effective for both the all-atom and amino acid representations. ...
Bachelor thesis (2022) - G.J. Admiraal, G. Iosifidis, N. Mhaisen
The advent of wireless networks such as content distribution networks and edge computing networks calls for more effective online caching policies. Traditional policies lose performance because these new networks face highly non-stationary requests and frequent popularity shifts. Consequently, a framework called Online Convex Optimization (OCO), which makes no assumptions about the request pattern, has recently been used to tackle the online caching problem. Moreover, in many practical scenarios, a request prediction of unknown quality is available. This paper leverages such predictions and proposes a new online caching policy that uses them. The policy uses the Optimistic Online Mirror Descent (OOMD) algorithm to solve the OCO problem. Even if the predictions are inaccurate, the policy still obtains the same regret bound as its non-optimistic counterpart, up to a constant. The performance of the proposed policy is evaluated and compared with previous OCO-based policies using trace-driven numerical tests. ...