Evgeny A. Pidko
149 records found
ConforFormer
Representation for molecules through understanding of conformers
Molecular properties of chemical compounds are governed not by a single unique arrangement of atoms (2D molecular graph) but by ensembles of three-dimensional conformers, yet most molecular representations for machine learning approaches either ignore conformational diversity or use it implicitly to augment molecular graphs. Here we introduce ConforFormer, a geometry-first foundation model capable of learning conformation-robust molecular embeddings directly from the 3D atomic coordinates. By aligning representations across multiple conformers of the same molecule through a novel contrastive objective, ConforFormer produces compact, task-agnostic embeddings that can be generated once and directly applied to downstream tasks, including property prediction and structural similarity, without extensive fine-tuning. Across a range of quantum-chemical and bioactivity benchmarks, these frozen embeddings achieve competitive performance without task-specific fine-tuning, while offering improved stability on small datasets. Beyond property prediction, the learned embedding space allows molecular conformers and isomers to be discriminated with high precision, substantially outperforming classical fingerprint-based similarity measures. This implies that explicit exposure to conformational relationships induces representations that generalize beyond the conformer recognition task itself, capturing chemically meaningful structural constraints directly from 3D geometries. More broadly, our results suggest that incorporating conformation-awareness as a foundational learning task provides a fundamental route towards transferable, geometry-centered molecular representations particularly relevant for complex chemical systems, where conventional graph-based representations are ambiguous or ill-defined.
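The abstract does not specify the contrastive objective, so the following is only a generic illustration of the idea of aligning embeddings of conformers of the same molecule. It uses a standard InfoNCE-style loss, which is a stand-in chosen here and not the ConforFormer objective; the function names, the cosine similarity, and the temperature value are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (assumed stand-in, not the paper's objective).

    anchors[i] and positives[i] are embeddings of two different conformers
    of molecule i; conformers of the other molecules act as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching conformer.
        total += log_denominator - logits[i]
    return total / n
```

The loss is small when each conformer embedding is closest to the other conformer of its own molecule and grows when conformers of different molecules are confused.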
Machine Learning Interatomic Potentials (MLIPs) promise to transform computational catalysis by delivering near-density functional theory (DFT) accuracy at a fraction of the computational cost. Here, we evaluate the Universal Machine Learning Potential for Atoms (UMA) on two data sets of transition-metal complexes. UMA enables high-throughput evaluations in seconds per structure on consumer-grade GPUs. Analysis of per-ligand Spearman rank correlations (ρ > 0.6, p < 0.05) reveals variability in ranking reliability that is not captured by aggregate metrics such as R² or RMSE. However, these inaccuracies are shown to mainly occur in the near-DFT accuracy regime where these complexes are practically indistinguishable. For square-planar Ni complexes, reliable rankings are obtained for 84% of ligands in rigid Ni–Cl2 complexes and drop to 53% for flexible asymmetric coordination environments, particularly when conformers differ by <2 kJ/mol. Data set 2 shows a similar trend, with 61% and 44% reliability for Ru(II) and Mn(I) complexes, respectively, and, as expected, challenges for fluxional systems with small (<5 kJ/mol) relative energy gaps. These findings highlight the promise of MLIPs for both rigid, well-defined systems and highly flexible or fluxional catalysts, while underscoring the need to combine the speed of ML with validation and domain expertise to ensure robust and meaningful chemical insights.
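As a rough illustration of the per-ligand analysis described above, the sketch below computes a Spearman rank correlation between reference and MLIP energies for each ligand and reports the fraction of ligands above a ρ threshold. It is a minimal pure-Python version under stated assumptions: the function names and data layout are hypothetical, and the p < 0.05 significance filter mentioned in the abstract is not implemented here.

```python
from statistics import mean

def rankdata(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # ranking undefined when all values are equal
    return cov / (var_x * var_y) ** 0.5

def reliable_fraction(per_ligand, threshold=0.6):
    """per_ligand maps a ligand name to (reference_energies, mlip_energies).

    Returns the per-ligand rho values and the fraction of ligands whose
    rho exceeds the threshold (significance testing is omitted).
    """
    rhos = {name: spearman_rho(ref, ml) for name, (ref, ml) in per_ligand.items()}
    n_reliable = sum(1 for rho in rhos.values() if rho > threshold)
    return rhos, n_reliable / len(rhos)
```

Reporting the fraction of ligands with a reliable ranking, rather than one pooled R² or RMSE, is what exposes ligand-to-ligand variability of the kind the abstract describes.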
Transesterification reactions are fundamental transformations in organic chemistry, yet performing them in aqueous media is challenging because of the competing hydrolysis reaction. In this study, we describe a mutant of alcohol oxidase from Phanerochaete chrysosporium (PcAOx-VPN) that also exhibits transesterification activity. Moreover, PcAOx-VPN displays no detectable hydrolytic activity, owing to its hydrophobic active site, which effectively excludes water. These characteristics make PcAOx-VPN a promising catalyst for transesterification reactions in aqueous media, a context that is typically compromised by competing hydrolysis.
Traditional heterogeneous catalysis is constrained by kinetic and thermodynamic limits, such as the Sabatier principle and reaction equilibrium. Dynamic and resonant catalysts hold promise to overcome these limitations by actively oscillating a catalyst’s physical or electronic structure at the time scale of the catalytic cycle, allowing programmable control over reaction pathways, and leading to improved rate and selectivity. External stimuli such as temperature swing, mechanical strain, electric charge, and light can perturb catalyst surfaces in different ways, altering adsorbate coverage, binding energies, and transition states beyond what steady-state catalysis allows. This work surveys the current state of dynamic catalysis, introduces the concept of “stimulando” characterization for observing transient dynamics, and outlines key modeling, mechanistic, and benchmarking strategies to advance the field toward improved chemical transformation.
From Plastic Waste to Pharmaceutical Precursors
PET Upcycling Through Ruthenium Catalyzed Semi-Hydrogenation
The Gutmann–Beckett method involves the reaction of a phosphine oxide with a Lewis acid, followed by measurement of the change in ³¹P NMR chemical shift (Δδ) relative to the free phosphine oxide. This is the most commonly used experimental method to assess Lewis acid strength in solution and on solid materials containing Lewis acid sites. This study describes the origin of the ³¹P NMR Δδ deshielding that occurs in triethylphosphine oxide (TEPO) adducts of Lewis acids. Fifty-seven Lewis acid adducts were studied using DFT methods. These models span typical three-, four-, and five-coordinate Lewis acids as well as models that approximate the coordination sphere of Lewis acid sites proposed to be present in heterogeneous materials. When a TEPO···Lewis acid adduct forms, electron density from the oxygen is transferred to the Lewis acid, which reduces the negative hyperconjugation from the oxygen to the σ*(P–C) that weakens the P═O bond. Experimental and DFT studies show that the ³¹P NMR chemical shift deshields in TEPO···Lewis acid adducts because the most shielded δ₃₃ component of the chemical shift tensor shifts dramatically downfield. This deshielding is correlated with the weakening of the P═O bond. Natural chemical shift (NCS) analysis shows that δ₃₃ deshielding in Lewis acid adducts is due to coupling of the filled σ(P–C) with the empty π*(P═O), the LUMO of the TEPO fragment. This study connects the ³¹P NMR chemical shift, in particular the experimentally observable Δδ₃₃, to P═O bond weakening. Thus, the Gutmann–Beckett method does not provide information on adduct formation energy, the more typically sought measure of Lewis acidity, but rather provides a different thermodynamic descriptor of Lewis acid strength in the weakening of the P═O bond.
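The arithmetic behind the measurement described above can be sketched as follows. The Δδ definition follows the text; the acceptor-number conversion is the conventional Gutmann–Beckett scaling (2.21 and the 41.0 ppm reference for Et3PO in hexane), which is not stated in the abstract and is included here only as background. The function names are hypothetical.

```python
def delta_delta(shift_adduct_ppm, shift_free_ppm):
    """Change in 31P chemical shift on adduct formation, in ppm.

    Positive values mean the phosphorus nucleus is deshielded in the adduct.
    """
    return shift_adduct_ppm - shift_free_ppm

def acceptor_number(shift_adduct_ppm, reference_ppm=41.0, scale=2.21):
    """Conventional acceptor number from the 31P shift of the Et3PO adduct.

    Assumes the usual convention AN = 2.21 * (delta_sample - 41.0); this
    scaling is background knowledge, not taken from the abstract.
    """
    return scale * (shift_adduct_ppm - reference_ppm)
```

Both quantities are derived from the same observable, so neither is a direct measure of adduct formation energy, which is the distinction the abstract draws.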
Conversion of Polypropylene to Light Olefins by HMFI Catalysts Below Pyrolytic Temperature
Catalytic, Spectroscopic, and Theoretical Studies
Catalysis Science & Technology, Evgeny Pidko and Núria López would like to acknowledge Weixue Li for their contributions to the Digital Catalysis themed collection as a Guest Editor.
Transition-metal complexes serve as highly enantioselective homogeneous catalysts for various transformations, making them valuable in the pharmaceutical industry. Data-driven prediction models can accelerate high-throughput catalyst design but require computer-readable representations that account for conformational flexibility. This is typically achieved through high-level conformer searches, followed by DFT optimization of the transition-metal complexes. However, conformer selection remains reliant on human assumptions, with no cost-efficient and generalizable workflow available. To address this, we introduce an automated approach to correlate CREST(GFN2-xTB//GFN-FF)-generated conformer ensembles with their DFT-optimized counterparts for systematic conformer selection. We analyzed 24 precatalyst structures, performing CREST conformer searches, followed by full DFT optimization. Three filtering methods were evaluated: (i) geometric ligand descriptors, (ii) PCA-based selection, and (iii) DBSCAN clustering using RMSD and energy. The proposed methods were validated on Rh-based catalysts featuring bisphosphine ligands, which are widely employed in hydrogenation reactions. To assess general applicability, both the precatalyst and its corresponding acrylate-bound complex were analyzed. Our results confirm that CREST overestimates ligand flexibility, and energy-based filtering is ineffective. PCA-based selection failed to distinguish conformers by DFT energy, while RMSD-based filtering improved selection but lacked tunability. DBSCAN clustering provided the most effective approach, eliminating redundancies while preserving key configurations. This method remained robust across data sets and is computationally efficient without requiring molecular descriptor calculations. These findings highlight the limitations of energy-based filtering and the advantages of structure-based approaches for conformer selection. 
While DBSCAN clustering is a practical solution, its parameters remain system-dependent. For high-accuracy applications, refined energy calculations may be necessary; however, DBSCAN-based clustering offers a computationally accessible strategy for rapid catalyst representations involving conformational flexibility.
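To make the clustering step concrete, here is a minimal pure-Python sketch of DBSCAN on a precomputed pairwise RMSD matrix, followed by keeping the lowest-energy conformer of each cluster. It is an illustration under assumptions, not the authors' workflow: the abstract clusters on RMSD and energy, whereas this sketch clusters on RMSD only and uses energy to pick representatives, and the function names and the `eps` and `min_samples` parameters are hypothetical choices.

```python
def dbscan_precomputed(dist, eps, min_samples):
    """DBSCAN on a symmetric distance matrix; noise points get label -1.

    A point's neighborhood includes the point itself, so min_samples
    counts the point plus its neighbors within eps.
    """
    n = len(dist)
    unvisited, noise = -2, -1
    labels = [unvisited] * n
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] != unvisited:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] != unvisited:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

def select_representatives(labels, energies):
    """Keep the lowest-energy member of each cluster plus every noise point."""
    best = {}
    kept = []
    for index, (label, energy) in enumerate(zip(labels, energies)):
        if label == -1:
            kept.append(index)  # unclustered conformers are kept as unique
        elif label not in best or energy < energies[best[label]]:
            best[label] = index
    return sorted(kept + list(best.values()))
```

Here `eps` plays the role of the RMSD tolerance below which two conformers count as redundant, which is one way the system dependence of the parameters noted above shows up in practice.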
This chapter provides a brief introduction to computational chemistry in the context of zeolite research, emphasizing the capabilities and limitations of modern theoretical models for investigating their reactivity and chemical properties under operando conditions. A brief overview of the computational chemistry toolbox is given, followed by a discussion of state-of-the-art applications in zeolite chemistry and catalysis. This chapter also highlights the increasing impact of data-driven techniques, such as machine-learning potentials, in advancing computational methods in zeolite studies.
Chemical recycling of polyolefins, represented by polyethylene (PE) and polypropylene (PP), via catalytic cracking has emerged as a promising strategy for converting waste plastics into valuable hydrocarbons. In this study, we investigated the selective hydrocracking of PP into light alkanes (C1–C5) using zeolite catalysts at 280 °C under 1 MPa H2. An HMFI zeolite with high Al content exhibited the best catalytic performance among the zeolite catalysts tested. In situ DRIFTS comparing bare HMFI and externally silylated HMFI suggested that the Brønsted acid sites on the external surface serve as the active sites for the cracking of PP. A combination of in situ DRIFTS and UV–vis spectroscopy identified the formation and consumption of oligomeric species as reaction intermediates. Density functional theory (DFT) calculations suggested a route in which the carbocation and alkoxide intermediates generated by hydrocracking of PP undergo low-barrier transformations into gaseous products such as C3 and C4 hydrocarbons. This study advances the development of sustainable polyolefin recycling technologies.
Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-and-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high-throughput experimentation to build a large dataset consisting of results for Rh-catalyzed asymmetric olefin hydrogenation, specially designed for applications in machine learning. We showcased its alignment with existing literature while addressing observed discrepancies. Additionally, a computational framework for the automated and reproducible quantum-chemistry-based featurization of catalyst structures was created. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models showed limited efficacy. It was found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models.
In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for high-throughput in silico exploration and analysis of chemical space, their usefulness depends on their degree of automation and modularity. We argue that the integration of data-driven, automated, and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research.