A.V. Kalikadien

8 records found

Bridging Chemistry and Machine Learning for Homogeneous Catalyst Design

Doctoral thesis (2026) - A.V. Kalikadien, Evgeny A. Pidko, B. Dam
Catalysis lies at the heart of modern society: from producing fuels and fertilizers to manufacturing pharmaceuticals and materials, it enables the chemical transformations that sustain our daily lives. Among the different forms of catalysis, homogeneous catalysis, where well-defined molecular complexes drive the production of molecular products, plays a central role in both fundamental research and industrial applications. Yet, the discovery and optimization of catalysts remain resource-intensive, relying heavily on serendipity. The design of transition-metal based homogeneous catalysts remains a central challenge in modern chemistry. While recent advances in artificial intelligence have demonstrated transformative potential across domains such as natural language processing and image generation, their application to molecular design and catalysis has proven more limited. This dissertation explores the integration of high-throughput experimentation, computational chemistry, automation, and machine learning for in silico methodologies aimed at rational design of transition-metal based catalysts. Across eight chapters, key challenges are addressed in the generation of descriptors, digital representations for machine learning, conformational and configurational flexibility of ligands, and practical examples of machine learning modeling in data-driven catalysis ...
Journal article (2026) - Adarsh V. Kalikadien, Evgeny A. Pidko
Machine Learning Interatomic Potentials (MLIPs) promise to transform computational catalysis by delivering near-density functional theory (DFT) accuracy at a fraction of the computational cost. Here, we evaluate the Universal Machine Learning Potential for Atoms (UMA) on two data sets of transition-metal complexes. UMA enables high-throughput evaluations in seconds per structure on consumer-grade GPUs. Analysis of per-ligand Spearman rank correlations (ρ > 0.6, p < 0.05) reveals variability in ranking reliability that is not captured by aggregate metrics such as R² or RMSE. However, these inaccuracies are shown to mainly occur in the near-DFT accuracy regime where these complexes are practically indistinguishable. For square-planar Ni complexes, reliable rankings are obtained for 84% of ligands in rigid Ni–Cl2 complexes and drop to 53% for flexible asymmetric coordination environments, particularly when conformers differ by <2 kJ/mol. Data set 2 shows a similar trend, with 61% and 44% reliability for Ru(II) and Mn(I) complexes, respectively, and, as expected, challenges for fluxional systems with small (<5 kJ/mol) relative energy gaps. These findings highlight the promise of MLIPs for both rigid, well-defined systems and highly flexible or fluxional catalysts, while underscoring the need to combine the speed of ML with validation and domain expertise to ensure robust and meaningful chemical insights. ...
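The per-ligand ranking check mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the ligand names `L1` and `L2`, and all energies are hypothetical, and only the ρ > 0.6 threshold is taken from the abstract. The significance test (p < 0.05) is left out to keep the example dependency-free.

```python
from collections import defaultdict

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # ranking undefined if one side is constant
    return cov / (var_x * var_y) ** 0.5

def per_ligand_rho(records):
    """records: iterable of (ligand_id, dft_energy, mlip_energy) tuples."""
    grouped = defaultdict(list)
    for ligand, e_dft, e_ml in records:
        grouped[ligand].append((e_dft, e_ml))
    return {
        lig: spearman_rho([p[0] for p in pairs], [p[1] for p in pairs])
        for lig, pairs in grouped.items()
    }

# Hypothetical relative conformer energies in kJ/mol (illustration only).
records = [
    ("L1", 0.0, 0.2), ("L1", 3.1, 2.8), ("L1", 7.4, 8.0), ("L1", 12.0, 11.5),
    ("L2", 0.0, 0.9), ("L2", 0.5, 0.1), ("L2", 1.1, 1.3), ("L2", 1.6, 0.4),
]
rho = per_ligand_rho(records)
reliable = {lig: r > 0.6 for lig, r in rho.items()}  # threshold from the abstract
```

In this toy data the second ligand has conformers within about 2 kJ/mol of each other, so small absolute errors are enough to scramble its ranking even though they would barely affect an aggregate RMSE.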
Transition-metal complexes serve as highly enantioselective homogeneous catalysts for various transformations, making them valuable in the pharmaceutical industry. Data-driven prediction models can accelerate high-throughput catalyst design but require computer-readable representations that account for conformational flexibility. This is typically achieved through high-level conformer searches, followed by DFT optimization of the transition-metal complexes. However, conformer selection remains reliant on human assumptions, with no cost-efficient and generalizable workflow available. To address this, we introduce an automated approach to correlate CREST(GFN2-xTB//GFN-FF)-generated conformer ensembles with their DFT-optimized counterparts for systematic conformer selection. We analyzed 24 precatalyst structures, performing CREST conformer searches, followed by full DFT optimization. Three filtering methods were evaluated: (i) geometric ligand descriptors, (ii) PCA-based selection, and (iii) DBSCAN clustering using RMSD and energy. The proposed methods were validated on Rh-based catalysts featuring bisphosphine ligands, which are widely employed in hydrogenation reactions. To assess general applicability, both the precatalyst and its corresponding acrylate-bound complex were analyzed. Our results confirm that CREST overestimates ligand flexibility, and energy-based filtering is ineffective. PCA-based selection failed to distinguish conformers by DFT energy, while RMSD-based filtering improved selection but lacked tunability. DBSCAN clustering provided the most effective approach, eliminating redundancies while preserving key configurations. This method remained robust across data sets and is computationally efficient without requiring molecular descriptor calculations. These findings highlight the limitations of energy-based filtering and the advantages of structure-based approaches for conformer selection. 
While DBSCAN clustering is a practical solution, its parameters remain system-dependent. For high-accuracy applications, refined energy calculations may be necessary; however, DBSCAN-based clustering offers a computationally accessible strategy for rapid catalyst representations involving conformational flexibility. ...
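As a rough illustration of structure-based conformer filtering, the sketch below runs a minimal DBSCAN on a precomputed pairwise RMSD matrix and keeps the lowest-energy conformer of each cluster. How the abstract's workflow combines RMSD and energy is not specified here, so that selection rule is an assumption, as are the function names, the `eps` and `min_samples` values, and every number in the example.

```python
def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a symmetric distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        # The neighborhood includes the point itself (dist[i][i] == 0).
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = [k for k in range(n) if dist[j][k] <= eps]
            if len(nbrs_j) >= min_samples:
                queue.extend(nbrs_j)  # j is a core point, keep expanding
    return labels

def select_representatives(labels, energies):
    """Keep the lowest-energy member per cluster plus every noise point."""
    best = {}
    keep = []
    for idx, lab in enumerate(labels):
        if lab == -1:
            keep.append(idx)
        elif lab not in best or energies[idx] < energies[best[lab]]:
            best[lab] = idx
    return sorted(keep + list(best.values()))

# Hypothetical pairwise RMSD values (angstrom) for six conformers.
rmsd = [
    [0.0, 0.2, 0.3, 1.6, 1.7, 2.4],
    [0.2, 0.0, 0.25, 1.5, 1.6, 2.3],
    [0.3, 0.25, 0.0, 1.8, 1.7, 2.5],
    [1.6, 1.5, 1.8, 0.0, 0.2, 2.1],
    [1.7, 1.6, 1.7, 0.2, 0.0, 2.2],
    [2.4, 2.3, 2.5, 2.1, 2.2, 0.0],
]
energies = [1.2, 0.0, 0.8, 3.5, 2.9, 6.0]  # hypothetical, kJ/mol

labels = dbscan(rmsd, eps=0.5, min_samples=2)
representatives = select_representatives(labels, energies)
```

The `eps` cutoff plays the role of the system-dependent parameter noted above: a larger value merges more conformers into one cluster and shrinks the retained set.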
Journal article (2025) - A.V. Kalikadien, N.J. van der Lem, Cecile Valsecchi, Laurent Lefort, E.A. Pidko
Computational exploration of chemical space is a powerful tool for designing organometallic homogeneous catalysts. While catalytic properties depend on ligand properties and spatial arrangement, the role of stereoisomerism in defining catalyst selectivity and reactivity has only been elucidated sporadically, leaving gaps in virtual screening workflows. This study investigates the necessity of exploring ligand configurations for virtual high-throughput (HT) screening of octahedral transition metal complexes. Using automated workflows, ligand configuration ensembles were generated for bisphosphine ligands with Ir(III), Ru(II), and Mn(I) metal centers. DFT calculations revealed distinct preferences for Ir(III) configurations, whereas Mn(I) and Ru(II) complexes displayed significant fluxionality, with multiple configurations within a 10 kJ mol⁻¹ energy range. Linear regression analyses showed that global descriptors, such as bite angle and HOMO–LUMO gap, are transferable across configurations and metal centers, while local steric descriptors lacked such transferability. Machine learning (ML) models successfully classified ligand configurations (balanced accuracy >0.8) but struggled to predict stability across metal centers, especially for Mn(I) and Ru(II). Thus, descriptors of the first coordination sphere that capture fluxionality and stability more effectively could improve ML models. Overall, this study underscores the limitations of ignoring stereoisomerism in virtual HT screening, which may lead to incomplete exploration of chemical space and underrepresentation of key catalyst features. Until dynamic digital representations are developed, exhaustive stereoisomerism exploration should be implemented for screening workflows. ...
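For readers unfamiliar with the balanced accuracy metric quoted in this abstract, here is a small worked sketch. Balanced accuracy is the mean of the per-class recalls, which keeps a majority class from dominating the score. The class labels `config_A` and `config_B` and the predictions are hypothetical and do not come from the study.

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical, imbalanced example: 8 samples of one class, 2 of the other.
y_true = ["config_A"] * 8 + ["config_B"] * 2
y_pred = ["config_A"] * 8 + ["config_A", "config_B"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

Here nine of ten predictions are correct, yet only half of the minority class is recovered, so the balanced score is noticeably lower than the plain accuracy.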
Journal article (2024) - Adarsh V. Kalikadien, Cecile Valsecchi, Robbert van Putten, Tor Maes, Mikko Muuronen, Natalia Dyubankova, Laurent Lefort, Evgeny A. Pidko
Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-and-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high-throughput experimentation to build a large dataset consisting of results for Rh-catalyzed asymmetric olefin hydrogenation, specially designed for applications in machine learning. We showcased its alignment with existing literature while addressing observed discrepancies. Additionally, a computational framework for the automated and reproducible quantum-chemistry based featurization of catalyst structures was created. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models provided limited efficacy. It was found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models. ...
Journal article (2024) - Margareth S. Baidun, Adarsh V. Kalikadien, Laurent Lefort, Evgeny A. Pidko
Data-driven catalyst design is a promising approach for addressing the challenges in identifying suitable catalysts for synthetic transformations. Models with descriptor calculations relying solely on the precatalyst structure are potentially generalizable but may overlook catalyst-substrate interactions. This study explores substrate-specific interactions in the context of Rh-catalyzed asymmetric hydrogenation to elucidate the impact of substrate inclusion on the catalyst structure and on the descriptors derived from it. We compare a catalyst-substrate complex with methyl 2-acetamidoacrylate as a model substrate with the generic precatalyst structure involving a placeholder substrate, norbornadiene, across 11 Rh-based catalysts with bidentate bisphosphine ligands. For these systems, a full conformer ensemble analysis reveals an intriguing finding: the rigid substrate induces conformational freedom in the ligand. This flexibility gives rise to a more diverse conformer landscape, showing a previously overlooked aspect of catalyst-substrate dynamics. Electronic descriptor variations particularly highlight differences between substrate-specific and precatalyst structures. This study suggests that generic precatalyst-like models may lack crucial insights into the conformational freedom of the catalyst. We speculate that such conformational freedom may be a more general phenomenon that can influence the development of generalizable predictive models of computational TM-based catalysis. ...
In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for high-throughput in silico exploration and analysis of chemical space, their degree of automation and modularity are key. We argue that the integration of data-driven, automated and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research. ...

Exploration of chemical space by automated functionalization of molecular scaffold

Journal article (2022) - A.V. Kalikadien, E.A. Pidko, V. Sinha
Exploration of the local chemical space of molecular scaffolds by post-functionalization (PF) is a promising route to discover novel molecules with desired structure and function. PF with rationally chosen substituents based on known electronic and steric properties is a commonly used experimental and computational strategy in screening, design and optimization of catalytic scaffolds. Automated generation of reasonably accurate geometric representations of post-functionalized molecular scaffolds is highly desirable for data-driven applications. However, automated PF of transition metal (TM) complexes remains challenging. In this work a Python-based workflow, ChemSpaX, that is aimed at automating the PF of a given molecular scaffold with special emphasis on TM complexes, is introduced. In three representative applications of ChemSpaX by comparing with DFT and DFT-B calculations, we show that the generated structures have a reasonable quality for use in computational screening applications. Furthermore, we show that ChemSpaX generated geometries can be used in machine learning applications to accurately predict DFT computed HOMO–LUMO gaps for transition metal complexes. ChemSpaX is open-source and aims to bolster and democratize the efforts of the scientific community towards data-driven chemical discovery. ...