Transferability of descriptors for in silico catalyst screening
A. Najl Hossaini (TU Delft - Applied Sciences)
Evgeny A. Pidko – Mentor (TU Delft - ChemE/Inorganic Systems Engineering)
A.V. Kalikadien – Mentor (TU Delft - ChemE/Inorganic Systems Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Homogeneous transition metal (TM)-based catalysts are crucial to producing chemically pure drugs because they can achieve high product selectivity. However, experimental screening of TM-based complexes is expensive, so computational methods are leveraged instead. Machine learning (ML) approaches in particular show promise because they are efficient as well as unbiased. ML of homogeneous TM-based catalysts is based on physicochemical properties called descriptors. Descriptors depend on the simulation method and on the simulated complex itself. Methods with a higher level of theory are more accurate but also more resource-intensive. Similarly, larger complexes demand more computational resources. Two general ways to minimize the resources needed are: 1) using the lowest level of theory that retains reasonable accuracy and 2) using the simplest representative complex. In this thesis, possible simplifications were investigated for a homogeneous TM-based catalyst screening workflow. Objective 1 was to investigate the effect of the level of theory used for geometry optimization on descriptors. Structures were optimized at four levels of theory relevant to this workflow: MACE, GFN-FF, GFN2-xTB, and DFT. Subsequently, xTB-level descriptors were calculated for the structures optimized at the first three levels of theory and then correlated against the xTB-level descriptors of the DFT benchmark. In addition, it was investigated how descriptors obtained from xTB and DFT single-point calculations differ. Objective 2 was to investigate the effect of the chemical structure on descriptors. To do so, a set of octahedral complexes and a set of simplified structures were generated, and the descriptors of both sets were correlated against each other. Regarding objective 1, only descriptors from the GFN2-xTB level of theory correlated well with DFT, at least for the majority of descriptors.
In addition, GFN2-xTB geometries were found to largely coincide with DFT geometries. Regarding objective 2, the bidentate ligands in the model set were found to deform towards the metal centre, which lowers the correlations for the majority of the descriptors. Additionally, clustering occurred due to the presence of two different ligand classes in the dataset. The primary conclusion of this research is that geometries from GFN2-xTB geometry optimization are structurally comparable to geometries from DFT geometry optimization. However, descriptors obtained from GFN2-xTB single-point calculations are not comparable to descriptors obtained from DFT single-point calculations. As such, DFT single-point calculations are required to extract descriptors accurately.
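The descriptor comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each level of theory yields a table with one row per complex and one column per descriptor, and that agreement is measured with the Pearson correlation coefficient; the abstract does not state which correlation metric was used. The function name, descriptor names, and numbers below are hypothetical placeholders, not data from the thesis.

```python
import numpy as np


def descriptor_correlations(reference, candidate, names):
    """Return the Pearson correlation per descriptor column.

    reference, candidate: arrays of shape (n_complexes, n_descriptors),
    with rows referring to the same complexes in the same order.
    names: one label per descriptor column (hypothetical labels here).
    """
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("both descriptor tables must have the same shape")
    if reference.shape[1] != len(names):
        raise ValueError("need exactly one name per descriptor column")

    scores = {}
    for j, name in enumerate(names):
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
        scores[name] = float(np.corrcoef(reference[:, j], candidate[:, j])[0, 1])
    return scores


# Made-up illustrative values: 4 complexes, 2 descriptors.
names = ["descriptor_a", "descriptor_b"]
reference = np.array([[1.0, 0.2], [2.0, 0.9], [3.0, 0.1], [4.0, 0.7]])
candidate = np.array([[1.1, 0.8], [2.1, 0.3], [3.1, 0.6], [4.1, 0.2]])

scores = descriptor_correlations(reference, candidate, names)
for name, r in scores.items():
    print(f"{name}: r = {r:.3f}")
```

In this toy input, the first descriptor differs between the two tables only by a constant offset and is therefore perfectly correlated, while the second is not. A per-descriptor score of this kind makes it possible to see which descriptors transfer between levels of theory and which do not.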