T.H. Chow

Data, Representation, Models and Analysis: the four horsemen of machine learning for homogeneous catalysis

Master thesis (2025) - T.H. Chow (author), E.A. Pidko (mentor), A.V. Kalikadien (mentor)

Transition metal complexes coordinated by bidentate ligands are often used as homogeneous catalysts because they can produce compounds enantioselectively. Such compounds are of high interest to the pharmaceutical and food industries. However, identifying high-performing catalysts relies on trial-and-error approaches, which are time-consuming and costly. Data-driven predictive models could improve this process significantly by shifting most of the effort from experiment to computation. Previous work from the group attempted to develop such a predictive model using Machine Learning (ML), a representation based on a manually generated static structure, and a database generated through High-Throughput Experimentation (HTE). However, these models faced challenges in terms of model performance and consistency between different substrates. This research aims to enhance these models by improving the representations used in ML to achieve more accurate predictions. To bring the representations closer to reality, both dynamic and new static approaches were tested, using conformer ensembles (CEs) generated by CREST. These structures were then used in DFT calculations to obtain accurate properties of the complexes. Additionally, new HTE data, which are closer to the complexes used in the simulations, were incorporated to improve the training data for the ML models. The investigated reaction is the hydrogenation of norbornadiene (NBD) using Rh-NBD complexes. The performance of both classification and regression was compared across different representations: a cheap topological connectivity fingerprint (ECFP), semi-empirical DFT representations, and expensive fully DFT-optimized representations. The results show that none of the DFT-based representations outperforms the cheap topological fingerprint for this specific reaction. The study also highlights the importance of high-quality data in training the models.
Ultimately, although the representations were brought closer to reality, the much simpler topological fingerprint remained the most effective for predicting catalyst performance.
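
To illustrate what a topological connectivity fingerprint encodes, the following is a minimal sketch of a simplified Morgan-style circular fingerprint in plain Python. It is not the ECFP implementation used in the thesis, and it is not the exact ECFP algorithm: bond orders, charges, and duplicate-environment removal are ignored. The function names (`circular_fingerprint`, `tanimoto`), the atom/bond list input format, and the SHA-256 hashing are assumptions made for illustration only. In practice a cheminformatics toolkit such as RDKit would normally be used instead.

```python
import hashlib


def _stable_hash(text):
    # hashlib gives the same value on every run, unlike built-in hash() on strings.
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


def circular_fingerprint(atoms, bonds, radius=2, n_bits=2048):
    """Simplified Morgan-style fingerprint of a molecular graph (illustrative only).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs; bond orders are ignored here
    Returns the sorted list of bit positions that are set.
    """
    # Build an adjacency list from the bond list.
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Iteration 0: identify each atom by its element and number of neighbours.
    identifiers = [f"{atoms[i]}{len(neighbours[i])}" for i in range(len(atoms))]
    features = set(identifiers)

    # Each iteration grows the encoded environment by one bond.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            # Sorting neighbour identifiers makes the result independent
            # of the order in which atoms are numbered.
            around = ",".join(sorted(identifiers[j] for j in neighbours[i]))
            updated.append(str(_stable_hash(identifiers[i] + "|" + around)))
        identifiers = updated
        features.update(identifiers)

    # Fold all environment identifiers into a fixed-length bit vector.
    return sorted({_stable_hash(f) % n_bits for f in features})


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Ethanol heavy-atom graph, written with two different atom numberings.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_renumbered = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
propane = circular_fingerprint(["C", "C", "C"], [(0, 1), (1, 2)])
```

Such a fingerprint depends only on connectivity, so it needs no conformer search or DFT calculation, which is why the abstract describes it as cheap. The resulting bit positions can be compared with `tanimoto` or expanded into a fixed-length vector as input to an ML model.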