Transition metal complexes as homogeneous catalysts enable high enantioselectivity in hydrogenation reactions, making them especially valuable to the pharmaceutical industry. Data-driven prediction models support high-throughput catalyst design. However, these models often rely solely on static molecular representations and neglect the dynamic behavior of the system, such as the formation of conformer ensembles. Currently, no method is available to account for these conformational effects systematically at reasonable cost. This study therefore aimed to develop a practical tool that allows predictive models to incorporate the dynamic characteristics of catalysts via conformer ensembles. A dataset of Rh-based precatalysts with mainly bidentate ligands was used. Three cheminformatic tools (RDKit, OpenBabel, and CREST) were explored for reliable, automated conformer ensemble generation. Among them, only CREST proved feasible, although it exhibited several limitations and required manual modification. The conformer geometries obtained from GFN2-xTB were mapped to those obtained from DFT calculations on the basis of relative energies and root-mean-square deviations (RMSD). This mapping revealed that many conformers generated by CREST converge to the same DFT local minimum. A classification method was developed to bridge the gap between the conformers obtained from the two quantum chemical methods by selecting the subset of the CREST ensemble that appears as distinct conformers in the DFT ensemble. This approach allows DFT calculations to be performed only on conformers that would lead to different DFT minima on the potential energy surface, thereby eliminating redundant calculations and saving significant cost. The method applies the unsupervised DBSCAN clustering algorithm to the GFN2-xTB energies and RMSDs of the conformers, and it reduced the number of redundant conformers by 46% in the original dataset of Rh-based precatalyst structures.
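The selection step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes each conformer is described by its GFN2-xTB energy relative to the lowest-energy conformer and by its RMSD to that same reference, and it scales both features by hypothetical tolerances (`energy_tol`, `rmsd_tol`) before clustering. DBSCAN is written out in plain Python so the example is self-contained; in practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would be the natural choice. The conformer names and values are made up for illustration.

```python
from math import hypot


def dbscan(points, eps, min_samples):
    """Minimal 2D DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n)
                if hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def select_representatives(conformers, energy_tol=0.5, rmsd_tol=0.5,
                           min_samples=2):
    """Pick one conformer per cluster for the expensive DFT step.

    conformers: list of (name, relative_energy, rmsd) tuples.
    energy_tol and rmsd_tol are hypothetical scaling tolerances (assumed
    here to be in kcal/mol and angstrom); they are not taken from the study.
    """
    # Scale both features so that eps = 1.0 means "within tolerance".
    points = [(e / energy_tol, r / rmsd_tol) for _, e, r in conformers]
    labels = dbscan(points, eps=1.0, min_samples=min_samples)

    best = {}   # cluster label -> lowest-energy member
    keep = []   # noise points are treated as distinct conformers and kept
    for conf, label in zip(conformers, labels):
        if label == -1:
            keep.append(conf)
        elif label not in best or conf[1] < best[label][1]:
            best[label] = conf
    keep.extend(best.values())
    return [name for name, _, _ in sorted(keep, key=lambda c: c[1])]


# Made-up ensemble: (name, relative energy, RMSD to lowest-energy conformer).
toy_conformers = [
    ("c1", 0.00, 0.00),
    ("c2", 0.05, 0.10),
    ("c3", 0.10, 0.15),
    ("c4", 2.50, 1.80),
    ("c5", 2.55, 1.85),
    ("c6", 5.00, 3.00),
]

print(select_representatives(toy_conformers))  # ['c1', 'c4', 'c6']
```

In this toy ensemble, six conformers collapse to three representatives, so only three DFT optimizations would be needed. Treating noise points as distinct conformers is a design choice of the sketch: it errs on the side of keeping a structure rather than discarding a potentially unique DFT minimum.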