Catalysts play an essential role in industry and in the broader progress of society. With the concurrent energy and technological transitions, it is important to create tools that aid the development of better catalysts. A first requirement for this is a fully automated approach to in silico structure generation. To this end, the OBeLiX workflow has been developed in this study. The package includes a scaffold generation tool, a substituent placement tool, and GFNn-xTB optimization and conformer search tools, complemented by a fully automated descriptor calculator. Although descriptor databases can be found in the literature, their reproducibility is limited, which hinders the transfer of published approaches to new chemical reactions. OBeLiX has been used to investigate a series of hydrogenation reactions catalyzed by rhodium phosphine complexes. The approach begins with the creation of a structure database of 192 such complexes. To simplify this process, a mechanistically relevant model catalyst structure was used: because the first step of the catalytic cycle is π-complexation between the substrate and the metal center, the symmetric chelating diene norbornadiene was chosen as a stand-in for the asymmetric substrates. The resulting database of model catalysts has been featurized with OBeLiX. Because the model structures do not contain the actual substrates, the substrates had to be quantified as well. While a series of chemically descriptive features was computed for the model catalysts, the substrates were encoded as two-dimensional fingerprints and as Sterimol parameters that describe the 3D size of the substrate around the double bond to be hydrogenated. Together, these features provide a featurization of the full chemical reaction. Machine learning models trained on these features yielded high correlations, including in out-of-sample binary reactivity classification for substrates outside the training set.
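To illustrate the substrate and reaction featurization described above, the following is a minimal Python sketch of Sterimol-type steric descriptors (L, B1, B5) computed from 3D coordinates, plus a helper that concatenates catalyst descriptors, a substrate fingerprint, and these steric values into one reaction feature vector. It is not the OBeLiX implementation: the function names `sterimol` and `reaction_vector` are hypothetical, and the van der Waals radii table, the choice of the two atoms defining the L axis, and the angular scan used for B1 are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Approximate van der Waals radii in angstrom (assumed values for this sketch).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47}


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def sterimol(elements: Sequence[str], coords: Sequence[Vec], origin_idx: int,
             axis_idx: int, n_angles: int = 360) -> Tuple[float, float, float]:
    """Simplified Sterimol (L, B1, B5) for all atoms except the origin atom.

    The L axis points from coords[origin_idx] to coords[axis_idx]
    (hypothetical choice; e.g. an olefinic carbon and its substituent).
    """
    origin = coords[origin_idx]
    axis = _unit(_sub(coords[axis_idx], origin))
    # Orthonormal basis (u, v) spanning the plane perpendicular to the axis.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(axis, helper))
    v = _cross(axis, u)

    length = 0.0
    perp = []  # (x, y, radius) of each atom projected onto the perpendicular plane
    for i, (el, xyz) in enumerate(zip(elements, coords)):
        if i == origin_idx:
            continue
        r = VDW_RADII[el]
        d = _sub(xyz, origin)
        length = max(length, _dot(d, axis) + r)  # farthest extent along the axis
        perp.append((_dot(d, u), _dot(d, v), r))

    # B5: largest perpendicular extent of any atom's van der Waals sphere.
    b5 = max(math.hypot(x, y) + r for x, y, r in perp)
    # B1: over scanned directions, the smallest of the maximal extents.
    b1 = min(
        max(x * math.cos(t) + y * math.sin(t) + r for x, y, r in perp)
        for t in (2.0 * math.pi * k / n_angles for k in range(n_angles))
    )
    return length, b1, b5


def reaction_vector(catalyst_desc: Sequence[float], substrate_fp: Sequence[int],
                    steric: Sequence[float]) -> List[float]:
    """Hypothetical reaction featurization: catalyst + fingerprint + Sterimol."""
    return ([float(x) for x in catalyst_desc]
            + [float(b) for b in substrate_fp]
            + [float(s) for s in steric])
```

For example, `sterimol(["C", "C", "H"], [(0, 0, 0), (0, 0, 1.5), (1.0, 0, 1.5)], 0, 1)` treats the first carbon as the attachment point and returns the steric triple of the remaining two atoms. The angular scan trades accuracy of B1 against cost through `n_angles`; how OBeLiX actually places the axis around the double bond is not specified here.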