Accurate and rapid prediction of pKa of transition metal complexes: semiempirical quantum chemistry with a data-augmented approach

Journal Article (2021)
Author(s)

V. Sinha (TU Delft - ChemE/Inorganic Systems Engineering)

Jochem J. Laan (Student TU Delft)

Evgeny A. Pidko (TU Delft - ChemE/Inorganic Systems Engineering, TU Delft - ChemE/Algemeen)

Research Group
ChemE/Inorganic Systems Engineering
Copyright
© 2021 V. Sinha, Jochem J. Laan, E.A. Pidko
DOI related publication
https://doi.org/10.1039/d0cp05281g
Publication Year
2021
Language
English
Issue number
4
Volume number
23
Pages (from-to)
2557-2567
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rapid and accurate prediction of reactivity descriptors of transition metal (TM) complexes is a major challenge for contemporary quantum chemistry. The recently developed GFN2-xTB method, based on density functional tight-binding (DFTB) theory, is suitable for high-throughput calculation of the geometries and thermochemistry of TM complexes, albeit with moderate accuracy. Herein we present a data-augmented approach that substantially improves the accuracy of GFN2-xTB for the prediction of thermochemical properties, using the pKa values of TM hydrides as a representative model example. We constructed a comprehensive database for ca. 200 TM hydride complexes featuring the experimentally measured pKa values as well as the GFN2-xTB-optimized geometries and various computed electronic and energetic descriptors. The GFN2-xTB results were further refined and validated by DFT calculations with the hybrid PBE0 functional. Our results show that although GFN2-xTB performs well in most cases, it fails to adequately describe TM complexes featuring multicarbonyl and multihydride ligand environments. The dataset was analyzed with ordinary least squares (OLS) fitting and was used to construct an automated machine learning (AutoML) approach for the rapid estimation of the pKa of TM hydride complexes. The results show a high predictive power of the very fast AutoML model (RMSE ∼ 2.7 pKa units), comparable to that of the much slower DFT calculations (RMSE ∼ 3 pKa units). The presented data-augmented, quantum chemistry-based approach is promising for high-throughput computational screening workflows of homogeneous TM-based catalysts.
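The abstract reports OLS fitting and RMSE-based comparison of models. As a rough illustration of those two ingredients only, the sketch below assumes a single descriptor: a "raw" pKa obtained from a computed deprotonation free energy through the standard relation pKa = ΔG / (RT ln 10), which is then linearly calibrated against experimental pKa values by ordinary least squares. The function names (`pka_from_dg`, `fit_ols`, `rmse`), the 298.15 K temperature, and all numbers are hypothetical and for illustration; they are not the authors' descriptors, workflow, or data, and the AutoML model is not reproduced.

```python
import math
from statistics import fmean

# Assumed constants (illustrative): gas constant in kcal/(mol*K) and T = 298.15 K.
R_KCAL = 1.987204e-3
T_K = 298.15


def pka_from_dg(dg_kcal: float) -> float:
    """Convert a deprotonation free energy (kcal/mol) to pKa via pKa = dG / (RT ln 10)."""
    return dg_kcal / (R_KCAL * T_K * math.log(10.0))


def fit_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares (closed form); returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired (x, y) points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error, in the units of y (here: pKa units)."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


if __name__ == "__main__":
    # Hypothetical, made-up values (NOT from the paper's database):
    # computed deprotonation free energies (kcal/mol) and "experimental" pKa values.
    dg_raw = [20.0, 25.0, 30.0, 35.0, 40.0]
    pka_exp = [10.1, 13.2, 16.9, 20.8, 24.0]

    pka_raw = [pka_from_dg(g) for g in dg_raw]  # uncorrected estimates
    a, b = fit_ols(pka_raw, pka_exp)            # linear calibration against experiment
    pka_cal = [a + b * p for p in pka_raw]      # calibrated estimates

    print(f"calibration: pKa = {a:.2f} + {b:.3f} * pKa_raw")
    print(f"RMSE raw: {rmse(pka_exp, pka_raw):.2f}, calibrated: {rmse(pka_exp, pka_cal):.2f}")
```

Because the OLS line minimizes the squared error over all straight lines, including the identity mapping, the in-sample RMSE of the calibrated values can never exceed that of the raw ones. An honest comparison such as the RMSE values quoted in the abstract would instead be made on held-out complexes.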