Development of a Catalysis Analytics Platform

Enabling Machine Learning in Catalyst Discovery

Abstract

Recently, machine learning (ML) has become increasingly popular and has been shown to be a powerful predictive technique. Its applications span a wide range of disciplines, including the natural sciences. The field of catalysis, however, is still relatively unexposed to ML and other data-driven techniques. This can largely be attributed to the broad variety and complexity of catalytic data, which hinders the unification of data into large, structured databases. This is problematic because ML requires large amounts of information-rich data to be effective. Additionally, applying ML techniques requires expertise and often coding experience. These requirements impede the adoption of ML by catalysis researchers, who are not necessarily programming experts. In this thesis, a data management and analytics platform is developed to lower this barrier. Our platform is designed to guide catalysis researchers through the ML workflow and to help them construct effective ML models. Throughout this process, the platform supports several key functionalities: interaction with a database instance, data visualization, ML model construction, and ML model application. Furthermore, the usefulness of our platform is demonstrated in a case study, where ML models are built to predict catalytic performance from molecular descriptors of the catalyst. Due to a lack of suitable existing catalytic datasets, we construct and use an artificial dataset that mimics kinetic catalytic data. Constructing the data artificially gives us full control over its underlying mechanisms. We use this control to build dataset variations with which we study the effect of database size and descriptor strength on the performance of ML models. The results of the case study quantify these effects, which is useful in identifying the objectives for the future construction of databases from real catalytic data.
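
The design of the case study can be illustrated with a minimal sketch. It is not the dataset or the models used in the thesis: the artificial data here follow a plain linear relation rather than a kinetic one, "descriptor strength" is assumed to mean the share of the target that the descriptors explain as opposed to noise, and the names `make_dataset` and `holdout_r2` are hypothetical. The sketch only shows how dataset size and descriptor strength can be varied independently and scored on held-out data.

```python
import numpy as np


def make_dataset(n_samples, strength, n_descriptors=5, seed=0):
    """Build an artificial dataset (assumption: linear, not kinetic).

    strength in [0, 1] sets how much of the target comes from the
    descriptors; the remainder is random noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_descriptors))  # hypothetical descriptors
    weights = rng.normal(size=n_descriptors)
    signal = X @ weights
    signal = signal / signal.std()  # scale signal to unit spread
    noise = rng.normal(size=n_samples)
    y = strength * signal + (1.0 - strength) * noise
    return X, y


def holdout_r2(n_samples, strength, seed=0):
    """Fit least squares on 80% of the data and return R^2 on the other 20%."""
    X, y = make_dataset(n_samples, strength, seed=seed)
    split = int(0.8 * n_samples)
    # Append a column of ones so the model includes an intercept.
    A_train = np.column_stack([X[:split], np.ones(split)])
    A_test = np.column_stack([X[split:], np.ones(n_samples - split)])
    y_train, y_test = y[:split], y[split:]
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = y_test - A_test @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Grid over dataset size and descriptor strength.
results = {
    (n, s): holdout_r2(n, s)
    for n in (50, 200, 1000)
    for s in (0.2, 0.5, 0.9)
}

for (n, s), r2 in sorted(results.items()):
    print(f"n={n:5d}  strength={s:.1f}  held-out R^2={r2:.3f}")
```

Because the data are generated rather than measured, each cell of the grid differs from its neighbours only in the chosen size or strength, which is what makes it possible to attribute a change in model performance to one factor.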