M.C. van Wijngaarden
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
Large chemical reaction databases are often incomplete: molecules or stoichiometric information may be missing. At the same time, many computational models in predictive chemistry rely on reaction databases and would benefit substantially from complete reaction equations. Research in sustainable chemistry also often focuses on automated mass balance tasks, which can only be evaluated properly on a full reaction. In this work, we present a hybrid approach for the computational completion of reaction equations that combines a rule-based method with a machine learning (ML) model. The rule-based method balances atoms and charge on both sides of the reaction in an attempt to find missing molecules. We tailor a pre-trained transformer model in the chemical language domain to take partial reactions as input and predict the missing molecules. Furthermore, we present a novel approach for measuring the correctness of our model, which is useful when the model is applied to an uncurated dataset for which the ground truth is unknown.
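The atom-balancing idea behind the rule-based method can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes flat molecular formulas (no brackets, no charges) rather than whatever representation the authors use, and the helper names `parse_formula`, `side_total`, and `atom_residual` are hypothetical. The sketch only reports which atoms are unaccounted for on each side; charge balance and proposing concrete candidate molecules are left out.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O' (assumption: no brackets)."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas) -> Counter:
    """Sum the atom counts of all molecules on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def atom_residual(reactants, products):
    """Return (atoms missing on the product side, atoms missing on the reactant side)."""
    left = side_total(reactants)
    right = side_total(products)
    # Counter subtraction keeps only strictly positive counts.
    return dict(left - right), dict(right - left)


# Esterification with the by-product left out:
# acetic acid + ethanol -> ethyl acetate (+ ?)
missing_products, missing_reactants = atom_residual(["C2H4O2", "C2H6O"], ["C4H8O2"])
print(missing_products)   # atoms unaccounted for on the product side
print(missing_reactants)  # atoms unaccounted for on the reactant side
```

In this example the product side is short by two hydrogen atoms and one oxygen atom, which is consistent with a missing water molecule. A residual that does not correspond to a plausible small molecule is the kind of case where a learned model would be needed instead.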
Bachelor thesis
(2020)
Roald van der Heijden, Matthijs van Wijngaarden, Wouter Zonneveld, Asterios Katsifodimos
CodeFeedr is a Mining Software Repository (MSR) tool designed to efficiently mine massive amounts of streaming project data from various sources, using Flink’s streaming framework in combination with Kafka. The project was commissioned by researchers at TU Delft in the field of Data Science and Software Engineering, and its goal was to extend the product, which already existed in a development stage. At the start of the project, CodeFeedr consisted of core pipeline functionality and a limited number of plugins that process data sources. CodeFeedr-1Up, as the development team calls itself, pursued two goals: first, to increase the number of available plugins, each covering a usable software repository source, for the client; second, to implement REPL functionality that accepts user-friendly SQL-like queries and outputs the queried data stream. Plugins for Maven, Cargo, NPM and ClearlyDefined have been developed and now extend the CodeFeedr tool. Furthermore, these data sources can be queried according to their data structure in sequential pipelines. With user support and documentation in mind, logical data models of a plugin’s internal structure have been drawn up and included in the report.