M.F. Theisen | TU Delft Repository

Teaching machine learning to programming novices

An action-oriented didactic concept

Conference paper (2024) - Michal Tkáč, Jakub Sieber, Radwa El Shawi, Anne Meyer, Lara Kuhlmann, Matthias Brueggenolte, Alexandru Rinciog, Michael Henke, Artur M. Schweidtmann, Qinghe Gao, Maximilian F. Theisen

Machine Learning (ML) techniques are encountered nowadays across disciplines, from social sciences, through natural sciences to engineering. However, teaching ML is a daunting task. Aside from the methodological complexity of ML algorithms, both with respect to theory and implementation, the interdisciplinary and empirical nature of the field needs to be taken into consideration. This paper introduces the MachineLearnAthon format, an innovative didactic concept designed to be inclusive for students of different disciplines with heterogeneous levels of mathematics, programming, and domain expertise. The format is grounded in a systematic literature review and the didactic principles of action orientation, constructivism, and problem orientation. At the heart of the concept lie ML challenges, which make use of industrial data sets to solve real-world problems. Micro-lectures enable students to learn about ML concepts and algorithms, and associated risks. They cover the entire ML pipeline, promoting data literacy and practical skills, from data preparation, through deployment, to evaluation. ...

Digitization of chemical process flow diagrams using deep convolutional neural networks

Journal article (2023) - Maximilian F. Theisen, Kenji Nishizaki Flores, Lukas Schulze Balhorn, Artur M. Schweidtmann

Advances in deep convolutional neural networks led to breakthroughs in many computer vision applications. In chemical engineering, a number of tools have been developed for the digitization of Process and Instrumentation Diagrams. However, there is no framework for the digitization of process flow diagrams (PFDs). PFDs are difficult to digitize because of the large variability in the data, e.g., there are multiple ways to depict unit operations in PFDs. We propose a two-step framework for digitizing PFDs: (i) unit operations are detected using a deep learning powered object detection model, (ii) the connectivities between unit operations are detected using a pixel-based search algorithm. To ensure robustness, we collect and label over 1000 PFDs from diversified sources including various scientific journals and books. To cope with the high intra-class variability in the data, we define 47 distinct classes that account for different drawing styles of unit operations. Our algorithm delivers accurate and robust results on an independent test set. We report promising results for line and unit operation detection with an Average Precision at 50 percent (AP50) of 88% and an Average Precision (AP) of 68% for the detection of unit operations. ...
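
The AP50 figure quoted above rests on an intersection-over-union (IoU) test between predicted and labeled bounding boxes. The following is a minimal sketch of that matching criterion, not the authors' evaluation code: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the class labels are assumptions made for illustration. Whether the plain AP value follows the common convention of averaging over IoU thresholds from 0.50 to 0.95 is also an assumption and is not stated in the abstract.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred: Box, pred_label: str,
                   truth: Box, truth_label: str) -> bool:
    """A prediction counts as correct at the 50% threshold when the
    class agrees and the boxes overlap with IoU >= 0.5."""
    return pred_label == truth_label and iou(pred, truth) >= 0.5


# Hypothetical example: a predicted box shifted by half its width.
labeled = (0.0, 0.0, 10.0, 10.0)
shifted = (5.0, 0.0, 15.0, 10.0)
overlap = iou(labeled, shifted)  # 50 / 150, i.e. one third
```

With these hypothetical boxes the overlap is one third, so the shifted prediction would not count as a correct detection at the 50% threshold even if its class label were right.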

Sparse PCA Support Exploration of Process Structures for Decentralized Fault Detection

Journal article (2021) - M. Theisen, G. Dörgö, J. Abonyi, A. Palazoglu

With the ever-increasing use of sensor technologies in industrial processes and more data becoming available to engineers, the fault detection and isolation activities in the context of process monitoring have gained significant momentum in recent years. A statistical procedure frequently used in this domain is principal component analysis (PCA), which can reduce the dimensionality of large data sets without compromising the information content. While most process monitoring methods offer satisfactory detection capabilities, understanding the root cause of malfunctions and providing the physical basis for their occurrence have been challenging. The relatively new sparse PCA techniques represent a further development of PCA in which not only the data dimension is reduced but also the data are made more interpretable, revealing clearer correlation structures among variables. Hence, taking a step forward from classical fault detection methods, in this work, a decentralized monitoring approach is proposed based on a sparse algorithm. The resulting control charts reveal the correlation structures associated with the monitored process and facilitate a structural analysis of the faults that occurred. The applicability of the proposed method is demonstrated using data generated from the simulation of the benchmark vinyl acetate process. It is shown that the sparse principal components, as a foundation to a decentralized multivariate monitoring framework, can provide physical insight toward the origins of process faults. ...
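
To illustrate why sparse loadings are easier to interpret, the sketch below applies soft-thresholding to a dense loading vector, a building block used in several sparse PCA algorithms. It is not the algorithm used in the paper, which the abstract does not specify. The variable names, the loading values, and the threshold `lam` are hypothetical.

```python
import math
from typing import List


def soft_threshold(loadings: List[float], lam: float) -> List[float]:
    """Shrink each loading toward zero by lam, zero out the small ones,
    then rescale the result to unit length."""
    shrunk = []
    for w in loadings:
        magnitude = max(abs(w) - lam, 0.0)
        shrunk.append(0.0 if magnitude == 0.0 else math.copysign(magnitude, w))
    norm = math.sqrt(sum(w * w for w in shrunk))
    return [w / norm for w in shrunk] if norm > 0 else shrunk


def support(loadings: List[float], names: List[str]) -> List[str]:
    """Names of the variables that keep a nonzero loading."""
    return [name for name, w in zip(names, loadings) if w != 0.0]


# Hypothetical process variables and a dense loading vector.
names = ["T_reactor", "P_reactor", "F_feed", "L_column"]
dense = [0.70, 0.05, -0.70, 0.10]

sparse = soft_threshold(dense, lam=0.2)
block = support(sparse, names)  # variables grouped by this component
```

In this toy case only `T_reactor` and `F_feed` keep nonzero loadings, so the component points to a small group of variables that could be monitored as one block in a decentralized scheme.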