M. Lourenço Baptista | TU Delft Repository

Deep Reinforcement Learning for Facilitating Human-Robot Interaction in Manufacturing

Book chapter (2025) - Nathan Eskue, Marcia L. Baptista

The ability of humans to work in close contact with robots in a manufacturing environment has been limited by safety concerns and by the robot's inability to sense, react to, and coordinate with a human without explicit, rigid programming. However, advances in Deep Reinforcement Learning (DRL) have shown considerable promise for developing processes that allow robots to work in a dynamic environment, solving problems and adapting to the actions of, and communication from, their human counterparts. This chapter explores the current state of the art in Human-Robot Interaction (HRI), discussing the tools, algorithms, and methods being explored. Representative use cases are discussed to better understand what can be accomplished in today's manufacturing environment and what challenges could be faced. Concerns around safety, ethics, and unintended consequences are identified. Finally, the chapter looks ahead at the obstacles that still need to be overcome before HRI can be fully scalable and widely used.

Counterfactual explanations for remaining useful life estimation within a Bayesian framework

Journal article (2025) - Jilles Andringa, Marcia L. Baptista, Bruno F. Santos

Machine learning has contributed to the advancement of maintenance in many industries, including aviation. In recent years, many neural network models have been proposed to address the problems of failure identification and remaining useful life (RUL) estimation. Nevertheless, the black-box nature of neural networks often limits their transparency and interpretability. Interpretability (or explainability) in maintenance refers to the ability of a predictive model to provide insights into its decision-making process for predicting failures or estimating metrics like RUL. Counterfactual Explanations (CFEs) from Explainable AI (XAI) address this problem by explaining model decisions through hypothetical scenarios that lead to alternative outcomes. Bayesian neural networks are one kind of neural network that could benefit from increased interpretability. In general, Bayesian models improve interpretability by quantifying uncertainty. However, incorporating Bayesian uncertainty into neural networks adds complexity, because a statistical distribution is often needed for each network parameter. This study investigates the use of CFEs within a Bayesian framework to achieve two key objectives simultaneously: (1) enhance the interpretability of RUL estimations and (2) improve model accuracy. We generate two types of CFEs: (1) RUL CFEs that increase or decrease the RUL estimation and (2) uncertainty CFEs with reduced estimation uncertainty, which we use to augment the dataset and increase model accuracy. We apply this method to a classical case study, the C-MAPSS dataset, using a Bayesian Long Short-Term Memory (B-LSTM) model. We demonstrate that CFEs can help identify critical features and fine-tune corrective actions to achieve specific outcomes. For example, following a maintenance action that increased the temperature by 1°F, CFEs can reveal that this adjustment extended the equipment's useful life by 30 cycles.
This ability to correlate specific actions with their effects enhances both decision-making and maintenance efficiency. Additionally, our data augmentation approach results in a 5% improvement in α−λ accuracy for a strict α of 20%. The root mean square error (RMSE) of the B-LSTM model decreases from 9.56 to 8.47 cycles, demonstrating the potential of uncertainty CFEs to improve accuracy in aircraft maintenance. The code is publicly available on GitHub.
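To make the idea of a RUL counterfactual concrete, the sketch below searches for a small input change that moves a regressor's RUL estimate to a chosen target, by gradient descent on the squared prediction error plus a proximity penalty (a common Wachter-style formulation). It is not the authors' B-LSTM procedure: the linear RUL model, its weights, and the names `counterfactual`, `lam`, and `lr` are hypothetical, and no Bayesian uncertainty is modelled.

```python
import numpy as np


def counterfactual(predict, grad, x, target, lam=0.1, lr=0.05, steps=2000):
    """Gradient-descent counterfactual search for a differentiable regressor.

    Minimizes (predict(x_cf) - target)**2 + lam * ||x_cf - x||**2, i.e. it
    trades off reaching the target RUL against staying close to the input x.
    """
    x = np.asarray(x, dtype=float)
    x_cf = x.copy()
    for _ in range(steps):
        err = predict(x_cf) - target
        # Gradient of the objective with respect to x_cf.
        g = 2.0 * err * grad(x_cf) + 2.0 * lam * (x_cf - x)
        x_cf = x_cf - lr * g
    return x_cf


# Hypothetical linear RUL model: RUL = w . x + b (in cycles).
w = np.array([3.0, -2.0])
b = 100.0
predict = lambda v: float(w @ v + b)
grad = lambda v: w  # the gradient of a linear model is constant

x0 = np.array([1.0, 2.0])  # predicted RUL: 99 cycles
x_cf = counterfactual(predict, grad, x0, target=105.0)
print("feature change:", x_cf - x0, "new RUL:", round(predict(x_cf), 2))
```

Because of the proximity penalty, the counterfactual lands slightly short of the target (about 104.95 cycles here); a smaller `lam` moves it closer to 105 at the cost of a larger change to the input features.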

Correction to

Advancing aircraft engine RUL predictions: an interpretable integrated approach of feature engineering and aggregated feature importance (Scientific Reports, (2023), 13, 1, (13466), 10.1038/s41598-023-40315-1)

Journal article (2024) - Yazan Alomari, Mátyás Andó, Marcia L. Baptista

Correction to: Scientific Reports https://doi.org/10.1038/s41598-023-40315-1, published online 18 August 2023. The original version of this Article contained an error in Figure 1, where “FD004” was omitted from the “Testing” block. The original Figure 1 and accompanying legend appear below. [Figure 1: Flowchart illustrating the proposed workflow.] The original Article has been corrected.

Health index estimation through integration of general knowledge with unsupervised learning

Journal article (2024) - Kristupas Bajarunas, Marcia L. Baptista, Kai Goebel, Manuel Arias Chao

Accurately estimating a Health Index (HI) from condition monitoring (CM) data is essential for reliable and interpretable prognostics and health management (PHM) in complex systems. In most scenarios, complex systems operate under varying operating conditions and can exhibit different fault modes, making unsupervised inference of an HI from CM data a significant challenge. Hybrid models combining prior knowledge about degradation with deep learning models have been proposed to overcome this challenge. However, previously suggested hybrid models for HI estimation usually rely heavily on system-specific information, limiting their transferability to other systems. In this work, we propose an unsupervised hybrid method for HI estimation that integrates general knowledge about degradation into the convolutional autoencoder's model architecture and learning algorithm, enhancing its applicability across various systems. The effectiveness of the proposed method is demonstrated in two case studies from different domains: turbofan engines and lithium batteries. The results show that the proposed method outperforms other competitive alternatives, including residual-based methods, in terms of HI quality and the utility of the HIs for Remaining Useful Life (RUL) prediction. The case studies also show that the proposed method performs comparably to a supervised model trained with HI labels.
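One piece of general degradation knowledge that such a hybrid method can encode is that a health index should not improve over time without maintenance. As a minimal sketch, assuming an HI scaled so that 1 is healthy and lower values mean more degradation, that knowledge can be added to an autoencoder's reconstruction loss as a soft penalty on HI increases. The function names and the `weight` parameter are hypothetical, and the paper's actual architecture and constraints may differ.

```python
import numpy as np


def monotonicity_penalty(hi):
    """Sum of squared increases in an HI trajectory that should never increase."""
    steps = np.diff(np.asarray(hi, dtype=float))
    return float(np.sum(np.maximum(steps, 0.0) ** 2))


def hybrid_loss(x, x_rec, hi, weight=1.0):
    """Reconstruction MSE plus a weighted soft monotonicity constraint on the HI."""
    recon = float(np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2))
    return recon + weight * monotonicity_penalty(hi)


hi_good = [1.0, 0.9, 0.8, 0.6]   # never increases: no penalty
hi_bad = [1.0, 0.9, 0.95, 0.6]   # a 0.05 "recovery" is penalized
print(monotonicity_penalty(hi_good), monotonicity_penalty(hi_bad))
```

Using a soft penalty rather than a hard architectural constraint lets the data override the prior when, for example, sensor noise produces small temporary increases.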

Revision and implementation of metrics to evaluate the performance of prognostics models

Journal article (2024) - Marcia L. Baptista, Sahil Panse, Bruno F. Santos

Prognostics is used in predictive maintenance to estimate the remaining time to the end of life of a system or component. Among the many challenges of prognostics is the need for model verification and validation. Over the years, the community has used several objective metrics. Some of these metrics come from statistics, others from forecasting, and others have been proposed specifically for prognostics. A single “perfect” metric has not yet been put forward, and finding one metric that excels in all evaluation dimensions and case studies remains an open question. In this review, we analyze the most important metrics of prognostics. A set of 19 metrics is analyzed and implemented in a public GitHub project. Our analysis focuses only on metrics for deterministic predictions; stochastic predictions are out of scope. The paper describes the properties, advantages, disadvantages, and industrial applicability of each metric. We also discuss potential modifications to the existing metrics and the development of new metrics. A final table summarizes the main properties of the metrics. Our goal is to raise awareness about prognostics metrics and help establish a common evaluation procedure. Code available at: MetricsForPrognostics.
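For orientation, the sketch below implements four deterministic metrics that commonly appear in this literature: RMSE, MAE, the asymmetric scoring function used with C-MAPSS (which penalizes late predictions more heavily than early ones), and a simplified α-λ accuracy that reports the fraction of predictions within ±α of the true RUL instead of evaluating a single λ time point. It is an illustrative sketch, not the code of the MetricsForPrognostics project, and the function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    e = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sqrt(np.mean(e ** 2)))


def mae(y_true, y_pred):
    e = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.mean(np.abs(e)))


def cmapss_score(y_true, y_pred):
    """Asymmetric score with d = predicted - true; late predictions (d > 0) cost more."""
    d = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)))


def alpha_accuracy(y_true, y_pred, alpha=0.2):
    """Fraction of predictions inside the +/- alpha band around the true RUL."""
    y_true = np.asarray(y_true, float)
    err = np.abs(np.asarray(y_pred, float) - y_true)
    return float(np.mean(err <= alpha * y_true))


y_true = [100.0, 50.0, 20.0]
y_pred = [110.0, 45.0, 30.0]
for f in (rmse, mae, cmapss_score, alpha_accuracy):
    print(f.__name__, round(f(y_true, y_pred), 3))
```

The example shows why several metrics are needed: the 10-cycle error near end of life (true RUL 20) fails the relative α band, while RMSE and MAE treat it the same as the 10-cycle error at RUL 100.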

Unsupervised Physics-Informed Health Indicator Discovery for Complex Systems

Conference paper (2023) - Kristupas Bajarunas, Marcia Baptista, Kai Goebel, Manuel Arias Chao

Discovering health indicators (HIs) is essential for the prognostics and health management of complex systems, as an HI enables timely interventions and effective maintenance strategies. However, most existing methodologies for HI discovery rely on labeled data, which are expensive and complicated to obtain in the real world. In this paper, we propose a novel, unsupervised physics-informed model structured after expert knowledge in the form of a graphical representation of the expected relationships between sensor readings, operating conditions, and degradation. In addition, a soft constraint is used to guide the representation of the HI according to generally available expert knowledge about degradation. We evaluated the model on a turbofan engine dataset and conducted four experiments in which the original data were manipulated to create realistic scenarios. The proposed method discovers an HI that exhibits better intrinsic qualities than current state-of-the-art methodologies, leading to enhanced prognostic performance. Notably, when the initial health state of each system varies, the proposed method achieves an average prognostic performance improvement of approximately 20% compared to existing state-of-the-art methods.

Cross-Version Software Defect Prediction Considering Concept Drift and Chronological Splitting

Journal article (2023) - Md Alamgir Kabir, Atiq Ur Rehman, M. M. Manjurul Islam, Nazakat Ali, Marcia L. Baptista

Concept drift (CD) refers to a phenomenon where the data distribution within datasets changes over time, which can adversely affect the performance of prediction models in software engineering (SE), including those used for tasks such as cost estimation and defect prediction. Detecting CD in SE datasets is difficult but important, because it identifies the need to retrain prediction models and in turn improves their performance. If the concept drift is caused by symmetric changes in the data distribution, the model adaptation process might need to account for this symmetry to maintain accurate predictions. This paper explores the impact of CD within the context of cross-version defect prediction (CVDP), aiming to enhance the reliability of prediction performance and to make the data more symmetric. A concept drift detection (CDD) approach is further proposed to identify data distributions that change across software versions. The proposed CDD framework consists of three stages: (i) data pre-processing for CD detection; (ii) notification of CD by triggering one of three flags (i.e., CD, warning, and control); and (iii) guidance on when to update an existing model. Several experiments on 30 versions of seven software projects reveal the value of the proposed CDD. Key findings include: (i) an exponential increase in the error rate across different software versions is associated with CD; and (ii) a moving-window approach to training defect prediction models on chronologically ordered defect data results in better CD detection than using all historical data, with a large effect size.
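The three flags map naturally onto the states of error-rate drift detectors such as DDM (Drift Detection Method), which monitors a model's running error rate and raises a warning or a drift signal when that rate rises significantly above its historical minimum. The sketch below is a DDM-style detector, offered as an assumption about how such flags can be computed; it is not the paper's CDD framework, and the class name, the 2 and 3 standard-deviation thresholds, and the 30-sample warm-up are conventional choices, not values from the study.

```python
import math


class DDMStyleDetector:
    """Error-rate drift detector with three states: control, warning, drift.

    Tracks the running error rate p and its standard deviation s, remembers
    where p + s was lowest, and flags a warning or a drift when p + s rises
    warn_k or drift_k standard deviations above that minimum.
    """

    def __init__(self, min_samples=30, warn_k=2.0, drift_k=3.0):
        self.min_samples = min_samples
        self.warn_k = warn_k
        self.drift_k = drift_k
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after the model has been retrained."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Add one prediction outcome (True = wrong prediction) and return the state."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "control"  # too few samples to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()  # drift: the model should be updated
            return "drift"
        if p + s > self.p_min + self.warn_k * self.s_min:
            return "warning"
        return "control"


# Hypothetical stream: a 10% error rate for 200 predictions, then 50%.
stream = [i % 10 == 9 for i in range(200)] + [i % 2 == 1 for i in range(200)]
detector = DDMStyleDetector()
states = [detector.update(e) for e in stream]
print("first warning at", states.index("warning"), "first drift at", states.index("drift"))
```

The warning state gives time to start collecting data from the new version before the drift flag triggers retraining, which fits the moving-window training strategy described above.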

Advancing aircraft engine RUL predictions

An interpretable integrated approach of feature engineering and aggregated feature importance

Journal article (2023) - Yazan Alomari, Mátyás Andó, Marcia L. Baptista

In this study, we present a comprehensive approach for predicting the remaining useful life (RUL) of aircraft engines, incorporating advanced feature engineering, dimensionality reduction, feature selection techniques, and machine learning models. The process begins with a rolling time-series window, followed by the extraction of a multitude of statistical features and the application of principal component analysis (PCA) for dimensionality reduction. We utilize a variety of feature selection methods, such as a Genetic Algorithm, Recursive Feature Elimination, Least Absolute Shrinkage and Selection Operator Regression, and Feature Importances from a Random Forest model. As a significant contribution, we introduce the novel aggregated feature importances with cross-validation (AFICv) technique, which ranks features based on their mean importance. We establish a selection criterion that retains features with a cumulative mean importance sum of 70%, thereby reducing the complexity of machine learning models and enhancing their generalizability. Four machine learning regression models (Natural Gradient Boosting, Extreme Gradient Boosting, Random Forest, and Multi-Layer Perceptron) were employed to evaluate the effectiveness of the selected features. The performance of our proposed method is assessed with the evaluation metrics Root Mean Square Error (RMSE) and R² score, as well as within-interval percentages and relative accuracy. Importantly, a novel PCA interpretability approach is introduced to provide real-world context and enhance the utility of our findings for domain experts. Our results indicate that the proposed AFICv technique efficiently achieves competitive performance across the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) sub-datasets using a significantly smaller subset of features, thus contributing to a more effective and interpretable RUL prediction methodology for aircraft engines.
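The selection rule behind AFICv can be illustrated in a few lines. The sketch assumes an importance matrix with one row per cross-validation fold (or per selection method) and one column per feature; it averages the importances, ranks the features, and keeps the top-ranked ones until their normalized cumulative importance reaches 70%. The function name and the feature names are hypothetical, and how the study obtains and normalizes each importance vector may differ.

```python
import numpy as np


def aggregated_feature_selection(importances, names, threshold=0.70):
    """Rank features by mean importance across folds and keep the smallest
    top-ranked set whose normalized cumulative importance reaches `threshold`."""
    imp = np.asarray(importances, dtype=float)   # shape: (n_folds, n_features)
    mean_imp = imp.mean(axis=0)
    order = np.argsort(mean_imp)[::-1]           # most important first
    cum = np.cumsum(mean_imp[order] / mean_imp.sum())
    # First position where the cumulative share is >= threshold (inclusive).
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(order))
    return [names[i] for i in order[:k]]


importances = [
    [0.6, 0.2, 0.1, 0.1],   # fold 1 (hypothetical values)
    [0.4, 0.3, 0.2, 0.1],   # fold 2
]
names = ["f1", "f2", "f3", "f4"]
print(aggregated_feature_selection(importances, names))  # ['f1', 'f2']
```

Averaging over folds before ranking makes the selection less sensitive to any single train/validation split.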

Aircraft Engine Bleed Valve Prognostics Using Multiclass Gated Recurrent Unit

Journal article (2023) - M. Lourenço Baptista, Helmut Prendinger

Prognostics and health management is an engineering discipline that aims to support system operation while ensuring maximum safety and performance. Prognostics is a key step of this framework, focusing on developing effective maintenance policies based on predictive methods. Traditionally, prognostics models forecast the degradation process using regression techniques that approximate a mapping function from input to continuous remaining useful life estimates. These models are typically of high complexity and low interpretability. Classification approaches are an alternative solution to these types of models. We propose a predictive classification model that translates the input into discrete output variables instead of mapping the input to a single remaining useful life estimate. Each discrete output variable corresponds to a range of remaining useful life values. In other words, each output class variable represents the likelihood or risk of failure within a specific time range. We apply this model to a real-world case study involving the unscheduled and scheduled removals of a set of engine bleed valves from a fleet of Boeing 737 aircraft. The model can reach an area under the (micro-average) receiver operating characteristic curve of 72%. Our results suggest that the proposed multiclass gated recurrent unit network can provide valuable information about the different fault stages (corresponding to intervals of residual lives) of the studied valves.
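The central modelling step, turning a continuous RUL target into discrete classes that each cover a range of residual life, can be sketched as simple binning. The bin edges below (in days) are hypothetical, not the intervals used for the bleed valves; the one-hot targets are what a multiclass network with a softmax output would typically be trained on.

```python
import numpy as np


def rul_to_class(rul, edges):
    """Map RUL values to class indices; class 0 is the most imminent failure.

    With increasing edges [e0, e1, ...], class 0 covers RUL < e0, class i
    covers edges[i-1] <= RUL < edges[i], and the last class covers RUL >= edges[-1].
    """
    return np.digitize(np.asarray(rul, dtype=float), edges)


def one_hot(labels, n_classes):
    """One-hot targets for training a multiclass (e.g. softmax) classifier."""
    return np.eye(n_classes, dtype=int)[labels]


edges = [10, 30, 60]          # hypothetical class boundaries in days
rul = [5, 10, 25, 45, 100]
labels = rul_to_class(rul, edges)
print(labels)                 # [0 1 1 2 3]
print(one_hot(labels, len(edges) + 1))
```

At prediction time, the softmax output for class 0 can be read directly as the estimated risk of failure within the first interval.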

Generic Hybrid Models for Prognostics of Complex Systems

Conference paper (2023) - Kristupas Bajarunas, Marcia Baptista, Kai Goebel, Manuel Arias Chao

Hybrid models combining physical knowledge and machine learning show promise for obtaining accurate and robust prognostic models. However, despite the increased interest in hybrid models in recent years, the proposed solutions tend to be domain-specific. As a result, there is no compelling strategy for deciding what physics-derived knowledge can be integrated into deep learning models, and where and how, depending on the available representation of physical knowledge and the quality of the data available for developing prognostic models of complex systems. This Ph.D. project aims to develop a general strategy for hybridizing prognostic models by exploring multiple methods to incorporate physical knowledge at various stages of the learning algorithm. The project will prioritize general expert knowledge as the primary source of information, while domain-specific knowledge will serve as an additional feature when applicable.

1D-DGAN-PHM

A 1-D denoising GAN for Prognostics and Health Management with an application to turbofan

Journal article (2022) - Marcia L. Baptista, Elsa M.P. Henriques

The performance of prognostics is closely related to the quality of condition monitoring signals (e.g., temperature, pressure, or vibration signals), which reveal the degradation of the system of interest. However, typical condition monitoring signals include noise and outliers, and disentangling the noise from these signals is essential to obtain the actual degradation trajectories. Different denoising methods have been proposed in prognostics. Conventional denoising methods have low complexity but usually do not preserve edge information and do not involve physical considerations. A promising deep learning alternative is denoising generative models. This approach is popular in computer vision, where it has been shown to outperform classical techniques, but it has seldom been used in prognostics on 1-D signals. In this paper, we propose the 1-D Denoising Generative Adversarial Network for Prognostics and Health Management (1D-DGAN-PHM). The 1D-DGAN-PHM is trained on synthetic data generated by a custom data generator that infuses physics-of-failure knowledge into paired samples of noisy and noise-free trajectories. The network consists of two components, a denoising generator and a discriminator. The denoising generator learns to denoise a 1-D input signal, while the discriminator guides the learning by comparing noise-free signals with signals from the denoising generator. Advantages of the 1D-DGAN-PHM include the physics-of-failure information in the synthetic data generator and the model's sophistication. In this work, we apply the 1D-DGAN-PHM to denoise the raw signals derived from NASA's C-MAPSS simulator of an aircraft turbofan engine. Baseline methods are Moving Average, Median filter, Savitzky–Golay filter, and a denoising autoencoder. The 1D-DGAN-PHM produces smooth trajectories and preserves the initial linear degradation of the signals. The 1D-DGAN-PHM yields the largest improvement in prognosability (on average, from 0.73 to 0.81).
Data from the 1D-DGAN-PHM resulted in the best MAE (from 29 to 25 cycles) and RMSE (from 39 to 36) for a Random Forest. The code is publicly available at 1D-DGAN-PHM.
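For comparison, two of the conventional baselines named above are easy to reproduce. The sketch below applies a centred moving average and a median filter to a synthetic signal (a flat segment followed by linear degradation, Gaussian noise, and sparse outliers) and compares their MAE against the noise-free trajectory. The signal, the window size, and the function names are illustrative assumptions; the 1D-DGAN-PHM itself is not sketched here, and a Savitzky–Golay baseline is available in SciPy as `scipy.signal.savgol_filter`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_average(x, window=9):
    """Centred moving average (odd window); edges padded by repeating end values."""
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    return np.convolve(xp, np.ones(window) / window, mode="valid")


def median_filter(x, window=9):
    """Centred running median (odd window), which suppresses isolated outliers."""
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    return np.median(sliding_window_view(xp, window), axis=1)


def mae(a, b):
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
t = np.arange(300)
clean = np.where(t < 150, 1.0, 1.0 - 0.004 * (t - 150))   # degradation starts at t = 150
noisy = clean + rng.normal(0.0, 0.05, t.size)
noisy[::37] += 0.5                                         # sparse outliers

for name, y in [("raw", noisy), ("moving average", moving_average(noisy)),
                ("median filter", median_filter(noisy))]:
    print(f"{name:15s} MAE vs clean: {mae(y, clean):.4f}")
```

The moving average smears each outlier across the window, whereas the median filter largely ignores it; neither uses any physics-of-failure knowledge, which is the gap the 1D-DGAN-PHM targets.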

Relation between prognostics predictor evaluation metrics and local interpretability SHAP values

Journal article (2022) - Marcia L. Baptista, Kai Goebel, Elsa M.P. Henriques

Maintenance decisions in domains such as aeronautics are becoming increasingly dependent on the ability to predict the failure of components and systems. When data-driven techniques are used for this prognostic task, they often face headwinds due to their perceived lack of interpretability. To address this issue, this paper examines how features used in a data-driven prognostic approach correlate with established metrics of monotonicity, trendability, and prognosability. In particular, we use the SHAP model (SHapley Additive exPlanations) from the field of eXplainable Artificial Intelligence (XAI) to analyze the outcomes of three increasingly complex algorithms: Linear Regression, Multi-Layer Perceptron, and Echo State Network. Our goal is to test the hypothesis that the prognostics metrics correlate with the SHAP model's explanations, i.e., the SHAP values. We use baseline data from a standard dataset that contains several hundred run-to-failure trajectories for jet engines. The results indicate that SHAP values track these metrics very closely, and the differences observed between the models support the assertion that model complexity is a significant factor to consider when explainability is a concern in prognostics.
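The three feature-quality metrics can be computed directly from a set of run-to-failure trajectories. The sketch below uses formulations common in the prognostics literature: monotonicity as the normalized imbalance between positive and negative increments, prognosability as an exponential of the spread of end-of-life values relative to the mean total change, and trendability as the smallest absolute correlation between any two trajectories. It assumes equal-length trajectories for trendability, and the paper's exact definitions may differ in detail.

```python
import numpy as np


def monotonicity(trajs):
    """Mean over units of |#positive steps - #negative steps| / (n - 1)."""
    vals = []
    for x in trajs:
        d = np.diff(np.asarray(x, dtype=float))
        vals.append(abs(int(np.sum(d > 0)) - int(np.sum(d < 0))) / (len(x) - 1))
    return float(np.mean(vals))


def prognosability(trajs):
    """exp(-spread of failure values / mean total change); 1 means identical failure values."""
    start = np.array([x[0] for x in trajs], dtype=float)
    end = np.array([x[-1] for x in trajs], dtype=float)
    return float(np.exp(-np.std(end) / np.mean(np.abs(start - end))))


def trendability(trajs):
    """Smallest absolute pairwise correlation (assumes equal-length trajectories)."""
    corr = np.corrcoef(np.vstack(trajs))
    return float(np.min(np.abs(corr)))


# Two hypothetical feature trajectories that degrade at different rates.
trajs = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 2.0, 4.0, 6.0])]
print(monotonicity(trajs), round(prognosability(trajs), 4), trendability(trajs))
```

In this toy case both features are perfectly monotonic and perfectly correlated, but they fail at different values (3 and 6), so prognosability is the only metric below 1.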

Classification prognostics approaches in aviation

Journal article (2021) - Marcia L. Baptista, Elsa M.P. Henriques, Helmut Prendinger

Traditionally, prognostics approaches to predictive maintenance have focused on estimating the remaining useful life of the equipment. However, from an industrial point of view, the goal is often not to predict the residual life but to determine whether a maintenance action is needed within a given time window. This allows the data-driven prognostics problem to be framed as a binary classification task rather than a regression one. To address this problem, we explore the relative strengths and limitations of a set of classifier approaches, such as random forests, support vector machines, nearest neighbors, and deep learning techniques. We evaluate the models using metrics such as sensitivity, specificity, accuracy, the receiver operating characteristic curve, and the F-score. The novelty of this work lies in adopting a modeling approach with a natural probabilistic interpretation of the prognostics exercise. An extensive range of classifier models is compared on two real-world datasets from the aeronautics sector. Results indicate that deep learning classifiers are well suited to this kind of prognostics and can outperform traditional classification techniques by a significant margin. Importantly, the proposed modeling approach aims to generate an alternative prognostics representation that is in line with the expectations of aeronautical engineers.
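The reframing can be sketched as a labelling rule plus the listed evaluation metrics: a sample is positive when failure is expected within the time window. The window length, the RUL values, and the classifier output below are hypothetical; any of the compared classifiers would supply `y_pred`.

```python
import numpy as np


def window_labels(rul, window):
    """1 if failure is expected within `window` time units, else 0."""
    return (np.asarray(rul) <= window).astype(int)


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and F-score from a confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}


rul = [5, 12, 30, 8, 50]                # hypothetical remaining lives (e.g. cycles)
y_true = window_labels(rul, window=10)  # labels 1, 0, 0, 1, 0
y_pred = [1, 1, 0, 0, 0]                # hypothetical classifier output
print(binary_metrics(y_true, y_pred))
```

Sensitivity (missed failures) and specificity (unnecessary maintenance actions) are reported separately because the two error types carry very different costs in aviation.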

A self-organizing map and a normalizing multi-layer perceptron approach to baselining in prognostics under dynamic regimes

Journal article (2021) - Marcia Lourenco Baptista, Elsa M. Elsa, Kai Goebel

When the influence of changing operational and environmental conditions, such as temperature and external loading, is not factored out of sensor data, it can be difficult to observe a clear deterioration path. This can significantly affect the task of engineering prognostics and other health management operations. To address this problem of dynamic operating regimes, the data must be baselined, typically by first finding the operating regimes and then normalizing the data within each regime. This paper describes a baselining solution based on neural networks. A self-organizing map is used to identify the regimes, and a multi-layer perceptron is used to normalize the sensor data according to the detected regimes. Tests are performed on public datasets from a turbofan simulator. The approach can produce results similar to those of classical methods without needing to specify the number of regimes in advance or to explicitly compute the statistical properties of a hold-out dataset. Importantly, the techniques can be integrated into a deep learning system to perform prognostics in a single pass.
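For contrast, the classical two-step baselining described above can be sketched as clustering the operating settings and then z-scoring each sensor within its regime. Here k-means stands in for regime identification (the paper uses a self-organizing map) and the number of regimes `k` must be supplied, which is exactly the requirement the neural approach removes. The data, the function names, and the farthest-point initialization are illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def regime_normalize(sensors, labels):
    """Z-score each sensor using the mean and std of its own operating regime."""
    out = np.empty(sensors.shape, dtype=float)
    for r in np.unique(labels):
        m = labels == r
        mu = sensors[m].mean(axis=0)
        sd = sensors[m].std(axis=0)
        out[m] = (sensors[m] - mu) / np.where(sd > 0, sd, 1.0)
    return out


# Hypothetical data: 3 operating regimes that shift both settings and sensor levels.
rng = np.random.default_rng(1)
true_regime = np.repeat([0, 1, 2], 100)
settings = true_regime[:, None] * 10.0 + rng.normal(0.0, 0.1, (300, 2))
sensors = (true_regime[:, None] + 1) * 100.0 + rng.normal(0.0, 1.0, (300, 3))

labels, _ = kmeans(settings, k=3)
baselined = regime_normalize(sensors, labels)
print(baselined.mean(axis=0).round(6), baselined.std(axis=0).round(3))
```

After baselining, the regime-induced jumps of 100 units disappear, so a slow degradation trend would no longer be masked by regime changes.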

More effective prognostics with elbow point detection and deep learning

Journal article (2020) - Marcia L. Baptista, Elsa M.P. Henriques, Kai Goebel

Prior to failure, most systems exhibit signs of changed characteristics, and the early detection of this change is important for remaining useful life estimation. Being able to detect the inflection point or “elbow point” of an asset, i.e., the point on the degradation curve that marks the transition from nominal to faulty condition, can enable more sophisticated prognostics, because this divide-and-conquer tactic allows the prediction to focus on the window before failure, when significant changes are expected. In this work, we compare prognostics with and without change point detection. We use different recurrent neural network techniques (standard recurrent neural network, long short-term memory, and gated recurrent unit) to find the elbow point location. The actual estimation of the remaining time to failure is based on the echo state network, a state-of-the-art approach in prognostics. Two different experiments are performed on simulated data obtained from the NASA Ames prognostics repository. We first compare the performance of the recurrent-neural-network-based elbow point detectors against three baseline models: the Z-test, multi-layer perceptron, and random forests. Results indicate that recurrent neural networks can outperform the baseline approaches. In the second experiment, the best elbow detection model, the gated recurrent unit, is integrated within an echo state network, with a significant increase in overall performance in terms of remaining useful life estimation.
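Of the baselines, the Z-test is the simplest to illustrate. The sketch below is one plausible sliding-window variant, an assumption rather than the paper's exact configuration: it estimates the healthy mean and standard deviation from an initial segment and reports the first time a window mean deviates by more than `z_crit` standard errors. The function name, window sizes, and threshold are hypothetical, and a small deterministic oscillation stands in for sensor noise so the example is reproducible.

```python
import numpy as np


def z_test_elbow(x, baseline_len=50, window=10, z_crit=3.0):
    """Return the first index at which a sliding-window mean deviates from the
    healthy baseline by more than z_crit standard errors (None if it never does)."""
    x = np.asarray(x, dtype=float)
    mu = x[:baseline_len].mean()
    sigma = x[:baseline_len].std(ddof=1)
    se = sigma / np.sqrt(window)  # standard error of a window mean
    for end in range(baseline_len + window, len(x) + 1):
        z = (x[end - window:end].mean() - mu) / se
        if abs(z) > z_crit:
            return end - 1  # detection time: last sample of the deviating window
    return None


t = np.arange(300)
# Flat until t = 100, then linear degradation, plus a small oscillation as "noise".
signal = np.where(t < 100, 1.0, 1.0 - 0.01 * (t - 100)) + 0.02 * np.sin(1.3 * t)
print("elbow detected at t =", z_test_elbow(signal))
```

The detection lags the true elbow by a few samples because the window mean must accumulate enough deviation; this delay, and the need to hand-tune `window` and `z_crit`, is where learned detectors such as the gated recurrent unit can help.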