F. Airaldi
8 records found
This work expands on the current state-of-the-art results that combine these two distinct approaches into a single framework. While not always straightforward, it is well known that endowing these decision-making processes with model-based knowledge can not only enhance their performance but also benefit their interpretability and analysis: model-based RL, also known as Approximate Dynamic Programming (ADP), is perhaps the most renowned machine learning paradigm to craft these intelligent predictive agents. This dissertation aims to look at RL from a different perspective. Rather than serving as an alternative to model-based control, RL is used as a performance-enhancing mechanism operating within rigorously defined safety requirements. Concurrently, this thesis establishes MPC as a unifying and scalable foundation for learning-based control and optimisation for constrained, uncertain, and distributed decision-making systems...
Greenhouse climate control is concerned with maximizing performance in terms of crop yield and resource efficiency. One promising approach is model predictive control (MPC), which leverages a model of the system to optimize the control inputs, while enforcing physical constraints. However, prediction models for greenhouse systems are inherently inaccurate due to the complexity of the real system and the uncertainty in predicted weather profiles. For model-based control approaches such as MPC, this can degrade performance and lead to constraint violations. Existing approaches address uncertainty in the prediction model with robust or stochastic MPC methodology; however, these necessarily reduce crop yield due to conservatism and often bear higher computational loads. In contrast, learning-based control approaches, such as reinforcement learning (RL), can handle uncertainty naturally by leveraging data to improve performance. This work proposes an MPC-based RL control framework to optimize the climate control performance in the presence of prediction uncertainty. The approach employs a parametrized MPC scheme that learns directly from data, in an online fashion, the parametrization of the constraints, prediction model, and optimization cost that minimizes constraint violations and maximizes climate control performance. Simulations show that the approach can learn an MPC controller that significantly outperforms the current state-of-the-art in terms of constraint violations and efficient crop growth.
Purpose: To describe the validation of a novel automated analysis of preoperative pan-corneal endothelial cell viability. Methods: Preclinical experimental study. Dead endothelial cells and denuded areas of Descemet membrane of corneoscleral rims were stained with trypan blue (TB) 0.05%. Endothelial mortality was estimated by an experienced eye bank technician ("gold standard") and by deep learning-aided automated segmentation of TB-positive areas (TBPAs) on images of the stained corneas ("V-CHECK method"). V-CHECK mortality was calculated for the whole cornea and for concentric 2-mm rings. The agreement in the estimation of endothelial mortality between the two methods was assessed with Bland-Altman analysis and correlation tests. Results: Nineteen corneas deemed unsuitable for transplantation were used for the experiment. The automated V-CHECK method was able to accurately segment the corneal endothelium and the TBPAs. The gold standard and the V-CHECK method showed a strong positive correlation for all rings (Pearson's ρ, range 0.76-0.81, all P < 0.001). The V-CHECK method resulted in a higher average estimated endothelial mortality (mean difference range +6.5% to +9.5%). Conclusions: The V-CHECK method enables reproducible estimation of endothelial cell viability in donor corneas. Incorporating this technique into the preoperative assessment of donor corneal tissues (in the eye bank and in the operating theater) can provide a reliable evaluation of endothelial health, thereby improving the consistency of tissue quality and further supporting optimal surgical results. Translational Relevance: The V-CHECK deep learning-assisted computer vision protocol will allow surgeons and eye bank technicians to perform an independent, preoperative assessment of global corneal endothelial viability.
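The Bland-Altman agreement analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration using the conventional definition (mean difference plus or minus 1.96 standard deviations of the paired differences), not the authors' code; the function name `bland_altman` and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences a - b; the limits of
    agreement are bias -/+ 1.96 sample standard deviations of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical mortality estimates in percent (illustrative only).
automated = [12.0, 20.0, 31.0, 18.0, 25.0]
technician = [5.0, 12.0, 22.0, 10.0, 19.0]
bias, lower, upper = bland_altman(automated, technician)
```

A positive bias here would mean the first method reports higher mortality on average, which is the direction of the difference reported in the abstract.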
Against the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management. It proposes an approach to ramp metering control that embeds Reinforcement Learning (RL) techniques within the Model Predictive Control (MPC) framework. The control problem is formulated as an RL task by crafting a suitable stage cost function that is representative of the traffic conditions, variability in the control action, and violations of the constraint on the maximum number of vehicles in queue. An MPC-based RL approach, which leverages the MPC optimal control problem as a function approximator for the RL algorithm, is proposed to learn to efficiently control an on-ramp and satisfy its constraints despite uncertainties in the system model and variable demands. Simulations are performed on a benchmark small-scale highway network to compare the proposed methodology against other state-of-the-art control approaches. Results show that, starting from an MPC controller that has an imprecise model and is poorly tuned, the proposed methodology effectively learns to improve the control policy such that congestion in the network is reduced and constraints are satisfied, yielding performance superior to that of the other controllers.
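A stage cost with the three ingredients named in this abstract (traffic conditions, control variability, queue constraint violations) might look like the sketch below. The abstract does not give the exact terms, so everything here is an assumption: the use of total time spent as the traffic term, the squared rate change, the linear violation penalty, and all names, weights, and default values.

```python
def stage_cost(densities, queues, rate, prev_rate, max_queue,
               dt=1.0, cell_length=1.0, lanes=1,
               w_traffic=1.0, w_variation=1.0, w_violation=1.0):
    """Hypothetical RL stage cost for ramp metering (all terms are assumptions)."""
    # Traffic term: time spent by vehicles on the road and in queues over one step.
    vehicles_on_road = sum(rho * cell_length * lanes for rho in densities)
    time_spent = dt * (vehicles_on_road + sum(queues))
    # Variability term: squared change of the metering rate between steps.
    variation = (rate - prev_rate) ** 2
    # Violation term: number of queued vehicles above the allowed maximum.
    violation = sum(max(0.0, q - max_queue) for q in queues)
    return w_traffic * time_spent + w_variation * variation + w_violation * violation


# Illustrative values only: two road cells, one on-ramp queue.
cost = stage_cost([10.0, 20.0], [5.0], rate=0.5, prev_rate=0.5, max_queue=50.0)
```

Penalising violations in the cost, rather than forbidding them outright, is what lets a learning agent trade congestion against queue length while it adapts.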
This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example.
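To illustrate what a Q-learning update looks like when an MPC-like problem serves as the function approximator, the sketch below uses a deliberately tiny single-agent stand-in: a one-step problem on a scalar model s+ = s + a whose terminal-cost weight theta is the learnable parameter. It is not the distributed scheme of the paper; the model, the cost, the closed-form minimiser, and all names are assumptions chosen so that no solver is needed.

```python
def q_value(theta, s, a):
    """Q_theta(s, a): stage cost s^2 + a^2 plus theta-weighted terminal cost."""
    return s * s + a * a + theta * (s + a) ** 2


def greedy_action(theta, s):
    """Closed-form minimiser of q_value over a (valid for theta > -1)."""
    return -theta * s / (1.0 + theta)


def value(theta, s):
    """V_theta(s) = min over a of Q_theta(s, a)."""
    return q_value(theta, s, greedy_action(theta, s))


def q_learning_step(theta, s, a, cost, s_next, gamma=0.9, lr=0.01):
    """One semi-gradient Q-learning update on theta for a cost-minimising agent."""
    td_error = cost + gamma * value(theta, s_next) - q_value(theta, s, a)
    grad = (s + a) ** 2  # derivative of q_value with respect to theta
    return theta + lr * td_error * grad, td_error


# Illustrative transition: state 2, action -1, observed cost 5, next state 1.
theta_new, delta = q_learning_step(1.0, s=2.0, a=-1.0, cost=5.0, s_next=1.0)
```

In a multi-agent setting each agent would hold only part of such a problem, which is why the abstract stresses reconstructing the centralized update from local information.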
This paper proposes a method to encourage safety in Model Predictive Control (MPC)-based Reinforcement Learning (RL) via Gaussian Process (GP) regression. The framework consists of 1) a parametric MPC scheme that is employed as a model-based controller with approximate knowledge of the real system's dynamics, 2) an episodic RL algorithm tasked with adjusting the MPC parametrization in order to increase its performance, and 3) GP regressors used to estimate, directly from data, constraints on the MPC parameters capable of predicting, up to some probability, whether the parametrization is likely to yield a safe or unsafe policy. These constraints are then enforced onto the RL updates in an effort to enhance the learning method with a probabilistic safety mechanism. Compared to other recent publications combining safe RL with MPC, our method does not require further assumptions on, e.g., the prediction model in order to retain computational tractability. We illustrate the results of our method in a numerical example on the control of a quadrotor drone in a safety-critical environment.
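One way to picture a probabilistic safety check on a parameter update is sketched below. It assumes a GP posterior is already available as a function returning the mean and standard deviation of a constraint value g(theta), with g <= 0 meaning safe, and it restricts the update by simple backtracking. Both the constraint form and the backtracking rule are assumptions for illustration, not the paper's mechanism, and `toy_posterior` is a hypothetical stand-in for a fitted GP.

```python
from statistics import NormalDist


def is_probably_safe(mean, std, prob=0.95):
    """True if P(g <= 0) >= prob when g is Gaussian with the given mean and std."""
    if std < 0.0:
        raise ValueError("standard deviation must be non-negative")
    beta = NormalDist().inv_cdf(prob)  # standard normal quantile for prob
    return mean + beta * std <= 0.0


def safe_update(theta, step, posterior, prob=0.95, shrink=0.5, max_tries=10):
    """Shrink the proposed step until the candidate is predicted safe.

    Falls back to the current theta if no safe candidate is found.
    """
    for _ in range(max_tries):
        candidate = theta + step
        mean, std = posterior(candidate)
        if is_probably_safe(mean, std, prob):
            return candidate
        step *= shrink
    return theta


def toy_posterior(theta):
    """Hypothetical posterior: constraint mean theta - 1 with constant std 0.1."""
    return theta - 1.0, 0.1


theta_next = safe_update(0.0, 1.6, toy_posterior)
```

Tightening the mean by a multiple of the standard deviation makes the check more conservative where the data say little, which matches the idea of predicting safety only up to some probability.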