Autonomous greenhouse climate control with Q-learning using ENMPC as a function approximator

Abstract

Greenhouses allow the production of crops that would otherwise be impossible, permitting more local, fresher, and nutrient-richer crop production. Efforts are being made to minimize the societal harm caused by the energy and resource consumption of greenhouse production systems. One way to control such systems is model predictive control (MPC), which can, in theory, achieve optimal crop yield and resource efficiency. Unfortunately, a major drawback of model predictive control is that it is not well equipped to deal with parametric uncertainty: when a mismatch exists between the model and the real system, significant prediction errors can occur, deteriorating the performance of the system. Strategies designed to handle uncertainty exist, such as robust MPC, but these often result in conservative control policies. This thesis proposes using model predictive control as a function approximator for reinforcement learning (RL) in order to learn values for the model and MPC parameters that deliver optimal performance in the case of model mismatch.
In this thesis, data-driven economic nonlinear model predictive control (ENMPC) using Q-learning is proposed as a method to adapt the model parameters. Three different goals are considered: maximizing economic profit, minimizing constraint violations, and maximizing economic performance while minimizing constraint violations.
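These three goals correspond to three different reward definitions for the learning agent. A rough sketch of how they might be encoded is given below; the weights w_y and w_c and the aggregated violation measure are assumptions made for illustration, not quantities from the thesis.

```python
# Illustrative reward (negative cost) functions for the three learning goals.
# The weights w_y, w_c and the scalar `violation` measure are assumptions
# for this sketch, not values or code from the thesis.
def reward_profit(income, input_costs):
    return income - input_costs                # goal 1: economic profit only

def reward_violations(violation, w_c=1.0):
    return -w_c * violation                    # goal 2: minimize violations only

def reward_combined(income, input_costs, violation, w_y=1.0, w_c=10.0):
    # goal 3: trade off economic performance against constraint violations
    return w_y * (income - input_costs) - w_c * violation
```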
In this work, the ENMPC scheme is used as a function approximator in a Q-learning environment: the optimization solution of the ENMPC scheme is applied as the input to the system, while the Q-learning agent optimizes the parameter values of the ENMPC scheme and of the model of the environment. The performance of the system after learning is compared to approaches using robust and nominal model predictive control. The simulation results show that the data-driven ENMPC using reinforcement learning is able to decrease constraint violations by up to 94%, but is unable to increase economic performance compared to nominal MPC; compared to robust MPC, the economic performance indicator (EPI) is increased by almost 10% while keeping constraint violations at a similar level.
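For readers unfamiliar with the MPC-as-function-approximator idea, the following is a minimal, self-contained sketch of the Q-learning update behind it (in the spirit of the framework of Gros and Zanon). A toy quadratic surrogate stands in for the ENMPC optimal objective, which in the actual scheme is obtained by solving the parametric ENMPC problem; the dynamics, stage cost, and all numerical values below are assumptions for illustration only, not the thesis implementation.

```python
# Minimal sketch of Q-learning with a parametric optimizer as Q-function
# approximator. The quadratic q_theta below is a stand-in for the ENMPC
# optimal objective (normally an NLP solve); dynamics, stage cost, and all
# numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def q_theta(theta, s, a):
    # Stand-in for Q_theta(s, a): ENMPC objective with the first input fixed to a.
    return theta[0] * s**2 + theta[1] * s * a + theta[2] * a**2 + theta[3]

def v_theta(theta, s):
    # Stand-in for V_theta(s) = min_a Q_theta(s, a); also returns the minimizer.
    a_star = -theta[1] * s / (2.0 * theta[2])
    return q_theta(theta, s, a_star), a_star

def stage_cost(s, a):
    return s**2 + 0.1 * a**2                          # illustrative economic cost

def plant(s, a):
    return 0.9 * s + 0.2 * a + 0.01 * rng.standard_normal()  # toy "real" system

def td_update(theta, s, a, alpha=1e-3, gamma=0.99, eps=1e-6):
    # TD error: delta = L(s, a) + gamma * V_theta(s') - Q_theta(s, a)
    s_next = plant(s, a)
    q = q_theta(theta, s, a)
    v_next, _ = v_theta(theta, s_next)
    delta = stage_cost(s, a) + gamma * v_next - q
    # Solver-agnostic finite-difference gradient of Q w.r.t. theta.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        th = theta.copy()
        th[i] += eps
        grad[i] = (q_theta(th, s, a) - q) / eps
    return theta + alpha * delta * grad, s_next       # theta += alpha*delta*grad Q

theta, s = np.array([1.0, 0.1, 0.5, 0.0]), 1.0
for _ in range(500):
    _, a_greedy = v_theta(theta, s)
    a = a_greedy + 0.1 * rng.standard_normal()        # exploration noise around greedy input
    theta, s = td_update(theta, s, a)
```

The finite-difference gradient keeps the sketch solver-agnostic; in practice, the sensitivity of the NLP solution with respect to the parameters can be obtained more efficiently from the solver itself.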