A. Dabiri

19 records found

Master thesis (2026) - A. Muntada Palomares, A. Dabiri, R.D. McAllister, L. Ferranti, Jacob Hemming
The increasing penetration of renewable energy sources (RES), worsening grid congestion and rising electricity price volatility are changing the operating conditions under which prosumers participate in electricity markets. In the Netherlands, these developments are accompanied by increasingly restrictive and time-varying grid connection limits, requiring energy management systems (EMSs) to reduce operational costs while maintaining feasible operation under uncertainty. This thesis develops and evaluates a risk-aware residual reinforcement learning (RRL) framework for day-ahead market (DAM) bidding by a grid-connected prosumer equipped with a commercial building load, a solar photovoltaic (PV) installation and a battery energy storage system (BESS). The proposed method combines a scenario-based optimisation baseline with a learning-based residual policy. This allows the optimisation layer to provide a constraint-aware reference schedule while the reinforcement learning (RL) agent learns corrective bidding and battery-scheduling actions. Time-varying grid limits are incorporated throughout the operational framework, constraining both the day-ahead schedule and the real-time imbalance market operation, where a rule-based controller operates the BESS and curtails PV generation during delivery. The proposed strategies are evaluated under Dutch electricity market conditions and compared against deterministic linear programming (LP) and scenario-based optimisation benchmarks in terms of operational cost, robustness and grid-limit violations. ...

Experimental Validation on a High-Speed Autonomous Vehicle

Autonomous driving near the handling limits of a vehicle places stringent demands on the control layer, where nonlinear dynamics, actuator delay, and model mismatch dominate closed-loop behavior. Sampling-based optimal control methods, and in particular Model Predictive Path Integral (MPPI), offer an attractive alternative to deterministic receding-horizon control through stochastic rollouts and importance weighting. While MPPI has demonstrated strong performance in simulation, its real-world behavior at high speed remains insufficiently characterized.
This thesis investigates the deployment of baseline MPPI and three smoothness-oriented extensions, Dynamic Covariance MPPI, Smooth MPPI (SMPPI), and Low-pass Filtered Sampling MPPI (LFS-MPPI), on a small-scale autonomous racing platform. Experiments focus on high-speed trajectory tracking under tight lane constraints and obstacle interactions, where steering bandwidth limitations and actuation delay significantly influence stability. Results show that baseline MPPI achieves competitive speeds and low nominal tracking error, but generates excessive high-frequency steering activity that induces sustained oscillations on hardware. Stability improvements are consistently associated with reduced steering-rate excitation. Among the evaluated extensions, LFS-MPPI provides the most favorable tradeoff between smoothness, tracking accuracy, and robustness by shaping the sampling process rather than post-processing the control signal.
The experiments further reveal a fundamental coupling between model fidelity and sampling efficiency under strict real-time constraints. By increasing the proportion of dynamically plausible rollouts at reduced sample counts, LFS-MPPI enables stable deployment of a dynamic bicycle prediction model and reliable tracking up to 3.5 m/s on the physical racetrack. These findings demonstrate that sampling structure is a decisive factor in achieving stable, high-performance, real-world MPPI control. ...
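The importance-weighting step that the MPPI abstract refers to can be illustrated with a minimal sketch. It follows the commonly published baseline MPPI update, not the thesis implementation; the function names, the `temperature` parameter, and the list-based data layout are assumptions made for illustration.

```python
import math

def mppi_weights(costs, temperature=1.0):
    """Importance weights proportional to exp(-(S_k - min S) / temperature)."""
    best = min(costs)  # subtracting the minimum keeps the exponentials well scaled
    unnorm = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mppi_update(nominal, perturbations, costs, temperature=1.0):
    """Add the cost-weighted average of the sampled perturbations to the nominal inputs."""
    weights = mppi_weights(costs, temperature)
    return [
        nominal[t] + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t in range(len(nominal))
    ]
```

Low-cost rollouts dominate the update, so rollouts that are dynamically implausible and therefore costly contribute little; this is one way to read the abstract's point that the proportion of plausible rollouts matters at reduced sample counts.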
Continuum soft robots present significant opportunities for advancing robotics, but they also introduce substantial technical challenges. These systems are highly nonlinear, infinite-dimensional, and severely underactuated, making control particularly difficult. While recent advancements in model-based control have addressed some of these issues for soft robotics, numerical optimal control has shown strong potential, especially given its success in other severely underactuated domains such as bipedal and quadrupedal locomotion.

However, the application of optimal control in soft robotics has largely relied on simplified models, and its use with more accurate and geometrically consistent formulations remains underexplored, particularly for explicitly tackling underactuation. This thesis investigates the use of Differential Dynamic Programming (DDP) to control continuum soft robots modeled using the Geometric Variable Strain (GVS) framework. The focus is on the Soft Inverted Pendulum (SIP) as a template system to evaluate DDP’s feasibility, robustness, and performance in underactuated settings, including low-stiffness regimes where collocated feedback strategies break down. The implementation leverages the use of analytical gradients computed via the Recursive Newton-Euler Algorithm (RNEA) to improve convergence and computational efficiency.

The results show that DDP outperforms traditional Partial Feedback Linearization (PFL) methods, both collocated and non-collocated, especially across challenging mass-stiffness combinations. This effectively extends control authority and stability into regimes previously considered difficult to handle. This thesis extends the method to more complex hybrid soft–rigid systems, examining real-time feasibility and practical implementation, thereby laying the foundation for a generalizable optimal control framework for soft robots. ...
The formal verification of multi-agent systems in safety-critical domains is challenged by the need to certify population-level behaviours, such as formation control, under environmental uncertainty. Traditional state-based verification techniques are often inadequate for expressing these system-wide objectives and face scalability limitations. This thesis addresses this gap by developing a distributional reachability framework that models the evolution of the system directly over the space of probability distributions, using Interval Markov Decision Processes (IMDPs) to capture model uncertainty. We introduce two complementary analysis algorithms to compute guaranteed bounds on the set of all reachable distributions: a forward method using occupation measures and McCormick relaxations, and a robust backward algorithm based on value iteration over a discretised distribution space. Case studies in swarm deployment demonstrate the efficacy of the framework in computing robust, set-based approximations of reachable distributions. Furthermore, results for the robust backward reachability algorithm are presented for a running example. This capability allows for the formal verification of complex distributional specifications and the synthesis of control policies with certified safety guarantees, establishing a computational foundation for designing certifiably safe, large-scale autonomous systems.
...

A hybridization of Reinforcement Learning with Model Predictive Control

This thesis presents a hybrid control framework that combines Reinforcement Learning (RL) and Model Predictive Control (MPC) to achieve constraint-satisfying flutter suppression and load alleviation in flexible aircraft subject to turbulent gusts. During training, MPC computes safe input bounds using high-fidelity Linear Parameter Varying (LPV) models and long prediction horizons, exploiting known disturbances to accurately capture aeroelastic behavior. A Q-learning agent is trained to select control actions within these bounds, adapting to nonlinear dynamics and actuator delay. At deployment, the learned policy operates from a lightweight Q-table, with certified interpolation ensuring constraint satisfaction even for unseen states. By integrating the anticipatory capabilities of MPC with the adaptability of RL, the framework enables effective control under turbulence and structural uncertainty. While demonstrated on an aeroelastic aircraft, the approach can be generalized to other systems with similar dynamics, where constraint handling and adaptive control under uncertainty are critical. ...

Improving Constraint Compliance with Stochastic Model Predictive Control under Weather Forecast and Parameter Uncertainty

Several stochastic model predictive control schemes are formulated to reduce constraint violations in the economic control of the climate in a lettuce greenhouse under weather forecast and parameter uncertainty. The schemes are tested in simulation. Two separate approaches are taken in the formulations. The first involves analytical constraint tightening through system linearization. Linearizing the system around the trajectory is found to improve performance compared to linearizing around a point. The linearized schemes proved to be overly conservative, especially under parameter uncertainty. The second approach is through tracking the average constraint violations to formulate adaptive constraints which do not require prior information about the underlying uncertainties. Originally proposed for linear systems, this approach is simplified and modified to impose a constraint tightening on deterministic nonlinear model predictive control. The adaptive schemes improve constraint compliance with reduced conservatism leading to a more acceptable increase in input costs compared to the linearized schemes. The results indicate that adaptive average violation constraints may be a useful tool in stochastic model predictive control and warrant further investigation. ...
Master thesis (2024) - E. Boelen, A.J.J. van den Boom, Lucy Smeets, Mart Ruijs, A. Dabiri
Automation of machines is becoming increasingly widespread and advanced; one example is the use of robots at Prime Vision, which sorts parcels for postal services. Scheduling a fleet of robots that pick up and drop off many parcels while avoiding collisions, within a limited space and along predefined routes in a floorplan, is a complex problem.

This logistical challenge can be modelled effectively using max-plus linear algebra, which allows the route scheduling to be optimized, as was previously done by L. Smeets. The goal of this research is to improve the existing scheduling model and use it to develop a reinforcement learning-based algorithm that determines the optimal floorplan for the parcel delivery robots. Two methods are applied to improve the existing scheduling model. Firstly, nodes where no decisions are made are identified and removed. Secondly, certain constraints are removed to simplify the model.
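As a brief illustration of the max-plus algebra mentioned above, the sketch below evaluates one step of a max-plus linear system x(k+1) = A ⊗ x(k), where "addition" is the maximum and "multiplication" is ordinary addition. The matrix values and the function name are hypothetical and are not taken from the thesis model.

```python
NEG_INF = float("-inf")  # max-plus "zero": no connection between two events

def maxplus_matvec(A, x):
    """Max-plus product: (A ⊗ x)_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

# Hypothetical travel and handling times between two events.
A = [[2.0, NEG_INF],
     [3.0, 1.0]]
x0 = [0.0, 0.0]           # earliest start times
x1 = maxplus_matvec(A, x0)
x2 = maxplus_matvec(A, x1)
```

Each entry of the result is the earliest time an event can occur given that it must wait for all of its predecessors, which is why synchronisation constraints in scheduling problems become linear in this algebra.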

The results of the scheduler are used to determine a key performance indicator that allows a reinforcement learning-based algorithm to identify the optimal floorplan for the robots. The reinforcement learning algorithm employed a deep Q-learning approach, with the neural network trained using various action space approaches, tuned rewards and hyper-parameters. The epsilon-greedy method was applied to address the exploration vs. exploitation problem. While the scheduler improvements significantly reduced its computational cost, the neural network did not converge; the potential causes are discussed in detail. ...
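The exploration rule named in this abstract, usually called epsilon-greedy, can be sketched in a few lines. This is a generic illustration assuming a list of Q-value estimates per action; the function name and the `rng` argument are illustrative choices, not part of the thesis code.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Example: seeded generator for reproducibility.
rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
random_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
```

In practice epsilon is often decayed over training so that early episodes explore and later episodes exploit the learned values.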
Master thesis (2024) - T.K. Scheepstra, B. Atasoy, A. Dabiri
Multi-agent path finding (MAPF) is the task of finding non-conflicting paths for multiple agents that operate in an environment with shared resources.
Finding an optimal solution quickly becomes intractable for many applications, and consequently suboptimal methods are also explored extensively in the literature.
This work presents the Decentralized Optimization (DECOP) algorithm: a novel receding horizon control algorithm that exploits insights from MAPF research as well as decentralized control. In the proposed framework, each travelling agent communicates with agents in its proximity to solve a local MAPF problem that considers only a selected, tractable number of agents. Inter-agent cooperation and conflict-free operation are induced by applying a common local optimization policy during parallel local optimization and by a subsequent path reservation scheme based on random priorities. Inter-agent communication consists of sharing respective route alternatives, from which additional information about an agent's entanglement can be inferred; this information can also be included in the local optimization cost function.
Comparative results with other decentralized algorithms show that the DECOP algorithm yields competitive results while guaranteeing conflict-free operation, with limited required communication and without any training time. Among the many degrees of freedom to be explored further, including information about the entanglements of an agent's route alternatives in the common policy for local optimization increases performance and suggests a greater extent of induced cooperation. ...

Addressing Sampling Challenges and Multi-Modal Traffic Dynamics

Master thesis (2024) - K.F. Ceton, S. Grammatico, Tijs van Bakel, G. Pantazis, A. Dabiri
Deep Reinforcement Learning (DRL) is a promising approach to Traffic Signal Control (TSC). However, significant challenges remain in translating this potential into real-world traffic management solutions. This thesis investigates obstacles hindering the application of DRL in real-world TSC, focusing on low sampling frequencies and the complexities of multi-modal traffic scenarios.

We developed a high-frequency sampling Proximal Policy Optimization (PPO) approach for TSC at a four-legged intersection, integrating both vehicle and pedestrian traffic in a multimodal setting. By employing Invalid Action Masking (IAM), we effectively handle signal timing constraints across these modalities. The framework was evaluated through traffic volume sensitivity analyses, assessments of generalization capabilities, disturbance rejection tests, and comparisons of methods for handling invalid actions.
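Invalid Action Masking, as mentioned above, is commonly implemented by setting the logits of disallowed actions to negative infinity before the softmax, so that those actions receive zero probability. The sketch below shows this generic form; it is an assumption that the thesis uses this variant, and the function name and example values are hypothetical.

```python
import math

def masked_softmax(logits, valid):
    """Softmax over the valid actions only; invalid actions get probability 0."""
    masked = [l if ok else float("-inf") for l, ok in zip(logits, valid)]
    top = max(masked)
    if top == float("-inf"):
        raise ValueError("at least one action must be valid")
    exps = [math.exp(m - top) for m in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Example: the second signal phase is not allowed yet (e.g. minimum green not reached).
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

Because the masked actions can never be sampled, timing constraints are enforced by construction rather than through a penalty in the reward.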

The results indicate that short sampling intervals, such as 1 second, do not improve performance in terms of time-loss, with 4 to 6 seconds identified as the optimal range for PPO in TSC of a four-legged intersection. The findings also demonstrate that IAM can effectively be incorporated without compromising performance. When evaluating the ability to handle sudden spikes in traffic volume, PPO demonstrated superior performance, outperforming baseline methods such as max-pressure and fixed-time strategies in terms of both overshoot and settling time. Also, the results show that PPO can effectively prioritize vehicle and pedestrian modalities, displaying a clear proportional decrease in time-loss for the prioritized modality.
...
Master thesis (2023) - H. Zhang, B.H.K. De Schutter, X. Liu, A. Dabiri
The railway timetable rescheduling problem is challenging for both industry and academia. A feasible and relatively good timetable must be calculated within a limited time to reduce the negative impact of disturbances or disruptions. The problem is typically formulated as a mixed integer linear programming (MILP) problem, which is difficult to solve because of the integer variables. To address this problem, many optimization-based studies have been conducted. The main advantage of optimization-based methods is that they are straightforward and easy to implement. However, their main disadvantage is that most of them cannot meet the time requirements for large railway timetable rescheduling problems. Some studies also use reinforcement learning techniques to solve this problem, with which the time requirement can be met.

In this thesis, an algorithm that combines both reinforcement learning and optimization approaches is proposed to solve the railway timetable rescheduling problem. In the beginning, the reinforcement learning environment is constructed from the railway timetable rescheduling problem. By selecting the independent integer variables as the action, the constraints involving the integer variables are satisfied. After that, a value-based reinforcement learning algorithm is implemented to determine the independent integer variables of the MILP problem. Then, the complete solution of the integer variables could be derived from these independent integer variables. With the solution of integer variables, the MILP problem could be transformed into a linear programming problem, which could be solved efficiently.

Several case studies are conducted in this thesis based on part of the Dutch railway network from Utrecht to 's-Hertogenbosch. The simulation results show that the proposed method substantially reduces the total delay of the system compared with the baseline. The reinforcement learning-based method also has a clear advantage in terms of running time. ...

Study of optimal control architectures for robotic metamaterials with relative measurements

Master thesis (2022) - V.F. Buskes, M.B. Kaczmarek, S.H. Hossein Nia Kani, S. Grammatico, A. Hunt, Corentin Coulais, Jonas Veenstra, A. Dabiri
Vibrations and disturbances are becoming more of a concern as lightweight, flexible structures in high-tech systems are pushed towards faster speeds and higher precision. Active Vibration Control (AVC) methods have been effectively used to attenuate vibrations and increase the bandwidth of these systems. With the miniaturisation of electronics, an increasing number of sensor and actuator pairs can be used for AVC applications. Not only does this allow for higher active damping, it also grants more flexibility in terms of control. This trend has led to the study of robotic metamaterials and meta-structures: large-scale engineered materials built out of a repeating pattern of unit cells, where each unit cell contains a sensor, an actuator and sometimes even a computing unit. Choosing the optimal control architecture for these systems is difficult, since decentralised and centralised control schemes both have fundamental trade-offs in terms of performance and scalability. In this paper we study distributed control, a promising middle-ground solution that is hardly used in AVC applications. We show with the use of LQR that a distributed control architecture can achieve optimal performance in the low-frequency range for robotic materials with relative measurements. Additionally, the actuators use lower maximum control forces and a distributed control architecture remains scalable for implementation in large-scale systems. In this paper the robotic cantilever beam is studied as a specific example, as it represents many typical high-tech applications. Furthermore, implications for periodic robotic meta-structures are derived using LQR in the Spatial Fourier Domain.

...

"Comparing algorithms for network-wide traffic management using Eclipse SUMO: A pragmatic approach versus Model Predictive Control"

Master thesis (2022) - L. Heunks, S. Grammatico, Tijs Van Bakel, S.D. Gonçalves Melo Pequito, A. Dabiri
The need for smart traffic control has grown in recent years, driven by an increasing amount of traffic. Network-wide traffic control is becoming a more interesting field, mainly because computing power has increased and optimisation techniques have improved. Network-wide traffic control aims to improve the overall traffic state by looking at the entire problem instead of sub-problems. Besides improving traffic conditions, network-wide traffic control could support road operators in simplifying their work by taking over some tasks and keeping track of the situation.

For this research, we compare a pragmatic, user-friendly and transparent control method with a Model Predictive Control (MPC) approach. For the MPC controller, the second-order macroscopic METANET model is chosen. The METANET model describes a network as a directed graph. We test both controllers on two small-scale freeway traffic networks. The control measures that are implemented are ramp metering and rerouting. The simulations are done in a SUMO environment. The key performance indicator (KPI) used for comparison is Total Time Spent (TTS). The resulting optimisation problem is a Mixed-Integer Nonlinear Programming (MINLP) problem. This problem is solved heuristically with a Genetic Algorithm (GA).

Simulations for the two networks in a demand scenario around critical density are analysed over 20 iterations. The results prove the potential of both algorithms since both improved the TTS significantly. The NM excels in ease of implementation and ease of understanding for non-experts. While the MPC outperforms the NM in TTS reduction, it is harder to configure and understand for non-experts. The MPC is successfully tested on its capability to prevent undesired behaviour from happening by adding penalties to the objective function. For future research, larger networks need to be investigated, with a focus on simplifying the resulting optimisation problem. It is expected that a piecewise affine approximation is a promising method. ...
Master thesis (2022) - D.F. Edens, F. Schulte, R.R. Negenborn, A. Dabiri
Digital Twins are a key component of Industry 4.0. They are a digital representation of a real-world entity containing both the structure and the dynamics of its real-world counterpart. The use of Digital Twins appears to offer a powerful and compelling application for production processes. A self-configurable and self-adjustable Digital Twin is developed for the Greenfield production process of FOCUS-ON. FOCUS-ON expects fast growth in order demand and needs to adjust its production line accordingly. A Digital Twin is developed according to the 5C architecture. The Digital Twin proposes adjustments to the production line to keep up with the increasing order demand. ...
Master thesis (2022) - C.G. van der Horst, A. Hegyi, A.M. Salomons, A. Dabiri, R.T. van Katwijk
Urban mobility is challenged by increasing demand while emissions must be reduced at the same time, without creating an unpleasant living environment. Reducing the number of stops on the urban arterial controlled by the coordinated traffic controller can provide (part of) the solution to these challenges for urban mobility. Coordinated traffic controllers are subject to some limitations, especially regarding coordination between locally optimal signal timing plans and proactive optimization of the coordination with regard to traffic demand. These limitations were explored in this thesis: a variable speed was proposed to provide coordination between locally optimal signal timing plans, and the use of predictions for proactive optimization was tested. A theoretical study showed no realistic potential for coordination between unequal cycle times; however, theory does show significant potential for using a variable speed for coordination between locally optimal signal timing plans when coordinating in two directions. This potential of the variable speed was confirmed by using the MAXBAND model and performing a simulation study, which indicated a significant decrease in stops on the main arterial compared with optimizing with a fixed speed. Furthermore, the variable speed allowed for a lower network cycle time, which resulted in a decrease in delay on the side directions. Tests of demand predictions in TopTrac yielded no significant improvements in stops or delay. In the investigated network, control decisions of the coordinated traffic controller did not correlate closely with fluctuating demand, which is needed for a prediction of the demand to produce significant improvements regarding the stops and delay in the network. Future research should focus on the variable speed, evaluating the theoretical applications in other networks and exploring the practical applications, potentially via Intelligent Speed Adaptation (ISA). ...
Optimizing delivery routes is a well-researched topic, however, most of the classical approaches do not incorporate preferences of drivers, as those approaches focus on minimizing the time or distance of the routes. As a result, the actual driven route of an experienced driver often deviates from the proposed route since the drivers have tacit knowledge about the real-life conditions of the road network. Amazon proposed a challenge to learn a delivery route planning strategy from historically driven routes and thus incorporate this tacit knowledge.
In this thesis, we will tackle the challenge using data-driven inverse optimization to learn the zone sequencing patterns of drivers. The zone sequences of expert drivers are assumed to be the solutions to a traveling salesman problem (TSP) in which the weights represent the preference of a driver to use a certain edge. The values of the weights will be learned through inverse optimization. Our final approach achieves a score that ranks 4th out of the 48 models that qualified for the final round of the challenge. ...
Ongoing research in autonomous driving currently focuses on creating new applications for autonomous vehicles (AV) and connected autonomous vehicles (CAV). Specifically, motion planning and control solutions are being developed based on the combination of Artificial Potential Functions (APF) with economic Model Predictive Control (eMPC). These two methods are integrated into a new Comprehensive Predictive Control (CPC) strategy. Although preliminary research shows promising results, a performance analysis of this approach, both for AV and CAV, has not yet been published. Therefore, this thesis studies the capabilities of this novel APF-eMPC framework by carrying out numerical simulations. Multiple manoeuvres and varying amounts of white noise are used to test the controller's limitations. For the AV part, multiple basic driving manoeuvres are simulated: lane-keeping, car-following and lane-changing. The results show that an AV based on this framework can execute these different manoeuvres without precise measurements. The CAV concept is simulated using a platoon scenario. The gap-closing behaviour of the multiple CAVs in a platoon is examined. The state-of-the-art gap-closing APF is compared with an APF based on inter-molecular dynamics and fitted on actual traffic. Various experiments are carried out using a constant time-headway in combination with different time gaps between the vehicles. The results show that the behaviour resulting from the inter-molecular APF better matches human driving behaviour and results in less dangerous gap-closing behaviour than the quadratic platoon APF. The latter has a considerably higher chance of lateral instability occurring. Therefore, the APF based on inter-molecular dynamics and fitted on actual traffic data outperforms the APF based on a quadratic function. Lastly, it was found that the coupling between the longitudinal and lateral dynamics, often neglected in the literature, cannot be ignored during platoon stability analysis. ...
Master thesis (2021) - T.R. Robeerts, R. Babuska, H. Vallery, A. Dabiri
Twisted and coiled polymer muscles (TCPMs) show promise as artificial muscles because of their light weight, low cost, large contraction, and relatively low hysteresis. A TCPM contracts when it is heated and extends when it is cooled. Different modeling and control techniques have been implemented. Van der Weijde (2019) implemented a self-sensing model that does not need large apparatus for measurements of force and deflection. The goal of this thesis is to design a force controller that works with this model. Parameter estimation of the self-sensing model is performed, but the fit of the model is not good enough for control. A first-order black-box model is therefore estimated and used instead. A P controller and a PI controller are simulated and tested on the setup. The force oscillates around the reference value because the actual model is of order 2; a D-action needs to be added to dampen the oscillations. The integral action reduces the switching of the input between its maximum and minimum values. The model parameters differ for each TCPM, so the controller parameters have to be adjusted for each TCPM. This is impractical in large-scale applications. Further research can be done into using model-free controllers. ...
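To make the difference between the P and PI controllers in this abstract concrete, the sketch below simulates both on a generic first-order model tau * dy/dt = -y + gain * u using forward Euler. All numerical values (gain, time constant, controller gains) are hypothetical and do not come from the thesis; on a true first-order plant the response does not oscillate, which is consistent with the abstract attributing the observed oscillations to second-order behaviour of the real system.

```python
def simulate_pi(kp, ki, reference=1.0, gain=2.0, tau=1.0, dt=0.01, steps=2000):
    """Closed-loop output after `steps` Euler steps of a first-order plant under PI control."""
    y = 0.0         # plant output (e.g. force)
    integral = 0.0  # accumulated tracking error
    for _ in range(steps):
        error = reference - y
        integral += error * dt
        u = kp * error + ki * integral   # PI control law (ki = 0 gives a P controller)
        y += dt * (-y + gain * u) / tau  # forward-Euler step of the plant
    return y

y_p = simulate_pi(kp=1.0, ki=0.0)   # P only: settles with a steady-state error
y_pi = simulate_pi(kp=1.0, ki=1.0)  # PI: integral action removes the offset
```

With these assumed values the P controller settles at gain * kp / (1 + gain * kp) times the reference, while the PI controller converges to the reference itself.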
In this thesis, we consider the problem of controlling multiple autonomous vehicles in a highway scenario, via MPC. By iteratively solving a motion planning OCP, MPC is perfectly suited for unknown dynamic environments, while optimally computing path and vehicle inputs. Moreover, MPC can ensure the satisfaction of collision avoidance constraints, a prerequisite for safe automated driving.

The collision avoidance constraints render the OCP non-convex. This thesis tackles this non-convexity by either designing nonlinear MPC controllers, or by convexifying these non-convex constraints.

Moreover, control of a large, networked system of automated vehicles is achieved by designing local, subsystem-based controllers. We analyse three different algorithms to distribute the plantwide OCP. All controllers are subjected to an objective analysis and compared to see which is the most efficient and most practical to implement. Centralized MPC is used as a benchmark, since it gives the plantwide optimal solution. The first decomposed algorithm is decentralized MPC, where subsystems communicate once every MPC iteration and compute their new trajectory based on the previously communicated trajectory of neighboring subsystems. The second method is based on sub-optimal cooperative distributed MPC. Here, vehicles perform multiple sub-optimal iterations of a Gauss-Jacobi type distributed optimization. For the last method, based on a Generalized Potential Game, the vehicles sequentially solve and communicate the solution of their local OCP in order to find an ε-Nash equilibrium. By relying on additional constraints or a fixed ordering among vehicles, all three controllers are able to compute their own trajectory in a recursively feasible manner while avoiding other vehicles.

The distributed controllers are assessed in two different scenarios, using three different criteria, i.e., the overall effectiveness of the controller, the local effectiveness of the controller and the progress made by each vehicle in the simulation. The first criterion gives an indication of the level of cooperation among vehicles, the second shows the individual satisfaction of each vehicle with respect to its reference, and the last represents the overall progress each vehicle has made in the highway simulation. ...

Performance evaluation applying a general evaluation methodology

Master thesis (2018) - Alwin Hillebrink, Andreas Hegyi, Aleksander Czechowski, Bart De Schutter, Azita Dabiri, Meng Lu
The ongoing increase in urbanization and traffic congestion creates an urgent need to operate our transportation systems with maximum efficiency. Traffic signal control optimization is considered one of the main ways to solve traffic problems in urban networks. In publications in the field of intelligent transportation systems, a vast number of different optimal traffic light control methods are described. With new optimization methods being developed, it is important to know their performance compared to similar methods. Such a comparison is only possible if the same performance evaluation methodology is applied to all these methods. Most of the studies in the field of intelligent transportation systems consider a self-defined evaluation methodology. A general evaluation methodology is developed to objectively evaluate the performance of these optimization methods. The developed general evaluation methodology is used to evaluate the performance of a dynamic programming and Q-learning method. ...