A. Dabiri
19 records found
Model Predictive Path Integral Control with Smoothness-Oriented Extensions
Experimental Validation on a High-Speed Autonomous Vehicle
This thesis investigates the deployment of baseline MPPI and three smoothness-oriented extensions, Dynamic Covariance MPPI, Smooth MPPI (SMPPI), and Low-pass Filtered Sampling MPPI (LFS-MPPI), on a small-scale autonomous racing platform. Experiments focus on high-speed trajectory tracking under tight lane constraints and obstacle interactions, where steering bandwidth limitations and actuation delay significantly influence stability. Results show that baseline MPPI achieves competitive speeds and low nominal tracking error, but generates excessive high-frequency steering activity that induces sustained oscillations on hardware. Stability improvements are consistently associated with reduced steering-rate excitation. Among the evaluated extensions, LFS-MPPI provides the most favorable tradeoff between smoothness, tracking accuracy, and robustness by shaping the sampling process rather than post-processing the control signal.
The experiments further reveal a fundamental coupling between model fidelity and sampling efficiency under strict real-time constraints. By increasing the proportion of dynamically plausible rollouts at reduced sample counts, LFS-MPPI enables stable deployment of a dynamic bicycle prediction model and reliable tracking up to 3.5 m/s on the physical racetrack. These findings demonstrate that sampling structure is a decisive factor in achieving stable, high-performance, real-world MPPI control.
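The idea of shaping the sampling process, rather than post-processing the control signal, can be illustrated with a minimal sketch. It is not the thesis's implementation: the first-order filter, the parameter names (`alpha`, `sigma`, `lam`), and the scalar `dynamics` and `cost` callables are all assumptions made for illustration.

```python
import math
import random


def low_pass(noise, alpha):
    """First-order low-pass filter over a noise sequence (assumed filter form)."""
    out, prev = [], 0.0
    for n in noise:
        prev = alpha * n + (1.0 - alpha) * prev
        out.append(prev)
    return out


def roughness(seq):
    """Sum of squared step-to-step differences, a crude proxy for rate excitation."""
    return sum((b - a) ** 2 for a, b in zip(seq, seq[1:]))


def mppi_step(x0, u_nom, dynamics, cost, n_samples=64, sigma=0.5,
              lam=1.0, alpha=0.3, rng=None):
    """One MPPI update in which the sampled perturbations are filtered
    before the rollouts, so every rollout is already smooth."""
    rng = rng or random.Random(0)
    horizon = len(u_nom)
    perturbations, costs = [], []
    for _ in range(n_samples):
        eps = low_pass([rng.gauss(0.0, sigma) for _ in range(horizon)], alpha)
        x, total = x0, 0.0
        for t in range(horizon):
            u = u_nom[t] + eps[t]
            x = dynamics(x, u)
            total += cost(x, u)
        perturbations.append(eps)
        costs.append(total)
    # Exponentially weighted average of the perturbations (softmin over costs).
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    w_sum = sum(weights)
    return [
        u_nom[t] + sum(w * e[t] for w, e in zip(weights, perturbations)) / w_sum
        for t in range(horizon)
    ]
```

Because the update is a weighted average of filtered perturbations, the returned control sequence inherits their smoothness without a separate smoothing pass on the output.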
Optimal Control of Slender Soft Robots in Low-Stiffness Regimes
A Model Based Approach
However, the application of optimal control in soft robotics has largely relied on simplified models, and its use with more accurate and geometrically consistent formulations remains underexplored, particularly for explicitly tackling underactuation. This thesis investigates the use of Differential Dynamic Programming (DDP) to control continuum soft robots modeled using the Geometric Variable Strain (GVS) framework. The focus is on the Soft Inverted Pendulum (SIP) as a template system to evaluate DDP’s feasibility, robustness, and performance in underactuated settings, including low-stiffness regimes where collocated feedback strategies break down. The implementation leverages the use of analytical gradients computed via the Recursive Newton-Euler Algorithm (RNEA) to improve convergence and computational efficiency.
The results show that DDP outperforms traditional Partial Feedback Linearization (PFL) methods, both collocated and non-collocated, especially across challenging mass-stiffness combinations. This effectively extends control authority and stability into regimes previously considered difficult to handle. This thesis extends the method to more complex hybrid soft–rigid systems, examining real-time feasibility and practical implementation, thereby laying the foundation for a generalizable optimal control framework for soft robots.
Constraint Aware Reinforcement Learning for Aeroelastic Aircraft
A hybridization of Reinforcement Learning with Model Predictive Control
Economical Greenhouse Climate Management
Improving Constraint Compliance with Stochastic Model Predictive Control under Weather Forecast and Parameter Uncertainty
This logistical challenge can be effectively modelled using max-plus linear algebra to allow an optimization for the route scheduling as was previously done by L. Smeets. The goal of this research is to improve the existing scheduling model and use this to develop a reinforcement learning-based algorithm that determines the optimal floorplan for the parcel delivery robots. Two methods are applied to improve the existing scheduling model. Firstly, nodes where no decisions are made are identified and removed. Secondly, certain constraints are also removed to simplify the model.
The scheduler's results are used to compute a key performance indicator that lets a reinforcement-learning-based algorithm identify the optimal floorplan for the robots. The reinforcement learning algorithm uses a deep Q-learning approach, with the neural network trained using various action-space formulations, tuned rewards, and tuned hyperparameters. The epsilon-greedy method was applied to address the exploration-exploitation trade-off. While the scheduler improvements significantly reduced its computational cost, the neural network did not converge; the potential causes are discussed in detail.
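For readers unfamiliar with epsilon-greedy action selection, a minimal sketch follows. The function name and the plain list of Q-values are illustrative assumptions; the thesis's network and action space are not reproduced here.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values (hypothetical stand-in for
    the output of a Q-network).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

A common design choice is to decay `epsilon` over training so that early episodes explore and later episodes exploit.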
Finding an optimal solution quickly becomes intractable for many applications, and consequently suboptimal methods are also explored extensively in the literature.
This work presents the Decentralized Optimization (DECOP) algorithm: a novel receding horizon control algorithm that exploits insights from MAPF research as well as decentralized control. In the proposed framework, each travelling agent communicates with agents in its proximity to solve a local MAPF problem that considers only a selected, tractable number of agents. Inter-agent cooperation and conflict-free operation are induced by applying a common local optimization policy during parallel local optimization and by a subsequent path reservation scheme based on random priorities. Inter-agent communication consists of sharing the agents' respective route alternatives. From these, additional information about an agent's entanglement can be inferred and included in the local optimization cost function.
Comparative results with other decentralized algorithms show that the DECOP algorithm yields competitive results while guaranteeing conflict-free operation, with limited communication and without any training time. Among the many degrees of freedom still to be explored, including information about the entanglement of an agent's route alternatives in the common local optimization policy increases performance and suggests a greater extent of induced cooperation.
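A path reservation scheme based on random priorities can be sketched as follows. This is only an illustration under stated assumptions: the data layout (one list of candidate paths per agent, one cell per time step), the function name `reserve_paths`, and the restriction to same-cell conflicts are all hypothetical and are not taken from the DECOP implementation.

```python
import random


def reserve_paths(alternatives, rng=None):
    """Assign each agent the first of its route alternatives that is free.

    alternatives: dict mapping agent -> list of candidate paths, where a
    path is a list of cells indexed by time step (assumed layout).
    Agents are processed in a random priority order. Only same-cell,
    same-time conflicts are checked; edge swaps are ignored in this sketch.
    """
    rng = rng or random.Random(0)
    order = list(alternatives)
    rng.shuffle(order)  # random priorities
    reserved = set()    # (time step, cell) pairs already claimed
    chosen = {}
    for agent in order:
        chosen[agent] = None  # stays None if every alternative conflicts
        for path in alternatives[agent]:
            slots = set(enumerate(path))
            if slots.isdisjoint(reserved):
                reserved |= slots
                chosen[agent] = path
                break
    return chosen
```

Whichever agent draws the higher priority keeps its preferred route; lower-priority agents fall back to the first alternative that does not collide with an existing reservation.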
Advancing Deep Reinforcement Learning for Real-World Traffic Signal Control
Addressing Sampling Challenges and Multi-Modal Traffic Dynamics
We developed a high-frequency sampling Proximal Policy Optimization (PPO) approach for TSC at a four-legged intersection, integrating both vehicle and pedestrian traffic in a multimodal setting. By employing Invalid Action Masking (IAM), we effectively handle signal timing constraints across these modalities. The framework was evaluated through traffic volume sensitivity analyses, assessments of generalization capabilities, disturbance rejection tests, and comparisons of methods for handling invalid actions.
The results indicate that short sampling intervals, such as 1 second, do not improve performance in terms of time-loss, with 4 to 6 seconds identified as the optimal range for PPO in TSC of a four-legged intersection. The findings also demonstrate that IAM can effectively be incorporated without compromising performance. When evaluating the ability to handle sudden spikes in traffic volume, PPO demonstrated superior performance, outperforming baseline methods such as max-pressure and fixed-time strategies in terms of both overshoot and settling time. Also, the results show that PPO can effectively prioritize vehicle and pedestrian modalities, displaying a clear proportional decrease in time-loss for the prioritized modality.
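Invalid Action Masking is commonly realized by setting the logits of disallowed actions to negative infinity before the softmax, so that those actions receive zero probability. The sketch below shows that general technique; the function name and the boolean `valid` list are assumptions, and it is not the thesis's PPO code.

```python
import math


def masked_softmax(logits, valid):
    """Softmax over logits with invalid actions forced to probability zero.

    valid: list of booleans, True where the action is currently allowed
    (for example, a signal phase that respects minimum green time).
    """
    if not any(valid):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

Since masked actions have exactly zero probability, they are never sampled, and the remaining probabilities are renormalized over the valid actions only.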
In this thesis, an algorithm that combines reinforcement learning and optimization approaches is proposed to solve the railway timetable rescheduling problem. First, the reinforcement learning environment is constructed from the railway timetable rescheduling problem. By selecting the independent integer variables as the action, the constraints involving the integer variables are satisfied. Next, a value-based reinforcement learning algorithm is implemented to determine the independent integer variables of the MILP problem. The complete solution of the integer variables can then be derived from these independent integer variables. With the integer variables fixed, the MILP problem reduces to a linear programming problem, which can be solved efficiently.
Several case studies are conducted in this thesis based on part of the Dutch railway network from Utrecht to 's-Hertogenbosch. The simulation results show that the proposed method substantially reduces the total delay of the system compared with the baseline. The reinforcement learning-based method also has a clear advantage in running time.
Distributed Vibration Control for Robotic cantilever beams
Study of optimal control architectures for robotic metamaterials with relative measurements
Traffic network management
"Comparing algorithms for network-wide traffic management using Eclipse SUMO: A pragmatic approach versus Model Predictive Control"
For this research, we compare a pragmatic, user-friendly and transparent control method with a Model Predictive Control (MPC) approach. For the MPC controller, the second-order macroscopic METANET model is chosen. The METANET model describes a network as a directed graph. We test both controllers on two small-scale freeway traffic networks. The control measures that are implemented are ramp metering and rerouting. The simulations are done in a SUMO environment. The key performance indicator (KPI) used for comparison is Total Time Spent (TTS). The resulting optimisation problem is a Mixed-Integer Nonlinear Programming (MINLP) problem. This problem is solved heuristically with a Genetic Algorithm (GA).
Simulations for the two networks in a demand scenario around critical density are analysed over 20 iterations. The results demonstrate the potential of both algorithms, since both improved the TTS significantly. The NM excels in ease of implementation and ease of understanding for non-experts. While the MPC outperforms the NM in TTS reduction, it is harder to configure and understand for non-experts. The MPC is successfully tested on its capability to prevent undesired behaviour by adding penalties to the objective function. For future research, larger networks need to be investigated, with a focus on simplifying the resulting optimisation problem. A piecewise affine approximation is expected to be a promising method.
A Self-Configurable and Self-Adjustable Digital Twin For a Production Process
A case study at FOCUS-ON
Learning Drivers’ Preferences in Delivery Route Planning
An Inverse Optimization Approach
In this thesis, we will tackle the challenge using data-driven inverse optimization to learn the zone sequencing patterns of drivers. The zone sequences of expert drivers are assumed to be the solutions to a traveling salesman problem (TSP) in which the weights represent the preference of a driver to use a certain edge. The values of the weights will be learned through inverse optimization. Our final approach achieves a score that ranks 4th out of the 48 models that qualified for the final round of the challenge.
Distributed Model Predictive Control for Multi-Vehicle Autonomous Driving
Cooperative vs. Non-cooperative Control
The collision avoidance constraints render the OCP non-convex. This thesis tackles this non-convexity by either designing nonlinear MPC controllers, or by convexifying these non-convex constraints.
Moreover, control of a large, networked system of automated vehicles is achieved by designing local, subsystem-based controllers. We analyse three different algorithms to distribute the plantwide OCP. All controllers are subjected to an objective analysis and compared to see which is the most efficient and most practical to implement. Centralized MPC is used as the benchmark, since it gives the plantwide optimal solution. The first decomposed algorithm is decentralized MPC, where subsystems communicate once per MPC iteration and compute their new trajectory based on the previously communicated trajectory of neighboring subsystems. The second method is based on sub-optimal cooperative distributed MPC. Here, vehicles perform multiple sub-optimal iterations of a Gauss-Jacobi type distributed optimization. For the last method, based on a Generalized Potential Game, the vehicles sequentially solve and communicate the solution of their local OCP in order to find an $\epsilon$-Nash Equilibrium. By relying on additional constraints or a fixed ordering among vehicles, all three controllers are able to compute their own trajectories in a recursively feasible manner while avoiding other vehicles.
The distributed controllers are assessed in two different scenarios using three criteria: the overall effectiveness of the controller, the local effectiveness of the controller, and the progress made by each vehicle in the simulation. The first criterion gives an indication of the level of cooperation among vehicles, the second shows the individual satisfaction of each vehicle with respect to its reference, and the last represents the overall progress each vehicle has made in the highway simulation.
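A Gauss-Jacobi type distributed optimization can be illustrated on a toy problem. The sketch assumes two agents with scalar decisions and a shared cost J(x1, x2) = (x1 - r1)^2 + (x2 - r2)^2 + c (x1 - x2)^2; the cost, the step weight `w`, and all names are illustrative assumptions, not the thesis's vehicle OCP.

```python
def gauss_jacobi(r1, r2, c=1.0, w=0.5, iterations=100):
    """Parallel (Jacobi) iterations on a shared two-agent quadratic cost.

    Each agent minimizes the shared cost over its own variable while the
    other agent's variable is held at the previous iterate. Both agents
    then move toward their minimizers with a convex combination of weight w.
    """
    x1, x2 = 0.0, 0.0
    for _ in range(iterations):
        # Closed-form local minimizers, both computed from the OLD iterates.
        x1_best = (r1 + c * x2) / (1.0 + c)
        x2_best = (r2 + c * x1) / (1.0 + c)
        # Convex-combination step.
        x1 = w * x1_best + (1.0 - w) * x1
        x2 = w * x2_best + (1.0 - w) * x2
    return x1, x2
```

Stopping after a small number of iterations gives a sub-optimal point. In this toy problem with the default `w`, each step is a convex combination of the current point and the two single-agent minimizers, so convexity of the shared cost guarantees it is never worse than the previous iterate.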
Optimal traffic light control
Performance evaluation applying a general evaluation methodology
light control methods is described. With new optimization methods being developed, it is important to know how they perform compared to similar methods. Such a comparison is only possible if the same performance evaluation methodology is applied to all these methods. Most studies in the field of intelligent transportation systems use a self-defined evaluation methodology. A general evaluation methodology is developed to objectively evaluate the performance of these optimization methods. The developed methodology is used to evaluate the performance of a dynamic programming method and a Q-learning method.