Xinglong Zhang
4 records found
Toward Scalable Multirobot Control
Fast Policy Learning in Distributed MPC
Distributed model predictive control (DMPC) is promising for achieving optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers around a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with a closed-loop stability guarantee. The learned control policies can be deployed online to MRS with varying numbers of robots, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.
Model-Based Safe Reinforcement Learning With Time-Varying Constraints
Applications to Intelligent Vehicles
In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging. Moreover, few works have focused on designing RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. First, the approach introduces a barrier force-based control policy structure to ensure control safety during learning. Second, a multistep policy evaluation mechanism is employed, enabling the prediction of policy safety risks under time-varying constraints and guiding safe updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles: a differential-drive vehicle and an Ackermann-drive vehicle. The experimental results demonstrate the sim-to-real transfer capability of our approach and satisfactory online control performance.
Koopman operators are infinite-dimensional and capture the characteristics of nonlinear dynamics in a lifted, globally linear manner. The finite-dimensional, data-driven approximation of Koopman operators results in a class of linear predictors, useful for formulating linear model predictive control (MPC) of nonlinear dynamical systems with reduced computational complexity. However, the robustness of the closed-loop Koopman MPC under modeling approximation errors and possible exogenous disturbances remains a crucial open issue. To address this problem, this paper presents a robust tube-based MPC solution with Koopman operators, i.e., r-KMPC, for nonlinear discrete-time dynamical systems with additive disturbances. The proposed controller is composed of a nominal MPC using a lifted Koopman model and an off-line nonlinear feedback policy. The proposed approach does not assume the convergence of the approximated Koopman operator, which allows a Koopman model of limited order to be used for controller design. Fundamental properties of the Koopman model, e.g., stabilizability and observability, are derived under standard assumptions, with which the closed-loop robustness and nominal point-wise convergence are proven. Simulated examples are presented to verify the effectiveness of the proposed approach.
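To make the idea of a data-driven linear predictor concrete, the following is a minimal sketch of extended dynamic mode decomposition (EDMD) with inputs, one common way to obtain a finite Koopman approximation. It is not the r-KMPC controller from the abstract. The toy system `step`, the observable dictionary `lift`, and the helper `fit_koopman` are all hypothetical names chosen for illustration; the toy system is deliberately picked so that the chosen dictionary lifts it exactly.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary of observables: z = [x1, x2, x1^2]."""
    x = np.atleast_2d(x)
    return np.column_stack([x[:, 0], x[:, 1], x[:, 0] ** 2])

def step(x, u, lam=0.9, mu=0.5):
    """Toy nonlinear system (an assumption, not from the paper)."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2 + u])

def fit_koopman(X, U, Xn):
    """Least-squares fit of z_next ~ A z + B u from snapshot data."""
    Z, Zn = lift(X), lift(Xn)
    G = np.hstack([Z, U.reshape(-1, 1)])        # regressors [z, u]
    Kt, *_ = np.linalg.lstsq(G, Zn, rcond=None)  # solves G @ Kt ~ Zn
    K = Kt.T
    n = Z.shape[1]
    return K[:, :n], K[:, n:]

# Collect one-step snapshots from random states and inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
U = rng.uniform(-1.0, 1.0, size=200)
Xn = np.array([step(x, u) for x, u in zip(X, U)])
A, B = fit_koopman(X, U, Xn)

# Multi-step prediction: roll the linear lifted model next to the true system.
x = np.array([0.5, -0.3])
z = lift(x)[0]
for u in [0.1, -0.2, 0.0, 0.3, -0.1]:
    x = step(x, u)
    z = A @ z + B[:, 0] * u
pred_err = np.max(np.abs(z[:2] - x))  # first two lifted states are x1, x2
```

In this toy case the lifted model is exact, so the prediction error is at the level of floating-point round-off. For a general nonlinear system a finite dictionary leaves a residual modeling error, which is the kind of mismatch a robust tube-based design is meant to account for.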