Y. Zhou
16 records found
Hierarchical Reinforcement Learning (HRL) provides an option to solve complex guidance and navigation problems with high-dimensional spaces, multiple objectives, and a large number of states and actions. The current HRL methods often use the same or similar reinforcement learning methods within one application so that multiple objectives can be easily combined. Since there is not a single learning method that can benefit all targets, hybrid Hierarchical Reinforcement Learning (hHRL) was proposed to use various methods to optimize the learning with different types of information and objectives in one application. The previous hHRL method, however, requires manual task-specific designs, which involves engineers’ preferences and may impede its transfer learning ability. This paper, therefore, proposes a systematic online guidance and navigation method under the framework of hHRL, which generalizes training samples with a function approximator, decomposes the state space automatically, and thus does not require task-specific designs. The simulation results indicate that the proposed method is superior to the previous hHRL method, which requires manual decomposition, in terms of the convergence rate and the learnt policy. It is also shown that this method is generally applicable to non-stationary environments changing over episodes and over time without the loss of efficiency even with noisy state information.
Globalized dual heuristic programming (GDHP) is the most comprehensive adaptive critic design, which employs its critic to minimize the error with respect to both the cost-to-go and its derivatives simultaneously. Its implementation, however, confronts a dilemma of either introducing more computational load by explicitly calculating the second partial derivative term or sacrificing the accuracy by loosening the association between the cost-to-go and its derivatives. This article aims at increasing the online learning efficiency of GDHP while retaining its analytical accuracy by introducing a novel GDHP design based on a critic network and an associated dual network. This associated dual network is derived from the critic network explicitly and precisely, and its structure is in the same level of complexity as dual heuristic programming critics. Three simulation experiments are conducted to validate the learning ability, efficiency, and feasibility of the proposed GDHP critic design.
Urban air mobility is a relatively new concept that has been proposed in recent years as a means of transporting passengers and goods in urban areas. It encompasses a diverse range of Vertical TakeOff and Landing (VTOL) vehicles that function more like passenger-carrying drones for on-demand transportation. Among them, the car-like VTOL is advantageous due to its compact configuration, safe rotors, high user affinity, and technological fashion. These characteristics are frequently derived from the flying car’s Ducted Fan Lift System (DFLS). This study aims to develop a method for the rapid design and the evaluation of the aerodynamic performance of the DFLS, to support the preliminary scheme demonstration of the ducted fan flying car. The proposed method uses blade element theory to design the unducted fan and applies momentum theory to calculate the aerodynamic thrust of the DFLS. The DFLS of a 1:3 scale verifier for a flying car scheme was designed and evaluated using the proposed method and a numerical method, respectively. To validate the proposed method, a prototype of the scale DFLS was manufactured and tested, and the result was compared with those of the proposed theoretical method and the numerical method. This study demonstrates that while both the theoretical and numerical methods are capable of designing an unducted fan accurately, the theoretical method is simpler and faster. Compared to the DFLS test results, the theoretical method’s average difference is approximately 1.9%. When evaluating the DFLS, the accuracy of the numerical calculation is reduced, and the difference is greater than 30% at low power. The theoretical method presented in this paper can be used to improve the aerodynamic design and evaluation efficiency of the DFLS and to aid in the configuration evaluation of VTOLs equipped with ducted fans.
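The abstract above does not give its equations, so the following is only a minimal sketch of the textbook ideal momentum-theory relations for static thrust, not the paper's design method. It assumes an open rotor obeys P = T^1.5 / sqrt(2 ρ A) and an ideal ducted fan obeys P = T^1.5 / sqrt(4 σ ρ A), where σ is the diffuser expansion ratio (exit area over disk area). The function name, the parameter names, and the numerical values are hypothetical.

```python
def ideal_static_thrust(power_w, disk_area_m2, rho=1.225, expansion_ratio=None):
    """Ideal static thrust [N] from momentum theory (no losses).

    expansion_ratio=None  -> open rotor:  P = T**1.5 / sqrt(2 * rho * A)
    expansion_ratio=sigma -> ducted fan:  P = T**1.5 / sqrt(4 * sigma * rho * A)
    Both relations are the idealized textbook forms (an assumption here),
    not the relations used in the paper.
    """
    k = 2.0 if expansion_ratio is None else 4.0 * expansion_ratio
    # Solve P = T**1.5 / sqrt(k * rho * A) for T.
    return (power_w ** 2 * k * rho * disk_area_m2) ** (1.0 / 3.0)


# Hypothetical operating point: 1 kW shaft power, 0.2 m^2 disk area.
open_thrust = ideal_static_thrust(1000.0, 0.2)
ducted_thrust = ideal_static_thrust(1000.0, 0.2, expansion_ratio=1.0)
thrust_gain = ducted_thrust / open_thrust
```

Under these idealized assumptions, a straight duct (σ = 1) yields a thrust gain of 2^(1/3), about 1.26, over an open rotor of the same disk area at the same power. Real ducted fans fall short of this because of tip-gap, inlet, and diffuser losses.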
Retraction: Deep Learning-based Monocular Obstacle Avoidance for Unmanned Aerial Vehicle Navigation in Tree Plantations
Faster Region-based Convolutional Neural Network Approach
In recent years, Unmanned Aerial Vehicles (UAVs) have been widely used in precision agriculture, for example in tree plantations. Due to limited intelligence, these UAVs can only operate at high altitudes, leading to the use of expensive and heavy sensors for obtaining important health information of the plants. To fly at low altitudes, these UAVs must possess the capability of obstacle avoidance to prevent crashes. However, most current obstacle avoidance systems with active sensors are not applicable to small aerial vehicles due to cost, weight, and power consumption constraints. To this end, this paper presents a novel approach to the autonomous navigation of a small UAV in tree plantations using only a single camera. As monocular vision does not provide depth information, a machine learning model, Faster Region-based Convolutional Neural Network (Faster R-CNN), was trained for tree trunk detection. A control strategy was implemented to avoid collisions with trees. The detection model uses image heights of detected trees to indicate their distances from the UAV and image widths between trees to find the widest obstacle-free space. The control strategy allows the UAV to navigate until any approaching obstacle is detected and to turn to the safest area before continuing its flight. This paper demonstrates the feasibility and performance of the proposed algorithms by carrying out 11 flight tests in real tree plantation environments at two different locations, one of which is a new place. All the successful results indicate that the proposed method is accurate and robust for autonomous navigation in tree plantations.
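The decision logic described in the abstract above (box height as a proximity cue, horizontal gaps between boxes as free space) can be sketched roughly as follows. This is an illustration only: the bounding-box format, the function names, the height threshold, and the normalized steering output are all assumptions, and the detector itself (Faster R-CNN) is not included.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def nearest_trunk_height(boxes: List[Box]) -> float:
    """Tallest box height in pixels; a taller trunk is taken to be closer."""
    return max((b[3] - b[1] for b in boxes), default=0.0)


def widest_free_gap(boxes: List[Box], image_width: float) -> Tuple[float, float]:
    """Return (left, right) pixel bounds of the widest box-free horizontal span."""
    best = (0.0, 0.0)
    cursor = 0.0  # right edge of the obstacles swept so far
    for x_min, x_max in sorted((b[0], b[2]) for b in boxes):
        if x_min - cursor > best[1] - best[0]:
            best = (cursor, x_min)
        cursor = max(cursor, x_max)
    if image_width - cursor > best[1] - best[0]:
        best = (cursor, image_width)
    return best


def steering_command(boxes: List[Box], image_width: float,
                     height_threshold: float) -> float:
    """0.0 means keep heading; otherwise a value in [-1, 1] toward the gap center."""
    if nearest_trunk_height(boxes) < height_threshold:
        return 0.0  # no trunk is judged close enough to require a turn
    left, right = widest_free_gap(boxes, image_width)
    half_width = image_width / 2.0
    return (0.5 * (left + right) - half_width) / half_width


# Hypothetical detections in a 640-pixel-wide frame.
detections = [(100.0, 50.0, 150.0, 400.0), (400.0, 100.0, 430.0, 300.0)]
gap = widest_free_gap(detections, 640.0)
turn = steering_command(detections, 640.0, height_threshold=300.0)
```

A negative command here means the widest gap lies left of the image center. How such a command maps to a yaw rate or heading change is not specified in the abstract and is left open.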
Optical flow-based control strategies have always inspired robotic scientists, especially those in the field of Micro Air Vehicles (MAVs), thanks to their computational efficiency and relative simplicity. A major problem is that the success of optical flow control is governed by the availability of distance estimates, while optical flow provides only the ratio of velocity to distance. Therefore, with only monocular visual information, the inherent nonlinearity of optical flow observables has imposed several challenges in the controller design. In this paper, we propose a newly formulated controller, Extended Incremental Nonlinear Dynamic Inversion (EINDI), to deal with nonlinearities in the system output, such as optical flow control problems. The proposed method unlocks the potential of its predecessor (INDI) in output feedback control by removing the common assumption of time-scale separation, allowing internal dynamics to exist, and requiring only the input and output measurements. Furthermore, the EINDI method has been implemented on an MAV and tested successfully for optical flow landing in a simulation and a real-world outdoor environment. Both simulation and flight test results show 1) good tracking performance of the EINDI control compared to the conventional feedback control, 2) smooth landing trajectories without any oscillation, and 3) fast adaptation of the EINDI control even for landings at different heights and desired setpoints.
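To make concrete the point that optical flow provides only the ratio of velocity to distance, the sketch below computes flow divergence for a vertical landing and integrates an idealized constant-divergence descent. It does not implement EINDI; perfect tracking of the divergence setpoint is assumed, and all names and numbers are hypothetical.

```python
import math


def divergence(height_m: float, descent_rate_mps: float) -> float:
    """Flow divergence during vertical descent: descent rate over height [1/s]."""
    return descent_rate_mps / height_m


def simulate_constant_divergence_landing(h0: float, d_set: float,
                                         dt: float = 0.001,
                                         t_end: float = 2.0) -> float:
    """Euler-integrate dh/dt = -d_set * h and return the final height.

    Assumes the controller tracks the divergence setpoint perfectly,
    which is an idealization made only for illustration.
    """
    h = h0
    for _ in range(int(round(t_end / dt))):
        descent_rate = d_set * h  # rate that yields divergence d_set at height h
        h -= descent_rate * dt
    return h


# Two different (height, speed) pairs give the same observable,
# so height cannot be recovered from the divergence alone.
same_observable = divergence(10.0, 5.0) == divergence(2.0, 1.0)

final_height = simulate_constant_divergence_landing(h0=10.0, d_set=0.5)
analytic_height = 10.0 * math.exp(-0.5 * 2.0)
```

Holding the divergence constant makes the height decay exponentially, h(t) = h0 exp(-D t), so the descent slows as the ground approaches. The required control gain, however, depends on the unobserved height, which is the nonlinearity the abstract refers to.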
Heuristic dynamic programming is a class of reinforcement learning methods that has been introduced to aerospace engineering to solve nonlinear, optimal adaptive control problems. However, it requires an off-line learning stage to train a global system model to represent the system dynamics. This paper replaces the global model with an incremental model to improve the online learning ability, yielding incremental model based heuristic dynamic programming. Because the incremental model is identified online, this method is an option for fault-tolerant control and partially observable control problems. This study, therefore, also extends this method to deal with partial observability. The presented method has been validated on two different online tracking problems: missile fault-tolerant control with full-state measurements, and spacecraft attitude control disturbed by liquid sloshing under partially observable conditions. The results reveal that the proposed method outperforms the conventional heuristic dynamic programming method in fault-tolerant control tasks, deals with partial observability, and is robust to internal uncertainties and external disturbances.
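One common way to identify an incremental model online is ordinary least squares over a short window of recent state and input increments. The sketch below assumes the discrete-time form Δx(t+1) ≈ F Δx(t) + G Δu(t); the abstract does not state the identification scheme, so the window handling, the function name, and the test system are assumptions made for illustration.

```python
import numpy as np


def identify_incremental_model(x_hist: np.ndarray, u_hist: np.ndarray):
    """Least-squares fit of dx[t+1] ~= F @ dx[t] + G @ du[t].

    x_hist: states x_0..x_T, shape (T + 1, n)
    u_hist: inputs u_0..u_{T-1}, shape (T, m)
    Returns F with shape (n, n) and G with shape (n, m).
    """
    dx = np.diff(x_hist, axis=0)  # dx[k] = x[k+1] - x[k], shape (T, n)
    du = np.diff(u_hist, axis=0)  # du[k] = u[k+1] - u[k], shape (T - 1, m)
    regressors = np.hstack([dx[:-1], du])  # rows: [dx_t, du_t]
    targets = dx[1:]                       # rows: dx_{t+1}
    theta, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    n = x_hist.shape[1]
    return theta[:n].T, theta[n:].T


# Hypothetical linear test system, for which the incremental form is exact.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])
u = rng.normal(size=(30, 1))
x = np.zeros((31, 2))
for t in range(30):
    x[t + 1] = A_true @ x[t] + B_true @ u[t]

F_hat, G_hat = identify_incremental_model(x, u)
```

For a nonlinear system, F and G would approximate the local Jacobians around the current operating point, so the window has to be short enough for them to stay roughly constant and the inputs have to be sufficiently exciting for the fit to be well conditioned.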
The use of Reinforcement Learning (RL) methods in Adaptive Flight Control has been an active research field over the past few years. Controllers that autonomously learn by interacting with the surrounding environment are highly interesting to the aerospace domain due to their adaptive and reconfigurable properties, which are directly connected to in-flight safety. Approximate Dynamic Programming (ADP) is a class of RL methods that is able to deal with more complex systems in an online way. The core idea is to provide an approximation to the cost function, which is used to determine what action to take such that a certain goal is achieved. In incremental Approximate Dynamic Programming (iADP), a simple approximation that is quadratic in the state is used together with an incremental form of the nonlinear system. This novel class of controllers is applied to an F-16 aircraft model. A nonlinear rate control system based on full state feedback is first trained offline in order to achieve a baseline controller. Experiments with failures are then carried out in which an online adaptation takes place. Results show good tracking performance both before and after failure. A simple angle of attack controller based on output feedback is also designed and implemented, and it also shows satisfactory tracking performance.
Autonomous guidance and navigation problems often have high-dimensional spaces, multiple objectives, and consequently a large number of states and actions, which is known as the ‘curse of dimensionality’. Furthermore, systems often have partial observability instead of a perfect perception of their environment. Recent research has sought to deal with these problems by using Hierarchical Reinforcement Learning, which often uses the same or similar reinforcement learning methods within one application so that multiple objectives can be combined. However, there is not a single learning method that can benefit all targets. To acquire optimal decision-making most efficiently, this paper proposes a hybrid Hierarchical Reinforcement Learning method consisting of several levels, where each level uses various methods to optimize the learning with different types of information and objectives. An algorithm is provided using the proposed method and applied to an online guidance and navigation task. The navigation environments are complex, partially observable, and a priori unknown. Simulation results indicate that the proposed hybrid Hierarchical Reinforcement Learning method, compared to flat or non-hybrid methods, can help to accelerate learning and to alleviate the ‘curse of dimensionality’ in complex decision-making tasks. In addition, the mixture of relative micro states and absolute macro states can help to reduce the uncertainty or ambiguity at high levels, to transfer the learned results within and across tasks efficiently, and to apply the method to non-stationary environments. This proposed method can yield a hierarchical optimal policy for autonomous guidance and navigation without a priori knowledge of the system or the environment.
Approximate dynamic programming is a class of reinforcement learning, which solves adaptive, optimal control problems and tackles the curse of dimensionality with function approximators. Within this category, linear approximate dynamic programming provides a model-free control method by systematically using a quadratic cost-to-go function. Although efficient, linear approximate dynamic programming methods are difficult to apply to nonlinear systems or time-varying systems. To overcome the above limitations, this paper proposes an adaptive nonlinear tracking control method based on incremental approximate dynamic programming, which combines the advantages of linear approximate dynamic programming and incremental nonlinear control techniques. This is a model-free method for unknown, nonlinear systems and time-varying references. The trait of separating the local model information from the cost function approximation makes this method an option for partially observable control problems. This paper, therefore, proposes two reference tracking controllers for different observability conditions: the direct measurement of the full state, and the partially observable tracking error. In each condition, two algorithms are developed for off-line learning and online learning, respectively. These algorithms are applied to attitude control of a spacecraft disturbed by internal liquid sloshing. The results demonstrate that the proposed algorithms accurately deal with the unknown, time-varying internal dynamics while retaining efficient control, even with only partial observability.
This paper presents an adaptive control technique to deal with spacecraft attitude tracking and disturbance rejection problems in the presence of model uncertainties. Approximate dynamic programming has been proposed to solve adaptive, optimal control problems without using accurate systems models. Within this category, linear approximate dynamic programming systematically utilizes a quadratic cost-to-go function and simplifies the design process. Although model-free and efficient, linear approximate dynamic programming methods are difficult to apply to nonlinear systems or time-varying systems, such as attitude control of spacecraft disturbed by internal liquid sloshing. To deal with this problem, this paper develops a model-free nonlinear self-learning attitude control method based on incremental Approximate Dynamic Programming to enhance the performance of the spacecraft attitude control system. This method combines the advantages of linear approximate dynamic programming and the incremental nonlinear control techniques, and generates a model-free controller for unknown, time-varying dynamical systems. In this paper, two reference tracking algorithms are developed for off-line learning and online learning, respectively. These algorithms are applied to the attitude control of a spacecraft disturbed by internal liquid sloshing. The results demonstrate that the proposed method deals with the unknown, time-varying internal dynamics adaptively while retaining accurate and efficient attitude control.
A self-learning controller which makes quick and successful adaptations to new conditions can considerably benefit autonomous operations of launch vehicles. To provide a model-free, adaptive process for optimal control, approximate dynamic programming has been introduced to aerospace engineering. A widely used structure of approximate dynamic programming for nonlinear systems is heuristic dynamic programming. This paper proposes a new method using incremental models in heuristic dynamic programming to improve the online learning capacity. This method generates an adaptive near-optimal controller online without a priori knowledge of the system dynamics or off-line learning of the system model. A comparison is made between the conventional heuristic dynamic programming algorithm and the incremental model based heuristic dynamic programming algorithm by applying them to an online flight control problem with an unknown nonlinear model. The results demonstrate that the incremental model based heuristic dynamic programming method accelerates online learning, improves the precision, and can deal with a wider range of initial states compared to the conventional heuristic dynamic programming method.