| 1 |
|
Guaranteed globally optimal continuous reinforcement learning
Self-learning controllers offer various strong benefits over conventional controllers, the most important one being their ability to adapt to unexpected circumstances. Their application is however limited, the most important reason being that, for self-learning controllers to work on continuous domains, nonlinear function approximators are required, and as soon as nonlinear function approximators are involved, it is uncertain whether convergence will occur.
The goal of this project is to contribute towards achieving convergence guarantees. The first focus lies on using reinforcement learning, combined with neural network function approximation, to create self-learning controllers. A reinforcement learning controller architecture has been set up which is capable of controlling systems with continuous states and actions. An extension has also been made that enables the controller to freely vary its timestep without any significant consequences.
A literature survey has shown that there are as yet no convergence proofs for practically feasible reinforcement learning controllers with nonlinear function approximators. Several proofs of convergence for the case of linear function approximators have been provided. Also, a proof exists that a certain reinforcement learning controller algorithm with nonlinear function approximation converges. However, the corresponding learning algorithm requires an infinite number of iterations for the value function to converge even before the policy is updated, making it practically infeasible.
The reasons why convergence often does not occur for reinforcement learning controllers with neural network function approximators include overestimated value functions, incorrect generalization, function approximators incapable of approximating the actual value function and error amplification due to bootstrapping. Furthermore, when convergence does occur, it’s not necessarily to a global optimum. Ideas to solve these issues have been offered, but it is still far from certain whether these ideas will work. Furthermore, they will make the algorithm very complex. Proving that the resulting algorithm converges, even if only to a local optimum, will be extremely difficult, if it is possible at all. Hence reinforcement learning controllers with neural network function approximators do not seem to be appropriate if convergence to the global optimum needs to be ensured.
The literature survey has also revealed that so far no attempt has been made to combine reinforcement learning with interval analysis. To investigate the possibilities of combining these two fields, the interval Q-learning algorithm has been designed. This algorithm combines a discrete version of Q-learning with interval analysis techniques and has been proven to converge for the discrete case.
Subsequently, the discrete interval Q-learning algorithm has been extended to the continuous domain. This was done using the main RL value assumption, which assumes that the derivatives of the value function with respect to every state parameter and action parameter have known bounds. The resulting continuous interval Q-learning algorithm was proven to converge to the optimal value function. Furthermore, bounds on how fast the algorithm converges were given.
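As an illustration of how known derivative bounds can carry interval information from sampled points to nearby points, the following is a minimal sketch only, not the algorithm from the thesis. The names `QBound`, `bound_at` and `max_slope` are hypothetical, and the sample values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class QBound:
    """Interval [lo, hi] assumed to contain the optimal value Q at one point."""
    point: tuple  # state and action parameters, concatenated
    lo: float
    hi: float


def bound_at(query, samples, max_slope):
    """Bound Q at `query` from sampled bounds and derivative bounds.

    max_slope[i] is an assumed upper bound on |dQ/dx_i| (hypothetical input).
    """
    lo, hi = float("-inf"), float("inf")
    for s in samples:
        # Largest change in Q the derivative bounds allow between the two points.
        slack = sum(m * abs(q - p) for m, q, p in zip(max_slope, query, s.point))
        # Every sample gives a valid bound, so keep the tightest ones.
        lo = max(lo, s.lo - slack)
        hi = min(hi, s.hi + slack)
    return lo, hi


samples = [QBound((0.0, 0.0), 1.0, 2.0), QBound((1.0, 0.0), 1.5, 1.75)]
lo, hi = bound_at((0.5, 0.0), samples, max_slope=(1.0, 1.0))
print(lo, hi)
```

In this sketch, each additional sample can only tighten the interval at the query point, which is the intuition behind bounding the value function over a continuous domain.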
The most important downside of the first version of the continuous interval Q-learning algorithm was its slow run-time, which made the algorithm practically infeasible. To still meet the goals of the project, a different function approximator was designed. This function approximator used many small blocks to bound the optimal value function in every part of the state-action space. Furthermore, the number of blocks was increased dynamically as the algorithm learned, thus giving the function approximator a theoretically unlimited accuracy. Though the resulting algorithm used information slightly less efficiently than its precursor, its run-time was significantly improved, and it became practically feasible to apply this algorithm.
At the end of the report, the algorithm has been applied to a few simple test problems. An important parameter here was the dimension D of the problem, which equals the sum of the number of state and action parameters. For two- and three-dimensional problems, the algorithm was able to sufficiently bound the optimal value function quite quickly, resulting in a controller with satisfactory performance. For the cart and pendulum system, which is a five-dimensional problem, this turned out to be different. A long training period (on the order of several hours or more) will be required before a satisfactory performance can be obtained. However, since the algorithm has proven convergence to the globally optimal policy, this does not necessarily have to be a big problem.
Finally, it is mentioned that this thesis report has introduced the world’s first combination of reinforcement learning and interval analysis. It also introduced the world’s first practically feasible continuous RL controller with proven convergence to the global optimum. The key to accomplishing such a controller turned out to be (A) letting go of conventional ways of designing continuous RL controllers, and (B) quantifying the assumption that the values Q of two nearby states are similar through the main RL value assumption.
|
 file embargo until: 2016-06-01
|
| 2 |
|
Close formation flight control - with applications in commercial aviation
Ever since the beginning of manned flight, engineers have been inspired by nature in many ways. Birds have been flying in formation for as long as man can remember, and military aircraft followed suit. In the light of rising fuel prices, crowded airspace and stringent environmental regulations, formation flight could be another leap forward in commercial aviation as well. This study aims to provide insight into the implications and prerequisites of close formation flight in commercial aviation.
In this research, a wake vortex model for the Boeing 747 commercial jet is developed. The optimum separation between aircraft in a formation is determined. The location of the sweet spot and the drag reduction determined with the wake vortex model are in good agreement with previous research. The location of the sweet spot can, however, differ from off-line predictions due to e.g. wind and simplifications in the model. It is found that the classic autopilot of the B-747 is not suitable for formation flight. A new autopilot is therefore developed and evaluated.
The new autopilot is based on a multiple loop architecture and is developed for a linearized model of the B-747 in cruise condition. It is shown that both in the linearized and the nonlinear model, the controller design complies with regulations and the aircraft successfully tracks the sweet spot under the influence of wake vortex effects. To circumvent the problem of an uncertain sweet spot location, the autopilot is extended with extremum seeking capability. By estimation and maximization of the induced angle of attack, the aircraft automatically finds and tracks the location in which maximum drag reduction is achieved.
Finally, a number of practical issues are considered for applications in commercial aviation. Different strategies for sweet spot approach, formation flight in a turbulent atmosphere and multiple aircraft formations are considered, and it is investigated how passenger comfort is affected. A considerable mean thrust reduction is achieved with the new autopilot set to extremum seeking, without causing disturbances to the passengers.
|
 file embargo until: 2016-06-01
|
| 3 |
|
Advanced Flight Control Design and Evaluation: An application of time delayed Incremental Backstepping
The sensor-based approach of Incremental Backstepping is applied to flight control law design in this research project. It allows the usage of the same control law on different types of aircraft without the need for redesign.
Apart from full state availability, the derivation of Incremental Backstepping assumes instantaneous control action. Due to actuator lags and delays, the implementation of control commands cannot necessarily be considered instantaneous. This weakens the stability guarantee provided by Lyapunov theory. Therefore, a novel technique to estimate the time delay margins of Incremental Backstepping controlled systems is proposed in the thesis. This provides an important stability measure for possible certification and widens the application range of Incremental Backstepping.
This simple, yet effective, Lyapunov-based control technique shows positive robustness properties with respect to model uncertainties, unknown parameters, external disturbances and time delay effects. It is applied to the DA 42 aircraft as a (pilot-in-the-loop) rate controller in the scope of this thesis. The implementation requires measurements of the aircraft's angular accelerations and control surface deflections. If the latter is not available, it is shown that filters can still be used in the control system. However, the usage of filters diminishes the highly favorable robustness properties of the closed-loop system.
Moreover, a controller evaluation strategy is proposed. It rates the performance and stability properties of the Incremental Backstepping controlled system in terms of the flight control system requirements. Evaluation of the Incremental Backstepping controller shows allowable input multiplicative uncertainties of up to 40% of the nominal value at the worst-case excitation frequency for a controller update rate of 100 Hz. When no reference shaping is applied, the handling qualities of the incremental rate controller prove to be less desirable than those of a conventional linear controller designed specifically for the DA 42. However, it is possible to improve handling characteristics by reference shaping. Furthermore, the handling characteristics of the incremental controller remain fairly constant along the flight envelope and in adverse flight conditions.
|
 file embargo until: 2016-06-01
|
| 4 |
|
Robust Nonlinear Spacecraft Attitude Control: an Incremental Backstepping Approach
In order to meet requirements in terms of robustness, stability, and performance for future generations of advanced attitude control systems, a sensor-based approach using Incremental Backstepping control is developed and proposed in this thesis.
Assuming full state availability and fast control action, the resulting time-scale separation between the state of the system and the state of the controller allows one to consider an incremental form of the attitude dynamics, where backstepping controllers can be designed to achieve stability and convergence with incremental inputs. This results in integral-control action where information of angular acceleration and actuator output measurements is required.
The robustness and the full potential of Incremental Backstepping are evidenced in the face of external disturbances, uncertainties, and unknown parameters. External disturbances are well suppressed in contrast with conventional backstepping and Lyapunov-based (non)linear controllers. Furthermore, the attitude stabilization proves to be insensitive to parametric uncertainties and robust against model uncertainties. However, this comes at the expense of higher control effort. Moreover, with the influence of model and parametric uncertainties the resulting closed-loop dynamic performance can be better accounted for by studying the convergence and stability properties in terms of Lyapunov theory.
This methodology results in a simple, yet effective, family of robust nonlinear attitude controllers which aims to meet demanding requirements in terms of robustness, stability and performance, which in turn closes the gap towards the development of future advanced attitude control systems.
|
 file embargo until: 2016-06-01
|
| 5 |
|
Nonlinear Flight Control: Fault Tolerant Control with Sliding Modes and Control Allocation
Nonlinear Dynamic Inversion (NDI) is a promising method for Fault Tolerant Flight Control. The NDI algorithm cancels out the aircraft dynamics based on a dynamic aircraft model such that the closed loop system behaves linearly. The aircraft model is estimated online, which allows it to accommodate changes in the aircraft configurations and failures. It is important that an accurate dynamic aircraft model is provided in order to minimise the parasitic dynamics of imperfect dynamic inversion.
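The inversion idea can be sketched on a scalar system of the form x_dot = f(x) + g(x) u. This is a minimal illustration only: the functions `f` and `g` below are hypothetical stand-ins, not an aircraft model, and the 20% model error is an assumed figure chosen to show the effect of imperfect inversion.

```python
import math


def f(x):
    # Hypothetical nonlinear drift term (not an aircraft model).
    return -x**3 + math.sin(x)


def g(x):
    # Hypothetical control effectiveness, nonzero for all x.
    return 2.0 + math.cos(x)


def ndi_control(x, v, g_model=g):
    """Input u that makes x_dot equal the virtual control v under the model."""
    return (v - f(x)) / g_model(x)


x, v = 0.7, -1.2

# Perfect model: the nonlinear dynamics cancel and x_dot equals v.
u = ndi_control(x, v)
x_dot = f(x) + g(x) * u

# Assumed 20% error in the control effectiveness: x_dot no longer equals v.
u_bad = ndi_control(x, v, g_model=lambda s: 0.8 * g(s))
x_dot_bad = f(x) + g(x) * u_bad

print(x_dot - v, x_dot_bad - v)
```

With an exact model the residual is zero up to rounding, while the mismatched model leaves a residual that the outer feedback loop has to absorb.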
Sliding Mode Control (SMC) feedback is applied to increase the robustness of the NDI algorithm, especially in the case of a failure. SMC is well known for its strong robustness properties and controls the system using brute force. First-order SMC is a discontinuous control algorithm, and its chattering behaviour is highly undesirable in practical applications. Therefore second-order SMC is applied to the critical rate control loop. This algorithm is less sensitive to noise and the control signals are continuous.
It is shown that SMC can accommodate large uncertainties originating from control allocation error. Constraining the control solution minimises the parasitic dynamics and reduces the load on the feedback controllers. The Control Allocation problem is written as a low complexity linear program. For load balancing purposes the control solution is shaped with the Pseudo-Inverse.
|
 file embargo until: 2016-06-01
|
| 6 |
|
An optimal control approach for estimating aircraft command margins
This dissertation presents an optimal control framework to determine a collection of open-loop command signals that mathematically guarantee operation of a dynamical system within prescribed state constraints. The framework is applied to estimate real-time command margins for aircraft control systems so that safe operation within the flight envelope can be assured under appropriate control action. The margins are perceived as useful information to a pilot, especially during off-nominal conditions, as they can aid the pilot in avoiding flight envelope excursions, which are generally considered causal factors in Loss-Of-Control incidents in aviation.
|
 file embargo until: 2016-06-01
|
| 7 |
|
Interval analysis applied to re-entry flight trajectory optimization
Trajectory optimization is an essential part of space plane mission design. One important aspect of trajectory optimization for re-entry vehicles is to minimize the total heat load at the surface during the return, while the heat flux remains below a certain limit and the vehicle lands at the desired point. The methods used for re-entry trajectory optimization are quite successful by now. However, if the model is non-linear, as is the case for a re-entry vehicle, classical optimization methods can only find a local minimum and the global minimum is never guaranteed. An innovative way of finding the global minimum heat load for the trajectory design is introduced, namely interval analysis for global optimization.
In this thesis, the basic concept of interval arithmetic is introduced. The main idea of interval arithmetic is to use small intervals for the calculation instead of numbers. As the interval algorithm has the characteristic of checking all the numbers within the interval and containing all the feasible solutions, a guaranteed global optimum can eventually be found.
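The enclosure property described above can be illustrated with a minimal sketch of interval arithmetic. The `Interval` class and the test function `f` are hypothetical and chosen for illustration; the sketch also ignores outward rounding, which a rigorous implementation would need.

```python
class Interval:
    """Closed interval [lo, hi]; floating-point rounding is ignored here."""

    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


def f(x):
    # Hypothetical test function x^2 - 2x; its true range on [0, 2] is [-1, 0].
    return x * x - Interval(2.0, 2.0) * x


# One evaluation encloses f over the whole interval, but x appears twice,
# so the enclosure is wider than the true range.
whole = f(Interval(0.0, 2.0))

# Splitting the domain and taking the hull of the pieces tightens the bound.
left, right = f(Interval(0.0, 1.0)), f(Interval(1.0, 2.0))
hull = Interval(min(left.lo, right.lo), max(left.hi, right.hi))

print(whole, hull)
```

Both results contain the true range, which is what allows parts of the search space to be discarded safely during global optimization. The overestimation caused by the repeated occurrence of `x` is a small instance of the dependency problem.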
In this report, the interval method is used in both static and dynamic global optimization problems. The application of interval analysis to the static optimization problem is very successful. However, although the application of interval analysis to a dynamic system can successfully find the global optimum, the interval global optimization method still suffers greatly from the dependency problem, the wrapping effect, and the huge number of feasible solutions.
We apply the interval algorithm to find a guaranteed global minimum total heat load for re-entry flight trajectory design, identify the difficulties and give recommendations for improvements.
This thesis serves as a feasibility study on using interval analysis for non-linear trajectory optimization of re-entry vehicles.
|
 file embargo until: 2016-06-01
|
| 8 |
|
Terminal area energy management trajectory optimization using interval analysis
|
 file embargo until: 2016-06-01
|