The very basis of this thesis is the collective behavior of ants in colonies. Ants are an excellent example of how rather simple behavior at the local level can lead to complex behavior at the global level that is beneficial for the individuals. The key to the self-organization of ants is communication through pheromones. When an ant forages for food, it is biased to search along trails with stronger pheromone concentrations. The moment it finds food, it walks back to the nest while depositing pheromones, thereby reinforcing a successful trail. Inspired by this mechanism, research in an engineering context has led to the development of the field of Ant Colony Optimization (ACO). Specifically developed for efficiently solving combinatorial optimization problems, ACO has been successfully applied to routing in road traffic and Internet networks.

In this thesis, we take the principles behind ACO to the domain of control policy learning. A control policy is a mapping from states to actions, and our objective is to develop methods that learn the optimal control policy for a given dynamic system by interacting with it. We call our methods Ant Colony Learning (ACL). Their power lies in the fact that there is a set of ants, each of which interacts with the system and influences the other ants by updating the pheromone levels associated with the visited state-action pairs. In experiments involving control problems with a discrete state space and deterministic state transitions, ACL converges quickly to the optimal solution. We also observe that increasing the number of ants decreases the number of trials required for convergence to the optimal policy. An analytical study of the convergence behavior of ACL reveals that, for systems with discrete and noiseless state transitions, the expected policy converges to the optimal policy when only one ant is used.

Another major part of this thesis deals with the application of ACL to control problems with continuous state spaces. In order to capture a continuous space with a finite number of elements, we study two ways of partitioning the state space and their incorporation into the ACL framework. In crisp ACL, the state space is partitioned using bins. Each state measurement is assigned to exactly one bin, which introduces discretization noise, rendering an originally deterministic system non-deterministic and limiting the performance of the algorithm. We find that a better way of partitioning the state space is to use fuzzy triangular membership functions. The continuous state measurement then belongs to multiple membership functions to a certain degree. With fuzzy partitioning, the continuity of the state variables is preserved and no non-determinism is introduced. We call this method fuzzy ACL. The developed generalized ACL algorithm unifies both crisp and fuzzy ACL; the two partitioning schemes are illustrated by the sketch at the end of this section.

The behavior and performance of crisp and fuzzy ACL are further studied in simulation experiments. We study the influence of the local and global pheromone decay rates, the number of ants, and the density of the state space partitioning grid on the learning performance. In particular, the performance of crisp ACL improves for a small local pheromone decay rate, while fuzzy ACL outperforms crisp ACL across the board. In general, crisp ACL is much more sensitive to the choice of the pheromone decay parameters than fuzzy ACL.
We find that using more ants leads to faster convergence, but that the number of ants does not need to be extremely large to obtain satisfactory performance. With regard to the scaling of ACL, crisp ACL shows a slow but gradual improvement in learning performance as the density of the state space partitioning increases. Fuzzy ACL, on the other hand, improves more rapidly and requires fewer ants to learn a better control policy. Finally, we present a general modeling framework for swarms of moving agents. ACL fits within this framework and can thus be unified with other swarm intelligence techniques. In the future, this could result in the beneficial integration of elements from other swarm intelligence techniques into ACL, or vice versa.
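To make the difference between the two partitioning schemes concrete, the sketch below shows, in Python, how a single continuous state variable could be assigned to exactly one bin (crisp partitioning) or to two overlapping triangular membership functions (fuzzy partitioning), and how the pheromone levels of the activated state-action pairs could then be combined when an ant selects an action. The function names, the partition centers, the NumPy-based implementation, and the degree-weighted action selection are illustrative assumptions rather than the exact formulation used in the thesis.

```python
import numpy as np

def crisp_bin(x, edges):
    """Crisp partitioning: assign the measurement to exactly one bin.

    Where x lies inside the bin is lost, which is the source of the
    discretization noise mentioned above.
    """
    return int(np.digitize(x, edges))

def fuzzy_degrees(x, centers):
    """Fuzzy partitioning with triangular membership functions.

    A measurement between two neighboring centers activates both of them,
    with degrees that sum to one, so the continuity of the state variable
    is preserved.
    """
    degrees = np.zeros(len(centers))
    x = np.clip(x, centers[0], centers[-1])
    right = int(np.searchsorted(centers, x))
    if right == 0:                       # x coincides with the first center
        degrees[0] = 1.0
        return degrees
    left = right - 1
    width = centers[right] - centers[left]
    degrees[right] = (x - centers[left]) / width
    degrees[left] = 1.0 - degrees[right]
    return degrees

def greedy_action(x, centers, tau):
    """Pick the action with the largest degree-weighted pheromone level.

    `tau` is a (number of membership functions) x (number of actions)
    pheromone table; the weighted sum is an assumed, simplified stand-in
    for the action selection rule of fuzzy ACL.
    """
    weights = fuzzy_degrees(x, centers)   # degrees of activation
    values = weights @ tau                # combined pheromone per action
    return int(np.argmax(values))

# A measurement of 0.3 on a partition with centers -1, -0.5, 0, 0.5, 1
# activates the membership functions at 0 and 0.5 with degrees 0.4 and 0.6.
centers = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(fuzzy_degrees(0.3, centers))        # [0.  0.  0.4 0.6 0. ]
print(crisp_bin(0.3, centers))            # a single bin index

tau = np.ones((5, 3))                     # uniform pheromones over 3 actions
print(greedy_action(0.3, centers, tau))   # 0 (ties broken by argmax)
```

The intent of the sketch is only to show that a crisp partition discards the position of the measurement inside its bin, whereas the fuzzy partition retains it through the activation degrees that weight the pheromone levels of the neighboring state-action pairs.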