E.S. Dam

2 records found

Combatting Relative Overgeneralisation in Deep Independent Learners using Optimism and Similarity

Master thesis (2022) - E.S. Dam, J.W. Böhmer, M.T.J. Spaan, F.A. Oliehoek
Various pathologies can occur when independent learners are used in cooperative Multi-Agent Reinforcement Learning. One such pathology is Relative Overgeneralisation, which manifests when a suboptimal Nash Equilibrium in the joint action space of a problem is preferred over an optimal Equilibrium. Approaches exist to combat relative overgeneralisation in Q-Learning problems, yet many approaches do not scale well with the state space or joint action space, are hard to adapt or configure, or are not applicable in partially observable environments.

In this work, we introduce Deep Maximum Q-Learning (DMQL), a methodology combining Deep Recurrent Q-Networks [Hausknecht & Stone, 2015] and the optimistic assumption which can be found in Distributed Q-Learning [Lauer & Riedmiller, 2000]. DMQL is a maximum-based learning technique which can be scheduled to transition to an average-based learner (or any other arbitrary type of learner), which can utilise independent learners without communication. DMQL is designed to be relatively intuitive and easy to adapt and configure and is able to utilise notions of similarity to provide solutions in large and continuous state spaces.
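The optimistic assumption of Distributed Q-Learning can be illustrated in tabular form. The sketch below is not DMQL itself: it only shows the maximum-based update rule from [Lauer & Riedmiller, 2000], in which an agent's Q-value is never lowered. The function name, the dictionary layout and the default `gamma` are assumptions made for illustration, and the zero initialisation matches the usual setting of non-negative rewards.

```python
from collections import defaultdict


def optimistic_update(q, state, action, reward, next_state, actions,
                      gamma=0.9, done=False):
    """Maximum-based (optimistic) tabular update: q[(state, action)] never decreases.

    q is a mapping from (state, action) to a value, initialised to 0.
    All names here are hypothetical and chosen for illustration only.
    """
    # Bootstrap from the best known value of the next state (0 if terminal).
    bootstrap = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * bootstrap
    # Keep the larger of the old value and the new target.
    q[(state, action)] = max(q[(state, action)], target)
    return q[(state, action)]


q = defaultdict(float)
actions = [0, 1]

# A good outcome (teammate cooperated) raises the value ...
optimistic_update(q, "s", 0, 10.0, "end", actions, done=True)
# ... and a later bad outcome (teammate explored) does not lower it again.
optimistic_update(q, "s", 0, -5.0, "end", actions, done=True)
```

Because low returns caused by a teammate's exploratory actions are ignored, each independent learner evaluates its own action as if the other agents acted optimally, which is what counters relative overgeneralisation.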

DMQL clusters similar histories by mapping them to the same hash based on a subset of the information contained within them, such as the current observation, or on other related information sources that are available, such as state information. Using these hashes, DMQL constructs a hash-action dictionary of pseudo-maximum Q-value estimates, which is updated at every gradient update step. A dictionary value degradation technique ensures stability: overestimations are decayed after they have been encountered, which prevents them from being retained in the dictionary. This way, optimism is introduced and relative overgeneralisation is prevented without using true maximums of past Q-value estimates, as these are not guaranteed to be indicative of the real optimal Q-values. In contrast to similar deep learning methodologies [Palmer et al., 2017], DMQL augments Deep Q-Network targets by replacing values instead of discarding them, potentially leading to improved efficiency. In addition, DMQL can be adapted to serve as a maximisation-based step in the greater learning process of other deep learning algorithms.
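The dictionary mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not specify the hash function or the decay rule, so `hash_history` (which keys on the current observation only), the class `PseudoMaxTable` and the linear decay towards the newest estimate controlled by `decay_rate` are all hypothetical choices.

```python
def hash_history(history):
    """Map a history to a key using only its most recent observation.

    Assumption: histories are sequences of observations, and two histories
    count as "similar" when their current observations are equal.
    """
    return tuple(history[-1])


class PseudoMaxTable:
    """Hash-action table of pseudo-maximum Q-value estimates (hypothetical sketch)."""

    def __init__(self, decay_rate=0.1):
        # Fraction of the gap to the newest estimate that is removed
        # whenever a stored value exceeds that estimate (assumed rule).
        self.decay_rate = decay_rate
        self.values = {}

    def update(self, history_hash, action, q_estimate):
        """Update the stored value and return it for use as a replacement target."""
        key = (history_hash, action)
        stored = self.values.get(key)
        if stored is None or q_estimate >= stored:
            # A new maximum replaces the stored value outright.
            new_value = q_estimate
        else:
            # Otherwise the stored value is degraded towards the estimate,
            # so a one-off overestimation is not retained forever.
            new_value = stored - self.decay_rate * (stored - q_estimate)
        self.values[key] = new_value
        return new_value


table = PseudoMaxTable(decay_rate=0.5)
h = hash_history([(0, 1), (2, 3)])
first = table.update(h, 0, 4.0)   # first estimate is stored as-is
second = table.update(h, 0, 2.0)  # lower estimate: stored value decays towards it
third = table.update(h, 0, 5.0)   # higher estimate: becomes the new pseudo-maximum
```

In this sketch the returned value stays at or above the current estimate, so substituting it into a learning target keeps the update optimistic, while the decay step means the table holds pseudo-maximums rather than true maximums of past estimates.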

Our experimental results indicate that DMQL is a successful extension of Distributed Q-learning, which can be used in small environments even without the usage of similarity. Using similarity, however, grants us the ability to learn in increasingly large and complex environments. Interestingly, various problems exist within the process of developing a suitable manner of incorporating similarity into hashes. We speculate on how these problems can be prevented or circumvented, and our experiments validate our circumvention methods. Lastly, our experiments show that DMQL can successfully be applied to combat relative overgeneralisation in partially observable environments as well. ...
Wisdom of the crowds is the idea that groups of people can collectively make wise decisions. Research suggests that these crowds can even outsmart experts. To gather the wisdom of the crowds, this project utilizes a prediction market. To do so successfully, a prediction market has to overcome serious challenges, such as gathering a large and active user base and deciding on a fair initial market value. The main goal of the project is to create a prediction market that can overcome these challenges and successfully gather the wisdom of the crowds. Research was carried out in the field of prediction markets. This process started with researching the theory behind prediction markets, the wisdom of the crowds. After that, evaluating existing prediction markets and reviewing the literature related to those markets proved useful. Before and during the research phase, clear goals were set for the project, together with a clear set of requirements. These goals can be divided into: leveraging the wisdom of the crowd, solving problems associated with prediction markets, and developing a product that is easily maintainable. The final product reaches the goals of the project and meets the requirements. The prediction market correctly aggregates the estimations of users on the market and provides probabilities on real-world events. These probabilities are contained in the values on the market. The prediction market solves the problems encountered on other prediction markets. The project makes use of gamification, an automated market maker and a reward system to correctly initialise market values. The system was thoroughly tested and developed with maintainability in mind. ...