M. Suau
Please Note
11 records found
1
Factored representations involve variables describing various features of the environment. These variables, along with their possible values, define the agent’s states. Unlike standard representations, factored representations provide a unique perspective that enables us to gain deeper insights into the underlying structure of the environment and refine our understanding of the problem at hand.
By analyzing variable dependencies, we can abstract simplified representations of the environment states and construct computationally lightweight models. To do so, we will explore potential factorizations of key functions governing the reinforcement learning problem, such as transitions, rewards, policies, or value functions. These factorizations can be achieved by exploiting variable redundancies and leveraging relations of conditional independence.
This thesis proposes a set of methods that are shown to improve the efficiency and scalability of reinforcement learning in complex scenarios. We hope that the findings of this research contribute to showcasing the potential of factored representations and serve as inspiration for future research in this direction.
...
Factored representations involve variables describing various features of the environment. These variables, along with their possible values, define the agent’s states. Unlike standard representations, factored representations provide a unique perspective that enables us to gain deeper insights into the underlying structure of the environment and refine our understanding of the problem at hand.
By analyzing variable dependencies, we can abstract simplified representations of the environment states and construct computationally lightweight models. To do so, we will explore potential factorizations of key functions governing the reinforcement learning problem, such as transitions, rewards, policies, or value functions. These factorizations can be achieved by exploiting variable redundancies and leveraging relations of conditional independence.
This thesis proposes a set of methods that are shown to improve the efficiency and scalability of reinforcement learning in complex scenarios. We hope that the findings of this research contribute to showcasing the potential of factored representations and serve as inspiration for future research in this direction.
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation being the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agents interact with a reduced portion of a larger environment while still being affected by the global dynamics. Our method combines the use of local simulators with learned models that mimic the influence of the global system. The experiments reveal that incorporating this idea into the deep RL workflow can considerably accelerate the training process and presents several opportunities for the future.
Influence-Augmented Local Simulators
A Scalable Solution for Fast Deep RL in Large Networked Systems
Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.