A. Menor de Oñate
2 records found
Exploring planetary bodies with robot swarms could increase the value of exploration missions, enabling novel measurements and explorations previously deemed impractical or unattainable. Despite this potential, the technology readiness level of planetary swarms remains low. This work uses multi-agent reinforcement learning to find control policies that allow swarms to explore unknown areas autonomously and in a decentralized manner, contributing to the technology readiness of the field. To this end, a multi-agent proximal policy optimization (MAPPO) algorithm is proposed, in which the policy uses LIDAR perception information and the input of the value function contains both local and global environment information. The algorithm finds control policies that exhibit cooperative behaviors and generalize to unseen swarm sizes and environments while learning from simple, sparse reward functions. Different types of reward functions, value inputs, and environment configurations are also investigated. Compared with the state of the art in the field, MAPPO can learn with a larger number of agents, in more complex environments, and with sparse rather than dense rewards.
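The split described above (a policy that sees only local LIDAR readings, and a value function that also sees global information) is the usual "centralized critic, decentralized actors" pattern. The sketch below illustrates it under stated assumptions; it is not the author's implementation. The helpers `value_input`, `clipped_surrogate`, and `sparse_coverage_reward`, the flattened global map, the clip value `eps=0.2`, and the coverage `threshold` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def value_input(lidar: Sequence[float], agent_state: Sequence[float],
                global_map: Sequence[float]) -> List[float]:
    """Hypothetical centralized-critic input: the agent's local features
    concatenated with a flattened global map. Only the critic receives the
    global part; the policy (actor) would use the LIDAR readings alone."""
    return list(lidar) + list(agent_state) + list(global_map)


def clipped_surrogate(logp_new: Sequence[float], logp_old: Sequence[float],
                      advantages: Sequence[float], eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective (to be maximized), averaged
    over pooled (agent, timestep) samples, as is common when all agents
    share one set of policy parameters."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)       # pessimistic bound
    return total / len(advantages)


def sparse_coverage_reward(explored_cells: int, total_cells: int,
                           threshold: float = 0.9) -> float:
    """Hypothetical sparse reward: 1.0 only once the explored fraction of the
    map reaches the threshold, 0.0 otherwise (a dense reward would instead
    pay out for every newly explored cell)."""
    return 1.0 if explored_cells / total_cells >= threshold else 0.0


if __name__ == "__main__":
    print(value_input([0.5, 1.2], [3.0], [0, 1, 1, 0]))
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))  # clipped to 1 + eps
    print(sparse_coverage_reward(5, 10), sparse_coverage_reward(10, 10))
```

Pooling all agents' samples into one objective is what lets a single shared policy be reused for a different number of agents at test time, which is one plausible reason such policies can generalize to unseen swarm sizes; the sketch does not show how the work itself handles this.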
Project Healios
Unmanned Vertical Lift for Medical Equipment Distribution
Bachelor thesis
(2021)
Y.S. Chung, M.V.M. Firlefyn, P. Gonzalez Martinez, Y.M. Hinssen, D.S. Lukens Ruiz, A. Menor de Oñate, J.T.E. Rademaker, A. Simonelli, B. Szekeres, D.A. van Wagensveld, M.D. Pavel, R.N.H.W. van Gent, N.C. Gomes de Paula