Performance of Decision Transformer in multi-task offline reinforcement learning
How does the introduction of sub-optimal data affect the performance of the model?
P.Z. Bieszczad (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.T.J. Spaan – Mentor (TU Delft - Sequential Decision Making)
M.R. Weltevrede – Mentor (TU Delft - Sequential Decision Making)
E. Congeduti – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)
Abstract
In the field of Artificial Intelligence (AI), techniques such as Reinforcement Learning (RL) and the Decision Transformer (DT) enable machines to learn from experience and solve problems. The distinction between online and offline learning determines whether the machine learns by interacting with a live environment or only by observing pre-recorded experiences. The distinction between single-task and multi-task settings indicates whether the machine can handle tasks that are similar but not identical. Multi-task offline learning, the focus of this paper, allows machines to address a variety of related tasks based on a pre-recorded set of experiences. This approach is particularly valuable when traditional training methods are costly or impractical. For instance, in robotics, multi-task offline learning enables a robot to use experience gathered on one task, such as picking up objects, to solve a new task such as putting them down. This paper investigates the effectiveness of Decision Transformers in multi-task environments through theoretical discussion and practical experiments. It also addresses the question of how introducing sub-optimal training data affects the performance and generalisation ability of the model.