Pacing regulation for runners

Abstract

Increasing a runner's step frequency can reduce the risk of overload injuries. Techniques such as auditory pacing help athletes gain better control over their step frequency. However, synchronizing to a continuous external rhythm costs energy, so intermittent pacing may be more energy-efficient and more user-friendly for the athlete. We propose using experimental data from previous studies, which analyzed the response of runners to intermittent pacing, to find the most efficient strategy for providing the pacing. For this purpose we use reinforcement learning to train the target behavior. This behavior is represented as the target policy, and the experimental data is assumed to have been sampled under a stochastic sampling policy. Learning from only a single batch of initial training data is problematic, however, because the mismatch between the initial sampling policy and the target policy grows continuously during learning. We therefore propose using the batch off-policy algorithm with state distribution correction (OPPOSD) presented by Liu et al. (2019). This algorithm retains the sample efficiency characteristic of off-policy approaches while introducing a correction term that tackles the mismatch between the two policies. To train and evaluate the learned policies, a pacing-behavior simulator was developed from the experimental data. On top of the simulator, a Markov Decision Process (MDP) was defined that specifies the rules of the pacing environment the algorithm is set to learn. After translating the experimental data into MDP-style transitions, the OPPOSD algorithm learns a reasonably good target policy for the pacing problem. In a future application, the resulting trained model could be deployed with real runners while the policy continues to improve in an on-policy or off-policy fashion.
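As an illustrative sketch of the batch off-policy idea described above, the snippet below trains a tabular softmax policy from a fixed batch of MDP-like transitions using per-action importance weighting. All concrete choices here (the discretized cadence-deviation states, the cue/no-cue actions, the reward, and the uniform sampling policy) are assumptions for illustration, not taken from the experiments, and the update omits OPPOSD's additional state-distribution correction.

```python
import numpy as np

# Hypothetical discretization: states 0..4 encode a runner's deviation from the
# target cadence; actions are {0: no cue, 1: play pacing cue}. These names and
# the reward are illustrative assumptions, not the study's actual setup.
N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(0)

theta = np.zeros((N_STATES, N_ACTIONS))  # tabular softmax policy parameters

def pi(theta, s):
    """Softmax action probabilities for state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

# Stochastic sampling (behavior) policy: uniform random actions, mimicking a
# batch of experimental data collected before training started.
behavior_prob = 1.0 / N_ACTIONS

# Fake batch of (state, action, reward) transitions: the cue is rewarded only
# when the cadence deviation is large (states 3 and 4) -- an invented rule.
batch = [(s, a, 1.0 if (a == 1) == (s >= 3) else 0.0)
         for s in rng.integers(0, N_STATES, 500)
         for a in [int(rng.integers(0, N_ACTIONS))]]

# Importance-weighted batch policy-gradient updates. OPPOSD would additionally
# reweight each transition by a learned state-density ratio to correct for the
# mismatch between the sampling and target state distributions.
alpha = 0.5
for _ in range(200):
    grad = np.zeros_like(theta)
    for s, a, r in batch:
        p = pi(theta, s)
        w = p[a] / behavior_prob      # per-action importance ratio
        g = -p                        # grad of log pi(a|s) w.r.t. theta[s] ...
        g[a] += 1.0                   # ... is (one-hot(a) - p)
        grad[s] += w * r * g
    theta += alpha * grad / len(batch)
```

After training, the learned policy prefers cueing in the high-deviation states and staying silent otherwise, even though it never acted in the environment itself, which is the sample-efficiency benefit of the batch off-policy setting.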