An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences

Journal Article (2023)
Author(s)

S. Avaei (Student TU Delft)

L.F. van der Spaa (Honda Research Institute Europe, TU Delft - Biomechatronics & Human-Machine Control)

L. Peternel (TU Delft - Human-Robot Interaction)

J. Kober (TU Delft - Learning & Autonomous Control)

Research Group
Biomechatronics & Human-Machine Control
Copyright
© 2023 S. Avaei, L.F. van der Spaa, L. Peternel, J. Kober
DOI
https://doi.org/10.3390/robotics12020061
Publication Year
2023
Language
English
Issue number
2
Volume number
12
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Humans often demonstrate diverse behaviors due to their personal preferences, for instance, related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user's path and velocity preferences from kinesthetic demonstrations. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object transportation task, in both the shape and the timing of the motion. We demonstrate that our method generalizes such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm and validate the functionality and flexibility of our approach in a user study. The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.
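
The two-step structure described in the abstract (optimize the path shape first, then the velocity profile along it) can be sketched in a few lines. The snippet below is a minimal illustration only, assuming rewards that are linear in hand-designed features and a simple random-search optimizer; the specific features, weights, and optimizer are illustrative placeholders, not the formulation used in the paper, where the weights would be learned by inverse reinforcement learning from kinesthetic demonstrations.

# Minimal sketch of separated path/velocity optimization under
# feature-based rewards. All features, weights, and the random-search
# optimizer are illustrative assumptions, not the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)

def path_features(path):
    # path: (N, 3) array of waypoints. Example features: total path length
    # and mean height above the table (a stand-in for a safety margin).
    length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
    mean_height = np.mean(path[:, 2])
    return np.array([length, mean_height])

def velocity_features(timings):
    # timings: (N-1,) segment durations. Example features: total duration
    # and timing smoothness (variation between consecutive durations).
    total = np.sum(timings)
    smoothness = np.sum(np.diff(timings) ** 2)
    return np.array([total, smoothness])

def optimize(score, sample, n_iters=2000):
    # Generic random-search maximizer: keep the highest-scoring sample.
    best, best_val = None, -np.inf
    for _ in range(n_iters):
        cand = sample()
        val = score(cand)
        if val > best_val:
            best, best_val = cand, val
    return best

# Hypothetical reward weights; in the paper these would be learned from
# the user's demonstrations rather than fixed by hand.
w_path = np.array([-1.0, 0.5])   # prefer short paths, with some clearance
w_vel = np.array([-0.2, -5.0])   # prefer fast, smoothly timed motion

start, goal = np.array([0.4, -0.3, 0.1]), np.array([0.4, 0.3, 0.1])

def sample_path(n_waypoints=8):
    # Straight line from start to goal with random vertical perturbations.
    alphas = np.linspace(0, 1, n_waypoints)[:, None]
    path = (1 - alphas) * start + alphas * goal
    path[1:-1, 2] += rng.uniform(0.0, 0.3, n_waypoints - 2)
    return path

# Step 1: optimize the path shape under the path-preference reward.
path = optimize(lambda p: w_path @ path_features(p), sample_path)

# Step 2: with the path fixed, optimize segment timings under the
# velocity-preference reward.
n_segments = len(path) - 1
timings = optimize(lambda t: w_vel @ velocity_features(t),
                   lambda: rng.uniform(0.1, 1.0, n_segments))

print("path height profile:", np.round(path[:, 2], 3))
print("segment durations:", np.round(timings, 3))

Separating the two steps means the velocity preference can be retaught or adjusted without re-optimizing the path, which is one practical reason for decoupling the shape and timing of the motion.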