Learning Human Preferences for Physical Human-Robot Cooperation


Abstract

Physical human-robot cooperation (pHRC) has the potential to combine human and robot strengths in a team that can achieve more than a human and a robot working on the task separately. How much of that potential is realized, however, depends on the quality of the cooperation, in which awareness of the partner’s intentions and preferences plays an important role. Preferences tend to be highly personal, and additionally depend on the cooperation partner and the cooperation itself. They can be hard to define in terms a robot would understand, and may change over time. This thesis focuses on learning ‘useful models’ from observed behavior, to let our robot adapt its behavior to better match its human partner’s preferences, and thus improve the cooperation.
The aim is to capture personalized approximate models of human preferences (how a person likes to do something) from very few interactive observations, providing only small amounts of imprecise data, such that the robot can use the model to improve each user’s comfort. First, we learn a model to predict and optimize human ergonomics in a pHRC task, such that our robot can propose a plan, for both the human and itself, to solve the task in a way that is more ergonomic for its human partner. However, people do not necessarily prefer to act ergonomically, nor do we want to impose on them what a robot thinks is best. Therefore, we next apply inverse reinforcement learning (IRL) to capture less restrictive preference models: 1) path and velocity preferences for motion planning, and 2) on a higher level of abstraction, which (grasp or motion) action to initiate for proactive physical support. To learn the correct action in cooperation, we developed the disagreement-aware variable impedance (DAVI) controller, which smoothly transitions between providing active guidance and allowing the human to demonstrate alternative behavior.
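
The abstract does not spell out the DAVI control law, but the idea lends itself to a short sketch. The Python snippet below is a minimal one-dimensional illustration, not the thesis’ actual formulation: the guidance stiffness of a standard impedance law is blended down whenever the measured human force opposes the robot’s pull toward its goal, so the robot yields and the human can demonstrate alternative behavior. The exponential blending, the disagreement measure, and all names and parameter values are assumptions made for illustration.

    import numpy as np

    def davi_stiffness(k_max, k_min, disagreement, beta=0.5):
        """Blend between stiff guidance (k_max) and compliant following
        (k_min) as a scalar disagreement measure grows (assumed
        exponential blend; the thesis may use a different law)."""
        alpha = np.exp(-beta * max(disagreement, 0.0))
        return k_min + alpha * (k_max - k_min)

    def disagreement(f_human, x_err):
        """Hypothetical disagreement measure: the magnitude of the human
        force component opposing the robot's pull toward its goal."""
        direction = np.sign(x_err)
        return max(-f_human * direction, 0.0)

    # 1-D simulation: the robot guides toward x_des; the human briefly resists.
    m, d = 1.0, 8.0              # virtual mass and damping
    k_max, k_min = 200.0, 5.0    # stiffness bounds: active guidance vs. yielding
    x, x_dot, x_des = 0.0, 0.0, 1.0
    dt = 0.01
    for t in np.arange(0.0, 3.0, dt):
        f_human = -15.0 if 1.0 < t < 2.0 else 0.0  # human pushes back mid-task
        x_err = x_des - x
        k = davi_stiffness(k_max, k_min, disagreement(f_human, x_err))
        f_robot = k * x_err - d * x_dot            # variable impedance law
        x_dot += (f_robot + f_human) / m * dt
        x += x_dot * dt

Under high disagreement the stiffness collapses toward k_min, so the human’s force dominates the motion, and the resulting trajectory could be recorded as a demonstration for the preference-learning step.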