On Game-Theoretic Planning with Unknown Opponents' Objectives

Abstract

Many autonomous navigation tasks require mobile robots to operate in dynamic environments involving interactions between agents. Developing interaction-aware motion planning algorithms that enable safe and intelligent interactions remains challenging. Dynamic game theory provides a powerful mathematical framework for modeling these interactions rigorously as coupled optimization problems. By solving these coupled optimization problems for equilibrium solutions, game-theoretic models explicitly account for the interdependence of agents' decisions and achieve simultaneous prediction and planning. Coupled constraints between players, such as collision avoidance, can also be handled explicitly. However, most existing game-theoretic motion planning approaches rely on known objective models of all agents. This assumption is a key obstacle to real-world ego-centric planning applications of these methods, where only local information is available. This thesis investigates solution approaches that relax this assumption and explicitly account for the ego agent's uncertainty about other agents' objectives while adaptively conducting game-theoretic motion planning.

The main contribution of this work is an online adaptive model-predictive game-play (MPGP) framework that jointly infers other players' objectives and computes the corresponding generalized Nash equilibrium (GNE) strategies. These strategies serve both as predictions of the other players' behavior and as the control strategy for the ego agent. The adaptivity of the proposed approach is enabled by differentiating through a trajectory game solver, whose gradient signal is used for maximum likelihood estimation (MLE) of the opponents' objectives. Compared with existing objective inference solutions for dynamic games, the proposed approach handles general inequality constraints in games and further supports direct integration with other differentiable modules, such as neural networks (NNs). Two simulation experiments indicate that the proposed approach performs comparably to solving games with known objectives and outperforms game-theoretic and model-predictive control (MPC) baselines. Two hardware experiments further demonstrate the planner's real-time planning capability and real-world applicability.
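To make the inference mechanism concrete, the following is a minimal, hypothetical sketch of the idea of gradient-based objective estimation through a differentiable game solver. It is not the thesis's solver: it uses a toy two-player quadratic game whose Nash equilibrium reduces to a linear system, differentiates the equilibrium with respect to the opponent's unknown goal parameter via the implicit function theorem, and runs gradient descent on the prediction error (a simple stand-in for MLE). All names and values are illustrative.

```python
import numpy as np

G1 = 0.0  # ego agent's (known) goal position
C = 0.5   # coupling weight between the two players' decisions

def solve_game(theta):
    """Nash equilibrium of a toy quadratic game:
       J1(x1) = (x1 - G1)^2 + C*(x1 - x2)^2
       J2(x2) = (x2 - theta)^2 + C*(x2 - x1)^2
    Setting both first-order conditions to zero yields a linear system A x = b."""
    A = np.array([[1 + C, -C], [-C, 1 + C]])
    b = np.array([G1, theta])
    x = np.linalg.solve(A, b)
    # Equilibrium sensitivity dx/dtheta via the implicit function theorem:
    # A * (dx/dtheta) = db/dtheta, and here db/dtheta = [0, 1].
    dx_dtheta = np.linalg.solve(A, np.array([0.0, 1.0]))
    return x, dx_dtheta

def infer_theta(x2_observed, theta0=0.0, lr=0.5, steps=100):
    """Gradient descent on the squared error between the predicted
    equilibrium action of the opponent and its observed action."""
    theta = theta0
    for _ in range(steps):
        x, dx = solve_game(theta)
        grad = 2.0 * (x[1] - x2_observed) * dx[1]
        theta -= lr * grad
    return theta

# Simulate an opponent with hidden goal theta* = 2.0, then recover it
# from its observed equilibrium action alone.
true_theta = 2.0
x_true, _ = solve_game(true_theta)
theta_hat = infer_theta(x_true[1])
print(abs(theta_hat - true_theta) < 1e-6)  # → True
```

In the thesis's setting the "solver" returns equilibrium trajectories of a constrained dynamic game rather than a scalar, but the pattern is the same: the solver is treated as a differentiable map from objective parameters to equilibrium behavior, so its gradient can drive the estimator or be composed with other differentiable modules such as NNs.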

In addition to this main contribution, the second contribution of this work is a variational autoencoder (VAE) pipeline built upon the proposed differentiable game solver. This contribution aims to go beyond the point estimation of the first contribution and infer potentially multi-modal beliefs about players' objectives from observations. The main idea is to employ variational inference (VI) to approximate Bayesian inference of players' objectives; the VAE framework is used for amortization to avoid per-sample optimization. Initial results on a single-player example show that, after training, the proposed pipeline can: (i) generate a game objective distribution that resembles the underlying training data distribution; (ii) accurately predict a narrow, uni-modal posterior objective distribution when the observation is unambiguous given previously seen data; and (iii) generate a multi-modal belief distribution over the player's objective that captures the most likely modes in cases of high uncertainty.
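The variational-inference step described above can be summarized by the standard evidence lower bound (ELBO) that VAE training maximizes; the notation here is generic rather than the thesis's own:

```latex
\mathcal{L}(\phi, \psi)
  = \mathbb{E}_{q_\phi(\theta \mid o)}\!\left[\log p_\psi(o \mid \theta)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(\theta \mid o) \,\|\, p(\theta)\right)
  \;\le\; \log p(o)
```

Here $o$ denotes the observed trajectory, $\theta$ the unknown objective parameters, $p(\theta)$ a prior over objectives, $q_\phi(\theta \mid o)$ the amortized encoder approximating the posterior, and $p_\psi(o \mid \theta)$ the decoder, which in this pipeline routes sampled objectives through the differentiable game solver to reconstruct the observation. Amortization means the encoder is trained once over the dataset, so inference at deployment is a single forward pass rather than a per-sample optimization.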