Improving a Reinforcement Learning Negotiating Agent’s Performance by Extracting Information from the Opponent’s Sequence of Offers


Abstract

As decentralized multi-agent systems become more prevalent in daily life, automated negotiation agents have found their place in these collaborative settings. They promote communication between agents so that the parties involved can reach solutions that are better for everyone.

Recent literature has shown great potential in using machine learning, particularly model-free deep reinforcement learning such as Proximal Policy Optimization (PPO), to develop more performant automated negotiation strategies. This work focuses on using information from the opponent's sequence of offers in a bilateral negotiation to further improve a baseline PPO agent. The approach extracts information from the opponent's sequence of offers, represents it as a state vector of fixed dimension, and uses it to modify the input to the agent's policy; the utilities achieved by this modified agent are then compared to those of the baseline PPO agent. Since there are many numerical measures that can summarize a sequence of offers, an ablation study is conducted to investigate the effectiveness of each.
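To make the state augmentation concrete, the sketch below shows one way a variable-length sequence of opponent offers could be summarized into a fixed-dimension vector and concatenated onto the policy input. The specific features (last offer utility, window mean, standard deviation, concession trend) and function names are illustrative assumptions, not the exact measures evaluated in the ablation study.

```python
import numpy as np

def opponent_offer_features(opponent_utils: list[float],
                            window: int = 10) -> np.ndarray:
    """Summarize the opponent's offer sequence as a fixed-length vector.

    `opponent_utils` holds the agent's own utility for each offer the
    opponent has made so far. The four features below are hypothetical
    examples of the kind of numerical measures the thesis ablates over.
    """
    if not opponent_utils:
        return np.zeros(4, dtype=np.float32)
    recent = np.asarray(opponent_utils[-window:], dtype=np.float32)
    # Slope of a least-squares fit over the recent window indicates
    # whether the opponent is conceding (positive) or hardening (negative).
    if len(recent) > 1:
        trend = np.polyfit(np.arange(len(recent)), recent, deg=1)[0]
    else:
        trend = 0.0
    return np.array([recent[-1], recent.mean(), recent.std(), trend],
                    dtype=np.float32)

def augmented_observation(base_obs: np.ndarray,
                          opponent_utils: list[float]) -> np.ndarray:
    # Concatenating a fixed-size summary keeps the policy network's
    # input shape constant regardless of how many offers have been made.
    return np.concatenate([base_obs, opponent_offer_features(opponent_utils)])
```

Because the summary vector has a constant dimension, the same PPO architecture can be reused unchanged; only the observation space grows by the number of extracted features.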

The modified agents consistently reached solutions with higher social welfare, although the agent's own utility neither improved nor diminished significantly compared to the baseline PPO agent.