Actor-critic reinforcement learning for bidding in bilateral negotiation

Journal Article (2022)

Author(s)

Furkan Arslan (Özyeğin University)

Reyhan Aydoğan (TU Delft - Interactive Intelligence, Özyeğin University)

Research Group

Interactive Intelligence

DOI related publication

https://doi.org/10.55730/1300-0632.3899

Keywords: Multi-agent systems, Imitation learning, Deep reinforcement learning, Automated bilateral negotiation, Bidding strategy, Entropy reinforcement learning

To reference this document use:

https://resolver.tudelft.nl/uuid:bee7c91f-916a-4756-b962-1fd995709341

Publication Year

2022

Language

English

Issue number

5

Volume number

30

Pages (from-to)

1695-1714

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Designing an effective and intelligent bidding strategy is one of the most compelling research challenges in automated negotiation, where software agents negotiate with each other to reach a mutual agreement despite conflicting interests. Instead of designing a hand-crafted decision-making module, this work proposes a novel bidding strategy based on actor-critic reinforcement learning, which learns what to offer in a bilateral negotiation. A maximum entropy reinforcement learning framework called Soft Actor-Critic (SAC) is applied to the bidding problem, and the model is trained through self-play. Our model learns to produce the target utility of the next offer from the previous offer exchanges and the remaining time. Furthermore, an imitation learning approach called behavior cloning is adopted to speed up the learning process. We also introduce a novel reward function that takes into account not only the agent's own utility but also the opponent's utility at the end of the negotiation. To evaluate the developed agent empirically, a large number of negotiation sessions are run against a variety of opponents in domains varying in size and opposition. The agent's performance is compared with that of its opponents and with that of baseline agents negotiating against the same opponents. The empirical results show that our agent negotiates successfully against challenging opponents in different negotiation scenarios without requiring any prior information about the opponent or the domain. Moreover, it achieves higher utility than the baseline agents at the end of successful negotiations.
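The abstract does not specify the exact state encoding, action scaling, bid selection, or reward formula, so the following is only a minimal sketch of how such a target-utility bidding loop could be wired up, under stated assumptions. Everything here is hypothetical and not the authors' method: the names `Bid`, `make_state`, `action_to_target`, `closest_bid` and `terminal_reward`, the history window of 3 offers with zero-padding, mapping a tanh-squashed SAC action in [-1, 1] onto [reservation, 1], picking the bid whose own utility is nearest the target, and a terminal reward of own utility plus 0.25 times the opponent's utility with a -1 penalty for no agreement.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Bid:
    # Hypothetical bid representation: a label and its utility for our agent.
    label: str
    own_utility: float


def make_state(own_utils: Sequence[float], opp_utils: Sequence[float],
               time_left: float, history: int = 3) -> List[float]:
    """Hypothetical observation: the last `history` utilities of our own and
    the opponent's offers (zero-padded at the front) plus the remaining time."""
    def last_n(xs: Sequence[float]) -> List[float]:
        tail = list(xs)[-history:] if history > 0 else []
        return [0.0] * (history - len(tail)) + tail

    return last_n(own_utils) + last_n(opp_utils) + [time_left]


def action_to_target(action: float, reservation: float = 0.0) -> float:
    """Assumed scaling: clip the actor's output to [-1, 1] (as a tanh-squashed
    SAC policy would produce) and map it linearly onto [reservation, 1]."""
    a = max(-1.0, min(1.0, action))
    return reservation + (a + 1.0) / 2.0 * (1.0 - reservation)


def closest_bid(bids: Sequence[Bid], target: float) -> Bid:
    """Assumed bid selection: offer the bid whose own utility is nearest the target."""
    return min(bids, key=lambda b: abs(b.own_utility - target))


def terminal_reward(agreed: bool, own_u: float, opp_u: float,
                    weight: float = 0.25, no_deal: float = -1.0) -> float:
    """Illustrative end-of-session reward that also credits the opponent's
    utility; the weight and the no-deal penalty are arbitrary placeholders."""
    if not agreed:
        return no_deal
    return own_u + weight * opp_u


if __name__ == "__main__":
    bids = [Bid("A", 0.95), Bid("B", 0.7), Bid("C", 0.4)]
    state = make_state([0.95, 0.9], [0.2], time_left=0.6)
    target = action_to_target(0.5, reservation=0.2)  # roughly 0.8
    print(state, round(target, 3), closest_bid(bids, target).label)
```

In a full agent, `state` would be fed to the SAC actor and the resulting action converted into the next offer; the target-utility design keeps the action space one-dimensional regardless of domain size, which is one plausible reason for predicting a utility rather than a bid directly.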

Files

Actor_critic_reinforcement_lea... (pdf)

(pdf | 0.947 Mb)