No-Regret Learning in Network Stochastic Zero-Sum Games
Shijie Huang (TU Delft - Mechanical Engineering)
Jinlong Lei (Tongji University)
Yiguang Hong (Tongji University)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
No-regret learning has been widely used to compute a Nash equilibrium in two-person zero-sum games. However, there is still a lack of regret analysis for network stochastic zero-sum games, where players competing in two subnetworks only have access to some local information, and the cost functions are subject to stochastic uncertainty. Such a game model can be found in network interdiction problems, when a group of inspectors work together to detect a group of evaders. In this paper, the authors propose a distributed stochastic mirror descent (D-SMD) method, and establish the regret bounds O(T) and O(log T) in the expected sense for convex-concave and strongly convex-strongly concave costs, respectively. The proposed bounds match those of the best known first-order online optimization algorithms. The authors then prove the convergence of the time-averaged iterates of D-SMD to the set of Nash equilibria. Finally, the authors show that the actual iterates of D-SMD almost surely converge to the Nash equilibrium in the strictly convex-strictly concave setting.