CBMoS

Combinatorial Bandit Learning for Mode Selection and Resource Allocation in D2D Systems

Journal Article (2019)
Author(s)

Andrea Ortiz (Technische Universität Darmstadt)

Arash Asadi (Technische Universität Darmstadt)

Max Engelhardt (Vector Informatik GmbH, Technische Universität Darmstadt)

Anja Klein (Technische Universität Darmstadt)

Matthias Hollick (Technische Universität Darmstadt)

DOI related publication
https://doi.org/10.1109/JSAC.2019.2933764 Final published version
Publication Year
2019
Language
English
Journal title
IEEE Journal on Selected Areas in Communications
Issue number
10
Volume number
37
Article number
8790776
Pages (from-to)
2225-2238

Abstract

The complexity of the mode selection and resource allocation (MSRA) problem has hampered the commercialization of Device-to-Device (D2D) communication in 5G networks. Furthermore, the combinatorial nature of MSRA has forced the majority of existing proposals to focus on constrained scenarios or offline solutions to contain the size of the problem. Given the real-time constraints in actual deployments, a reduction in computational complexity is necessary. Adaptability is another key requirement for mobile networks that are exposed to constant changes such as channel quality fluctuations and mobility. In this article, we propose an online learning technique (i.e., CBMoS) which leverages combinatorial multi-armed bandits (CMAB) to tackle the combinatorial nature of MSRA. Furthermore, our two-stage CMAB design results in a tight model, which eliminates the theoretically feasible but practically invalid options from the solution space. We prototype the first SDR-based D2D testbed to verify the performance of CBMoS under real-world conditions. The simulations confirm that the fast learning speed of CBMoS allows it to outperform the benchmark schemes by up to 132%. In experiments, CBMoS exhibits even higher performance (up to 142%) than in the simulations. This stems from the adaptability and fast learning speed of CBMoS in the presence of high channel dynamics, which cannot be captured by the statistical channel models used in the simulators.