On Learning for Node Selection in the Branch-and-Bound Algorithm using Reinforcement Learning
J.J. Groenheide (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
D. de Laat – Graduation committee member (TU Delft - Discrete Mathematics and Optimization)
Abstract
The branch-and-bound algorithm is used by solvers to efficiently find the optimal solution of discrete optimisation problems. It does so by sequentially partitioning the search space into subproblems, guided by the solution of the linear relaxation of the problem. This sequential decision-making is performed by the variable selection and node selection heuristics. The sequential nature of these heuristics makes them suitable for trajectory-based learning in the form of imitation learning and reinforcement learning. Learning problem-specific heuristics in this way has become increasingly popular in recent years. Despite their similarities, the two heuristics have very different dynamics during learning, and success has mainly been achieved for variable selection. In this work, we evaluate the node selection problem and formulate a learning-to-select paradigm for both imitation and reinforcement learning. We find that learning to select is generally more difficult, owing to the small margin of possible improvement over the current baselines and the lack of informative features to distinguish nodes during ranking. These challenges are exacerbated by focusing on sibling comparisons, which are generally the most difficult because of the high similarity between the nodes. However, sibling comparisons are also arguably the most important in node selection, given the importance of plunging to reduce context-switching overhead. The results indicate that both approaches fail to learn meaningful decision-making policies based on the limited fixed-size feature representation of the nodes. A code repository for reproducing and extending the experiments is publicly available at https://github.com/jgroenheide/rl2select.
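To make the role of the node selection heuristic concrete, below is a minimal illustrative sketch, not the thesis implementation: a toy 0/1 knapsack solved by branch-and-bound, where the `select` argument decides which open node to expand next. The instance, the fractional-knapsack bound, and both selectors are hypothetical stand-ins for the components the abstract describes; a learned policy would replace `select` with a ranking over node features.

```python
# A minimal, illustrative branch-and-bound for a toy 0/1 knapsack.
# NOT the thesis implementation: the instance, the fractional bound,
# and both `select` strategies below are hypothetical stand-ins.

values = [60, 100, 120]   # items pre-sorted by value density
weights = [10, 20, 30]
capacity = 50

def lp_bound(fixed, level):
    """Upper bound from the linear relaxation: pack the remaining
    items fractionally in order of value density."""
    weight = sum(weights[i] for i in fixed)
    value = sum(values[i] for i in fixed)
    for i in range(level, len(values)):
        if weight + weights[i] <= capacity:
            weight += weights[i]
            value += values[i]
        else:
            value += values[i] * (capacity - weight) / weights[i]
            break
    return value

def branch_and_bound(select):
    """`select` pops the next node from the open list; swapping it
    exchanges depth-first plunging for best-first search."""
    best_value = 0
    open_nodes = [(set(), 0)]  # node = (items fixed to 1, branching level)
    while open_nodes:
        fixed, level = select(open_nodes)
        if lp_bound(fixed, level) <= best_value:
            continue  # prune: the relaxation cannot beat the incumbent
        if level == len(values):
            best_value = max(best_value, sum(values[i] for i in fixed))
            continue
        # branch on variable `level`: include the item (if feasible) or not
        if sum(weights[i] for i in fixed) + weights[level] <= capacity:
            open_nodes.append((fixed | {level}, level + 1))
        open_nodes.append((fixed, level + 1))
    return best_value

# Two hand-written node selection heuristics.
depth_first = lambda nodes: nodes.pop()  # plunging: dive to the newest child
best_first = lambda nodes: nodes.pop(
    max(range(len(nodes)), key=lambda i: lp_bound(*nodes[i])))

print(branch_and_bound(depth_first))  # 220
print(branch_and_bound(best_first))   # 220
```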