On Learning for Node Selection in the Branch-and-Bound Algorithm using Reinforcement Learning
J.J. Groenheide (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
D. de Laat – Graduation committee member (TU Delft - Discrete Mathematics and Optimization)
Abstract
The branch-and-bound algorithm is used by solvers to efficiently find the optimal solution of discrete optimisation problems. It does so by sequentially partitioning the search space into subproblems, guided by the solution of the linear relaxation of the problem. This sequential decision-making is performed by the variable selection and node selection heuristics. The sequential nature of these heuristics makes them suitable for trajectory-based learning in the form of imitation learning and reinforcement learning. Learning problem-specific heuristics in this way has become increasingly popular in recent years. Despite their similarities, the two heuristics have very different dynamics during learning, and success has mainly been achieved for variable selection. In this work, we evaluate the node selection problem and formulate a learning-to-select paradigm for both imitation and reinforcement learning. We find that learning to select is generally more difficult, owing to the small margin of possible improvement over the current baselines and the lack of informative features to distinguish nodes during ranking. These challenges are exacerbated by focusing on sibling comparisons, which are generally the most difficult because of the high similarity between the nodes. However, sibling comparisons are also arguably the most important in node selection, given the importance of plunging to reduce context-switching overhead. The results indicate that both approaches fail to learn meaningful decision-making policies based on the limited fixed-size feature representation of the nodes. A code repository for reproducing and extending the experiments is publicly available at https://github.com/jgroenheide/rl2select.
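To make the role of the node selection heuristic concrete, below is a minimal illustrative sketch, not the thesis implementation: a toy 0/1 knapsack solved by branch-and-bound, where the `select` argument decides which open node to expand next. The instance, the fractional-knapsack bound, and both selectors are hypothetical stand-ins for the components the abstract describes; a learned policy would replace `select` with a ranking over node features.

```python
# A minimal, illustrative branch-and-bound for a toy 0/1 knapsack.
# NOT the thesis implementation: the instance, the fractional bound,
# and both `select` strategies below are hypothetical stand-ins.

values = [60, 100, 120]   # items pre-sorted by value density
weights = [10, 20, 30]
capacity = 50

def lp_bound(fixed, level):
    """Upper bound from the linear relaxation: pack the remaining
    items fractionally in order of value density."""
    weight = sum(weights[i] for i in fixed)
    value = sum(values[i] for i in fixed)
    for i in range(level, len(values)):
        if weight + weights[i] <= capacity:
            weight += weights[i]
            value += values[i]
        else:
            value += values[i] * (capacity - weight) / weights[i]
            break
    return value

def branch_and_bound(select):
    """`select` pops the next node from the open list; swapping it
    exchanges depth-first plunging for best-first search."""
    best_value = 0
    open_nodes = [(set(), 0)]  # node = (items fixed to 1, branching level)
    while open_nodes:
        fixed, level = select(open_nodes)
        if lp_bound(fixed, level) <= best_value:
            continue  # prune: the relaxation cannot beat the incumbent
        if level == len(values):
            best_value = max(best_value, sum(values[i] for i in fixed))
            continue
        # branch on variable `level`: include the item (if feasible) or not
        if sum(weights[i] for i in fixed) + weights[level] <= capacity:
            open_nodes.append((fixed | {level}, level + 1))
        open_nodes.append((fixed, level + 1))
    return best_value

# Two hand-written node selection heuristics.
depth_first = lambda nodes: nodes.pop()  # plunging: dive to the newest child
best_first = lambda nodes: nodes.pop(
    max(range(len(nodes)), key=lambda i: lp_bound(*nodes[i])))

print(branch_and_bound(depth_first))  # 220
print(branch_and_bound(best_first))   # 220
```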