Deep reinforcement learning approach to solving clustered vehicle routing problems

Journal Article (2026)
Author(s)

Yaoxin Wu (Eindhoven University of Technology)

Yue Yu (Chengdu University of Information Technology)

Lingxiao Wu (The Hong Kong Polytechnic University)

Tao Feng (Southwest Jiaotong University)

Lu Zhang (Chengdu University of Information Technology)

Zhenkun Wang (Southern University of Science and Technology)

Jie Gao (TU Delft - Civil Engineering & Geosciences)

Research Group
Transport, Mobility and Logistics
DOI related publication
https://doi.org/10.1016/j.tre.2026.104742 (final published version)
Publication Year
2026
Language
English
Journal title
Transportation Research Part E: Logistics and Transportation Review
Volume number
209
Article number
104742
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clustered vehicle routing problems (CluVRPs) are a complex class of combinatorial optimization problems with significant real-world relevance. They extend classic VRPs by introducing pre-specified customer clusters and requiring effective routing both between clusters and within each cluster. While numerous deep learning approaches have been developed for the standard VRP, research on CluVRPs remains relatively limited, which presents both opportunities and challenges for advancing solutions to more practical VRPs with cluster-related constraints. This paper presents a deep reinforcement learning (DRL) approach to solving CluVRPs. We propose a cluster-aware attention module in the encoder, along with inter-cluster and intra-cluster decoders that specialize the constructive policies between and within clusters, respectively. Symmetrical data augmentation is adopted during training to improve performance. Empirical results on different CluVRP variants show that the DRL method outperforms existing approaches and offers consistent advantages across various instances.
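The abstract does not specify how the symmetrical data augmentation is implemented. As a minimal sketch of one common form of this idea in neural routing solvers, the example below assumes that node coordinates lie in the unit square and generates the eight reflections and axis swaps of an instance. The function names `symmetric_augment` and `tour_length` are hypothetical and are not taken from the paper. Because each transform is a distance-preserving symmetry of the square, every augmented copy has the same optimal routes as the original, so a policy can be trained on all of them.

```python
import math


def symmetric_augment(coords):
    """Return 8 symmetric copies of an instance (assumption: coords in [0, 1]^2).

    Each copy applies one symmetry of the unit square (identity, axis swap,
    reflections, and their combinations), so pairwise distances are unchanged.
    """
    transforms = [
        lambda x, y: (x, y),
        lambda x, y: (y, x),
        lambda x, y: (x, 1 - y),
        lambda x, y: (y, 1 - x),
        lambda x, y: (1 - x, y),
        lambda x, y: (1 - y, x),
        lambda x, y: (1 - x, 1 - y),
        lambda x, y: (1 - y, 1 - x),
    ]
    return [[t(x, y) for x, y in coords] for t in transforms]


def tour_length(coords, order):
    """Length of the closed tour that visits coords in the given order."""
    n = len(order)
    return sum(
        math.dist(coords[order[i]], coords[order[(i + 1) % n]])
        for i in range(n)
    )


if __name__ == "__main__":
    # Illustrative instance: four nodes, visited in index order.
    nodes = [(0.1, 0.2), (0.8, 0.3), (0.5, 0.9), (0.25, 0.6)]
    for variant in symmetric_augment(nodes):
        # The tour length is the same for every augmented copy.
        print(round(tour_length(variant, [0, 1, 2, 3]), 6))
```

Cluster assignments are unaffected by these transforms, since they change only coordinates and not which customer belongs to which cluster.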