A timely match for ride-hailing and ride-pooling services using a deep reinforcement learning approach

Journal Article (2026)
Author(s)

Yiman Bao (Student TU Delft)

Jie Gao (TU Delft - Transport, Mobility and Logistics)

Jinke He (TU Delft - Sequential Decision Making)

Frans A. Oliehoek (TU Delft - Sequential Decision Making)

Oded Cats (TU Delft - Transport and Planning)

DOI
https://doi.org/10.1016/j.trc.2026.105644 (final published version)
Publication Year
2026
Language
English
Journal title
Transportation Research Part C: Emerging Technologies
Volume number
187
Article number
105644
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Efficient matching in ride-hailing and ride-pooling services depends not only on how matches are constructed, but also on when the platform triggers a matching operation. Many systems use batched matching with a fixed time interval to accumulate requests before matching, which enlarges the candidate set but cannot adapt to real-time supply-demand fluctuations and may induce unnecessary waiting. This paper proposes a reinforcement learning approach that learns when to trigger matching based on current system conditions. We formulate the timing problem as a finite-horizon Markov decision process and train the policy with the Proximal Policy Optimization algorithm. To address sparse and delayed feedback, we introduce a finite-horizon, potential-based reward shaping scheme that preserves the optimal policy while densifying the learning signal; the same framework applies to both ride-hailing and ride-pooling, with detour delay incorporated into the reward for pooling. In a data-driven simulator calibrated on NYC trip records, the learned policy adapts its matching timing decisions to the current state of waiting requests and available drivers and outperforms fixed-interval, rule-based dynamic, and first-dispatch baselines. It reduces total waiting time by 3.1% in ride-hailing and 20.1% in ride-pooling, and detour delay by 36.1% in pooling, while maintaining short matching times.
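
The potential-based reward shaping mentioned in the abstract follows the standard policy-invariant construction of Ng, Harada and Russell (1999), with time-indexed potentials so that the invariance argument carries over to a finite horizon. The sketch below is a minimal illustration, not the paper's implementation: the choice of potential (the negative accumulated waiting time of open requests), the MatchingState fields, and the numerical values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchingState:
    """Hypothetical snapshot of the platform at a decision epoch (illustrative only)."""
    request_wait_times: List[float] = field(default_factory=list)  # seconds each open request has waited
    idle_drivers: int = 0


def phi(state: MatchingState, t: int, horizon: int) -> float:
    """Time-indexed potential: negative accumulated waiting of open requests.

    The potential is forced to zero at the end of the horizon so that the
    shaping terms telescope and cancel over any complete episode.
    """
    if t >= horizon:
        return 0.0
    return -sum(state.request_wait_times)


def shaped_reward(r: float, s: MatchingState, s_next: MatchingState,
                  t: int, horizon: int, gamma: float = 0.99) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s', t+1) - phi(s, t).

    This densifies a sparse matching reward without changing which
    trigger-timing policy is optimal (Ng, Harada & Russell, 1999).
    """
    return r + gamma * phi(s_next, t + 1, horizon) - phi(s, t, horizon)


# Example: when the agent chooses not to trigger a match, queued requests keep
# accumulating delay, so the shaping term penalizes idle waiting at every step.
s = MatchingState(request_wait_times=[30.0, 75.0], idle_drivers=4)
s_next = MatchingState(request_wait_times=[40.0, 85.0, 5.0], idle_drivers=4)
print(shaped_reward(r=0.0, s=s, s_next=s_next, t=12, horizon=720))
```

Because the time-indexed potential vanishes at the terminal step, the shaping contributions telescope over any episode, so the shaped and original returns differ only by a term that does not depend on the actions taken; the optimal trigger-timing policy is therefore unchanged while the per-step learning signal becomes denser.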