A timely match for ride-hailing and ride-pooling services using a deep reinforcement learning approach

Journal Article (2026)
Author(s)

Yiman Bao (Student TU Delft)

Jie Gao (TU Delft - Transport, Mobility and Logistics)

Jinke He (TU Delft - Sequential Decision Making)

Frans A. Oliehoek (TU Delft - Sequential Decision Making)

Oded Cats (TU Delft - Transport and Planning)

DOI
https://doi.org/10.1016/j.trc.2026.105644 (final published version)
Publication Year
2026
Language
English
Journal title
Transportation Research Part C: Emerging Technologies
Volume number
187
Article number
105644
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Efficient matching in ride-hailing and ride-pooling services depends not only on how matches are constructed, but also on when the platform triggers a matching operation. Many systems use batched matching with a fixed time interval to accumulate requests before matching, which enlarges the candidate set but cannot adapt to real-time supply-demand fluctuations and may induce unnecessary waiting. This paper proposes a reinforcement learning approach that learns when to trigger matching based on current system conditions. We formulate the timing problem as a finite-horizon Markov decision process and train the policy with the Proximal Policy Optimization algorithm. To address sparse and delayed feedback, we introduce a finite-horizon, potential-based reward shaping scheme that preserves the optimal policy while densifying the learning signal; the same framework applies to both ride-hailing and ride-pooling, with detour delay incorporated into the reward for pooling. In a data-driven simulator calibrated on NYC trip records, the learned policy adapts its matching timing decisions to the current state of waiting requests and available drivers and outperforms fixed-interval, rule-based dynamic, and first-dispatch baselines. It reduces total waiting time by 3.1% in ride-hailing and 20.1% in ride-pooling, and detour delay by 36.1% in pooling, while maintaining short matching times.
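
The potential-based reward shaping mentioned in the abstract follows the standard policy-invariant construction of Ng, Harada and Russell (1999), with time-indexed potentials so that the invariance argument carries over to a finite horizon. The sketch below is a minimal illustration, not the paper's implementation: the choice of potential (the negative accumulated waiting time of open requests), the MatchingState fields, and the numerical values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchingState:
    """Hypothetical snapshot of the platform at a decision epoch (illustrative only)."""
    request_wait_times: List[float] = field(default_factory=list)  # seconds each open request has waited
    idle_drivers: int = 0


def phi(state: MatchingState, t: int, horizon: int) -> float:
    """Time-indexed potential: negative accumulated waiting of open requests.

    The potential is forced to zero at the end of the horizon so that the
    shaping terms telescope and cancel over any complete episode.
    """
    if t >= horizon:
        return 0.0
    return -sum(state.request_wait_times)


def shaped_reward(r: float, s: MatchingState, s_next: MatchingState,
                  t: int, horizon: int, gamma: float = 0.99) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s', t+1) - phi(s, t).

    This densifies a sparse matching reward without changing which
    trigger-timing policy is optimal (Ng, Harada & Russell, 1999).
    """
    return r + gamma * phi(s_next, t + 1, horizon) - phi(s, t, horizon)


# Example: when the agent chooses not to trigger a match, queued requests keep
# accumulating delay, so the shaping term penalizes idle waiting at every step.
s = MatchingState(request_wait_times=[30.0, 75.0], idle_drivers=4)
s_next = MatchingState(request_wait_times=[40.0, 85.0, 5.0], idle_drivers=4)
print(shaped_reward(r=0.0, s=s, s_next=s_next, t=12, horizon=720))
```

Because the time-indexed potential vanishes at the terminal step, the shaping contributions telescope over any episode, so the shaped and original returns differ only by a term that does not depend on the actions taken; the optimal trigger-timing policy is therefore unchanged while the per-step learning signal becomes denser.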