Deep Reinforcement Learning for Ride-hailing Systems

An experimental study on optimizing matching radius for ride-hailing systems using Deep Reinforcement Learning

Abstract

In public transportation, environmentally friendly and convenient modes of travel are the future trend, and ride-hailing services are an important component of them. However, current ride-hailing systems, particularly their matching systems, still suffer from low system efficiency and a poor user experience. Although existing rider-driver matching systems can allocate travel demand to drivers to a certain extent, they remain deficient in certain scenarios. For example, they cannot ensure effective rider-driver matching during peak hours, and they cannot find a good balance between pick-up distance and matching rate. Reinforcement Learning (RL) has been shown in many studies to be applicable and effective in solving complex and dynamic optimization problems. This study therefore explores how RL can be adapted to the ride-hailing matching system to optimize system efficiency and user experience through a dynamic matching radius policy. The research objective is to simulate an actual ride-hailing system and use RL to train a policy that outputs an optimized dynamic matching radius in real time, based on the current rider-driver demand-supply relationship, thereby achieving a higher matching rate, a shorter average pick-up distance, and a higher driver utilization rate.
Adapting RL to optimize the matching radius of a ride-hailing system poses several challenges because of the uncertainties in the real-world ride-hailing market. Traditional approaches are normally static, solving the matching problem at specific times through mathematical models. These methods often perform inconsistently when supply-demand relationships fluctuate, particularly during peak hours. In addition, the dynamics and complexity of the ride-hailing market and its environment make the system difficult to model. The market is easily affected by many variables, such as weather and local traffic conditions, so when quantitatively optimizing the matching radius it is critical to reasonably control irrelevant variables. To address these challenges, this study models the ride-hailing matching problem as a Markov Decision Process (MDP). Based on the defined MDP, a ride-hailing matching simulator is developed. Some assumptions and simplifications are made to keep the simulator realistic while reasonably controlling irrelevant variables and uncertainties. The Multi-replay-buffer Deep Deterministic Policy Gradient (MDDPG) algorithm is then applied to the matching radius optimization problem. Through interactions between the MDDPG agent and the simulator, the agent receives reward feedback that it uses to improve its policy. The proposed method is validated in a case study that applies the simulator and the RL algorithm to a real-world scenario in Austin, Texas. The case study covers an analysis of the current ride-hailing market in Austin, how the simulator is applied to it, the implementation details of the RL algorithm, and the resulting performance improvements.
The results of the case study show that the actions obtained from the proposed method outperform all the baselines in multiple scenarios, highlighting the benefits of using Reinforcement Learning to improve ride-hailing efficiency and user experience.
To conclude, the optimization method proposed in this study applies an advanced Reinforcement Learning approach to the ride-hailing system and improves overall efficiency and user experience in the case study. The results demonstrate the potential of Reinforcement Learning for optimizing ride-hailing matching systems and offer a promising direction for further exploration. This study lays a foundation for future research, encouraging the development of further RL-based optimization methods that can enhance the effectiveness and adaptability of ride-hailing systems in increasingly complex and dynamic environments.
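
The trade-off between matching rate and pick-up distance that the abstract describes can be illustrated with a minimal sketch. This is not the simulator or the matching rule used in the study: the function name `match_within_radius`, the greedy nearest-driver rule, the straight-line distance, and the toy coordinates are all assumptions made for illustration. It only shows how the three metrics named in the abstract (matching rate, average pick-up distance, driver utilization) respond to a change in matching radius.

```python
import math


def match_within_radius(riders, drivers, radius):
    """Greedily match each rider to the nearest free driver within `radius`.

    Hypothetical helper: returns a tuple
    (matching_rate, mean_pickup_distance, driver_utilization).
    """
    free = list(range(len(drivers)))  # indices of unmatched drivers
    pickups = []                      # pick-up distance of each matched pair
    for rx, ry in riders:
        best, best_d = None, None
        for j in free:
            dx, dy = drivers[j]
            d = math.hypot(rx - dx, ry - dy)  # straight-line distance (assumption)
            if d <= radius and (best_d is None or d < best_d):
                best, best_d = j, d
        if best is not None:
            free.remove(best)
            pickups.append(best_d)
    matching_rate = len(pickups) / len(riders) if riders else 0.0
    mean_pickup = sum(pickups) / len(pickups) if pickups else 0.0
    utilization = len(pickups) / len(drivers) if drivers else 0.0
    return matching_rate, mean_pickup, utilization


# Toy scenario: two riders and two drivers on a line (arbitrary units).
riders = [(0.0, 0.0), (10.0, 0.0)]
drivers = [(1.0, 0.0), (13.0, 0.0)]

for radius in (2.0, 5.0):
    print(radius, match_within_radius(riders, drivers, radius))
```

In this toy scenario the smaller radius matches only the first rider, with a short pick-up distance, while the larger radius matches both riders at the cost of a longer average pick-up distance. A dynamic radius policy of the kind described in the abstract would choose the radius at each step from the current supply-demand state rather than fixing it in advance.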

Files

Deep_Reinforcement_Learning_fo... (pdf)

File under embargo until 20-10-2024