Reinforcement Learning for minimizing the total waiting time of passengers in a Taxi Dispatch Problem

More Info
expand_more

Abstract

A Taxi Dispatch Problem involves assigning taxis to requests of passengers who are waiting at different locations for a trip. In today's economy and society, the Taxi Dispatch Problem and other transport problems can be found everywhere. Not only in transporting people, but also in food delivery from restaurants and package delivery for all kind of companies. Even though the applications are different, they still have something in common: serving as much as requests as possible, because that means the highest income. In this thesis, we consider the problem in the actual taxi field. A taxi driver often chooses to serve the passenger that is closest, because he makes no money while the taxi is vacant. However, for companies such as Uber, this is probably not the best solution. They have an overview of the locations of the taxis and passengers, and therefore, are able to make an optimal assignment between the taxis and requests. Sometimes, waiting a little longer for new requests leads to an even better solution. Trying to optimize the problem for the long-run and predict where passengers appear and where taxis end up is perfectly suited for Reinforcement Learning (RL), a subfield of Machine Learning. To be able to solve an optimization problem such as a Taxi Dispatch Problem, there needs to be a goal. For a company, this is maximizing the income or profit, and a popular way to do that is by minimizing the travel time. This thesis takes a different approach by looking at the problem from the passenger's perspective, as satisfied passengers lead to more passengers. In this thesis, the goal is to find an optimal policy for assigning taxis to passengers such that the total waiting time over all passengers is minimized, by using Reinforcement Learning. In order to do that, we formulate the problem in terms of the elements of an RL problem, with the RL method Q-Learning as the learning algorithm and ϵ-Greedy as policy. Together with some restrictions and assumptions, we implement this in Java and use this program to make the agent learn and generate results, where the agent is the one that is responsible for assigning passengers to taxis and needs to learn how to make this assignment such that the total waiting time is minimized. We use the program to find the optimal policy.