Exploring the enhancement of predictive accuracy for minority classes in travel mode choice models

Master Thesis (2024)

Author(s)

A. Panagiotidou (TU Delft - Technology, Policy and Management)

Contributor(s)

S. van Cranenburgh – Mentor (TU Delft - Technology, Policy and Management)

T. Verma – Graduation committee member (TU Delft - Technology, Policy and Management)

Gabriel Nova – Coach (TU Delft - Technology, Policy and Management)

Kingsley Adjenughwure – Coach (TNO)

Faculty

Technology, Policy and Management

To reference this document use

https://resolver.tudelft.nl/uuid:1df779ae-acf8-44a4-9a3e-714d95669deb

More Info

Publication Year

2024

Language

English

Coordinates

52.2129919, 5.2793703

Graduation Date

15-01-2024

Awarding Institution

Delft University of Technology

Programme

Engineering and Policy Analysis

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Transportation systems are pivotal in shaping the economic and social dynamics of contemporary societies, fostering connectivity and opportunities while reducing geographical distances. Despite these benefits, they also contribute to adverse effects such as emissions, congestion, and traffic fatalities. Effectively developing and maintaining transportation infrastructure and services that cater to evolving population needs and align with environmental goals requires accurate forecasting of travel demand. However, due to inherent uncertainty in individuals' behavior and data limitations, forecasting this demand is a complex task.
A common limitation of transport datasets is class imbalance in the use of the different modes. In this context, class imbalance refers to an uneven distribution of samples across modes: modes with many samples are termed majority modes, while those with few samples are termed minority modes. Class imbalance can degrade classifier performance, especially for the minority modes, and thus lead to inaccurate forecasts. This, in turn, may result in insufficient investment in and provision for these modes, with adverse consequences for the population segments that rely on them. Existing studies have either overlooked the impact of class imbalance entirely or addressed it only partially. Recognizing the importance of precise demand predictions and the gaps identified in the literature, the primary research question of this study is how to systematically identify and address the impact of class imbalance in mode choice forecasting.
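The distinction between majority and minority modes can be made concrete by counting samples per mode. The following is a minimal sketch, assuming the imbalance ratio (majority-class count divided by minority-class count) as the summary measure; the thesis may quantify imbalance differently, and the function name `class_distribution` and the toy labels are hypothetical illustrations, not ODiN data.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of samples per mode and the imbalance ratio.

    ASSUMPTION: imbalance is summarised as majority count / minority count,
    a common measure that is not necessarily the one used in the thesis.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {mode: n / total for mode, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


# Toy labels (NOT ODiN data): car is the majority mode, transit the minority.
labels = ["car"] * 60 + ["bike"] * 30 + ["transit"] * 10
shares, ir = class_distribution(labels)
print(shares)  # {'car': 0.6, 'bike': 0.3, 'transit': 0.1}
print(ir)      # 6.0
```

A ratio of 1.0 would indicate perfectly balanced modes; larger values indicate a stronger dominance of the majority mode.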
To answer the main question, a framework was proposed. It covers a) measuring class imbalance within a dataset and assessing its impact on classification performance, b) investigating other challenging factors that coexist in imbalanced datasets, with a specific focus on class overlap, and c) properly evaluating classification performance across classes. As an integral part of this framework, the 'Performance Gap Metric' was introduced: a metric that evaluates the difference in classification performance between the majority and minority classes. A threshold of 20% was set, and a classifier was considered to perform favorably when the metric fell below this threshold, signifying equitable treatment of the minority and majority classes. The framework was then applied to a case study using the ODiN data to predict mode choices in the Netherlands. The modes considered were car, bike, and transit, with car as the majority class and transit as the minority class. Two modeling techniques, a Random Forest and a multinomial logit (MNL) model, were combined with various sampling techniques, including SMOTENC, Neighborhood-based Undersampling, and the Separation scheme...
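The abstract does not give the exact formula of the Performance Gap Metric, so the sketch below is only one plausible reading. It assumes the gap is the relative drop in per-class F1 score from the majority class to the minority class, compared against the stated 20% threshold. The functions `per_class_f1` and `performance_gap` and the toy predictions are hypothetical and are not taken from the thesis.

```python
def per_class_f1(y_true, y_pred):
    """Compute the F1 score of every class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def performance_gap(scores, majority, minority):
    """HYPOTHETICAL gap definition: relative F1 drop from majority to minority."""
    if scores[majority] == 0:
        raise ValueError("majority-class score is zero; gap is undefined")
    return (scores[majority] - scores[minority]) / scores[majority]


THRESHOLD = 0.20  # the 20% threshold stated in the abstract

# Toy predictions (NOT thesis results): one transit trip is misclassified as bike.
y_true = ["car"] * 4 + ["bike"] * 2 + ["transit"] * 2
y_pred = ["car"] * 4 + ["bike"] * 2 + ["transit", "bike"]

scores = per_class_f1(y_true, y_pred)
gap = performance_gap(scores, majority="car", minority="transit")
print(round(gap, 3), gap < THRESHOLD)  # 0.333 False
```

Here car is classified perfectly (F1 = 1.0) while transit reaches an F1 of about 0.67, so the gap of roughly 33% exceeds the threshold and the classifier would not count as treating the classes equitably. Using F1 and a relative difference are assumptions; an absolute difference in recall, for instance, would be an equally plausible reading of the abstract.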

Files

Thesis_A.Panagiotidou.pdf

(PDF, 2.09 MB)

License info not available