Considering Airport Planners’ Preferences and Imbalanced Datasets when Predicting Flight Delays and Cancellations

Conference Paper (2021)
Author(s)

Rik Hendrickx (Student TU Delft)

M. Zoutendijk (TU Delft - Air Transport & Operations)

M.A. Mitici (TU Delft - Air Transport & Operations)

Jeffrey Schäfer (Royal Schiphol Group)

Research Group
Air Transport & Operations
Copyright
© 2021 Rik Hendrickx, M. Zoutendijk, M.A. Mitici, Jeffrey Schäfer
DOI related publication
https://doi.org/10.1109/DASC52595.2021.9594367
Publication Year
2021
Language
English
ISBN (print)
978-1-6654-3421-8
ISBN (electronic)
978-1-6654-3420-1
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key part of efficient airport operational planning is to have insight into potential flight delays and cancellations. For airport planners, it is important to obtain flight delay or cancellation predictions with a high degree of certainty, i.e. a high precision. This allows planners to make sound decisions based on these predictions. To obtain such predictions, machine learning classification techniques are often applied. An important issue for classification problems is that of imbalanced class distributions: the number of flights that are actually cancelled or delayed is low. In general, the imbalance is addressed by resampling the data using one or more sampling techniques. However, resampling does not necessarily correspond to an imbalance ratio that leads to the best classification results. In this paper, a systematic approach is presented to deal with imbalanced data for classification problems, while taking into account the preferences of airport planners. A range of feasible imbalance ratios is considered, together with several classification algorithms and sampling techniques. An optimal imbalance ratio is identified with respect to relevant performance metrics. The approach is illustrated by performing binary classification of flight cancellations and delays at a large European airport. The results show that the highest prediction precision is obtained using a base imbalance ratio, whereas a higher imbalance ratio is needed to obtain the highest F1-score. Specifically, the cancellation prediction performance is increased by up to 243%, while its optimal imbalance ratio does not correspond to resampling. In general, the results underline the need to investigate the influence of varying data imbalance ratios on the performance of classification algorithms.
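To make the notions of imbalance ratio, precision and F1-score concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes that the imbalance ratio is the majority-class count divided by the minority-class count (the abstract does not define it), that label 1 marks the minority class (a delayed or cancelled flight), and that random undersampling of the majority class is used to reach a target ratio. The function names `imbalance_ratio`, `undersample_to_ratio` and `precision_recall_f1` are hypothetical and chosen for illustration only.

```python
import random


def imbalance_ratio(labels):
    """Assumed definition: majority-class count / minority-class count."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if min(pos, neg) == 0:
        raise ValueError("both classes must be present")
    return max(pos, neg) / min(pos, neg)


def undersample_to_ratio(rows, labels, target_ratio, seed=0):
    """Randomly drop label-0 rows until (#label 0) / (#label 1) is about target_ratio.

    Assumes label 1 is the minority class. All label-1 rows are kept.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more majority rows than are available.
    n_keep = min(len(neg_idx), round(target_ratio * len(pos_idx)))
    keep = sorted(pos_idx + rng.sample(neg_idx, n_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): 5 positive and 95 negative flights.
toy_labels = [1] * 5 + [0] * 95
toy_rows = list(range(100))
base_ratio = imbalance_ratio(toy_labels)  # 19.0
_, resampled_labels = undersample_to_ratio(toy_rows, toy_labels, target_ratio=4)
resampled_ratio = imbalance_ratio(resampled_labels)  # 4.0
```

In a sweep of the kind the abstract describes, one would presumably repeat the resampling step for each candidate ratio, train a classifier on the resampled training set, and compare precision and F1-score on an untouched test set; the classifier and the set of candidate ratios are left out here because the abstract does not specify them.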
