Considering Airport Planners’ Preferences and Imbalanced Datasets when Predicting Flight Delays and Cancellations

Conference Paper (2021)

Author(s)

Rik Hendrickx (Student TU Delft)

Mike Zoutendijk (TU Delft - Aerospace Engineering)

Mihaela Mitici (TU Delft - Aerospace Engineering)

Jeffrey Schäfer (Royal Schiphol Group)

Research Group

Air Transport & Operations

Machine learning, Classification, Imbalance, Flight delay

DOI related publication

https://doi.org/10.1109/DASC52595.2021.9594367 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:f6e85861-ed01-4336-93c9-f398a1befd3c

Publication Year

2021

Language

English

Article number

9594367

ISBN (print)

978-1-6654-3421-8

ISBN (electronic)

978-1-6654-3420-1

Event

2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC) (2021-10-03 - 2021-10-07), Hybrid at San Antonio, United States

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key part of efficient airport operational planning is to have insight into potential flight delays and cancellations. For airport planners, it is important to obtain flight delay or cancellation predictions with a high degree of certainty, i.e., with high precision. This allows planners to make sound decisions based on these predictions. To obtain such predictions, machine learning classification techniques are often applied. An important issue for classification problems is that of imbalanced class distributions: the number of flights that are actually cancelled or delayed is low compared with the number that are not. In general, the imbalance is addressed by resampling the data using one or more sampling techniques. However, resampling does not necessarily correspond to an imbalance ratio that leads to the best classification results. In this paper, a systematic approach is presented to deal with imbalanced data for classification problems, while taking into account the preferences of airport planners. A range of feasible imbalance ratios, together with several classification algorithms and sampling techniques, is considered. An optimal imbalance ratio is identified with respect to relevant performance metrics. The approach is illustrated by performing binary classification of flight cancellations and delays at a large European airport. The results show that the highest prediction precision is obtained using a base imbalance ratio, whereas a higher imbalance ratio is needed to obtain the highest F1-score. Specifically, the cancellation prediction performance is increased by up to 243%, while its optimal imbalance ratio does not correspond to resampling. In general, the results underline the need to investigate the influence of varying data imbalance ratios on the performance of classification algorithms.
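The ideas in the abstract (resampling to a chosen imbalance ratio, then scoring with precision and F1) can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes that the imbalance ratio is defined as the minority count divided by the majority count, that label 1 marks a delayed or cancelled flight (the minority class), and that random undersampling stands in for the sampling techniques the paper considers. All function names are hypothetical.

```python
import random


def imbalance_ratio(labels):
    """Minority-to-majority count ratio for binary labels (assumed definition)."""
    pos = sum(labels)
    neg = len(labels) - pos
    return min(pos, neg) / max(pos, neg)


def undersample_to_ratio(rows, labels, target_ratio, seed=0):
    """Randomly drop majority rows (label 0) so positives/negatives is about target_ratio.

    Assumes label 1 is the minority class; illustrative only.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority rows to keep, capped at what is available.
    n_neg_keep = min(len(neg_idx), round(len(pos_idx) / target_ratio))
    keep = sorted(pos_idx + rng.sample(neg_idx, n_neg_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: 10 positive flights out of 100 (made-up numbers).
    labels = [1] * 10 + [0] * 90
    rows = list(range(100))
    for ratio in (0.25, 0.5, 1.0):
        _, y = undersample_to_ratio(rows, labels, ratio)
        print(ratio, len(y), imbalance_ratio(y))
```

In a study of the kind described, a classifier would be trained on each resampled training set and scored with `precision_recall_f1` on an untouched test set, so that the ratio giving the best precision or F1 can be compared against the base ratio.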

Files

2021_06_14_DASC_Considering_Ai... (pdf)

(pdf | 0.34 Mb)

License info not available