A Systematic Review of Artificial Intelligence Public Datasets for Railway Applications

Review (2021)
Author(s)

Mauro José Pappaterra (Uppsala University, Linnaeus University)

Francesco Flammini (Mälardalen University, Linnaeus University)

Valeria Vittorini (Università degli Studi di Napoli Federico II)

Nikola Bešinović (TU Delft - Transport and Planning)

Transport and Planning
Copyright
© 2021 Mauro José Pappaterra, Francesco Flammini, Valeria Vittorini, Nikola Bešinović
DOI related publication
https://doi.org/10.3390/infrastructures6100136
Publication Year
2021
Language
English
Issue number
10
Volume number
6
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper reviews existing publicly available, open, artificial intelligence (AI)-oriented datasets in different domains and subdomains of the railway sector. Its contribution is an overview of AI-oriented railway data published under Creative Commons (CC) or any other copyright type that entails public availability and freedom of use. These data are of great value for open research and publications on the application of AI in the railway sector. The paper offers insights into public railway data: we distinguish subdomains, including maintenance and inspection, traffic planning and management, and safety and security, as well as types of data, including numerical, string, image, and other. The datasets reviewed cover the last three decades, from January 1990 to January 2021. The study reveals that the number of open datasets is very small compared with the available literature on AI applications in the railway industry. Another shortcoming is the lack of documentation and metadata for public datasets, including information on missing data, collection schemes, and other limitations. The study also presents quantitative data, such as the number of available open datasets broken down by railway application, type of data, and year of publication. The review further reveals that there are openly available APIs, maintained by government organizations and train operating companies (TOCs), that can be of great use for data harvesting and can facilitate the creation of large public datasets. The data these APIs provide are usually well-curated and real-time, and can greatly contribute to the accuracy of AI models. Finally, we conclude that the extension of AI applications in the railway sector merits a centralized hub for publicly available datasets and open APIs.