From log files to train traffic reports: Using Natural Language Generation to explain anomalies from Train Control System log files

Master Thesis (2019)
Author(s)

B. Urumovska (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Tintarev – Mentor (TU Delft - Web Information Systems)

G.J.P.M. Houben – Graduation committee member (TU Delft - Web Information Systems)

M.T.J. Spaan – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Bojana Urumovska
Publication Year
2019
Language
English
Graduation Date
22-08-2019
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The field of Natural Language Generation has advanced in generating human-readable reports for domain experts in various fields. Nevertheless, Natural Language Generation and anomaly detection techniques have not yet been used in the rail domain. Currently, data analysis and incident reporting for log files from the train control system are performed manually, which is a very time-consuming task that is prone to missing crucial information. The rail domain is a safety-critical domain in which detailed analysis of the train control system may prevent incidents and help improve the system's performance. This research designs, implements and evaluates a Natural Language Generation model that successfully translates anomalies detected in log files into human-readable reports.
This thesis presents the steps taken to develop a Natural Language Generation system in the rail domain. Additionally, we examine two representations of the train control system used for the Content Determination task of the Natural Language Generation system. Through a case study with domain experts, we evaluate the performance of, and the experts' preference between, the reports generated from the two representations of the train control system and the data retrieved from the log files. The goal is to find a representation that gives the user a full, solid understanding of the anomalies detected in the log files.
The case study shows that, when developing a Natural Language Generation system for the rail domain, reports generated using a more detailed representation of the train control system (more precisely, both state names and state attributes that specify the step-by-step process of setting a route for a train) were preferred over reports generated using a less detailed representation (state names only). The preference was based on readability, accuracy and understandability measures of the reports presented during the case study.

Files

License info not available