Data augmentation for graph based data

Improving representation of cycling trips with varying speed conditions using data augmentation

Bachelor Thesis (2025)
Author(s)

L.D. Petre (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T. Gao – Mentor (TU Delft - Traffic Systems Engineering)

Elvin Isufi – Graduation committee member (TU Delft - Multimedia Computing)

J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurate estimation of bicycle trip travel times remains a challenge due to the limited availability of structured cycling data. This paper investigates how graph-based data augmentation can be used to address this limitation, specifically within the context of the DG4B model, a Graph Convolutional Neural Network for travel time estimation. We explore and evaluate three augmentation techniques: Random Walk (with and without node revisiting), Dijkstra Walk and Subgraph Stitching. These methods generate new trips by traversing or recombining paths within an existing road network graph, aiming to expand the training dataset while preserving realistic routing behavior. The augmented data is evaluated both statistically, using metrics like mean, variance and Frobenius norm, and in terms of model performance using RMSE, MAE and MAPE. Experimental results show that Subgraph Stitching and Dijkstra Walk yield the most effective improvements in model accuracy, with each method exhibiting strengths across different trip duration ranges. This work demonstrates that carefully designed graph-based data augmentation can improve GCNN-based travel time predictions in settings with limited cycling trip data.
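
To make the Random Walk idea concrete, the following is a minimal sketch of generating one synthetic trip on a weighted road graph, with and without node revisiting. It is not the thesis implementation: the abstract does not give algorithmic details, so the function name random_walk_trip, the dictionary-based graph layout, the toy edge travel times and the max_edges cap are all assumptions made for illustration.

```python
import random


def random_walk_trip(graph, start, max_edges, allow_revisit=True, rng=None):
    """Generate one synthetic trip by walking the graph at random.

    graph: dict mapping node -> {neighbor: travel_time_seconds} (assumed layout).
    Returns (path, total_time), where total_time sums the traversed edge times.
    """
    if rng is None:
        rng = random.Random()
    path = [start]
    visited = {start}
    total_time = 0.0
    current = start
    for _ in range(max_edges):
        candidates = list(graph.get(current, {}))
        if not allow_revisit:
            # Variant without node revisiting: skip nodes already on the path.
            candidates = [n for n in candidates if n not in visited]
        if not candidates:
            break  # dead end: stop the trip early
        nxt = rng.choice(candidates)
        total_time += graph[current][nxt]
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path, total_time


# Hypothetical toy network; edge weights are made-up travel times in seconds.
toy_graph = {
    "A": {"B": 30.0, "C": 45.0},
    "B": {"A": 30.0, "C": 20.0, "D": 60.0},
    "C": {"A": 45.0, "B": 20.0, "D": 25.0},
    "D": {"B": 60.0, "C": 25.0},
}

path, seconds = random_walk_trip(
    toy_graph, "A", max_edges=3, allow_revisit=False, rng=random.Random(0)
)
print(path, seconds)
```

Each generated path, paired with its summed travel time, could serve as one additional training sample. How the thesis assigns travel times to augmented trips is not stated in the abstract, so the summation here is only one plausible choice.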

Files

Research_paper_final.pdf
(pdf | 1.57 Mb)
License info not available