Data augmentation for graph based data
Improving representation of cycling trips with varying speed conditions using data augmentation
L.D. Petre (TU Delft - Electrical Engineering, Mathematics and Computer Science)
T. Gao – Mentor (TU Delft - Traffic Systems Engineering)
Elvin Isufi – Graduation committee member (TU Delft - Multimedia Computing)
J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Accurate estimation of bicycle trip travel times remains a challenge due to the limited availability of structured cycling data. This paper investigates how graph-based data augmentation can address this limitation, specifically within the context of the DG4B model, a Graph Convolutional Neural Network (GCNN) for travel time estimation. We explore and evaluate three augmentation techniques: Random Walk (with and without node revisiting), Dijkstra Walk, and Subgraph Stitching. These methods generate new trips by traversing or recombining paths within an existing road network graph, aiming to expand the training dataset while preserving realistic routing behavior. The augmented data is evaluated both statistically, using metrics such as mean, variance, and Frobenius norm, and in terms of model performance, using root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Experimental results show that Subgraph Stitching and Dijkstra Walk yield the most effective improvements in model accuracy, with each method exhibiting strengths across different trip duration ranges. This work demonstrates that carefully designed graph-based data augmentation can improve GCNN-based travel time predictions in settings with limited cycling trip data.
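To make the idea of generating trips by traversing a road network graph more concrete, the following is a minimal sketch of two of the named techniques: a random walk (with and without node revisiting) and a shortest-path "Dijkstra walk". It is not the implementation used in this work. The toy graph `GRAPH`, the edge weights (taken here to be travel times in seconds), and the function names `random_walk`, `dijkstra_walk`, and `path_time` are all assumptions introduced for illustration. Subgraph Stitching is not sketched because the abstract does not describe its procedure.

```python
import heapq
import random

# Hypothetical toy road network: node -> {neighbor: travel time in seconds}.
GRAPH = {
    "A": {"B": 30.0, "C": 80.0},
    "B": {"A": 30.0, "C": 40.0, "D": 90.0},
    "C": {"A": 80.0, "B": 40.0, "D": 35.0},
    "D": {"B": 90.0, "C": 35.0},
}


def random_walk(graph, start, max_edges, allow_revisit, rng):
    """Walk up to max_edges edges from start, choosing neighbors uniformly.

    With allow_revisit=False the walk never returns to a visited node and
    stops early if it reaches a dead end.
    """
    path = [start]
    visited = {start}
    while len(path) - 1 < max_edges:
        options = [n for n in graph[path[-1]] if allow_revisit or n not in visited]
        if not options:
            break
        nxt = rng.choice(options)
        path.append(nxt)
        visited.add(nxt)
    return path


def dijkstra_walk(graph, source, target):
    """Return the minimum-weight path from source to target, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_time(graph, path):
    """Sum the edge weights along a path to get a synthetic trip duration."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


rng = random.Random(0)
trip_no_revisit = random_walk(GRAPH, "A", max_edges=3, allow_revisit=False, rng=rng)
trip_revisit = random_walk(GRAPH, "A", max_edges=5, allow_revisit=True, rng=rng)
trip_shortest = dijkstra_walk(GRAPH, "A", "D")
```

In this sketch, each generated path together with its `path_time` would form one synthetic trip record. A random walk produces varied but possibly unrealistic routes, while the shortest-path variant stays closer to plausible routing between a sampled origin and destination.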