Data augmentation for graph based data

Improving representation of cycling trips with varying speed conditions using data augmentation

Bachelor Thesis (2025)

Author(s)

L.D. Petre (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T. Gao – Mentor (TU Delft - Traffic Systems Engineering)

Elvin Isufi – Graduation committee member (TU Delft - Multimedia Computing)

J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Graph neural network, Graph-based data augmentation, Bicycle travel time estimation, GCNN, Cycling data augmentation

To reference this document use:

https://resolver.tudelft.nl/uuid:754c843d-83a2-4e55-8cd4-bd56d82a7378

Publication Year

2025

Language

English

Graduation Date

27-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurate estimation of bicycle trip travel times remains a challenge due to the limited availability of structured cycling data. This paper investigates how graph-based data augmentation can be used to address this limitation, specifically within the context of the DG4B model, a Graph Convolutional Neural Network for travel time estimation. We explore and evaluate three augmentation techniques: Random Walk (with and without node revisiting), Dijkstra Walk and Subgraph Stitching. These methods generate new trips by traversing or recombining paths within an existing road network graph, aiming to expand the training dataset while preserving realistic routing behavior. The augmented data is evaluated both statistically, using metrics like mean, variance and Frobenius norm, and in terms of model performance using RMSE, MAE and MAPE. Experimental results show that Subgraph Stitching and Dijkstra Walk yield the most effective improvements in model accuracy, with each method exhibiting strengths across different trip duration ranges. This work demonstrates that carefully designed graph-based data augmentation can improve GCNN-based travel time predictions in settings with limited cycling trip data.
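As a rough illustration of the Random Walk idea named in the abstract (with and without node revisiting), the sketch below generates one synthetic trip on a toy graph. It is not the thesis implementation: the graph `ROAD_GRAPH`, the function `random_walk_trip`, its parameters and the edge travel times are all hypothetical, chosen only to show how traversing an existing road network can yield new trips.

```python
import random

# Hypothetical toy road network (an assumption, not data from the thesis):
# node -> {neighbor: travel time in seconds}.
ROAD_GRAPH = {
    "A": {"B": 30.0, "C": 45.0},
    "B": {"A": 30.0, "C": 20.0, "D": 60.0},
    "C": {"A": 45.0, "B": 20.0, "D": 25.0},
    "D": {"B": 60.0, "C": 25.0},
}


def random_walk_trip(graph, start, max_edges, allow_revisit, rng):
    """Generate one synthetic trip by walking randomly from `start`.

    Returns (path, total_time), where `path` is the list of visited nodes
    and `total_time` is the sum of the traversed edge travel times.
    """
    path = [start]
    visited = {start}
    total_time = 0.0
    for _ in range(max_edges):
        current = path[-1]
        # Without revisiting, only neighbors not yet on the path are eligible.
        candidates = [
            node for node in graph[current]
            if allow_revisit or node not in visited
        ]
        if not candidates:
            break  # dead end: every neighbor was already visited
        next_node = rng.choice(candidates)
        total_time += graph[current][next_node]
        path.append(next_node)
        visited.add(next_node)
    return path, total_time


rng = random.Random(0)  # fixed seed so the sketch is reproducible
simple_path, simple_time = random_walk_trip(ROAD_GRAPH, "A", 3, False, rng)
loopy_path, loopy_time = random_walk_trip(ROAD_GRAPH, "A", 5, True, rng)
print(simple_path, simple_time)
print(loopy_path, loopy_time)
```

Disallowing revisits keeps each generated trip a simple path, which is closer to realistic routing, while allowing revisits produces longer trips at the cost of possible back-and-forth movement.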

Files

Research_paper_final.pdf

(pdf | 1.57 Mb)

License info not available