T.J. Viering | TU Delft Repository

How does sample weighting improve learning curve fitting?

Bachelor thesis (2025) - G.F.M. den Hollander, T.J. Viering, C. Yan, O.T. Turan, A. van Deursen

Learning curves plot the performance of a machine learning model against the size of the dataset used for training. Curve fitting is a process that attempts to optimize algorithm parameters by minimizing the error in its loss function, thereby achieving the best possible fit to the data. We apply various sample weighting techniques to the curve fitting process and evaluate whether the resulting weighted curves can significantly improve the performance of the model. We explore whether adjusting the magnitudes of these weights can further improve the fit of the curve. The results demonstrate that each sample weighting method, as well as larger weight magnitudes, can significantly improve error rate prediction for anchors beyond the range of the observed data. ...
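As a rough illustration of what sample weighting in learning curve fitting can look like, the sketch below fits a power law err(n) = a * n^(-b) by weighted least squares in log-log space and extrapolates to a larger anchor. The power-law form, the log-space fit, the "weight by anchor size" scheme, the function names, and the error values are all assumptions made for this example; the abstract does not state which curve models or weighting methods the thesis uses.

```python
import math

def fit_power_law(anchors, errors, weights):
    """Weighted least-squares fit of log(err) = log(a) - b * log(n).

    Returns (a, b) for the assumed model err(n) = a * n ** (-b).
    """
    xs = [math.log(n) for n in anchors]
    ys = [math.log(e) for e in errors]
    w_sum = sum(weights)
    # Weighted means of the log-transformed data.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    # Closed-form weighted linear regression.
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope

def predict(a, b, n):
    """Predicted error rate at training set size n."""
    return a * n ** (-b)

# Hypothetical observations: error rates measured at five anchors.
anchors = [16, 32, 64, 128, 256]
errors = [0.40, 0.31, 0.22, 0.16, 0.11]

uniform = [1.0] * len(anchors)          # every anchor counts equally
by_size = [float(n) for n in anchors]   # assumed scheme: larger anchors count more

for name, weights in (("uniform", uniform), ("by size", by_size)):
    a, b = fit_power_law(anchors, errors, weights)
    print(f"{name}: a={a:.3f}, b={b:.3f}, "
          f"predicted error at n=1024: {predict(a, b, 1024):.3f}")
```

Weighting by anchor size pulls the fit toward the largest observed anchors, which is one plausible way to favour extrapolation beyond the observed range.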

Factors related to dataset that influence the shape of learning curves

Bachelor thesis (2022) - N.T. Bui, T.J. Viering, M. Loog, G. Smaragdakis

Although learning curves have many promising applications in machine learning, such as model selection, we still know very little about what factors influence their behaviour. The aim is to study how inherent characteristics of a dataset, namely noise, discretized input, and dimensionality, affect the shape of its learning curves. We trained two classifiers on a panoply of datasets to see how the learning curve behaves under different circumstances. Firstly, we found that the shapes of the curves varied with the level of noise injected into the original datasets. Secondly, using the equal width interval binning technique to discretize continuous features did not make the classifiers learn exponentially but caused the learning curves to behave unpredictably; thus, it does not transform the continuous problem into the easier class of problems mentioned in [1]. Finally, the more dimensions we removed using PCA, the more strangely the learning curves behaved. ...
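For readers unfamiliar with equal width interval binning, the following is a minimal sketch of the general technique: the observed range of a continuous feature is split into a fixed number of equally wide intervals, and each value is replaced by the index of its interval. The function name, the number of bins, and the sample values are assumptions for illustration and are not taken from the thesis.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate feature: all values fall into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical continuous feature discretized into 4 bins.
feature = [0.0, 2.5, 5.0, 7.5, 10.0]
print(equal_width_bins(feature, 4))
```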

Are CNNs that Learn to Predict Image Statistics Invariant to Domain Shifts?

Bachelor thesis (2021) - J.P. Biesheuvel, T.J. Viering, Z. Wang, D.M.J. Tax, M. Loog, K.A. Hildebrandt

Yes, convolutional neural networks are domain-invariant, albeit to some limited extent. We explored the performance impact of domain shift for convolutional neural networks. We did this by designing new synthetic tasks, for which the network’s task was to map images to their mean, median, standard deviation, and variance pixel intensities. We find that the performance drop due to domain shift is related to the shift in pixel values between source and target domain. Colour space transformations seemed to notably impact the network’s performance, as opposed to geometric transformations. For the last domain shift we find that the network manages to beat a baseline, from which we can conclude the domain shift is not too severe. Additionally, the findings reveal a less dominant role for feature transferability for our synthetic regression tasks. ...
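The regression targets described above can be sketched as follows: for a grayscale image, the target is simply a statistic of its pixel intensities. The sketch also shows why a geometric transformation (a horizontal flip) leaves the targets unchanged while an intensity shift moves them. The tiny image, the function name, and the two transformations are assumptions for illustration; the sketch computes targets only and says nothing about how a trained network behaves.

```python
import statistics

def image_statistics(image):
    """Compute the four target statistics over all pixel intensities."""
    pixels = [p for row in image for p in row]
    return {
        "mean": statistics.fmean(pixels),
        "median": statistics.median(pixels),
        "std": statistics.pstdev(pixels),        # population standard deviation
        "variance": statistics.pvariance(pixels),  # population variance
    }

# Hypothetical 2x2 grayscale image.
image = [[0, 2],
         [4, 6]]

flipped = [row[::-1] for row in image]                 # geometric change
brightened = [[p + 10 for p in row] for row in image]  # intensity shift

print(image_statistics(image))
print(image_statistics(flipped))     # same pixel values, so same targets
print(image_statistics(brightened))  # mean and median move with the shift
```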

Comparison of Linguistic Language Classification based on Origin and Data Driven Language Classification using the IPA and Clustering

Bachelor thesis (2021) - I.G.M. Rethans, T.J. Viering, S. Makrodimitris, A. Naseri Jahfari

Language similarity is very useful for data enrichment in both Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). A clustering algorithm could provide an efficient means to define language similarity in a data-driven way. This research investigates the relation between linguistic classification by origin and data-driven classification based on the pronunciation of languages using k-means clustering, with a focus on the Indo-European languages. The results show large variation in cluster results and consequently large variation in correspondence with linguistic classification. This is caused by a relatively even spread of the data over the feature space. Still, the results indicate significance in the relation between the two classification methods. Furthermore, this research functions as a foundation and a source of inspiration for future research. ...