R.A.A. Overwater
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
1
Visual counting is an important task in computer vision with broad applications in areas such as crowd monitoring, agriculture, and environmental analysis. Deep learning has significantly advanced this field by enabling models to learn robust feature representations. However, these approaches are sensitive to data imbalance, which arises in the distribution of object counts across counting datasets as a result of annotation effort. Most state-of-the-art counting models fall into clustering-, detection-, regression-, and density-estimation-based methods. They are built on Convolutional Neural Networks (CNNs) and Transformers, both of which are known to be susceptible to imbalance in the training data. This study introduces a hybrid model that incorporates a programmatically guaranteed counting mechanism using the RASP language and the Tracr compiler, which together enable the construction of Transformer-based models that reliably execute predefined tasks such as counting. By combining this exact counting mechanism with a trainable embedding module, we present a model that learns to count various tokens, even under significant data imbalance. We validate our approach on a synthetic, imbalanced dataset and compare its performance, training time, and data efficiency against standard CNN- and Transformer-based models. The results suggest that our method generalizes well across the full spectrum of object counts while requiring less training data. This highlights the potential of the architecture to be investigated further and adapted for robust and efficient visual counting.
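In RASP, counting is typically expressed with two primitives. `Select` builds an attention-like boolean matrix by comparing keys against queries, and `SelectorWidth` returns, for each query, how many keys were selected. Tracr compiles programs written with such primitives into Transformer weights. The snippet below is a minimal plain-Python emulation of those semantics, written under stated assumptions. It is not the thesis's code and not Tracr's API. The helper names `select`, `selector_width`, `histogram`, and `count_token` are hypothetical. Tokens are given directly here, whereas in the hybrid model above a trainable embedding module supplies them.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select(keys: Sequence[T], queries: Sequence[T],
           predicate: Callable[[T, T], bool]) -> List[List[bool]]:
    """Emulates RASP Select: entry [q][k] is True when predicate(key, query) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(selector: List[List[bool]]) -> List[int]:
    """Emulates RASP SelectorWidth: number of selected key positions per query."""
    return [sum(row) for row in selector]


def histogram(tokens: Sequence[T]) -> List[int]:
    """For every position, count how many tokens in the sequence equal its own token."""
    same_token = select(tokens, tokens, lambda k, q: k == q)
    return selector_width(same_token)


def count_token(tokens: Sequence[T], target: T) -> int:
    """Count occurrences of `target` using a single constant query (hypothetical helper)."""
    return selector_width(select(tokens, [target], lambda k, q: k == q))[0]


if __name__ == "__main__":
    seq = ["a", "b", "a", "c", "a"]
    print(histogram(seq))         # [3, 1, 3, 1, 3]
    print(count_token(seq, "a"))  # 3
```

The count comes from an exact equality comparison followed by a sum, with no learned parameters. This is why such a mechanism does not depend on how often each count value appears in the training data.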
Bachelor thesis (2021)
A.W. van Roon, M. Bianconi, F.J. Schimmel, Wu Qiu, M.J. Kalsbeek, R.A.A. Overwater, G.F. Feenstra, V.J.M.J. Lechner, R.C.W. Roelofs, Edward Neate, A. Anisimov, M.J. Ribeiro, M. Fathi Azarkhavarani, F. Corte Vargas
The Last Hope drone will autonomously find a clear path into the sky from the ground and ascend to an altitude of up to two thousand meters. Within 20 minutes it transmits a call for help with exact location information to rescue operators via the Iridium satellite network...