Using frequency information to improve accuracy of object detectors

Abstract

This paper analyses the effect that frequency information can have on object detectors. Object detectors are complex networks that learn about objects from images and then predict the locations of those objects in new, unseen images. Some datasets, however, are hard to learn from, partly because the images are taken in diverse and complex environments and partly because the objects to detect appear in widely varying shapes. The dataset considered in this paper is the Global Wheat Head Dataset (GWHD), provided through a Kaggle competition. An object detector is run on the original GWHD images, and its performance is compared with that of the same detector run on a frequency-filtered version of the images. The Fourier transform is used to map images from the spatial (pixel) domain to the frequency domain, where certain non-informative frequencies are filtered out before the images are mapped back to the spatial domain. Two experiments were conducted, and the results show that this specific filtering methodology yields no improvement on the GWHD with the YOLOv5 object detector. A pipeline was developed that supports custom filtering strategies and custom datasets. Related work has shown that working with images in the frequency domain can reduce computation time and increase the accuracy of an object detector, so the pipeline also opens the way for further experiments.
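
The filtering step described in the abstract (transform to the frequency domain, suppress some frequencies, transform back) can be illustrated with a minimal NumPy sketch. This is not the paper's pipeline: the abstract does not state which frequencies were treated as non-informative, so the circular low-pass mask below is purely an assumed example, and the function names `fourier_filter` and `circular_low_pass_mask` are hypothetical. The sketch handles a single 2-D grayscale image; a colour image would need the same step applied per channel.

```python
import numpy as np


def fourier_filter(image, keep_mask):
    """Filter a 2-D grayscale image in the frequency domain.

    keep_mask has the same shape as the image; entries that are True (or 1)
    mark the frequencies to keep, with the zero frequency at the centre.
    """
    # Spatial domain -> frequency domain, zero frequency shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Suppress the frequencies the mask marks as unwanted.
    filtered = spectrum * keep_mask
    # Frequency domain -> spatial domain. For a real image and a mask that is
    # symmetric about the centre, the imaginary part is only rounding error.
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


def circular_low_pass_mask(shape, radius):
    """Assumed example mask: keep frequencies within `radius` of the centre."""
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    centre_y, centre_x = rows // 2, cols // 2
    return (y - centre_y) ** 2 + (x - centre_x) ** 2 <= radius ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # stand-in for one grayscale image
    mask = circular_low_pass_mask(image.shape, radius=10)
    smoothed = fourier_filter(image, mask)
    print(smoothed.shape)
```

Passing the mask in as an argument keeps the transform code separate from the choice of filter, so a different strategy (for example a high-pass or band-pass mask) can be tried without changing `fourier_filter`.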