Injecting prior frequency information in DETR for wheat head detection

Abstract

Wheat is among the most important grains worldwide. Wheat fields are commonly assessed by detecting, in images, the spikes (heads) at the top of the plant, which contain the grain. Previous deep learning work in precision agriculture adapts the established object detectors Faster R-CNN and YOLO to this context. However, these models depend on duplicate-removal post-processing and perform poorly on overlapping objects. The more recent Detection Transformer (DETR) overcomes these limitations: it is an end-to-end, anchor-free set predictor built on the transformer architecture, which uses the attention mechanism to model long-range dependencies. Its sensitivity to small objects, such as those in the wheat head domain, is nevertheless reduced. Previous research, however, indicates that frequency analysis techniques can increase the accuracy of a CNN. This paper studies the feasibility of adding frequency information as a pre-processing step to improve the performance of DETR for wheat head detection. Two variants of the original DETR are proposed, each using a mask derived from the power spectra, computed with the Fast Fourier Transform (FFT), of wheat head and background patches, and both are evaluated for improvements in prediction quality. Although promising, the best FFT-based DETR variant reaches an average score of only 0.42, slightly below the 0.47 of the original DETR. To place these results among well-established detectors, YOLO-V3 and Faster R-CNN achieve around 0.7 on the same wheat data set. Ultimately, a configurable, automated overview of the development of wheat fields would lead to more efficient administration of the production process. To this end, this research explores the application of this new object detector in precision agriculture and provides insight into its limitations and potential ways of overcoming them.
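
The abstract does not specify how the FFT-based mask is built or applied, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes grayscale patches, and it assumes a simple rule that keeps the frequencies at which wheat head patches carry more average power than background patches. The function names (`mean_power_spectrum`, `frequency_mask`, `apply_mask`) and the selection rule are hypothetical.

```python
import numpy as np


def mean_power_spectrum(patches, shape):
    """Average power spectrum of grayscale patches, zero-padded to `shape`."""
    spectra = [
        # Subtract the patch mean so overall brightness does not dominate.
        np.abs(np.fft.fft2(p - p.mean(), s=shape)) ** 2
        for p in patches
    ]
    return np.mean(spectra, axis=0)


def frequency_mask(head_patches, background_patches, shape):
    """Boolean mask over frequencies (assumed rule: heads > background)."""
    head = mean_power_spectrum(head_patches, shape)
    background = mean_power_spectrum(background_patches, shape)
    mask = head > background
    # Always keep the zero-frequency term so mean image brightness survives.
    mask[0, 0] = True
    return mask


def apply_mask(image, mask):
    """Filter a grayscale image by masking its 2-D Fourier spectrum."""
    filtered = np.fft.ifft2(np.fft.fft2(image) * mask)
    # Discard any small imaginary residue left by the inverse transform.
    return filtered.real
```

Under these assumptions, the mask would be computed once from annotated patches and applied to each image before it is passed to the detector. How the two proposed DETR variants actually combine the mask with the input is not stated in the abstract.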