High-performing object detectors require large training datasets annotated with both class labels and bounding boxes. To reduce the labelling effort of curating such datasets, Weakly Supervised Object Detection aims to train object detectors from class labels alone. The best-performing weakly supervised detectors (MIL-based) have long inference times, while faster methods (CAM-based) have mainly been studied for localizing a single object per image. This research proposes an extension to weakly supervised CAM-based detectors that allows them to detect multiple objects in an image. It assesses their performance both at localizing the full extent of objects with bounding boxes and at indicating their general location with pin-points. Two backbones are evaluated: VGG16 and a novel FPN-based classifier. Each is followed by GradCAM++, which produces heatmaps indicating where the objects predicted by the classifier are located. Additionally, the proposed method is used to generate pseudo-labels on which any fully supervised detector could be trained. Results show that while the proposed method is not suitable for detecting the full extent of objects, it can accurately pin-point their general location in near real-time, thus showing that the Object is Roughly There (ORT).
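The abstract does not say how a GradCAM++ heatmap is turned into several boxes and pin-points. The following is a minimal sketch of one common approach, offered for illustration only and not as the authors' method. It assumes that the per-class heatmap is already normalised to [0, 1] and that each 4-connected region above a threshold is one object instance. Under those assumptions, the box is the region's tight bounding box and the pin-point is the region's highest-scoring pixel. The function name `heatmap_to_detections` and the `threshold` parameter are hypothetical.

```python
from collections import deque

import numpy as np


def heatmap_to_detections(heatmap, threshold=0.5):
    """Hypothetical sketch: split one class's CAM heatmap into instances.

    Assumptions (not from the source): the heatmap is normalised to [0, 1],
    every 4-connected region with values >= threshold is a separate object,
    the box is the region's tight bounding box, and the pin-point is the
    region's peak pixel.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    mask = heatmap >= threshold
    visited = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    detections = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            # Pin-point: the pixel with the highest activation in the region.
            peak = max(pixels, key=lambda p: heatmap[p])
            detections.append({
                "box": (min(xs), min(ys), max(xs), max(ys)),  # (x_min, y_min, x_max, y_max)
                "point": (peak[1], peak[0]),  # (x, y)
                "score": float(heatmap[peak]),
            })
    return detections


# Usage: two separate activation blobs yield two detections.
hm = np.zeros((6, 8))
hm[1:3, 1:3] = [[0.6, 0.9], [0.7, 0.8]]
hm[4, 5:8] = [0.55, 0.95, 0.6]
dets = heatmap_to_detections(hm, threshold=0.5)
for d in dets:
    print(d)
```

Splitting the thresholded heatmap into connected regions is what lets a single-object CAM pipeline report multiple instances. It also shows why boxes derived this way can miss an object's full extent: the box only covers the pixels that exceed the threshold, while the peak pixel remains a stable pin-point. Boxes produced like this could, in principle, serve as pseudo-labels for a fully supervised detector.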