O.S. Kayhan | TU Delft Repository

Humans Disagree With the IoU for Measuring Object Detector Localization Error

Conference paper (2022) - O. Strafforello (author), Vanathi Rajasekart (author), Osman Semih Kayhan (author), Oana Inel (author), Jan C. van Gemert (author)

The localization quality of automatic object detectors is typically evaluated by the Intersection over Union (IoU) score. In this work, we show that humans have a different view on localization quality. To evaluate this, we conduct a survey with more than 70 participants. Results ...

Locality in space and time for data-efficient visual recognition

Doctoral thesis (2022) - O.S. Kayhan (author)

Spatial localization in time is vital for humans. Therefore, we want computer vision algorithms to be able to localize objects and actions spatially and temporally as well. These algorithms generally learn from given data and discover patterns, parts, motions, and their locations by exploiting inductive biases that are essential for learning. However, localization is complex, error-prone, and hard to inspect. In this thesis, we investigate location biases and how CNNs explore and exploit location and temporal information in the image and video domain. An interesting finding of the thesis is that heuristics about what is outside the image (border handling) enable CNNs to exploit absolute spatial location and break translation equivariance. The thesis proposes a simple solution to eliminate the spatial location biases. The proposed solution improves translation equivariance and provides data efficiency and robustness. Furthermore, the thesis investigates object and part locations in images. First, the thesis studies object-context relationships of modern object detectors and reveals insights about helpful location biases. In addition, the effect of unhelpful location biases is investigated for a visual verification task. These analyses show that object detectors can hallucinate the location of an object with a high confidence score even if the object is not in the image. Based on these insights, the thesis provides suggestions for researchers on how to choose an object detector for their specific tasks. Another interesting finding of this thesis shows the limitations of data augmentation techniques in resolving robustness issues of pose estimation methods when dealing with occlusions. Even if data augmentation alleviates some problems caused by sampling biases, it can only yield limited improvement, and the performance saturates after applying a stack of augmentations.
Finally, the thesis investigates temporal location information and demonstrates spatio-temporal location biases in video data. A time-efficient video labeling solution that uses latent space feature similarity is proposed to annotate long, untrimmed videos. In addition, using only keyframe labels with Positive-Unlabeled learning achieves high-quality action proposals that can be utilized with many temporal action localization methods. The proposed method can provide data and label efficiency. Taken together, this thesis investigates how CNNs use location information and introduce location biases that can result in positive as well as negative outcomes on various computer vision tasks.

t-EVA

Time-Efficient t-SNE Video Annotation

Conference paper (2021) - Soroosh Poorgholi (author), O.S. Kayhan (author), Jan van Gemert (author)

Video understanding has received more attention in the past few years due to the availability of several large-scale video datasets. However, annotating large-scale video datasets is cost-intensive. In this work, we propose a time-efficient video annotation method using spatio-t ...

Hallucination In Object Detection

A Study In Visual Part Verification

Conference paper (2021) - Osman Semih Kayhan (author), Bart Vredebregt (author), Jan van Gemert (author)

We show that object detectors can hallucinate and detect missing objects; potentially even accurately localized at their expected, but non-existing, position. This is particularly problematic for applications that rely on visual part verification: detecting if an object part is p ...

PUNet

Temporal Action Proposal Generation With Positive Unlabeled Learning Using Key Frame Annotations

Conference paper (2021) - Noor ul Sehr Zia (author), O.S. Kayhan (author), Jan van Gemert (author)

Popular approaches to classifying action segments in long, realistic, untrimmed videos start with high quality action proposals. Current action proposal methods based on deep learning are trained on labeled video segments. Obtaining annotated segments for untrimmed videos is time ...

On translation invariance in CNNs

Convolutional layers can exploit absolute spatial location

Conference paper (2020) - Osman Semih Kayhan (author), J.C. van Gemert (author)

In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations by exploiting im ...