Authored

6 records found

Deep Vanishing Point Detection

Geometric priors make dataset variations vanish

Deep learning has improved vanishing point detection in images. Yet, deep networks require expensive annotated datasets and costly hardware for training, and do not generalize to even slightly different domains or minor problem variants. Here, we address these issues by injecting deep ...
A number of computer vision deep regression approaches report improved results when adding a classification loss to the regression loss. Here, we explore why this is useful in practice and when it is beneficial. To do so, we start from precisely controlled dataset variations and ...
Current work on lane detection relies on large manually annotated datasets. We reduce the dependency on annotations by leveraging massive cheaply available unlabelled data. We propose a novel loss function exploiting geometric knowledge of lanes in Hough space, where a lane can b ...
The human-constructed world is well organized in space. A prominent feature of this artificial world is the presence of repetitive structures and coherent patterns, such as lines, junctions, wireframes of a building, and footprints of a city. These structures and patterns facil ...
Classical work on line segment detection is knowledge-based; it relies on carefully designed geometric priors based on either image gradients, pixel groupings, or Hough transform variants. Instead, current deep learning methods do away with all prior knowledge and replace priors by train ...
Transformers can generate predictions in two ways: 1. auto-regressively, by conditioning each sequence element on the previous ones, or 2. directly, by producing output sequences in parallel. While research has mostly explored this difference on sequential tasks in NLP, we ...
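
One of the authored records above explores adding a classification loss to a regression loss. As a rough, illustrative sketch of that general pattern only (not the authors' formulation), the snippet below combines an L1 regression loss with an auxiliary cross-entropy loss over discretized target bins; the bin count, loss weight, and network heads are assumptions.

```python
# Minimal sketch: regression loss plus an auxiliary classification loss over
# discretized target bins. Illustrative only; bin count, weighting, and the
# network heads are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn


class RegressionWithClassification(nn.Module):
    def __init__(self, in_features: int, num_bins: int = 10,
                 lo: float = 0.0, hi: float = 1.0):
        super().__init__()
        self.reg_head = nn.Linear(in_features, 1)         # continuous estimate
        self.cls_head = nn.Linear(in_features, num_bins)  # discretized estimate
        self.register_buffer("bin_edges", torch.linspace(lo, hi, num_bins + 1))

    def forward(self, feats: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pred = self.reg_head(feats).squeeze(-1)
        logits = self.cls_head(feats)
        # Map each continuous target to a bin index for the auxiliary loss.
        bins = torch.bucketize(target, self.bin_edges[1:-1])
        loss_reg = nn.functional.l1_loss(pred, target)
        loss_cls = nn.functional.cross_entropy(logits, bins)
        return loss_reg + 0.1 * loss_cls  # the 0.1 weight is an assumption
```
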

Contributed

7 records found

Although pixel-wise labelling approaches have been explored in depth and achieve good results on segmentation tasks, grouped pixels are not an ideal output for many end-users. In this paper, we propose a vertex-voting-based approach that can directly extract the polygon rep ...
Neural Radiance Fields (NeRFs) have demonstrated remarkable capabilities in photo-realistic 3D reconstruction. NeRFs often take as input posed images, where the camera poses come from either off-the-shelf SfM or online optimization together with NeRFs. However, we find tha ...
Convolutional Neural Networks (CNNs) have made significant strides in the field of image processing over the last decade. Different approaches have been taken and improvements have been suggested. This paper examines a recent innovation in neural networks for image counting, which is ...
Convolutional Neural Networks are particularly vulnerable to attacks that manipulate their input data, usually called adversarial attacks. In this paper, a method of filtering images using the Fast Fourier Transform is explored, along with its potential to be used as a defens ...
Wheat is among the most important grains worldwide. For the assessment of wheat fields, image-based detection of the grain-bearing spikes atop the plant is used. Previous work in deep learning for precision agriculture employs the already established object detectors, Faster R-CNN and Y ...
Structure-from-Motion (SfM) and Neural Radiance Fields (NeRFs) have significantly advanced 3D reconstruction in multi-view scenarios. Despite their success in handling non-repetitive, texture-rich scenes, applying such techniques to real-world scenarios with texture-less and repe ...
2D to 3D transfer learning has proven to be an effective method to transfer strong 2D representations into a 3D network. Recent advancements show that casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, such as through range projection, enables the proces ...
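
One of the contributed records above explores Fast Fourier Transform filtering of images as a potential adversarial defence. As a rough illustration of frequency-domain filtering only (the cutoff radius and the hard circular mask are assumptions, not the paper's method), a minimal sketch:

```python
# Minimal sketch: low-pass filtering of a grayscale image in the frequency
# domain with the FFT. Illustrative of the general idea only; the cutoff and
# the hard circular mask are assumptions, not the paper's method.
import numpy as np


def fft_lowpass(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Keep only frequencies within cutoff * min(H, W) of the spectrum centre."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * min(h, w)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)


# Example: suppress high-frequency content in a random grayscale image.
img = np.random.rand(128, 128)
smoothed = fft_lowpass(img, cutoff=0.15)
```
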