Seeing the Gaps: Improving Object Segmentation for Abstract Visual Reasoning in Julia

Grammar Extensions and Structural Ranking for the BEN Agent on the ARC Benchmark

Bachelor Thesis (2026)
Author(s)

F.G. Howard (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D.Z. Zak – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Dumančić – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
26-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure general-purpose skill acquisition: solvers must infer transformation rules from very few examples. Program synthesis approaches such as the Divide, Align and Conquer (DA&C) framework have shown promise, but their segmentation stage, which decomposes input grids into objects, remains a bottleneck in both computational cost and task coverage. This work presents a reimplementation of the BEN agent in Julia, integrated with the Herb.jl program synthesis ecosystem, together with two targeted improvements to the segmentation module. First, we extend the segmentation grammar with a proximity;N mode that groups same-color pixels within Chebyshev distance N [1], enabling correct decomposition of objects with small internal gaps. Second, we replace the original mode selection, which ran the full pipeline, with a lightweight structural ranking that scores every candidate mode on its segmentation output alone, using all training examples rather than only the first. Evaluated on the 400-task ARC training set with a 140-second budget, the Julia reimplementation solves 46 tasks. Of these, 5 are solved via proximity modes that are absent from the original grammar and are therefore likely attributable to the grammar extension. Analysis of the ranking shows that the top-ranked mode solves 59% of solvable tasks. Because the heuristic is imperfect, a deliberate fallback mechanism guarantees that a reliable base mode is always attempted second. Of the 13 tasks solved only by the Julia reimplementation, 5 are likely attributable to the grammar extension.
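The proximity grouping described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the thesis's Julia code. The function name `proximity_segments`, the treatment of color 0 as background, and the transitive (connected-component) grouping are all assumptions that the abstract does not spell out.

```python
from collections import deque


def proximity_segments(grid, n, background=0):
    """Group same-color cells whose Chebyshev distance is at most n.

    Hypothetical helper: grouping is transitive, so two cells end up in
    the same segment if a chain of same-color cells links them, with each
    step of the chain at Chebyshev distance <= n. Cells equal to
    `background` are skipped (an assumption, not taken from the thesis).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    segments = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                # The (2n+1) x (2n+1) window around (y, x) holds exactly
                # the cells at Chebyshev distance <= n, clipped to the grid.
                for ny in range(max(0, y - n), min(rows, y + n + 1)):
                    for nx in range(max(0, x - n), min(cols, x + n + 1)):
                        if not seen[ny][nx] and grid[ny][nx] == color:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            segments.append((color, sorted(cells)))
    return segments


# Two cells of color 1 separated by a one-cell gap, plus one cell of color 2.
grid = [
    [1, 0, 1],
    [0, 0, 0],
    [2, 0, 0],
]
segments_n1 = proximity_segments(grid, 1)
segments_n2 = proximity_segments(grid, 2)
```

Under these assumptions, N = 1 reduces to ordinary 8-connected components, so the two cells of color 1 form separate segments. With N = 2 the one-cell gap is bridged and they merge into a single object, which is the behavior the abstract attributes to the proximity mode.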

Files

Final_Final_Paper.pdf
(pdf | 0.513 Mb)
License info not available