Seeing the Gaps: Improving Object Segmentation for Abstract Visual Reasoning in Julia
Grammar Extensions and Structural Ranking for the BEN Agent on the ARC Benchmark
F.G. Howard (TU Delft - Electrical Engineering, Mathematics and Computer Science)
D.Z. Zak – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Dumančić – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure general-purpose skill acquisition: solvers must infer transformation rules from very few examples. Program synthesis approaches such as the Divide, Align and Conquer (DA&C) framework have shown promise, but their segmentation stage, which decomposes input grids into objects, remains a bottleneck in both computational cost and task coverage. This work presents a reimplementation of the BEN agent in Julia, integrated with the Herb.jl program synthesis ecosystem, alongside two targeted improvements to the segmentation module. First, we extend the segmentation grammar with a proximity;N mode that groups same-color pixels within Chebyshev distance N [1], enabling correct decomposition of objects with small internal gaps. Second, we replace the original full-pipeline mode selection with a lightweight structural ranking that scores all candidate modes on their segmentation output alone, using all training examples rather than only the first. Evaluated on the 400-task ARC training set with a 140-second budget, the Julia reimplementation solves 46 tasks, of which 5 are solved via proximity modes absent from the original grammar and are therefore likely attributable to the grammar extension. Analysis of the ranking shows that the top-ranked mode solves 59% of solvable tasks. A deliberate fallback mechanism compensates for the heuristic's imperfection by guaranteeing that a reliable base mode is always attempted second. Of the 13 tasks likely solved exclusively by the Julia reimplementation, 5 are likely attributable to the grammar extension.
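To make the proximity;N idea concrete, the following is a minimal illustrative sketch in Python, not the thesis's Julia implementation. It assumes that grouping is transitive (two same-color pixels belong to one object if a chain of same-color pixels links them, each step within Chebyshev distance N) and that color 0 is background; both are assumptions not stated in the abstract. The function name segment_by_proximity is hypothetical.

```python
from collections import deque


def segment_by_proximity(grid, n, background=0):
    """Group same-color pixels whose Chebyshev distance is at most n.

    Hypothetical helper: returns a list of (color, sorted pixel list) pairs
    in scan order. Assumes transitive grouping and a background color of 0.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            color = grid[r][c]
            if color == background or (r, c) in seen:
                continue
            # Breadth-first search over the (2n+1) x (2n+1) window around
            # each pixel, i.e. all cells with max(|dr|, |dc|) <= n.
            component = []
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for nr in range(max(0, cr - n), min(rows, cr + n + 1)):
                    for nc in range(max(0, cc - n), min(cols, cc + n + 1)):
                        if (nr, nc) not in seen and grid[nr][nc] == color:
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            objects.append((color, sorted(component)))
    return objects


# Two color-1 pixels separated by a one-cell gap, plus a color-2 bar.
example = [
    [1, 0, 1],
    [0, 0, 0],
    [2, 2, 0],
]
objects_n1 = segment_by_proximity(example, 1)  # gap keeps the 1s apart
objects_n2 = segment_by_proximity(example, 2)  # gap is bridged
```

Under these assumptions, N = 1 reduces to ordinary 8-connected components, so the two color-1 pixels form separate objects; with N = 2 the one-cell gap is bridged and they merge into a single object.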