D.Z. Zak | TU Delft Repository

Seeing the Gaps: Improving Object Segmentation for Abstract Visual Reasoning in Julia

Grammar Extensions and Structural Ranking for the BEN Agent on the ARC Benchmark

Bachelor thesis (2026) - F.G. Howard, D.Z. Zak, S. Dumančić, A. van Deursen

The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure general-purpose skill acquisition, requiring solvers to infer transformation rules from very few examples. Program synthesis approaches such as the Divide, Align and Conquer (DA&C) framework have shown promise, but their segmentation stage, which decomposes input grids into objects, remains a bottleneck in both computational cost and task coverage. This work presents a reimplementation of the BEN agent in Julia, integrated with the Herb.jl program synthesis ecosystem, alongside two targeted improvements to the segmentation module. First, we extend the segmentation grammar with a proximity;N mode that groups same-color pixels within Chebyshev distance N [1], enabling correct decomposition of objects with small internal gaps. Second, we replace the original full-pipeline mode selection with a lightweight structural ranking that scores all candidate modes on their segmentation output alone, using all training examples rather than only the first. Evaluated on the 400-task ARC training set with a 140-second budget, the Julia reimplementation solves 46 tasks, of which 5 are solved via proximity modes absent from the original grammar and are therefore likely attributable to the grammar extension. Analysis of the ranking reveals that the top-ranked mode solves 59% of solvable tasks. A deliberate fallback mechanism compensates for the heuristic's imperfection by guaranteeing that a reliable base mode is always attempted second. Grammar extensions account for some improvement (5 out of 13 tasks likely solved exclusively by Julia). ...

Optimized Concept Learning in BEN

Better identifying when transformations apply in a program synthesis agent

Bachelor thesis (2026) - J.S. Duifs, D.Z. Zak, S. Dumančić, A. van Deursen

The Abstraction and Reasoning Corpus (ARC) challenges AI systems to induce general rules from as few as two to four examples. BEN is a program synthesis agent that tackles ARC through a Divide, Align, and Conquer pipeline inspired by analogical reasoning, but its concept learning phase, which synthesizes Boolean rules governing when transformations apply, must frequently choose between candidate rules of equal structural complexity. Because ARC's extreme data scarcity makes such ties highly prevalent, the system's coarse notion of complexity provides insufficient nuance to distinguish between candidates, leaving it vulnerable to overfitting.

We present three optimizations to BEN's concept learner, built on a reimplementation of BEN in Julia. These optimization provide better nuance in complexity at different stages of the pipeline, helping the system more accurately mimic human analogical reasoning. A \emph{Rule Generality} heuristic uses cross-transformation frequency analysis to discount broadly applicable features. A \emph{DNF Model Score} leverages cumulative solver objective values as a semantic tie-breaker between structurally equivalent rules. A \emph{Simple Minimal Transformation Coverage} heuristic extends the greedy set-cover procedure, with secondary criteria based on correspondence ambiguity and transformation complexity. An ablation study on 400 ARC-AGI-1 tasks shows these optimizations increase accuracy from 32 to 36 solved tasks with no regressions and negligible computational overhead. Analysis of rule complexity reveals that generated DNFs consistently require only a single clause, suggesting that the current performance bottleneck lies in the upstream pipeline rather than in concept complexity itself. ...

Efficient learning through programmatic representations

How can we better find the matches between input and output objects?

Bachelor thesis (2026) - A. Zaghăr, S. Dumančić, D.Z. Zak, A. van Deursen

The Abstraction and Reasoning Corpus (ARC) serves as a challenging benchmark designed to measure human-like artificial intelligence by only providing a few training examples for each task. Program synthesis offers a new approach to solving this benchmark by generating rule-based transformation programs. A recent approach, BEN, tackles these tasks by making use of program synthesis in a Divide, Align and Conquer strategy. The Align component identifies correspondences between the input and output objects by using a Structure Mapping Engine (SME) rooted in analogical reasoning. Despite its success, each individual part of the algorithm can be further improved, in particular the features of objects used in Align.

We present an enhanced object-matching methodology of the Align component to improve the quality of the correspondences found. First, the BEN algorithm is re-implemented in Julia, making use of an Answer Set Programming (ASP) solver using Clingo and Prolog for the SME to offer better efficiency. Second, we augment the structural representation of the objects by introducing features that capture the spatial relations between them, specifically through forms of ranked coordinates. Furthermore, we refine how multi-coloured objects are propositionally encoded. Finally, we introduce a weighting heuristic for the features: the significance of individual visual attributes is minimized when the input and output grids contain the same number of objects, and simple coordinates are eliminated when the input and output grids have different sizes.

The proposed changes were evaluated by isolating the Align component across subsets of the ARC-AGI-1 benchmark. In a sample of 25 manually selected tasks, the number of perfectly matched tasks improved significantly from 11 to 23. In a randomly selected sample of 25 tasks, the modifications give better or identical matches in 22 tasks, with only 3 showing degradations. When running the entire benchmark with the full BEN algorithm, one extra task is solved and two no longer are. Average times are similar and the number of searched transformations decreases. These findings suggest that including spatial relations and contextual weighting of the features improves the accuracy of finding correct correspondences for the ARC benchmark. ...

The Abstraction and Reasoning Corpus (ARC) serves as a challenging benchmark designed to measure human-like artificial intelligence by only providing a few training examples for each task. Program synthesis offers a new approach to solving this benchmark by generating rule-based transformation programs. A recent approach, BEN, tackles these tasks by making use of program synthesis in a Divide, Align and Conquer strategy. The Align component identifies correspondences between the input and output objects by using a Structure Mapping Engine (SME) rooted in analogical reasoning. Despite its success, each individual part of the algorithm can be further improved, in particular the features of objects used in Align.

We present an enhanced object-matching methodology of the Align component to improve the quality of the correspondences found. First, the BEN algorithm is re-implemented in Julia, making use of an Answer Set Programming (ASP) solver using Clingo and Prolog for the SME to offer better efficiency. Second, we augment the structural representation of the objects by introducing features that capture the spatial relations between them, specifically through forms of ranked coordinates. Furthermore, we refine how multi-coloured objects are propositionally encoded. Finally, we introduce a weighting heuristic for the features: the significance of individual visual attributes is minimized when the input and output grids contain the same number of objects, and simple coordinates are eliminated when the input and output grids have different sizes.

The proposed changes were evaluated by isolating the Align component across subsets of the ARC-AGI-1 benchmark. In a sample of 25 manually selected tasks, the number of perfectly matched tasks improved significantly from 11 to 23. In a randomly selected sample of 25 tasks, the modifications give better or identical matches in 22 tasks, with only 3 showing degradations. When running the entire benchmark with the full BEN algorithm, one extra task is solved and two no longer are. Average times are similar and the number of searched transformations decreases. These findings suggest that including spatial relations and contextual weighting of the features improves the accuracy of finding correct correspondences for the ARC benchmark.

Efficient learning through programmatic representations

Improving transformation search in BEN

Bachelor thesis (2026) - D. Condratov, D.Z. Zak, S. Dumančić, A. van Deursen

The Abstraction and Reasoning Corpus (ARC) challenges systems to recognise patterns from just a few input-output examples, which proves to be a setting where most current approaches struggle. Program synthesis systems like BEN tackle this by decomposing the images into objects rather than pixels and then searching for transformation programs that explain the examples. However, its reliance on exhaustive enumeration makes the transformation search computationally expensive and difficult to scale.

This project aims to improve the transformation search and, in doing so, asks whether correspondence-tailored grammar pruning can reduce the search space and improve the efficiency of BEN’s conquer step. BEN is reimplemented in Julia using the Herb.jl library. On top of that, a pruning strategy that exploits structural similarities between matched input-output object pairs is added. Both the baseline and the improved version are evaluated on the 400-task ARC training set.

The pruning reduces the average number of candidate programs evaluated by 77%, but it has a seemingly slight negative impact on the tasks that get solved. The number of correctly solved tasks decreases from 31 to 30. These results show that simple structural observations about the matched object pairs substantially reduce the search space. More informed search looks like a promising direction for improving program synthesis systems on ARC. ...

Anytime Program Synthesis for the BEN ARC Solver

Exploring heuristically-driven backtracking for the DA&C paradigm

Bachelor thesis (2026) - J.M. Florek, S. Dumančić, D.Z. Zak, A. van Deursen

Program synthesis for the Abstraction and Reasoning Corpus (ARC) remains challenging due to large search spaces, limited computational budgets, and the propagation of early synthesis decisions throughout the solver pipeline. This paper investigates whether internal solver metrics can guide retriggerable transformation search within the BEN Divide, Align, and Conquer (DA&C) architecture.To address this question, we introduce an anytime transformation-refinement mechanism that resumes synthesis from previously explored search frontiers, along with a threshold-based retriggering strategy driven by transformation overfitting and solver-state metrics. The proposed approach was implemented in a Julia-based reimplementation of BEN using the Herb.jl program synthesis framework and evaluated on the ARC benchmark.The experimental results show that retriggerable synthesis substantially changes the distribution of search effort across correspondences, enabling deeper exploration of highly ranked alignments while preserving exploration of lower-ranked alternatives. Compared to the baseline solver, the proposed method explores significantly fewer candidate transformations. It produces a different set of solved tasks, demonstrating that it redirects synthesis toward different regions of the search space. However, these changes do not translate into a large improvement in overall ARC solve rate, with both approaches achieving comparable aggregate performance.These findings suggest that transformation synthesis is not the primary bottleneck in the current BEN architecture. Instead, overall performance appears to be constrained by earlier stages of the pipeline, including segmentation, correspondence generation, and the expressiveness of the transformation language. While retriggerable anytime synthesis does not increase aggregate benchmark runtime performance, it provides a structured mechanism for adaptive search control. It lays the foundation for future work on stateful, dynamically allocated program synthesis. ...

Program synthesis for the Abstraction and Reasoning Corpus (ARC) remains challenging due to large search spaces, limited computational budgets, and the propagation of early synthesis decisions throughout the solver pipeline. This paper investigates whether internal solver metrics can guide retriggerable transformation search within the BEN Divide, Align, and Conquer (DA&C) architecture.To address this question, we introduce an anytime transformation-refinement mechanism that resumes synthesis from previously explored search frontiers, along with a threshold-based retriggering strategy driven by transformation overfitting and solver-state metrics. The proposed approach was implemented in a Julia-based reimplementation of BEN using the Herb.jl program synthesis framework and evaluated on the ARC benchmark.The experimental results show that retriggerable synthesis substantially changes the distribution of search effort across correspondences, enabling deeper exploration of highly ranked alignments while preserving exploration of lower-ranked alternatives. Compared to the baseline solver, the proposed method explores significantly fewer candidate transformations. It produces a different set of solved tasks, demonstrating that it redirects synthesis toward different regions of the search space. However, these changes do not translate into a large improvement in overall ARC solve rate, with both approaches achieving comparable aggregate performance.These findings suggest that transformation synthesis is not the primary bottleneck in the current BEN architecture. Instead, overall performance appears to be constrained by earlier stages of the pipeline, including segmentation, correspondence generation, and the expressiveness of the transformation language. While retriggerable anytime synthesis does not increase aggregate benchmark runtime performance, it provides a structured mechanism for adaptive search control. It lays the foundation for future work on stateful, dynamically allocated program synthesis.