CoCA: Extending GitHub Copilot with a Context-Aware Agentic Framework for Large and Domain-Specific Repositories at ASML
K. Hoxha (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Oguzhan Yildiz – Mentor (ASML)
B. Özkan – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
P.K. Murukannaiah – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Repository-level code generation remains difficult in industrial systems because tasks span multiple files, internal APIs, architectural conventions, tests, and quality constraints. We present CoCA (Copilot-Orchestrated Contextual Agents), an IDE-constrained framework, currently instantiated for Java repositories, that extends GitHub Copilot Chat with task decomposition, deterministic repository-context retrieval, optional Test-Driven Generation, and persistent domain-context injection. It targets enterprise settings where external embeddings, fine-tuning, and third-party LLM services are not permitted.
We evaluate CoCA at ASML using CoCABench, an internal benchmark suite focused on long-horizon tasks. It comprises 5 epics from 2 proprietary Java repositories with 44 developer-identified subtasks, ranging from a 2-day bug fix to 3-month feature work. On the LLM-judge metric with the strongest inter-rater reliability (Krippendorff's α=0.46), full CoCA is associated with higher ground-truth alignment than the single-agent baseline (0.44 versus 0.25). However, full CoCA achieves only 0.20 pass@1 despite 0.60 build@1, while the single-agent baseline achieves the highest pass@1.
These findings suggest that IDE-constrained agentic workflows can move generated implementations closer to the intended developer solution, but do not yet deliver reliable executable integration. CoCA is therefore best understood as a developer-in-the-loop assistance workflow rather than a fully autonomous implementation system or a replacement for direct Copilot prompting. It appears most appropriate for long, integration-heavy feature epics where planning, context continuity, and repository awareness are valuable. For small, localized fixes, the orchestration overhead may outweigh these gains.
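The gap between build@1 and pass@1 reported in the abstract can be made concrete with a minimal sketch. It assumes, since the abstract does not say so, that each epic contributes one generation attempt and that both metrics are simple per-epic fractions. The `EpicOutcome` type, the epic labels, and the outcome values are hypothetical, chosen only so that 3 of 5 attempts build and 1 of 5 passes its tests. They are not the thesis data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpicOutcome:
    """Outcome of a single generation attempt for one epic (hypothetical type)."""
    name: str     # illustrative label, not a real CoCABench epic name
    builds: bool  # the generated change compiles
    passes: bool  # the generated change builds and passes the tests


def fraction_true(outcomes, attribute):
    """Fraction of outcomes whose boolean attribute is True."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    return sum(1 for o in outcomes if getattr(o, attribute)) / len(outcomes)


def build_at_1(outcomes):
    """Share of first attempts that build."""
    return fraction_true(outcomes, "builds")


def pass_at_1(outcomes):
    """Share of first attempts that pass the tests."""
    return fraction_true(outcomes, "passes")


# Made-up outcomes for illustration only: 3 of 5 build, 1 of 5 passes.
ILLUSTRATIVE_OUTCOMES = [
    EpicOutcome("epic-a", builds=True, passes=True),
    EpicOutcome("epic-b", builds=True, passes=False),
    EpicOutcome("epic-c", builds=True, passes=False),
    EpicOutcome("epic-d", builds=False, passes=False),
    EpicOutcome("epic-e", builds=False, passes=False),
]

if __name__ == "__main__":
    print(f"build@1 = {build_at_1(ILLUSTRATIVE_OUTCOMES):.2f}")
    print(f"pass@1  = {pass_at_1(ILLUSTRATIVE_OUTCOMES):.2f}")
```

Under these assumptions, an attempt that compiles but fails its tests counts toward build@1 only, which is one way a workflow can reach 0.60 build@1 while remaining at 0.20 pass@1.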
Files
File under embargo until 01-01-2027