J. Smits | TU Delft Repository

A Multi-Stage Proof Logging Framework to Certify the Correctness of CP Solvers

Conference paper (2024) - Maarten Flippo, Konstantin Sidorov, Imko Marijnissen, Jeff Smits, Emir Demirović

Proof logging is used to increase trust in the optimality and unsatisfiability claims of solvers. However, to this date, no constraint programming solver can practically produce proofs without significantly impacting performance, which hinders mainstream adoption. We address this issue by introducing a novel proof generation framework, together with a CP proof format and proof checker. Our approach is to divide the proof generation into three steps. At runtime, we require the CP solver to only produce a proof sketch, which we call a scaffold. After the solving is done, our proof processor trims and expands the scaffold into a full CP proof, which is subsequently verified. Our framework is agnostic to the solver and the verification approach. Through MiniZinc benchmarks, we demonstrate that with our framework, the overhead of logging during solving is often less than 10%, significantly lower than other approaches, and that our proof processing step can reduce the overall size of the proof by orders of magnitude and by extension the proof checking time. Our results demonstrate that proof logging has the potential to become an integral part of the CP community. ...

Strategic Language Workbench Improvements

Doctoral thesis (2023) - J. Smits

Computers execute software to do the tasks we expect from them. This software is written by human beings, we call this programming. The most common way to program is by writing text in a programming language. A programming language is very structured so we can be precise, but ultimately these languages are still for humans to read and write. In order to execute the written program, we need to translate it to a list of tiny instruction steps that the hardware of the computer can execute. This translation is also automated with software. The most common forms this software takes is (1) interpreters that execute a program live as they read it, or (2) compilers that translate the entire program for later execution. Interpreters and compilers are tools of the domain of Programming Languages (PL). Apart from interpreters and compilers, there is more support software available around programming languages. This includes smart text editors, program analysis, running-program observers, etc. The requirements for PL tools are high: they should not get in the way when used to create software. In particular, they should support useful features, be fast enough in interaction, and not make mistakes. Given these requirements, it is not a simple task to make PL tools. In an effort to make it easier to create PL tools, Language Workbenches (LWBs) were created: a suite of tools specifically for creating PL tools. In this dissertation, you can find several improvements I made to a particular language workbench. I have—in multiple ways—sped up the language development cycle in this workbench: in terms of improved development, feedback, and execution speed. Throughout my research, I have worked on and in the Spoofax language workbench, a research language workbench used for programming language research at TU Delft. Spoofax splits up the specification of programming languages into different domains, and captures each of those domains in a meta-language. For example, to describe the structure of the text of a programming language, Spoofax uses a formalism based on context-free grammars, extended with different useful features, which is called the Syntax Definition Formalism 3 or SDF3 for short. Similarly, there are meta-languages for the description of names, references and types; for what it means to execute a program; for defining assumptions and behaviour by example for testing purposes; and for transforming programs, which is a catch-all, but still a fairly high-level language. This language for transforming programs, called Stratego, is particularly relevant to this dissertation. Contributions. Firstly, we introduce a new meta-language specialised in control- and data-flow analysis: FlowSpec. FlowSpec improves the development speed of programming languages in Spoofax, and the feedback in Spoofax and in the PL tools generated by Spoofax.Secondly, we improve the compilation speed of Stratego on successive compilations with an incremental compiler. This compiler improves the speed at which you receive feedback inside Spoofax on changes to a Stratego program, and the speed at which you can see the results of tests and other short program executions after a change. Thirdly, we add a gradual type system to Stratego to improve the feedback that can be given without executing Stratego programs. A gradual type system does not require a user of Stratego to add types to their program, but if they choose to, the gradual type system will be able to reason about the parts of the program that are typed, and give certain errors at compilation time instead of run time. Finally, we develop a pattern matching optimisation that work for Stratego’s pattern matching. This improves the execution speed of Stratego programs. Since all PL tools created in Spoofax include at least some of those Stratego programs, this also speeds up the execution of the Spoofax meta-languages themselves. ...

Computers execute software to do the tasks we expect from them. This software is written by human beings, we call this programming. The most common way to program is by writing text in a programming language. A programming language is very structured so we can be precise, but ultimately these languages are still for humans to read and write. In order to execute the written program, we need to translate it to a list of tiny instruction steps that the hardware of the computer can execute. This translation is also automated with software. The most common forms this software takes is (1) interpreters that execute a program live as they read it, or (2) compilers that translate the entire program for later execution. Interpreters and compilers are tools of the domain of Programming Languages (PL). Apart from interpreters and compilers, there is more support software available around programming languages. This includes smart text editors, program analysis, running-program observers, etc. The requirements for PL tools are high: they should not get in the way when used to create software. In particular, they should support useful features, be fast enough in interaction, and not make mistakes. Given these requirements, it is not a simple task to make PL tools. In an effort to make it easier to create PL tools, Language Workbenches (LWBs) were created: a suite of tools specifically for creating PL tools. In this dissertation, you can find several improvements I made to a particular language workbench. I have—in multiple ways—sped up the language development cycle in this workbench: in terms of improved development, feedback, and execution speed. Throughout my research, I have worked on and in the Spoofax language workbench, a research language workbench used for programming language research at TU Delft. Spoofax splits up the specification of programming languages into different domains, and captures each of those domains in a meta-language. For example, to describe the structure of the text of a programming language, Spoofax uses a formalism based on context-free grammars, extended with different useful features, which is called the Syntax Definition Formalism 3 or SDF3 for short. Similarly, there are meta-languages for the description of names, references and types; for what it means to execute a program; for defining assumptions and behaviour by example for testing purposes; and for transforming programs, which is a catch-all, but still a fairly high-level language. This language for transforming programs, called Stratego, is particularly relevant to this dissertation. Contributions. Firstly, we introduce a new meta-language specialised in control- and data-flow analysis: FlowSpec. FlowSpec improves the development speed of programming languages in Spoofax, and the feedback in Spoofax and in the PL tools generated by Spoofax.Secondly, we improve the compilation speed of Stratego on successive compilations with an incremental compiler. This compiler improves the speed at which you receive feedback inside Spoofax on changes to a Stratego program, and the speed at which you can see the results of tests and other short program executions after a change. Thirdly, we add a gradual type system to Stratego to improve the feedback that can be given without executing Stratego programs. A gradual type system does not require a user of Stratego to add types to their program, but if they choose to, the gradual type system will be able to reason about the parts of the program that are typed, and give certain errors at compilation time instead of run time. Finally, we develop a pattern matching optimisation that work for Stratego’s pattern matching. This improves the execution speed of Stratego programs. Since all PL tools created in Spoofax include at least some of those Stratego programs, this also speeds up the execution of the Spoofax meta-languages themselves.

Optimising First-Class Pattern Matching

Conference paper (2022) - J. Smits, Toine Hartman, Jesper Cockx

Pattern matching is a high-level notation for programs to analyse the shape of data, and can be optimised to efficient low-level instructions. The Stratego language uses first-class pattern matching, a powerful form of pattern matching that traditional optimisation techniques do not apply to directly.
In this paper, we investigate how to optimise programs that use first-class pattern matching. Concretely, we show how to map first-class pattern matching to a form close to traditional pattern matching, on which standard optimisations can be applied.
Through benchmarks, we demonstrate the positive effect of these optimisations on the run-time performance of Stratego programs. We conclude that the expressive power of first-class pattern matching does not hamper the optimisation potential of a language that features it. ...

Constructing Hybrid Incremental Compilers for Cross-Module Extensibility with an Internal Build System

Journal article (2020) - J. Smits, G.D.P. Konat, Eelco Visser

Context Compilation time is an important factor in the adaptability of a software project. Fast recompilation enables cheap experimentation with changes to a project, as those changes can be tested quickly. Separate and incremental compilation has been a topic of interest for a long time to facilitate fast recompilation. Inquiry Despite the benefits of an incremental compiler, such compilers are usually not the default. This is because incrementalization requires cross-cutting, complicated, and error-prone techniques such as dependency tracking, caching, cache invalidation, and change detection. Especially in compilers for languages with cross-module definitions and integration, correctly and efficiently implementing an incremental compiler can be a challenge. Retrofitting incrementality into a compiler is even harder. We address this problem by developing a compiler design approach that reuses parts of an existing non-incremental compiler to lower the cost of building an incremental compiler. It also gives an intuition into compiling difficult-to-incrementalize language features through staging. Approach We use the compiler design approach presented in this paper to develop an incremental compiler for the Stratego term-rewriting language. This language has a set of features that at first glance look incompatible with incremental compilation. Therefore, we treat Stratego as our critical case to demonstrate the approach on. We show how this approach decomposes the original compiler and has a solution to compile Stratego incrementally. The key idea on which we build our incremental compiler is to internally use an incremental build system to wire together the components we extract from the original compiler. Knowledge The resulting compiler is already in use as a replacement of the original whole-program compiler. We find that the incremental build system inside the compiler is a crucial component of our approach. This allows a compiler writer to think in multiple steps of compilation, and combine that into an incremental compiler almost effortlessly. Normally, separate compilation à la C is facilitated by an external build system, where the programmer is responsible for managing dependencies between files. We reuse an existing sound and optimal incremental build system, and integrate its dependency tracking into the compiler. Grounding The incremental compiler for Stratego is available as an artefact along with this article. We evaluate it on a large Stratego project to test its performance. The benchmark replays edits to the Stratego project from version control. These benchmarks are part of the artefact, packaged as a virtual machine image for easy reproducibility. Importance Although we demonstrate our design approach on the Stratego programming language, we also describe it generally throughout this paper. Many currently used programming languages have a compiler that is much slower than necessary. Our design provides an approach to change this, by reusing an existing compiler and making it incremental within a reasonable amount of time. ...

Context Compilation time is an important factor in the adaptability of a software project. Fast recompilation enables cheap experimentation with changes to a project, as those changes can be tested quickly. Separate and incremental compilation has been a topic of interest for a long time to facilitate fast recompilation. Inquiry Despite the benefits of an incremental compiler, such compilers are usually not the default. This is because incrementalization requires cross-cutting, complicated, and error-prone techniques such as dependency tracking, caching, cache invalidation, and change detection. Especially in compilers for languages with cross-module definitions and integration, correctly and efficiently implementing an incremental compiler can be a challenge. Retrofitting incrementality into a compiler is even harder. We address this problem by developing a compiler design approach that reuses parts of an existing non-incremental compiler to lower the cost of building an incremental compiler. It also gives an intuition into compiling difficult-to-incrementalize language features through staging. Approach We use the compiler design approach presented in this paper to develop an incremental compiler for the Stratego term-rewriting language. This language has a set of features that at first glance look incompatible with incremental compilation. Therefore, we treat Stratego as our critical case to demonstrate the approach on. We show how this approach decomposes the original compiler and has a solution to compile Stratego incrementally. The key idea on which we build our incremental compiler is to internally use an incremental build system to wire together the components we extract from the original compiler. Knowledge The resulting compiler is already in use as a replacement of the original whole-program compiler. We find that the incremental build system inside the compiler is a crucial component of our approach. This allows a compiler writer to think in multiple steps of compilation, and combine that into an incremental compiler almost effortlessly. Normally, separate compilation à la C is facilitated by an external build system, where the programmer is responsible for managing dependencies between files. We reuse an existing sound and optimal incremental build system, and integrate its dependency tracking into the compiler. Grounding The incremental compiler for Stratego is available as an artefact along with this article. We evaluate it on a large Stratego project to test its performance. The benchmark replays edits to the Stratego project from version control. These benchmarks are part of the artefact, packaged as a virtual machine image for easy reproducibility. Importance Although we demonstrate our design approach on the Stratego programming language, we also describe it generally throughout this paper. Many currently used programming languages have a compiler that is much slower than necessary. Our design provides an approach to change this, by reusing an existing compiler and making it incremental within a reasonable amount of time.

Gradually typing strategies

Conference paper (2020) - J. Smits, Eelco Visser

The Stratego language supports program transformation by means of term rewriting with programmable rewriting strategies. Stratego's traversal primitives support concise definition of generic tree traversals. Stratego is a dynamically typed language because its features cannot be captured fully by a static type system. While dynamic typing makes for a flexible programming model, it also leads to unintended type errors, code that is harder to maintain, and missed opportunities for optimization. In this paper, we introduce a gradual type system for Stratego that combines the flexibility of dynamically typed generic programming, where needed, with the safety of statically declared and enforced types, where possible. To make sure that statically typed code cannot go wrong, all access to statically typed code from dynamically typed code is protected by dynamic type checks (casts). The type system is backwards compatible such that types can be introduced incrementally to existing Stratego programs. We formally define a type system for Core Gradual Stratego, discuss its implementation in a new type checker for Stratego, and present an evaluation of its impact on Stratego programs. ...

FlowSpec

A declarative specification language for intra-procedural flow-Sensitive data-flow analysis

Journal article (2020) - Jeff Smits, Guido Wachsmuth, Eelco Visser

Data-flow analysis is the static analysis of programs to estimate their approximate run-time behavior or approximate intermediate run-time values. It is an integral part of modern language specifications and compilers. In the specification of static semantics of programming languages, the concept of data-flow allows the description of well-formedness such as definite assignment of a local variable before its first use. In the implementation of compiler back-ends, data-flow analyses inform optimizations. Data-flow analysis has an established theoretical foundation. What lags behind is implementations of data-flow analysis in compilers, which are usually ad-hoc. This makes such implementations difficult to extend and maintain. In previous work researchers have proposed higher-level formalisms suitable for whole-program analysis in a separate tool, incremental analysis within editors, or bound to a specific intermediate representation. In this paper, we present FlowSpec, an executable formalism for specification of data-flow analysis. FlowSpec is a domain-specific language that enables direct and concise specification of data-flow analysis for programming languages, designed to express flow-sensitive, intra-procedural analyses. We define the formal semantics of FlowSpec in terms of monotone frameworks. We describe the design of FlowSpec using examples of standard analyses. We also include a description of our implementation of FlowSpec. In a case study we evaluate FlowSpec with the static analyses for Green-Marl, a domain-specific programming language for graph analytics. ...

FlowSpec

Declarative Dataflow Analysis Specification

Conference paper (2017) - Jeff Smits, Eelco Visser

We present FlowSpec, a declarative specification language for the domain of dataflow analysis. FlowSpec has declarative support for the specification of control flow graphs of programming languages, and dataflow analyses on these control flow graphs. We define the formal semantics of FlowSpec, which is rooted in Monotone Frameworks. We also discuss a prototype implementation of the language, built in the Spoofax Language Workbench. Finally, we evaluate the expressiveness and conciseness of the language with two case studies. These case studies are performed using Green-Marl, an industrial, domain-specific language for graph processing. The first case study is a classical dataflow analysis, scaled to this full language. The second case study is a domain-specific analysis of Green-Marl. ...