S.T. Erdweg | TU Delft Repository

Compositional soundness proofs of abstract interpreters

Conference paper (2018) - Sven Keidel, Casper Poulsen, Sebastian Erdweg

Abstract interpretation is a technique for developing static analyses. Yet, proving abstract interpreters sound is challenging for interesting analyses, because of the high proof complexity and proof effort. To reduce complexity and effort, we propose a framework for abstract interpreters that makes their soundness proof compositional. Key to our approach is to capture the similarities between concrete and abstract interpreters in a single shared interpreter, parameterized over an arrow-based interface. In our framework, a soundness proof is reduced to proving reusable soundness lemmas over the concrete and abstract instances of this interface; the soundness of the overall interpreters follows from a generic theorem. To further reduce proof effort, we explore the relationship between soundness and parametricity. Parametricity not only provides us with useful guidelines for how to design non-leaky interfaces for shared interpreters, but also provides us soundness of shared pure functions as free theorems. We implemented our framework in Haskell and developed a k-CFA analysis for PCF and a tree-shape analysis for Stratego. We were able to prove both analyses sound compositionally with manageable complexity and effort, compared to a conventional soundness proof. ...

PIE

A Domain-Specific Language for Interactive Software Development Pipelines

Journal article (2018) - Gabriël Konat, Michael Steindorfer, Sebastian Erdweg, Eelco Visser

Context.
Software development pipelines are used for automating essential parts of software engineering processes, such as build automation and continuous integration testing. In particular, interactive pipelines, which process events in a live environment such as an IDE, require timely results for low-latency feedback, and persistence to retain low-latency feedback between restarts.
Inquiry.
Developing an incrementalized and persistent version of a pipeline is one way to reduce feedback latency, but requires implementation of dependency tracking, cache invalidation, and other complicated and error-prone techniques. Therefore, interactivity complicates pipeline development if timeliness and persistence become responsibilities of the pipeline programmer, rather than being supported by the underlying system. Systems for programming incremental and persistent pipelines exist, but do not focus on ease of development, requiring a high degree of boilerplate, increasing development and maintenance effort.
Approach.
We develop Pipelines for Interactive Environments (PIE), a Domain-Specific Language (DSL), API, and runtime for developing interactive software development pipelines, where ease of development is a focus. The PIE DSL is a statically typed and lexically scoped language. PIE programs are compiled to programs implementing the API, which the PIE runtime executes in an incremental and persistent way.
Knowledge.
PIE provides a straightforward programming model that enables direct and concise expression of pipelines without boilerplate, reducing the development and maintenance effort of pipelines. Compiled pipeline programs can be embedded into interactive environments such as code editors and IDEs, enabling timely feedback at a low cost.
Grounding.
Compared to the state of the art, PIE reduces the code required to express an interactive pipeline by a factor of 6 in a case study on syntax-aware editors. Furthermore, we evaluate PIE in two case studies of complex interactive software development scenarios, demonstrating that PIE can handle complex interactive pipelines in a straightforward and concise way.
Importance.
Interactive pipelines are complicated software artifacts that power many important systems such as continuous feedback cycles in IDEs and code editors, and live language development in language workbenches. New pipelines, and evolution of existing pipelines, is frequently necessary. Therefore, a system for easily developing and maintaining interactive pipelines, such as PIE, is important. ...

Context.
Software development pipelines are used for automating essential parts of software engineering processes, such as build automation and continuous integration testing. In particular, interactive pipelines, which process events in a live environment such as an IDE, require timely results for low-latency feedback, and persistence to retain low-latency feedback between restarts.
Inquiry.
Developing an incrementalized and persistent version of a pipeline is one way to reduce feedback latency, but requires implementation of dependency tracking, cache invalidation, and other complicated and error-prone techniques. Therefore, interactivity complicates pipeline development if timeliness and persistence become responsibilities of the pipeline programmer, rather than being supported by the underlying system. Systems for programming incremental and persistent pipelines exist, but do not focus on ease of development, requiring a high degree of boilerplate, increasing development and maintenance effort.
Approach.
We develop Pipelines for Interactive Environments (PIE), a Domain-Specific Language (DSL), API, and runtime for developing interactive software development pipelines, where ease of development is a focus. The PIE DSL is a statically typed and lexically scoped language. PIE programs are compiled to programs implementing the API, which the PIE runtime executes in an incremental and persistent way.
Knowledge.
PIE provides a straightforward programming model that enables direct and concise expression of pipelines without boilerplate, reducing the development and maintenance effort of pipelines. Compiled pipeline programs can be embedded into interactive environments such as code editors and IDEs, enabling timely feedback at a low cost.
Grounding.
Compared to the state of the art, PIE reduces the code required to express an interactive pipeline by a factor of 6 in a case study on syntax-aware editors. Furthermore, we evaluate PIE in two case studies of complex interactive software development scenarios, demonstrating that PIE can handle complex interactive pipelines in a straightforward and concise way.
Importance.
Interactive pipelines are complicated software artifacts that power many important systems such as continuous feedback cycles in IDEs and code editors, and live language development in language workbenches. New pipelines, and evolution of existing pipelines, is frequently necessary. Therefore, a system for easily developing and maintaining interactive pipelines, such as PIE, is important.

Declarative specification of indentation rules

A tooling perspective on parsing and pretty-printing layout-sensitive languages

Conference paper (2018) - Luís Eduardo de Souza Amorim, Sebastian Erdweg, Michael J. Steindorfer, Eelco Visser

In layout-sensitive languages, the indentation of an expression or statement can influence how a program is parsed. While some of these languages (e.g., Haskell and Python) have been widely adopted, there is little support for software language engineers in building tools for layout-sensitive languages. As a result, parsers, pretty-printers, program anal-yses, and refactoring tools often need to be handwritten, which decreases the maintainability and extensibility of these tools. Even state-of-the-art language workbenches have little support for layout-sensitive languages, restricting the development and prototyping of such languages. In this paper, we introduce a novel approach to declarative specification of layout-sensitive languages using layout declarations. Layout declarations are high-level specifications of indentation rules that abstract from low-level technicalities. We show how to derive an efficient layout-sensitive generalized parser and a corresponding pretty-printer automatically from a language specification with layout declarations. We validate our approach in a case-study using a syntax definition for the Haskell programming language, investigating the performance of the generated parser and the correctness of the generated pretty-printer against 22191 Haskell files. ...

System Description

An Infrastructure for Combining Domain Knowledge with Automated Theorem Provers

Conference paper (2018) - Sylvia Grewe, Sebastian Erdweg, André Pacak, Mira Mezini

Computer science has seen much progress in the area of automated verification in the last decades. Yet, there are many domains where abstract strategies for verifying standard properties are well-understood by domain experts, but still not automated to a satisfactory degree. One example for such a domain are type soundness proofs. Being able to express domain-specific verification strategies using domain-specific terminology and concepts can help to narrow down this gap toward more automated verification. We present the requirements, design, and implementation of a configurable verification infrastructure that allows for expressing domain knowledge about proofs and for interfacing with existing automated theorem provers and solvers to verify individual proof steps. As an application scenario for our infrastructure, we present the development of a standard type soundness proof for a typed subset of SQL. ...

Incrementalizing Lattice-Based Program Analyses in Datalog

Journal article (2018) - Tamas Szabo, Gábor Bergmann, Sebastian Erdweg, Markus Voelter

Program analyses detect errors in code, but when code changes frequently as in an IDE, repeated re-analysis from-scratch is unnecessary: It leads to poor performance unless we give up on precision and recall. Incremental program analysis promises to deliver fast feedback without giving up on precision or recall by deriving a new analysis result from the previous one. However, Datalog and other existing frameworks for incremental program analysis are limited in expressive power: They only support the powerset lattice as representation of analysis results, whereas many practically relevant analyses require custom lattices and aggregation over lattice values. To this end, we present a novel algorithm called DRedL that supports incremental maintenance of recursive lattice-value aggregation in Datalog. The key insight of DRedL is to dynamically recognize increasing replacements of old lattice values by new ones, which allows us to avoid the expensive deletion of the old value. We integrate DRedL into the analysis framework IncA and use IncA to realize incremental implementations of strong-update points-to analysis and string analysis for Java. As our performance evaluation demonstrates, both analyses react to code changes within milliseconds. ...

Incremental Overload Resolution in Object-Oriented Programming Languages

Conference paper (2018) - Tamas Szabo, Edlira Kuci, Matthijs Bijman, Mira Mezini, Sebastian Erdweg

Object-oriented programming languages feature static and dynamic overloading: Multiple methods share the same name but provide different implementations. Dynamic overloading (also know as dynamic dispatch) is resolved at run time based on the type of the receiver object. In this paper, we focus on static overloading, which is resolved at compile time based on the types of the method arguments. The challenge this paper addresses is to incrementalize static overload resolution in IDEs. IDEs resolve overloaded methods for the developer to help them discern which implementation a method call refers to. However, as the code changes, the IDE has to reconsider previously resolved method calls when they are affected by the code change. This paper clarifies when a method call is affected by a code change and how to re-resolve method calls with minimal computational effort. To this end, we explore and compare two approaches to incremental type checking: co-contextual type checking and IncA. ...

Scalable Incremental Building with Dynamic Task Dependencies

Conference paper (2018) - Gabriël Konat, Sebastian Erdweg, Eelco Visser

Incremental build systems are essential for fast, reproducible software builds. Incremental build systems enable short feedback cycles when they capture dependencies precisely and selectively execute build tasks efficiently. A much overlooked feature of build systems is the expressiveness of the scripting language, which directly influences the maintainability of build scripts. In this paper, we present a new incremental build algorithm that allows build engineers to use a full-fledged programming language with explicit task invocation, value and file inspection facilities, and conditional and iterative language constructs. In contrast to prior work on incrementality for such programmable builds, our algorithm scales with the number of tasks affected by a change and is independent of the size of the software project being built. Specifically, our algorithm accepts a set of changed files, transitively detects and re-executes affected build tasks, but also accounts for new task dependencies discovered during building. We have evaluated the performance of our algorithm in a real-world case study and confirm its scalability. ...

Exploration of language specifications by compilation to first-order logic

Journal article (2018) - Sylvia Grewe, Sebastian Erdweg, André Pacak, Michael Raulf, Mira Mezini

Exploration of language specifications helps to discover errors and inconsistencies early during the development of a programming language. We propose exploration of language specifications via application of existing automated first-order theorem provers (ATPs). To this end, we translate language specifications and exploration tasks to first-order logic, which many ATPs accept as input. However, there are several different strategies for compiling a language specification to first-order logic, and even small variations in the translation may have a large impact on the time it takes ATPs to find proofs. In this paper, we first present a systematic empirical study on how to best compile language specifications to first-order logic such that existing ATPs can solve typical exploration tasks efficiently. We have developed a compiler product line that implements 36 different compilation strategies and used it to feed language specifications to 4 existing first-order theorem provers. As benchmarks, we developed language specifications for typed SQL and for a Questionnaire Language (QL), with 50 exploration goals each. Our study empirically confirms that the choice of a compilation strategy greatly influences prover performance in general and shows which strategies are advantageous for prover performance. Second, we extend our empirical study with 4 domain-specific strategies for axiom selection and find that axiom selection does not influence prover performance in our benchmark specifications. ...

Versatile Event Correlation with Algebraic Effects

Journal article (2018) - Oliver Bračevac, Nada Amin, Guido Salvaneschi, Sebastian Erdweg, Patrick Eugster, Mira Mezini

We present the first language design to uniformly express variants of n-way joins over asynchronous event streams from different domains, e.g., stream-relational algebra, event processing, reactive and concurrent programming. We model asynchronous reactive programs and joins in direct style, on top of algebraic effects and handlers. Effect handlers act as modular interpreters of event notifications, enabling fine-grained control abstractions and customizable event matching. Join variants can be considered as cartesian product computations with ”degenerate” control flow, such that unnecessary tuples are not materialized a priori. Based on this computational interpretation, we decompose joins into a generic, naive enumeration procedure of the cartesian product, plus variant-specific extensions, represented in terms of user-supplied effect handlers. Our microbenchmarks validate that this extensible design avoids needless materialization. Alongside a formal semantics for joining and prototypes in Koka and multicore OCaml, we contribute a systematic comparison of the covered domains and features.
...

A Module-System Discipline for Model-Driven Software Development

Journal article (2017) - Sebastian Erdweg, Klaus Ostermann

Model-driven development is a pragmatic approach to software development that embraces domain-specific languages (DSLs), where models correspond to DSL programs. A distinguishing feature of model-driven development is that clients of a model can select from an open set of alternative semantics of the model by applying different model transformation. However, in existing model-driven frameworks, dependencies between models, model transformations, and generated code artifacts are either implicit or globally declared in build scripts, which impedes modular reasoning, separate compilation, and programmability in general.
We propose the design of a new module system that incorporates models and model transformations as modules. A programmer can apply transformations in import statements, thus declaring a dependency on generated code artifacts. Our design enables modular reasoning and separate compilation by preventing hidden dependencies, and it supports mixing modeling artifacts with conventional code artifacts as well as higher-order transformations. We have formalized our design and the aforementioned properties and have validated it by an implementation and case studies that show that our module system successfully integrates model-driven development into conventional programming languages. ...

Toward Abstract Interpretation of Program Transformations

Conference paper (2017) - Sven Keidel, Sebastian Erdweg

Developers of program transformations often reason about transformations to assert certain properties of the generated code. We propose to apply abstract interpretation to program transformations in order to automate and support such reasoning. In this paper, we present work in progress on the development and application of an abstract interpreter for the program transformation language Stratego. In particular, we present challenges encountered during the development of the abstract Stratego interpreter and how we intend to solve these challenges. ...

A Co-contextual Type Checker for Featherweight Java

Conference paper (2017) - Edlira Kuci, Sebastian Erdweg, Oliver Bračevac, Andi Bejleri, Mira Mezini

This paper addresses compositional and incremental type checking for object-oriented programming languages. Recent work achieved incremental type checking for structurally typed functional languages through co-contextual typing rules, a constraint-based formulation that removes any context dependency for expression typings. However, that work does not cover key features of object-oriented languages: Subtype polymorphism, nominal typing, and implementation inheritance. Type checkers encode these features in the form of class tables, an additional form of typing context inhibiting incrementalization. In the present work, we demonstrate that an appropriate co-contextual notion to class tables exists, paving the way to efficient incremental type checkers for object-oriented languages. This yields a novel formulation of Igarashi et al.'s Featherweight Java (FJ) type system, where we replace class tables by the dual concept of class table requirements and class table operations by dual operations on class table requirements. We prove the equivalence of FJ's type system and our co-contextual formulation. Based on our formulation, we implemented an incremental FJ type checker and compared its performance against javac on a number of realistic example programs. ...

IncA

A DSL for the Definition of Incremental Program Analyses

Conference paper (2016) - Tamás Szabó, Sebastian Erdweg, Markus Voelter

Program analyses support software developers, for example, through error detection, code-quality assurance, and by enabling compiler optimizations and refactorings. To provide real-time feedback to developers within IDEs, an analysis must run efficiently even if the analyzed code base is large.

To achieve this goal, we present a domain-specific language called IncA for the definition of efficient incremental program analyses that update their result as the program changes. IncA compiles analyses into graph patterns and relies on existing incremental matching algorithms. To scale IncA analyses to large programs, we describe optimizations that reduce caching and prune change propagation. Using IncA, we have developed incremental control flow and points-to analysis for C, well-formedness checks for DSLs, and 10 FindBugs checks for Java. Our evaluation demonstrates significant speedups for all analyses compared to their non-incremental counterparts. ...

An Extensible Framework for Variable-Precision Data-flow Analyses in MPS

Conference paper (2016) - Tamás Szabó, Simon Alperovich, Markus Voelter, Sebastian Erdweg

Data-flow analyses are used as part of many software engineering tasks: they are the foundations of program under- standing, refactorings and optimized code generation. Similar to general-purpose languages (GPLs), state-of-the-art domain-specific languages (DSLs) also require sophisticated data-flow analyses. However, as a consequence of the different economies of DSL development and their typically relatively fast evolution, the effort for developing and evolving such analyses must be lowered compared to GPLs. This tension can be resolved with dedicated support for data-flow analyses in language workbenches. In this tool paper we present MPS-DF, which is the component in the MPS language workbench that supports the definition of data-flow analyses for DSLs. Language developers can define data-flow graph builders declaratively as part of a language definition and compute analysis results efficiently based on these data-flow graphs. MPS-DF is extensible such that it does not compromise the support for language composition in MPS. Additionally, clients of MPS-DF analyses can run the analyses with variable precision thus trading off precision for performance. This allows clients to tailor an analysis to a particular use case. ...

CPL

A Core Language for Cloud Computing

Conference paper (2016) - Oliver Bračevac, Sebastian Erdweg, Guido Salvaneschi, Mira Mezini

Running distributed applications in the cloud involves deployment. That is, distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks resulted in the emergence of declarative JSON-based domain-specific deployment languages to develop deployment programs. However, existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not provide extension points for custom implementations of existing cloud services such as application-specific load balancing policies. To address these shortcomings, we propose CPL (Cloud Platform Language), a statically-typed core language for programming both distributed applications as well as their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We provide a formal semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance. ...

Principled Syntactic Code Completion using Placeholders

Conference paper (2016) - Luís Eduardo De Souza Amorim, Sebastian Erdweg, Guido Wachsmuth, Eelco Visser

Principled syntactic code completion enables developers to change source code by inserting code templates, thus increasing developer efficiency and supporting language exploration. However, existing code completion systems are ad-hoc and neither complete nor sound. They are not complete and only provide few code templates for selected programming languages. They also are not sound and propose code templates that yield invalid programs when inserted. This paper presents a generic framework that automatically derives complete and sound syntactic code completion from the syntax definition of arbitrary languages. A key insight of our work is to provide an explicit syntactic representation for incomplete programs using placeholders. This enables us to address the following challenges for code completion separately: (i) completing incomplete programs by replacing placeholders with code templates, (ii) injecting placeholders into complete programs to make them incomplete, and (iii) introducing lexemes and placeholders into incorrect programs through error-recovery parsing to make them correct so we can apply one of the previous strategies. We formalize our framework and provide an implementation in Spoofax. ...

Automating Proof Steps of Progress Proofs

Comparing Vampire and Dafny

Conference paper (2016) - Sylvia Grewe, Sebastian Erdweg, Mira Mezini

Developing provably sound type systems is a non-trivial task which, as of today, typically requires expert skills in formal methods and a considerable amount of time. Our Veritas [3] project aims at providing support for the development of soundness proofs of type systems and efficient type checker implementations from type system specifications. To this end, we investigate how to best automate typical steps within type soundness proofs. In this paper, we focus on progress proofs for type systems of domain-specific languages. As a running example for such a type system, we model a subset SQL and augment it with a type system. We compare two different approaches for automating proof steps of the progress proofs for this type system against each other: firstly, our own tool Veritas, which translates proof goals and specifications automatically to TPTP [13] and calls Vampire [8] on them, and secondly, the programming language Dafny [6], which translates proof goals and specifications to the intermediate verification language Boogie 2 [5] and calls the SMT solver Z3 [9] on them. We find that Vampire and Dafny are equally well-suited for automatically proving simple steps within progress proofs. ...

Sound Type-Dependent Syntactic Language Extension

Journal article (2016) - Florian Lorenzen, Sebastian Erdweg

Syntactic language extensions can introduce new facilities into a programming language while requiring little implementation effort and modest changes to the compiler. It is typical to desugar language extensions in a distinguished compiler phase after parsing or type checking, not affecting any of the later compiler phases. If desugaring happens before type checking, the desugaring cannot depend on typing information and type errors are reported in terms of the generated code. If desugaring happens after type checking, the code generated by the desugaring is not type checked and may introduce vulnerabilities. Both options are undesirable. We propose a system for syntactic extensibility where desugaring happens after type checking and desugarings are guaranteed to only generate well-typed code. A major novelty of our work is that desugarings operate on typing derivations instead of plain syntax trees. This provides desugarings access to typing information and forms the basis for the soundness guarantee we provide, namely that a desugaring generates a valid typing derivation. We have implemented our system for syntactic extensibility in a language-independent fashion and instantiated it for a substantial subset of Java, including generics and inheritance. We provide a sound Java extension for Scala-like for-comprehensions. ...

Exploration of Language Specifications by Compilation to First-order Logic

Conference paper (2016) - Sylvia Grewe, Sebastian Erdweg, Michael Raulf, Mira Mezini

Exploration of language specifications helps to discover errors and inconsistencies early during the development of a programming language. We propose exploration of language specifications via application of existing automated first-order theorem provers (ATPs). To this end, we translate language specifications and exploration tasks to first-order logic, which many ATPs accept as input. However, there are several different strategies for compiling a language specification to first-order logic, and even small variations in the translation may have a large impact on the time it takes ATPs to find proofs. In this paper, we present a systematic empirical study on how to best compile language specifications to first-order logic such that existing ATPs can solve typical exploration tasks efficiently. We have developed a compiler product line that implements 36 different compilation strategies and used it to feed language specifications to 4 existing first-order theorem provers. As a benchmark, we developed a language specification for typed SQL with 50 exploration goals. Our study empirically confirms that the choice of a compilation strategy in general greatly influences prover performance and shows which strategies are advantageous for prover performance. ...

Efficient Development of Consistent Projectional Editors Using Grammar Cells

Conference paper (2016) - Markus Voelter, Tamás Szabó, Sascha Lisson, Bernd Kolb, Sebastian Erdweg, Thorsten Berger

The definition of a projectional editor does not just specify the notation of a language, but also how users interact with the notation. Because of that it is easy to end up with different interaction styles within one and between multiple languages. The resulting inconsistencies have proven to be a major usability problem. To address this problem, we introduce grammar cells, an approach for declaratively specifying textual notations and their interactions for projectional editors. In the paper we motivate the problem, give a formal definition of grammar cells, and define their mapping to low-level editor behaviors. Our evaluation based on project experience shows that grammar cells improve editing experience by providing a consistent and intuitive ``text editor-like'' user experience for textual notations. At the same time they do not limit language composability and the use of non-textual notations, the primary benefits of projectional editors. We have implemented grammar cells for Jetbrains MPS, but they can also be used with other projectional editors. ...