Declarative Syntax Definition for Modern Language Workbenches


Programming languages are one of the key components of computer science, allowing programmers to control, define, and change the behaviour of computer systems. However, programming languages require considerable effort to design, implement, and maintain. Fortunately, declarative approaches can be used to define programming languages, facilitating their development and implementation.
Commonly, the first step in defining a programming language is specifying its syntax. Syntax definition formalisms are based on grammars, whose rules specify the words that belong to a language and how these words must be structured to form valid programs. Grammars are multipurpose: they provide an understandable source of documentation and can also be used to derive language implementations. Language workbenches help language engineers develop and prototype programming languages by deriving syntactic services from a syntax definition.
Many challenging problems still exist when using declarative syntax definitions in a language workbench. To enable truly declarative syntax definitions, parsers and other tools must support grammars in their natural form, i.e., they must be able to handle ambiguous grammars. However, parsers that support ambiguous grammars lack a clear semantics for disambiguation, which restricts their parsing performance and the languages they can successfully implement. Complementary to parsing, editor services such as pretty-printing and code completion often need to be implemented by hand, increasing the cost of maintaining and evolving a language.
Our goal is to use declarative syntax definitions to effectively define the syntax of programming languages and generate efficient tools. To address the above problems, we propose a new semantics for disambiguating context-free grammars, particularly the subset of grammars that define expressions. We study how often these ambiguities occur in real programs, showing the need for efficient disambiguation. Finally, we implement this semantics, generating a parser that performs disambiguation with near-zero performance overhead.
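To make the expression-ambiguity problem concrete, the following minimal sketch enumerates every parse tree of a small ambiguous expression grammar and then filters the trees with priority and left-associativity rules. The toy grammar, the `PRIORITY` table, and the function names are assumptions chosen for illustration; the sketch shows the general idea of declarative disambiguation, not the semantics or the parser developed in this work.

```python
# Hypothetical toy grammar (assumed for illustration):
#   E -> E "-" E | E "*" E | NUM
# Higher number means the operator binds tighter; both operators are
# assumed to be left-associative.
PRIORITY = {"*": 2, "-": 1}


def parses(tokens):
    """Enumerate every parse tree the ambiguous grammar allows."""
    tokens = tuple(tokens)
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions of the token sequence.
    for i in range(1, len(tokens), 2):
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def allowed(tree):
    """Reject trees with a priority or associativity conflict."""
    if not isinstance(tree, tuple):
        return True
    op, left, right = tree
    for child, is_right in ((left, False), (right, True)):
        if isinstance(child, tuple):
            child_op = child[0]
            # A lower-priority operator may not be a direct child
            # of a higher-priority one.
            if PRIORITY[child_op] < PRIORITY[op]:
                return False
            # Left-associativity: no equal-priority operator on the right.
            if is_right and PRIORITY[child_op] == PRIORITY[op]:
                return False
    return allowed(left) and allowed(right)


tokens = "1 - 2 * 3 - 4".split()
all_trees = parses(tokens)
valid_trees = [t for t in all_trees if allowed(t)]
```

For this input the grammar admits five trees, and the filter keeps only the one that reads as `(1 - (2 * 3)) - 4`. Enumerating and filtering afterwards is exponential in general, which is why performing disambiguation inside the parser matters for performance.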
Moreover, we develop a technique to automatically derive parsers and pretty printers for layout-sensitive languages from the syntax definition. By enabling the declarative specification of layout-sensitive languages, we tackle important issues, including usability, performance, and tool support, which prevent the adoption of these languages in tools such as language workbenches.
Finally, we propose a principled approach to derive syntactic code completion from the syntax definition. Current implementations of completion services are often ad hoc, unsound, and incomplete. By using a principled approach, we are able to reason about the soundness and completeness of code completion, opening up a path to richer editing services in language implementations.
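As a rough illustration of deriving completion proposals directly from a syntax definition, the sketch below expands a placeholder in a program using the productions of its syntactic sort. The `GRAMMAR` table, the `$Sort` placeholder notation, and the `proposals` function are hypothetical names introduced here, and the sketch makes no claim about how the actual completion service is implemented.

```python
# Hypothetical syntax definition: each sort maps to the templates of its
# productions. "$Sort" marks a placeholder for a subterm of that sort.
GRAMMAR = {
    "Stmt": ["if ($Exp) $Stmt", "while ($Exp) $Stmt", "$Exp;"],
    "Exp": ["$Exp + $Exp", "true", "false"],
}


def proposals(program, placeholder):
    """Return the programs obtained by expanding the first occurrence of
    the placeholder with each production of its sort."""
    if placeholder not in program:
        return []
    sort = placeholder.lstrip("$")
    return [
        program.replace(placeholder, template, 1)
        for template in GRAMMAR.get(sort, [])
    ]


completions = proposals("if ($Exp) $Stmt", "$Exp")
```

Because every proposal comes from a production of the placeholder's sort, each expansion is well formed by construction, and offering all productions of that sort means no valid expansion is missed. This is the kind of property that becomes straightforward to state once completion is derived from the grammar rather than written by hand.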