Designing a Programmable Stream Editor
Diomidis Spinellis (Athens University of Economics and Business, TU Delft - Electrical Engineering, Mathematics and Computer Science)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The Unix sed stream editor is a programmable text processing filter, first written in C in the 1970s.1 In the previous installment of this column, I described the tool and how I reimplemented it in Rust with some help from generative AI.2 Here, I describe the new implementation’s design to show how we can create a simple programmable tool. In the next “Adventures in Code” column, I’ll present optimizations that substantially increased the tool’s throughput.
What’s behind a programmable tool, such as Python, sed, or SQLite? It turns out that the key component is the data structures used to represent the code. As a student, I came across the equation “Algorithms + Data Structures = Programs.”3 I can still remember that at the time I knew what algorithms were, but data structures were to me a fuzzy concept of academic only interest. “Why would anyone need something more than the numeric and string arrays supported by the BASIC language?” I thought. Over the years, I’ve come to appreciate the value and significance of how we organize our data. Now, I believe that data structures are far more important than algorithms. First, in modern systems, bespoke sophisticated algorithms play only a minor, if any, role; data structures rule! Second, in many cases the algorithm’s choice is a clear-cut decision, whereas deciding on the data structure to use requires a deep understanding of context and a careful weighing of tradeoffs. Third, type systems have matured to offer us powerful practical aid in dealing with data; corresponding formal models for algorithms less so.
Consequently, the design of many systems should often start by considering how data will be structured: a database’s schema, key classes and their fields, types and operations on them.