Design and Implementation of Parallelized AWK

Master Thesis (2026)

Author(s)

I. Kravcevs (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D. Spinellis – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Static analysis Parallel Data aggregation Interpreters Parallel Computing

To reference this document use

https://resolver.tudelft.nl/uuid:608b74d6-0126-4673-8311-05c6d5340f57

More Info

Publication Year

2026

Language

English

Graduation Date

19-06-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The project presents the design and implementation of a system for automatically parallelizing AWK programs. AWK remains widely used for text processing and data transformation, and it ships as a standard utility on most Unix-like systems. Its execution model is traditionally sequential, which limits scalability on multi-core hardware. The goal of this work is to investigate whether static program analysis can identify AWK scripts that can safely be executed in parallel, and to integrate this capability into an AWK interpreter.
The proposed solution introduces a static analyzer that evaluates AWK programs based on variable dependencies, control flow, and other behaviors that affect data dependencies. The analyzer identifies reduction patterns for global variables and determines whether program semantics can be preserved under parallel execution. The interpreter then uses these results to run the script deterministically across multiple threads.
The project adopts the MapReduce programming model to enable parallel execution of AWK. The main processing phase of a script is treated as the map stage, where independent partitions of the input are processed concurrently by multiple workers. Intermediate thread-local results are then combined in a reduce stage using aggregation strategies derived from static analysis. This model provides a structured way to preserve AWK’s sequential semantics in the parallelized environment.
The implementation was evaluated on a dataset of real-world AWK scripts and through performance benchmarks on large text-processing workloads. The results show that a significant subset of AWK programs can be parallelized automatically, achieving execution speedups and state-of-the-art AWK performance.
The project provides a practical path for improving efficiency in text-processing workflows. This work also demonstrates that scripting languages can often benefit from modern parallel execution techniques, extending their practical relevance and performance in data-processing tasks.
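
The map and reduce stages described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: it assumes a hypothetical AWK script of the form `{ count[$1] += $2 }`, and the function names (`partition`, `map_chunk`, `reduce_sums`, `parallel_sum`) are invented for illustration. The sketch only shows the structure of the model; Python threads are used for brevity and are not expected to reproduce the speedups reported for the actual interpreter.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(lines, n_workers):
    """Split the input records into at most n_workers contiguous chunks."""
    size = max(1, -(-len(lines) // n_workers))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map stage: apply the per-record action to one chunk using thread-local state."""
    local = Counter()
    for record in chunk:
        fields = record.split()
        # Simplification (assumption): records with fewer than two fields are
        # skipped, whereas AWK would treat a missing $2 as 0.
        if len(fields) >= 2:
            local[fields[0]] += int(fields[1])  # count[$1] += $2
    return local


def reduce_sums(partials):
    """Reduce stage: '+=' is associative and commutative, so partial results merge by addition."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts key by key
    return total


def parallel_sum(lines, n_workers=4):
    """Run the map stage on independent chunks, then combine the partial results."""
    chunks = partition(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))  # results come back in chunk order
    return dict(reduce_sums(partials))


def sequential_sum(lines):
    """Reference result: the same action applied to the whole input in one pass."""
    return dict(map_chunk(lines))
```

Because addition does not depend on the order in which partial results are merged, `parallel_sum` returns the same mapping as `sequential_sum` for any number of workers. An update whose result depends on record order would need a different merge strategy or would have to remain sequential.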

Files

MscThesis_AWK_Kravcevs.pdf

(pdf | 0.374 MB)

License info not available