MLIR Profiling Framework for Compilation Optimization on Neural-Network Accelerator

Master Thesis (2025)
Author(s)

H. Wu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.S. Chakraborty – Mentor (TU Delft - Programming Languages)

R.K. Bishnoi – Mentor (TU Delft - Computer Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
17-10-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering, Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Conventional neural network hardware faces dual challenges: the diminishing returns of Moore’s Law and the intensifying constraints of the memory wall. In contrast, an emerging class of specialized hardware, compute-in-memory (CIM), is gaining attention because its in-memory computing capability inherently mitigates both challenges. However, unlike the conventional computing ecosystem, which benefits from mature toolchains and well-established methodologies, current CIM architectures still lack robust infrastructure for exploring optimization opportunities and pinpointing performance bottlenecks. To address this, this thesis proposes a two-tier static profiling framework that combines a Multi-Level Intermediate Representation (MLIR) profiler with a hardware-oriented profiler to identify instruction scale, memory pressure, and parallelism ceilings at low compilation overhead. It also introduces a weight-based convolutional adaptation method that addresses non-sequential memory access in convolution compilation, reducing latency and energy consumption at the cost of additional hardware overhead. Comparative experiments demonstrate that the optimized framework outperforms state-of-the-art conventional hardware in latency and energy consumption for small convolutional neural network models. This result presents an exploratory path for compilation optimization on CIM architectures.

Files

License info not available

File under embargo until 23-02-2027