MLIR Profiling Framework for Compilation Optimization on Neural-Network Accelerator

Master Thesis (2025)
Author(s)

H. Wu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.S. Chakraborty – Mentor (TU Delft - Programming Languages)

R.K. Bishnoi – Mentor (TU Delft - Computer Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
17-10-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering, Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Conventional neural network hardware faces dual challenges: the diminishing returns of Moore’s Law and the intensifying constraints of the memory wall. In contrast, an emerging class of specialized hardware, compute-in-memory (CIM), is gaining attention because its in-memory computing capability inherently mitigates both challenges. However, unlike the conventional computing ecosystem, which benefits from mature toolchains and well-established methodologies, current CIM architectures still lack robust infrastructure for exploring optimization opportunities and pinpointing performance bottlenecks. To address this, this thesis proposes a two-tier static profiling framework that combines a Multi-Level Intermediate Representation (MLIR) profiler with a hardware-oriented profiler to identify instruction scale, memory pressure, and parallelism ceilings at low compilation overhead. It also introduces a weight-based convolutional adaptation method that addresses non-sequential memory access in convolution compilation, reducing latency and energy consumption at the cost of additional hardware overhead. Comparative experiments demonstrate that the optimized framework outperforms state-of-the-art conventional hardware in latency and energy consumption for small convolutional neural network models. This result presents an exploratory path for compilation optimization on CIM architectures.

Files

License info not available

File under embargo until 23-02-2027