Efficient Embedded Intelligence

Exploring the Width-Precision Trade-Off in Binary-Quantized Vision Transformers

Bachelor Thesis (2026)

Author(s)

I.S. van Loon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B. Refalo – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Q. Wang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

I.M. Olkhovskaia – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Deep Learning, Computer Vision, Embedded, Quantization, Binary Neural Networks, Transformer Model

To reference this document use

https://resolver.tudelft.nl/uuid:8a789372-7f74-4edb-9c9d-b576bb92af50

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Vision Transformers perform strongly across computer vision tasks but often require too much compute and memory for embedded deployment. Binary quantization cuts these costs by constraining weights and activations to a single bit, at the expense of accuracy. We investigate whether the budget freed by binarization can be reinvested in additional model width to recover the lost accuracy. Using the BHViT-Tiny architecture on the Oxford-IIIT Pet dataset, we first isolate the accuracy gap caused by quantization alone by comparing a full-precision reference against its binarized counterpart at identical width. We then scale the width within the freed budget to measure how much of this gap width alone can recover. We find that binarization at the base width costs 7.1 points of Top-1 accuracy, and that tripling the width recovers 4.9 of these points while retaining theoretical reductions of 3.5× in memory and 6.7× in compute relative to the full-precision reference. The wider binary model thus approaches full-precision accuracy at a fraction of its cost. Additionally, keeping the downsampling layers in full precision recovers a further 1.1 points at a cost still well within budget. This narrows the remaining gap from 2.2 to 1.1 points and indicates that part of the residual loss stems from a precision bottleneck rather than from a global lack of capacity. Our results establish width scaling as an effective strategy for reducing the binarization accuracy gap, offering a promising path toward the resource-constrained deployment of Vision Transformers.
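The trade-off described in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the thesis's own cost accounting. It assumes a 32-bit full-precision reference, 1-bit binary weights, and a parameter count that grows quadratically with width, as it would for linear layers whose input and output dimensions both scale. The function names `memory_reduction` and `gap_recovered` are hypothetical helpers introduced for this illustration. Only the accuracy figures are taken from the abstract.

```python
# Back-of-the-envelope sketch. The scaling assumptions below are
# illustrative and are NOT taken from the thesis.

FULL_PRECISION_BITS = 32  # assumption: reference model stores 32-bit weights
BINARY_BITS = 1           # binarized weights occupy a single bit


def memory_reduction(width_multiplier):
    """Theoretical weight-memory reduction of a widened binary model
    relative to the base-width full-precision model.

    Assumes the parameter count grows with the square of the width.
    """
    param_growth = width_multiplier ** 2
    return FULL_PRECISION_BITS / (BINARY_BITS * param_growth)


def gap_recovered(gap, recovered):
    """Return (remaining gap in points, fraction of the gap recovered)."""
    return gap - recovered, recovered / gap


# Figures reported in the abstract (Top-1 accuracy points).
BINARIZATION_GAP = 7.1
WIDTH_RECOVERY = 4.9
DOWNSAMPLING_RECOVERY = 1.1

remaining_after_width, frac_width = gap_recovered(BINARIZATION_GAP, WIDTH_RECOVERY)
remaining_final = remaining_after_width - DOWNSAMPLING_RECOVERY

print(f"memory reduction at 1x width: {memory_reduction(1):.1f}x")
print(f"memory reduction at 3x width: {memory_reduction(3):.2f}x")
print(f"gap after widening: {remaining_after_width:.1f} points "
      f"({frac_width:.0%} of the gap recovered)")
print(f"gap with full-precision downsampling: {remaining_final:.1f} points")
```

Under these simplified assumptions, tripling the width leaves a memory reduction of 32/9, which is of the same order as the 3.5× reported in the abstract. The thesis's own accounting may differ, for example if some layers remain in full precision, and it is the authoritative source for the reported figures.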

Files

RP_Final_Research_Paper_Ivar_v... (pdf)

(pdf | 0.326 Mb)

License info not available