Training Strategies for Binary/Ternary Neural Networks

Bachelor Thesis (2026)
Author(s)

R.B. Kiemes (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Q. Wang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

B. Refalo – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

I.M. Olkhovskaia – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
26-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Binary and ternary neural networks offer substantial reductions in memory and computational cost, making them attractive for deployment on resource-constrained devices. Training these networks remains challenging because quantization functions are non-differentiable, requiring gradient approximations such as the Straight-Through Estimator (STE).
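As an illustration of the idea only, and not the thesis's implementation, the following minimal sketch shows a clipped-identity STE for a sign quantizer in plain Python. The function names (`binarize`, `ste_grad`, `sgd_step`), the clipping threshold of 1.0, and the learning rate are all assumptions chosen for the example. The forward pass uses the quantized weight, while the backward pass treats the quantizer as the identity inside the clipping range.

```python
def binarize(w):
    """Forward pass: map a real-valued (latent) weight to -1.0 or +1.0."""
    return 1.0 if w >= 0.0 else -1.0


def ste_grad(w, upstream_grad, clip=1.0):
    """Backward pass under a clipped-identity STE (an assumed variant).

    The true derivative of sign() is zero almost everywhere, so the
    estimator instead passes the upstream gradient through unchanged
    when |w| <= clip and blocks it otherwise.
    """
    return upstream_grad if abs(w) <= clip else 0.0


def sgd_step(latent_weights, upstream_grads, lr=0.1):
    """Update the latent full-precision weights using STE gradients."""
    return [w - lr * ste_grad(w, g)
            for w, g in zip(latent_weights, upstream_grads)]


# Hypothetical latent weights and gradients with respect to the binarized weights.
latent = [0.05, -0.3, 1.7]
grads = [1.0, -0.5, 2.0]

updated = sgd_step(latent, grads)
print("binarized before:", [binarize(w) for w in latent])
print("binarized after: ", [binarize(w) for w in updated])
```

In this sketch the first weight crosses zero and flips sign after the update, the second moves but keeps its sign, and the third lies outside the clipping range and so receives no gradient.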

This work presents a systematic ablation study of how different training configurations affect ResNet-20 trained on CIFAR-10. We evaluated eleven STE variants and independently examined the effects of weight clipping and batch normalization. All ternary variants perform within 0.73 percentage points of the 91.61% full-precision baseline, with the polynomial STE achieving the best result of 91.23%. For binary networks, all variants come within 1.66 percentage points of the baseline, with the tanh STE performing best (90.35%). We find that the choice of STE has only a minor impact on final accuracy; however, STEs differ in training stability, with smoother estimators providing more consistent convergence.

Batch normalization had the greatest effect on performance; removing it reduced accuracy by up to 8.66 percentage points. Weight clipping yielded a smaller but consistent benefit, with an optimal clipping factor of f = 4.0, improving accuracy by 0.26 and 0.5 percentage points, respectively. Combining these findings, we identified effective training configurations for both ternary and binary networks: the optimal ternary setup (using Trained Ternary Quantization) achieved 91.52% accuracy on ResNet-20/CIFAR-10, while the optimal binary configuration (using XNOR-Net quantization) reached 90.78% accuracy, an improvement over prior baselines in both cases.

Files

Kiemes_thesis_final.pdf
(pdf | 5.89 Mb)
License info not available