Training Strategies for Binary/Ternary Neural Networks

Bachelor Thesis (2026)

Author(s)

R.B. Kiemes (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Q. Wang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

B. Refalo – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

I.M. Olkhovskaia – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use

https://resolver.tudelft.nl/uuid:9622a209-0f1a-4d7c-888f-5ffd251b6a80

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Binary and ternary neural networks offer substantial reductions in memory and computational cost, making them attractive for deployment on resource-constrained devices. Training these networks remains challenging because quantization functions are non-differentiable, requiring gradient approximations such as the Straight-Through Estimator (STE).
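
To make the gradient problem concrete, the following is a minimal sketch of how a Straight-Through Estimator works for a single binary weight. It is not taken from the thesis: the function names, the clipping range of 1.0, and the particular tanh-based surrogate are illustrative assumptions, and the thesis's own STE definitions may differ.

```python
import math


def binarize(w):
    """Forward pass: map a real-valued weight to -1.0 or +1.0."""
    return 1.0 if w >= 0.0 else -1.0


def ste_identity_grad(w, upstream, clip=1.0):
    """Backward pass of a clipped identity STE (assumed form).

    The true derivative of sign(w) is zero almost everywhere, so the
    upstream gradient is passed through unchanged where |w| <= clip
    and blocked elsewhere.
    """
    return upstream if abs(w) <= clip else 0.0


def ste_tanh_grad(w, upstream):
    """Backward pass of a smooth surrogate (assumed form).

    Uses the derivative of tanh(w), which is 1 - tanh(w)^2, in place
    of the derivative of sign(w).
    """
    return upstream * (1.0 - math.tanh(w) ** 2)


weights = [-1.5, -0.2, 0.0, 0.7]
quantized = [binarize(w) for w in weights]
hard_grads = [ste_identity_grad(w, 1.0) for w in weights]
smooth_grads = [ste_tanh_grad(w, 1.0) for w in weights]
```

The hard estimator switches the gradient on or off abruptly at the clipping boundary, whereas the smooth surrogate scales it continuously with the weight's magnitude.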

This work presents a systematic ablation study of how different training configurations affect ResNet-20 on CIFAR-10. We evaluated eleven STE variants and independently examined the effects of weight clipping and batch normalization. All ternary variants perform within 0.73 percentage points of the 91.61% full-precision baseline, with the polynomial STE achieving the best result of 91.23%. All binary variants fall within 1.66 percentage points of the baseline, with the tanh STE performing best (90.35%). We find that the choice of STE has only a minor impact on final accuracy; however, STEs differ in training stability, with smoother estimators providing more consistent convergence.

Batch normalization had the greatest effect on performance: removing it reduced accuracy by up to 8.66 percentage points. Weight clipping yielded a smaller but consistent benefit; at the optimal clipping factor of f = 4.0, it improved accuracy by 0.26 and 0.5 percentage points, respectively. Combining these findings, we identified effective training configurations for both ternary and binary networks: the optimal ternary setup (using Trained Ternary Quantization) achieved 91.52% accuracy on ResNet-20/CIFAR-10, while the optimal binary configuration (using XNOR-Net quantization) reached 90.78% accuracy, an improvement over prior baselines in both cases.
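
For readers unfamiliar with the two quantization schemes named above, the sketch below shows the forward quantization step of each as they are commonly described in the literature: XNOR-Net scales the sign of each weight by the mean absolute weight, and Trained Ternary Quantization maps weights to a learned positive scale, a learned negative scale, or zero, using a threshold proportional to the largest absolute weight. The function names, the example weights, and the values chosen for `w_pos`, `w_neg`, and `t` are assumptions for illustration only, not the thesis's configuration.

```python
def xnor_binarize(weights):
    """XNOR-Net style binarization: alpha * sign(w), alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0.0 else -alpha for w in weights]


def ttq_ternarize(weights, w_pos, w_neg, t=0.05):
    """TTQ style ternarization to {-w_neg, 0, +w_pos}.

    w_pos and w_neg are learned during training in the real method;
    here they are passed in as fixed, hypothetical values.
    """
    delta = t * max(abs(w) for w in weights)  # layer-wise threshold
    ternary = []
    for w in weights:
        if w > delta:
            ternary.append(w_pos)
        elif w < -delta:
            ternary.append(-w_neg)
        else:
            ternary.append(0.0)
    return ternary


example = [0.5, -1.0, 0.25, -0.25]
binary = xnor_binarize(example)
ternary = ttq_ternarize(example, w_pos=0.8, w_neg=0.6, t=0.3)
```

In both cases only the forward mapping is shown; during training, gradients would still need an estimator such as the STE to pass through the quantization step.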

Files

Kiemes_thesis_final.pdf

(pdf | 5.89 Mb)

License info not available