Adapting Mono-Forward with Zeroth-Order Gradient Estimation for Automatic Differentiation-Free Training

Bachelor Thesis (2026)

Author(s)

A. Görpelioğlu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Stephanie Tan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Yaqi Guo – Mentor (TU Delft - Mechanical Engineering)

R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Backpropagation-free learning, Directional Derivatives, Zeroth Order Optimization

To reference this document use

https://resolver.tudelft.nl/uuid:2892b6b3-6deb-4112-8e5f-c090c7653790

More Info

Publication Year

2026

Language

English

Graduation Date

24-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper explores adapting the Mono-Forward (MF) algorithm with zeroth-order optimization for image classification that requires neither backpropagation (BP) nor automatic differentiation (AD), and assesses its feasibility in scenarios where exact gradients are unavailable. Mono-Forward trains neural networks without backpropagation and without the multiple forward passes typically required by Forward-Forward algorithms; however, when implemented with modern deep learning frameworks, it still relies on AD for the local training of model layers. This work proposes MF+DD, which replaces AD in Mono-Forward with zeroth-order gradient estimation via directional derivatives, yielding a training algorithm that is free of both AD and global BP. The paper also introduces a random-projection-based modification that addresses a limitation of Mono-Forward in architectures with large intermediate activation tensors and improves computational efficiency. Experiments on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100 with both MLP and CNN architectures show that MF+DD achieves accuracy comparable to MF with AD on simpler datasets, while the accuracy gap widens on more complex benchmarks, suggesting that the noise introduced by the directional derivative estimator becomes more impactful as task difficulty increases. The results further show that increasing the number of perturbation directions P improves both accuracy and training stability, at the cost of increased computation.
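
The idea of estimating a gradient from directional derivatives over P random perturbation directions can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the directional derivative is computed, so this sketch assumes a forward finite-difference approximation with Gaussian directions, and the names `dd_gradient`, `num_directions` (playing the role of P) and `eps` are hypothetical.

```python
import numpy as np


def dd_gradient(loss_fn, w, num_directions=8, eps=1e-4, rng=None):
    """Zeroth-order gradient estimate from directional derivatives.

    Assumption (not from the source): each directional derivative is
    approximated by a forward finite difference along a Gaussian direction.
    Only evaluations of loss_fn are used; no AD or backpropagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = loss_fn(w)  # loss at the unperturbed parameters
    grad = np.zeros_like(w, dtype=float)
    for _ in range(num_directions):
        v = rng.standard_normal(w.shape)            # random direction
        dd = (loss_fn(w + eps * v) - base) / eps    # approx. grad . v
        grad += dd * v                              # project back onto v
    return grad / num_directions                    # average over directions


if __name__ == "__main__":
    # Toy quadratic loss standing in for a layer-local objective.
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda x: 0.5 * np.sum((x - target) ** 2)

    rng = np.random.default_rng(0)
    w = np.zeros(3)
    print("initial loss:", loss(w))
    for _ in range(200):
        w = w - 0.1 * dd_gradient(loss, w, num_directions=8, rng=rng)
    print("final loss:", loss(w))
```

In this sketch each extra direction costs one additional loss evaluation and lowers the variance of the estimate, which is the kind of trade-off between accuracy and computational cost that the abstract reports for larger P.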

Files

Ates-monodd_2.pdf

(pdf | 0.515 Mb)

License info not available