Teaching Gradient Descent
An Exploratory Study on Classic Textbook vs. Multiple Representations Approaches
F. Severin (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Gosia Migut – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Ilinca Rențea – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jorge Martinez Castaneda – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Machine learning is increasingly important in computer science education, but introductory concepts can be difficult because they combine mathematical notation, algorithmic reasoning, and conceptual understanding. Gradient descent is one such concept: students may reproduce the update rule while still struggling to explain the roles of the loss function, the gradient, the learning rate, and repeated parameter updates.
This paper investigates whether multiple representations support beginners' understanding better than a classic textbook-style explanation.
A small-scale exploratory experiment was conducted with students who had little or no prior machine learning experience. Participants completed a prerequisite pre-test, studied gradient descent using either a text-and-formula-based explanation or a multiple-representations explanation, completed a post-test, and answered an experience survey.
Participants in the multiple-representations condition showed higher post-test performance, especially on computation and application tasks, and reported higher confidence, clarity, usefulness, and engagement. Perceived cognitive load remained similar across conditions.
These findings suggest that aligned multiple representations can help beginners connect formal notation with concrete calculations and intuitive understanding, although the results should be interpreted cautiously because of the small sample size.
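For readers unfamiliar with the concept, the update rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the material used in the study; the quadratic loss, the starting point, the learning rate, and the function names below are all assumptions chosen only to show how the loss function, gradient, learning rate, and repeated parameter updates fit together.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly apply the update w <- w - learning_rate * grad(w).

    All names and default values here are illustrative assumptions.
    """
    w = w0
    history = [w]  # keep every iterate so the path can be inspected
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # step against the gradient
        history.append(w)
    return w, history


def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def loss_grad(w):
    # Derivative of the loss above with respect to w.
    return 2.0 * (w - 3.0)


w_final, history = gradient_descent(loss_grad, w0=0.0)
print(f"final w = {w_final:.6f}, final loss = {loss(w_final):.2e}")
```

In this sketch the gradient gives the direction of steepest increase of the loss, so each update moves the parameter the opposite way, and the learning rate scales how far it moves per step.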