Clustering Scratch projects by code complexity traits and project traits

Bachelor Thesis (2024)
Author(s)

B.A.J. Meeusen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E.A. Aivaloglou – Mentor (TU Delft - Web Information Systems)

Maria Soledad Pera – Mentor (TU Delft - Web Information Systems)

Jorge Martinez – Graduation committee member (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Brent Meeusen
More Info
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Scratch is a popular visual programming language aimed at children, used by teachers and after-school code clubs to teach programming. Measuring whether students understand the underlying concepts, however, is difficult. In this research, we tried clustering Scratch projects by complexity, with the aim of helping students improve their programming skills. To do so, we selected an existing data set and extracted features that indicate code complexity. Previous researchers attempted clustering on a single metric that summarises a project’s overall complexity. Other researchers set out to measure students’ growth by clustering the projects the students created.
With that in mind, we adopted a partition-based clustering algorithm to cluster the projects, as this method indicates outliers. We examined the quality of the resulting clusters using the silhouette coefficient. We set up five experiments with different input vectors to determine the impact each input has on the clusters. We did not find a clear indication that the projects cluster by the selected features. This could mean that Scratch projects are not suitable for measuring a high-level understanding of programming concepts. Including the project name in the input vector had a negligible effect on the outcome of the experiments.
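
As an illustration of the evaluation approach described above, the following is a minimal sketch of partition-based clustering scored with the silhouette coefficient. It is not the thesis implementation: the choice of k-means, the deterministic initialisation, the function names, and the two-feature example vectors are all assumptions made for this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: centroids start at evenly spaced input points, a
    deliberately simple, deterministic initialisation.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def silhouette_coefficient(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical two-feature vectors, e.g. (number of scripts, nesting depth).
projects = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels = k_means(projects, k=2)
print(labels, round(silhouette_coefficient(projects, labels), 3))
```

The coefficient ranges from -1 to 1. Values near 1 suggest compact, well-separated clusters, while values near 0 suggest overlapping clusters, which is the kind of signal used to judge whether the selected features separate the projects.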

Files

License info not available