How well does GPT-3.5 perform on course assignments from the TU Delft Computer science and engineering Bachelor?

Finding themes in course assignments GPT-3.5 performs well on and does not perform well on

Bachelor Thesis (2023)
Author(s)

M. Segers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Fenia Aivaloglou – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Xiaoling Zhang – Mentor

Tom Viering – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Since their emergence, large language models (LLMs) have taken a prominent role in society, and they have also found their way into education. In this research paper, we therefore examined assignments and exams from the TU Delft Computer Science and Engineering bachelor’s programme and assessed on which problems Generative Pre-trained Transformer (GPT) version 3.5, the version used by ChatGPT at the time of the study, performs well (i.e. at or above a passing score) and on which it performs less well (i.e. below a passing score). To keep the research ethically sound, we collected assignments only after asking professors for consent. Professors who consented could either send material, which allowed a deeper analysis, or allow us to scrape their course page on Brightspace (the site where TU Delft courses are hosted). Once all questions were gathered, we entered them as prompts into ChatGPT, collected the answers, and categorized each as right or wrong. We modified the questions as little as possible: the only changes were corrections of errors introduced when copying from a PDF, for example a C becoming an e. The results show that ChatGPT has limitations, particularly in understanding large pieces of code and in complex mathematical reasoning. However, the model performed well at defining concepts and connecting different ideas. We suggest that GPT lacks a comprehensive understanding of coding principles, which hinders its ability to comprehend code. Future work could explore other LLMs such as GPT-4 and compare their performance, look at assignments from other universities, possibly in different educational fields, and investigate different prompting techniques to improve the model’s accuracy and reliability.

Files

CSE3000_Final_Paper_Mike.pdf
(pdf | 0.209 Mb)
License info not available