How well does GPT-3.5 perform on course assignments from the TU Delft Computer science and engineering Bachelor?

Finding themes in course assignments GPT-3.5 performs well on and does not perform well on

Bachelor Thesis (2023)

Author(s)

M. Segers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E.A. Aivaloglou – Mentor (TU Delft - Web Information Systems)

Xiaoling Zhang – Mentor

T.J. Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:4f33dfab-289d-435c-a47e-c2d069ee0578

Publication Year

2023

Language

English

Graduation Date

28-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Since large language models (LLMs) emerged, they have taken on a prominent role in today’s society, and from there they have found their way into education. In this research paper, we therefore looked at assignments and exams from the TU Delft Computer Science and Engineering bachelor and assessed which problems Generative Pre-trained Transformer (GPT) version 3.5, the current version used by ChatGPT, performs well on (i.e. at least at the pass rate) and which it performs poorly on (i.e. below the pass rate). We collected assignments by asking professors for consent, to ensure that our research was ethically sound. Professors who consented could either send material, which allowed a deeper analysis, or allow scraping of the course page on Brightspace (the site where TU Delft courses are hosted). Once all the questions were gathered, we processed them by entering them as prompts into ChatGPT, then collected the results and categorized each as right or wrong. We did this with as few modifications to the questions as possible: the only changes were corrections of errors introduced when copying from a PDF, for example “C” becoming “e”. The results show that ChatGPT has limitations, particularly in understanding large pieces of code and in complex mathematical reasoning. However, the model performed well at defining concepts and connecting different ideas. We suggest that GPT lacks a comprehensive understanding of coding principles, which hinders its ability to comprehend code. Future work could explore other LLMs such as GPT-4 and compare their performance, look at assignments from other universities, possibly in different educational fields, and investigate different prompting techniques to improve the model’s accuracy and reliability.
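The right/wrong categorization described in the abstract can be illustrated with a minimal sketch that groups graded answers by theme and compares each theme's share of correct answers against a pass rate. This is not the thesis's actual tooling: the function names, the theme labels, the toy data, and the threshold value of 0.5 are all assumptions made for illustration, since the abstract does not state which pass rate was used.

```python
from collections import defaultdict

# Hypothetical pass threshold; the abstract does not state the value used.
PASS_RATE = 0.5


def score_by_theme(graded):
    """Return the fraction of correct answers per theme.

    `graded` is an iterable of (theme, is_correct) pairs, one per question.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for theme, is_correct in graded:
        totals[theme] += 1
        if is_correct:
            correct[theme] += 1
    return {theme: correct[theme] / totals[theme] for theme in totals}


def split_by_pass_rate(scores, pass_rate=PASS_RATE):
    """Split themes into those at or above the pass rate and those below it."""
    strong = sorted(t for t, s in scores.items() if s >= pass_rate)
    weak = sorted(t for t, s in scores.items() if s < pass_rate)
    return strong, weak


# Made-up toy labels for illustration only; these are not results from the thesis.
example = [
    ("definitions", True),
    ("definitions", True),
    ("code reading", False),
    ("code reading", True),
    ("code reading", False),
]

scores = score_by_theme(example)
strong, weak = split_by_pass_rate(scores)
print(scores)
print("at or above pass rate:", strong)
print("below pass rate:", weak)
```

Keeping the per-theme scoring separate from the threshold comparison means the same graded answers can be re-evaluated under a different pass rate without re-labelling anything.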

Files

CSE3000_Final_Paper_Mike.pdf

(pdf | 0.209 Mb)

License info not available