Empirical assessment of ChatGPT’s answering capabilities in natural science and engineering

Journal Article (2024)
Author(s)

L. Schulze Balhorn (TU Delft - ChemE/Process Systems Engineering)

J.M. Weber (TU Delft - Pattern Recognition and Bioinformatics)

S.N.R. Buijsman (TU Delft - Ethics & Philosophy of Technology)

Julian R. Hildebrandt (RWTH Aachen University)

Martina Ziefle (RWTH Aachen University)

A.M. Schweidtmann (TU Delft - ChemE/Process Systems Engineering)

Research Group
ChemE/Process Systems Engineering
Copyright
© 2024 L. Schulze Balhorn, J.M. Weber, S.N.R. Buijsman, Julian R. Hildebrandt, Martina Ziefle, A.M. Schweidtmann
DOI related publication
https://doi.org/10.1038/s41598-024-54936-7
Publication Year
2024
Language
English
Issue number
1
Volume number
14
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to greatly impact society, research, and education. An essential step toward understanding ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its ability to answer questions across the natural science and engineering domains. We collected 594 questions on natural science and engineering topics from 198 faculty members across five faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are, on average, perceived as "mostly correct". We observe two major trends: the rating of ChatGPT's answers decreases significantly (i) as the educational level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.