TY - JOUR
T1 - We Can Rely on ChatGPT as an Educational Tutor
T2 - A Cross-Sectional Study of its Performance, Accuracy, and Limitations in University Admission Tests
AU - Beltozar-Clemente, Saul
AU - Díaz-Vega, Enrique
AU - Tejeda-Navarrete, Raul
AU - Zapata-Paulini, Joselyn
N1 - Publisher Copyright:
© 2024 by the authors of this article.
PY - 2024/1/30
Y1 - 2024/1/30
N2 - The aim of this research was to evaluate the performance of ChatGPT in answering multiple-choice questions without images from the entrance exams of the National University of Engineering (UNI) and the Universidad Nacional Mayor de San Marcos (UNMSM) over the past five years. In this prospective exploratory study, a total of 1182 questions were gathered from the UNMSM exams and 559 questions from the UNI exams, encompassing a wide range of topics, including academic aptitude, reading comprehension, humanities, and scientific knowledge. The results indicate a significantly higher proportion of correct answers for UNMSM (p < 0.001), with 72% (853/1182) of questions answered correctly. In contrast, there was no significant difference between the proportions of correct and incorrect answers for UNI (p = 0.168), with 52% (317/552) of questions answered correctly. At the course level, ChatGPT achieved its highest performance in World History (p = 0.037), with an accuracy of 91%, and its lowest in the Language course (p = 0.172), with a score of 55%. In conclusion, to fully harness the potential of ChatGPT in education, continuous evaluation of its performance, ongoing feedback to enhance its accuracy and minimize biases, and adaptations tailored to educational settings are essential.
AB - The aim of this research was to evaluate the performance of ChatGPT in answering multiple-choice questions without images from the entrance exams of the National University of Engineering (UNI) and the Universidad Nacional Mayor de San Marcos (UNMSM) over the past five years. In this prospective exploratory study, a total of 1182 questions were gathered from the UNMSM exams and 559 questions from the UNI exams, encompassing a wide range of topics, including academic aptitude, reading comprehension, humanities, and scientific knowledge. The results indicate a significantly higher proportion of correct answers for UNMSM (p < 0.001), with 72% (853/1182) of questions answered correctly. In contrast, there was no significant difference between the proportions of correct and incorrect answers for UNI (p = 0.168), with 52% (317/552) of questions answered correctly. At the course level, ChatGPT achieved its highest performance in World History (p = 0.037), with an accuracy of 91%, and its lowest in the Language course (p = 0.172), with a score of 55%. In conclusion, to fully harness the potential of ChatGPT in education, continuous evaluation of its performance, ongoing feedback to enhance its accuracy and minimize biases, and adaptations tailored to educational settings are essential.
KW - ChatGPT
KW - entrance exams
KW - performance
KW - university
UR - http://www.scopus.com/inward/record.url?scp=85188423826&partnerID=8YFLogxK
U2 - 10.3991/ijep.v14i1.46787
DO - 10.3991/ijep.v14i1.46787
M3 - Original Article
AN - SCOPUS:85188423826
SN - 2192-4880
VL - 14
SP - 50
EP - 60
JO - International Journal of Engineering Pedagogy
JF - International Journal of Engineering Pedagogy
IS - 1
ER -