The model student: GPT-4 performance on graduate biomedical science exams
- Daniel Stribling 1,2,3, Yuxing Xia 4,5, Maha K Amer 6, Kiley S Graim 7, Connie J Mulligan 8,9, Rolf Renne 10,11,12
- 1Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32610, USA. ds@ufl.edu.
- 2UF Genetics Institute, University of Florida, Gainesville, FL, 32610, USA. ds@ufl.edu.
- 3UF Health Cancer Center, University of Florida, Gainesville, FL, 32610, USA. ds@ufl.edu.
- 4Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, College of Medicine, University of Florida, Gainesville, FL, 32610, USA.
- 5Department of Neurology, UCLA, Los Angeles, CA, 90095, USA.
- 6Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32610, USA.
- 7Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32610, USA.
- 8UF Genetics Institute, University of Florida, Gainesville, FL, 32610, USA.
- 9Department of Anthropology, University of Florida, Gainesville, FL, 32610, USA.
- 10Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32610, USA. rrenne@ufl.edu.
- 11UF Genetics Institute, University of Florida, Gainesville, FL, 32610, USA. rrenne@ufl.edu.
- 12UF Health Cancer Center, University of Florida, Gainesville, FL, 32610, USA. rrenne@ufl.edu.
Summary
This summary is machine-generated. Large language models like GPT-4 show strong performance on graduate biomedical science exams, outscoring students on many. However, limitations in handling figures and the potential for plagiarism require careful consideration before academic use.
Area Of Science
- Biomedical Sciences
- Artificial Intelligence
- Educational Technology
Background
- Large language models (LLMs) like GPT-4 and ChatGPT are increasingly capable text generators.
- GPT-4 has demonstrated proficiency in standardized tests, but its trustworthiness in diverse knowledge domains needs evaluation.
- Assessing AI performance in specialized fields like biomedical sciences is crucial for understanding its potential and limitations.
Purpose Of The Study
- To evaluate the performance and accuracy of the GPT-4 large language model on graduate-level biomedical science examinations.
- To identify specific question formats and content types where GPT-4 excels or struggles.
- To inform the future design of academic assessments in the context of advanced AI tools.
Main Methods
- GPT-4 was tested on nine graduate-level biomedical science exams, including seven that were blinded.
- Performance was analyzed across different question types: fill-in-the-blank, short-answer, essay, and questions involving figures.
- Responses were assessed for accuracy, plagiarism, and instances of hallucination.
Main Results
- GPT-4 surpassed the student average score in seven out of nine exams and exceeded all student scores in four exams.
- The model performed well on text-based questions and questions with figures from published manuscripts.
- Poor performance was observed on questions with figures containing simulated data and those requiring hand-drawn answers. Plagiarism and hallucinations were noted in some responses.
Conclusions
- GPT-4 demonstrates significant capabilities in answering graduate-level biomedical science questions, often outperforming human students.
- The model's limitations, particularly with visual data and potential for generating inaccurate or plagiarized content, highlight the need for careful integration into academic settings.
- Future academic assessments may need adaptation to account for AI capabilities and mitigate potential misuse.