



Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study.

Julie Chi Chow1,2, Teng Yun Cheng3, Tsair-Wei Chien4

  • 1Department of Pediatrics, Chi Mei Medical Center, Tainan, Taiwan.

JMIR Formative Research | August 8, 2024
Summary
This summary is machine-generated.

ChatGPT demonstrated an "A" grade proficiency in answering multiple-choice questions (MCQs) from the 2023 Taiwan college entrance exams. This study utilized Rasch analysis (RaschOnline) to evaluate the AI

Keywords:
ChatGPT; KIDMAP; RaschOnline; Wright map; application; artificial intelligence; college; differential item functioning; evaluation tool; multiple choice questions; scoring; students; testing; tool; website tool


Area of Science:

  • Artificial Intelligence
  • Educational Measurement
  • Psychometrics

Background:

  • ChatGPT, a leading large language model, shows promise in specialized applications.
  • Limited research exists on AI's performance in multiple-choice questions (MCQs) using Rasch analysis.
  • KIDMAP within Rasch analysis is a tool to evaluate AI's MCQ answering competence.

Purpose of the Study:

  • To demonstrate the utility of RaschOnline for evaluating AI performance.
  • To assess ChatGPT's performance on MCQs against a normal sample.
  • To determine the academic grade achieved by ChatGPT.

Main Methods:

  • ChatGPT's responses to 10 MCQs from the 2023 Taiwan college entrance exams were analyzed.
  • 300 simulated students were generated using a Rasch model to compare with ChatGPT.
  • RaschOnline was employed to generate visual presentations including item difficulty, DIF, ICC, Wright map, and KIDMAP.
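
As a rough illustration of how a comparison sample like the 300 simulated students might be produced, the sketch below draws dichotomous responses from a Rasch model. It is not the study's procedure: the function names, the standard-normal ability distribution, the fixed seed, and the even spacing of the 10 item difficulties between the reported extremes (-2.43 and 2.47 logits) are all assumptions made for this example.

```python
import math
import random


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_responses(n_persons, difficulties, seed=0):
    """Simulate a 0/1 response matrix (persons x items).

    Assumption: abilities are drawn from a standard normal distribution;
    the source does not state which distribution was used.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    responses = [
        [1 if rng.random() < rasch_probability(theta, b) else 0
         for b in difficulties]
        for theta in abilities
    ]
    return abilities, responses


# Assumption: 10 difficulties evenly spaced between the reported extremes.
difficulties = [-2.43 + i * (2.47 + 2.43) / 9 for i in range(10)]
abilities, responses = simulate_responses(300, difficulties)

# Proportion of simulated students answering each item correctly.
proportion_correct = [
    sum(row[j] for row in responses) / len(responses)
    for j in range(len(difficulties))
]
```

With difficulties in ascending order, the proportion correct should tend to fall from the first item to the last, which is the pattern a Wright map or item characteristic curve would display graphically.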

Main Results:

  • Item difficulties showed a monotonic increase, with logits ranging from -2.43 to 2.47.
  • Differential item functioning (DIF) was noted for item 5 between gender groups (P=.04).
  • ChatGPT achieved an 'A' grade, outperforming simulated students across grades B to E.
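
To give a sense of what the reported logit range means, the sketch below converts the two extreme item difficulties into success probabilities under the dichotomous Rasch model. The reference ability of 0 logits is an assumption chosen for illustration, and the function name is hypothetical; neither comes from the study.

```python
import math


def p_correct(theta, b):
    """Rasch model: P(correct) for person ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Reported extremes of item difficulty, in logits.
easiest, hardest = -2.43, 2.47

# Assumption: a reference test-taker with ability 0 logits.
p_easy = p_correct(0.0, easiest)
p_hard = p_correct(0.0, hardest)

print(f"easiest item: {p_easy:.2f}, hardest item: {p_hard:.2f}")
```

Under these assumptions, such a test-taker would answer the easiest item correctly about 92% of the time and the hardest item about 8% of the time, so the 10 items span a wide difficulty range.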

Conclusions:

  • RaschOnline effectively evaluates AI performance in MCQ answering.
  • ChatGPT exhibits excellent proficiency in answering English MCQs from standardized tests.
  • The study confirms ChatGPT's capability to achieve a high academic grade when benchmarked against human performance.