Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study.

Julie Chi Chow¹,², Teng Yun Cheng³, Tsair-Wei Chien⁴

¹Department of Pediatrics, Chi Mei Medical Center, Tainan, Taiwan.

JMIR Formative Research

August 8, 2024

Summary

This summary is machine-generated.

ChatGPT demonstrated an "A" grade proficiency in answering multiple-choice questions (MCQs) from the 2023 Taiwan college entrance exams. This study utilized Rasch analysis (RaschOnline) to evaluate the AI

Keywords:

ChatGPT; KIDMAP; RaschOnline; Wright map; application; artificial intelligence; college; differential item functioning; evaluation tool; multiple choice questions; scoring; students; testing tool; website tool

Area of Science:

Artificial Intelligence
Educational Measurement
Psychometrics

Background:

ChatGPT, a leading large language model, shows promise in specialized applications.
Limited research exists on AI's performance in multiple-choice questions (MCQs) using Rasch analysis.
KIDMAP within Rasch analysis is a tool to evaluate AI's MCQ answering competence.
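
For orientation, the dichotomous Rasch model that this kind of analysis rests on is usually written as follows. The symbols are conventional notation rather than anything taken from the summary: \theta_n is the ability of respondent n, b_i is the difficulty of item i, and both are expressed in logits.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When ability equals difficulty the probability of a correct answer is 0.5, and each additional logit of ability multiplies the odds of success by e.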

Purpose of the Study:

To demonstrate the utility of RaschOnline for evaluating AI performance.
To assess ChatGPT's performance on MCQs against a normal sample.
To determine the academic grade achieved by ChatGPT.

Main Methods:

ChatGPT's responses to 10 MCQs from the 2023 Taiwan college entrance exams were analyzed.
300 simulated students were generated using a Rasch model to compare with ChatGPT.
RaschOnline was used to generate visual displays of item difficulty, differential item functioning (DIF), item characteristic curves (ICCs), the Wright map, and KIDMAP.
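
The simulation step in the methods above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure or RaschOnline's code: the standard-normal ability distribution, the evenly spaced item difficulties (spanning the -2.43 to 2.47 logit range reported in the results), the seed, and all function names are assumptions made for the example.

```python
import math
import random


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_responses(n_persons, difficulties, seed=0):
    """Simulate 0/1 answers; returns (abilities, responses), one row per person."""
    rng = random.Random(seed)
    # Assumption: abilities drawn from a standard normal distribution (in logits).
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    responses = [
        [1 if rng.random() < rasch_probability(theta, b) else 0 for b in difficulties]
        for theta in abilities
    ]
    return abilities, responses


# Assumption: 10 item difficulties evenly spaced over the reported logit range.
N_ITEMS = 10
LOW, HIGH = -2.43, 2.47
difficulties = [LOW + i * (HIGH - LOW) / (N_ITEMS - 1) for i in range(N_ITEMS)]

abilities, responses = simulate_responses(300, difficulties)

# Share of simulated students answering each item correctly, easiest item first.
proportion_correct = [
    sum(row[j] for row in responses) / len(responses) for j in range(N_ITEMS)
]
```

With difficulties in increasing order, `proportion_correct` should fall from the first item to the last, which is the pattern a Wright map or a set of ICCs would display graphically.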

Main Results:

Item difficulties showed a monotonic increase, with logits ranging from -2.43 to 2.47.
Differential item functioning (DIF) was noted for item 5 between gender groups (P=.04).
ChatGPT achieved an 'A' grade, outperforming simulated students across grades B to E.
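
To place a single respondent, such as an AI model, on the same logit scale as the items, one common approach is a maximum-likelihood ability estimate with the item difficulties held fixed. The sketch below uses Newton-Raphson iteration; the function names, the five example difficulties, and the response pattern are hypothetical, and the grade boundaries used in the study are not given in this summary, so the example stops at the logit estimate.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, max_iter=50, tol=1e-6):
    """Maximum-likelihood ability (logits) for one respondent, difficulties known.

    Returns None for all-correct or all-wrong patterns, where the
    maximum-likelihood estimate is not finite.
    """
    raw_score = sum(responses)
    n_items = len(responses)
    if raw_score == 0 or raw_score == n_items:
        return None
    # Starting value: log-odds of the raw score.
    theta = math.log(raw_score / (n_items - raw_score))
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step on the log-likelihood.
        step = (raw_score - expected_score) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical example: five items, three answered correctly.
example_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
example_theta = estimate_ability([1, 1, 1, 0, 0], example_difficulties)
```

A perfect score has no finite maximum-likelihood estimate, which is why the function returns `None` in that case; analysis tools typically apply a correction to extreme scores, and the particular correction varies by tool.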

Conclusions:

RaschOnline effectively evaluates AI performance in MCQ answering.
ChatGPT exhibits excellent proficiency in answering English MCQs from standardized tests.
The study confirms ChatGPT's capability to achieve a high academic grade when benchmarked against human performance.