A Natural-Language-Processing-Based Procedure for Generating Distractors for Multiple-Choice Questions.

Peter Baldwin¹, Janet Mee¹, Victoria Yaneva¹

¹National Board of Medical Examiners, Philadelphia, PA, USA.

Evaluation & the Health Professions

November 10, 2021

Summary

This summary is machine-generated.

This study introduces an automated method for generating multiple-choice test question distractors using natural language processing. The system successfully identified plausible distractors, aiding human item writers in test development.

Keywords:

automatic item generation; item writing; large-scale testing; natural language processing; test development

Area of Science:

Medical Education
Natural Language Processing
Psychometrics

Background:

Developing effective multiple-choice questions (MCQs) is challenging, particularly in creating plausible incorrect response options (distractors).
Existing item banks represent a valuable resource for generating new assessment items.

Purpose of the Study:

To introduce and evaluate a procedure for automatically mining item banks to generate potential distractors for new MCQs.
To assess the utility of system-generated distractors for human item writers.

Main Methods:

Utilized natural language processing (NLP) to measure semantic similarity between new item stems/answers and existing item bank content.
Developed a distractor generation model; because it mines existing content, it requires a substantial pool of items.
Evaluated system-produced distractors against human-produced distractors using United States Medical Licensing Examination (USMLE) data.
Assessed the quality and relevance of system-generated distractors with experienced item writers.
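
The similarity-based mining described under Main Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which similarity measure was used, so plain bag-of-words cosine similarity is swapped in here. The function names, the `bank` structure, and the toy items are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_distractors(new_stem, new_answer, bank, top_k=3):
    """Rank options from banked items by how similar their item is to the new one.

    Assumption: similarity is bag-of-words cosine over stem + answer text;
    the published method may use a different representation.
    """
    query = Counter(tokenize(new_stem + " " + new_answer))
    scores = {}
    for item in bank:
        sim = cosine(query, Counter(tokenize(item["stem"] + " " + item["answer"])))
        for option in [item["answer"]] + item["distractors"]:
            # Never propose the correct answer of the new item as a distractor.
            if option.lower() == new_answer.lower():
                continue
            scores[option] = max(scores.get(option, 0.0), sim)
    # Highest similarity first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [option for option, _ in ranked[:top_k]]


# Hypothetical toy item bank, invented for illustration only.
bank = [
    {
        "stem": "A patient has chest pain radiating to the left arm",
        "answer": "Myocardial infarction",
        "distractors": ["Pericarditis", "Aortic dissection"],
    },
    {
        "stem": "A child has a barking cough and stridor",
        "answer": "Croup",
        "distractors": ["Epiglottitis", "Asthma"],
    },
]

suggestions = suggest_distractors(
    "A patient has crushing chest pain and sweating",
    "Myocardial infarction",
    bank,
)
print(suggestions)  # ['Aortic dissection', 'Pericarditis', 'Asthma']
```

Options from the banked item closest to the new stem rank first, which is the intuition behind mining an item bank: items on a similar topic tend to carry distractors that are plausible for the new question.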

Main Results:

For approximately 50% of items, at least one top system-generated distractor matched a human-produced distractor.
For about 25% of items, two of the top three system-generated distractors matched human-produced distractors.
Item writers rated 81% of system-generated distractors as on-topic and 56% as helpful for distractor development.
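
Match rates of the kind reported above could be computed along the following lines. This is a sketch under stated assumptions: a "match" is taken to mean an exact string match after case and whitespace normalization, which the summary does not confirm, and the function names and toy data are hypothetical and do not come from the study.

```python
def normalize(option):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(option.lower().split())


def match_rates(system_ranked, human_distractors, top_k=3):
    """Share of items whose top-k system distractors hit the human-written ones.

    Assumption: matching is exact after normalization.
    """
    n_items = len(system_ranked)
    if n_items == 0:
        return {"at_least_one": 0.0, "at_least_two": 0.0}
    at_least_one = 0
    at_least_two = 0
    for item_id, ranked in system_ranked.items():
        human = {normalize(d) for d in human_distractors.get(item_id, [])}
        top = {normalize(d) for d in ranked[:top_k]}
        hits = len(top & human)
        at_least_one += hits >= 1
        at_least_two += hits >= 2
    return {
        "at_least_one": at_least_one / n_items,
        "at_least_two": at_least_two / n_items,
    }


# Hypothetical toy data, invented for illustration only.
system_ranked = {
    "item1": ["Pericarditis", "Aortic dissection", "Asthma"],
    "item2": ["Epiglottitis", "Asthma", "Pneumonia"],
    "item3": ["Gout", "Sepsis", "Anemia"],
    "item4": ["Migraine", "Stroke", "Vertigo"],
    "item5": ["Cellulitis", "Eczema", "Psoriasis"],
}
human_distractors = {
    "item1": ["pericarditis", "Aortic  dissection", "Pneumothorax"],
    "item2": ["Epiglottitis", "Bronchiolitis"],
    "item3": ["Lupus", "Arthritis"],
    "item4": ["Stroke", "Vertigo", "Seizure"],
    "item5": ["Scabies"],
}

rates = match_rates(system_ranked, human_distractors)
print(rates)  # {'at_least_one': 0.6, 'at_least_two': 0.4}
```

Exact matching is a conservative choice: a system distractor that is a synonym of a human-written one counts as a miss, so rates computed this way would tend to understate overlap.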

Conclusions:

Automated distractor generation using NLP is a feasible approach to support MCQ development.
The proposed method shows promise in identifying relevant and plausible distractors, assisting item writers in medical education and other fields.
Further refinement of NLP techniques can enhance the efficiency and effectiveness of creating high-quality assessment items.