Fine-Tuned Large Language Models for Generating Multiple-Choice Questions in Anesthesiology: Psychometric Comparison

Carlos Ramon Hölzing (1), Charlotte Meynhardt (1), Patrick Meybohm (1)

  • (1) Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, University Hospital Würzburg, Oberdürrbacher Str. 6, Würzburg, 97080, Germany.

JMIR Formative Research | February 18, 2026
Summary
This summary is machine-generated.

Fine-tuned large language models (LLMs) can create multiple-choice questions (MCQs) in anesthesiology with similar psychometric properties to those written by faculty experts. Automated item generation can complement, not replace, traditional methods for developing high-quality medical education assessments.

Keywords:
anesthesiology, artificial intelligence, assessment, fine-tuning, large language models, medical education, multiple-choice questions, psychometrics

Area of Science:

  • Medical Education
  • Artificial Intelligence in Assessment
  • Psychometrics

Background:

  • Multiple-choice questions (MCQs) are crucial for standardized medical assessment.
  • Developing high-quality MCQs requires subject expertise and rigorous methodology.
  • Large language models (LLMs) present opportunities for automated MCQ generation, but evaluations are limited.

Purpose of the Study:

  • To assess whether a fine-tuned LLM can generate anesthesiology MCQs with psychometric properties comparable to those of faculty-written items.

Main Methods:

  • A GPT-4 model was fine-tuned on anesthesiology materials.
  • The model generated 15 MCQs, which were analyzed alongside 15 faculty-written MCQs.
  • Item analysis followed psychometric standards, comparing difficulty, point-biserial correlation, and discrimination index.
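The three item statistics named above can be sketched as follows. This is a minimal illustration that assumes the classical definitions: difficulty as the proportion of correct answers, point-biserial as the Pearson correlation between a 0/1 item score and the total test score, and the discrimination index as the difference in proportion correct between the top and bottom 27% of examinees. The summary does not state which variants the authors used, so the function names, the 27% cut-off, and the toy data are all assumptions.

```python
from math import sqrt


def item_difficulty(item):
    """Proportion of examinees who answered the item correctly (higher = easier)."""
    return sum(item) / len(item)


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item score and the total score.

    Assumption: `totals` includes the item itself (uncorrected variant).
    """
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals))
    var_i = sum((i - mean_i) ** 2 for i in item)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    if var_i == 0 or var_t == 0:
        return float("nan")  # undefined when everyone scores the same
    return cov / sqrt(var_i * var_t)


def discrimination_index(item, totals, fraction=0.27):
    """Proportion correct in the top group minus that in the bottom group.

    Assumption: groups are the top and bottom `fraction` of examinees by total score.
    """
    n = len(item)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda idx: totals[idx])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical toy data: four examinees, one item, and their total test scores.
item_scores = [1, 1, 0, 0]
total_scores = [4, 3, 2, 1]

print(item_difficulty(item_scores))
print(point_biserial(item_scores, total_scores))
print(discrimination_index(item_scores, total_scores))
```

Under these definitions, a difficulty near 0.8 means roughly four in five examinees answered correctly, and point-biserial or discrimination values close to zero mean the item separates stronger from weaker examinees only weakly.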

Main Results:

  • No significant differences were found in difficulty, point-biserial correlation, or discrimination index between LLM-generated and faculty-written MCQs.
  • Both sets of MCQs demonstrated modest overall psychometric quality.
  • LLM-generated items (mean difficulty 0.79, point-biserial 0.17, discrimination 0.08) were comparable to expert items (mean difficulty 0.81, point-biserial 0.19, discrimination 0.09).
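The summary does not name the statistical test behind the "no significant differences" finding. As one assumption-light way to compare a per-item statistic between two item sets, the sketch below uses a two-sided Monte Carlo permutation test on the difference in means. The function name, the number of permutations, and the per-item values are hypothetical and are not taken from the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Returns an approximate p-value with the usual +1 correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-item point-biserial values for two small item sets.
llm_items = [0.12, 0.21, 0.15, 0.19, 0.17, 0.18]
faculty_items = [0.20, 0.16, 0.22, 0.18, 0.19, 0.17]

p_value = permutation_test(llm_items, faculty_items)
print(f"approximate p-value: {p_value:.3f}")
```

With only 15 items per set, as in the study, a test like this has limited power, so a non-significant result is weak evidence of equivalence rather than proof of it.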

Conclusions:

  • LLMs adapted by supervised fine-tuning can produce MCQs whose psychometric quality is similar to that of items written by expert faculty.
  • Automated item generation should supplement, not replace, manual MCQ development.
  • Further research is needed for generalizability and optimizing LLM integration in assessment.