
Related Experiment Video

Updated: May 27, 2026

Utilizing a 3D Printed Laparoscopic Nissen Fundoplication Model to Shorten a Resident's Learning Curve
08:21

Published on: August 15, 2025

Evaluating Injection Laryngoplasty Skills Using a Foundation Model: A Feasibility Study.

Alex T Cheng¹, Abdulla Elkhadrawy¹, Sean A Setzen¹

  • ¹Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA.

The Laryngoscope | May 26, 2026
Summary
This summary is machine-generated.

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:

  • Extended Reality in Otolaryngology-Head & Neck Surgery: A State-of-the-Art Review. Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery, 2026
  • Public Perceptions of Ankyloglossia on Reddit: A Cross-Sectional Thematic and Sentiment Analysis. Laryngoscope investigative otolaryngology, 2026
  • Women Pioneers in Laryngology: The First Female Fellows of the American Laryngological Association. The Laryngoscope, 2026
  • Fresh-Frozen Costal Cartilage in Rhinoplasty: A Six-Year Experience. Facial plastic surgery & aesthetic medicine, 2026
  • Human Papillomavirus Vaccine Discourse and Sentiment on Reddit Before and After COVID-19: Mixed Methods Retrospective Cross-Sectional Study. Journal of medical Internet research, 2026
  • Gender and Academic Rank Disparities in Electronic Health Record Burden Among Otolaryngologists. The Laryngoscope, 2026

Same journal:

  • Laryngeal IgG4-Related Disease: A Systematic Review of Clinical Features and Management. The Laryngoscope, 2026
  • Elevated BMI Is Not Associated With Adverse Outcomes in Open Airway Reconstruction. The Laryngoscope, 2026
  • What is the Most Effective Treatment Approach for Vocal Fold Granuloma? The Laryngoscope, 2026
  • ATP6V1B1-A Novel Genetic Association Between Pendred Imaging Phenotype and Renal Tubular Acidosis. The Laryngoscope, 2026
  • Effects of Ferrostatin-1 on Vocal Folds in Aging Rats. The Laryngoscope, 2026
  • What Is the Role of Uvulopalatopharyngoplasty in Contemporary Sleep Surgery? The Laryngoscope, 2026

Few-shot prompting with Google Gemini 2.5 Pro successfully assessed surgical skill in simulated injection laryngoplasty, distinguishing expert from trainee performance. Averaging repeat evaluations can mitigate model variability for this promising assessment tool.

Area of Science:

  • Artificial Intelligence in Medicine
  • Surgical Skill Assessment
  • Medical Simulation

Background:

  • Assessing surgical proficiency is critical for patient safety and training effectiveness.
  • Objective and reliable methods for evaluating procedural skills are needed.
  • Multimodal foundation models offer potential for automated performance analysis.

Purpose of the Study:

  • To evaluate the construct validity of Google Gemini 2.5 Pro for assessing simulated injection laryngoplasty.
  • To compare zero-shot versus few-shot prompting strategies for skill assessment.
  • To determine model reliability and stability in performance evaluation.
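
The difference between the two prompting strategies named above can be illustrated with a minimal sketch. It only assembles prompt text and makes no model call. The function name `build_prompt`, the rubric wording, and the use of short text descriptions as stand-ins for scored example videos are all assumptions made for illustration; the source does not describe the study's actual prompts.

```python
def build_prompt(rubric, examples=()):
    """Assemble a text prompt; an empty `examples` gives the zero-shot variant."""
    parts = [rubric.strip()]
    # Each few-shot example pairs a reference performance with its known score.
    for i, (description, score) in enumerate(examples, start=1):
        parts.append(f"Example {i}: {description.strip()}\nScore: {score}")
    parts.append("Now score the attached video.\nScore:")
    return "\n\n".join(parts)


# Hypothetical rubric and examples (not taken from the study).
RUBRIC = "Rate the simulated injection laryngoplasty from 1 (poor) to 5 (excellent)."
EXAMPLES = [
    ("Steady needle control, single accurate pass.", 5),
    ("Several repositioning attempts before injection.", 2),
]

zero_shot = build_prompt(RUBRIC)
few_shot = build_prompt(RUBRIC, EXAMPLES)
print(few_shot)
```

The only difference between the two prompts is the block of scored examples, which is what gives the model a reference scale to calibrate against.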

Main Methods:

  • Thirty simulated injection laryngoplasty videos were stratified by operator experience (novice, intermediate, expert).
  • Google Gemini 2.5 Pro evaluated videos using zero-shot and few-shot prompting strategies.
  • Model performance was compared against operator experience, with reliability assessed via 90 repeated trials.

Keywords:
artificial intelligence, injection laryngoplasty, laryngology, skill assessment, surgical education

More Related Videos

Manufacture of a Multi-Purpose Low-Cost Animal Bench-Model for Teaching Tracheostomy
10:06

Published on: May 18, 2019

Learning Modern Laryngeal Surgery in a Dissection Laboratory
07:30

Published on: March 18, 2020

Main Results:

  • Zero-shot prompting failed to discriminate between skill levels (Spearman's ρ = 0.12, p = 0.52).
  • Few-shot prompting showed strong correlation with experience (Spearman's ρ = 0.66, p = 0.0002) and stratified skill levels.
  • The few-shot model significantly differentiated experts from novices and intermediates, improving precision and reducing error.
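
For readers unfamiliar with the statistic reported above, Spearman's ρ is the Pearson correlation computed on ranks, with tied values sharing an averaged rank. The following is a minimal pure-Python sketch of that definition. The 0/1/2 coding of experience and the score values are hypothetical and are not data from the study; the function names are likewise chosen for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: experience coded 0=novice, 1=intermediate, 2=expert,
# paired with made-up model scores (not values from the study).
experience = [0, 0, 0, 1, 1, 1, 2, 2, 2]
model_score = [2.0, 3.0, 2.5, 3.5, 3.0, 4.0, 4.5, 4.0, 5.0]
print(f"Spearman's rho = {spearman_rho(experience, model_score):.2f}")
```

Because experience has only three levels, ties are unavoidable, which is why the rank step averages tied positions. The sketch does not compute a p-value and would divide by zero if either input were constant.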

Conclusions:

  • General-purpose multimodal models require calibration (e.g., few-shot prompting) for surgical judgment.
  • Few-shot prompting effectively calibrated Gemini 2.5 Pro to distinguish expert from trainee performance.
  • Model variability necessitates mitigation strategies, such as averaging repeated evaluations, for scalable assessment.
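
The averaging strategy mentioned in the last point can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the scores are hypothetical, the helper name `summarize_repeats` is invented here, and the standard-error formula assumes the repeated evaluations are independent.

```python
from statistics import mean, stdev


def summarize_repeats(scores):
    """Return (mean, sample SD, standard error of the mean) for repeated scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two repeated evaluations")
    sd = stdev(scores)
    # Under independence, the spread of the average shrinks as 1/sqrt(n).
    return mean(scores), sd, sd / n ** 0.5


# Hypothetical repeated model scores for a single video (not study data).
repeats = [3.0, 4.0, 3.5, 4.5, 3.0, 4.0]
avg, sd, sem = summarize_repeats(repeats)
print(f"mean={avg:.2f}  sd={sd:.2f}  sem={sem:.2f}")
```

Reporting the averaged score together with its standard error makes the run-to-run variability of the model visible instead of hiding it behind a single evaluation.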