Comparing Large Language Models' Performances on Otolaryngology Knowledge Assessment Questions.

Ryan Cook¹, Abner Kahan¹, Thomas Scharfenberger¹

  • ¹ Albert Einstein College of Medicine, Bronx, New York, United States.

Applied Clinical Informatics | April 6, 2026
Summary
This summary is machine-generated.

This study tested large language models (LLMs) on otolaryngology knowledge assessment questions. The top-performing models reached roughly 73-76% accuracy, which suggests a plateau for general-purpose LLMs in specialized medical fields.

Area of Science:

  • Medical Education
  • Artificial Intelligence
  • Otolaryngology

Background:

  • Large language models (LLMs) show potential in medical education.
  • Evaluating LLM performance on specialized medical knowledge is crucial.
  • AI tools used for otolaryngology knowledge assessment must be accurate to be useful.

Purpose of the Study:

  • To assess the performance of OpenAI's GPT-4 Turbo and 10 other commercial large language models (LLMs) on specialized otolaryngology knowledge.
  • To compare the utility of these LLMs in otolaryngology medical education.
  • To identify the current capabilities and limitations of general-purpose LLMs in a specific medical domain.

Main Methods:

  • 1,075 otolaryngology questions from OTO QUEST were administered to GPT-4 Turbo using a zero-shot approach.
  • Accuracy was analyzed using logistic regression, controlling for question difficulty, year, and subspecialty.
  • Comparative analysis involved 10 commercial LLMs, including Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o, using Cochran's Q test and McNemar's pairwise comparison.
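
The two comparative tests named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the exact (binomial) form of McNemar's test, and the `toy` data are all assumptions made for illustration. Each row of the input is one question, and each column holds 1 if a given model answered correctly and 0 otherwise.

```python
from math import comb


def cochrans_q(results):
    """Cochran's Q statistic for k related binary outcomes.

    results: list of rows (one per question); each row is a list of 0/1
    values, one per model. Returns (Q, degrees_of_freedom). Under the null
    hypothesis, Q is approximately chi-square distributed with k - 1 df.
    """
    k = len(results[0])
    col_totals = [sum(row[j] for row in results) for j in range(k)]
    row_totals = [sum(row) for row in results]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("all models agree on every question; Q is undefined")
    return numerator / denominator, k - 1


def mcnemar_exact(model_a, model_b):
    """Two-sided exact McNemar p-value for two paired 0/1 answer lists."""
    # Only discordant pairs (one model right, the other wrong) carry information.
    a_only = sum(1 for x, y in zip(model_a, model_b) if x == 1 and y == 0)
    b_only = sum(1 for x, y in zip(model_a, model_b) if x == 0 and y == 1)
    n = a_only + b_only
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: 6 questions answered by 3 models (not study data).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]

q_stat, dof = cochrans_q(toy)
p_first_vs_third = mcnemar_exact([row[0] for row in toy], [row[2] for row in toy])
```

Cochran's Q asks whether accuracy differs anywhere among the models. Pairwise McNemar tests then locate which specific pairs differ, and with many pairs a multiple-comparison correction would usually be applied.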

Main Results:

  • GPT-4 Turbo achieved 72.09% accuracy overall. It performed best in Practice Management, and its accuracy declined on moderate- and hard-difficulty questions.
  • In comparative analysis, Grok-3 (76.3%), Claude-3.5-Sonnet (73.0%), and GPT-4o (69.9%) showed higher accuracy than GPT-4 Turbo.
  • The top-performing models demonstrated an accuracy plateau around 73-76% on this specialized medical knowledge dataset.
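
To put the reported GPT-4 Turbo accuracy in context, the sketch below converts the percentage back into a count of correct answers and attaches a 95% Wilson score interval. The interval is not reported in the summary and is an assumption added here for illustration. The sketch also assumes that the 72.09% figure refers to all 1,075 questions. The helper names are hypothetical.

```python
from math import sqrt


def correct_count(accuracy_pct, n_questions):
    """Infer the number of correct answers from a reported percentage."""
    return round(accuracy_pct / 100 * n_questions)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 72.09% was computed over all 1,075 questions.
n_correct = correct_count(72.09, 1075)  # 775
low, high = wilson_interval(n_correct, 1075)
```

An interval like this gives a rough sense of the sampling uncertainty around a single model's accuracy. It does not replace the paired tests used in the study to compare models answering the same questions.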

Conclusions:

  • Current general-purpose LLMs demonstrate promising but limited capabilities in answering specialized otolaryngology knowledge assessment questions.
  • An accuracy plateau exists for these models, suggesting a need for domain-specific training.
  • Further research into specialized training for LLMs is recommended to improve performance in medical education and practice.