Comparing Large Language Models' Performances on Otolaryngology Knowledge Assessment Questions.

Ryan Cook¹, Abner Kahan¹, Thomas Scharfenberger¹

¹Albert Einstein College of Medicine, Bronx, New York, United States.

Applied Clinical Informatics | April 6, 2026

Summary

This summary is machine-generated.

This study evaluated large language models (LLMs) on otolaryngology knowledge assessment questions. The top-performing models reached roughly 73-76% accuracy, suggesting that general-purpose LLMs plateau in specialized medical fields.

Area of Science:

Medical Education
Artificial Intelligence
Otolaryngology

Background:

Large language models (LLMs) show potential in medical education.
Evaluating LLM performance on specialized medical knowledge is crucial.
Otolaryngology knowledge assessment requires accurate AI tools.

Purpose of the Study:

To assess the performance of OpenAI's GPT-4 Turbo and 10 other commercial large language models (LLMs) on specialized otolaryngology knowledge.
To compare the utility of these LLMs in otolaryngology medical education.
To identify the current capabilities and limitations of general-purpose LLMs in a specific medical domain.

Main Methods:

1,075 otolaryngology questions from OTO QUEST were administered to GPT-4 Turbo using a zero-shot approach.
Accuracy was analyzed using logistic regression, controlling for question difficulty, year, and subspecialty.
Comparative analysis involved 10 commercial LLMs, including Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o, using Cochran's Q test and McNemar's pairwise comparison.
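The two comparison tests named above can be illustrated on toy data. The following is a minimal sketch, not the authors' analysis code: it assumes each model's answers have been scored as 1 (correct) or 0 (incorrect) per question, and the function names, the exact (binomial) form of McNemar's test, and the six-question toy matrix are all assumptions made for illustration.

```python
from math import comb


def cochrans_q(results):
    """Cochran's Q for k models answering the same questions.

    results: list of rows, one per question; each row holds one 0/1
    correctness score per model. Returns (Q, degrees_of_freedom).
    """
    k = len(results[0])
    col_totals = [sum(row[j] for row in results) for j in range(k)]
    row_totals = [sum(row) for row in results]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every question has identical scores across models")
    return numerator / denominator, k - 1


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value for two models' 0/1 score lists."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b  # discordant pairs are the only informative ones
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical scores: 6 questions (rows) x 3 models (columns).
toy_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
q_stat, dof = cochrans_q(toy_scores)
model_a = [row[0] for row in toy_scores]
model_c = [row[2] for row in toy_scores]
p_pair = mcnemar_exact(model_a, model_c)
print(q_stat, dof, p_pair)
```

Q is compared against a chi-square distribution with k - 1 degrees of freedom. A significant Q only says that at least one model differs, which is why pairwise McNemar tests follow it. With many model pairs, a multiple-comparison correction would normally be applied; the summary does not say which one, if any, the study used.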

Main Results:

GPT-4 Turbo achieved 72.09% accuracy, performing best on Practice Management questions, with accuracy declining on moderate- and hard-difficulty questions.
In comparative analysis, Grok-3 (76.3%), Claude-3.5-Sonnet (73.0%), and GPT-4o (69.9%) showed higher accuracy than GPT-4 Turbo.
The top-performing models demonstrated an accuracy plateau around 73-76% on this specialized medical knowledge dataset.
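As a rough check on the headline figure, 72.09% of 1,075 questions corresponds to 775 correct answers. This count is inferred from the reported percentage and assumes all 1,075 questions were scored. The sketch below recomputes the accuracy and adds a 95% Wilson score interval to show the sampling uncertainty around it. The interval is an illustration only; the summary does not report one, and `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width


# Inferred count: 775 / 1075 rounds to the reported 72.09%.
correct, total = 775, 1075
accuracy_pct = round(100 * correct / total, 2)
low, high = wilson_interval(correct, total)
print(accuracy_pct, round(low, 3), round(high, 3))
```

The interval spans a few percentage points on either side of the point estimate, which is worth keeping in mind when reading small gaps between models.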

Conclusions:

Current general-purpose LLMs demonstrate promising but limited performance on specialized otolaryngology knowledge assessments.
An accuracy plateau exists for these models, suggesting a need for domain-specific training.
Further research into specialized training for LLMs is recommended to improve performance in medical education and practice.