Evaluating Source-Based Large Language Models for Preclinical Dermatology Education: Comparative Study.

Frank Je-Min Lin¹, Sunghun Cho²

¹F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD, 20814, United States, 1 2532733100.

JMIR Formative Research

June 25, 2026

Summary

This summary is machine-generated.

Adding student notes to source-based large language models (LLMs) improved response consistency in medical education but may limit accuracy on difficult questions. This highlights challenges in using LLMs for personalized learning.

Area of Science:

Medical Education Technology
Artificial Intelligence in Healthcare
Cognitive Load Theory Applications

Background:

Large language models (LLMs) show promise in medical education, aligning with cognitive load theory.
Source-based LLMs using retrieval-augmented generation (RAG) can leverage student materials for learning.
Dermatology education could benefit from LLMs to address healthcare disparities.

Purpose of the Study:

To evaluate the accuracy, reproducibility, and similarity of freely available LLMs on dermatology questions.
To determine if student-generated notes impact a source-based LLM's performance characteristics.

Main Methods:

Four LLM configurations (NotebookLM with and without student notes, ChatGPT-4o Mini, and Gemini 1.5 Flash) were tested.
Each configuration answered 121 text-based USMLE Step 1 dermatology questions in three trials.
Performance metrics included accuracy, reproducibility (Fleiss κ), and intermodel agreement.

Keywords:

AI; AI in the classroom; LLM; NotebookLM; artificial intelligence; cognitive load theory; large language model; retrieval-augmented generation; source-based LLM

Main Results:

ChatGPT-4o Mini achieved the highest overall accuracy (84.3%).
NotebookLM with notes showed superior reproducibility (κ=0.927) but lower accuracy on difficult questions.
NotebookLM without notes had significantly higher omission rates; accuracy improved when omissions were excluded.
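
The metrics reported above can be illustrated with a minimal Python sketch. It is not the study's analysis code. It assumes each trial is a list of answer letters, that an omitted answer is recorded as `None`, and that reproducibility is computed by treating each of the three trials as one "rater" in Fleiss κ. The function names and the toy data are hypothetical.

```python
from collections import Counter

OMITTED = None  # hypothetical marker for a question the model declined to answer


def accuracy(answers, key, exclude_omissions=False):
    """Fraction of questions answered correctly in one trial."""
    pairs = list(zip(answers, key))
    if exclude_omissions:
        # Score only the questions the model actually answered.
        pairs = [(a, k) for a, k in pairs if a is not OMITTED]
    if not pairs:
        return float("nan")
    return sum(a == k for a, k in pairs) / len(pairs)


def fleiss_kappa(trials):
    """Fleiss' kappa with each trial as a rater and each question as a subject."""
    n_raters = len(trials)
    n_items = len(trials[0])
    # Omissions count as their own category, so every question has n_raters ratings.
    per_item = [Counter(trial[i] for trial in trials) for i in range(n_items)]
    # Observed agreement: mean share of agreeing rater pairs per question.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for counts in per_item
    ) / n_items
    # Chance agreement from the pooled category proportions.
    totals = Counter()
    for counts in per_item:
        totals.update(counts)
    p_e = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    if p_e == 1.0:
        return 1.0  # all ratings identical; kappa is undefined, reported as 1.0 here
    return (p_bar - p_e) / (1 - p_e)


# Toy data: four questions, three trials of one model.
key = ["A", "B", "C", "D"]
trial_1 = ["A", "B", "C", OMITTED]
trial_2 = ["A", "B", "D", OMITTED]
trial_3 = ["A", "B", "C", "D"]

print(accuracy(trial_1, key))
print(accuracy(trial_1, key, exclude_omissions=True))
print(fleiss_kappa([trial_1, trial_2, trial_3]))
```

In the toy data, the first trial scores 0.75 when its omission counts as incorrect and 1.0 when omissions are excluded, which mirrors how a model with many omissions can look more accurate once unanswered questions are dropped.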

Conclusions:

Student notes significantly enhance source-based LLM response reproducibility.
Note-grounding may hinder performance on complex questions due to RAG limitations.
Developing effective educational LLMs requires balancing source utilization, internal reasoning, and assessment of learning gaps.