Updated: Jun 27, 2026

Evaluating Source-Based Large Language Models for Preclinical Dermatology Education: Comparative Study.

Frank Je-Min Lin¹, Sunghun Cho²

  • ¹ F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD, 20814, United States. Phone: 1 2532733100.

JMIR Formative Research | June 25, 2026
Summary
This summary is machine-generated.

Adding student notes to source-based large language models (LLMs) improved response consistency in medical education but may limit accuracy on difficult questions. This highlights challenges in using LLMs for personalized learning.

Area of Science:

  • Medical Education Technology
  • Artificial Intelligence in Healthcare
  • Cognitive Load Theory Applications

Background:

  • Large language models (LLMs) show promise in medical education, aligning with cognitive load theory.
  • Source-based LLMs using retrieval-augmented generation (RAG) can leverage student materials for learning.
  • Dermatology education could benefit from LLMs to address healthcare disparities.

Purpose of the Study:

  • To evaluate the accuracy, reproducibility, and similarity of freely available LLMs on dermatology questions.
  • To determine if student-generated notes impact a source-based LLM's performance characteristics.

Main Methods:

  • Four LLM configurations (NotebookLM with and without student notes, ChatGPT-4o Mini, and Gemini 1.5 Flash) were tested.
  • 121 text-based USMLE Step 1 dermatology questions were administered across three trials per model.
  • Performance metrics included accuracy, reproducibility (Fleiss κ), and intermodel agreement.

Keywords:
AI; AI in the classroom; LLM; NotebookLM; artificial intelligence; cognitive load theory; large language model; retrieval-augmented generation; source-based LLM

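
The reproducibility metric named under Main Methods, Fleiss κ, measures how consistently a model gives the same answer to the same question across repeated trials, corrected for chance agreement. The sketch below is a minimal, standard-library illustration of that statistic, not the authors' analysis code. The function name `fleiss_kappa`, the data layout (one list of answer choices per question, one entry per trial), and the toy answers are all assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings: list of lists; ratings[i] holds the answers given to question i,
    one per trial (hypothetical layout, e.g. ["A", "A", "B"] for three trials).
    """
    if not ratings:
        raise ValueError("ratings must contain at least one item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rating pairs on this item that agree with each other.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy data (invented): four questions, three trials each.
toy_trials = [
    ["A", "A", "A"],
    ["B", "B", "C"],
    ["C", "C", "C"],
    ["D", "A", "D"],
]
print(round(fleiss_kappa(toy_trials), 3))  # 0.538
```

A value near 1 indicates that the model repeats its answers almost perfectly across trials; values near 0 indicate agreement no better than chance.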

Main Results:

  • ChatGPT-4o Mini achieved the highest overall accuracy (84.3%).
  • NotebookLM with notes showed superior reproducibility (κ=0.927) but lower accuracy on difficult questions.
  • NotebookLM without notes had significantly higher omission rates; accuracy improved when omissions were excluded.
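
The effect of omissions on accuracy reported above can be made concrete with a small calculation. The following sketch assumes a hypothetical representation in which an omitted response is stored as `None`. The function name `accuracy_summary`, the dictionary keys, and the toy answers are illustrative only and are not taken from the study.

```python
def accuracy_summary(responses, answer_key):
    """Accuracy with omissions counted as wrong, and with omissions excluded.

    responses: model answers, with None marking an omitted (unanswered) question.
    answer_key: the correct answer for each question, in the same order.
    """
    if not answer_key or len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must be non-empty and equal length")

    total = len(answer_key)
    answered = [(r, k) for r, k in zip(responses, answer_key) if r is not None]
    correct = sum(1 for r, k in answered if r == k)

    return {
        # Omitted questions count as incorrect here.
        "overall_accuracy": correct / total,
        "omission_rate": (total - len(answered)) / total,
        # Only questions the model actually answered are considered here.
        "accuracy_excluding_omissions": (
            correct / len(answered) if answered else float("nan")
        ),
    }


# Toy data (invented): five questions, two of them omitted.
key = ["A", "B", "C", "D", "E"]
model_answers = ["A", None, "C", None, "B"]
summary = accuracy_summary(model_answers, key)
print(summary)
```

In this toy case the model answers two of five questions correctly overall, but two of the three it attempted, which shows how a high omission rate can depress overall accuracy even when attempted answers are mostly right.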

Conclusions:

  • Student notes significantly enhance source-based LLM response reproducibility.
  • Note-grounding may hinder performance on complex questions due to RAG limitations.
  • Developing effective educational LLMs requires balancing source utilization, internal reasoning, and assessment of learning gaps.