



Performance Evaluation of Large Language Models in Multilingual Medical Multiple-Choice Questions: Mixed Methods

Livia Maria Strasser1, Wilma Anschuetz2, Fabio Dennstädt1,3

  • 1Medical Knowledge and Decision Support, School of Medicine, University of St.Gallen, St.Jakobstrasse 21, St.Gallen, 9000, Switzerland, +41 71 224 32 00.

JMIR Medical Education | March 11, 2026
Summary
This summary is machine-generated.

Large language models (LLMs) show varied accuracy on medical questions across languages, with German performing best. Prompting in English generally improved results, but human oversight is crucial for reliable integration into medical education.

Keywords:
LLM, LLM evaluation, education, large language model, medical question answering, multiple-choice questions, natural language processing


Area of Science:

  • Medical Education
  • Artificial Intelligence
  • Natural Language Processing

Background:

  • Artificial intelligence (AI) is transforming healthcare and medical education.
  • Large language models (LLMs) demonstrate potential in medical licensing exams.
  • LLM performance varies by language, necessitating cross-language comparisons.

Purpose of the Study:

  • Evaluate LLM performance on medical multiple-choice questions across German, French, and Italian.
  • Quantitatively and qualitatively assess LLM capabilities in multilingual medical education.
  • Identify factors influencing LLM accuracy in diverse linguistic contexts.

Main Methods:

  • Mixed methods study analyzing 114 multiple-choice questions in German, French, and Italian.
  • Quantitative performance analysis of multiple LLMs (OpenAI, Meta AI, Anthropic, DeepSeek).
  • Qualitative analysis of the answer explanations that the top-performing LLMs (GPT-4o, Claude-Sonnet-3.7) gave for incorrectly answered questions.
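
The quantitative step listed above amounts to comparing each model's chosen option with the answer key and reporting accuracy per model and language. The following is a minimal sketch of that calculation, not the authors' actual pipeline: the function name `accuracy_by_group`, the record layout, and all model names and answers in `records` are hypothetical placeholders.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {(model, language): fraction of correct answers}.

    Each record is a tuple (model, language, predicted_option, answer_key).
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, language, predicted, expected in records:
        key = (model, language)
        totals[key] += 1
        if predicted == expected:
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


# Hypothetical graded answers, not data from the study.
records = [
    ("model_a", "de", "A", "A"),
    ("model_a", "de", "B", "B"),
    ("model_a", "fr", "C", "A"),
    ("model_a", "fr", "D", "D"),
    ("model_b", "it", "A", "B"),
]

scores = accuracy_by_group(records)
for (model, language), acc in sorted(scores.items()):
    print(f"{model} {language}: {acc:.0%}")
```

Grouping by the (model, language) pair keeps the two factors the study compares separate; adding the prompt language as a third key element would extend the same sketch to the English versus language-matched prompt comparison.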

Main Results:

  • LLM accuracy varied significantly by model and language (64%-87%), with German questions yielding the best performance.
  • English prompts generally outperformed language-matched prompts, though top models showed comparable results.
  • Qualitative analysis revealed reasoning errors in LLM explanations and identified 3 imprecise questions.

Conclusions:

  • LLM performance in medical exams is influenced by model, prompt, and input language, requiring careful selection.
  • LLM-generated explanations can help improve the quality of medical exam questions, provided that data security is ensured.
  • Human oversight is essential for nuanced medical content, and ongoing evaluation is needed for reliable LLM integration.