Large language models for full-text methods assessment: a case study on mediation analysis.

Wenqing Zhang (1), Trang Nguyen (1,2), Elizabeth A. Stuart (1,2)

  • (1) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States.

Journal of the American Medical Informatics Association (JAMIA) | June 17, 2026
Summary
This summary is machine-generated.

Large language models (LLMs) show promise for assisting with systematic reviews: on explicit methodological criteria, advanced models come close to human expert accuracy. They still lag on inference-intensive tasks, which points toward a collaborative human-AI approach.

Keywords:
benchmarking; causal inference; large language models; mediation analysis; systematic reviews

Area of Science:

  • Psychiatry and Psychology Research
  • Artificial Intelligence in Scientific Literature Review

Background:

  • Systematic reviews are crucial for evidence synthesis but are labor-intensive, especially for extracting detailed methodological information from full-text articles.
  • Assessing causal assumptions and methodological best practices in psychiatry and psychology studies requires expert-level review.

Purpose of the Study:

  • To evaluate the performance of large language models (LLMs) in conducting full-text methodological reviews of mediation analysis studies.
  • To compare LLM capabilities against human expert-level review on key causal assumptions and best practices.

Main Methods:

  • Six LLMs from the ChatGPT, Claude, and Gemini families were tested on 180 full-text mediation analysis articles that methodologists had previously reviewed.
  • LLMs assessed 14 binary methodological criteria, with performance measured against expert consensus using accuracy, precision, recall, F1, AUC, and PR-AUC.
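
As an illustration of how one binary criterion could be scored against expert consensus, here is a minimal Python sketch of accuracy, precision, recall, and F1. The function name, the data class, and the labels are hypothetical and are not taken from the study. AUC and PR-AUC are omitted because they need model confidence scores rather than yes/no answers.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_criterion(expert: list[int], model: list[int]) -> BinaryMetrics:
    """Compare model answers (0/1) with expert consensus (0/1) for one criterion."""
    if len(expert) != len(model) or not expert:
        raise ValueError("expert and model must be non-empty and equally long")

    # Confusion-matrix counts, treating expert consensus as ground truth.
    tp = sum(1 for e, m in zip(expert, model) if e == 1 and m == 1)
    tn = sum(1 for e, m in zip(expert, model) if e == 0 and m == 0)
    fp = sum(1 for e, m in zip(expert, model) if e == 0 and m == 1)
    fn = sum(1 for e, m in zip(expert, model) if e == 1 and m == 0)

    accuracy = (tp + tn) / len(expert)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for one criterion across eight articles (not study data).
expert_labels = [1, 1, 0, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = score_criterion(expert_labels, model_labels)
print(metrics)
```

Running the same function once per criterion would give a per-criterion score table, which is the kind of breakdown a criterion-level comparison needs.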

Main Results:

  • LLM performance correlated strongly with that of human reviewers (correlation of 0.71 for accuracy and 0.95 for F1).
  • Advanced LLMs achieved near-human accuracy on explicit features but were up to 15% less accurate on inference-intensive tasks.
  • Model accuracy decreased with longer documents, and common errors involved overinterpretation and misinterpretation of technical terms.
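
The summary does not say how the reported correlations were computed. One plausible reading, assumed here, is a Pearson correlation between human and LLM scores taken across criteria. The sketch below shows that calculation in Python; the accuracy values are invented for illustration and are not study data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two deviation norms.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical per-criterion accuracies (assumption: one value per criterion).
human_accuracy = [0.95, 0.90, 0.80, 0.70, 0.85]
llm_accuracy = [0.93, 0.88, 0.70, 0.65, 0.80]

r = pearson(human_accuracy, llm_accuracy)
print(f"accuracy correlation: {r:.2f}")
```

A high value under this reading would mean that criteria that are hard for human reviewers also tend to be hard for the models.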

Conclusions:

  • Findings support a criterion-specific human-AI collaboration strategy for full-text methodological assessment.
  • A reproducible framework is provided for future LLM testing in evidence synthesis settings.
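
The summary does not describe how a criterion-specific collaboration strategy would be put into practice. As one hypothetical illustration, each criterion could be routed to the LLM or to a human reviewer according to the model's validated accuracy on that criterion. The criterion names, accuracy values, and the 0.90 cutoff below are all assumptions made for this sketch.

```python
def assign_reviewers(llm_accuracy: dict[str, float],
                     min_accuracy: float) -> dict[str, str]:
    """Map each criterion to 'llm' or 'human' using a simple accuracy cutoff."""
    return {
        criterion: ("llm" if accuracy >= min_accuracy else "human")
        for criterion, accuracy in llm_accuracy.items()
    }


# Hypothetical criteria and validation accuracies (not study data).
validation_accuracy = {
    "reports_mediator_measurement": 0.94,       # explicit feature
    "states_temporal_ordering": 0.88,           # partly inferential
    "discusses_confounding_assumptions": 0.78,  # inference-intensive
}

plan = assign_reviewers(validation_accuracy, min_accuracy=0.90)
for criterion, reviewer in plan.items():
    print(f"{criterion}: {reviewer}")
```

A single cutoff is the simplest possible rule. A real workflow might instead send low-confidence items for a human second check rather than reassigning whole criteria.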