Related Experiment Video

Updated: Jun 19, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Large language models for full-text methods assessment: a case study on mediation analysis.

Wenqing Zhang¹, Trang Nguyen¹,², Elizabeth A Stuart¹,²

¹Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States.

Journal of the American Medical Informatics Association : JAMIA

June 17, 2026

Summary

This summary is machine-generated.

Large language models (LLMs) show promise for assisting with systematic reviews: advanced models approach human expert performance on explicit methodological criteria. They still lag on inference-intensive criteria, which suggests a collaborative human-AI approach.

Keywords:

benchmarking; causal inference; large language models; mediation analysis; systematic reviews

Area of Science:

Psychiatry and Psychology Research
Artificial Intelligence in Scientific Literature Review

Background:

Systematic reviews are crucial for evidence synthesis but are labor-intensive, especially for extracting detailed methodological information from full-text articles.
Assessing causal assumptions and methodological best practices in psychiatry and psychology studies requires expert-level review.

Purpose of the Study:

To evaluate the performance of large language models (LLMs) in conducting full-text methodological reviews of mediation analysis studies.
To compare LLM capabilities against human expert-level review on key causal assumptions and best practices.

Main Methods:

Six LLMs from the ChatGPT, Claude, and Gemini families were tested on 180 full-text mediation analysis articles previously reviewed by methodologists.
LLMs assessed 14 binary methodological criteria, with performance measured against expert consensus using accuracy, precision, recall, F1, AUC, and PR-AUC.
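As a rough illustration of how one binary criterion could be scored against expert consensus, the sketch below computes accuracy, precision, recall, and F1 from two label lists. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the study; AUC and PR-AUC are left out because they require model confidence scores rather than yes/no answers.

```python
from typing import Dict, Sequence


def binary_metrics(expert: Sequence[int], model: Sequence[int]) -> Dict[str, float]:
    """Score binary model answers against expert consensus (1 = criterion met)."""
    if len(expert) != len(model):
        raise ValueError("label sequences must have the same length")

    # Confusion-matrix counts, treating the expert label as ground truth.
    tp = sum(1 for e, m in zip(expert, model) if e == 1 and m == 1)
    fp = sum(1 for e, m in zip(expert, model) if e == 0 and m == 1)
    fn = sum(1 for e, m in zip(expert, model) if e == 1 and m == 0)
    tn = sum(1 for e, m in zip(expert, model) if e == 0 and m == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for one criterion across eight articles (illustrative only).
expert_labels = [1, 1, 0, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(expert_labels, model_labels)
print(scores)
```

Repeating this per criterion and per model would give the kind of criterion-level table that the summary's comparisons imply.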

Main Results:

LLM performance correlated strongly with that of human reviewers (correlation of 0.71 for accuracy and 0.95 for F1).
Advanced LLMs achieved near-human accuracy on explicit features but were up to 15% less accurate on inference-intensive tasks.
Model accuracy decreased with longer documents, and common errors involved overinterpretation and misinterpretation of technical terms.
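The correlations reported above compare LLM and human performance across criteria. The summary does not say which correlation coefficient was used, so the sketch below assumes Pearson's r purely for illustration; the helper `pearson` and the per-criterion F1 values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-criterion F1 scores (illustrative only, not study data).
human_f1 = [0.9, 0.8, 0.7, 0.6]
llm_f1 = [0.85, 0.75, 0.65, 0.55]
r = pearson(human_f1, llm_f1)
print(round(r, 3))
```

A high value here would only mean that criteria that are hard for humans also tend to be hard for the model; it does not by itself show that the two reach the same absolute accuracy.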

Conclusions:

Findings support a criterion-specific human-AI collaboration strategy for full-text methodological assessment.
A reproducible framework is provided for future LLM testing in evidence synthesis settings.
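One way to read the criterion-specific collaboration strategy is as a routing rule: criteria on which a model has been validated as accurate go to the LLM first, and the rest stay with human reviewers. The sketch below is an assumption about how such a rule might look, not the authors' framework; the function `route_criteria`, the 0.9 threshold, and the criterion names are all hypothetical.

```python
from typing import Dict, List


def route_criteria(llm_accuracy: Dict[str, float],
                   threshold: float = 0.9) -> Dict[str, List[str]]:
    """Split criteria into LLM-first and human-review groups by validated accuracy."""
    routing: Dict[str, List[str]] = {"llm_first": [], "human_review": []}
    for criterion, accuracy in sorted(llm_accuracy.items()):
        # The threshold is an illustrative assumption, not a value from the study.
        group = "llm_first" if accuracy >= threshold else "human_review"
        routing[group].append(criterion)
    return routing


# Hypothetical validation accuracies per criterion (illustrative only).
validated = {
    "reports_mediator_measure": 0.96,        # explicit feature
    "states_temporal_ordering": 0.91,        # explicit feature
    "justifies_no_confounding": 0.78,        # inference-intensive
}
routing = route_criteria(validated)
print(routing)
```

In practice the threshold would need to be set per review, since the acceptable error rate depends on how the extracted information is used.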