Related Experiment Video

Updated: Jun 30, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

SPIRIT-CONSORT-ELM: Element-Level Assessment of Randomized Controlled Trial Reporting Using Large Language Models.

Lan Jiang, Xiangji Ying, Andrew W Brown

medRxiv: the Preprint Server for Health Sciences

June 29, 2026

Summary

This summary is machine-generated.

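This study introduces SPIRIT-CONSORT-ELM, a new dataset for assessing the reporting completeness of randomized controlled trials (RCTs) at the level of individual checklist elements rather than whole checklist items. An automated pipeline that combines PubMedBERT sentence retrieval with a generative large language model identifies element-level reporting evidence with an F1 of 0.822.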

Area of Science:

Medical research methodology
Clinical trial reporting standards
Natural Language Processing in healthcare

Background:

Incomplete reporting in randomized controlled trials (RCTs) makes their findings hard to verify and limits their usefulness.
SPIRIT and CONSORT guidelines aim to improve protocol and results reporting, but completeness remains a challenge.
Automated checking of manuscripts could enhance reporting quality before publication.

Purpose of the Study:

To extend the SPIRIT-CONSORT-TM corpus with element-level annotations (SPIRIT-CONSORT-ELM) for assessing reporting completeness.
To develop and evaluate an automated machine reading comprehension pipeline for element-level assessment of RCT reports.
To establish a benchmark for evaluating reporting guideline completeness at a granular level.

Main Methods:

Extended the SPIRIT-CONSORT-TM corpus with element-level annotations, formulating assessment as a machine reading comprehension task with 119 questions.
Developed an automated pipeline using PubMedBERT to identify relevant sentences and a generative large language model (GPT-5) with chain-of-thought reasoning to answer element-level questions.
Annotated 50 articles (25 pairs) with two independent annotators and the remaining 150 articles (75 pairs) with a single annotator; inter-annotator agreement on the double-annotated articles was Gwet's AC1 = 0.782.
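
The two-stage design described above (retrieve relevant sentences, then have a generative model answer each element-level question) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a word-overlap ranker stands in for the PubMedBERT sentence classifier, the language model is any caller-supplied function from prompt to text (no GPT-5 API is called), and the names `ElementQuestion`, `retrieve`, `build_prompt`, `parse_answer`, and `assess`, the stopword list, the prompt wording, and the "Final answer: yes/no" convention are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical stopword list so that function words do not drive retrieval.
STOPWORDS = {"a", "an", "and", "at", "how", "in", "is", "of", "the", "to", "was", "were"}


@dataclass
class ElementQuestion:
    item: str      # hypothetical checklist-item label
    question: str  # one element-level question about the report


def content_words(text: str) -> set:
    """Lower-cased alphanumeric tokens minus stopwords (crude stand-in for an encoder)."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question: str, sentences: Sequence[str], k: int = 3) -> list:
    """Stage 1, standing in for the PubMedBERT sentence classifier:
    keep up to k sentences that share the most content words with the question."""
    q = content_words(question)
    ranked = sorted(sentences, key=lambda s: len(q & content_words(s)), reverse=True)
    return [s for s in ranked[:k] if q & content_words(s)]


def build_prompt(q: ElementQuestion, evidence: Sequence[str],
                 examples: Sequence[tuple] = ()) -> str:
    """Stage 2 prompt: step-by-step instruction, optional in-context examples, then the question."""
    parts = ["Decide whether the trial report contains the requested element. "
             "Reason step by step, then end with 'Final answer: yes' or 'Final answer: no'."]
    for ex_question, ex_evidence, ex_reasoning_and_answer in examples:
        parts.append(f"Question: {ex_question}\nEvidence: {ex_evidence}\n{ex_reasoning_and_answer}")
    joined = " ".join(evidence) if evidence else "(no relevant sentence found)"
    parts.append(f"Question: {q.question}\nEvidence: {joined}")
    return "\n\n".join(parts)


def parse_answer(completion: str) -> Optional[bool]:
    """Return True/False from the last 'Final answer: yes/no', or None if absent."""
    matches = re.findall(r"final answer:\s*(yes|no)", completion, flags=re.IGNORECASE)
    return matches[-1].lower() == "yes" if matches else None


def assess(q: ElementQuestion, sentences: Sequence[str],
           llm: Callable[[str], str], examples: Sequence[tuple] = ()) -> dict:
    """Run retrieval, prompt the caller-supplied model, and parse its verdict."""
    evidence = retrieve(q.question, sentences)
    completion = llm(build_prompt(q, evidence, examples))
    return {"item": q.item, "evidence": evidence, "reported": parse_answer(completion)}


# Usage with a fake model; a real run would call an LLM client here instead.
def fake_llm(prompt: str) -> str:
    return "The evidence describes a computer-generated sequence.\nFinal answer: yes"


sentences = [
    "Participants were randomly assigned using a computer-generated sequence.",
    "The primary outcome was systolic blood pressure at 12 weeks.",
]
question = ElementQuestion("hypothetical-item", "How was the random allocation sequence generated?")
result = assess(question, sentences, fake_llm)
```

Because the model sits behind a plain callable, the kind of ablation the summary mentions can be mimicked by calling `assess` with and without in-context `examples`.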

Main Results:

The automated pipeline identified element-level reporting evidence with an F1 of 0.822 and a Gwet's AC1 of 0.796 against the annotations.
Ablation studies showed that chain-of-thought reasoning and in-context examples modestly improved the large language model's performance.
SPIRIT-CONSORT-ELM provides a publicly available benchmark for detailed reporting completeness assessment.
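
Both reported scores can be computed from paired binary judgments (element reported or not). The sketch below assumes that F1 is taken over the positive class and pooled across all question-article judgments, which may differ from the study's exact averaging. It uses the standard two-rater form of Gwet's AC1, in which observed agreement p_a is corrected by a chance term p_e equal to the sum of pi_k(1 - pi_k) over the q categories, divided by (q - 1). The function names and the toy labels are illustrative only.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for the positive class ('element reported'), pooled over all judgments."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Sequence[Hashable] = (False, True)) -> float:
    """Gwet's AC1 for two raters: (p_a - p_e) / (1 - p_e)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or len(categories) < 2:
        raise ValueError("need equal-length, non-empty ratings and at least two categories")
    # Observed agreement: share of units on which both raters gave the same label.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # pi_k: share of all 2n ratings that fall in category k.
    counts = Counter(rater_a) + Counter(rater_b)
    pi = [counts[k] / (2 * n) for k in categories]
    # Gwet's chance agreement: sum_k pi_k * (1 - pi_k) / (q - 1).
    p_e = sum(p * (1 - p) for p in pi) / (len(categories) - 1)
    return (p_a - p_e) / (1 - p_e)


# Toy example (invented labels, not the study's data).
gold = [True, True, False, True, False, False]
pred = [True, False, False, True, True, False]
f1 = f1_score(gold, pred)    # TP=2, FP=1, FN=1, so 2*2 / (2*2 + 1 + 1) = 4/6
ac1 = gwet_ac1(gold, pred)   # p_a = 4/6, p_e = 0.5, so (4/6 - 0.5) / 0.5 = 1/3
```

For two categories the chance term reduces to 2*pi*(1 - pi), which approaches zero when one label dominates. AC1 therefore stays close to raw agreement on skewed data, where Cohen's kappa can drop sharply.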

Conclusions:

SPIRIT-CONSORT-ELM enables a more nuanced assessment of RCT transparency than item-level checks alone.
The automated pipeline offers a robust baseline for evaluating RCT reporting completeness.
The system could serve as a practical tool for authors, reviewers, and editors to improve the transparency of RCT reports.