Updated: Jun 30, 2026

SPIRIT-CONSORT-ELM: Element-Level Assessment of Randomized Controlled Trial Reporting Using Large Language Models.

Lan Jiang, Xiangji Ying, Andrew W Brown

    medRxiv: the preprint server for health sciences | June 29, 2026
    PubMed
    Summary
    This summary is machine-generated.

    This study introduces SPIRIT-CONSORT-ELM, a new dataset for assessing the completeness of randomized controlled trial (RCT) reporting at the element level. An automated pipeline that pairs PubMedBERT sentence retrieval with a generative large language model assesses each reporting element (F1: 0.822), going beyond item-level checklist checks.

    Area of Science:

    • Medical research methodology
    • Clinical trial reporting standards
    • Natural Language Processing in healthcare

    Background:

    • Incomplete reporting in randomized controlled trials (RCTs) hinders verification and utility.
    • SPIRIT and CONSORT guidelines aim to improve protocol and results reporting, but completeness remains a challenge.
    • Automated checking of manuscripts could enhance reporting quality before publication.

    Purpose of the Study:

    • To extend the SPIRIT-CONSORT-TM corpus with element-level annotations (SPIRIT-CONSORT-ELM) for assessing reporting completeness.
    • To develop and evaluate an automated machine reading comprehension pipeline for element-level assessment of RCT reports.
    • To establish a benchmark for evaluating reporting guideline completeness at a granular level.

    Main Methods:

    • Extended the SPIRIT-CONSORT-TM corpus with element-level annotations, formulating assessment as a machine reading comprehension task with 119 questions.
    • Developed an automated pipeline using PubMedBERT to identify relevant sentences and a generative large language model (GPT-5) with chain-of-thought reasoning to answer element-level questions.
    • Two independent annotators annotated 50 articles (25 pairs), and a single annotator assessed the remaining 150 articles (75 pairs); inter-annotator agreement on the double-annotated articles was Gwet's AC1 = 0.782.
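
The two-stage design described in the methods above (first find the relevant sentences, then answer an element-level question from them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ElementQuestion`, `retrieve`, `answer`, and `assess` are hypothetical names, keyword overlap stands in for the PubMedBERT sentence classifier, and a trivial rule stands in for the generative language model. It shows the shape of the data flow only, not the authors' implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class ElementQuestion:
    """One element-level question (hypothetical schema, not the dataset's format)."""
    item: str        # checklist item label (illustrative)
    question: str    # yes/no question about a single reporting element
    keywords: tuple  # toy retrieval cues, a stand-in for a learned retriever


def tokenize(text):
    """Lowercase a sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, sentences, top_k=2):
    """Stage 1 stand-in: rank sentences by keyword overlap and keep at most
    top_k sentences whose overlap is greater than zero."""
    cues = {k.lower() for k in question.keywords}
    scored = [(len(cues & tokenize(s)), i, s) for i, s in enumerate(sentences)]
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [s for score, _, s in ranked[:top_k] if score > 0]


def answer(question, evidence):
    """Stage 2 stand-in: mark the element as reported when any evidence
    sentence was retrieved. A real system would prompt a language model here."""
    return {
        "item": question.item,
        "question": question.question,
        "reported": bool(evidence),
        "evidence": evidence,
    }


def assess(question, sentences):
    """Run both stages for one question against one article's sentences."""
    return answer(question, retrieve(question, sentences))


if __name__ == "__main__":
    article = [
        "Participants were randomized using a computer-generated sequence.",
        "The primary outcome was change in body weight at 12 weeks.",
    ]
    q = ElementQuestion(
        item="Randomization",
        question="Is the method used to generate the allocation sequence reported?",
        keywords=("randomized", "sequence", "computer"),
    )
    print(assess(q, article))
```

Keeping retrieval and answering as separate functions mirrors the pipeline's split, so either stand-in could be swapped for a real model without changing the other stage.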

    Main Results:

    • The automated pipeline achieved high accuracy in identifying element-level reporting evidence (F1: 0.822, Gwet's AC1: 0.796).
    • Ablation studies confirmed that chain-of-thought reasoning and in-context examples modestly improved the large language model's performance.
    • SPIRIT-CONSORT-ELM provides a publicly available benchmark for detailed reporting completeness assessment.
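
For readers unfamiliar with the two metrics reported above, the sketch below computes F1 and two-rater Gwet's AC1 from a pair of label lists. It is an illustration under stated assumptions: the function names and the toy labels are invented here, the summary does not say how the authors averaged F1 across questions, and the AC1 code uses the standard two-rater formula with the categories taken from the observed labels unless the caller supplies them.

```python
from collections import Counter


def f1_score(gold, pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters labelling the same items.

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is observed agreement and
    p_e = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the mean share of
    category k across both raters and q the number of categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    if categories is None:
        # Assumption: fall back to the categories that actually occur.
        categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts = Counter(rater_a) + Counter(rater_b)
    p_e = sum(
        (counts[k] / (2 * n)) * (1 - counts[k] / (2 * n)) for k in categories
    ) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels: 1 = element reported, 0 = not reported (illustrative only).
    human = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
    model = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
    print("F1:", round(f1_score(human, model), 3))
    print("AC1:", round(gwet_ac1(human, model), 3))
```

Unlike Cohen's kappa, AC1 bases chance agreement on how evenly the categories are used overall, which makes it less sensitive to skewed label distributions such as elements that are almost always reported.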

    Conclusions:

    • SPIRIT-CONSORT-ELM enables a more nuanced assessment of RCT transparency than item-level checks alone.
    • The automated pipeline offers a robust baseline for evaluating RCT reporting completeness.
    • The developed system has the potential to serve as a practical tool for authors, reviewers, and editors to improve RCT report transparency.