AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence

  • Princess Margaret Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada.

Summary

This summary is machine-generated.

AutoReporter, a large language model system, automates checks of adherence to research reporting guidelines. In validation, it reached accuracy above 90% and substantial agreement with human experts, which supports its use for scalable scientific quality control.

Area Of Science

  • Biomedical Informatics
  • Scientific Publishing
  • Artificial Intelligence

Background

  • Adherence to research reporting guidelines is crucial for scientific integrity and reproducibility.
  • Manual assessment of guideline adherence is time-consuming and resource-intensive.
  • Existing automated methods often require extensive training data and computational resources.

Purpose Of The Study

  • To develop AutoReporter, a novel large language model (LLM) system designed for automated evaluation of adherence to research reporting guidelines.
  • To benchmark various prompt-engineering and retrieval strategies for LLM-based guideline adherence assessment.
  • To validate the developed system on a diverse, expert-rated dataset.

Main Methods

  • Eight prompt-engineering and retrieval strategies were evaluated using reasoning and general-purpose LLMs on the SPIRIT-CONSORT-TM corpus.
  • The top-performing approach, AutoReporter, utilized a zero-shot, no-retrieval prompt with the o3-mini reasoning LLM.
  • AutoReporter was validated on BenchReport, a new dataset comprising expert-rated assessments from 10 systematic reviews.
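
A zero-shot, no-retrieval prompt of the kind described above might be assembled roughly as follows. This is a minimal sketch, not the authors' actual prompt: the `ChecklistItem` type, the `build_zero_shot_prompt` and `parse_verdict` helpers, the prompt wording, and the REPORTED / NOT REPORTED labels are all assumptions made for illustration, and the call to the LLM itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One reporting-guideline item (hypothetical structure)."""
    item_id: str       # illustrative identifier, not an official item number
    description: str   # placeholder wording, not official guideline text


def build_zero_shot_prompt(item: ChecklistItem, manuscript_text: str) -> str:
    """Build a prompt with no worked examples (zero-shot) and no retrieved
    passages (no retrieval): the manuscript text is passed in as given."""
    return (
        "You are assessing adherence to a research reporting guideline.\n"
        f"Checklist item {item.item_id}: {item.description}\n\n"
        "Manuscript text:\n"
        f"{manuscript_text}\n\n"
        "Answer on the first line with exactly REPORTED or NOT REPORTED, "
        "then give a one-sentence justification."
    )


def parse_verdict(reply: str) -> bool:
    """Map the first line of a model reply to True (reported) or False."""
    lines = reply.strip().splitlines()
    first_line = lines[0].strip().upper() if lines else ""
    if first_line == "REPORTED":
        return True
    if first_line == "NOT REPORTED":
        return False
    raise ValueError(f"Unrecognised verdict: {first_line!r}")


if __name__ == "__main__":
    item = ChecklistItem(
        "example-1", "How the random allocation sequence was generated"
    )
    print(build_zero_shot_prompt(
        item, "Participants were randomised using a computer-generated list."
    ))
```

Requiring a fixed label on the first line keeps parsing strict, so an ambiguous reply raises an error instead of being silently counted as adherent or non-adherent.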

Main Results

  • AutoReporter achieved high accuracy: 90.09% for CONSORT and 92.07% for SPIRIT guidelines.
  • The system demonstrated substantial agreement with human experts (Cohen's κ > 0.70 for both CONSORT and SPIRIT).
  • On the BenchReport dataset, AutoReporter achieved a mean accuracy of 91.8% and substantial agreement with expert ratings (Cohen's κ > 0.6).
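
The two metrics reported above, accuracy and Cohen's κ, can be computed from paired expert and model ratings as in the sketch below. The function names and the toy ratings are assumptions for illustration only; they are not data from the study. Cohen's κ corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(expert: Sequence[Hashable], model: Sequence[Hashable]) -> float:
    """Fraction of items on which the model rating matches the expert rating."""
    if len(expert) != len(model) or not expert:
        raise ValueError("Ratings must be non-empty and of equal length.")
    return sum(e == m for e, m in zip(expert, model)) / len(expert)


def cohens_kappa(expert: Sequence[Hashable], model: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters over the same items."""
    n = len(expert)
    p_o = accuracy(expert, model)  # observed agreement (also validates input)
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    labels = set(expert_counts) | set(model_counts)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum(expert_counts[l] * model_counts[l] for l in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy ratings (1 = item reported, 0 = not reported); invented for illustration.
    expert_ratings = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
    model_ratings = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
    print(f"accuracy = {accuracy(expert_ratings, model_ratings):.2f}")
    print(f"kappa    = {cohens_kappa(expert_ratings, model_ratings):.2f}")
```

In the toy example the raters agree on 8 of 10 items, yet κ is noticeably lower than the raw accuracy because part of that agreement would be expected by chance. This is why the results report κ alongside accuracy.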

Conclusions

  • Structured prompting alone, as implemented in AutoReporter, can achieve performance comparable to or exceeding fine-tuned domain-specific models.
  • This approach eliminates the need for manually annotated corpora and computationally intensive training.
  • LLMs offer a feasible solution for automating reporting guideline adherence assessments, enabling scalable quality control in scientific research.
