Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology
Summary
This summary is machine-generated. Human reviewers struggle to identify AI-generated abstracts, and AI detection software shows imperfect accuracy. Guidelines are needed for the use of AI in scientific writing and in the review process.
Area Of Science
- Medical research
- Artificial intelligence in medicine
- Scientific writing
Background
- ChatGPT, an accessible AI language model, is increasingly used in medical research.
- The medical community needs to understand AI's capabilities, ethics, and implications for authorship.
- Uncertainties exist regarding AI writing quality, accuracy, and detection in scientific contexts.
Purpose Of The Study
- To compare human reviewers' and AI detection software's ability to identify AI-generated abstracts in Gynecology and Urogynecology.
- To analyze differences in writing errors, readability, and quality between original and AI-generated abstracts.
Main Methods
- 25 original abstracts and 25 AI-generated abstracts (using ChatGPT) were selected from Gynecology and Urogynecology journals.
- Blinded faculty and fellows reviewed abstracts to identify their origin (human or AI).
- AI detection software (GPTZero, Originality, Copyleaks) and Grammarly were used for analysis.
Main Results
- Human reviewers correctly identified only 49.7% of abstracts overall.
- AI detectors were more likely to flag AI-generated abstracts (73.3% GPTZero, 98.1% Originality, 58.2% Copyleaks) than original ones.
- Grammarly flagged more writing errors in the original abstracts than in the AI-generated ones, suggesting lower writing quality in the human-written text.
Conclusions
- Human reviewers cannot reliably distinguish AI-generated scientific text because it closely mimics human writing.
- AI detection software shows promise but requires improvement for optimal accuracy.
- Clear guidelines for AI use and detection in manuscript review are essential as AI adoption grows.

