Updated: Jun 13, 2026

Evaluating Large Language Models for Automated Evidence Synthesis in Neuroimaging AI: A Multi-Model Benchmark.

Umid Sulaimanov¹, Nafiye Sanlier¹, Ariorad Moniri²

  • ¹Department of Neurological Surgery, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53792, USA.

Journal of Clinical Medicine | June 12, 2026
Summary

This summary is machine-generated.

Large language models (LLMs) show promise for automating data extraction in systematic reviews, but struggle with complex neuroimaging AI literature. Gemini 3 Pro Preview led in accuracy, though human oversight remains crucial for nuanced data.

Area of Science:

  • Artificial Intelligence
  • Neuroimaging
  • Systematic Reviews

Background:

  • Data extraction for systematic reviews is time-consuming and resource-intensive.
  • Evaluating the utility of advanced AI in automating evidence synthesis is critical.
  • Specialized neuroimaging artificial intelligence (AI) literature presents unique challenges for data extraction.

Purpose of the Study:

  • To assess the performance of four leading large language models (LLMs) in extracting structured metadata from neuroimaging AI literature.
  • To compare the accuracy of Google Gemini 3 Pro Preview, Anthropic Claude Opus 4.5, Perplexity Sonar Pro, and OpenAI GPT 5.2 for complex data extraction tasks.
  • To determine the impact of variable complexity on LLM performance in automated evidence synthesis.

Keywords:
artificial intelligence, benchmarking, evidence synthesis, information extraction, large language models, neuroimaging

Main Methods:

  • A standardized prompt was used to extract 22 variables from 91 neuroimaging AI articles.
  • Variables were categorized into low, medium, and high complexity tiers.
  • Performance was evaluated using exact-match accuracy against expert-validated ground truth.
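
The exact-match scoring described in the methods can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the variable names (`imaging_modality`, `sample_size`, `validation_strategy`), their tier assignments, the toy values, and the `normalize` rule (case and whitespace folding) are hypothetical and are not taken from the study, which does not state its matching rule in this summary.

```python
from collections import defaultdict


def normalize(value):
    # Assumed matching rule: fold case and collapse whitespace before comparing.
    return " ".join(str(value).lower().split())


def exact_match_accuracy(predictions, ground_truth, tiers):
    """Return exact-match accuracy overall and per complexity tier.

    predictions, ground_truth: {article_id: {variable: value}}
    tiers: {variable: "low" | "medium" | "high"}
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for article_id, truth in ground_truth.items():
        predicted = predictions.get(article_id, {})
        for variable, expected in truth.items():
            tier = tiers[variable]
            totals[tier] += 1
            totals["overall"] += 1
            # A missing field counts as a miss, not as a skipped item.
            if variable in predicted and normalize(predicted[variable]) == normalize(expected):
                hits[tier] += 1
                hits["overall"] += 1
    return {tier: hits[tier] / totals[tier] for tier in totals}


# Hypothetical tier assignment and toy data (not from the study).
tiers = {
    "imaging_modality": "low",
    "sample_size": "medium",
    "validation_strategy": "high",
}
ground_truth = {
    "a1": {"imaging_modality": "MRI", "sample_size": "120",
           "validation_strategy": "5-fold cross-validation with external test set"},
    "a2": {"imaging_modality": "CT", "sample_size": "85",
           "validation_strategy": "hold-out split"},
}
predictions = {
    "a1": {"imaging_modality": "mri", "sample_size": "120",
           "validation_strategy": "cross-validation"},
    "a2": {"imaging_modality": "CT", "sample_size": "58",
           "validation_strategy": "hold-out split"},
}

scores = exact_match_accuracy(predictions, ground_truth, tiers)
for tier in ("low", "medium", "high", "overall"):
    print(f"{tier}: {scores[tier]:.1%}")
```

Exact matching is strict by design: in the toy data, the partially correct answer "cross-validation" scores zero against the longer expert-validated string, which suggests one reason free-text, high-complexity fields can score far lower than categorical ones under this metric.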

Main Results:

  • Gemini 3 Pro Preview achieved the highest overall exact-match accuracy (56.4%), outperforming other models.
  • Model performance decreased significantly with increasing variable complexity.
  • Accuracy for low-complexity fields was high (88.9-92.9%), while high-complexity variables yielded very low accuracy (2.7-15.5%).

Conclusions:

  • Frontier LLMs can automate the extraction of simple, categorical data effectively.
  • Complex methodological variables requiring clinical judgment or multi-section synthesis remain challenging for current LLMs.
  • Human review is indispensable for ensuring accuracy in extracting context-dependent variables from specialized literature.