Updated: Jun 10, 2026


Comparing supervised machine learning and large language models in title-abstract screening.

Marco F Aigner¹, Matthias Ganzinger², Pascal Probst³,⁴

  • ¹Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany.

Systematic Reviews | June 9, 2026
Summary

This summary is machine-generated.

Supervised machine learning and large language models show promise for automating systematic review screening, with both achieving high recall comparable to human reviewers. Supervised models offer better specificity, while large language models provide more sensitive, explainable results.

Area of Science:

  • Bibliometrics
  • Artificial Intelligence in Research

Background:

  • Systematic reviews necessitate efficient article screening.
  • Automating title/abstract screening using machine learning (ML) or large language models (LLMs) can accelerate review processes.
  • Direct comparisons between TF-IDF-based supervised ML and zero-shot LLMs for screening automation are limited.
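As a rough illustration of the TF-IDF feature step that the supervised approach relies on, the sketch below weights each term by its frequency within a record and by how rare it is across records. The tokenizer, the plain `log(N / df)` weighting, the function names, and the toy titles are all assumptions made for illustration; the summary does not state which TF-IDF variant or preprocessing the study used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({term: (n / total) * idf[term] for term, n in counts.items()})
    return vectors


# Toy titles, hypothetical and not taken from the study's datasets.
docs = [
    "randomized trial of surgery",
    "cohort study of surgery",
    "randomized trial of probiotics",
]
vectors = tfidf_vectors(docs)
```

A term that appears in every record (here "of") receives weight 0, while a term unique to one record (here "probiotics") receives the largest weight. Vectors of this kind would then be passed to a classifier such as Logistic Regression or an SVM.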

Purpose of the Study:

  • To directly compare the feasibility and performance of common supervised ML models and a zero-shot LLM for systematic review title/abstract screening automation.
  • To evaluate performance across different datasets and identify scenarios where each approach is most effective.
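To make the zero-shot idea concrete, the sketch below builds a screening prompt from eligibility criteria and one record, then parses a reply into a decision and a short reason. The prompt wording, the INCLUDE/EXCLUDE reply format, and the function names are hypothetical; the summary does not describe the prompt used in the study, and no model is called here.

```python
def build_screening_prompt(criteria, title, abstract):
    """Assemble a zero-shot prompt: no labelled examples, only instructions."""
    return (
        "You are screening records for a systematic review.\n"
        f"Eligibility criteria:\n{criteria}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Answer on the first line with exactly INCLUDE or EXCLUDE, "
        "then give a one-sentence reason on the second line."
    )


def parse_screening_reply(reply):
    """Return (decision, reason); decision is True, False, or None if unparseable."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        return None, ""
    first = lines[0].upper()
    if first.startswith("INCLUDE"):
        decision = True
    elif first.startswith("EXCLUDE"):
        decision = False
    else:
        decision = None
    return decision, " ".join(lines[1:])


# Hypothetical record and reply, for illustration only.
prompt = build_screening_prompt(
    criteria="Randomized controlled trials in adult surgical patients.",
    title="Outpatient versus inpatient cholecystectomy",
    abstract="A randomized feasibility trial comparing two care pathways.",
)
decision, reason = parse_screening_reply(
    "INCLUDE\nRandomized trial in the target population."
)
```

Keeping the reason alongside the decision is what would let a human reviewer audit each call, which matches the explainability point the summary attributes to the LLM.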

Keywords: Large language model; Supervised machine learning; Systematic review; Title/abstract-screening

Main Methods:

  • Four supervised ML models (Naïve Bayes, SVM, Random Forest, Logistic Regression) and one LLM (Llama-3.1-8B-Instruct) were used to predict article eligibility.
  • Article eligibility was based on human reviewer decisions from six systematic reviews.
  • Performance was evaluated using recall, specificity, precision, F1-score, and accuracy over 1000 bootstrap samples, compared against single human reviewer performance (0.86 recall, 0.79 specificity).
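The evaluation described above can be sketched as follows: compute the five metrics from binary include/exclude labels, then repeat the computation on records resampled with replacement. That the study resampled records in this way, the simple percentile interval, the function names, and the toy labels are assumptions for illustration; only the metric list and the count of 1000 bootstrap samples come from the summary.

```python
import math
import random


def screening_metrics(y_true, y_pred):
    """Return recall, specificity, precision, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positives drawn).
        return num / den if den else math.nan

    return {
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def bootstrap_metrics(y_true, y_pred, n_boot=1000, seed=0):
    """Recompute every metric on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        metrics = screening_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        for name, value in metrics.items():
            samples.setdefault(name, []).append(value)
    return samples


def summarize(values, alpha=0.05):
    """Mean and a simple percentile interval, ignoring undefined resamples."""
    finite = sorted(v for v in values if not math.isnan(v))
    if not finite:
        return math.nan, math.nan, math.nan
    low = finite[int((alpha / 2) * (len(finite) - 1))]
    high = finite[int((1 - alpha / 2) * (len(finite) - 1))]
    return sum(finite) / len(finite), low, high


# Toy labels (hypothetical, not study data): 1 = include, 0 = exclude.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
point = screening_metrics(y_true, y_pred)
boot = bootstrap_metrics(y_true, y_pred)
mean_recall, low, high = summarize(boot["recall"])
```

On real screening data, the bootstrap distribution for a model's recall and specificity could then be compared with the single-reviewer reference values of 0.86 and 0.79.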
Main Results:

  • Model performance varied significantly across datasets.
  • Supervised ML models (except Naïve Bayes) showed more closely aligned recall and specificity than the LLM.
  • The LLM matched human recall and Naïve Bayes exceeded it, but both fell below human specificity.
  • Logistic Regression, Random Forest, and SVM had lower recall but higher specificity than humans.

Conclusions:

  • Both supervised ML and LLMs achieve high recall, nearing or exceeding human levels.
  • Supervised ML models offer a better harmonic mean of recall and specificity.
  • LLMs are more sensitive and provide explainable reasoning, suggesting tandem use with human reviewers for critical reviews.
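The harmonic mean mentioned above rewards balance between recall and specificity, because a low value on either side pulls the result down more than an arithmetic mean would. The sketch below assumes the standard two-value harmonic mean, 2rs / (r + s); the summary does not spell out the exact formula used. The first pair is the single-reviewer reference from the summary, and the second pair is purely hypothetical.

```python
from statistics import harmonic_mean

# Single human reviewer reference values reported in the summary.
human_recall, human_specificity = 0.86, 0.79
human_balance = harmonic_mean([human_recall, human_specificity])
print(round(human_balance, 3))  # 0.824

# Hypothetical lopsided screener: perfect recall, poor specificity.
lopsided = harmonic_mean([1.0, 0.5])
print(round(lopsided, 3))  # 0.667, below the arithmetic mean of 0.75
```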