Comparing supervised machine learning and large language models in title-abstract screening.

Marco F Aigner¹, Matthias Ganzinger², Pascal Probst³,⁴

¹Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany.

Systematic Reviews

June 9, 2026

Summary

This summary is machine-generated.

Supervised machine learning and large language models show promise for automating systematic review screening, with both achieving high recall comparable to human reviewers. Supervised models offer better specificity, while large language models provide more sensitive, explainable results.

Area of Science:

Bibliometrics
Artificial Intelligence in Research

Background:

Systematic reviews necessitate efficient article screening.
Automating title/abstract screening using machine learning (ML) or large language models (LLMs) can accelerate review processes.
Direct comparisons between TF-IDF-based supervised ML and zero-shot LLMs for screening automation are limited.

Purpose of the Study:

To directly compare the feasibility and performance of common supervised ML models and a zero-shot LLM for systematic review title/abstract screening automation.
To evaluate performance across different datasets and identify scenarios where each approach is most effective.

Keywords:

Large language model; Supervised machine learning; Systematic review; Title/abstract-screening

Main Methods:

Four supervised ML models (Naïve Bayes, SVM, Random Forest, Logistic Regression) and one LLM (Llama-3.1-8B-Instruct) were used to predict article eligibility.
Article eligibility was based on human reviewer decisions from six systematic reviews.
Performance was evaluated using recall, specificity, precision, F1-score, and accuracy over 1,000 bootstrap samples and compared against single human reviewer performance (0.86 recall, 0.79 specificity).
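
The evaluation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the bootstrap resamples (label, prediction) pairs with replacement and averages each metric across resamples, which the summary does not specify. The function names and the toy labels are hypothetical.

```python
import random


def screening_metrics(y_true, y_pred):
    """Recall, specificity, precision, F1, and accuracy for binary labels (1 = include)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, len(y_true)),
    }


def bootstrap_metrics(y_true, y_pred, n_samples=1000, seed=0):
    """Average each metric over n_samples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    runs = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        runs.append(screening_metrics([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx]))
    summary = {}
    for name in runs[0]:
        # Skip resamples where the metric was undefined (NaN != NaN).
        values = [r[name] for r in runs if r[name] == r[name]]
        summary[name] = sum(values) / len(values) if values else float("nan")
    return summary


# Toy labels invented for illustration only; they are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

point = screening_metrics(y_true, y_pred)
boot = bootstrap_metrics(y_true, y_pred)
print(point)
print(boot)
```

Fixing the seed makes the resampling reproducible. In practice, percentile intervals over the resamples would usually be reported alongside the means.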

Main Results:

Model performance varied significantly across datasets.
Supervised ML models (except Naïve Bayes) showed a closer balance between recall and specificity than the LLM.
The LLM matched human recall and Naïve Bayes exceeded it, but both fell below human specificity.
Logistic Regression, Random Forest, and SVM had lower recall but higher specificity than humans.

Conclusions:

Both supervised ML and LLMs achieve high recall, nearing or exceeding human levels.
Supervised ML models offer a better harmonic mean of recall and specificity.
LLMs are more sensitive and provide explainable reasoning, suggesting tandem use with human reviewers for critical reviews.
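
The harmonic mean of recall and specificity mentioned in the conclusions can be computed as in the sketch below. The human benchmark values (0.86 recall, 0.79 specificity) come from the methods; the second pair of values is hypothetical and only shows how the harmonic mean penalizes imbalance. The function name is an assumption made for this example.

```python
def harmonic_mean(recall, specificity):
    """Harmonic mean of recall and specificity; 0.0 when both are zero."""
    if recall + specificity == 0:
        return 0.0
    return 2 * recall * specificity / (recall + specificity)


# Single human reviewer benchmark reported in the methods.
human = harmonic_mean(0.86, 0.79)
print(round(human, 3))  # 0.824

# Hypothetical imbalanced screener: very high recall, low specificity.
imbalanced = harmonic_mean(0.99, 0.50)
arithmetic = (0.99 + 0.50) / 2
print(round(imbalanced, 3), round(arithmetic, 3))
```

Because the harmonic mean is pulled toward the smaller of the two values, a screener with very high recall but poor specificity scores lower on it than on a simple average.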