
Prompt injection attacks on vision-language models for surgical decision support.

Zheyuan Zhang1, Muhammad Ibtsaam Qadir1, Matthias Carstens1,2

  • 1Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA.

medRxiv: the Preprint Server for Health Sciences | August 8, 2025


Summary
This summary is machine-generated.

Vision-language models show promise for surgical AI but are vulnerable to prompt injection attacks. Robustness varies by model: Gemini 2.5 Pro was more resilient than GPT-o4-mini-high. The results highlight the need for safety measures.


Area of Science:

  • Artificial Intelligence in Surgery
  • Computer Vision
  • Medical Decision Support

Background:

  • AI-driven analysis of surgical videos can enhance safety and precision in minimally invasive procedures.
  • Vision-language models (VLMs) offer advanced capabilities for interpreting complex video data in surgical contexts.
  • VLMs are susceptible to prompt injection attacks, posing risks to clinical applications.

Purpose of the Study:

  • To systematically assess the vulnerability of state-of-the-art VLMs to textual and visual prompt injection attacks.
  • To evaluate VLM performance on clinically relevant surgical decision support tasks under adversarial conditions.

Main Methods:

  • Evaluated four VLMs (Gemini 1.5 Pro, Gemini 2.5 Pro, GPT-o4-mini-high, Qwen 2.5-VL) on eleven surgical decision support tasks.
  • Simulated prompt injection attacks using misleading text and visual overlays at varying durations.
  • Measured model accuracy by comparing performance under baseline and prompt injection conditions.
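
The baseline-versus-injection comparison described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the record layout, the condition labels, and the helper names `accuracy_by_condition` and `accuracy_drop` are hypothetical, and the sample records are made up purely to exercise the functions.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Return {condition: fraction of correct answers} for graded responses.

    Each record is a (condition, is_correct) pair. The condition labels
    are illustrative and are not the study's own naming.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, is_correct in records:
        total[condition] += 1
        correct[condition] += int(is_correct)
    return {c: correct[c] / total[c] for c in total}


def accuracy_drop(accuracies, baseline="baseline"):
    """Absolute accuracy lost relative to the baseline, per attack condition."""
    base = accuracies[baseline]
    return {c: base - acc for c, acc in accuracies.items() if c != baseline}


# Made-up graded responses, for illustration only (not study data).
records = [
    ("baseline", True), ("baseline", True),
    ("baseline", True), ("baseline", False),
    ("text_injection", True), ("text_injection", True),
    ("text_injection", False), ("text_injection", False),
    ("visual_injection_full", True), ("visual_injection_full", False),
    ("visual_injection_full", False), ("visual_injection_full", False),
]

accuracies = accuracy_by_condition(records)
drops = accuracy_drop(accuracies)
print(accuracies)
print(drops)
```

Keeping the per-condition accuracy and the drop as separate steps makes it easy to add further attack conditions (for example, visual overlays of different durations) without changing the comparison logic.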

Main Results:

  • All VLMs showed reduced accuracy due to prompt injections; prolonged visual injections were more detrimental.
  • Gemini 2.5 Pro exhibited the highest baseline accuracy (0.82) and demonstrated superior robustness against attacks.
  • GPT-o4-mini-high was most vulnerable, with accuracy dropping from 0.67 to 0.24 under full-duration visual injection (P < .001).
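
To put the reported GPT-o4-mini-high figures in perspective, the short calculation below derives the absolute and relative accuracy loss from the two accuracies quoted above. The variable names are illustrative, and the relative loss is a derived figure that the summary itself does not state.

```python
# Accuracies quoted in the summary for GPT-o4-mini-high.
baseline_acc = 0.67
attacked_acc = 0.24  # under full-duration visual injection

# Absolute loss in accuracy, and that loss as a share of the baseline.
absolute_drop = baseline_acc - attacked_acc
relative_drop = absolute_drop / baseline_acc

print(f"absolute drop: {absolute_drop:.2f}")
print(f"relative drop: {relative_drop:.1%}")
```

On these numbers the model loses 0.43 in absolute accuracy, which is roughly 64% of its baseline performance.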

Conclusions:

  • Current VLMs are susceptible to prompt injection attacks, compromising their reliability for surgical decision support.
  • Robust temporal reasoning and specialized safety guardrails are essential for safe real-time deployment of VLMs in surgery.
  • Findings underscore the need for rigorous security evaluations before integrating AI into clinical workflows.