Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT

Richard Bieck1, Martin Sorge2, Katharina Heuermann2

  • 1Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstraße 14, 04103, Leipzig, Germany. Richard.bieck@medizin.uni-leipzig.de.

International Journal of Computer Assisted Radiology and Surgery | December 22, 2025
Summary
This summary is machine-generated.

This study introduces a vision-language model (VLM) for endoscopic ENT surgeries, enhancing image classification and report generation. The VLM integrates visual and textual data, outperforming existing models for multi-task assistance.

Keywords:
Deep learning, Explainability, FESS, Image embedding, Image-based endoscopic navigation, Pre-training, Text embedding, Transformers, Vision-language models

Area of Science:

  • Medical Artificial Intelligence
  • Computer Vision
  • Natural Language Processing

Background:

  • Current deep learning models for endoscopic assistance primarily address image-based tasks.
  • Integration of natural language processing is limited, hindering comprehensive assistance capabilities.

Purpose of the Study:

  • To develop and evaluate a vision-language model (VLM) for multi-task learning in endoscopic ENT surgeries.
  • The VLM aims to perform image classification, text prediction, and surgical report generation.

Main Methods:

  • A VLM architecture with domain-biased encoders for image and text embedding was employed.
  • The model was trained on a new multi-task dataset from 30 endoscopic procedures (130,000 images plus reports).
  • Two VLM variants were evaluated against a baseline, EndoVit, and SurgicalGPT using precision, recall, F1-score, BLEU-2, ROUGE-L, and cosine similarity.
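
Three of the metrics named above can be illustrated with a minimal sketch. It uses the textbook definitions of precision, recall, and F1 for a single class, ROUGE-L as the balanced F-measure of longest-common-subsequence precision and recall, and cosine similarity between two vectors. This is not the study's evaluation code: the function names, the landmark labels, the example sentences, and the equal weighting in ROUGE-L are all assumptions made for illustration, and BLEU-2 is omitted for brevity.

```python
import math


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the balanced F-measure over whitespace tokens (assumed weighting)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical landmark labels and report snippets, not taken from the study's dataset.
y_true = ["septum", "turbinate", "septum", "septum"]
y_pred = ["septum", "septum", "turbinate", "septum"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="septum")
rouge = rouge_l_f1("the middle turbinate is visible", "middle turbinate visible")
cos_same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
cos_orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

For multi-class landmark classification, the per-class scores would typically be averaged (macro or weighted); which averaging the study used is not stated in this summary.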

Main Results:

  • The VLM improved image classification F1 scores by up to 12% and text generation by up to 14%.
  • The domain-specific VLM slightly outperformed EndoVit and SurgicalGPT.
  • Ablation studies showed that the vision component benefits language tasks, while the text component has minimal impact on landmark detection.

Conclusions:

  • A novel VLM was developed for endoscopic ENT assistance, integrating image and text data.
  • The VLM replaces three isolated models, offering multi-task assistance and surpassing previous general-purpose baselines.
  • Future work needs to address imbalanced class distributions and improve structured text generation.