Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT

Richard Bieck¹, Martin Sorge², Katharina Heuermann²

¹Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstraße 14, 04103, Leipzig, Germany. Richard.bieck@medizin.uni-leipzig.de.

International Journal of Computer Assisted Radiology and Surgery

December 22, 2025

Summary

This summary is machine-generated.

This study introduces a vision-language model (VLM) for endoscopic ENT surgeries, enhancing image classification and report generation. The VLM integrates visual and textual data, outperforming existing models for multi-task assistance.

Keywords:

Deep learning; Explainability; FESS; Image embedding; Image-based endoscopic navigation; Pre-training; Text embedding; Transformers; Vision-language models

Area of Science:

Medical Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Current deep learning models for endoscopic assistance focus primarily on image-based tasks.
Integration of natural language processing is limited, hindering comprehensive assistance capabilities.

Purpose of the Study:

To develop and evaluate a vision-language model (VLM) for multi-task learning in endoscopic ENT surgeries.
The VLM aims to perform image classification, text prediction, and surgical report generation.

Main Methods:

A VLM architecture with domain-biased encoders for image and text embedding was employed.
The model was trained on a new multi-task dataset collected from 30 endoscopic procedures, comprising 130,000 images and reports.
Two VLM variations were evaluated against a baseline, EndoVit, and SurgicalGPT using precision, recall, F1-score, BLEU-2, ROUGE-L, and cosine similarity.
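
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how three of them (per-class precision/recall/F1, ROUGE-L, and cosine similarity) are commonly computed. It is not the authors' evaluation code: the function names, the whitespace tokenization, and the example labels and sentences are all assumptions made for illustration.

```python
import math


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 with one label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F-measure (beta = 1) on whitespace tokens (an assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical landmark labels and report sentences, for illustration only.
y_true = ["septum", "septum", "turbinate", "septum"]
y_pred = ["septum", "turbinate", "turbinate", "septum"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="septum")
rl = rouge_l_f1("the middle turbinate is visible", "middle turbinate visible")
cs = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Precision, recall, and F1 apply to the image classification task, while ROUGE-L and cosine similarity compare generated text (or its embedding) with a reference. BLEU-2 follows the same comparison idea using unigram and bigram overlap.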

Main Results:

The VLM improved image classification F1 scores by up to 12% and text generation by up to 14%.
The domain-specific VLM slightly outperformed EndoVit and SurgicalGPT.
Ablation studies showed that the vision component benefits the language tasks, while the text component has minimal impact on landmark detection.

Conclusions:

A novel VLM was developed for endoscopic ENT assistance, integrating image and text data.
The VLM replaces three isolated models, offering multi-task assistance and surpassing previous general-purpose baselines.
Future work needs to address imbalanced class distributions and improve structured text generation.
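
The summary does not say how the authors plan to handle imbalanced class distributions. As one common option, and purely as an assumption for illustration, the sketch below computes inverse-frequency class weights (the same formula as scikit-learn's "balanced" heuristic: number of samples divided by number of classes times the class count). The function name and the example labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so a weighted loss pays more attention to under-represented labels.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label distribution: 8 frames of one landmark, 2 of another.
labels = ["septum"] * 8 + ["turbinate"] * 2
weights = inverse_frequency_weights(labels)
```

Such weights are typically passed to a weighted cross-entropy loss during training; whether this would help on the dataset described here is not something the summary addresses.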