Updated: Jun 27, 2026

Surgical Video Understanding with Alignment-Preserving Temporal Adaptation and Action Triplet Text Alignment.

Taiyo Ikeido¹, Ren Togo², Takahiro Ogawa²

  • ¹Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.

Bioengineering (Basel, Switzerland) | June 26, 2026
Summary

This summary is machine-generated.

This study introduces an efficient framework for surgical video analysis using a pretrained vision-language model and a temporal adapter. It enhances surgical phase recognition and enables few-shot activity recognition with minimal annotations.

Area of Science:

  • Medical image analysis
  • Computer vision
  • Artificial intelligence in surgery

Background:

  • Surgical workflow understanding is crucial but hindered by the high cost and expertise required for dense video annotations.
  • Existing methods struggle with annotation efficiency for long-horizon surgical videos.

Purpose of the Study:

  • To develop a text-guided, annotation-efficient framework for surgical video understanding.
  • To leverage a frozen surgical vision-language-pretrained (VLP) encoder with a lightweight temporal adapter.
  • To improve surgical phase recognition and enable few-shot activity recognition.

Main Methods:

  • Utilized a frozen SurgVLP image encoder for frame-level visual embeddings.
  • Employed a lightweight temporal adapter to aggregate embeddings into clip-level representations.
  • Evaluated using text-guided prototype matching for phase recognition and few-shot triplet recognition on the CholecT50 dataset.
  • Implemented a Text Contrastive method with rich phase prompts.

Main Results:

  • Temporal adaptation enhanced phase recognition while preserving the SurgVLP embedding space.
  • The Text Contrastive method with rich phase prompts achieved superior phase recognition performance.
  • The framework enabled classifier-free few-shot triplet recognition without a dedicated triplet classifier.

Conclusions:

  • Effective surgical video understanding with limited annotations requires temporal adaptation and preserved alignment with the pretrained text space.
  • Semantically informative text prompts are vital for improving performance.
  • The proposed framework offers an annotation-efficient approach to surgical video analysis.

Keywords:

  • CholecT50
  • few-shot recognition
  • surgical video understanding
  • temporal adaptation
  • text prototype matching
  • vision–language pretraining
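
Illustrative sketch:

The pipeline summarized above (frozen frame embeddings, a lightweight temporal adapter, then matching against text prototypes) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the summary does not describe the adapter's architecture, so attention pooling blended with a mean residual is assumed here as a stand-in. The function names `temporal_adapter` and `match_text_prototypes`, the parameters `alpha` and `temperature`, and the random arrays used in place of real SurgVLP image and text embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def temporal_adapter(frame_emb, weights, alpha=0.5):
    """Aggregate frame embeddings (T, D) into one clip embedding (D,).

    Assumed design (not from the paper): softmax attention pooling,
    blended with the plain mean so the clip embedding stays close to
    the frozen encoder's embedding space.
    """
    scores = frame_emb @ weights              # (T,) one score per frame
    scores = scores - scores.max()            # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()
    pooled = attn @ frame_emb                 # attention-weighted average
    mean = frame_emb.mean(axis=0)             # residual path
    return l2_normalize((1.0 - alpha) * mean + alpha * pooled)


def match_text_prototypes(clip_emb, text_protos, temperature=0.07):
    """Classify a clip by cosine similarity to per-class text embeddings.

    text_protos has shape (C, D): one embedded prompt per class.
    Returns the predicted class index and softmax probabilities.
    """
    sims = l2_normalize(text_protos) @ l2_normalize(clip_emb)   # (C,)
    logits = sims / temperature
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(probs)), probs


# Toy usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 32))        # 16 frames, 32-dim embeddings
prompts = rng.normal(size=(7, 32))        # 7 hypothetical phase prompts
adapter_weights = np.zeros(32)            # untrained adapter: uniform attention
clip = temporal_adapter(frames, adapter_weights)
phase, phase_probs = match_text_prototypes(clip, prompts)
print(phase, phase_probs.round(3))
```

Because classification here is only a similarity lookup against embedded prompts, the same `match_text_prototypes` call could in principle score action-triplet prompts without training a separate classifier head, which is the sense in which the summary calls the few-shot setting classifier-free.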