
Updated: Jun 27, 2026


Surgical Video Understanding with Alignment-Preserving Temporal Adaptation and Action Triplet Text Alignment.

Taiyo Ikeido¹, Ren Togo², Takahiro Ogawa²

¹Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.

Bioengineering (Basel, Switzerland) | June 26, 2026

Summary


This summary is machine-generated.

This study introduces an annotation-efficient framework for surgical video analysis that pairs a frozen, pretrained surgical vision-language encoder with a lightweight temporal adapter. It improves surgical phase recognition and enables few-shot activity recognition from minimal annotations.

Area of Science:

Medical image analysis
Computer vision
Artificial intelligence in surgery

Background:

Surgical workflow understanding is crucial but hindered by the high cost and expertise required for dense video annotations.
Existing methods struggle with annotation efficiency for long-horizon surgical videos.

Purpose of the Study:

To develop a text-guided, annotation-efficient framework for surgical video understanding.
To leverage a frozen surgical vision-language-pretrained (VLP) encoder with a lightweight temporal adapter.
To improve surgical phase recognition and enable few-shot activity recognition.

Main Methods:

Utilized a frozen SurgVLP image encoder for frame-level visual embeddings.
Employed a lightweight temporal adapter to aggregate embeddings into clip-level representations.
Evaluated using text-guided prototype matching for phase recognition and few-shot triplet recognition on the CholecT50 dataset.
Implemented a Text Contrastive method with rich phase prompts.

Keywords:

CholecT50, few-shot recognition, surgical video understanding, temporal adaptation, text prototype matching, vision–language pretraining
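
As a rough illustration of the pipeline listed under Main Methods, the sketch below pools frame-level embeddings into a clip embedding and assigns the phase whose text prototype is most similar. It is a minimal sketch under stated assumptions, not the authors' implementation: the mean pooling stands in for the learned temporal adapter, the 3-dimensional vectors stand in for SurgVLP image and text embeddings, and the function names and phase labels are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def aggregate_clip(frame_embeddings):
    """Mean-pool frame embeddings into one clip embedding.

    Assumption: plain averaging is a stand-in for the learned temporal adapter.
    """
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(f[d] for f in frame_embeddings) / n for d in range(dim)]

def match_prototype(clip_embedding, text_prototypes):
    """Return the label whose text prototype has the highest cosine similarity."""
    return max(text_prototypes,
               key=lambda label: cosine(clip_embedding, text_prototypes[label]))

# Toy data: in practice these would come from the frozen image and text encoders.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.0]]
prototypes = {"preparation": [1.0, 0.0, 0.0], "dissection": [0.0, 1.0, 0.0]}

clip = aggregate_clip(frames)
predicted = match_prototype(clip, prototypes)
print(predicted)
```

Because classification reduces to a similarity lookup against text embeddings, no phase-specific classifier head is trained in this sketch; changing the prompts changes the prototypes.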

Main Results:

Temporal adaptation enhanced phase recognition while preserving the SurgVLP embedding space.
The Text Contrastive method with rich phase prompts achieved superior phase recognition performance.
The framework enabled classifier-free few-shot triplet recognition without a dedicated triplet classifier.
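
One way to picture classifier-free few-shot triplet recognition is nearest-prototype scoring, sketched below. This is an illustrative assumption, not the method reported in the paper: each triplet prototype is a blend of a text embedding and the mean of a few labelled support clip embeddings, and the blending weight `text_weight`, the helper names, and the toy vectors are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_prototype(text_embedding, support_embeddings, text_weight=0.5):
    """Blend a text embedding with the mean of a few support clip embeddings.

    Assumption: a fixed convex blend; the paper's actual rule is not given here.
    """
    text = normalize(text_embedding)
    if not support_embeddings:
        return text  # zero-shot fallback: text prototype only
    units = [normalize(s) for s in support_embeddings]
    dim = len(text)
    mean = normalize([sum(u[d] for u in units) / len(units) for d in range(dim)])
    return normalize([text_weight * t + (1 - text_weight) * m
                      for t, m in zip(text, mean)])

def score_triplets(query_embedding, prototypes):
    """Cosine score of a query clip against every triplet prototype."""
    q = normalize(query_embedding)
    return {triplet: dot(q, p) for triplet, p in prototypes.items()}

# Toy stand-ins for text-encoder outputs of (instrument, verb, target) prompts.
text_emb = {
    ("grasper", "retract", "gallbladder"): [1.0, 0.0, 0.0],
    ("hook", "dissect", "gallbladder"): [0.0, 1.0, 0.0],
}
# A handful of labelled support clips per triplet (the few-shot annotations).
support = {
    ("grasper", "retract", "gallbladder"): [[0.9, 0.1, 0.2]],
    ("hook", "dissect", "gallbladder"): [[0.1, 0.9, 0.2], [0.0, 1.0, 0.3]],
}

prototypes = {t: build_prototype(text_emb[t], support[t]) for t in text_emb}
scores = score_triplets([0.2, 0.8, 0.1], prototypes)
best = max(scores, key=scores.get)
print(best)
```

Since several triplets can be active in one clip, a real system would more likely threshold the scores per triplet than take a single argmax; the argmax here only keeps the example short.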

Conclusions:

Effective surgical video understanding with limited annotations requires temporal adaptation and preserved alignment with the pretrained text space.
Semantically informative text prompts are vital for improving performance.
The proposed framework offers an annotation-efficient approach to surgical video analysis.
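
The idea of adapting over time while staying aligned with the pretrained text space can be illustrated with a gated residual update, shown below. This is a hypothetical construction for intuition only; the summary does not describe how the authors' adapter preserves alignment. The `gate` parameter, the toy `last_minus_first` update, and all function names are assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def mean_pool(frames):
    """Average frame embeddings dimension by dimension."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def residual_temporal_adapter(frames, temporal_update, gate):
    """Return normalize(pooled + gate * update).

    With gate = 0 the output equals the pooled frozen embedding, so the
    clip representation starts exactly in the pretrained embedding space.
    """
    pooled = normalize(mean_pool(frames))
    delta = temporal_update(frames)
    return normalize([p + gate * d for p, d in zip(pooled, delta)])

def last_minus_first(frames):
    """Toy temporal update: the change between the first and last frame."""
    return [b - a for a, b in zip(frames[0], frames[-1])]

frames = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]]
frozen = residual_temporal_adapter(frames, last_minus_first, gate=0.0)
adapted = residual_temporal_adapter(frames, last_minus_first, gate=0.1)

# Cosine distance between the adapted and frozen clip embeddings.
drift = 1.0 - sum(a * b for a, b in zip(frozen, adapted))
print(drift)
```

A small gate keeps the adapted embedding close to the frozen one, which is one simple way for text prototypes computed by the unchanged text encoder to remain usable after temporal adaptation.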