Surgical Instrument Segmentation via Segment-Then-Classify Framework with Instance-Level Spatiotemporal Consistency Modeling
Summary
This summary is machine-generated. This study introduces a Segment-then-Classify framework for precise surgical instrument segmentation in endoscopic videos, improving accuracy and stability in robot-assisted surgery.
Area Of Science
- Robotics
- Computer Vision
- Medical Imaging
Background
- Accurate segmentation of surgical instruments is vital for robot-assisted surgery and intraoperative analysis.
- Existing methods struggle with spatial completeness and temporal stability, especially under occlusion or motion blur.
Purpose Of The Study
- To present a Segment-then-Classify framework that decouples mask generation from semantic classification.
- To enhance spatial completeness and temporal stability in surgical instrument segmentation.
- To improve interpretability and robustness in challenging surgical video conditions.
Main Methods
- Utilized a Mask2Former-based segmentation backbone to generate class-agnostic instance masks and their region features.
- Employed a bounding-box-guided instance-level spatiotemporal modeling module.
- Fused geometric priors and temporal consistency using a lightweight transformer encoder.
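The segment-then-classify idea described above can be illustrated with a minimal sketch: class-agnostic masks are first reduced to bounding-box geometric priors, instances are associated across frames for temporal consistency, and classification happens per track rather than per frame. All function names, the IoU threshold, and the prototype-based classifier below are illustrative assumptions, not the paper's actual implementation (which uses a transformer encoder for the fusion step).

```python
import numpy as np

def mask_to_bbox(mask):
    """Geometric prior: tight (x0, y0, x1, y1) box around a binary mask."""
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

def box_iou(a, b):
    """IoU between two (x0, y0, x1, y1) boxes with inclusive coordinates."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

def link_instances(prev_boxes, cur_boxes, thr=0.3):
    """Greedy temporal association: match each current-frame box to the
    previous-frame box with highest IoU above `thr`; -1 marks a new track.
    (A stand-in for the paper's learned spatiotemporal module.)"""
    links = []
    for cb in cur_boxes:
        ious = [box_iou(cb, pb) for pb in prev_boxes]
        j = int(np.argmax(ious)) if ious else -1
        links.append(j if j >= 0 and ious[j] >= thr else -1)
    return links

def classify_track(track_feats, prototypes):
    """Classify once per track: average region features over time, then
    pick the nearest class prototype by cosine similarity."""
    f = np.mean(track_feats, axis=0)
    sims = prototypes @ f / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(f) + 1e-8)
    return int(np.argmax(sims))
```

Classifying the temporally aggregated track instead of each frame independently is what gives the "temporal stability" the paper targets: a single blurred or occluded frame cannot flip the instrument's label on its own.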
Main Results
- Improved mean Intersection over Union (mIoU) by 3.06%, 2.99%, and 1.67% on the EndoVis datasets.
- Improved mean class-correspondence Intersection over Union (mcIoU) by 2.36%, 2.85%, and 6.06% over state-of-the-art methods.
- Maintained computational efficiency while enhancing segmentation performance.
Conclusions
- The proposed Segment-then-Classify framework effectively enhances spatial completeness and temporal stability in surgical instrument segmentation.
- The framework shows superior performance and robustness compared to existing methods on benchmark datasets.
- This approach offers a promising solution for improving accuracy and reliability in robot-assisted surgery.

