MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation.

Shijie Li, Yazan Abu Farha, Yun Liu

IEEE Transactions on Pattern Analysis and Machine Intelligence

September 4, 2020

Summary

This summary is machine-generated.

This study introduces a novel multi-stage deep learning architecture for temporal action segmentation in long videos. The proposed model effectively reduces over-segmentation errors and achieves state-of-the-art performance on benchmark datasets.

Area of Science:

Computer Science
Artificial Intelligence
Machine Learning

Background:

Deep learning excels at classifying short videos, prompting research into action segmentation for long, untrimmed videos.
Current methods using temporal convolutions and pooling for action segmentation often result in over-segmentation errors.
Capturing long-range temporal dependencies in long videos remains a challenge for accurate action recognition.

Purpose of the Study:

To propose a novel multi-stage deep learning architecture for temporal action segmentation.
To overcome the over-segmentation errors inherent in existing state-of-the-art approaches.
To improve the capture of long-range dependencies and recognition of action segments in untrimmed videos.

Main Methods:

A multi-stage architecture in which an initial prediction stage is followed by refinement stages.
Dilated temporal convolutions in each stage, giving a large receptive field with few parameters.
A dual dilated layer that combines large and small receptive fields, addressing the small receptive field of the lower layers.
A first-stage design decoupled from that of the refining stages, so that each can meet its own requirements.
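
The effect of dilation on the receptive field can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes a kernel size of 3, a dilation that doubles at each layer, and, for the dual dilated layer, a second branch whose dilation shrinks with depth; none of these specifics are stated in this summary, and the function names are hypothetical.

```python
def single_dilations(num_layers):
    """Dilation per layer when it doubles at every layer: 1, 2, 4, ... (assumed schedule)."""
    return [2 ** layer for layer in range(num_layers)]


def dual_dilations(num_layers):
    """Pair of dilations per layer: one grows with depth, one shrinks (assumed schedule)."""
    return [(2 ** layer, 2 ** (num_layers - 1 - layer)) for layer in range(num_layers)]


def receptive_fields(dilations, kernel_size=3):
    """Receptive field, in frames, after each layer of a stack of dilated convolutions.

    Each entry of `dilations` is an int (one branch) or a tuple (two branches).
    For a tuple, the branch with the larger dilation sets how far the layer sees.
    """
    fields = []
    field = 1
    for d in dilations:
        widest = max(d) if isinstance(d, tuple) else d
        field += (kernel_size - 1) * widest
        fields.append(field)
    return fields


single = receptive_fields(single_dilations(4))
dual = receptive_fields(dual_dilations(4))
print(single)  # [3, 7, 15, 31]
print(dual)    # [17, 25, 33, 49]
```

Under these assumptions, the first layer of the single-branch stack sees only 3 frames, while the first dual layer already sees 17, which is the sense in which a dual layer gives lower layers access to a large receptive field.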

Main Results:

The proposed architecture effectively captures long-range dependencies crucial for action segmentation.
The dual dilated layer successfully mitigates the small receptive field issue in lower network layers.
The model achieves state-of-the-art results on the 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast datasets.
The model significantly reduces over-segmentation errors compared with previous methods.
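
To make the notion of over-segmentation concrete, the following toy sketch groups frame-wise labels into segments and compares the segment count of a prediction with that of the ground truth. The labels, the data, and the helper names are invented for illustration only; this is not an evaluation metric taken from the paper.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Group consecutive identical frame labels into (label, start, end) segments."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Hypothetical 10-frame video with two true actions.
ground_truth = ["cut"] * 5 + ["mix"] * 5
predicted = ["cut", "cut", "mix", "cut", "cut", "mix", "mix", "cut", "mix", "mix"]

print(len(to_segments(ground_truth)))           # 2
print(len(to_segments(predicted)))              # 6
print(frame_accuracy(predicted, ground_truth))  # 0.8
```

In this example, 80% of the frames are labeled correctly, yet the prediction splits two true segments into six. That fragmentation is what over-segmentation refers to, and it shows why frame-wise accuracy alone can hide the problem.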

Conclusions:

The proposed multi-stage architecture offers an effective solution for temporal action segmentation in long untrimmed videos.
The novel dual dilated layer and decoupled stage designs enhance the model's ability to handle temporal complexities.
The state-of-the-art results support the model's effectiveness and its potential for real-world applications.