Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

RPCANet$^{++}$: Deep Interpretable Robust PCA for Sparse Object Segmentation.

IEEE transactions on pattern analysis and machine intelligence·2026
Same author

MRCNet: Motion Reasoning Chain for Cross Modal Video Camouflaged Object Detection.

IEEE transactions on pattern analysis and machine intelligence·2026
Same author

Distillation-free Scaling of Large State-Space Models for Images and Videos.

International journal of computer vision·2026
Same author

SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution.

IEEE transactions on pattern analysis and machine intelligence·2026
Same author

DFormer++: Improving RGBD Representation Learning for Semantic Segmentation.

IEEE transactions on pattern analysis and machine intelligence·2026
Same author

ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking With Alternating Detection and Association.

IEEE transactions on pattern analysis and machine intelligence·2025
Same journal

Relation DETR+: Exploring Explicit Position Relation Prior for Dense Prediction.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

CAFE: Cross-View Adaptive Fusion and Cluster Center Enhancement for Robust Multi-View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Learning Shape Anchors for Holistic Indoor Scene Understanding.

IEEE transactions on pattern analysis and machine intelligence·2026

Related Experiment Video

Updated: Dec 10, 2025

Application of Deep Learning-Based Medical Image Segmentation via Orbital Computed Tomography
04:48

Published on: November 30, 2022

3.2K

MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation.

Shijie Li, Yazan Abu Farha, Yun Liu

    IEEE Transactions on Pattern Analysis and Machine Intelligence | September 4, 2020
    Summary
    This summary is machine-generated.

    This study introduces a novel multi-stage deep learning architecture for temporal action segmentation in long videos. The proposed model effectively reduces over-segmentation errors and achieves state-of-the-art performance on benchmark datasets.

    More Related Videos

    Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications
    03:31

    Published on: December 15, 2023

    875
    Swin-PSAxialNet: An Efficient Multi-Organ Segmentation Technique
    04:48

    Published on: July 5, 2024

    677


    Area of Science:

    • Computer Science
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Deep learning excels at classifying short videos, prompting research into action segmentation for long, untrimmed videos.
    • Existing action segmentation methods that rely on temporal convolutions and pooling often produce over-segmentation errors.
    • Capturing temporal dependencies across long videos remains a challenge for accurate action recognition.

    Purpose of the Study:

    • To propose a novel multi-stage deep learning architecture for temporal action segmentation.
    • To overcome the over-segmentation errors inherent in existing state-of-the-art approaches.
    • To improve the capture of long-range dependencies and recognition of action segments in untrimmed videos.

    Main Methods:

    • The model uses a multi-stage architecture: the first stage produces an initial prediction, and each subsequent stage refines it.
    • Each stage uses dilated temporal convolutions, which give a large receptive field with few parameters.
    • A dual dilated layer combines large and small receptive fields, addressing the limited receptive field of the lower layers.
    • The design of the first stage is decoupled from that of the refinement stages, so each can meet its own requirements.
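The receptive-field argument behind dilated temporal convolutions and the dual dilated layer can be illustrated with a minimal pure-Python sketch. Several details here are assumptions not stated in this summary: a kernel size of 3, dilations that double per layer, the pairing of dilation `2**l` with `2**(L-1-l)` in the dual layer, and simple averaging of the two branches in place of a learned fusion. All function names are hypothetical.

```python
def dilated_conv1d(x, weights, dilation):
    """Single-channel 1D convolution with the given dilation.
    Zero padding keeps the output the same length as the input."""
    half = len(weights) // 2
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(dilations, kernel_size=3):
    """Frames seen by one output frame after stacking the given layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def single_dilations(num_layers):
    """Assumed schedule: dilation doubles at every layer (1, 2, 4, ...)."""
    return [2 ** l for l in range(num_layers)]


def dual_dilations(num_layers):
    """Assumed dual schedule: one branch grows while the other shrinks,
    so every layer has both a small and a large dilation."""
    return [(2 ** l, 2 ** (num_layers - 1 - l)) for l in range(num_layers)]


def dual_dilated_layer(x, w_inc, w_dec, d_inc, d_dec):
    """Run both branches and average them (a stand-in for learned fusion)."""
    a = dilated_conv1d(x, w_inc, d_inc)
    b = dilated_conv1d(x, w_dec, d_dec)
    return [0.5 * (p + q) for p, q in zip(a, b)]


# Ten layers with kernel size 3 and doubling dilation cover 2047 frames,
# while adding only one small kernel per layer.
print(receptive_field(single_dilations(10)))
# In the dual schedule even the first layer pairs dilation 1 with dilation 512.
print(dual_dilations(10)[0])
```

Under these assumptions, the first layer of a single-dilation stack sees only three neighbouring frames, whereas the first dual layer also reaches frames far away in time, which is one way to read the "small receptive field in lower layers" limitation.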

    Main Results:

    • The proposed architecture effectively captures long-range dependencies crucial for action segmentation.
    • The dual dilated layer successfully mitigates the small receptive field issue in lower network layers.
    • The model achieves state-of-the-art results on the 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast datasets.
    • The model significantly reduces over-segmentation errors compared with previous methods.
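To make the notion of over-segmentation concrete, the sketch below groups frame-wise labels into segments and compares a ground truth with a prediction. The label sequences are invented for illustration and the helper names are hypothetical; this is not the evaluation protocol of the paper.

```python
def frames_to_segments(labels):
    """Group consecutive identical frame labels into (label, start, end) runs.
    The end index is exclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments


def frame_accuracy(truth, pred):
    """Fraction of frames whose predicted label matches the ground truth."""
    correct = sum(1 for a, b in zip(truth, pred) if a == b)
    return correct / len(truth)


# Invented example: two true actions, five frames each.
truth = ["cut"] * 5 + ["mix"] * 5
# A prediction that is mostly right per frame but flickers between labels.
pred = ["cut", "cut", "mix", "cut", "cut", "mix", "mix", "cut", "mix", "mix"]

print(frame_accuracy(truth, pred))    # share of correctly labelled frames
print(len(frames_to_segments(truth))) # number of true segments
print(len(frames_to_segments(pred)))  # number of predicted segments
```

In this toy case the prediction labels most frames correctly yet splits two actions into six segments, which is the kind of error that frame-wise accuracy alone does not expose.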

    Conclusions:

    • The proposed multi-stage architecture offers an effective solution for temporal action segmentation in long untrimmed videos.
    • The novel dual dilated layer and decoupled stage designs enhance the model's ability to handle temporal complexities.
    • The achieved state-of-the-art results validate the model's effectiveness and potential for real-world applications.