StochasticFormer: Stochastic Modeling for Weakly Supervised Temporal Action Localization.

Haichao Shi, Xiao-Yu Zhang, Changsheng Li

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

April 7, 2023

Summary

This summary is machine-generated.

This study introduces StochasticFormer, a novel framework for weakly supervised temporal action localization (WS-TAL) that addresses under- and over-localization issues. By modeling finer-grained interactions, it achieves more accurate action identification in videos.

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

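Weakly supervised temporal action localization (WS-TAL) identifies the time intervals of actions in untrimmed videos using only video-level labels during training.
Existing WS-TAL methods tend to under-localize (cover only part of an action) or over-localize (extend into surrounding background), which degrades performance.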
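
To make the under- and over-localization problem concrete, here is a minimal sketch of a common WS-TAL post-processing step: thresholding per-frame class scores into intervals. It is a generic illustration, not the method of the paper; the function name, the scores, and the thresholds are all hypothetical.

```python
from typing import List, Tuple


def scores_to_segments(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring above `threshold` into
    half-open (start, end) frame intervals."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i                      # a new segment opens here
        elif score <= threshold and start is not None:
            segments.append((start, i))    # the open segment closes here
            start = None
    if start is not None:                  # segment runs to the last frame
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-frame scores for a single action class.
scores = [0.1, 0.2, 0.6, 0.9, 0.8, 0.5, 0.2, 0.1, 0.7, 0.3]

tight = scores_to_segments(scores, 0.75)   # high threshold: short intervals
loose = scores_to_segments(scores, 0.15)   # low threshold: long intervals
```

With these made-up scores, the high threshold keeps only the peak frames and misses the second burst entirely (under-localization), while the low threshold absorbs low-confidence frames around the action (over-localization).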

Purpose of the Study:

To propose StochasticFormer, a transformer-based framework for refined temporal action localization.
To investigate finer-grained interactions among intermediate predictions for improved accuracy.

Main Methods:

Developed a transformer-structured stochastic process modeling framework (StochasticFormer).
Utilized a pseudo localization module to generate variable-length pseudo action instances.
Employed an encoder-decoder network with deterministic and latent paths for information integration.
Optimized the framework using video-level classification, frame-level semantic coherence, and evidence lower bound (ELBO) losses.
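
The three-part objective listed above could be combined along the following lines. This is a sketch under stated assumptions, not the paper's implementation: it assumes diagonal Gaussian distributions for the latent path and a simple weighted sum of the losses. The function names and the weights `w_coh` and `w_elbo` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu_q: Sequence[float], sigma_q: Sequence[float],
                mu_p: Sequence[float], sigma_p: Sequence[float]) -> float:
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def negative_elbo(log_likelihood: float, kl: float) -> float:
    """ELBO = log-likelihood term minus KL term; the loss is its negation."""
    return -(log_likelihood - kl)


def total_loss(cls_loss: float, coherence_loss: float, neg_elbo: float,
               w_coh: float = 1.0, w_elbo: float = 1.0) -> float:
    """Weighted sum of the three terms (the weights are assumptions)."""
    return cls_loss + w_coh * coherence_loss + w_elbo * neg_elbo


# Hypothetical values for a single training step.
kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
loss = total_loss(cls_loss=1.0, coherence_loss=0.5,
                  neg_elbo=negative_elbo(log_likelihood=-2.0, kl=kl))
```

Minimizing the negative ELBO pushes up the likelihood of the targets while keeping the latent distribution close to its prior, which is the usual role of an ELBO term in latent-variable models.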

Main Results:

StochasticFormer effectively addresses under- and over-localization challenges in WS-TAL.
Demonstrated superior performance compared to state-of-the-art methods on THUMOS14 and ActivityNet1.2 benchmarks.
Refined localization further by modeling finer-grained interactions among intermediate predictions.
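
Results on THUMOS14 and ActivityNet are conventionally reported as mean average precision at temporal intersection-over-union (tIoU) thresholds. The sketch below shows the tIoU computation only; it is a generic illustration, and the function name and example intervals are hypothetical rather than taken from the paper.

```python
from typing import Tuple


def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) intervals in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (2.0, 8.0)                                 # hypothetical action
under_localized = temporal_iou((4.0, 5.0), ground_truth)  # covers part of it
over_localized = temporal_iou((0.0, 12.0), ground_truth)  # spills into background
```

Both failure modes lower tIoU: a prediction that is too short shrinks the intersection, and one that is too long inflates the union.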

Conclusions:

StochasticFormer offers a robust solution for weakly supervised temporal action localization.
The proposed framework significantly enhances localization accuracy in untrimmed videos.
The approach shows strong potential for advancing video understanding research.