Interactive image-to-video transfer learning.

Cong Wu¹, Tianyang Xu², Zhenhua Feng²

¹School of Artificial Intelligence and Computer Science, Jiangnan University, 214122, China; Postdoctoral Research Station in Design, Jiangnan University, 214122, China.

Neural Networks: the Official Journal of the International Neural Network Society | December 10, 2025

Summary

This summary is machine-generated.

This study introduces SDST, an efficient image-to-video transfer learning framework that strengthens the interaction between static and dynamic cues for action recognition. It improves video understanding without extensive fine-tuning and outperforms existing efficient transfer learning methods on standard benchmarks.

Keywords:

Action recognition
Dynamic-static interaction modelling
Image-to-video transfer learning
Spatial-temporal interaction modelling

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

Transfer learning from image to video is crucial for action recognition.
Current methods involve extensive fine-tuning, leading to high computational costs.
Efficient methods often neglect effective video reasoning with frozen image backbones.

Purpose of the Study:

To propose an efficient image-to-video transfer learning framework (SDST) for action recognition.
To enhance the interaction between static and dynamic cues, and spatial and temporal domains.
To bridge the gap between static visual representations and video-based action recognition tasks.

Main Methods:

Introducing the Motion Booster Module to extract and fuse motion descriptors with static representations via cross-attention.
Proposing a lightweight Channel-Aware Multi-Scale Temporal Modelling module for temporal reasoning.
Enriching a frozen image backbone with these novel components.
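
The sketch below illustrates the general ideas named in this list; it is not the authors' implementation, whose details are not given in this summary. It assumes that motion descriptors are plain differences between consecutive frame features, that fusion is single-head cross-attention with static tokens as queries, and that multi-scale temporal modelling is a per-channel weighted mix of moving averages. All function names, shapes, and parameters (motion_descriptors, cross_attention_fuse, multi_scale_temporal, w_q, w_k, w_v, channel_gate) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def motion_descriptors(frame_feats):
    """Assumed motion cue: difference between consecutive frame features.

    frame_feats: (T, N, D) tokens from a frozen backbone, T frames, N tokens.
    Returns (T, N, D); the last frame gets a zero descriptor.
    """
    diffs = frame_feats[1:] - frame_feats[:-1]
    return np.concatenate([diffs, np.zeros_like(frame_feats[:1])], axis=0)


def cross_attention_fuse(static, motion, w_q, w_k, w_v):
    """Static tokens (queries) attend to motion tokens (keys/values) per frame.

    The attended motion signal is added residually, so the frozen static
    representation is enriched rather than replaced.
    """
    q = static @ w_q                                           # (T, N, D)
    k = motion @ w_k                                           # (T, N, D)
    v = motion @ w_v                                           # (T, N, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # (T, N, N)
    return static + softmax(scores, axis=-1) @ v               # (T, N, D)


def multi_scale_temporal(clip_feats, kernel_sizes=(1, 3, 5), channel_gate=None):
    """Mix moving averages of several temporal widths, weighted per channel.

    clip_feats: (T, D), one pooled feature per frame.
    channel_gate: (len(kernel_sizes), D) logits; a softmax over the scale
    axis gives each channel its own weighting of the temporal scales.
    """
    num_frames, dim = clip_feats.shape
    if channel_gate is None:
        channel_gate = np.zeros((len(kernel_sizes), dim))  # uniform mix
    weights = softmax(channel_gate, axis=0)                # (S, D)
    out = np.zeros_like(clip_feats)
    for s, k in enumerate(kernel_sizes):
        pad = k // 2  # odd kernel sizes keep the output length at T
        padded = np.pad(clip_feats, ((pad, pad), (0, 0)), mode="edge")
        smoothed = np.stack(
            [padded[t:t + k].mean(axis=0) for t in range(num_frames)]
        )
        out += weights[s] * smoothed
    return out


# Toy usage with random stand-ins for frozen-backbone features.
rng = np.random.default_rng(0)
T, N, D = 4, 6, 8
frames = rng.normal(size=(T, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
fused = cross_attention_fuse(frames, motion_descriptors(frames), w_q, w_k, w_v)
clip = multi_scale_temporal(fused.mean(axis=1))
print(fused.shape, clip.shape)
```

In this sketch only the projection matrices and the channel gate would be trainable; the features passed in stand for the output of a backbone whose weights stay fixed, which is what keeps this style of transfer cheap compared with full fine-tuning.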

Main Results:

SDST surpasses state-of-the-art efficient transfer learning methods on benchmarks including Something-Something V1 and V2, Diving-48, and Kinetics-400.
The framework outperforms several fully fine-tuned approaches without additional complexity.
SDST also transfers to vision-language models such as CLIP, yielding further performance gains.

Conclusions:

The proposed SDST framework offers a general and scalable solution for efficient video understanding.
It effectively addresses the limitations of existing efficient transfer learning methods.
It highlights the potential for improved action recognition in the era of large vision-language models.