



Interactive image-to-video transfer learning.

Cong Wu (1), Tianyang Xu (2), Zhenhua Feng (2)

  • (1) School of Artificial Intelligence and Computer Science, Jiangnan University, 214122, China; Postdoctoral Research Station in Design, Jiangnan University, 214122, China.

Neural Networks: the Official Journal of the International Neural Network Society | December 10, 2025
Summary
This summary is machine-generated.

This study introduces an efficient image-to-video transfer learning framework, SDST, that enhances static and dynamic cue interaction for action recognition. The method improves video understanding without extensive fine-tuning, outperforming existing techniques.

Keywords:
Action recognition; Dynamic-static interaction modelling; Image-to-video transfer learning; Spatial-temporal interaction modelling


Area of Science:

  • Computer Vision
  • Machine Learning
  • Artificial Intelligence

Background:

  • Transfer learning from image to video is crucial for action recognition.
  • Current methods involve extensive fine-tuning, leading to high computational costs.
  • Efficient methods often neglect effective video reasoning with frozen image backbones.

Purpose of the Study:

  • To propose an efficient image-to-video transfer learning framework (SDST) for action recognition.
  • To enhance the interaction between static and dynamic cues, and spatial and temporal domains.
  • To bridge the gap between static visual representations and video-based action recognition tasks.

Main Methods:

  • Introducing the Motion Booster Module to extract and fuse motion descriptors with static representations via cross-attention.
  • Proposing a lightweight Channel-Aware Multi-Scale Temporal Modelling module for temporal reasoning.
  • Enriching a frozen image backbone with these novel components.
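The summary does not give the internals of these modules, so the following is only an illustrative NumPy sketch of the general idea, not the authors' implementation. It assumes that motion descriptors are simple differences between consecutive frame features, that fusion is single-head cross-attention with static tokens as queries, and that multi-scale temporal modelling can be approximated by averaging over temporal windows of several sizes with a per-channel gate. All function names, shapes, and kernel sizes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def motion_descriptors(feats):
    """Assumed motion cue: difference between consecutive frame features.
    feats: (T, N, C) frames x tokens x channels. The last frame gets zeros."""
    diff = np.zeros_like(feats)
    diff[:-1] = feats[1:] - feats[:-1]
    return diff

def cross_attention_fuse(static, motion, w_q, w_k, w_v):
    """Static tokens (queries) attend to motion tokens (keys/values) of the
    same frame; the result is added back to the static features."""
    q = static @ w_q                                    # (T, N, C)
    k = motion @ w_k                                    # (T, N, C)
    v = motion @ w_v                                    # (T, N, C)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T, N, N)
    return static + softmax(scores, axis=-1) @ v

def multi_scale_temporal(x, kernel_sizes=(1, 3, 5)):
    """Average over centred temporal windows of several sizes, then apply a
    per-channel sigmoid gate before the residual add. x: (T, N, C)."""
    T = x.shape[0]
    branches = []
    for k in kernel_sizes:
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)), mode="edge")
        branches.append(np.stack([xp[t:t + k].mean(axis=0) for t in range(T)]))
    mixed = np.mean(branches, axis=0)                   # (T, N, C)
    gate = 1.0 / (1.0 + np.exp(-mixed.mean(axis=(0, 1))))  # (C,)
    return x + gate * mixed

# Random features stand in for the output of a frozen image backbone.
rng = np.random.default_rng(0)
T, N, C = 8, 16, 32
frozen_feats = rng.standard_normal((T, N, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.02 for _ in range(3))

fused = cross_attention_fuse(frozen_feats, motion_descriptors(frozen_feats),
                             w_q, w_k, w_v)
out = multi_scale_temporal(fused)
print(out.shape)  # (8, 16, 32)
```

In a design of this kind only the projection matrices and the temporal module would be trained while the backbone features stay fixed, which is what keeps the transfer cheap relative to full fine-tuning.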

Main Results:

  • SDST surpasses state-of-the-art efficient transfer learning methods on benchmarks including Something-Something V1 and V2, Diving-48, and Kinetics-400.
  • The framework outperforms several fully fine-tuned approaches without additional complexity.
  • SDST also transfers to vision-language models such as CLIP, yielding further performance gains.

Conclusions:

  • The proposed SDST framework offers a general and scalable solution for efficient video understanding.
  • It effectively addresses the limitations of existing efficient transfer learning methods.
  • It also highlights the potential for improved action recognition in the era of large vision-language models.