AdaVid: Adaptive Video-Language Pretraining.

Chaitanya Patel¹, Juan Carlos Niebles¹, Ehsan Adeli¹

  • ¹Stanford University.

Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops | April 6, 2026
Summary
This summary is machine-generated.

AdaVid enables efficient video encoders for edge devices by dynamically adapting computation. This framework achieves competitive performance while significantly reducing computational demands for video-language tasks.

Area of Science:

  • Computer Vision
  • Artificial Intelligence
  • Machine Learning

Background:

  • Contrastive video-language pretraining yields robust video representations, but its high computational cost makes deployment on resource-constrained edge devices difficult.
  • Current models are limited to processing short video clips (4-64 frames), restricting their applicability to longer video analysis.
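The summary does not say which contrastive objective AdaVid optimizes, so the sketch below substitutes the generic symmetric InfoNCE loss to illustrate what "contrastive video-language pretraining" means in practice: matching video and text embeddings in a batch are pulled together and all other pairings act as negatives. The function names, the temperature of 0.07, and the random embeddings are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(video_emb: np.ndarray, text_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Row i of video_emb and row i of text_emb form the positive pair;
    every other video-text pairing in the batch acts as a negative."""
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    logits = v @ t.T / temperature              # (batch, batch) similarity matrix
    diag = np.arange(logits.shape[0])
    loss_v2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> video
    return float((loss_v2t + loss_t2v) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 256))                          # 8 caption embeddings
    paired_video = text + 0.1 * rng.normal(size=(8, 256))     # close to its own caption
    unrelated_video = rng.normal(size=(8, 256))               # no relation to captions
    print("paired:   ", symmetric_info_nce(paired_video, text))
    print("unrelated:", symmetric_info_nce(unrelated_video, text))
```

Video embeddings that sit close to their own captions give a much lower loss than unrelated ones, which is the signal that shapes the encoder during pretraining.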

Purpose of the Study:

  • Introduce AdaVid, a flexible framework for learning efficient, adaptive video encoders.
  • Allow the encoder's computational footprint to be adjusted at inference time to match the resources available on an edge device.
  • Improve performance and efficiency for both short and long video understanding tasks.

Main Methods:

  • Developed an adaptive transformer block inspired by Matryoshka Representation Learning, allowing dynamic adjustment of hidden embedding dimensions at inference.
  • Proposed a lightweight hierarchical network to aggregate features from short clips for processing longer videos.
  • Trained AdaVid-EgoVLP on the Ego4D dataset for video-narration tasks and evaluated it on Diving48 and on long-video benchmarks.
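The two methods above can be illustrated with a minimal sketch, assuming a plain NumPy setting rather than the authors' code. `SliceableFeedForward` is a hypothetical feed-forward sublayer whose weights are stored once at full width; at inference a smaller `width` simply uses the leading embedding channels, in the spirit of Matryoshka Representation Learning. `split_into_clips` and `attention_pool` are hypothetical stand-ins for the hierarchical network: a long video is cut into short clips, each clip is encoded, and a lightweight softmax-weighted pooling combines the clip embeddings. The weights are random and the query vector is fixed, so the sketch shows only the mechanics, not trained behavior.

```python
from typing import List

import numpy as np


class SliceableFeedForward:
    """Hypothetical feed-forward sublayer whose width can be cut at inference.

    Weights are stored once at the full width d_model; a call with width=k uses
    only the first k embedding channels and the first expansion*k hidden units.
    Matryoshka-style training is what would make these prefixes useful; the
    weights here are random, so the class only shows the mechanics.
    """

    def __init__(self, d_model: int, expansion: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.d_model = d_model
        self.expansion = expansion
        d_hidden = expansion * d_model
        self.w1 = rng.normal(scale=d_model ** -0.5, size=(d_model, d_hidden))
        self.w2 = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_model))

    def __call__(self, x: np.ndarray, width: int) -> np.ndarray:
        if not 0 < width <= self.d_model:
            raise ValueError("width must satisfy 0 < width <= d_model")
        hidden = self.expansion * width
        x = x[..., :width]                                    # leading channels only
        h = np.maximum(x @ self.w1[:width, :hidden], 0.0)     # ReLU
        return x + h @ self.w2[:hidden, :width]               # residual connection

    def macs_per_token(self, width: int) -> int:
        """Multiply-accumulates per token: two (width x expansion*width) matmuls."""
        return 2 * self.expansion * width * width


def split_into_clips(frames: np.ndarray, clip_len: int) -> List[np.ndarray]:
    """Cut (num_frames, dim) features into consecutive clips of clip_len frames;
    a shorter remainder at the end becomes its own clip."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def attention_pool(clip_embs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Lightweight aggregator: softmax-weighted mean of clip embeddings,
    weighted by each clip's scaled dot product with a single query vector."""
    scores = clip_embs @ query / np.sqrt(clip_embs.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ clip_embs


if __name__ == "__main__":
    ffn = SliceableFeedForward(d_model=64)
    rng = np.random.default_rng(1)
    long_video = rng.normal(size=(100, 64))        # 100 frames of 64-d features
    clips = split_into_clips(long_video, clip_len=16)
    for width in (64, 32):                         # full width, then half width
        # Stand-in "clip encoder": the sliced layer per frame, then mean pooling.
        clip_embs = np.stack([ffn(c, width).mean(axis=0) for c in clips])
        video_emb = attention_pool(clip_embs, query=np.ones(width))
        print(width, video_emb.shape, ffn.macs_per_token(width))
```

Because both matrix products shrink in two dimensions, halving the width cuts the per-token cost of this sublayer to a quarter, while the clip-level pooling adds only a small cost that grows with the number of clips rather than the number of frames.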

Main Results:

  • AdaVid-EgoVLP matched standard EgoVLP performance on short video-language tasks using half the compute.
  • Given the same compute budget, AdaVid outperformed EgoVLP on short video benchmarks.
  • Demonstrated an effective trade-off between frame count and compute on the Diving48 benchmark, allowing more frames to be processed within a fixed compute budget.
  • Achieved a strong balance between compute efficiency and accuracy on long video benchmarks using the hierarchical network.
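The frame-count versus compute trade-off reported above can be reasoned about with a back-of-envelope cost model. The sketch below assumes a standard transformer layer with a 4x MLP and full self-attention over all space-time tokens; the defaults of 196 tokens per frame, 12 layers, and a 768-wide reference model are hypothetical ViT-B-like values, not figures from the paper, and the model ignores norms, softmax, patch embedding, and any factorized attention the real encoder may use.

```python
def layer_macs(num_tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Rough multiply-accumulate count for one standard transformer layer.

    Counts the Q, K, V and output projections (4 * N * d^2), an MLP with hidden
    size mlp_ratio * d (2 * mlp_ratio * N * d^2), and full self-attention over
    all N tokens (2 * N^2 * d for the scores and the weighted sum). Norms,
    softmax and patch embedding are ignored.
    """
    n, d = num_tokens, width
    return 4 * n * d * d + 2 * mlp_ratio * n * d * d + 2 * n * n * d


def encoder_macs(frames: int, width: int, tokens_per_frame: int = 196,
                 depth: int = 12) -> int:
    """Cost of a depth-layer encoder over frames * tokens_per_frame tokens."""
    return depth * layer_macs(frames * tokens_per_frame, width)


def max_frames_within_budget(budget: int, width: int, **kwargs) -> int:
    """Largest frame count whose estimated cost does not exceed the budget."""
    frames = 0
    while encoder_macs(frames + 1, width, **kwargs) <= budget:
        frames += 1
    return frames


if __name__ == "__main__":
    # Reference budget: full width (768) on 16 frames.
    budget = encoder_macs(frames=16, width=768)
    for width in (768, 384, 192):
        print(f"width {width}: up to {max_frames_within_budget(budget, width)} frames")
```

Under this model, a narrower encoder fits more frames into the same budget, but because the attention term grows with the square of the token count, halving the width buys fewer than four times as many frames.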

Conclusions:

  • AdaVid offers a flexible and efficient solution for deploying advanced video encoders on edge devices.
  • The adaptive transformer block and hierarchical network effectively manage computational resources for diverse video lengths and tasks.
  • AdaVid significantly advances the efficiency and applicability of video-language pretraining in real-world, resource-limited scenarios.