AdaVid: Adaptive Video-Language Pretraining.

Chaitanya Patel¹, Juan Carlos Niebles¹, Ehsan Adeli¹

¹Stanford University.

Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops

April 6, 2026

Summary

This summary is machine-generated.

AdaVid enables efficient video encoders for edge devices by dynamically adapting computation. This framework achieves competitive performance while significantly reducing computational demands for video-language tasks.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Contrastive video-language pretraining yields robust video representations but faces deployment challenges on resource-constrained edge devices due to high computational costs.
Current models are limited to processing short video clips (4-64 frames), restricting their applicability to longer video analysis.

Purpose of the Study:

Introduce AdaVid, a flexible framework for learning efficient, adaptive video encoders.
Enable dynamic adjustment of computational footprint based on available resources for edge deployment.
Improve performance and efficiency for both short and long video understanding tasks.

Main Methods:

Developed an adaptive transformer block inspired by Matryoshka Representation Learning, allowing dynamic adjustment of hidden embedding dimensions at inference.
Proposed a lightweight hierarchical network to aggregate features from short clips for processing longer videos.
Trained AdaVid-EgoVLP on video-narration pairs from the Ego4D dataset and evaluated it on short video-language benchmarks, Diving48, and long video benchmarks.
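
The two ideas above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which dimensions AdaVid slices, so this example assumes a feed-forward block whose hidden layer is truncated to its first `width` units at inference, in the nested style of Matryoshka Representation Learning. The names `NestedWidthMLP`, `make_weights`, `matvec`, and `aggregate_clips` are hypothetical, and mean pooling stands in for the hierarchical network, whose architecture is not described here.

```python
import random


def make_weights(rows, cols, seed=0):
    """Return a rows x cols matrix of small random weights (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class NestedWidthMLP:
    """Feed-forward block whose hidden width can be reduced at inference.

    Assumption: smaller models are prefixes of the full model, so one set
    of weights serves every width.
    """

    def __init__(self, d_model, d_hidden, seed=0):
        self.d_model = d_model
        self.w_in = make_weights(d_hidden, d_model, seed)       # d_hidden x d_model
        self.w_out = make_weights(d_model, d_hidden, seed + 1)  # d_model x d_hidden

    def forward(self, x, width):
        # Keep only the first `width` hidden units of both projections.
        hidden = [max(0.0, h) for h in matvec(self.w_in[:width], x)]  # ReLU
        w_out = [row[:width] for row in self.w_out]
        return matvec(w_out, hidden)

    def multiply_adds(self, width):
        # Two projections, each costing width * d_model multiply-adds.
        return 2 * width * self.d_model


def aggregate_clips(clip_features):
    """Mean-pool per-clip feature vectors into one video-level vector.

    Stand-in for the hierarchical aggregation network (assumption).
    """
    n = len(clip_features)
    return [sum(column) / n for column in zip(*clip_features)]


mlp = NestedWidthMLP(d_model=4, d_hidden=8)
x = [1.0, 0.5, -0.5, 2.0]
full = mlp.forward(x, width=8)   # full compute
half = mlp.forward(x, width=4)   # half the multiply-adds, same output size
video_feature = aggregate_clips([full, half])
```

In this sketch the output dimension stays fixed while the cost of the block scales linearly with `width`, which is the property that lets a single trained model serve several compute budgets.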

Main Results:

AdaVid-EgoVLP matched standard EgoVLP performance on short video-language tasks using half the compute.
AdaVid outperformed EgoVLP with equal computational resources on short video benchmarks.
Demonstrated effective trade-offs between frame count and compute on the Diving48 benchmark, allowing more frames to be processed without exceeding a given compute limit.
Achieved a strong balance between compute efficiency and accuracy on long video benchmarks using the hierarchical network.
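
The frame-count versus compute trade-off can be made concrete with a rough back-of-the-envelope sketch. It rests on a loud assumption that does not come from the paper: encoder cost is taken to grow linearly with the number of frames and quadratically with the fraction of the embedding width in use (as for a dense layer with a d x d weight matrix), and the attention term that grows with the square of the token count is ignored. The function names and the numbers 16 and 0.5 are illustrative only.

```python
def relative_cost(num_frames, width_fraction):
    """Cost in units of 'one frame at full width'.

    Assumption: cost ~ num_frames * width_fraction**2.
    """
    return num_frames * width_fraction ** 2


def max_frames(budget, width_fraction):
    """Largest frame count whose estimated cost fits within `budget`."""
    return int(budget // (width_fraction ** 2))


# Budget equal to processing 16 frames at full width (illustrative).
budget = relative_cost(16, 1.0)

frames_full = max_frames(budget, 1.0)   # 16 frames at full width
frames_half = max_frames(budget, 0.5)   # 64 frames at half width
```

Under this simplified cost model, halving the width frees enough compute for four times as many frames. Real savings depend on the architecture and on how much of the cost comes from attention.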

Conclusions:

AdaVid offers a flexible and efficient solution for deploying advanced video encoders on edge devices.
The adaptive transformer block and hierarchical network effectively manage computational resources for diverse video lengths and tasks.
AdaVid significantly advances the efficiency and applicability of video-language pretraining in real-world, resource-limited scenarios.