Vision-Language Models for Vision Tasks: A Survey.

Jingyi Zhang, Jiaxing Huang, Sheng Jin

    IEEE Transactions on Pattern Analysis and Machine Intelligence | February 26, 2024
    Summary
    This summary is machine-generated.

    Vision-Language Models (VLMs) offer a more efficient approach to visual recognition than training a separate network for each task, because they learn from web-scale image-text pairs collected from the internet. This review surveys VLM methods and outlines future directions for improving zero-shot visual recognition.

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Traditional visual recognition relies on labor-intensive, task-specific deep neural network (DNN) training with crowd-labeled data.
    • Training a separate network for each task is time-consuming and scales poorly as the number of visual recognition tasks grows.
    • Emerging Vision-Language Models (VLMs) address these limitations by leveraging large-scale image-text data.

    Purpose of the Study:

    • To provide a systematic review of Vision-Language Models (VLMs) for diverse visual recognition tasks.
    • To analyze VLM foundations, including architectures, pre-training objectives, and downstream applications.
    • To categorize and evaluate existing VLM pre-training, transfer learning, and knowledge distillation methods.
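One widely used VLM pre-training objective is image-text contrastive learning, popularized by CLIP: matched image-caption pairs are pulled together in a shared embedding space while mismatched pairs are pushed apart. The sketch below is a minimal, pure-Python illustration of a symmetric InfoNCE loss, not code from the survey. It assumes the embeddings were already produced by hypothetical image and text encoders, that image i in a batch is paired with text i, and a temperature of 0.07; the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_row(logits: List[float], target: int) -> float:
    """Cross-entropy of one row of logits against the correct index (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch where image i is paired with text i (assumed)."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: temperature-scaled similarity between image i and text j
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should select its own caption.
    i2t = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should select its own image.
    t2i = sum(cross_entropy_row([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2


# Toy batch: aligned captions versus swapped captions.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With correctly paired embeddings the loss is near zero; swapping the captions makes it large, which is the signal that drives the encoders toward aligned image and text representations.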

    Main Methods:

    • Comprehensive literature review of VLM research for visual recognition.
    • Categorization of VLM pre-training strategies, transfer learning techniques, and knowledge distillation approaches.
    • Benchmarking and analysis of reviewed VLM methods using widely adopted datasets.
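Knowledge distillation, one of the method families the review categorizes, transfers what a VLM has learned into a task-specific model. As a generic illustration, not a specific method from the survey, the sketch below uses classic logit distillation: the student is trained to match the teacher's temperature-softened class distribution, measured with KL divergence and scaled by T^2. The teacher logits stand in for, e.g., a VLM's zero-shot class scores; the function names and the default temperature of 2.0 are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 (assumed setup)."""
    p = softmax(teacher_logits, temperature)  # soft targets, e.g. VLM zero-shot scores
    q = softmax(student_logits, temperature)  # predictions of the task-specific student
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# A student that already matches the teacher incurs zero loss; a mismatched one does not.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

The temperature softens both distributions so the student also learns the teacher's relative scores for non-top classes, not just its top prediction.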

    Main Results:

    • VLMs learn rich vision-language correlations from web-scale image-text pairs.
    • A single VLM can enable zero-shot predictions across various visual recognition tasks.
    • This review categorizes and analyzes diverse VLM methodologies and their performance.
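Zero-shot prediction with a single VLM can be sketched as follows: each class name is turned into a text prompt and embedded by the text encoder, and the image is assigned to the class whose prompt embedding is most similar to the image embedding. In this minimal sketch, `encode_text` is a hypothetical stand-in for a real VLM text encoder, the prompt template "a photo of a {}." is an assumption, and the toy embedding table exists only to make the example runnable.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def zero_shot_classify(image_embedding: Vector, class_names: List[str],
                       encode_text: Callable[[str], Vector],
                       template: str = "a photo of a {}.") -> Dict[str, float]:
    """Score each class by similarity between the image and a prompt built from its name.

    No task-specific training: the label set is defined only by the text prompts.
    """
    scores = {}
    for name in class_names:
        prompt_embedding = encode_text(template.format(name))  # hypothetical encoder call
        scores[name] = cosine(image_embedding, prompt_embedding)
    return scores


# Hypothetical toy embeddings standing in for a real text encoder's output.
toy_text_space = {
    "a photo of a cat.": [0.9, 0.1, 0.0],
    "a photo of a dog.": [0.1, 0.9, 0.0],
    "a photo of a car.": [0.0, 0.1, 0.9],
}
scores = zero_shot_classify([0.8, 0.2, 0.1], ["cat", "dog", "car"], toy_text_space.__getitem__)
prediction = max(scores, key=scores.get)
```

Because the label set is defined only by the prompts, supporting a new class means adding a class name rather than retraining; in practice, accuracy depends on the pre-trained encoders and can also be sensitive to the prompt wording.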

    Conclusions:

    • Vision-Language Models represent a significant advancement over traditional DNN training for visual recognition.
    • VLMs offer efficient and versatile zero-shot capabilities, reducing reliance on task-specific labeling.
    • Future research directions include addressing current challenges and exploring novel VLM applications in visual recognition.