Vision-Language Models for Vision Tasks: A Survey.

Jingyi Zhang, Jiaxing Huang, Sheng Jin

IEEE Transactions on Pattern Analysis and Machine Intelligence

February 26, 2024

Summary

This summary is machine-generated.

Vision-Language Models (VLMs) offer a more efficient approach to visual recognition than task-specific training by learning from web-scale image-text data. This review surveys VLM methods and outlines future directions for improved zero-shot visual recognition.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Traditional visual recognition relies on labor-intensive, task-specific deep neural network (DNN) training with crowd-labeled data.
This paradigm is time-consuming and inefficient for the growing number of visual recognition tasks.
Emerging Vision-Language Models (VLMs) address these limitations by leveraging large-scale image-text data.

Purpose of the Study:

To provide a systematic review of Vision-Language Models (VLMs) for diverse visual recognition tasks.
To analyze VLM foundations, including architectures, pre-training objectives, and downstream applications.
To categorize and evaluate existing VLM pre-training, transfer learning, and knowledge distillation methods.

Main Methods:

Comprehensive literature review of VLM research for visual recognition.
Categorization of VLM pre-training strategies, transfer learning techniques, and knowledge distillation approaches.
Benchmarking and analysis of the reviewed VLM methods on widely adopted datasets.

Main Results:

VLMs learn rich vision-language correlations from web-scale image-text pairs.
A single VLM can enable zero-shot predictions across various visual recognition tasks.
This review categorizes and analyzes diverse VLM methodologies and their performance.
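
As a rough illustration of how a single VLM can make zero-shot predictions, the sketch below scores an image embedding against text embeddings of candidate class prompts and picks the closest one. It is a minimal sketch, not the method of any specific model in the review: the embeddings are hand-made toy vectors standing in for the outputs of a pre-trained image encoder and text encoder, and the function names, the prompt template, and the temperature value are all assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Return a prompt -> probability mapping for one image.

    text_embeddings maps each class prompt to its (hypothetical) text-encoder
    output. No task-specific training happens here: adding a new class only
    means adding a new prompt. The temperature is an assumed value.
    """
    prompts = list(text_embeddings)
    logits = [
        cosine_similarity(image_embedding, text_embeddings[p]) / temperature
        for p in prompts
    ]
    return dict(zip(prompts, softmax(logits)))


# Toy embeddings (assumptions for illustration, not real encoder outputs).
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, text_embeddings)
best = max(probs, key=probs.get)
print(best)
```

With these toy vectors the image embedding lies closest to the "cat" prompt, so that prompt receives the highest probability. Changing the recognition task amounts to changing the prompt set, which is why no additional labeled data is needed in this setup.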

Conclusions:

Vision-Language Models represent a significant advancement over traditional DNN training for visual recognition.
VLMs offer efficient and versatile zero-shot capabilities, reducing reliance on task-specific labeling.
Future research directions include addressing current challenges and exploring novel VLM applications in visual recognition.