
Related Concept Videos

Color Vision

Color perception begins in the retina, the light-sensitive layer at the back of the eye. Two main theories explain how colors are seen: the trichromatic theory and the opponent-process theory. The trichromatic theory, proposed by Thomas Young in 1802 and extended by Hermann von Helmholtz in 1852, holds that color vision is based on three types of cone receptors in the retina. These cones are sensitive to different but overlapping wavelength ranges, which roughly correspond to blue (short), green (medium), and red (long) light.
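
The idea that color is coded by the relative activity of three overlapping cone types can be sketched numerically. This is a toy model, not a physiological one: it assumes each cone's sensitivity is a Gaussian centered on an approximate peak wavelength (about 420, 534 and 564 nm for the S, M and L cones), and the shared width `ASSUMED_WIDTH_NM` is a hypothetical value chosen only for illustration.

```python
import math

# Approximate peak sensitivities (nm) of the three human cone types.
# The Gaussian shape and the shared width are simplifying assumptions,
# not measured cone fundamentals.
CONE_PEAKS_NM = {"S": 420.0, "M": 534.0, "L": 564.0}
ASSUMED_WIDTH_NM = 40.0  # hypothetical standard deviation of each curve


def cone_responses(wavelength_nm: float) -> dict[str, float]:
    """Return the relative response (0..1) of each cone type to one wavelength."""
    return {
        cone: math.exp(-((wavelength_nm - peak) ** 2) / (2 * ASSUMED_WIDTH_NM**2))
        for cone, peak in CONE_PEAKS_NM.items()
    }


def dominant_cone(wavelength_nm: float) -> str:
    """Name the cone type that responds most strongly to a wavelength."""
    responses = cone_responses(wavelength_nm)
    return max(responses, key=responses.get)


if __name__ == "__main__":
    for wl in (450.0, 540.0, 600.0):
        rounded = {k: round(v, 2) for k, v in cone_responses(wl).items()}
        print(wl, rounded, dominant_cone(wl))
```

Because the curves overlap, a single wavelength excites more than one cone type; under the trichromatic theory it is the pattern across all three responses, not any one cone alone, that distinguishes colors.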

Vision

Vision is the result of light being detected and transduced into neural signals by the retina of the eye. This information is then further analyzed and interpreted by the brain. First, light enters the front of the eye and is focused by the cornea and lens onto the retina—a thin sheet of neural tissue lining the back of the eye. Because of refraction through the convex lens of the eye, images are projected onto the retina upside-down and reversed.
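
The upside-down, left-right reversed image follows from the thin-lens equation, 1/f = 1/d_o + 1/d_i, and the lateral magnification m = -d_i/d_o. The sketch below treats the eye's optics as a single thin lens (a "reduced eye" simplification) with an assumed focal length of about 17 mm; the function and constant names are hypothetical.

```python
def image_distance(focal_length_m: float, object_distance_m: float) -> float:
    """Solve the thin-lens equation 1/f = 1/d_o + 1/d_i for d_i (metres)."""
    if object_distance_m == focal_length_m:
        raise ValueError("an object at the focal point forms no finite image")
    return 1.0 / (1.0 / focal_length_m - 1.0 / object_distance_m)


def magnification(focal_length_m: float, object_distance_m: float) -> float:
    """Lateral magnification m = -d_i / d_o; m < 0 means the image is inverted."""
    return -image_distance(focal_length_m, object_distance_m) / object_distance_m


# Hypothetical "reduced eye": cornea and lens lumped into one thin lens.
ASSUMED_EYE_FOCAL_LENGTH_M = 0.017

if __name__ == "__main__":
    d_i = image_distance(ASSUMED_EYE_FOCAL_LENGTH_M, 1.0)
    m = magnification(ASSUMED_EYE_FOCAL_LENGTH_M, 1.0)
    print(f"image distance {d_i * 1000:.2f} mm, magnification {m:.4f}")
```

For an object 1 m away the magnification is about -0.017: the image is tiny and, because m is negative along both axes, it is flipped vertically and horizontally, which is the same as rotating it by 180 degrees.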

Visual System

Light enters the eye through the cornea, a transparent, dome-shaped surface covering the front of the eyeball that helps direct and focus incoming light. This light is then channeled toward the pupil, an adjustable opening whose size is controlled by the iris. The iris, a pigmented muscle, regulates the amount of light entering the eye by constricting or dilating the pupil, keeping light levels suitable for clear vision.
Once through the pupil, the light passes through the lens, a...
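
Because the pupil is roughly circular, the light it admits scales with its area, that is, with the square of its diameter. The sketch below assumes admitted light is simply proportional to pupil area (ignoring all other factors) and uses approximate diameters of 2 mm in bright light and 8 mm in dim light; the helper names are hypothetical.

```python
import math


def pupil_area_mm2(diameter_mm: float) -> float:
    """Area of a circular pupil of the given diameter, in square millimetres."""
    return math.pi * (diameter_mm / 2.0) ** 2


def light_gain(new_diameter_mm: float, old_diameter_mm: float) -> float:
    """Factor by which admitted light changes when the pupil resizes,
    assuming admitted light is proportional to pupil area."""
    return pupil_area_mm2(new_diameter_mm) / pupil_area_mm2(old_diameter_mm)


if __name__ == "__main__":
    # Approximate bright-light (2 mm) and dim-light (8 mm) pupil diameters.
    print(round(light_gain(8.0, 2.0), 3))
```

Going from 2 mm to 8 mm quadruples the diameter but increases the area, and so the admitted light, about sixteenfold.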

Photoreceptors and Visual Pathways

At the molecular level, absorbed light triggers changes in photopigment molecules, which in turn alter the photoreceptor cell's membrane potential. A photon's energy is inversely proportional to its wavelength, and each wavelength of visible light is associated with a distinct color. Visible light, a form of electromagnetic radiation, spans roughly 380 to 720 nm. Electromagnetic radiation with wavelengths longer than 720 nm falls into the infrared category,...
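
The inverse relation between wavelength and photon energy is E = hc/λ. The sketch below evaluates it at the 380 nm and 720 nm bounds given above, using the exact SI values of h, c and the electron-volt; the `band` helper is hypothetical and relies only on those two bounds.

```python
PLANCK_J_S = 6.62607015e-34  # Planck constant, exact in the 2019 SI
LIGHT_SPEED_M_S = 299_792_458.0  # speed of light, exact
ELECTRON_VOLT_J = 1.602176634e-19  # 1 eV in joules, exact

VISIBLE_MIN_NM = 380.0
VISIBLE_MAX_NM = 720.0


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = h * c / wavelength, returned in electron-volts."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / ELECTRON_VOLT_J


def band(wavelength_nm: float) -> str:
    """Place a wavelength relative to the visible range.

    Wavelengths just beyond VISIBLE_MAX_NM are infrared; those just
    below VISIBLE_MIN_NM are ultraviolet.
    """
    if wavelength_nm < VISIBLE_MIN_NM:
        return "shorter"
    if wavelength_nm > VISIBLE_MAX_NM:
        return "longer"
    return "visible"


if __name__ == "__main__":
    for wl in (VISIBLE_MIN_NM, VISIBLE_MAX_NM):
        print(wl, band(wl), round(photon_energy_ev(wl), 3))
```

A 380 nm (violet) photon carries about 3.26 eV, nearly twice the roughly 1.72 eV of a 720 nm (deep red) photon.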

Depth Perception and Spatial Vision

Depth perception is the ability to perceive objects in three dimensions. It relies on two types of cues: binocular and monocular. Binocular cues depend on combining the images from both eyes and on how the eyes work together. Because the eyes sit in slightly different positions, each captures a slightly different image. This difference, known as binocular disparity, lets the brain estimate depth: by comparing the two images, it determines the distance to an object.
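
Computer stereo vision offers a concrete analogue of binocular disparity. For two rectified pinhole cameras separated by a baseline B, a point whose images differ by a horizontal disparity d (in pixels) lies at depth Z = f * B / d, where f is the focal length in pixels. This is the standard camera model, not a model of how the brain computes depth; the 700 px focal length and the 6.5 cm baseline (close to typical human eye separation) are illustrative assumptions.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified pinhole stereo pair.

    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centres, in metres.
    disparity_px: horizontal shift of the point between the two images, in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative camera: 700 px focal length, 6.5 cm baseline.
    for d in (35.0, 70.0):
        print(d, depth_from_disparity(700.0, 0.065, d))
```

Because Z is inversely proportional to d, nearby objects produce large disparities and distant ones small disparities, so a fixed error in measuring d matters more for far objects.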

Anatomy of the Eyeball

The eye is a hollow, spherical structure composed of three tissue layers. The outer layer, the fibrous tunic, comprises the white sclera and the transparent cornea. The sclera covers most of the eyeball's surface, although most of it is hidden from view; the visible portion, the "white of the eye", is more conspicuous in humans than in other species. The cornea, a clear covering at the front of the eye, allows light to enter. The eye's middle...

A Causal Lens on Non-RGB Vision Sensor Understanding in Vision-Language Models.

Youngjoon Yu, Yong Man Ro

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

April 9, 2026

Summary

This summary is machine-generated.

Vision-Language Models (VLMs) struggle with non-RGB sensor data because of an RGB-centric bias. A new benchmark and causal framework improve VLMs' understanding of thermal, depth, and X-ray sensor data.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Vision-Language Models (VLMs) excel with RGB images but struggle with non-RGB sensor data (thermal, depth, hyperspectral, X-ray).
This failure stems from an RGB-centric bias, which causes VLMs to misinterpret the distinct physical properties of non-RGB modalities.

Purpose of the Study:

To systematically evaluate and address the RGB-centric bias in VLMs using non-RGB sensor data.
To introduce a novel benchmark suite, CausalSense, and a causal learning framework to mitigate this bias.

Main Methods:

Developed CausalSense, a benchmark suite for evaluating VLM bias on non-RGB data.
Designed a causal learning framework using confounder dictionaries and backdoor adjustments.
Integrated sensor-specific knowledge into VLMs without extensive retraining.
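
The backdoor adjustment named above replaces the ordinary conditional P(Y | X) with the interventional P(Y | do(X)) = Σ_z P(Y | X, z) P(z), averaging over confounders Z with their prior rather than with their co-occurrence with X. The toy sketch below is not the authors' encoder: it assumes a hypothetical two-entry confounder dictionary (scene contexts), a binary modality X (0 = RGB, 1 = thermal), made-up probabilities, and, for the vector case, a linear feature fusion x + z.

```python
"""Toy backdoor adjustment with a small, discrete confounder dictionary.

All names and probabilities are hypothetical illustration values.
X: modality shown to the model (0 = RGB, 1 = thermal).
Z: confounder-dictionary entry (a scene context).
Y: 1 if the model answers correctly.
"""

# Prior P(Z = z) over confounder-dictionary entries.
P_Z = {"indoor": 0.7, "outdoor": 0.3}

# Co-occurrence P(Z = z | X = x); consistent with P_Z when P(X = 0) = 5/7.
P_Z_GIVEN_X = {
    0: {"indoor": 0.9, "outdoor": 0.1},
    1: {"indoor": 0.2, "outdoor": 0.8},
}

# P(Y = 1 | X = x, Z = z): accuracy for each modality/context pair.
P_Y_GIVEN_XZ = {
    (0, "indoor"): 0.9, (0, "outdoor"): 0.6,
    (1, "indoor"): 0.5, (1, "outdoor"): 0.3,
}


def naive_conditional(x: int) -> float:
    """P(Y=1 | X=x) = sum_z P(Y=1 | x, z) * P(z | x); inherits the confounding."""
    return sum(P_Y_GIVEN_XZ[(x, z)] * p for z, p in P_Z_GIVEN_X[x].items())


def backdoor_adjusted(x: int) -> float:
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z); weights each z by its prior."""
    return sum(P_Y_GIVEN_XZ[(x, z)] * p for z, p in P_Z.items())


def deconfounded_feature(
    x: list[float], dictionary: list[list[float]], priors: list[float]
) -> list[float]:
    """E_z[x + z] = x + sum_i P(z_i) * z_i for a linear fusion x + z.

    The expectation is exact here only because the fusion is linear in z.
    """
    expected_z = [
        sum(p * entry[k] for entry, p in zip(dictionary, priors))
        for k in range(len(x))
    ]
    return [xi + ez for xi, ez in zip(x, expected_z)]


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(naive_conditional(x), 2), round(backdoor_adjusted(x), 2))
    print(deconfounded_feature([1.0, 0.0], [[0.0, 2.0], [2.0, 0.0]], [0.5, 0.5]))
```

With these invented numbers, naive conditioning gives an RGB-versus-thermal accuracy gap of 0.87 - 0.34 = 0.53, while the adjusted gap is 0.81 - 0.44 = 0.37: part of the apparent gap comes from thermal inputs co-occurring with harder contexts rather than from the modality itself.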

Main Results:

State-of-the-art VLMs show significant performance deficits in understanding non-RGB sensor data.
The proposed causal deconfounded cross-modal encoder substantially improved VLM reasoning about physical attributes.
The framework measurably reduced the performance gap.

Conclusions:

Current VLMs exhibit a critical RGB-centric bias limiting their use with diverse sensor data.
The CausalSense benchmark and causal framework enable more resilient, sensor-aware VLMs.
This research facilitates VLM interpretation of phenomena beyond the visible spectrum.