Updated: Jan 12, 2026

Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency.

Feng Wang¹, Timing Yang¹, Yaodong Yu²

¹Johns Hopkins University.

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition

November 3, 2025

Summary

This summary is machine-generated.

The Adventurer models treat images as sequences of patch tokens and learn visual representations with uni-directional language models. This approach offers a favorable trade-off between efficiency and accuracy for high-resolution image processing.

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

High-resolution and fine-grained images pose significant computational and memory challenges for existing models.
Current visual representation learning methods often struggle with scalability due to quadratic complexity.

Purpose of the Study:

Introduce the Adventurer series models for efficient visual representation learning.
Address the computational and memory limitations of processing high-resolution images.

Main Methods:

Treat images as sequences of patch tokens.
Employ uni-directional language models for visual representation learning.
Use a global pooling token and a flipping operation so that image inputs fit a causal (uni-directional) modeling framework.
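
The three steps above can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' implementation: the function names (`patchify`, `global_pool_token`, `causal_mix`, `flip`, `forward`) are hypothetical, the running mean stands in for the real uni-directional sequence model, and both the position of the pooling token (placed first here) and the choice to flip the whole sequence after every layer are assumptions made for the example.

```python
def patchify(image, patch):
    """Split an H x W grid of pixel values into row-major patch tokens.

    Each token is the flat list of pixels inside one patch x patch tile.
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def global_pool_token(tokens):
    """Element-wise mean of all patch tokens: one token summarising the image."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def causal_mix(tokens):
    """Stand-in uni-directional mixer (assumption, not the real model).

    Output i is the running mean of inputs 0..i, so it never looks ahead.
    """
    running = [0.0] * len(tokens[0])
    out = []
    for count, token in enumerate(tokens, start=1):
        running = [r + t for r, t in zip(running, token)]
        out.append([r / count for r in running])
    return out


def flip(tokens):
    """Reverse the token order so the next causal pass scans the other way."""
    return tokens[::-1]


def forward(image, patch, num_layers):
    """Patchify, prepend the pooled token, then alternate causal mixing and flipping."""
    tokens = patchify(image, patch)
    sequence = [global_pool_token(tokens)] + tokens
    for _ in range(num_layers):
        sequence = causal_mix(sequence)
        sequence = flip(sequence)
    return sequence


if __name__ == "__main__":
    image = [[row * 4 + col for col in range(4)] for row in range(4)]
    output = forward(image, patch=2, num_layers=2)
    print(len(output), "tokens of width", len(output[0]))
```

Placing the pooled token first means every later position in a causal scan has already seen a summary of the whole image, and flipping between layers lets information flow in both directions across the stack even though each individual pass is one-directional.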

Main Results:

Adventurer models achieve a better efficiency-accuracy trade-off than DeiT and Vim.
Adventurer-Base attained 84.3% test accuracy on ImageNet-1k with 216 images/s training throughput.
Demonstrated 3.8x and 6.2x faster training throughput than Vim and DeiT, respectively.
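
As a back-of-the-envelope reading of these figures, the short calculation below derives the baseline throughputs that the reported speedups would imply. It assumes the 3.8x and 6.2x factors are plain ratios of images per second measured under the same setup, which this summary does not spell out. The resulting numbers are implied values, not figures reported in the paper, and the helper name `implied_baseline_throughput` is hypothetical.

```python
def implied_baseline_throughput(throughput, speedup):
    """Throughput a baseline would have if `throughput` is `speedup` times faster."""
    return throughput / speedup


# Figures quoted in the summary above.
adventurer_images_per_s = 216.0
speedup_over = {"Vim": 3.8, "DeiT": 6.2}

# Implied (not reported) baseline throughputs under the plain-ratio assumption.
implied = {
    name: implied_baseline_throughput(adventurer_images_per_s, factor)
    for name, factor in speedup_over.items()
}

if __name__ == "__main__":
    for name, value in implied.items():
        print(f"{name}: about {value:.1f} images/s (implied)")
```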

Conclusions:

The Adventurer architecture offers significant computation and memory efficiency.
Linear complexity allows for effective scaling with high-resolution and fine-grained images.
Potential to benefit future research in long sequence modeling for complex visual data.
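
The quadratic-versus-linear contrast drawn in this summary can be made concrete with a small calculation. The sketch below counts patch tokens at several resolutions and compares how a cost proportional to the square of the token count grows against one proportional to the token count itself. The patch size of 16 and the resolutions are assumptions chosen for illustration, and the costs are relative growth factors only, not measured runtimes or memory use.

```python
def num_tokens(height, width, patch):
    """Number of non-overlapping patch tokens for an image of the given size."""
    return (height // patch) * (width // patch)


def relative_cost(n_tokens, baseline_tokens, exponent):
    """Cost growth relative to a baseline when cost scales as n ** exponent."""
    return (n_tokens / baseline_tokens) ** exponent


if __name__ == "__main__":
    patch = 16  # assumed patch size
    baseline = num_tokens(224, 224, patch)
    for side in (224, 448, 896):
        n = num_tokens(side, side, patch)
        linear = relative_cost(n, baseline, 1)
        quadratic = relative_cost(n, baseline, 2)
        print(f"{side}x{side}: {n} tokens, "
              f"linear cost x{linear:.0f}, quadratic cost x{quadratic:.0f}")
```

Doubling the side length quadruples the token count, so a linear-cost model does four times the work while a quadratic-cost one does sixteen times the work.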