MetaFormer Baselines for Vision.

Weihao Yu, Chenyang Si, Pan Zhou

IEEE Transactions on Pattern Analysis and Machine Intelligence

November 1, 2023

Summary

This summary is machine-generated.

The MetaFormer architecture achieves strong performance even with basic token mixers, which indicates that the framework itself, rather than the choice of mixer, accounts for much of the accuracy. Simple mixers such as identity mapping or random matrices still yield high accuracy, and more advanced models such as CAFormer set new records.

Area of Science:

Computer Vision
Deep Learning Architectures
Neural Network Design

Background:

The MetaFormer architecture, an abstraction of the Transformer that leaves the token mixer unspecified, has been shown to play a significant role in achieving competitive performance.
Previous research focused on token mixer designs within these architectures.
This study shifts focus to the inherent capacity of the MetaFormer framework itself.

Purpose of the Study:

To explore the performance potential of the MetaFormer architecture independent of complex token mixer designs.
To validate MetaFormer's effectiveness with basic and conventional token mixers.
To introduce and evaluate a novel activation function, StarReLU.

Main Methods:

Developed baseline MetaFormer models using simple token mixers: identity mapping (IdentityFormer) and random matrices (RandFormer).
Implemented ConvFormer using depthwise separable convolutions, comparing it to ConvNeXt.
Created CAFormer by using depthwise separable convolutions as the token mixer in the bottom stages and self-attention in the top stages.
Introduced and tested StarReLU, a variant of Squared ReLU, as an activation function.
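
The models above share one block template in which only the token mixer changes. The following is a minimal pure-Python sketch of that idea, assuming a pre-norm residual sub-block of the form x + mixer(norm(x)). The function names (`identity_mixer`, `make_random_mixer`, `metaformer_sublayer`), the simplified normalization without learnable parameters, and the row-wise softmax applied to the random matrix are illustrative assumptions, not the authors' implementation. The channel MLP sub-block is omitted for brevity.

```python
import math
import random

Tokens = list[list[float]]  # shape: (num_tokens, channels)


def layer_norm(x: Tokens, eps: float = 1e-6) -> Tokens:
    """Normalize each token over its channels (no learnable scale or bias)."""
    out = []
    for token in x:
        mean = sum(token) / len(token)
        var = sum((v - mean) ** 2 for v in token) / len(token)
        out.append([(v - mean) / math.sqrt(var + eps) for v in token])
    return out


def identity_mixer(x: Tokens) -> Tokens:
    """IdentityFormer-style mixer: tokens exchange no information."""
    return [token[:] for token in x]


def make_random_mixer(num_tokens: int, seed: int = 0):
    """RandFormer-style mixer: a random (num_tokens x num_tokens) matrix, never trained."""
    rng = random.Random(seed)
    weights = []
    for _ in range(num_tokens):
        row = [math.exp(rng.random()) for _ in range(num_tokens)]
        total = sum(row)
        weights.append([w / total for w in row])  # row-wise softmax (assumption)

    def mixer(x: Tokens) -> Tokens:
        channels = len(x[0])
        # Each output token is a weighted average of all input tokens.
        return [
            [sum(weights[i][j] * x[j][c] for j in range(num_tokens))
             for c in range(channels)]
            for i in range(num_tokens)
        ]

    return mixer


def metaformer_sublayer(x: Tokens, token_mixer) -> Tokens:
    """Residual token-mixing sub-block: x + token_mixer(norm(x))."""
    mixed = token_mixer(layer_norm(x))
    return [[a + b for a, b in zip(tx, tm)] for tx, tm in zip(x, mixed)]


tokens = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [4.0, 4.0, 1.0], [2.0, 0.0, 2.0]]
out_identity = metaformer_sublayer(tokens, identity_mixer)
out_random = metaformer_sublayer(tokens, make_random_mixer(num_tokens=len(tokens)))
```

Because the mixer is passed in as a plain function, swapping in a convolution or self-attention mixer would leave `metaformer_sublayer` unchanged, which is the sense in which the framework is independent of the mixer design.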

Main Results:

IdentityFormer achieved over 80% accuracy on ImageNet-1K, establishing a solid performance baseline.
RandFormer surpassed IdentityFormer with over 81% accuracy, showcasing MetaFormer's compatibility with arbitrary mixers.
ConvFormer outperformed ConvNeXt, and CAFormer set a new record of 85.5% top-1 accuracy on ImageNet-1K at 224×224 resolution under normal supervised training, without external data or distillation.
StarReLU reduced activation FLOPs by 71% compared to GELU while improving performance.
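
The 71% figure can be checked with simple arithmetic. The sketch below assumes StarReLU has the form s * ReLU(x)^2 + b, and that one activation costs roughly 14 FLOPs for GELU versus 4 for StarReLU. These per-element counts and the default values of `scale` and `bias` are assumptions that are not stated in this summary. The defaults are chosen so that the output has zero mean and unit variance when the input is standard normal.

```python
def star_relu(x: float, scale: float = 0.8944, bias: float = -0.4472) -> float:
    """StarReLU sketch: scale * relu(x)**2 + bias.

    In a real model scale and bias would be learnable; the defaults are
    assumed initial values. For x ~ N(0, 1), relu(x)**2 has mean 0.5 and
    variance 1.25, so scale = 1/sqrt(1.25) and bias = -0.5/sqrt(1.25).
    """
    return scale * max(x, 0.0) ** 2 + bias


# Assumed per-element FLOP counts (inputs to the arithmetic, not measurements).
GELU_FLOPS = 14
STAR_RELU_FLOPS = 4  # relu, square, multiply by scale, add bias

reduction = 1.0 - STAR_RELU_FLOPS / GELU_FLOPS
print(f"Activation FLOP reduction: {reduction:.0%}")  # prints 71%
```

Under these assumptions the reduction is 1 - 4/14, about 71.4%, which is consistent with the reported figure.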

Conclusions:

MetaFormer provides a robust framework that guarantees a high lower bound on performance, even with minimal token mixers.
The architecture demonstrates versatility, performing well with diverse and even random token mixers.
MetaFormer, when combined with conventional or novel components like StarReLU, can achieve state-of-the-art results efficiently.