Search research articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Related Concept Videos

Depth Perception and Spatial Vision

Depth Perception and Spatial Vision

Depth perception is the ability to perceive objects three-dimensionally. It relies on two types of cues: binocular and monocular. Binocular cues depend on the combination of images from both eyes and how the eyes work together. Since the eyes are in slightly different positions, each eye captures a slightly different image. This disparity between images, known as binocular disparity, helps the brain interpret depth. When the brain compares these images, it determines the distance to an object.

Vision

Vision

Vision is the result of light being detected and transduced into neural signals by the retina of the eye. This information is then further analyzed and interpreted by the brain. First, light enters the front of the eye and is focused by the cornea and lens onto the retina—a thin sheet of neural tissue lining the back of the eye. Because of refraction through the convex lens of the eye, images are projected onto the retina upside-down and reversed.

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

Nested Grover's Algorithm for Tree Search.

Entropy (Basel, Switzerland)·2026

Same author

Multi-Season Genome-Wide Association Study Reveals Loci and Candidate Genes for Fruit Quality and Maturity Traits in Peach.

Plants (Basel, Switzerland)·2026

Same author

Multilevel Data Representation for Training Deep Helmholtz Machines.

Neural computation·2025

Same author

Quantum Machine Learning-Quo Vadis?

Entropy (Basel, Switzerland)·2024

Same author

Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data?

Biological cybernetics·2024

Same author

Competitive learning to generate sparse representations for associative memory.

Neural networks : the official journal of the International Neural Network Society·2023

Same journal

A Model-Free Reinforcement Learning Implementation of Decision Making Under Uncertainty by Sequential Sampling.

Neural computation·2026

Same journal

DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning.

Neural computation·2026

Same journal

Hierarchical Active Inference Using Successor Representations.

Neural computation·2026

Same journal

W-Kernel and Its Principal Space for Frequentist Evaluation of Bayesian Estimators.

Neural computation·2026

Same journal

A Hidden Markov Model-Inspired Sequence Classification Method for Hyperdimensional Computing.

Neural computation·2026

Same journal

Sparse Graphical Modeling for Electrophysiological Phase-Based Connectivity Using Circular Statistics.

Neural computation·2026

See all related articles

Search research articles

Related Experiment Video

Updated: Jun 25, 2025

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision

Maria Osório¹, Andreas Wichert²

¹Department of Computer Science and Engineering, INESC-ID and Instituto Superior Técnico, University of Lisbon, 2744-016 Porto Salvo, Portugal maria.osorio@tecnico.ulisboa.pt.

Neural Computation

|May 22, 2024

Summary

This summary is machine-generated.

Convolutional neural networks (CNNs) excel at pattern recognition but differ from human vision. Our study shows CNNs prioritize pixel data over core features like color, texture, and shape, necessitating more semantically aware models.

More Related Videos

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Published on: December 15, 2023

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Published on: December 8, 2023

Related Experiment Videos

Last Updated: Jun 25, 2025

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Published on: December 15, 2023

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Published on: December 8, 2023

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Convolutional Neural Networks (CNNs) achieve high accuracy in image recognition by learning from raw pixel data.
CNNs' pattern recognition differs from human visual perception, focusing on statistical correlations rather than object semantics.
Understanding these differences is crucial for developing more robust and human-like AI vision systems.

Purpose of the Study:

To investigate the differences between CNNs' visual feature extraction and human perception.
To determine if CNNs prioritize pixel-level correlations over fundamental visual features (color, texture, shape).
To highlight the need for AI models that learn object semantics for improved visual understanding.

Main Methods:

Isolating and individually inputting core visual features (color, texture, shape) into neural networks.
Experimenting on diverse benchmark datasets: Fruits 360, CIFAR-10, and Fashion MNIST.
Evaluating CNN performance on datasets with varying distributions (CIFAR-10 vs. Stanford Dogs) to assess generalization.

Main Results:

Classification accuracy varied depending on the dataset, indicating CNNs learn dataset-specific pixel correlations.
CNNs performed well when training and test data distributions were similar but struggled with distribution shifts.
CNNs showed poor performance on Stanford Dogs images compared to CIFAR-10, despite similar visual content, underscoring a lack of semantic understanding.

Conclusions:

Deep learning models, particularly CNNs, tend to learn superficial pixel-level patterns rather than fundamental visual features crucial for human cognition.
The disparity between CNN performance and human visual perception highlights limitations in current AI models' ability to grasp object semantics.
Developing specialized datasets and models that focus on semantic understanding is essential for advancing computer vision research towards human-like cognition.