Jove
Visualize
Contact Us
JoVE
x logofacebook logolinkedin logoyoutube logo
ABOUT JoVE
OverviewLeadershipBlogJoVE Help Center
AUTHORS
Publishing ProcessEditorial BoardScope & PoliciesPeer ReviewFAQSubmit
LIBRARIANS
TestimonialsSubscriptionsAccessResourcesLibrary Advisory BoardFAQ
RESEARCH
JoVE JournalMethods CollectionsJoVE Encyclopedia of ExperimentsArchive
EDUCATION
JoVE CoreJoVE BusinessJoVE Science EducationJoVE Lab ManualFaculty Resource CenterFaculty Site
Terms & Conditions of Use
Privacy Policy
Policies

Related Concept Videos

Depth Perception and Spatial Vision01:15

Depth Perception and Spatial Vision

631
Depth perception is the ability to perceive objects three-dimensionally. It relies on two types of cues: binocular and monocular. Binocular cues depend on the combination of images from both eyes and how the eyes work together. Since the eyes are in slightly different positions, each eye captures a slightly different image. This disparity between images, known as binocular disparity, helps the brain interpret depth. When the brain compares these images, it determines the distance to an object.
631
Vision01:24

Vision

53.1K
Vision is the result of light being detected and transduced into neural signals by the retina of the eye. This information is then further analyzed and interpreted by the brain. First, light enters the front of the eye and is focused by the cornea and lens onto the retina—a thin sheet of neural tissue lining the back of the eye. Because of refraction through the convex lens of the eye, images are projected onto the retina upside-down and reversed.
53.1K

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by
Same author

Nested Grover's Algorithm for Tree Search.

Entropy (Basel, Switzerland)·2026
Same author

Multi-Season Genome-Wide Association Study Reveals Loci and Candidate Genes for Fruit Quality and Maturity Traits in Peach.

Plants (Basel, Switzerland)·2026
Same author

Multilevel Data Representation for Training Deep Helmholtz Machines.

Neural computation·2025
Same author

Quantum Machine Learning-Quo Vadis?

Entropy (Basel, Switzerland)·2024
Same author

Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data?

Biological cybernetics·2024
Same author

Competitive learning to generate sparse representations for associative memory.

Neural networks : the official journal of the International Neural Network Society·2023
Same journal

A Model-Free Reinforcement Learning Implementation of Decision Making Under Uncertainty by Sequential Sampling.

Neural computation·2026
Same journal

DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning.

Neural computation·2026
Same journal

Hierarchical Active Inference Using Successor Representations.

Neural computation·2026
Same journal

W-Kernel and Its Principal Space for Frequentist Evaluation of Bayesian Estimators.

Neural computation·2026
Same journal

A Hidden Markov Model-Inspired Sequence Classification Method for Hyperdimensional Computing.

Neural computation·2026
Same journal

Sparse Graphical Modeling for Electrophysiological Phase-Based Connectivity Using Circular Statistics.

Neural computation·2026
See all related articles

Related Experiment Video

Updated: Jun 25, 2025

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment
08:25

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

9.0K

Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision

Maria Osório1, Andreas Wichert2

  • 1Department of Computer Science and Engineering, INESC-ID and Instituto Superior Técnico, University of Lisbon, 2744-016 Porto Salvo, Portugal maria.osorio@tecnico.ulisboa.pt.

Neural Computation
|May 22, 2024
PubMed
Summary
This summary is machine-generated.

Convolutional neural networks (CNNs) excel at pattern recognition but differ from human vision. Our study shows CNNs prioritize pixel data over core features like color, texture, and shape, necessitating more semantically aware models.

More Related Videos

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications
03:31

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Published on: December 15, 2023

520
Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping
07:11

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Published on: December 8, 2023

1.5K

Related Experiment Videos

Last Updated: Jun 25, 2025

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment
08:25

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

9.0K
Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications
03:31

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Published on: December 15, 2023

520
Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping
07:11

Author Spotlight: Insights into Visual Cortex Research Through Wide-View fMRI Mapping

Published on: December 8, 2023

1.5K

Area of Science:

  • Computer Vision
  • Artificial Intelligence
  • Machine Learning

Background:

  • Convolutional Neural Networks (CNNs) achieve high accuracy in image recognition by learning from raw pixel data.
  • CNNs' pattern recognition differs from human visual perception, focusing on statistical correlations rather than object semantics.
  • Understanding these differences is crucial for developing more robust and human-like AI vision systems.

Purpose of the Study:

  • To investigate the differences between CNNs' visual feature extraction and human perception.
  • To determine if CNNs prioritize pixel-level correlations over fundamental visual features (color, texture, shape).
  • To highlight the need for AI models that learn object semantics for improved visual understanding.

Main Methods:

  • Isolating and individually inputting core visual features (color, texture, shape) into neural networks.
  • Experimenting on diverse benchmark datasets: Fruits 360, CIFAR-10, and Fashion MNIST.
  • Evaluating CNN performance on datasets with varying distributions (CIFAR-10 vs. Stanford Dogs) to assess generalization.

Main Results:

  • Classification accuracy varied depending on the dataset, indicating CNNs learn dataset-specific pixel correlations.
  • CNNs performed well when training and test data distributions were similar but struggled with distribution shifts.
  • CNNs showed poor performance on Stanford Dogs images compared to CIFAR-10, despite similar visual content, underscoring a lack of semantic understanding.

Conclusions:

  • Deep learning models, particularly CNNs, tend to learn superficial pixel-level patterns rather than fundamental visual features crucial for human cognition.
  • The disparity between CNN performance and human visual perception highlights limitations in current AI models' ability to grasp object semantics.
  • Developing specialized datasets and models that focus on semantic understanding is essential for advancing computer vision research towards human-like cognition.