A large-scale examination of inductive biases shaping high-level visual representation in brains and machines

Affiliations
  • 1Department of Psychology, Harvard University, Cambridge, MA, USA. conwell@g.harvard.edu.
  • 2Department of Psychology, Harvard University, Cambridge, MA, USA.
  • 3Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA.
  • 4Department of Psychology, Harvard University, Cambridge, MA, USA. tkonkle@fas.harvard.edu.
  • 5Center for Brain Science, Harvard University, Cambridge, MA, USA. tkonkle@fas.harvard.edu.
  • 6Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA. tkonkle@fas.harvard.edu.

Abstract

The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity – a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near equivalent brain predictivity, when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity, despite clear variation in their underlying representations – suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
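
For readers unfamiliar with the term, a single representational similarity analysis of the kind counted in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic, the array shapes are arbitrary, and the function names (`rdm`, `rsa_score`, and so on) are hypothetical. It assumes correlation distance for the dissimilarity matrices and a Spearman correlation to compare them, which is one common choice among several.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between stimulus rows."""
    return 1.0 - np.corrcoef(features)

def upper_triangle(matrix):
    """Unique off-diagonal entries of a symmetric matrix, as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rank(values):
    """Ordinal ranks (ties are not averaged; fine for continuous-valued data)."""
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rsa_score(model_features, brain_responses):
    """Spearman correlation between the upper triangles of the two RDMs."""
    a = rank(upper_triangle(rdm(model_features)))
    b = rank(upper_triangle(rdm(brain_responses)))
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins (assumptions, not real data): 50 stimuli,
# 128 model units, 40 "voxels".
rng = np.random.default_rng(0)
n_stimuli = 50
model_features = rng.normal(size=(n_stimuli, 128))
# "Voxels" that are a noisy random linear readout of the model features.
brain_responses = (model_features @ rng.normal(size=(128, 40))
                   + rng.normal(size=(n_stimuli, 40)))
# "Voxels" with no relationship to the model features.
unrelated = rng.normal(size=(n_stimuli, 40))

score_related = rsa_score(model_features, brain_responses)
score_unrelated = rsa_score(model_features, unrelated)
print(f"related: {score_related:.3f}, unrelated: {score_unrelated:.3f}")
```

In this toy setup the readout that depends on the model features should score clearly higher than the unrelated one. The regression-based analyses mentioned in the abstract differ in that they fit a weighted mapping from model units to each brain response rather than comparing dissimilarity structure directly.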
