Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency.

Feng Wang¹, Timing Yang¹, Yaodong Yu²

  • ¹ Johns Hopkins University.

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
November 3, 2025
Summary
This summary is machine-generated.

The Adventurer models treat images as sequences of patch tokens and use uni-directional language models for visual representation learning. This approach offers a strong efficiency-accuracy trade-off for high-resolution image processing.

Area of Science:

  • Computer Vision
  • Machine Learning
  • Artificial Intelligence

Background:

  • High-resolution and fine-grained images pose significant computational and memory challenges for existing models.
  • Current visual representation learning methods often struggle with scalability due to quadratic complexity.

Purpose of the Study:

  • Introduce the Adventurer series models for efficient visual representation learning.
  • Address the computational and memory limitations of processing high-resolution images.

Main Methods:

  • Treat images as sequences of patch tokens.
  • Employ uni-directional language models for visual representation learning.
  • Use a global pooling token and a flipping operation to integrate image inputs seamlessly into the causal inference framework.
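
The three steps above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `patchify`, `with_pooling_token`, and `flip_patches` are hypothetical, the pooling token is assumed to be a simple mean of the patch tokens placed at the start of the sequence, and the flipping operation is assumed to reverse the order of the patch tokens while leaving the pooling token in place. The summary does not specify any of these details.

```python
from typing import List

Token = List[float]


def patchify(image: List[List[float]], patch: int) -> List[Token]:
    """Split an H x W grayscale image into flattened patch tokens (row-major order)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def with_pooling_token(tokens: List[Token]) -> List[Token]:
    """Prepend one token holding the element-wise mean of all patch tokens.

    Assumption: mean pooling and placement at index 0 are illustrative choices.
    """
    count = len(tokens)
    pooled = [sum(column) / count for column in zip(*tokens)]
    return [pooled] + tokens


def flip_patches(sequence: List[Token]) -> List[Token]:
    """Reverse the patch-token order, keeping the pooling token at index 0.

    Assumption: this is one plausible reading of the "flipping operation".
    """
    return sequence[:1] + sequence[1:][::-1]


# A 4 x 4 toy image split into 2 x 2 patches gives a sequence of 4 tokens.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
sequence = with_pooling_token(tokens)
flipped = flip_patches(sequence)
```

In a uni-directional model each token only sees the tokens before it, so reversing the patch order gives a second pass access to context from the opposite direction. Applying `flip_patches` twice returns the original sequence.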

Main Results:

  • Adventurer models achieve an optimal efficiency-accuracy trade-off compared to DeiT and Vim.
  • Adventurer-Base attained 84.3% test accuracy on ImageNet-1k with 216 images/s training throughput.
  • Demonstrated 3.8x and 6.2x faster training throughput than Vim and DeiT, respectively.

Conclusions:

  • The Adventurer architecture offers significant computation and memory efficiency.
  • Linear complexity allows for effective scaling with high-resolution and fine-grained images.
  • Potential to benefit future research in long sequence modeling for complex visual data.
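
The gap between quadratic and linear complexity can be made concrete with a short arithmetic sketch. The image sizes (224 and 448 pixels per side) and the 16-pixel patch size are illustrative assumptions, not figures from the study, and `num_tokens` and `cost_growth` are hypothetical helper names. The sketch only counts tokens and compares how a cost proportional to the token count grows against a cost proportional to its square.

```python
from typing import Tuple


def num_tokens(side: int, patch: int) -> int:
    """Number of patch tokens for a square image with the given side length."""
    if side % patch:
        raise ValueError("side must be divisible by the patch size")
    return (side // patch) ** 2


def cost_growth(side_a: int, side_b: int, patch: int) -> Tuple[float, float]:
    """Return (linear, quadratic) cost growth when moving from side_a to side_b."""
    n_a, n_b = num_tokens(side_a, patch), num_tokens(side_b, patch)
    linear = n_b / n_a                    # cost proportional to sequence length
    quadratic = (n_b ** 2) / (n_a ** 2)   # cost proportional to length squared
    return linear, quadratic


# Doubling the side length quadruples the token count (196 -> 784).
linear_growth, quadratic_growth = cost_growth(224, 448, 16)
print(num_tokens(224, 16), num_tokens(448, 16), linear_growth, quadratic_growth)
```

Under these assumptions, doubling the resolution multiplies a linear cost by 4 and a quadratic cost by 16, which is why linear-complexity models are attractive for long visual sequences.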