Updated: Apr 30, 2026

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination.

Teng Wang, Xinxin Zhao, Wenzhe Cai

IEEE Transactions on Pattern Analysis and Machine Intelligence

April 28, 2026

Summary

This summary is machine-generated.

Autonomous robots can now navigate complex environments without maps using Vision-Language Models (VLMs). ImagineNav++ uses imagined future views for efficient robot navigation and planning, achieving state-of-the-art results.

Area of Science:

Robotics
Artificial Intelligence
Computer Vision

Background:

Visual navigation is crucial for autonomous robots, especially in home-assistance tasks like object search.
Current Large Language Models (LLMs) struggle with spatial reasoning due to limitations in textual representations for navigation.
Methods are needed that use visual data directly for robot navigation and planning.

Purpose of the Study:

To investigate the potential of Vision-Language Models (VLMs) for mapless visual navigation using only onboard RGB/RGB-D streams.
To develop an imagination-powered navigation framework that enhances spatial perception and planning capabilities.
To overcome the limitations of text-based planning in current LLMs for robot navigation.

Main Methods:

Developed ImagineNav++, an imagination-powered navigation framework for robots.
Introduced a future-view imagination module to generate viewpoints with high exploration potential.
Implemented a selective foveation memory mechanism for hierarchical integration of keyframe observations.
Transformed complex navigation into a best-view image selection problem for VLMs.
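
The control flow these methods imply can be sketched as a loop: propose candidate viewpoints, imagine the view from each, ask a selector for the best view, remember it, and move there. The following is a minimal sketch under stated assumptions, not the authors' implementation. Every name in it (`Pose`, `propose_viewpoints`, `navigate`, and the `toy_*` stand-ins) is hypothetical. The imagination step and the VLM are passed in as plain callables, and a simple list of chosen views stands in for the selective foveation memory, whose actual design is not described in this summary.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading: float  # radians


def propose_viewpoints(pose: Pose, step: float = 1.0, n: int = 4) -> List[Pose]:
    """Hypothetical candidate generator: n poses evenly spaced around the robot."""
    candidates = []
    for k in range(n):
        theta = pose.heading + 2.0 * math.pi * k / n
        candidates.append(
            Pose(pose.x + step * math.cos(theta), pose.y + step * math.sin(theta), theta)
        )
    return candidates


def navigate(
    start: Pose,
    goal: str,
    imagine: Callable[[Pose, Pose], Any],
    select_best: Callable[[str, Sequence[Any], Sequence[Any]], int],
    goal_reached: Callable[[Pose], bool],
    max_steps: int = 20,
) -> Tuple[Pose, bool, int]:
    """Run the loop; return (final pose, success flag, number of moves made)."""
    pose = start
    memory: List[Any] = []  # assumption: plain keyframe list, not the paper's memory
    steps = 0
    while steps < max_steps and not goal_reached(pose):
        candidates = propose_viewpoints(pose)
        # One imagined view per candidate viewpoint.
        views = [imagine(pose, c) for c in candidates]
        # Navigation reduced to choosing one view; a VLM would be queried here.
        idx = select_best(goal, views, memory)
        memory.append(views[idx])
        pose = candidates[idx]
        steps += 1
    return pose, goal_reached(pose), steps


# Toy stand-ins so the sketch runs without a simulator or a model.
TARGET = (5.0, 0.0)


def toy_imagine(current: Pose, candidate: Pose) -> Pose:
    # A real system would synthesize an image; here the "view" is the pose itself.
    return candidate


def toy_select_best(goal: str, views: Sequence[Pose], memory: Sequence[Pose]) -> int:
    # Stand-in for the VLM: pick the view closest to a known target position.
    dists = [math.hypot(v.x - TARGET[0], v.y - TARGET[1]) for v in views]
    return dists.index(min(dists))


def toy_goal_reached(pose: Pose) -> bool:
    return math.hypot(pose.x - TARGET[0], pose.y - TARGET[1]) < 0.5
```

Calling `navigate(Pose(0.0, 0.0, 0.0), "chair", toy_imagine, toy_select_best, toy_goal_reached)` drives the toy robot toward `TARGET`. In a real system the selector would send the imagined images and the goal description to a VLM and parse the index of the chosen image from its reply.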

Main Results:

ImagineNav++ achieved state-of-the-art performance in mapless visual navigation.
The framework surpassed most map-based methods on open-vocabulary object and instance navigation benchmarks.
Demonstrated the effectiveness of scene imagination and memory in VLM-based spatial reasoning.

Conclusions:

VLMs can achieve effective mapless visual navigation by leveraging imagined future views and robust memory mechanisms.
ImagineNav++ offers a promising direction for enhancing robot autonomy and task execution in complex environments.
Scene imagination and memory are critical components for advanced VLM-based spatial reasoning in robotics.