Embodied Spatial Affordance: Spatial-Aware Affordance Learning for Embodied Navigation and Manipulation.

Xiaoshuai Hao, Yingbo Tang, Lingfeng Zhang

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

June 4, 2026

Summary

This summary is machine-generated.

EspA, a new image-to-keypoint model, enhances embodied agents by precisely localizing object and free space affordances from 2D images. This spatial-aware learning model improves performance on robotic navigation and manipulation tasks.

Area of Science:

Robotics and Artificial Intelligence
Computer Vision
Human-Robot Interaction

Background:

Embodied agents must understand spatial context and object affordances to navigate and manipulate.
Current Vision-Language Models (VLMs) struggle with precise spatial understanding and affordance localization from images, limiting their application in robotics.
Bridging the gap between high-level reasoning and low-level actionable commands is crucial for embodied AI.

Purpose of the Study:

To introduce EspA, a novel image-to-keypoint model for spatial-aware affordance learning.
To enable precise pixel-level localization of both object and free space affordances directly from 2D image inputs.
To improve the translation of language instructions into actionable 3D coordinates for embodied agents.

Main Methods:

Developed a hierarchical vision-language architecture for joint reasoning of object and free space affordances.
Introduced the Embodied Spatial Affordance (ESA) dataset with fine-grained annotations for embodied interactions.
Implemented an image-to-keypoint approach to predict affordance keypoints and project them into 3D space using depth information.
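
The last step above, lifting a predicted 2D affordance keypoint into 3D using depth, can be illustrated with a minimal sketch. The summary does not say how EspA performs this projection, so the code below assumes a standard pinhole camera model with a depth map aligned to the image. The function name `backproject_keypoint`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the sample values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def backproject_keypoint(u, v, depth_map, fx, fy, cx, cy):
    """Lift a pixel keypoint (u, v) to a 3D point in the camera frame.

    Assumes a pinhole camera and a depth map (in metres) aligned with the
    image; this is an illustrative assumption, not the paper's stated method.
    """
    # Look up depth at the nearest pixel (row index = v, column index = u).
    z = float(depth_map[int(round(v)), int(round(u))])
    if not np.isfinite(z) or z <= 0.0:
        raise ValueError("no valid depth at keypoint")
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])


# Hypothetical example: a flat scene 2 m away and a keypoint right of centre.
depth = np.full((480, 640), 2.0)
point = backproject_keypoint(420, 240, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(point)
```

The resulting point is expressed in the camera frame; a real robot would additionally need a camera-to-base transform before using it as a navigation or grasp target.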

Main Results:

EspA demonstrates superior performance in predicting object and free space affordances compared to state-of-the-art VLMs.
The model shows enhanced capabilities in real-world embodied navigation and manipulation tasks.
Achieved significant improvements in image-based spatial reasoning for embodied agents.

Conclusions:

EspA provides a scalable solution for translating high-level instructions into low-level actionable affordances for embodied agents.
The proposed method advances embodied AI by enabling more robust and versatile interaction with physical environments.
The publicly available dataset and code will foster future research in embodied spatial understanding.