



Embodied Spatial Affordance: Spatial-Aware Affordance Learning for Embodied Navigation and Manipulation.

Xiaoshuai Hao, Yingbo Tang, Lingfeng Zhang

    IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society
    |June 4, 2026


    Summary
    This summary is machine-generated.

    EspA, a new image-to-keypoint model, lets embodied agents localize object and free-space affordances precisely from 2D images. This spatial-aware affordance learning improves performance on robotic navigation and manipulation tasks.


    Area of Science:

    • Robotics and Artificial Intelligence
    • Computer Vision
    • Human-Robot Interaction

    Background:

    • Embodied agents require understanding spatial context and object affordances for navigation and manipulation.
    • Current Vision-Language Models (VLMs) struggle with precise spatial understanding and affordance localization from images, limiting their application in robotics.
    • Bridging the gap between high-level reasoning and low-level actionable commands is crucial for embodied AI.

    Purpose of the Study:

    • To introduce EspA, a novel image-to-keypoint model for spatial-aware affordance learning.
    • To enable precise pixel-level localization of both object and free space affordances directly from 2D image inputs.
    • To improve the translation of language instructions into actionable 3D coordinates for embodied agents.

    Main Methods:

    • Developed a hierarchical vision-language architecture for joint reasoning of object and free space affordances.
    • Introduced the Embodied Spatial Affordance (ESA) dataset with fine-grained annotations for embodied interactions.
    • Implemented an image-to-keypoint approach to predict affordance keypoints and project them into 3D space using depth information.
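
The last step above, projecting a predicted 2D keypoint into 3D using depth, can be illustrated with a standard pinhole back-projection. This is a minimal sketch, not the authors' implementation: the summary does not state the camera model, so the pinhole model, the `Intrinsics` type, the function names, and the metric row-major depth map are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical type, not from the paper)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point, x
    cy: float  # principal point, y


def backproject_keypoint(u, v, depth_m, k):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    if depth_m <= 0:
        # Many depth sensors encode missing measurements as 0.
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def keypoints_to_3d(keypoints, depth_map, k):
    """Back-project integer pixel keypoints using a row-major depth map."""
    points = []
    for u, v in keypoints:
        depth_m = depth_map[v][u]  # row index is v (y), column index is u (x)
        points.append(backproject_keypoint(u, v, depth_m, k))
    return points
```

With assumed intrinsics `Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)`, a keypoint at the principal point `(320, 240)` with a depth of 2.0 m maps to `(0.0, 0.0, 2.0)` in the camera frame. Turning that point into a robot command would additionally need a camera-to-robot transform, which this summary does not describe.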

    Main Results:

    • EspA outperforms state-of-the-art VLMs at predicting object and free-space affordances.
    • The model also performs better on real-world embodied navigation and manipulation tasks.
    • It significantly improves image-based spatial reasoning for embodied agents.

    Conclusions:

    • EspA provides a scalable solution for translating high-level instructions into low-level actionable affordances for embodied agents.
    • The proposed method advances embodied AI by enabling more robust and versatile interaction with physical environments.
    • The publicly available dataset and code will foster future research in embodied spatial understanding.