
Related Concept Videos

Depth Perception and Spatial Vision


Depth perception is the ability to perceive the world in three dimensions and judge how far away objects are. It relies on two types of cues: binocular cues, which require both eyes, and monocular cues, which are available to a single eye. Binocular cues depend on combining the images from the two eyes. Because the eyes sit a few centimeters apart, each captures a slightly different view of the same scene. The difference between these two views, known as binocular disparity, is a key depth cue: by comparing the two images, the brain estimates the distance to an object.
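The biological process is far more complex, but the geometry behind binocular disparity can be sketched with the standard rectified pinhole-stereo model from computer vision, where depth is Z = f · B / d (f: focal length in pixels, B: baseline between the two viewpoints, d: disparity in pixels). The function name and the example values below (a 700-pixel focal length and a 6 cm baseline, roughly the separation of human eyes) are illustrative assumptions, not taken from the source.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified pinhole stereo pair (illustrative helper).

    focal_px:     focal length in pixels
    baseline_m:   distance between the two viewpoints, in metres
    disparity_px: horizontal shift of the same point between the two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values are invalid here.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Assumed example values: 700 px focal length, 6 cm baseline.
    print(depth_from_disparity(700.0, 0.06, 14.0))  # roughly 3 m
    print(depth_from_disparity(700.0, 0.06, 7.0))   # half the disparity, twice the depth
```

Depth is inversely proportional to disparity, so small disparity errors matter most for distant points, one reason stereo depth estimates become less reliable with distance.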



Depth and Video Segmentation Based Visual Attention for Embodied Question Answering.

Haonan Luo, Guosheng Lin, Yazhou Yao

IEEE Transactions on Pattern Analysis and Machine Intelligence

January 4, 2022

Summary

This summary is machine-generated.

This study introduces a novel visual attention mechanism for Embodied Question Answering (EQA) agents, improving both navigation and answering accuracy in real-world environments. By giving the agent richer semantic understanding and spatial awareness, the method aims to make robot assistants more effective.



Area of Science:

Artificial Intelligence
Robotics
Computer Vision

Background:

Embodied Question Answering (EQA) requires an agent to explore an environment in order to answer questions about it.
Current EQA methods struggle with accuracy due to limited semantic and spatial information.
Applications include personal assistants and in-home robots.

Purpose of the Study:

To enhance the accuracy of Embodied Question Answering (EQA).
To address limitations in semantic understanding and spatial reasoning in existing EQA models.

Main Methods:

Proposed a depth- and segmentation-based visual attention mechanism for EQA.
Introduced a high-speed video segmentation framework for local semantic feature extraction.
Developed a feature fusion strategy to guide navigator training.
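The abstract names these components but not their internals, so the following is only a toy sketch under stated assumptions, not the authors' architecture. It shows one generic way a depth map and a segmentation mask could steer spatial attention over a feature map, how mask-averaged "local" features can be extracted, and how the two can be fused. The scoring rule `alpha * seg + beta / (1 + depth)`, all helper names, and concatenation as the fusion step are hypothetical choices for illustration; the segmentation mask is assumed to come from some upstream segmentation model.

```python
import math
from typing import List

# Toy types (assumed for illustration): a feature map is an H x W grid of
# C-dimensional vectors; masks and depth maps are H x W grids of floats.
Vector = List[float]
Grid = List[List[float]]
FeatureMap = List[List[Vector]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(seg_mask: Grid, depth: Grid,
                      alpha: float = 2.0, beta: float = 1.0) -> Grid:
    """Hypothetical scoring rule: favour pixels inside the segmented region
    and pixels closer to the camera, score = alpha * seg + beta / (1 + depth),
    then normalise all scores with a softmax."""
    h, w = len(depth), len(depth[0])
    scores = [alpha * seg_mask[i][j] + beta / (1.0 + depth[i][j])
              for i in range(h) for j in range(w)]
    flat = softmax(scores)
    # Reshape the flat weights back into an H x W grid.
    return [flat[i * w:(i + 1) * w] for i in range(h)]


def weighted_pool(features: FeatureMap, weights: Grid) -> Vector:
    """Attention pooling: weighted sum of the feature vectors over all pixels."""
    channels = len(features[0][0])
    pooled = [0.0] * channels
    for i, row in enumerate(features):
        for j, vec in enumerate(row):
            for k in range(channels):
                pooled[k] += weights[i][j] * vec[k]
    return pooled


def masked_mean(features: FeatureMap, seg_mask: Grid) -> Vector:
    """Hypothetical local semantic feature: mask-weighted mean of the feature
    vectors inside the segment. Returns a zero vector for an empty mask."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0.0
    for i, row in enumerate(features):
        for j, vec in enumerate(row):
            m = seg_mask[i][j]
            if m > 0:
                count += m
                for k in range(channels):
                    total[k] += m * vec[k]
    return [t / count for t in total] if count > 0 else total


def fuse(attended_feat: Vector, local_feat: Vector) -> Vector:
    """Assumed fusion strategy: plain concatenation of the two vectors."""
    return attended_feat + local_feat


if __name__ == "__main__":
    # 2 x 2 toy example with C = 2 channels.
    features = [[[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 1.0], [0.0, 0.0]]]
    seg_mask = [[1.0, 0.0], [1.0, 0.0]]  # left column: the queried object
    depth = [[1.0, 1.0], [3.0, 3.0]]     # top row: closer to the camera
    weights = attention_weights(seg_mask, depth)
    fused = fuse(weighted_pool(features, weights), masked_mean(features, seg_mask))
    print(fused)  # a 2C-dimensional vector that a navigator could consume
```

Concatenation is the simplest fusion choice; in a trained model a learned gate or projection layer would be a natural alternative.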

Main Results:

The visual attention mechanism improved Visual Question Answering (VQA) performance.
Improved overall accuracy by 4.9% on the House3D dataset and by 5.6% on the Matterport3D dataset.
Improved the performance of both the VQA and navigation modules.

Conclusions:

The proposed method enhances EQA by integrating depth, segmentation, and visual attention.
The approach offers improved performance without substantial computational overhead.
This work advances the capabilities of intelligent agents in interactive environments.