Search research articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Related Concept Videos

Depth Perception and Spatial Vision

Depth Perception and Spatial Vision

Depth perception is the ability to perceive objects three-dimensionally. It relies on two types of cues: binocular and monocular. Binocular cues depend on the combination of images from both eyes and how the eyes work together. Since the eyes are in slightly different positions, each eye captures a slightly different image. This disparity between images, known as binocular disparity, helps the brain interpret depth. When the brain compares these images, it determines the distance to an object.

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

Continual Learning: Forget-free Winning Subnetworks for Video Representations.

IEEE transactions on pattern analysis and machine intelligence·2025

Same author

READRetro: natural product biosynthesis predicting with retrieval-augmented dual-view retrosynthesis.

The New phytologist·2024

Same author

Ego4D: Around the World in 3,600 Hours of Egocentric Video.

IEEE transactions on pattern analysis and machine intelligence·2024

Same author

A domain-agnostic approach for characterization of lifelong learning systems.

Neural networks : the official journal of the International Neural Network Society·2023

Same author

Learning Spherical Convolution for 360<sup>°</sup> Recognition.

IEEE transactions on pattern analysis and machine intelligence·2021

Same author

Emergence of exploratory look-around behaviors through active observation completion.

Science robotics·2020

Same journal

HardFlow: Hard-Constrained Sampling for Flow-Matching Models Via Trajectory Optimization.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Industrial Brain: Self-Evolving Neuro-Symbolic Autonomy with Causal Resilience for Cyber-Physical Systems.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Adaptive Hardness-Driven Dictionary Distillation for Incomplete Streaming View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Achieving Text-based Person Retrieval with Any Granularity.

IEEE transactions on pattern analysis and machine intelligence·2026

See all related articles

Search research articles

Related Experiment Video

Updated: May 23, 2026

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Reading between the lines: object localization using implicit cues from image tags.

Sung Ju Hwang¹, Kristen Grauman

¹Department of Computer Science, University of Texas at Austin, 1616 Guadalupe, Suite 2.408, Austin, TX 78701, USA. sjhwang@cs.utexas.edu

IEEE Transactions on Pattern Analysis and Machine Intelligence

|April 21, 2012

Summary

This summary is machine-generated.

This study introduces a novel method for object localization in images by analyzing "unspoken" cues within image tags. This approach enhances detection accuracy and efficiency, even for untagged images, by mimicking human visual attention.

More Related Videos

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Published on: April 8, 2019

Related Experiment Videos

Last Updated: May 23, 2026

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Published on: April 8, 2019

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

Current image tagging methods primarily use explicit noun-object links.
Implicit information within ordered image tags is often overlooked.
Object localization accuracy and efficiency remain challenges in computer vision.

Purpose of the Study:

To leverage implicit cues from ordered image tags for improved object localization.
To develop a method for enhancing object detection accuracy and efficiency.
To enable object detection in untagged images by learning a shared semantic space.

Main Methods:

Extraction of three novel implicit features from ordered image tags: mention order, scale constraints, and spatial proximity.
Learning a conditional density over localization parameters (position, scale) using these implicit cues.
Developing a technique to learn localization density in a shared semantic space for visual and tag-based features.

Main Results:

Demonstrated significant improvements in object localization accuracy and efficiency on PASCAL VOC, LabelMe, and Flickr datasets.
Outperformed traditional sliding window methods and visual context baselines.
Successfully applied the technique for object detection in untagged images.

Conclusions:

Implicit information in image tags can substantially enhance object localization.
The proposed method offers a robust approach to object detection, improving state-of-the-art performance.
Translating human viewing behavior insights into algorithms improves machine perception capabilities.