Jove
Visualize
Contact Us
JoVE
x logofacebook logolinkedin logoyoutube logo
ABOUT JoVE
OverviewLeadershipBlogJoVE Help Center
AUTHORS
Publishing ProcessEditorial BoardScope & PoliciesPeer ReviewFAQSubmit
LIBRARIANS
TestimonialsSubscriptionsAccessResourcesLibrary Advisory BoardFAQ
RESEARCH
JoVE JournalMethods CollectionsJoVE Encyclopedia of ExperimentsArchive
EDUCATION
JoVE CoreJoVE BusinessJoVE Science EducationJoVE Lab ManualFaculty Resource CenterFaculty Site
Terms & Conditions of Use
Privacy Policy
Policies

Related Concept Videos

Depth Perception and Spatial Vision01:15

Depth Perception and Spatial Vision

Depth perception is the ability to perceive objects three-dimensionally. It relies on two types of cues: binocular and monocular. Binocular cues depend on the combination of images from both eyes and how the eyes work together. Since the eyes are in slightly different positions, each eye captures a slightly different image. This disparity between images, known as binocular disparity, helps the brain interpret depth. When the brain compares these images, it determines the distance to an object.

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by
Same author

Continual Learning: Forget-free Winning Subnetworks for Video Representations.

IEEE transactions on pattern analysis and machine intelligence·2025
Same author

READRetro: natural product biosynthesis predicting with retrieval-augmented dual-view retrosynthesis.

The New phytologist·2024
Same author

Ego4D: Around the World in 3,600 Hours of Egocentric Video.

IEEE transactions on pattern analysis and machine intelligence·2024
Same author

A domain-agnostic approach for characterization of lifelong learning systems.

Neural networks : the official journal of the International Neural Network Society·2023
Same author

Learning Spherical Convolution for 360<sup>°</sup> Recognition.

IEEE transactions on pattern analysis and machine intelligence·2021
Same author

Emergence of exploratory look-around behaviors through active observation completion.

Science robotics·2020
Same journal

HardFlow: Hard-Constrained Sampling for Flow-Matching Models Via Trajectory Optimization.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Industrial Brain: Self-Evolving Neuro-Symbolic Autonomy with Causal Resilience for Cyber-Physical Systems.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Adaptive Hardness-Driven Dictionary Distillation for Incomplete Streaming View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Achieving Text-based Person Retrieval with Any Granularity.

IEEE transactions on pattern analysis and machine intelligence·2026
See all related articles

Related Experiment Video

Updated: May 23, 2026

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment
08:25

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Reading between the lines: object localization using implicit cues from image tags.

Sung Ju Hwang1, Kristen Grauman

  • 1Department of Computer Science, University of Texas at Austin, 1616 Guadalupe, Suite 2.408, Austin, TX 78701, USA. sjhwang@cs.utexas.edu

IEEE Transactions on Pattern Analysis and Machine Intelligence
|April 21, 2012
PubMed
Summary
This summary is machine-generated.

This study introduces a novel method for object localization in images by analyzing "unspoken" cues within image tags. This approach enhances detection accuracy and efficiency, even for untagged images, by mimicking human visual attention.

More Related Videos

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)
05:57

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Published on: April 8, 2019

Related Experiment Videos

Last Updated: May 23, 2026

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment
08:25

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)
05:57

Long-term Video Tracking of Cohoused Aquatic Animals: A Case Study of the Daily Locomotor Activity of the Norway Lobster (Nephrops norvegicus)

Published on: April 8, 2019

Area of Science:

  • Computer Vision
  • Machine Learning
  • Artificial Intelligence

Background:

  • Current image tagging methods primarily use explicit noun-object links.
  • Implicit information within ordered image tags is often overlooked.
  • Object localization accuracy and efficiency remain challenges in computer vision.

Purpose of the Study:

  • To leverage implicit cues from ordered image tags for improved object localization.
  • To develop a method for enhancing object detection accuracy and efficiency.
  • To enable object detection in untagged images by learning a shared semantic space.

Main Methods:

  • Extraction of three novel implicit features from ordered image tags: mention order, scale constraints, and spatial proximity.
  • Learning a conditional density over localization parameters (position, scale) using these implicit cues.
  • Developing a technique to learn localization density in a shared semantic space for visual and tag-based features.

Main Results:

  • Demonstrated significant improvements in object localization accuracy and efficiency on PASCAL VOC, LabelMe, and Flickr datasets.
  • Outperformed traditional sliding window methods and visual context baselines.
  • Successfully applied the technique for object detection in untagged images.

Conclusions:

  • Implicit information in image tags can substantially enhance object localization.
  • The proposed method offers a robust approach to object detection, improving state-of-the-art performance.
  • Translating human viewing behavior insights into algorithms improves machine perception capabilities.