Monocular visual scene understanding: understanding multi-object traffic scenes.

Christian Wojek¹, Stefan Walk, Stefan Roth

¹Max Planck Institute for Informatics, Campus E1 4, 66123 Saarbrücken, Germany. cwojek@mpi-inf.mpg.de

IEEE Transactions on Pattern Analysis and Machine Intelligence

August 15, 2012

Summary

This summary is machine-generated.

This study introduces a probabilistic 3D scene model for monocular scene understanding that enables robust multi-object tracking even under occlusion. Using only monocular video, the model achieves state-of-the-art 3D tracking of people and significant gains in 3D tracking of cars and trucks.

Area of Science:

Computer Vision
Robotics
Artificial Intelligence

Background:

Scene understanding is a key area in computer vision, driven by recent progress in object detection, context modeling, and tracking.
Existing methods often struggle with complex object interactions, occlusions, and tracking objects with incomplete visibility.

Purpose of the Study:

To develop a novel probabilistic 3D scene model for enhanced scene understanding and multi-object tracking.
To integrate state-of-the-art multiclass object detection, tracking, scene labeling, and geometric 3D reasoning.
To enable robust 3D tracking of multiple object categories from monocular video, even under partial occlusion.
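A minimal sketch of how a single camera can yield a 3D distance at all, offered as an illustration rather than the paper's formulation. It assumes a pinhole camera with a horizontal optical axis, a flat ground plane, and a known camera height; the function name and all parameter values are hypothetical.

```python
def ground_distance(foot_row, horizon_row, focal_px, camera_height_m):
    """Distance (metres) to a point standing on a flat ground plane.

    Assumptions (not taken from the paper): pinhole camera, optical axis
    parallel to the ground, image rows increasing downwards.
    By similar triangles: distance = focal_px * camera_height_m / (foot_row - horizon_row).
    """
    rows_below_horizon = foot_row - horizon_row
    if rows_below_horizon <= 0:
        # A point at or above the horizon cannot lie on the ground plane.
        raise ValueError("foot point must lie below the horizon")
    return focal_px * camera_height_m / rows_below_horizon


# Hypothetical values: camera 1.5 m above the road, focal length 800 px,
# horizon at image row 240, bottom edge of a detection at row 340.
print(ground_distance(340, 240, 800, 1.5))  # 12.0
```

Under these assumptions, the bottom edge of a detected bounding box is enough to place the object on the ground plane, which is one reason a ground-plane constraint is useful in monocular tracking.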

Main Methods:

A probabilistic 3D scene model incorporating multiclass object detection, object tracking, scene labeling, and geometric reasoning.
Explicit occlusion reasoning to handle partially or fully occluded objects over extended periods.
A joint scene tracklet model utilizing evidence from multiple frames to improve tracking performance.
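The idea behind explicit occlusion reasoning can be sketched with a toy visibility estimate: objects closer to the camera hide parts of objects farther away. This is a simplified illustration under stated assumptions (axis-aligned image boxes, a known depth per object), not the model described in the paper; the Box type and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned image box with a distance to the camera (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    depth: float  # metres from the camera; smaller means closer


def overlap_area(a, b):
    """Area of the intersection of two boxes, or 0.0 if they do not overlap."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(width, 0.0) * max(height, 0.0)


def visible_fraction(target, others):
    """Rough fraction of `target` not covered by closer objects.

    Overlaps are summed, so when occluders overlap each other the estimate
    is pessimistic; the result is clamped at 0.0.
    """
    area = (target.x2 - target.x1) * (target.y2 - target.y1)
    if area <= 0:
        return 0.0
    covered = sum(overlap_area(target, o) for o in others if o.depth < target.depth)
    return max(0.0, 1.0 - covered / area)


# Hypothetical scene: a car at 8 m partly hides a pedestrian at 12 m.
pedestrian = Box(0, 0, 10, 20, depth=12.0)
car = Box(5, 0, 15, 20, depth=8.0)
print(visible_fraction(pedestrian, [car]))  # 0.5
print(visible_fraction(car, [pedestrian]))  # 1.0
```

A tracker could use such a visibility score to discount detector evidence for heavily occluded objects instead of dropping their tracks, which is the kind of behavior the occlusion reasoning above is meant to support.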

Main Results:

The model successfully represents complex object interactions, including inter-object occlusion and physical exclusion.
The model achieves state-of-the-art performance in 3D multi-people tracking using only monocular video.
It also delivers significant performance gains in multiclass 3D tracking of cars and trucks on challenging datasets.

Conclusions:

The proposed probabilistic 3D scene model offers a robust solution for 3D multi-object tracking from monocular video.
Explicit occlusion reasoning and joint tracklet modeling are crucial for handling challenging real-world scenarios.
The approach applies across object categories (people, cars, and trucks) and yields significant improvements on challenging datasets.