



Structure inference for Bayesian multisensory scene understanding.

Timothy M Hospedales1, Sethu Vijayakumar

  • 1Institute of Perception, Action, and Behaviour, University of Edinburgh, Edinburgh EH8 9AB. t.hospedales@ac.uk

IEEE Transactions on Pattern Analysis and Machine Intelligence | November 8, 2008
Summary
This summary is machine-generated.

This article introduces a new computational method that helps computers understand complex environments by mimicking how humans combine or separate information from different senses, such as sight and sound. By using a mathematical framework, the system can automatically decide when to group sensory data together or keep it separate, improving how machines track multiple subjects in busy settings.

Keywords:
multimodal data association, machine perception systems, audio-visual tracking, probabilistic reasoning



Area of Science:

  • Computational neuroscience and structure inference within machine perception
  • Multisensory integration research in cognitive science

Background:

Prior research has shown that humans possess a robust ability to associate diverse sensory streams when appropriate. Machine perception research, by contrast, has focused largely on optimal fusion models, which often neglect the need to separate distinct signals. Existing models also fail to exploit the full potential of data association in temporal contexts. No unified framework had addressed both the integration and the segregation of multimodal sensory inputs, and as a result many machine perception systems struggle to interpret complex, multi-party environments accurately. This study addresses these limitations by proposing a probabilistic method for managing sensory data.

Purpose Of The Study:

The aim of this study is to formulate a solution to multi-sensor scene understanding using Bayesian model selection and structure inference. The researchers seek to address the limitations of existing machine perception systems that focus solely on optimal fusion. They intend to develop a unified framework that accounts for both the integration and segregation of sensory inputs. The authors aim to demonstrate that explicit probabilistic reasoning about data association is vital for effective perception. They want to show that this approach is applicable to complex, multi-party audio-visual scenarios. The study seeks to provide a theoretical basis for understanding human psychophysics experiments related to cue integration. By implementing unsupervised learning, the team plans to automate the tracking of individual subjects. This work is motivated by the need for more sophisticated models that mimic human-like sensory processing capabilities.

Main Methods:

The approach uses a Bayesian model selection framework to address multisensory scene understanding. Investigators implement probabilistic reasoning to manage the association of multimodal inputs over time. This design focuses on creating a unified architecture that handles both integration and segregation tasks. The team applies unsupervised learning techniques to automatically segment audio-visual sequences. They test the model by tracking individual subjects within a multi-party environment. The methodology emphasizes the importance of temporal context in refining sensory data interpretation. Researchers contrast this dynamic approach with static fusion methods that ignore segregation. This strategy provides a clear pathway for evaluating the efficacy of the proposed mathematical model.

Main Results:

The strongest finding indicates that the proposed framework successfully segments and tracks subjects in complex audio-visual scenarios. The authors demonstrate that explicit probabilistic reasoning accounts for both integration and segregation, which prior models failed to achieve. Their results show that unsupervised learning effectively identifies the underlying structure of multimodal data. The study highlights that this approach explains previously confounding results in human psychophysics experiments. The researchers report that their method provides a more accurate representation of multisensory perception than traditional fusion-only systems. Data association is shown to be a key factor in improving the reliability of machine perception. The team observes that their model handles multi-party environments with high precision. These findings suggest that the theoretical foundation established here is robust for diverse sensory applications.

Conclusions:

The authors propose that their framework offers a robust theoretical foundation for explaining various confounding outcomes observed in human psychophysics. Their approach demonstrates that explicit probabilistic reasoning is sufficient to handle both integration and segregation tasks. The researchers suggest that this method provides a superior way to model multisensory perception compared to traditional fusion-only techniques. By accounting for data association, the model achieves more accurate tracking of individual subjects in audio-visual scenarios. The study indicates that unsupervised learning plays a key role in the successful implementation of this structure inference. The authors maintain that their work bridges the gap between machine perception and human-like sensory processing. Their findings imply that explicit inference is necessary for higher-level understanding of complex multisensory data. The team concludes that this unified Bayesian solution effectively resolves long-standing challenges in the field of scene understanding.

The researchers propose a Bayesian model selection framework that utilizes explicit probabilistic reasoning. Unlike traditional fusion systems that only combine inputs, this approach enables the machine to dynamically decide whether to integrate or segregate sensory signals based on temporal data association.
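The integrate-or-segregate decision can be illustrated with a small sketch. This is not the authors' model: it assumes a single one-dimensional audio reading and a single video reading, Gaussian sensor noise, and a zero-mean Gaussian prior over source position. The function names and all default parameter values are hypothetical. The sketch compares the marginal likelihood of a "common source" structure against a "separate sources" structure and returns the posterior probability that the two readings should be fused.

```python
import math


def log_gauss2_zero_mean(x, y, var_x, var_y, cov_xy):
    """Log density of a zero-mean bivariate Gaussian evaluated at (x, y)."""
    det = var_x * var_y - cov_xy ** 2
    quad = (var_y * x * x - 2.0 * cov_xy * x * y + var_x * y * y) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


def posterior_common_source(x_audio, x_video,
                            sigma_audio=1.0, sigma_video=1.0,
                            sigma_prior=10.0, prior_common=0.5):
    """Posterior probability that both readings come from one source.

    Assumed generative model (illustrative only):
      common:   s ~ N(0, sigma_prior^2); x_audio = s + noise; x_video = s + noise
      separate: each reading has its own independent source drawn from the prior
    """
    var_a = sigma_prior ** 2 + sigma_audio ** 2
    var_v = sigma_prior ** 2 + sigma_video ** 2
    # A shared source makes the two readings correlated through sigma_prior^2.
    log_common = log_gauss2_zero_mean(x_audio, x_video, var_a, var_v,
                                      sigma_prior ** 2)
    # Independent sources give zero covariance between the readings.
    log_separate = log_gauss2_zero_mean(x_audio, x_video, var_a, var_v, 0.0)
    log_odds = (log_common + math.log(prior_common)
                - log_separate - math.log(1.0 - prior_common))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


near = posterior_common_source(1.0, 1.2)   # readings agree closely
far = posterior_common_source(-5.0, 5.0)   # readings disagree strongly
print(f"near: {near:.3f}  far: {far:.3g}")
```

Under these assumptions, closely matching readings yield a posterior above one half (integrate), while widely separated readings yield a posterior near zero (segregate). A fusion-only system has no equivalent of the second outcome.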

The authors employ unsupervised learning to perform structure inference. This component allows the system to automatically segment, associate, and track individual subjects within complex audio-visual sequences without needing pre-labeled training data.

The authors argue that explicit inference of data association is necessary to resolve confounding results in human psychophysics. This technical requirement ensures that the model can distinguish between related and unrelated sensory cues in multi-party environments.

The researchers use temporal context to inform their probabilistic reasoning. Carrying information across time steps allows the system to maintain consistent tracking of subjects, which is a significant improvement over static models that lack temporal awareness.
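One way to see how temporal context can stabilize the association decision is a two-state forward filter, sketched below. This is an illustrative assumption, not the method described in the paper: the binary "common source" variable is treated as a Markov chain that tends to keep its previous value, and each frame supplies a pair of likelihoods (one per structure). The function name, the `p_stay` parameter, and the example likelihood values are all hypothetical.

```python
def filter_association(frame_likelihoods, p_stay=0.9, p_common0=0.5):
    """Recursively filter the probability that audio and video share a source.

    frame_likelihoods: iterable of (lik_common, lik_separate) pairs, one per
    frame, giving the likelihood of that frame's data under each structure.
    p_stay: assumed probability that the structure is unchanged between frames.
    Returns the filtered P(common) after each frame.
    """
    belief = p_common0
    history = []
    for lik_common, lik_separate in frame_likelihoods:
        # Predict: the structure may switch with probability 1 - p_stay.
        predicted = p_stay * belief + (1.0 - p_stay) * (1.0 - belief)
        # Update: weight the prediction by this frame's evidence (Bayes' rule).
        numerator = lik_common * predicted
        belief = numerator / (numerator + lik_separate * (1.0 - predicted))
        history.append(belief)
    return history


# Three frames favoring a common source, one ambiguous frame,
# then three frames favoring separate sources (made-up values).
frames = [(0.8, 0.2)] * 3 + [(0.5, 0.5)] + [(0.1, 0.9)] * 3
beliefs = filter_association(frames)
print([round(b, 3) for b in beliefs])
```

In this sketch the ambiguous fourth frame does not reset the belief to one half, because the filter carries forward evidence from earlier frames. A static, per-frame decision would have no basis for choosing on that frame.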

The study measures the effectiveness of the model by its ability to segment and track subjects in multi-party audio-visual scenarios. This evaluation highlights the system's capacity to handle complex, real-world inputs compared to simpler, controlled laboratory settings.

The authors claim that their framework provides the theoretical foundation required to explain human multisensory cue integration. They suggest this approach offers a more accurate representation of biological perception than previous, limited computational models.