



Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers.

Yutong Ban, Xavier Alameda-Pineda, Laurent Girin

    IEEE Transactions on Pattern Analysis and Machine Intelligence | November 22, 2019
    Summary
    This summary is machine-generated.

    This study introduces a generative audio-visual fusion model for tracking multiple speakers that tolerates short periods of missing audio or visual data. The model estimates each speaker's trajectory and acoustic status (speaking or silent) in informal meeting scenarios.


    Area of Science:

    • Computer Vision
    • Machine Learning
    • Signal Processing

    Background:

    • Accurate tracking of multiple speakers is crucial for applications like meeting analysis and human-computer interaction.
    • Existing methods often struggle when one modality is temporarily absent, or fail to estimate speaker activity accurately.
    • Integrating visual and auditory cues offers complementary information for robust tracking.

    Purpose of the Study:

    • To develop a novel audio-visual fusion model for multi-speaker tracking.
    • To accurately estimate speaker trajectories and speaking status.
    • To robustly handle temporarily missing visual or auditory data.

    Main Methods:

    • A generative audio-visual fusion model formulated as a latent-variable temporal graphical model.
    • Variational inference to approximate intractable posterior distributions.
    • A closed-form expectation-maximization procedure for parameter estimation.
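
The variational step listed above can be written in its generic textbook form. This is a sketch only: the symbols $\mathbf{o}$ (all audio and visual observations), $\mathbf{z}$ (all latent variables) and the factors $q_j$ are notation assumed here, and the summary does not state which factorization the authors use.

```latex
% Evidence lower bound (ELBO) maximized by variational inference
\log p(\mathbf{o}) \;\ge\; \mathcal{L}(q)
  = \mathbb{E}_{q(\mathbf{z})}\!\left[\log p(\mathbf{o}, \mathbf{z})\right]
  - \mathbb{E}_{q(\mathbf{z})}\!\left[\log q(\mathbf{z})\right]

% Mean-field assumption: the approximate posterior factorizes over groups of latents
q(\mathbf{z}) = \prod_{j} q_j(\mathbf{z}_j)

% Optimal factor given the others (coordinate-ascent update)
\log q_j^{*}(\mathbf{z}_j)
  = \mathbb{E}_{\prod_{i \neq j} q_i}\!\left[\log p(\mathbf{o}, \mathbf{z})\right] + \text{const}
```

When every factor update has a closed form, alternating these updates with parameter re-estimation gives an expectation-maximization procedure of the kind the third bullet describes.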

    Main Results:

    • The proposed model accurately estimates smooth trajectories of multiple speakers.
    • It effectively handles short periods of missing audio or visual information.
    • The system successfully estimates the speaking status (speaking/silent) of each individual.
    • Performance evaluation shows superior results compared to baseline methods in informal meeting scenarios.
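
The missing-data behaviour reported above can be illustrated with a much simpler stand-in than the paper's model. The sketch below is a one-dimensional, single-speaker Kalman filter with random-walk dynamics. The function name `fuse_track` and all noise variances are hypothetical choices made for illustration, and it omits the multi-speaker assignment and speaking-status estimation that the study addresses.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_track(
    visual: Sequence[Optional[float]],
    audio: Sequence[Optional[float]],
    visual_var: float = 0.1,    # assumed visual observation noise
    audio_var: float = 1.0,     # assumed audio observation noise
    process_var: float = 0.05,  # assumed random-walk process noise
    init_mean: float = 0.0,
    init_var: float = 10.0,
) -> List[Tuple[float, float]]:
    """Return the filtered (mean, variance) of a 1-D position per frame.

    A value of None marks a frame where that modality is missing.
    """
    mean, var = init_mean, init_var
    track: List[Tuple[float, float]] = []
    for v, a in zip(visual, audio):
        # Predict: random-walk dynamics keep the mean and inflate the variance.
        var += process_var
        # Update: apply one Kalman correction per modality that is present.
        for obs, obs_var in ((v, visual_var), (a, audio_var)):
            if obs is None:
                continue  # missing modality: rely on the prediction alone
            gain = var / (var + obs_var)
            mean += gain * (obs - mean)
            var *= 1.0 - gain
        track.append((mean, var))
    return track


if __name__ == "__main__":
    # Frame 1 has neither modality; frame 2 has vision only.
    for mean, var in fuse_track([1.0, None, 1.2], [0.8, None, None]):
        print(f"mean={mean:.3f} var={var:.3f}")
```

In frames where both modalities are missing, the estimate is carried forward with growing uncertainty, and it tightens again as soon as either modality returns. That is the qualitative behaviour the results describe for short gaps.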

    Conclusions:

    • The proposed audio-visual fusion approach provides a robust and accurate solution for multi-speaker tracking.
    • The model's ability to integrate complementary modalities and handle data absence is a key strength.
    • This method shows significant promise for real-world applications involving dynamic group interactions.