Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model.

Rehan Ahmad¹, Syed Zubair², Hani Alquhayz³

¹Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan.

Sensors (Basel, Switzerland) | November 29, 2019

Summary

This summary is machine-generated.

This study introduces a new multimodal speaker diarization method using audio-visual synchronization to identify who spoke when. The technique significantly improves accuracy over audio-only methods for speaker diarization.

Keywords:

Gaussian mixture model, MFCC, SyncNet, diarization error rate, speaker diarization, speech activity detection

Area of Science:

Computer Science
Signal Processing
Artificial Intelligence

Background:

Speaker diarization systems identify speakers in recordings.
Existing methods often rely solely on audio data.
Multimodal approaches offer potential for improved accuracy.

Purpose of the Study:

To propose a novel multimodal speaker diarization technique.
To leverage audio-visual synchronization for identifying active speakers.
To enhance the accuracy of speaker diarization systems.

Main Methods:

Utilized a pre-trained audio-visual synchronization model.
Employed face detection to extract face-only video segments.
Matched audio frames with synchronized visual input using a two-stream network.
Trained Gaussian Mixture Model (GMM)-based clusters on high-confidence segments.
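
The last step above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each audio frame already has a feature vector (for example MFCCs), a synchronization confidence, and a speaker suggested by the synchronization model, and it stands in a single diagonal Gaussian per speaker for the full GMM. The names `fit_diag_gaussian`, `log_likelihood`, `diarize`, and the `threshold` value are hypothetical.

```python
import math
from collections import defaultdict


def fit_diag_gaussian(frames, var_floor=1e-3):
    """Fit a per-dimension mean and variance to a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a speaker with very few frames stays usable.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def diarize(frames, sync_conf, sync_speaker, threshold=0.8):
    """Train speaker models on high-confidence frames, then label every frame.

    frames: list of feature vectors (assumed, e.g. MFCCs)
    sync_conf: per-frame confidence from the synchronization model (assumed)
    sync_speaker: per-frame speaker suggested by that model (assumed)
    threshold: hypothetical cut-off for "high confidence"
    """
    by_speaker = defaultdict(list)
    for frame, conf, speaker in zip(frames, sync_conf, sync_speaker):
        if conf >= threshold:
            by_speaker[speaker].append(frame)
    if not by_speaker:
        raise ValueError("no frame reaches the confidence threshold")
    models = {s: fit_diag_gaussian(fs) for s, fs in by_speaker.items()}
    # Assign each frame to the speaker model with the highest likelihood.
    return [
        max(models, key=lambda s: log_likelihood(frame, models[s]))
        for frame in frames
    ]


# Toy data: two well-separated speakers, two low-confidence frames.
frames = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 5.2], [0.1, 0.0], [5.1, 4.9]]
sync_conf = [0.9, 0.95, 0.9, 0.85, 0.3, 0.2]
sync_speaker = ["A", "A", "B", "B", "B", "A"]
labels = diarize(frames, sync_conf, sync_speaker)
```

In this toy example the two low-confidence frames carry the wrong suggested speaker, and the models trained only on high-confidence frames relabel them by acoustic similarity. A real system would use multi-component mixtures and far more data per speaker.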

Main Results:

Achieved significant improvement in Diarization Error Rate (DER) compared to audio-only methods.
Demonstrated performance close to state-of-the-art multimodal diarization techniques.
Validated the method's effectiveness on the AMI meeting corpus.
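
For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The sketch below computes a simplified frame-level version. It assumes one speaker per frame, hypothesis labels already mapped to reference labels, and no forgiveness collar, so it is an illustration rather than the scoring procedure used in the study. The function name `frame_level_der` is hypothetical.

```python
def frame_level_der(reference, hypothesis):
    """Simplified frame-level diarization error rate.

    reference, hypothesis: equal-length lists of per-frame speaker labels,
    with None marking non-speech. Labels are assumed to be already mapped
    between the two lists; overlapping speech is not handled.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1        # speech in reference, silence in output
            elif hyp != ref:
                confusion += 1     # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # output claims speech where there is none
    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


reference = ["A", "A", "B", "B", None, None, "A", "A"]
hypothesis = ["A", "A", "B", "A", None, "B", "A", None]
der = frame_level_der(reference, hypothesis)  # one error of each type over 6 speech frames
```

Because false alarms count in the numerator but not in the denominator, this ratio can exceed 1 for a very poor hypothesis.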

Conclusions:

The proposed multimodal approach offers a simple yet effective solution for speaker diarization.
Audio-visual synchronization is a valuable component for accurate speaker identification.
This method provides a strong baseline for future multimodal diarization research.