Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model.

Rehan Ahmad1, Syed Zubair2, Hani Alquhayz3

  • 1Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan.

Sensors (Basel, Switzerland) | November 29, 2019
Summary
This summary is machine-generated.

This study introduces a new multimodal speaker diarization method using audio-visual synchronization to identify who spoke when. The technique significantly improves accuracy over audio-only methods for speaker diarization.

Keywords:
Gaussian mixture model, MFCC, SyncNet, diarization error rate, speaker diarization, speech activity detection

Area of Science:

  • Computer Science
  • Signal Processing
  • Artificial Intelligence

Background:

  • Speaker diarization systems determine who spoke when in a recording.
  • Existing methods often rely solely on audio data.
  • Multimodal approaches offer potential for improved accuracy.

Purpose of the Study:

  • To propose a novel multimodal speaker diarization technique.
  • To leverage audio-visual synchronization for identifying active speakers.
  • To enhance the accuracy of speaker diarization systems.

Main Methods:

  • Utilized a pre-trained audio-visual synchronization model.
  • Employed face detection to extract face-only video segments.
  • Matched audio frames with synchronized visual input using a two-stream network.
  • Trained Gaussian Mixture Model (GMM)-based clusters on high-confidence segments.
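The last two steps above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the feature vectors are toy 2-D values standing in for MFCCs, `sync_conf` is a hypothetical table of per-frame audio-visual synchronization confidences that a pre-trained model would supply, the `threshold` of 0.8 is arbitrary, and a single diagonal Gaussian per speaker stands in for a full Gaussian mixture model. Assigning every frame to the most likely speaker model is likewise an assumed final step.

```python
import math


def fit_diag_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance).

    Stand-in for a GMM: this is the one-component special case.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a degenerate dimension cannot divide by zero.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frame, model):
    """Log density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def diarize(features, sync_conf, threshold=0.8):
    """Label each audio frame with a speaker.

    features:  list of feature vectors, one per audio frame (assumed MFCC-like).
    sync_conf: dict mapping speaker -> per-frame sync confidence (hypothetical).
    threshold: assumed cut-off for a "high-confidence" frame.
    """
    models = {}
    for speaker, conf in sync_conf.items():
        # Train only on frames where audio and this face are clearly in sync.
        confident = [f for f, c in zip(features, conf) if c >= threshold]
        if confident:
            models[speaker] = fit_diag_gaussian(confident)
    # Assign every frame to the speaker model with the highest likelihood.
    return [
        max(models, key=lambda s: log_likelihood(f, models[s]))
        for f in features
    ]


# Toy data: speaker A clusters near (0, 0), speaker B near (5, 5).
features = [
    [0.0, 0.1], [0.2, -0.1], [0.1, 0.0],
    [5.0, 5.1], [4.8, 5.0], [5.1, 4.9],
    [0.3, 0.2], [4.7, 5.2],
]
sync_conf = {
    "A": [0.90, 0.95, 0.85, 0.10, 0.10, 0.20, 0.40, 0.30],
    "B": [0.10, 0.05, 0.10, 0.90, 0.92, 0.88, 0.30, 0.50],
}
labels = diarize(features, sync_conf)
print(labels)
```

In this toy run the last two frames have no high-confidence visual match, so they are labeled purely from the audio models trained on the confident frames.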

Main Results:

  • Achieved a significantly lower Diarization Error Rate (DER) than audio-only methods.
  • Demonstrated performance close to state-of-the-art multimodal diarization techniques.
  • Validated the effectiveness on the AMI meeting corpus.
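For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The sketch below computes it at frame level under simplifying assumptions that are not taken from the study: one label per frame, `None` marking non-speech, no overlapping speakers, no forgiveness collar, and hypothesis speaker labels already mapped to reference labels. The label sequences are invented for illustration and are not results from the paper.

```python
def diarization_error_rate(reference, hypothesis):
    """Frame-level DER: (missed + false alarm + confusion) / reference speech.

    Both arguments are equal-length lists with one speaker label per frame,
    or None where the frame contains no speech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1          # speech present, system output silence
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # system output speech during silence

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Invented example: 8 reference speech frames, one error of each kind.
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, "A"]
hypothesis = ["A", "A", "B", "B", "B", "B", None, "B", None, "A"]
der = diarization_error_rate(reference, hypothesis)
print(f"DER = {der:.3f}")  # 3 errors / 8 speech frames = 0.375
```

A lower DER is better, so an "improvement" in DER means a reduction in this ratio.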

Conclusions:

  • The proposed multimodal approach offers a simple yet effective solution for speaker diarization.
  • Audio-visual synchronization is a valuable component for accurate speaker identification.
  • This method provides a strong baseline for future multimodal diarization research.