Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection.

Jianwei Zhang¹, Julie Liss², Suren Jayasuriya³

¹School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281, USA.

IEEE/ACM Transactions on Audio, Speech, and Language Processing

October 30, 2023

Summary

This summary is machine-generated.

This study introduces a novel deep learning framework for accurate automatic detection of voice impairments (dysphonia). The method generates robust acoustic embeddings, improving performance across different datasets and conditions.

Keywords:

Dysphonic voice; contrastive loss; embedding learning

Area of Science:

Speech processing
Biomedical engineering
Machine learning

Background:

Approximately 1.2% of the global population experiences impaired voice production, necessitating reliable automated assessment tools.
Current automated voice analysis methods often lack generalizability across different datasets and applications.
There is a significant need for robust and accurate methods for dysphonic voice detection.

Purpose of the Study:

To develop a deep learning framework for generating acoustic feature embeddings that are sensitive to vocal quality.
To enhance the robustness of voice analysis models across diverse corpora and conditions.
To improve the accuracy and generalizability of automatic dysphonic voice detection.

Main Methods:

A deep learning model was trained using a combination of contrastive and classification loss functions.
Data warping techniques were applied to input voice samples to increase model robustness.
The framework was designed to generate acoustic feature embeddings sensitive to voice quality.
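
One way to picture how a contrastive term and a classification term can be combined into a single training objective is sketched below. This is an illustration only, not the authors' implementation: the summary does not give the exact loss, so the pairwise margin form of the contrastive term, the `margin` and `weight` parameters, and all function names are assumptions.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `label` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise margin contrastive loss (assumed form, not from the source).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def combined_loss(emb_a, emb_b, logits_a, logits_b, label_a, label_b,
                  weight=0.5, margin=1.0):
    """Classification loss plus a weighted contrastive loss for one pair.

    `weight` is a hypothetical trade-off hyperparameter.
    """
    ce = 0.5 * (cross_entropy(logits_a, label_a)
                + cross_entropy(logits_b, label_b))
    con = contrastive(emb_a, emb_b, label_a == label_b, margin)
    return ce + weight * con


if __name__ == "__main__":
    # Two samples with different labels (0 = typical, 1 = dysphonic; made-up values).
    loss = combined_loss([0.1, 0.2], [0.9, 0.7], [2.0, -1.0], [-0.5, 1.5], 0, 1)
    print(f"combined loss: {loss:.4f}")
```

In a sketch like this, the classification term rewards correct labels while the contrastive term shapes the embedding space so that samples of the same class sit close together, which is one plausible route to embeddings that are sensitive to voice quality.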

Main Results:

The proposed method achieved high classification accuracy both within and across different corpora.
The generated embeddings demonstrated sensitivity to voice quality and robustness across varied datasets.
The model consistently outperformed three baseline methods on clean and deteriorated voice datasets.
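
The distinction between within-corpus and cross-corpus accuracy can be made concrete with a small evaluation sketch. The summary does not describe the exact protocol, so the leave-one-corpus-out split below is an assumption, and the corpus names, sample fields, and helper names are hypothetical.

```python
from collections import defaultdict


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for s in samples if predict(s["features"]) == s["label"])
    return correct / len(samples)


def cross_corpus_splits(samples):
    """Yield (held_out_name, train, test) with one corpus held out each time.

    The model is trained on all other corpora and tested on the held-out
    one, so the test recordings come from a corpus never seen in training.
    """
    by_corpus = defaultdict(list)
    for s in samples:
        by_corpus[s["corpus"]].append(s)
    for held_out in sorted(by_corpus):
        train = [s for name, group in by_corpus.items()
                 if name != held_out for s in group]
        yield held_out, train, by_corpus[held_out]


if __name__ == "__main__":
    # Toy data: "corpus_A" and "corpus_B" are placeholder names.
    data = [
        {"corpus": "corpus_A", "features": [0.2], "label": 0},
        {"corpus": "corpus_A", "features": [0.9], "label": 1},
        {"corpus": "corpus_B", "features": [0.1], "label": 0},
        {"corpus": "corpus_B", "features": [0.8], "label": 1},
    ]

    def threshold_model(features):
        # Stand-in for a trained classifier.
        return int(features[0] > 0.5)

    for name, train, test in cross_corpus_splits(data):
        print(name, len(train), accuracy(threshold_model, test))
```

Within-corpus evaluation would instead split a single corpus into training and test portions; comparing the two numbers indicates how much performance depends on corpus-specific recording conditions.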

Conclusions:

The developed deep learning framework offers a robust and accurate solution for automatic dysphonic voice detection.
The method's ability to generalize across corpora makes it suitable for diverse clinical and research applications.
The generated acoustic embeddings hold potential for further advancements in voice quality assessment.