Speech quality estimation with deep lattice networks.

Michael Chinen¹, Jan Skoglund¹, Andrew Hines²

¹Chrome Media Audio, Google LLC, San Francisco, USA.

The Journal of the Acoustical Society of America

July 9, 2021

Summary

This summary is machine-generated.

Deep lattice networks (DLNs) improve speech quality estimation by providing a monotonic mapping from similarity scores to mean opinion scores (MOS). This approach improves prediction accuracy and adds uncertainty measures, making speech quality assessment more reliable.

Area of Science:

Speech processing and signal analysis
Machine learning for audio quality assessment
Human-computer interaction and subjective evaluation

Background:

Objective speech quality estimation typically maps similarity scores to Mean Opinion Scores (MOS) using fitted functions.
More flexible models such as Support Vector Regression (SVR) and deep neural networks accept multidimensional inputs but do not guarantee a monotonic relationship between similarity and quality.
Established metrics such as PESQ and POLQA are limited in how accurately they reflect subjective speech quality.
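
As a rough illustration of the 1-D fitted-function baseline described above, the sketch below fits a straight line from a single similarity score to MOS and clips the result to the 1-5 MOS scale. The data points, the choice of a linear fit, and the names `PAIRS`, `fit_line`, and `predict_mos` are all assumptions made for illustration; the summary does not say which functional form the fitted functions use.

```python
# Hypothetical (similarity, MOS) pairs; illustrative values, not data from the study.
PAIRS = [(0.2, 1.5), (0.4, 2.4), (0.6, 3.1), (0.8, 4.0), (1.0, 4.6)]


def fit_line(pairs):
    """Ordinary least-squares fit of mos ~ slope * similarity + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_mos(similarity, slope, intercept):
    """Map a similarity score to MOS, clipped to the 1-5 scale."""
    return min(5.0, max(1.0, slope * similarity + intercept))


SLOPE, INTERCEPT = fit_line(PAIRS)
```

A single-input fit like this is monotonic whenever the slope is positive, but it cannot use more than one similarity feature, which is the gap the multidimensional models aim to close.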

Purpose of the Study:

To investigate a multidimensional mapping function using Deep Lattice Networks (DLNs) for speech quality estimation.
To incorporate monotonic constraints into the mapping of similarity features to MOS.
To develop a more accurate and reliable objective speech quality assessment model.
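
One common way to build a monotonic constraint into a learned mapping is to parameterize a piecewise-linear function by non-negative steps between keypoints, so the output can never decrease as the input grows. The sketch below shows that idea in one dimension. It is a minimal illustration under assumed keypoints and parameter values, not the study's model; `monotonic_outputs`, `calibrate`, `KEYPOINTS`, and `OUTPUTS` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


def monotonic_outputs(start, raw_steps):
    """Build non-decreasing keypoint outputs from unconstrained step parameters.

    Each step is clamped at zero, so the cumulative sum can never decrease.
    """
    steps = [max(0.0, s) for s in raw_steps]
    return list(accumulate(steps, initial=start))


def calibrate(x, keypoints, outputs):
    """Piecewise-linear interpolation through (keypoint, output) pairs."""
    if x <= keypoints[0]:
        return outputs[0]
    if x >= keypoints[-1]:
        return outputs[-1]
    i = bisect_right(keypoints, x) - 1
    t = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return outputs[i] + t * (outputs[i + 1] - outputs[i])


# Hypothetical similarity keypoints and raw step parameters (one is negative
# on purpose, to show that clamping flattens it instead of letting MOS drop).
KEYPOINTS = [0.0, 0.25, 0.5, 0.75, 1.0]
OUTPUTS = monotonic_outputs(1.0, [0.8, -0.3, 1.5, 0.9])
```

Because the constraint lives in the parameterization, any values an optimizer picks for the raw steps still yield a mapping in which higher similarity never predicts lower quality.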

Main Methods:

Utilized Deep Lattice Networks (DLNs) for multidimensional mapping of speech features.
Employed input features from ViSQOL (Virtual Speech Quality Objective Listener) for the DLN model.
Trained and evaluated the DLN on diverse datasets including Voice over IP (VoIP) and codec degradations.
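
The basic building block of a lattice model is an interpolated look-up table: learned values sit at the corners of a grid, and the output for any input is interpolated between them. The sketch below shows the smallest case, a 2x2 lattice over two inputs scaled to [0, 1], together with the edge condition that makes the output non-decreasing in the first input. The vertex values and the names `lattice_2d`, `is_monotonic_in_x0`, and `THETA` are assumptions for illustration; this is not the DLN architecture or feature set used in the study.

```python
def lattice_2d(x0, x1, theta):
    """Bilinear interpolation on the unit square.

    theta[i][j] is the value stored at vertex (x0=i, x1=j); inputs lie in [0, 1].
    """
    return (
        theta[0][0] * (1 - x0) * (1 - x1)
        + theta[0][1] * (1 - x0) * x1
        + theta[1][0] * x0 * (1 - x1)
        + theta[1][1] * x0 * x1
    )


def is_monotonic_in_x0(theta):
    """True if the output is non-decreasing in x0 for every x1 in [0, 1].

    For a 2x2 lattice this holds exactly when each edge along x0 does not decrease.
    """
    return all(theta[1][j] >= theta[0][j] for j in range(2))


# Hypothetical vertex values on the MOS scale for two similarity features.
THETA = [[1.0, 2.0], [3.5, 4.8]]
```

Checking or enforcing the edge condition on the vertex values is enough to guarantee monotonicity everywhere inside the cell, which is what makes lattices convenient for constrained mappings with several inputs.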

Main Results:

Achieved a mean-square error of 0.24 when mapping similarity features to MOS, outperforming 1-D fitted functions, SVR, PESQ, and POLQA.
Demonstrated that DLNs can learn a well-calibrated quantile function for uncertainty measurement.
Showcased improved mapping of similarity representations to human-interpretable scales, offering quantile intervals instead of point estimates.
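
The quantities behind these results can be sketched in a few lines: mean-square error for point estimates, the quantile (pinball) loss commonly used to train quantile predictors, and interval coverage as a simple calibration check. The summary does not state which loss or calibration measure the authors used, so the pinball loss and the coverage check here are assumptions, and all function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at quantile level tau, with 0 < tau < 1.

    Under-prediction is weighted by tau and over-prediction by (1 - tau).
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += max(tau * diff, (tau - 1) * diff)
    return total / len(y_true)


def coverage(y_true, lower, upper):
    """Fraction of targets that fall inside their predicted [lower, upper] interval."""
    inside = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return inside / len(y_true)
```

For a well-calibrated quantile function, an interval between the 5% and 95% quantiles should cover roughly 90% of held-out subjective scores; comparing `coverage` against that nominal level is one simple way to check it.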

Conclusions:

Deep Lattice Networks provide a robust framework for monotonic speech quality mapping, enhancing prediction accuracy.
The DLN model offers a valuable measure of uncertainty through quantile functions, improving the interpretability of predictions.
This research advances objective speech quality estimation by integrating deep learning with monotonic constraints and uncertainty quantification.