You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Respiratory Inhaler Sound Event Classification Using Self-Supervised Learning.

Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference·2025
Same author

Association of hypoperfusion intensity ratio and cerebral blood volume Index with good outcome in patients transferred for thrombectomy.

Interventional neuroradiology : journal of peritherapeutic neuroradiology, surgical procedures and related neurosciences·2025
Same author

Surgical Aortic Valve Replacement Using a Porcine Model: A Low-Cost Simulation for Surgical Trainees.

Cureus·2024
Same author

Neurological outcomes for patients meeting radiographic criteria for DEFUSE 3 and SELECT2.

Journal of neurointerventional surgery·2024
Same author

Predictors of social risk for post-ischemic stroke reintegration.

Scientific reports·2024
Same author

Understanding and Predicting Cognitive Improvement of Young Adults in Ischemic Stroke Rehabilitation Therapy.

Frontiers in neurology·2022
Same journal

High-resolution depth estimation for multiple wideband sources in deep sea via sparse Bayesian learning.

The Journal of the Acoustical Society of America·2026
Same journal

Depression markers in speech: An approach based on tract variables dynamics.

The Journal of the Acoustical Society of America·2026
Same journal

The oyster toadfish (Opsanus tau) alters active and diurnal calling amid vessel noise in New York City.

The Journal of the Acoustical Society of America·2026
Same journal

Experimental noise characterisation of phase-locked tandem-rotor in edgewise flight.

The Journal of the Acoustical Society of America·2026
Same journal

The tune-text-temporal synergy: Prosodic effects of final segmental weakening in Neapolitan.

The Journal of the Acoustical Society of America·2026
Same journal

Monitoring vessel movement above critical offshore infrastructure using distributed acoustic sensing.

The Journal of the Acoustical Society of America·2026

Related Experiment Video

Updated: Oct 29, 2025

Author Spotlight: Investigating the Impact of Emotional Prosodies on Voice Recognition and Perception (05:48)
Published on: August 9, 2024

Speech quality estimation with deep lattice networks.

Michael Chinen¹, Jan Skoglund¹, Andrew Hines²

  • ¹Chrome Media Audio, Google LLC, San Francisco, USA.

The Journal of the Acoustical Society of America · July 9, 2021
Summary
This summary is machine-generated.

Deep lattice networks (DLNs) improve speech quality estimation by learning a monotonic mapping from similarity scores to mean opinion scores (MOS). This approach increases prediction accuracy and provides uncertainty estimates, making speech quality assessment more reliable.

More Related Videos

Asthma Detection Research Based on Voice Signal Processing and Machine Learning (04:04)
Published on: July 22, 2025

Foreign Accent and Forensic Speaker Identification in Voice Lineups: The Influence of Acoustic Features Based on Prosody (09:09)
Published on: September 27, 2024

Area of Science:

  • Speech processing and signal analysis
  • Machine learning for audio quality assessment
  • Human-computer interaction and subjective evaluation

Background:

  • Objective speech quality estimation typically maps a similarity score to a Mean Opinion Score (MOS) using a fitted one-dimensional function.
  • More advanced models, such as Support Vector Regression (SVR) and deep neural networks, accept multidimensional inputs but do not guarantee a monotonic relationship between similarity and quality.
  • Established metrics such as PESQ and POLQA do not always accurately reflect subjective speech quality.
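To make the conventional fitted-function approach concrete, the sketch below maps a single similarity score to MOS with a third-order polynomial fitted by `numpy.polyfit`, then checks numerically whether the fitted curve is non-decreasing over the input range. The (similarity, MOS) pairs are made up for illustration, and the cubic is one common choice of 1-D function, not necessarily the one used in the study.

```python
import numpy as np

def fit_cubic_mapping(similarity, mos):
    """Fit a 3rd-order polynomial mapping a 1-D similarity score to MOS."""
    coeffs = np.polyfit(similarity, mos, deg=3)
    return np.poly1d(coeffs)

def is_monotonic_increasing(mapping, lo, hi, num=1000):
    """Check on a dense grid that the mapping never decreases on [lo, hi].

    A least-squares polynomial is not constrained to be monotonic,
    so this property has to be verified after fitting.
    """
    grid = np.linspace(lo, hi, num)
    return bool(np.all(np.diff(mapping(grid)) >= 0))

# Hypothetical (similarity, MOS) pairs, for illustration only.
similarity = np.array([0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00])
mos = np.array([1.2, 1.6, 2.3, 3.1, 3.9, 4.3, 4.5])

mapping = fit_cubic_mapping(similarity, mos)
print(mapping(0.85))                               # estimated MOS for one score
print(is_monotonic_increasing(mapping, 0.5, 1.0))  # depends on the fitted data
```

Whether the check passes depends entirely on the data, which is the limitation that motivates building monotonicity into the model itself.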

Purpose of the Study:

  • To investigate a multidimensional mapping function using Deep Lattice Networks (DLNs) for speech quality estimation.
  • To incorporate monotonic constraints into the mapping of similarity features to MOS.
  • To develop a more accurate and reliable objective speech quality assessment model.
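One common way to build a monotonic constraint into a learned mapping, in the spirit of the calibration layers used in lattice models, is a piecewise-linear function whose outputs at fixed keypoints are a starting value plus non-negative increments. The sketch below is a minimal illustration under that assumption; the function name, keypoints, and parameter values are hypothetical and do not come from the study.

```python
import numpy as np

def monotonic_calibrator(x, keypoints, bias, raw_deltas):
    """Piecewise-linear calibrator that is non-decreasing by construction.

    keypoints:  sorted input positions, shape (K,)
    bias:       output value at the first keypoint
    raw_deltas: unconstrained parameters, shape (K - 1,)
    """
    # Softplus maps any real parameter to a non-negative increment,
    # so the output values at successive keypoints can never decrease.
    deltas = np.log1p(np.exp(raw_deltas))
    values = bias + np.concatenate(([0.0], np.cumsum(deltas)))
    # Interpolate linearly between keypoints; np.interp clamps inputs
    # outside [keypoints[0], keypoints[-1]] to the end values.
    return np.interp(x, keypoints, values)

# Hypothetical keypoints over a similarity score in [0, 1] and
# hypothetical "learned" parameters, for illustration only.
keypoints = np.linspace(0.0, 1.0, 5)
raw_deltas = np.array([-1.0, 0.5, 2.0, -3.0])

x = np.linspace(-0.2, 1.2, 50)
y = monotonic_calibrator(x, keypoints, 1.0, raw_deltas)
print(np.all(np.diff(y) >= 0))  # the constraint holds for any raw_deltas
```

Parameterizing the increments instead of the output values means every parameter setting yields a monotonic function; projecting parameters back onto the constraint set after each training step is an alternative design.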

Main Methods:

  • Used Deep Lattice Networks (DLNs) to map multidimensional speech similarity features to MOS.
  • Used features from ViSQOL (Virtual Speech Quality Objective Listener) as the DLN's inputs.
  • Trained and evaluated the DLN on diverse datasets covering Voice over IP (VoIP) and codec degradations.
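At its core, the lattice in a DLN is an interpolated look-up table: parameters sit on the vertices of a grid, and the output for an input inside a cell is a multilinear interpolation of the surrounding vertex values. The sketch below shows a single 2 × 2 lattice over two inputs assumed to be already calibrated to [0, 1] (for example, two hypothetical similarity features; which ViSQOL features are used is not specified here). For multilinear interpolation, the output is non-decreasing in a feature exactly when each vertex value is at least as large as its neighbour one step lower along that feature, which is the kind of linear constraint lattice training enforces. A full DLN combines several such lattices with calibration layers; this covers one lattice only.

```python
import numpy as np

def lattice_2d(x, theta):
    """Bilinear interpolation on a 2x2 lattice.

    x:     array of shape (2,), inputs assumed calibrated to [0, 1].
    theta: array of shape (2, 2); theta[i, j] is the value at the vertex
           where feature 0 equals i and feature 1 equals j.
    """
    x0, x1 = np.clip(x, 0.0, 1.0)
    return ((1 - x0) * (1 - x1) * theta[0, 0]
            + x0 * (1 - x1) * theta[1, 0]
            + (1 - x0) * x1 * theta[0, 1]
            + x0 * x1 * theta[1, 1])

def is_monotonic(theta, axis):
    """True if the lattice output is non-decreasing in feature `axis`.

    np.diff along `axis` compares each vertex with its lower neighbour
    along that feature; all differences must be non-negative.
    """
    return bool(np.all(np.diff(theta, axis=axis) >= 0))

# Hypothetical vertex values on the MOS scale, for illustration only.
theta = np.array([[1.0, 2.5],
                  [2.0, 4.5]])

print(lattice_2d(np.array([0.5, 0.5]), theta))  # cell centre: mean of the vertices
print(is_monotonic(theta, axis=0), is_monotonic(theta, axis=1))
```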

Main Results:

  • Achieved a mean squared error (MSE) of 0.24 for speech quality mapping, outperforming fitted 1-D functions, SVR, PESQ, and POLQA.
  • Demonstrated that DLNs can learn a well-calibrated quantile function for measuring uncertainty.
  • Showed improved mapping of similarity representations to human-interpretable scales, expressed as quantile intervals rather than single point estimates.
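The two kinds of evaluation in these results can be sketched concretely. Mean squared error scores point predictions against subjective MOS. A quantile model is typically trained with the pinball (quantile) loss, and its calibration can be checked by how often the observed MOS falls inside a predicted interval. The values below are made up and do not reproduce the study's data, and the pinball loss is the standard quantile-regression loss, assumed here rather than confirmed as the exact DLN training objective.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between subjective MOS and predicted MOS."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def pinball_loss(y_true, y_pred_q, q):
    """Quantile (pinball) loss for predictions of the q-th quantile.

    Under-predictions are weighted by q and over-predictions by 1 - q;
    at q = 0.5 this is half the mean absolute error.
    """
    diff = np.asarray(y_true, float) - np.asarray(y_pred_q, float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y = np.asarray(y_true, float)
    return float(np.mean((y >= np.asarray(lower)) & (y <= np.asarray(upper))))

# Hypothetical MOS values and model outputs, for illustration only.
mos = np.array([1.5, 2.0, 3.0, 4.0, 4.5])
median = np.array([1.6, 2.4, 2.8, 3.9, 4.2])
q10 = median - 0.5  # hypothetical 10th-percentile predictions
q90 = median + 0.5  # hypothetical 90th-percentile predictions

print(mse(mos, median))
print(pinball_loss(mos, median, 0.5))
print(interval_coverage(mos, q10, q90))  # ideally near 0.8 on a large test set
```

For a well-calibrated 10th-to-90th-percentile interval, roughly 80% of held-out ratings should fall inside it; five points are far too few to judge this, so the example only shows the mechanics.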

Conclusions:

  • Deep Lattice Networks provide a robust framework for monotonic speech quality mapping, enhancing prediction accuracy.
  • The DLN model offers a valuable measure of uncertainty through quantile functions, improving the interpretability of predictions.
  • This research advances objective speech quality estimation by integrating deep learning with monotonic constraints and uncertainty quantification.