AI-Powered Voice Separation Algorithms: Testing Accuracy in Reconstructing Fundamental Frequency for Vocal Analysis | JoVE Visualize

Area of Science:

Music Information Retrieval
Signal Processing
Acoustic Analysis

Background:

Singing voice analysis typically requires high-quality recordings, limiting research on commercial recordings (CRs).
CRs present challenges such as complex accompaniment, recording artifacts, and post-production processing, all of which hinder accurate fundamental frequency (f_o) extraction.
Existing voice separation methods need to be evaluated for how well they support f_o analysis of imperfect CRs.

Purpose of the Study:

To evaluate the effectiveness of voice separation techniques for reliable fundamental frequency (f_o) extraction from commercial recordings.
To compare the performance of different AI-based separation methods against baselines under varying signal-to-noise ratios (SNRs).
To determine the feasibility of using these methods for large-scale analysis of singing voice characteristics in archival and commercial audio.

Main Methods:

Synthesized vocals with ground-truth f_o were mixed with instrumental introductions from seven commercial recordings at five SNRs (-12 to +12 dB).
The methods compared were an unfiltered baseline, bandpass filtering (2.2-6 kHz), iZotope RX10, Music.ai, and robust principal component analysis (RPCA).
f_o contours were extracted with Praat and compared with the ground truth using success rate, resolved rate (≤10 cents deviation), and receiver operating characteristic (ROC) analysis.
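
As an illustration of the mixing and scoring steps above, here is a minimal Python sketch. It assumes that SNR is the ratio of mean vocal power to mean accompaniment power, and that the resolved rate is the fraction of ground-truth voiced frames whose extracted f_o lies within 10 cents of the reference; the study's exact definitions may differ. The function names (mix_at_snr, cents_deviation, resolved_rate) are hypothetical and are not taken from the study.

```python
import numpy as np


def mix_at_snr(vocal, accompaniment, snr_db):
    """Add accompaniment to a vocal signal at a target SNR in dB.

    Assumption: SNR = 10 * log10(mean vocal power / mean accompaniment power).
    Both inputs are 1-D arrays of equal length at the same sample rate.
    """
    vocal = np.asarray(vocal, dtype=float)
    accompaniment = np.asarray(accompaniment, dtype=float)
    p_vocal = np.mean(vocal ** 2)
    p_acc = np.mean(accompaniment ** 2)
    # Gain that brings the accompaniment power to p_vocal / 10^(snr_db / 10).
    gain = np.sqrt(p_vocal / (p_acc * 10.0 ** (snr_db / 10.0)))
    return vocal + gain * accompaniment


def cents_deviation(f_est, f_ref):
    """Signed pitch deviation in cents (1200 cents per octave)."""
    return 1200.0 * np.log2(np.asarray(f_est, dtype=float) /
                            np.asarray(f_ref, dtype=float))


def resolved_rate(f_est, f_ref, tol_cents=10.0):
    """Fraction of reference-voiced frames tracked within tol_cents.

    Assumption: unvoiced or undefined frames are encoded as NaN or 0,
    and a frame with no estimate counts as not resolved.
    """
    f_est = np.asarray(f_est, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    voiced_ref = np.isfinite(f_ref) & (f_ref > 0)
    if not voiced_ref.any():
        return float("nan")
    tracked = voiced_ref & np.isfinite(f_est) & (f_est > 0)
    deviation = np.full(f_ref.shape, np.inf)
    deviation[tracked] = np.abs(cents_deviation(f_est[tracked], f_ref[tracked]))
    return float(np.mean(deviation[voiced_ref] <= tol_cents))
```

In use, one would call mix_at_snr once per SNR level (for example -12, -6, 0, +6, and +12 dB, if the five levels are evenly spaced, which the summary does not state), run each separation method and the pitch tracker on the result, and pass the extracted contour and the ground-truth contour to resolved_rate.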

Main Results:

Music.ai demonstrated the most robust performance, achieving over 80% success at SNR ≥ 0 dB and degrading least at lower SNRs.
iZotope RX10 performed similarly at positive SNRs but showed greater decline with increased noise.
Bandpass filtering was comparable to the top separation methods at higher SNRs; RPCA showed lower overall accuracy.
Accuracy decreased at SNRs below 0 dB and was influenced by accompaniment complexity and recording quality.

Conclusions:

AI-based voice separation methods, particularly Music.ai, can significantly enhance the analysis of singing voices in commercial and archival recordings.
These advancements make large-scale studies of vocal style and technique more feasible.
Continued validation across diverse genres and recording conditions is crucial to ensure reliable insights into vocal expression and performance nuances.