Abstract
This study uses a Fourier transform infrared (FTIR) spectroscopy-based detection system to acquire and analyze the infrared spectra of cigarette smoke aerosols. To reduce the workload of spectral data acquisition and improve efficiency, we developed the Spectral Attention Denoising Autoencoder (SADA), a model that integrates an autoencoder (AE) architecture with a self-attention mechanism and incorporates a noise injection strategy. Compared with mainstream generative models, SADA generates more accurate, higher-fidelity spectra. To further validate the generated spectra, we conducted classification experiments on hybrid datasets: augmenting real spectral data with generated spectra significantly improved classification accuracy across several mainstream classification models. Ablation experiments confirmed the critical roles of the self-attention mechanism and the noise injection strategy in feature extraction and stable training. The model also generalized well across multiple public spectral datasets. SADA thus not only alleviates the burden of spectral data acquisition but also provides an effective data augmentation strategy for spectral analysis tasks.
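To make the combination of noise injection, self-attention, and an autoencoder concrete, the following is a minimal forward-pass sketch in NumPy. It is not the authors' implementation. The class name `TinySADA`, the patch-based tokenization of each spectrum, the Gaussian noise model, the single attention head, and all layer sizes are assumptions chosen for illustration, and the weights are random (untrained).

```python
import numpy as np

def inject_noise(spectra, sigma, rng):
    """Add zero-mean Gaussian noise to each spectrum (assumed noise model)."""
    return spectra + rng.normal(0.0, sigma, size=spectra.shape)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over spectral patches."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

class TinySADA:
    """Hypothetical, untrained denoising autoencoder with one attention layer."""

    def __init__(self, n_points=64, patch=8, d_model=16, d_latent=8, seed=0):
        assert n_points % patch == 0, "spectrum length must divide into patches"
        rng = np.random.default_rng(seed)
        self.patch = patch
        self.n_patches = n_points // patch

        def init(*shape):
            # Small random weights; a real model would learn these.
            return rng.normal(0.0, 0.1, size=shape)

        self.w_embed = init(patch, d_model)
        self.w_q = init(d_model, d_model)
        self.w_k = init(d_model, d_model)
        self.w_v = init(d_model, d_model)
        self.w_enc = init(self.n_patches * d_model, d_latent)
        self.w_dec = init(d_latent, n_points)

    def forward(self, spectra):
        batch = spectra.shape[0]
        # Split each spectrum into contiguous patches and embed them as tokens.
        patches = spectra.reshape(batch, self.n_patches, self.patch)
        tokens = patches @ self.w_embed
        # Let every patch attend to every other patch.
        attended, weights = self_attention(tokens, self.w_q, self.w_k, self.w_v)
        # Encode to a compact latent vector, then decode to a full spectrum.
        latent = np.tanh(attended.reshape(batch, -1) @ self.w_enc)
        recon = latent @ self.w_dec
        return recon, latent, weights

# Usage: corrupt synthetic "spectra", reconstruct, and score against the clean input.
rng = np.random.default_rng(1)
clean = rng.random((4, 64))                      # placeholder data, not real FTIR spectra
noisy = inject_noise(clean, sigma=0.05, rng=rng)
model = TinySADA()
recon, latent, weights = model.forward(noisy)
loss = np.mean((recon - clean) ** 2)             # denoising objective: target is the clean spectrum
```

The key design point in this sketch is that the reconstruction loss compares the output with the clean spectrum rather than the noisy input, which is what makes the autoencoder a denoising one. In a trained model, new spectra could then be produced by decoding perturbed or sampled latent vectors, although the exact generation procedure is not specified in the abstract.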