Abstract
Daqu is a traditional Chinese brewing ingredient that serves the dual functions of saccharification and fermentation during brewing. Acidity during Daqu fermentation directly affects the quality of the Daqu. Traditional methods for measuring Daqu acidity are complex and deliver results with a delay, making it difficult to monitor fermentation acidity in real time. Given the strong correlation between Daqu acidity and environmental variables, this paper proposes a KNN-Attention-LSTM-XGBoost time series model for predicting Daqu acidity. After the microenvironmental parameters of Daqu were collected and analyzed, the XGBoost model was used to select the two best-performing imputation methods (LFBI and KNN). Partial Least Squares Regression (PLSR) was employed to extract key parameters, and lag and rolling window features were constructed to capture temporal trends and fluctuations. Comparative analysis showed that KNN preprocessing combined with the Attention-LSTM-XGBoost model performed best in predicting Daqu acidity, with R² values of 0.9790, 0.9768, and 0.9636 for the upper, middle, and lower Daqu layers, respectively. For the same three layers, this combination improved on LSTM-XGBoost by 3.87%, 1.11%, and 2.84%, and on XGBoost by 4.70%, 4.37%, and 8.46%. This study addresses the challenge of predicting Daqu acidity during fermentation and provides insights into the optimization of the Daqu fermentation process.
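To illustrate the lag and rolling window feature extraction mentioned above, the following is a minimal Python sketch using pandas. It is not the implementation used in this study: the function name `add_lag_and_rolling_features`, the column name `temperature`, the lag steps, the window length, and the sample readings are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def add_lag_and_rolling_features(df, column, lags=(1, 2, 3), window=3):
    """Return a copy of df with lagged values and rolling statistics of `column`.

    Hypothetical helper for illustration; lag steps and window length are assumptions.
    """
    out = df.copy()

    # Lag features: the value of the series `lag` time steps earlier.
    for lag in lags:
        out[f"{column}_lag{lag}"] = out[column].shift(lag)

    # Rolling features are computed on the series shifted by one step,
    # so each window contains only past observations.
    past = out[column].shift(1)
    out[f"{column}_roll_mean{window}"] = past.rolling(window).mean()
    out[f"{column}_roll_std{window}"] = past.rolling(window).std()
    return out


# Made-up sensor readings, not data from the study.
readings = pd.DataFrame({"temperature": [30.0, 32.0, 35.0, 39.0, 44.0, 48.0]})
features = add_lag_and_rolling_features(readings, "temperature", lags=(1, 2), window=3)
print(features)
```

The first rows of the derived columns are empty (NaN) because no earlier observations exist for them. In a sketch like this, such rows would typically be dropped or imputed before model training.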