A Hierarchical RF-XGBoost Model for Short-Cycle Agricultural Product Sales Forecasting
Summary
This summary is machine-generated. This study introduces a hierarchical RF-XGBoost model for precise short-cycle agricultural product sales forecasting, aiming to reduce food waste through better demand prediction. The model is intended to improve supply chain efficiency from farm to table.
Area Of Science
- Agricultural Economics
- Data Science
- Supply Chain Management
Background
- Short-cycle agricultural product sales forecasting is crucial for minimizing food waste by aligning supply with demand.
- Forecasting accuracy is challenged by volatile and discontinuous sales data due to uncertain factors.
- Existing models often struggle with the inherent complexities of agricultural market dynamics.
Purpose Of The Study
- To develop and evaluate a novel hierarchical prediction model for enhanced short-term agricultural product sales forecasting.
- To improve demand prediction accuracy, thereby reducing food waste and optimizing the agricultural supply chain.
- To address the volatility and discontinuity in sales data through an advanced modeling approach.
Main Methods
- A hierarchical model combining Random Forest (RF) and Extreme Gradient Boosting (XGBoost) was developed.
- The first layer combines RF with Grey Relational Analysis (GRA) to produce initial predictions and extract their residuals.
- The second layer applies XGBoost to features derived from clustering those residuals, refining the forecast.
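The summary names two auxiliary steps without giving their formulas: Grey Relational Analysis and residual clustering. The sketch below shows one common form of each, under stated assumptions. `grey_relational_grades` computes grey relational grades of candidate series against the sales series (min-max normalisation and a distinguishing coefficient `rho = 0.5` are assumed, not taken from the paper). `kmeans_1d` is a plain one-dimensional k-means that could group first-layer residuals into clusters. Both function names are hypothetical, and the summary does not say how the cluster features are built or fed to XGBoost at prediction time.

```python
# Minimal sketch of two building blocks named in the summary. The details
# (min-max normalisation, rho = 0.5, 1-D k-means, the value of k) are
# assumptions, not the authors' published settings.


def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series with respect to `reference`."""

    def normalise(series):
        # Min-max scaling to [0, 1]; a constant series maps to all zeros.
        lo, hi = min(series), max(series)
        span = hi - lo
        return [0.0 if span == 0 else (v - lo) / span for v in series]

    ref = normalise(reference)
    # Point-wise absolute differences between the reference and each candidate.
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, normalise(series))]
        for name, series in candidates.items()
    }
    flat = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(flat), max(flat)

    grades = {}
    for name, seq in deltas.items():
        if d_max == 0:
            # Every candidate matches the reference exactly.
            coeffs = [1.0] * len(seq)
        else:
            # Grey relational coefficient at each time point.
            coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in seq]
        # The grade is the mean coefficient over the series.
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


def kmeans_1d(values, k, iterations=50):
    """Plain 1-D k-means; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: centroids spread evenly over the sorted values.
    centroids = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    sales = [120, 135, 128, 150, 160, 155]
    candidates = {
        "temperature": [20, 22, 21, 25, 27, 26],
        "price": [5.0, 4.8, 4.9, 4.5, 4.2, 4.4],
    }
    print(grey_relational_grades(sales, candidates))

    # Residuals of a hypothetical first-layer (RF) model: actual minus predicted.
    residuals = [-10.0, -9.0, 0.5, 1.0, 12.0, 11.0]
    centroids, labels = kmeans_1d(residuals, k=3)
    print(centroids, labels)
```

This min-max variant of GRA measures similarity in the shape of two series, so in the toy data `price`, which moves opposite to sales, receives a lower grade than `temperature`. Cluster labels or centroids from `kmeans_1d` are one possible form of "residual clustering features"; the paper's actual construction may differ.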
Main Results
- The proposed RF-XGBoost model demonstrated superior performance compared to standalone RF and XGBoost.
- Reduced the Mean Absolute Percentage Error (MAPE) by 10% and 12% compared with RF and XGBoost, respectively.
- Increased the coefficient of determination (R²) by 22% and 24%, respectively, indicating a better fit to observed sales.
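The two metrics reported above have standard definitions, sketched below in plain Python: MAPE = (100/n) Σ |(yᵢ − ŷᵢ)/yᵢ| and R² = 1 − SS_res/SS_tot. The summary does not say whether the quoted improvements are relative changes or percentage-point differences; the hypothetical helper `relative_change` shows the relative interpretation only. The sample numbers are made up for illustration and are not results from the study.

```python
# Minimal sketch of the two evaluation metrics named in the summary.
# The example numbers are made up for illustration.


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (hypothetical helper)."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    actual = [100, 200, 300, 400]
    predicted = [110, 190, 330, 400]
    print(mape(actual, predicted))       # about 6.25
    print(r_squared(actual, predicted))  # about 0.978
    print(relative_change(10.0, 9.0))    # -10.0
```

Under the relative reading, a model whose MAPE drops from 10.0 to 9.0 shows a 10% reduction; under the percentage-point reading, the same 10% figure would instead mean a drop of 10 points.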
Conclusions
- The hierarchical RF-XGBoost model significantly enhances the precision of short-term agricultural sales forecasting.
- The model's effectiveness was validated across diverse agricultural products, demonstrating broad applicability.
- Implementation offers substantial benefits for supply chain optimization and food waste reduction.

