An interpretable machine learning approach based on SHAP, Sobol and LIME values for precise estimation of daily soybean crop coefficients

  • College of Agricultural Science and Engineering, Hohai University, Nanjing, 211100, China.

Summary

This summary is machine-generated.

Machine learning models can accurately predict daily crop coefficients (Kc) for soybean, supporting irrigation management. The Extra Tree model was the most accurate of the four models tested (r=0.96, NSE=0.93), which can help improve water use efficiency in arid regions.

Area Of Science

  • Agricultural Science
  • Environmental Science
  • Data Science

Background

  • Increasing water scarcity and climate variability necessitate precise agricultural irrigation management.
  • Accurate estimation of crop coefficients (Kc) is crucial for determining crop water needs, especially in arid and semi-arid regions.
  • Conventional Kc estimation methods may not capture local climatic variations.
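To make the role of Kc concrete: in the standard FAO-56 single-coefficient approach, crop evapotranspiration is the reference evapotranspiration scaled by the crop coefficient, ETc = Kc × ET0. The sketch below illustrates only that relationship. The function name and all numbers are hypothetical and are not taken from the study.

```python
def crop_evapotranspiration(kc, et0_mm_per_day):
    """Return crop evapotranspiration ETc (mm/day) as Kc * ET0.

    kc: dimensionless crop coefficient
    et0_mm_per_day: reference evapotranspiration in mm/day
    """
    if kc < 0 or et0_mm_per_day < 0:
        raise ValueError("Kc and ET0 must be non-negative")
    return kc * et0_mm_per_day


# Hypothetical three-day series (illustrative values only)
daily_kc = [0.5, 0.8, 1.1]
daily_et0 = [6.0, 5.0, 7.0]  # mm/day

# Daily crop water need, one value per day
daily_etc = [crop_evapotranspiration(k, e) for k, e in zip(daily_kc, daily_et0)]
print(daily_etc)
```

An error in Kc carries over proportionally to the estimated water need, which is why a daily, locally tuned Kc matters for irrigation scheduling.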

Purpose Of The Study

  • To predict the daily crop coefficient (Kc) for soybean using machine learning models.
  • To evaluate the interpretability and physical consistency of these models.
  • To provide a robust framework for improving Kc estimation and supporting sustainable irrigation.

Main Methods

  • Four machine learning models (Extra Tree, XGBoost, Random Forest, CatBoost) were employed.
  • Models were trained on meteorological data from Suhaj Governorate, Egypt (1979-2014).
  • Interpretability was assessed using SHapley Additive exPlanations (SHAP), Sobol sensitivity analysis, and Local Interpretable Model-agnostic Explanations (LIME).
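As an illustration of what Sobol sensitivity analysis measures, the sketch below estimates first-order Sobol indices with a Saltelli-style Monte Carlo estimator in plain Python. The summary does not describe the study's implementation, so everything here is an assumption: `toy_model` is a hypothetical two-input stand-in for a trained Kc model, the inputs are taken as independent and uniform on [0, 1], and the output is centered before averaging to reduce sampling noise.

```python
import random


def toy_model(x):
    # Hypothetical stand-in for a trained model: input 0 matters more than input 1
    return 2.0 * x[0] + 1.0 * x[1]


def first_order_sobol(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices for independent U(0, 1) inputs."""
    rng = random.Random(seed)
    # Two independent sample matrices, A and B
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]

    # Total output variance, estimated from both sample sets
    all_y = f_a + f_b
    mean_y = sum(all_y) / len(all_y)
    var_y = sum((y - mean_y) ** 2 for y in all_y) / len(all_y)

    indices = []
    for i in range(n_inputs):
        total = 0.0
        for j in range(n_samples):
            # Row of A with only column i replaced by the value from B
            mixed = list(a[j])
            mixed[i] = b[j][i]
            total += (f_b[j] - mean_y) * (model(mixed) - f_a[j])
        indices.append(total / n_samples / var_y)
    return indices


sobol = first_order_sobol(toy_model, n_inputs=2)
print(sobol)
```

For this toy function the analytic first-order indices are 0.8 and 0.2, so the estimates should land close to those values. Each index is the share of output variance attributable to one input acting alone.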

Main Results

  • The Extra Tree (ET) model achieved the highest accuracy: correlation coefficient r=0.96, Nash-Sutcliffe efficiency NSE=0.93, root mean square error RMSE=0.05, and mean absolute error MAE=0.02.
  • XGBoost and Random Forest models also demonstrated high performance.
  • Antecedent crop coefficient and solar radiation were identified as key influential variables by SHAP and Sobol analyses.
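The four accuracy scores reported for the models (r, NSE, RMSE, MAE) follow standard definitions and can be computed as in the minimal sketch below. The observed and predicted Kc values are hypothetical, chosen only to exercise the functions; they are not data from the study.

```python
import math


def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical daily Kc values (illustrative only)
observed = [0.4, 0.6, 0.8, 1.0]
predicted = [0.5, 0.6, 0.8, 0.9]

print(pearson_r(observed, predicted), nse(observed, predicted),
      rmse(observed, predicted), mae(observed, predicted))
```

NSE equals 1 for a perfect fit and 0 when the model does no better than predicting the observed mean. RMSE and MAE are in the units of the target, which for Kc is dimensionless.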

Conclusions

  • Interpretable machine learning models enhance the accuracy and reliability of daily Kc estimation.
  • The study highlights the importance of aligning model predictions with physical processes for robust agricultural management.
  • The proposed framework supports sustainable irrigation practices and climate-resilient agriculture.
