Hossin Shekofte; Maryam Doustaky; Aezam Maseodi
Abstract
Introduction: Soil quality is defined as the capacity of a soil to function within different land uses and ecosystem boundaries, sustain biological productivity, maintain environmental quality, and promote plant, animal, and human health. Soil quality cannot be measured directly but can be evaluated on the basis of several parameters; the parameters to be used depend on the research scale and goals. Soil quality indicators (SQIs) are measurable soil characteristics that affect the capacity of the soil for crop production or environmental performance. They are used to evaluate the effect of different management practices and land uses on soil quality and can be derived from easily measured soil physicochemical properties. Air capacity (AC), relative field capacity (RFC) and plant available water (PAWC) are the most important indicators. Selecting appropriate input parameters is the first and most important step in predicting SQIs. Feature selection can be defined as the identification and selection of a subset of useful features from the primary data collected. One feature selection method uses the Pearson coefficient, which measures the correlation between each input variable and the target variable. When the coefficient is close to one, there is a strong relationship between the input and the target variable. Features with a correlation coefficient greater than or equal to 0.9 are considered important, and those with a lower coefficient are considered unimportant. The decision tree algorithm is one of the prediction approaches in the statistics and data mining literature. This algorithm can select the property with the highest separation capability, and working with it and interpreting its results is straightforward. The aims of this study were to select the best set of input properties influencing SQIs using the Pearson correlation coefficient and then to model the effect of these input properties using a decision tree and multiple linear regression.
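The Pearson-based screening described above can be illustrated with a minimal Python sketch. It assumes that the 0.9 cutoff is applied to the absolute value of the coefficient, which the text does not state explicitly; the function names and the toy values are hypothetical and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(features, target, threshold=0.9):
    """Keep features whose |r| with the target reaches the threshold.

    Assumption: the cutoff applies to the absolute value of r, so strong
    negative relationships are also retained.
    """
    selected = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected

# Hypothetical toy values, for illustration only (not measurements from the study).
features = {
    "porosity": [0.40, 0.45, 0.50, 0.55, 0.60],
    "bulk_density": [1.60, 1.45, 1.35, 1.20, 1.05],
    "ph": [7.9, 7.2, 7.8, 7.3, 7.7],
}
air_capacity = [0.08, 0.10, 0.12, 0.14, 0.16]

selected = select_features(features, air_capacity)
for name, r in selected.items():
    print(f"{name}: r = {r:.3f}")
```

In this toy example, porosity and bulk density pass the screen while pH does not; with real data the retained set depends entirely on the measured correlations.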
Materials and Methods: In this study, the Pearson correlation coefficient was used to select the soil properties that influence SQIs, and these indicators were then modeled and predicted by the decision tree algorithm using the selected input properties. For this purpose, 104 soil samples were collected from the soil surface (0-15 cm depth) of four land uses, namely a garden with 20-year-old walnut trees, pasture, agricultural land and mountain almond, in a semi-arid area in Iran (Rabor region, 29° 27′ N to 38° 54′ N and 56° 45′ E to 57° 16′ E). A multiple linear regression (MLR) model was constructed as the benchmark for comparing performance. Sensitivity analysis of the decision tree model with respect to the input variables was performed using the StatSoft method. The predictive capabilities of the proposed models were evaluated by the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) between measured and predicted SQI values.
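To make the idea of choosing the property with the highest separation capability concrete, the following Python sketch searches for the single best split of a regression tree. It assumes a CART-style criterion (minimizing the summed squared error of the two child nodes); the abstract does not say which decision tree variant or software settings were used, so this is an illustration rather than the authors' procedure. The names `sse` and `best_split` and the toy values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target):
    """Find the (feature, threshold) pair that minimizes child-node SSE.

    rows   : list of dicts mapping feature name -> numeric value
    target : list of target values, aligned with rows
    Returns (feature, threshold, sse_after_split), or None if no split exists.
    """
    best = None
    for feature in rows[0]:
        values = sorted(set(row[feature] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, target) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, target) if row[feature] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (feature, threshold, total)
    return best

# Hypothetical toy values, for illustration only (not measurements from the study).
rows = [
    {"porosity": 0.40, "ph": 7.9},
    {"porosity": 0.42, "ph": 7.2},
    {"porosity": 0.58, "ph": 7.8},
    {"porosity": 0.60, "ph": 7.3},
]
air_capacity = [0.08, 0.09, 0.15, 0.16]

feature, threshold, error = best_split(rows, air_capacity)
print(f"split on {feature} at {threshold:.2f} (SSE after split = {error:.5f})")
```

A full tree would apply the same search recursively to each child node until a stopping rule is met; only the root split is shown here to keep the sketch short.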
Results and Discussion: The following soil properties were selected as important input parameters: porosity, bulk density, and clay and sand content for air capacity; porosity and sand, clay and silt content for relative field capacity; and bulk density, electrical conductivity, porosity, and sand, clay and silt content for plant available water. The R² values of the decision tree model for AC, RFC and PAWC were 0.95, 0.84 and 0.85, respectively, whereas the R² values of multiple linear regression for AC, RFC and PAWC were 0.63, 0.62 and 0.61, respectively. According to the evaluation indices, the conventional regression model predicted SQIs poorly. Therefore, conventional regression techniques (i.e., multiple linear regression) may not be reliable for predicting SQIs. The sensitivity analysis of the decision tree model showed that porosity and bulk density had the greatest influence on air capacity, porosity on relative field capacity, and bulk density on plant available water.
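The evaluation indices used to compare the two models can be computed as in the short Python sketch below. It assumes the coefficient of determination is defined as one minus the ratio of residual to total sum of squares; the abstract does not specify whether this form or the squared Pearson correlation was used. The measured and predicted values are hypothetical and serve only to show the arithmetic.

```python
import math

def mae(measured, predicted):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))

def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, for illustration only (not results from the study).
measured = [0.10, 0.12, 0.14, 0.16]
predicted = [0.11, 0.12, 0.13, 0.16]

print(f"MAE  = {mae(measured, predicted):.4f}")
print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"R2   = {r_squared(measured, predicted):.2f}")
```

Lower MAE and RMSE and a higher coefficient of determination indicate closer agreement between measured and predicted indicator values.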
Conclusion: This study provides a basis for predicting soil physical quality indicators and for identifying the important parameters affecting these indicators in agricultural soils, grassland and forests in semi-arid regions, and this basis can be generalized to other areas. Further studies are needed to assess the effects of the selected input variables under different conditions.