Water and Soil

Soil science

Evaluation of Regression and Intelligent Models for Estimating Mean Weight Diameter of Wet Aggregates

Sh. Asghari; K. Heidari; M. Hasanpour Kashani; H. Shahab Arkhazloo

Volume 38, Issue 6, January and February 2024, Pages 749-764

https://doi.org/10.22067/jsw.2025.91071.1454

Abstract

Introduction The study of soil mean weight diameter (MWD) of wet aggregates, which is important for sustainable soil management, has recently received much attention. Because determining MWD is challenging, laborious, and time-consuming, there is a crucial need for a predictive method that supplies the information required for soil health assessment while saving the time and cost of soil analysis. It is therefore useful to apply models such as multiple linear regression (MLR) and intelligent models, including artificial neural networks (ANN) and gene expression programming (GEP), to estimate the MWD of wet aggregates from easily accessible, low-cost soil properties. The objectives of this study were (1) to create MLR, ANN and GEP models for predicting MWD from easily measurable soil variables in forest, range and cultivated lands of the Fandoghloo region of Ardabil province, and (2) to compare the precision of these models in predicting the MWD of wet aggregates using the coefficient of determination (R2), root mean square error (RMSE), mean error (ME) and Nash-Sutcliffe coefficient (NS). Materials and Methods Disturbed and undisturbed soil samples (n = 80) were taken nearly systematically from the 0-10 cm depth, at a spacing of about 50 m, in forest (n = 20), range (n = 23) and cultivated (n = 37) lands of the Fandoghloo region of Ardabil province, Iran (lat. 38° 24' 10" to 38° 24' 25" N, long. 48° 32' 45" to 48° 33' 5" E) in summer 2023.
Sand, silt, clay and CaCO3 contents, pH, EC, bulk density (BD), particle density (PD), organic carbon (OC) and the geometric mean diameter (GMD) of dry aggregates were determined in the laboratory using standard methods. Total porosity (n) was calculated from BD and PD (n = 1 - BD/PD). The geometric mean diameter (dg) and geometric standard deviation (σg) of soil particles were computed from the sand, silt and clay percentages. The mean weight diameter (MWD) of wet aggregates was measured on aggregates smaller than 4.75 mm with wet-sieving equipment, using sieves with 2, 1, 0.5, 0.25 and 0.106 mm openings. The data were randomly divided into two sets: 60 samples for training and 20 samples for testing the models. SPSS 22 (stepwise method), MATLAB and GeneXproTools 4.0 were used to derive the multiple linear regression (MLR), artificial neural network (ANN) and gene expression programming (GEP) models, respectively. A feed-forward three-layer perceptron network (9, 8, 6 and 6 neurons in the hidden layer) with the tangent sigmoid transfer function was used for ANN modeling. A set of optimal parameters was chosen before developing the best GEP model. The number of chromosomes, number of genes, head size and linking function were selected by trial and error as 30, 3, 8 and +, respectively. The rates of the genetic operators were taken from the literature. The precision of the MLR, ANN and GEP models in predicting the MWD of wet aggregates was evaluated with the coefficient of determination (R2), root mean square error (RMSE), mean error (ME) and Nash-Sutcliffe coefficient (NS). Results and Discussion Sand (13.14 to 64.79 %), silt (21.11 to 74.96 %), clay (3 to 42.18 %), OC (1.01 to 7.17 %), PD (2.00 to 2.67 g cm-3), n (0.39 to 0.87 cm3 cm-3), GMD of dry aggregates (0.8 to 1.33 mm) and MWD of wet aggregates (0.35 to 2.65 mm) varied widely in the soils of the studied region.
The studied soils belonged to the clay loam (n = 11), sandy clay loam (n = 6), sandy loam (n = 12), loam (n = 13), silty clay loam (n = 14), silty clay (n = 1) and silt loam (n = 23) textural classes. MWD correlated significantly with OC (r = 0.67**), sand (r = 0.70**), GMD (r = 0.30**) and PD (r = -0.46**). A significant positive correlation was also found between OC and sand (r = 0.59**). Because of the multicollinearity of sand with dg (r = 0.87**), dg was not used as an input variable for estimating the MWD of wet aggregates. In total, four MLR, ANN and GEP models were constructed to predict the MWD of wet aggregates from measured, readily available soil variables. The results of the MLR, ANN and GEP models indicated that the most suitable variables for estimating the MWD of wet aggregates were sand, OC and the GMD of dry aggregates. In the testing data set, the values of R2, RMSE, ME and NS were 0.52, 0.48 mm, 0.13 mm and 0.48 for the best MLR model; 0.85, 0.30 mm, 0.03 mm and 0.78 for the best ANN model; and 0.79, 0.35 mm, -0.10 mm and 0.95 for the best GEP model. Many researchers have also reported a significant positive correlation between the MWD of wet aggregates and OC. Conclusion The results showed that sand, OC and the GMD of dry aggregates were the most important readily available soil variables for predicting the mean weight diameter (MWD) of wet aggregates in the Fandoghloo region of Ardabil province. According to the lowest values of RMSE and the highest values of R2 and NS, the ANN models predicted the MWD of wet aggregates more precisely than the MLR and GEP models in this study. Because ANN is more flexible and captures non-linear relationships effectively, it performed better than the other models in predicting MWD.
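
As an illustration of the quantities named in this abstract, the following Python sketch computes total porosity from n = 1 - BD/PD and the four evaluation statistics (R2, RMSE, ME, NS). It is not the authors' code. The function names are hypothetical, R2 is assumed to be the squared Pearson correlation, and ME is assumed to be the mean of predicted minus observed values, since the abstract does not state either convention. The sample values are placeholders, not data from the study.

```python
import math


def total_porosity(bulk_density, particle_density):
    """Total porosity n = 1 - BD/PD (both densities in the same units)."""
    return 1.0 - bulk_density / particle_density


def evaluate(observed, predicted):
    """Return R2, RMSE, ME and NS for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Assumption: error = predicted - observed, so a positive ME is overestimation.
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    return {
        # Assumption: R2 is the squared Pearson correlation coefficient.
        "R2": cov * cov / (ss_obs * ss_pred),
        "RMSE": math.sqrt(ss_res / n),
        "ME": sum(errors) / n,
        # Nash-Sutcliffe: 1 minus residual variance over observed variance.
        "NS": 1.0 - ss_res / ss_obs,
    }


# Placeholder MWD values in mm, for illustration only.
observed_mwd = [1.0, 2.0, 3.0, 4.0]
predicted_mwd = [1.5, 2.0, 2.5, 4.0]
stats = evaluate(observed_mwd, predicted_mwd)
print(stats)
print(total_porosity(1.3, 2.6))
```

With these definitions NS can never exceed R2 for the same pair of series, because R2 corresponds to the best possible linear rescaling of the predictions, which is a useful sanity check when comparing reported statistics.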

Evaluation of the Efficiency of Data Preprocessing Methods on Improving the Performance of Gene Expression Programming Model (Case Study: Ab Zal River)

F. Ahmadi

Volume 35, Issue 2, May and June 2021, Pages 153-165

https://doi.org/10.22067/jsw.2021.14975.0

Abstract

Introduction: Surface water has always been one of the most essential pillars of water projects. Modeling and predicting river flow supports the management and use of water resources and also helps mitigate natural disasters such as droughts and floods. Researchers have therefore continually tried to improve the accuracy of hydrological parameter estimation by using new tools and combining them. This study examines how seasonal coefficients and wavelet-transform-based signal analysis and processing affect the performance of the Gene Expression Programming (GEP) model. Materials and Methods: To predict the monthly flow of the Ab Zal River, data from the Pol Zal hydrometric station for the period 1972 to 2017 were used. Different input patterns were then prepared in three modes: (a) flow data alone, accounting for memory with up to four delays; (b) inclusion of the periodic term in both linear (?-GEP) and nonlinear (PT-GEP) states; and (c) decomposition of the data with the Haar, Daubechies 4 (db4), Symlet (sym), Meyer (mey) and Coiflet (coif) wavelets in two subscales, which were then introduced to the GEP model. To better analyze the effect of the mathematical functions used in the GEP method, two function sets were considered: linear (the four basic arithmetic operators: addition, subtraction, multiplication and division) and nonlinear (including quadratic functions, etc.). The wavelet transform is a powerful tool for decomposing and reconstructing the original time series.
A wavelet function is an oscillating function that decays quickly to zero. Modeling was based on 80% of the recorded data (432 months) and validation on the remaining 20% (108 months). Model performance was evaluated with the root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (R). Results and Discussion: For both the linear and nonlinear GEP models, the four-delay model gave the most accurate river flow predictions. The nonlinear GEP model, with RMSE = 4.093 m3/s, MAE = 2.782 m3/s and R = 0.660, performed better than the linear one. In the next step, the periodic term was added to the model inputs. The PT-GEP model with the M4 pattern had the lowest error and the highest accuracy, and reduced RMSE by 8%. In the third step, the river flow data were decomposed into approximation and detail sub-series using five wavelet functions. Given the number of data, level three was considered the most appropriate level of analysis. The W-GEP models performed very well, reducing RMSE by 48.6%, 41.2% and 31.1% compared with the L-GEP, NL-GEP and PT-GEP methods, respectively. The W-GEP model with the Symlet wavelet at decomposition level one had the highest accuracy (R = 0.847) and the lowest error (RMSE = 2.898 m3/s and MAE = 1.745 m3/s) among all 35 models, which included linear and nonlinear, seasonal and non-seasonal, and wavelet hybrid models. Conclusion: Overall, the data preprocessing methods (seasonal coefficients and wavelet functions) improved the performance of the GEP model.
In particular, combining wavelet functions with the GEP model significantly increased modeling accuracy, so this combination is recommended as the most suitable tool for river flow forecasting.
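
The input preparation described in this abstract (flow memory of up to four delays, then an 80%/20% chronological split) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the function names are hypothetical, the flow series is a synthetic placeholder, and the split is assumed to be made on the month of the target value so that the validation set covers the last 108 of 540 months.

```python
def make_lagged_samples(flow, max_lag=4):
    """Build (month index, lagged inputs, target) tuples.

    The inputs for month t are the flows of months t-1 ... t-max_lag,
    so the first max_lag months cannot be used as targets.
    """
    samples = []
    for t in range(max_lag, len(flow)):
        lags = [flow[t - k] for k in range(1, max_lag + 1)]
        samples.append((t, lags, flow[t]))
    return samples


def chronological_split(samples, n_months, train_fraction=0.8):
    """Split by target month: the earliest months train, the latest validate."""
    cut = round(n_months * train_fraction)
    train = [s for s in samples if s[0] < cut]
    validation = [s for s in samples if s[0] >= cut]
    return train, validation


# Synthetic placeholder series of 540 monthly flows (not station data).
n_months = 540
flow = [10.0 + (m % 12) for m in range(n_months)]

samples = make_lagged_samples(flow, max_lag=4)
train, validation = chronological_split(samples, n_months)
print(len(train), len(validation))  # 428 108
```

Under this assumption the training set has 428 usable samples rather than 432, because the first four months only serve as lag history. A shuffled split is avoided here since it would leak future flows into training.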

Intelligent Models Performance Improvement Based on Wavelet Algorithm and Logarithmic Transformations in Suspended Sediment Estimation

Reza Hajiabadi; S. Farzin; Y. Hassanzadeh

Volume 30, Issue 1, March and April 2016, Pages 112-124

https://doi.org/10.22067/jsw.v30i1.37635

Abstract

Introduction One reason that predicting hydrological phenomena, especially time series, is complex is the presence of features such as trend, noise and high-frequency oscillations. These features, especially noise, can be detected or removed by preprocessing, and appropriate preprocessing makes these phenomena easier to estimate. Preprocessing is particularly effective in data-driven models such as artificial neural networks, gene expression programming and support vector machines, because data quality is important in these models. The present study treats denoising and data transformation as two different preprocessing approaches and uses them to try to improve the results of intelligent models. Two intelligent models, Artificial Neural Network (ANN) and Gene Expression Programming (GEP), are applied to estimate daily suspended sediment load. Wavelet transforms and logarithmic transformations are used for denoising and data transformation, respectively. Finally, the impact of preprocessing on the results of the intelligent models is evaluated. Materials and Methods In this study, Gene Expression Programming and Artificial Neural Network are used as intelligent models for suspended sediment load estimation, and the effects of denoising and logarithmic transformation as data preprocessors on the results are evaluated and compared. Two logarithmic transforms are considered, LN and LOG. The wavelet transform is used to denoise the time series.
To denoise with the wavelet transform, the time series is first decomposed at one level (into an approximation part and a detail part), and the high-frequency part (detail) is then removed as noise. Given the ability of gene expression programming and artificial neural networks to analyze nonlinear systems, daily values of suspended sediment load of the Skunk River in the USA over a 5-year period are investigated and estimated. Four years of data are used for model training, and one year is estimated by each model. Model accuracy is evaluated with three indexes: mean absolute error (MAE), root mean squared error (RMSE) and the Nash-Sutcliffe coefficient (NS). Results and Discussion To estimate suspended sediment load with the intelligent models, different input combinations were evaluated for model training. The best input combination for each intelligent model was then determined, and preprocessing was applied only to that combination. Two logarithmic transforms, LN and LOG, were used for data transformation, and the Daubechies wavelet family was used for the wavelet transforms. The results indicate that denoising increases the Nash-Sutcliffe criterion by 0.15 in ANN and by 0.14 in GEP. Furthermore, RMSE is reduced from 199.24 to 141.17 mg/L in ANN and from 234.84 to 193.89 mg/L in GEP. The effect of the logarithmic transformation on the ANN results is similar to that of denoising, whereas it has an adverse effect on GEP: after LN and LOG transformations, the Nash-Sutcliffe criterion of the GEP model drops from 0.57 to 0.31 and 0.21, respectively, and RMSE rises from 234.84 to 298.41 mg/L and 318.72 mg/L, respectively.
The results show that denoising by wavelet transform improves the accuracy of both intelligent models, whereas logarithmic transformation improves only the artificial neural network. For the ANN model, the LN transform performs better than the LOG transform, although both improve the ANN results. Among the Daubechies wavelets used for denoising, Db2 is the most effective for the ANN models, while Db1 (Haar) is better for the GEP models. Conclusions: In the present study, two intelligent models, Gene Expression Programming and Artificial Neural Network, were used to estimate daily suspended sediment load in the Skunk River in the USA. Two preprocessing procedures, denoising and data transformation, were applied to improve the results of the intelligent models. Wavelet transforms were used for denoising and logarithmic transformations for data transformation. Denoising by wavelet transform improved the accuracy of both intelligent models, whereas logarithmic transformation improved only the artificial neural network. For the GEP model, logarithmic transformation not only failed to improve the results but also reduced accuracy.
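
The one-level denoising step described in this abstract (decompose into approximation and detail, then discard the detail) can be sketched in Python with the Haar (Db1) wavelet, together with the LN and LOG transforms. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, odd-length series are assumed to be padded by repeating the last value, the log transforms assume strictly positive inputs, and the sample values are placeholders.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(series):
    """One-level Haar transform: return (approximation, detail) coefficients."""
    values = list(series)
    if len(values) % 2:
        values.append(values[-1])  # assumption: pad odd-length input
    approx = [(values[i] + values[i + 1]) / SQRT2
              for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / SQRT2
              for i in range(0, len(values), 2)]
    return approx, detail


def haar_reconstruct(approx, detail, length):
    """Invert haar_decompose and trim any padding."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out[:length]


def haar_denoise(series):
    """Drop the high-frequency detail part and rebuild the series."""
    values = list(series)
    approx, detail = haar_decompose(values)
    return haar_reconstruct(approx, [0.0] * len(detail), len(values))


def log_transform(values, kind="LN"):
    """LN is the natural logarithm, LOG the base-10 logarithm."""
    fn = math.log if kind == "LN" else math.log10
    return [fn(v) for v in values]


def inverse_log_transform(values, kind="LN"):
    """Map model outputs back to the original units."""
    if kind == "LN":
        return [math.exp(v) for v in values]
    return [10.0 ** v for v in values]


# Placeholder sediment values (not Skunk River data).
sediment = [4.0, 2.0, 6.0, 6.0]
print(haar_denoise(sediment))
print(log_transform(sediment, "LOG"))
```

With the Haar wavelet, removing the level-one detail replaces each pair of neighbouring values by their mean, which makes the smoothing effect easy to inspect. Longer Daubechies filters such as Db2 need a wavelet library or explicit filter coefficients and are not covered by this sketch.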


Keywords = Gene Expression Programming
