Water and Soil

Irrigation

Numerical Estimation of Drinking Water Quality Index Using Tree Methods and Combined Wavelet Approaches and Principal Component Analysis

M.T. Sattari; S. Javidan

Volume 36, Issue 6, January and February 2023, Pages 695-709

https://doi.org/10.22067/jsw.2022.78452.1196

Abstract

Introduction Surface water and groundwater quality is among the world's most pressing environmental concerns. Over the last few decades, rapid population growth has increased water demand and, with it, the pollutant load entering water bodies. Many methods exist for classifying the quality of groundwater and surface water according to the type of use; quality indices are among the most widely used. Given the limited facilities at water quality monitoring stations and the need to save time and money, modern data mining methods offer a useful alternative for predicting and classifying water quality. Water abstraction for domestic use, agricultural production, mineral and industrial production, electricity generation, and other uses can degrade water quality and quantity, which affects the aquatic ecosystem, that is, the community of organisms that live and interact in it. Evaluating surface water quality is therefore very important in water-environment management and in monitoring pollutant concentrations in rivers. The aim of this research was to estimate the numerical values of the drinking water quality index (WQI) using a tree-based method and to investigate the effect of the wavelet transform, the Bagging method, and principal component analysis.
Materials and Methods The WQI was calculated from quality parameters measured at the Bagh Kalaye hydrometric station over a 23-year period (1998-2020): total hardness (TH), acidity (pH), electrical conductivity (EC), total dissolved solids (TDS), calcium (Ca), sodium (Na), magnesium (Mg), potassium (K), chloride (Cl), carbonate (CO3), bicarbonate (HCO3), and sulfate (SO4). The calculated WQI values served as target outputs. Input combinations were determined with the Relief algorithm and correlation analysis. The random tree (RT) method was used to estimate the numerical values of the WQI. The capability of combining the wavelet transform, principal component analysis (PCA), and the Bagging method with a random tree base algorithm was then evaluated. To compare the values estimated by the data mining methods with the calculated WQI values, four evaluation criteria were used: correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and modified Willmott coefficient (Dr).
Results and Discussion The wavelet transform and the Bagging method improved the modeling results. Because Bagging with a random tree base algorithm combines the results of several random trees, it increased the accuracy of the RT model. In general, it was concluded that wavelet transformation and Bagging increase accuracy and reduce error. The best scenario, with the highest accuracy and the lowest error, was scenario 10 of the W-B-RT model, with total hardness, electrical conductivity, total dissolved solids, sulfate, calcium, bicarbonate, magnesium, chloride, sodium, and potassium as inputs. The results also showed that pH has less influence on the estimated WQI value than the other parameters.
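The Bagging idea described above (combining several trees, each fit on a bootstrap resample of the data) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-split regression stump on a randomly chosen feature stands in for the random tree, and the names `fit_stump`, `fit_bagging`, and `predict_bagging` are hypothetical.

```python
import random


def fit_stump(X, y, rng):
    """Fit a one-split regression stump on one randomly chosen feature.

    Assumption: this stump is a simplified stand-in for a random tree.
    """
    j = rng.randrange(len(X[0]))                 # random feature index
    values = sorted(set(row[j] for row in X))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0                      # candidate threshold
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                             # feature is constant in this sample
        mean_y = sum(y) / len(y)
        return lambda row: mean_y
    _, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Fit n_estimators stumps, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return models


def predict_bagging(models, row):
    """Average the predictions of all base models."""
    return sum(m(row) for m in models) / len(models)
```

Averaging over resampled base models is what reduces the variance of a single tree; in practice the base learner would be a full random tree rather than the stump used here.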
When principal component analysis was applied, the eigenvalues decreased from F1 to F12, and the value of the corresponding factor decreased with them; factors F1, F2, and F3 were therefore selected as the principal components. Modeling with these 3 factors gave R=0.98, RMSE=2.17, MAE=1.52, and Dr=0.97. In general, the results showed that PCA, while reducing and simplifying the dimension of the input vectors, can improve the accuracy and speed of the model, and it is introduced as the best method for estimating the numerical value of the WQI.
Conclusion The results of this research showed that the wavelet transform, Bagging, and PCA methods improved the results and increased accuracy. In estimating the numerical values of the WQI, the PCA-B-RT method with 3 principal factors had the highest accuracy, with a correlation coefficient of 0.98, a root mean square error of 2.17, a mean absolute error of 1.52, and a modified Willmott coefficient of 0.97. Because all the methods used to estimate the quantitative values had acceptable accuracy, appropriate and acceptable results can be obtained with a limited number of parameters and data mining methods when data are scarce and not all chemical parameters are available.
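The four evaluation criteria reported above can be sketched in a few lines. The abstract does not give the formula for Dr; the sketch assumes it refers to Willmott's refined index of agreement with scaling constant c = 2, and the function names are illustrative only.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def willmott_dr(obs, pred, c=2.0):
    """Refined index of agreement (assumed definition of Dr), bounded in [-1, 1]."""
    mo = sum(obs) / len(obs)
    err = sum(abs(p - o) for o, p in zip(obs, pred))   # total absolute error
    dev = c * sum(abs(o - mo) for o in obs)            # scaled observed deviation
    if err <= dev:
        return 1.0 - err / dev
    return dev / err - 1.0
```

R measures linear association only, so a model with a constant bias can still reach R = 1; RMSE, MAE, and Dr are needed alongside it to capture the size of the errors.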

Agricultural Meteorology

Tabriz Daily Rainfalls Modeling via Hybridized Tree Based and Seasonal-Trend Component Bagging Method

S. Javidan; M.T. Sattari; Sh. Mohsenzadeh

Volume 36, Issue 3, July and August 2022, Pages 407-420

https://doi.org/10.22067/jsw.2022.76512.1161

Abstract

Introduction Precipitation is one of the most important components of the water cycle. Accurate precipitation measurement is essential for flood forecasting and control, drought analysis, runoff modeling, sediment control and management, watershed management, agricultural irrigation planning, and water quality studies. Determining the correct amount of precipitation in cities and rural areas is also important for managing floods. The precipitation process is highly non-linear and random in both time and space. Because of the many climatic factors involved, it is not easily described by simple linear models, which may produce large errors. Various methods and models have therefore been proposed to evaluate and predict precipitation. This study aimed to estimate the daily precipitation of Tabriz from neighboring stations using hybridized tree-based and Bagging methods.
Materials and Methods Rainfall data from adjacent stations in the Urmia Lake basin (Sahand, Sarab, Urmia, Maragheh, and Mahabad) for 1986-2021 were used to estimate daily rainfall in Tabriz. About 70% of the data were used for calibration and 30% for validation. Input combinations were identified using the correlation matrix and the Relief algorithm. Modeling was performed with the tree-based data mining methods M5P, RT, and REPT and with the Bagging method. The daily precipitation series of Tabriz was decomposed into its components by a seasonal-trend analysis method.
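To illustrate what a seasonal-trend decomposition produces, the sketch below splits a series into trend, seasonal, and residual parts. The abstract does not specify which decomposition algorithm was used, so a classical additive decomposition with a centered moving average stands in here as an assumption; `moving_average` and `decompose_additive` are hypothetical names, and the series is assumed to cover at least two full periods.

```python
def moving_average(x, period):
    """Centered moving average; None where the window does not fit."""
    half = period // 2
    out = [None] * len(x)
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points so weights sum to period.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        out[i] = total / period
    return out


def decompose_additive(x, period):
    """Return (trend, seasonal, residual) with x = trend + seasonal + residual."""
    trend = moving_average(x, period)
    detrended = [xi - ti if ti is not None else None for xi, ti in zip(x, trend)]
    # Seasonal effect: mean detrended value at each position in the cycle.
    means = []
    for k in range(period):
        vals = [d for d in detrended[k::period] if d is not None]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period                 # center the seasonal effects on zero
    means = [m - offset for m in means]
    seasonal = [means[i % period] for i in range(len(x))]
    residual = [xi - ti - si if ti is not None else None
                for xi, ti, si in zip(x, trend, seasonal)]
    return trend, seasonal, residual
```

The three returned series are the kind of components that can then be supplied as additional model inputs; the trend and residual are undefined (None) for the first and last half-window of the series.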
The trend, seasonal, and residual components of the decomposition were used in different input scenarios to investigate their effect on the modeling results. Modeling performance was evaluated with the correlation coefficient, root mean square error, Nash-Sutcliffe Efficiency, and modified Willmott coefficient.
Results and Discussion The RT and REPT methods became more accurate and produced lower error when used as the base algorithm of the Bagging method. This was not the case for M5P, whose results were slightly weaker. Tabriz rainfall was also found to be strongly related to Sahand rainfall, as most models gave reliable estimates when using the rainfall data of Sahand station. This can be explained by the high correlation between rainfall at Tabriz and Sahand. The best scenarios were the first scenario (Sahand) for the M5P, RT, REPT, and B-M5P methods, the fifth scenario (Sahand, Sarab, Urmia, Maragheh, and Mahabad) for the B-RT method, and the fourth scenario (Sahand, Sarab, Urmia, and Mahabad) for the B-REPT method. The best performance was found for scenario 1 of the M5P decision tree model, followed by the Bagging method with the M5P base algorithm. In general, it was concluded that the Bagging method produced reliable results. Modeling without the decomposition components was then compared with modeling that included them. Adding the seasonal, trend, and residual components to the input combinations significantly improved the accuracy of the results. In most cases, the Bagging method also increased modeling accuracy. The first scenario (Sahand and residual) for the M5P and B-M5P methods, the tenth scenario (residual, trend, seasonal, Sahand, and Sarab) for the RT, REPT, and B-REPT methods, and the eighth scenario (residual, trend, and Sahand) for the B-RT method were selected as the best scenarios.
Among the stations, Sahand, owing to its proximity and high correlation, and Sarab, owing to its high correlation, had a large influence on the estimated precipitation in Tabriz. In general, the Bagging method with the M5P base algorithm (B-M5P) performed best in the first scenario. Thus, adding the precipitation decomposition components and using the Bagging method improve the results of modeling with tree-based data mining methods.
Conclusion The results showed that the Bagging method provided acceptable results in most cases. In the first case, without decomposition components, the first scenario of the M5P method, which uses Sahand precipitation data, was selected as the superior method and scenario. Sahand was thus the most effective station for estimating Tabriz rainfall, having the highest correlation with and the shortest distance from Tabriz. In the second case, with the decomposition components, the accuracy of the results increased significantly. The Bagging method with the M5P base algorithm, using Sahand precipitation and the residual of Tabriz precipitation as inputs, was considered the best modeling approach. It can be concluded that combining the Bagging method and decomposition components with data from the station closest to the study station gives the highest accuracy. Bagging models with tree-based base algorithms can therefore be considered simple and widely applicable methods.
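As a small illustration of the evaluation setup, the sketch below shows a 70/30 calibration/validation split and the Nash-Sutcliffe Efficiency. The abstract does not say how the 70% and 30% portions were chosen; a chronological split is assumed here, and `chronological_split` and `nash_sutcliffe` are hypothetical names.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence into calibration and validation parts.

    Assumption: the first 70% of records calibrate, the last 30% validate.
    """
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))   # model squared error
    den = sum((o - mean_obs) ** 2 for o in obs)          # variance around the mean
    return 1.0 - num / den
```

An efficiency below zero means the model predicts worse than simply using the mean of the observations, which makes the index a useful complement to the correlation coefficient.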


Author = Javidan, S.
