Numerical Estimation of Drinking Water Quality Index Using Tree Methods and Combined Wavelet Approaches and Principal Component Analysis

Document Type : Research Article

Authors

Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Abstract

Introduction
The quality of surface water and groundwater is among the world's most pressing environmental concerns. Over the last few decades, rapid population growth has increased water demand and, with it, the pollutant load entering water bodies. Of the many methods available for classifying groundwater and surface water quality according to the type of consumption, quality indices are among the most widely used. Given the limited facilities at water quality monitoring stations and the need to save time and money, modern data mining methods can serve as a useful alternative for predicting and classifying water quality. Water abstraction for domestic use, agricultural production, mining, industrial production, electricity generation, and other purposes can degrade water quality and quantity, which affects the aquatic ecosystem, that is, the community of organisms that live and interact there. Evaluating surface water quality is therefore very important in water-environmental management and in monitoring pollutant concentrations in rivers. The aim of the present research was to estimate the numerical values of the drinking water quality index (WQI) using a tree-based method and to investigate the effect of wavelet transformation, the Bagging method, and principal component analysis.
Materials and Methods
In this research, the WQI was calculated from the water quality parameters recorded at the Bagh Kalaye hydrometric station over a 23-year statistical period (1998-2020): total hardness (TH), pH, electrical conductivity (EC), total dissolved solids (TDS), calcium (Ca), sodium (Na), magnesium (Mg), potassium (K), chloride (Cl), carbonate (CO3), bicarbonate (HCO3), and sulfate (SO4). The calculated WQI values were taken as the target outputs. Input combinations were determined using the Relief and correlation methods. The random tree (RT) method was used to estimate the numerical values of the WQI. Then, the capability of combining the wavelet transform, principal component analysis (PCA), and the Bagging method with the random tree as the base algorithm was evaluated. To compare the values estimated by the data mining methods with the calculated WQI values, four evaluation criteria were used: correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and the modified Willmott coefficient (Dr).
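The four evaluation criteria can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that Dr denotes Willmott's refined index of agreement (with the scaling constant c = 2), which the abstract does not state explicitly, and the function name `evaluation_criteria` and the sample values are hypothetical.

```python
import math


def evaluation_criteria(observed, predicted):
    """Return R, RMSE, MAE and Dr for two equal-length sequences.

    Assumption: Dr is Willmott's refined index of agreement with c = 2.
    """
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n

    # Pearson correlation coefficient (R)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_o * ss_p)

    # Error-based criteria
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Refined index of agreement: compares the summed absolute error
    # with twice the summed absolute deviation of the observations.
    abs_err = sum(abs(e) for e in errors)
    scaled_dev = 2.0 * sum(abs(o - o_mean) for o in observed)
    if abs_err <= scaled_dev:
        dr = 1.0 - abs_err / scaled_dev
    else:
        dr = scaled_dev / abs_err - 1.0

    return {"R": r, "RMSE": rmse, "MAE": mae, "Dr": dr}


if __name__ == "__main__":
    # Hypothetical WQI values, for illustration only (not station data)
    calculated_wqi = [50.0, 60.0, 70.0, 80.0]
    estimated_wqi = [52.0, 58.0, 71.0, 79.0]
    print(evaluation_criteria(calculated_wqi, estimated_wqi))
```

Under this definition, values of R and Dr close to 1 together with small RMSE and MAE indicate close agreement between the estimated and calculated WQI.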
Results and Discussion
The wavelet transform and the Bagging method both improved the modeling results. Because Bagging with the random tree as the base algorithm combines the results of several random trees, it increased the accuracy of the RT model. In general, it was concluded that wavelet transformation and the Bagging method increase accuracy and reduce error. The best scenario, with the highest accuracy and the lowest error, was scenario 10 of the W-B-RT model, whose inputs were total hardness, electrical conductivity, total dissolved solids, sulfate, calcium, bicarbonate, magnesium, chloride, sodium, and potassium. The results showed that pH has less influence than the other parameters on the estimated numerical value of the WQI. When principal component analysis was applied, the eigenvalues decreased from F1 to F12, and with them the contribution of each factor; therefore, F1, F2, and F3 were selected as the principal components. Modeling with these 3 main factors gave R=0.98, RMSE=2.17, MAE=1.52, and Dr=0.97. In general, the results showed that the PCA method, while reducing and simplifying the dimension of the input vectors, can improve the accuracy and speed of the model, and it is introduced as the best method for estimating the numerical value of the WQI.
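The idea that Bagging combines the results of several random trees can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the hyperparameters (`n_try`, `min_leaf`, `max_depth`, `n_trees`), and the synthetic data are all assumptions made for illustration. Each tree is grown on a bootstrap sample and tests only a few randomly chosen inputs at every node; the ensemble prediction is the average over the trees.

```python
import random
from statistics import mean


def fit_random_tree(rows, targets, rng, n_try=2, min_leaf=3, depth=0, max_depth=6):
    """Grow a regression tree that tests only n_try random features per node."""
    if len(rows) < 2 * min_leaf or depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the mean target
    n_features = len(rows[0])
    best = None  # (sum of squared errors, feature index, threshold)
    for f in rng.sample(range(n_features), min(n_try, n_features)):
        for threshold in sorted({row[f] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            m_left, m_right = mean(left), mean(right)
            sse = (sum((t - m_left) ** 2 for t in left)
                   + sum((t - m_right) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:
        return mean(targets)  # no admissible split: make a leaf
    _, f, threshold = best
    left_idx = [i for i, row in enumerate(rows) if row[f] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[f] > threshold]
    left_tree = fit_random_tree([rows[i] for i in left_idx],
                                [targets[i] for i in left_idx],
                                rng, n_try, min_leaf, depth + 1, max_depth)
    right_tree = fit_random_tree([rows[i] for i in right_idx],
                                 [targets[i] for i in right_idx],
                                 rng, n_try, min_leaf, depth + 1, max_depth)
    return (f, threshold, left_tree, right_tree)


def predict_tree(tree, row):
    """Follow the splits until a leaf value is reached."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def fit_bagging(rows, targets, n_trees=25, seed=0):
    """Train n_trees random trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_random_tree([rows[i] for i in idx],
                                     [targets[i] for i in idx], rng))
    return trees


def predict_bagging(trees, row):
    """Bagging prediction: the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in trees)


if __name__ == "__main__":
    # Synthetic inputs and target, for illustration only (not station data)
    data_rng = random.Random(1)
    rows = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(60)]
    targets = [2 * r[0] + r[1] for r in rows]
    trees = fit_bagging(rows, targets)
    print(predict_bagging(trees, rows[0]), targets[0])
```

Averaging over bootstrap-trained trees tends to reduce the variance of a single random tree, which is consistent with the improvement of the RT model reported here.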
Conclusion
The results of the present research showed that the wavelet transform, Bagging, and PCA methods improved the results and increased accuracy. In estimating the numerical values of the WQI, the PCA-B-RT method with 3 main factors had the highest accuracy, with a correlation coefficient of 0.98, a root mean square error of 2.17, a mean absolute error of 1.52, and a modified Willmott coefficient of 0.97. Since all the methods used to estimate the quantitative values had acceptable accuracy, appropriate and acceptable results can be obtained from a limited number of parameters and data mining methods when data are scarce or not all chemical parameters are available.

Volume 36, Issue 6 - Serial Number 86
January and February 2023
Pages 695-709
  • Receive Date: 26 August 2022
  • Revise Date: 08 November 2022
  • Accept Date: 28 November 2022
  • First Publish Date: 28 November 2022