Modeling Water Quality Variables in the Sefidrood River Using a Data-Driven Method

Article type: Research Article

Authors

University of Tehran

Abstract

Increased withdrawal from surface water resources, the most accessible source of water, together with increased discharge of effluents into them, has reduced surface water quality. The need to monitor and model the quality of surface water resources is therefore felt more than ever. Because data-driven methods are highly capable modeling tools, this study first determined the inputs of the least squares support vector regression (LSSVR) method with two preprocessing methods, the correlation coefficient and principal component analysis (PCA). A genetic algorithm-least squares support vector regression (GA-LSSVR) algorithm was then developed, which tunes the LSSVR coefficients automatically and optimally. The GA-LSSVR algorithm was applied to model the quality variables Na+, K+, Mg2+, SO42-, Cl-, pH, electrical conductivity (EC), and total dissolved solids (TDS) of the Sefidrood River over a 20-year record (1364-1384 in the Solar Hijri calendar). Modeling with the GA-LSSVR algorithm and the correlation coefficient and PCA preprocessing methods gave coefficients of determination (R2) of 0.98, 0.98, 0.97, and 0.94 for EC, TDS, Cl-, and Na+, respectively, so these variables are modeled more accurately than the other quality variables. Overall, given the positive Nash-Sutcliffe (NS) values, both methods are highly capable of selecting the model inputs.
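The two input preprocessing methods named in the abstract can be sketched as follows. This is a minimal illustration with NumPy, not the authors' procedure: the function names, the correlation threshold of 0.5, and the 95% explained-variance cutoff are all assumptions chosen for the example, since the abstract does not state how inputs were retained.

```python
import numpy as np

def select_by_correlation(X, y, threshold=0.5):
    """Return indices of columns of X whose |Pearson r| with y meets the
    threshold (the threshold value is an assumption), plus all r values."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.flatnonzero(np.abs(r) >= threshold), r

def pca_scores(X, var_explained=0.95):
    """Project standardized X onto the leading principal components that
    together reach the requested share of variance (assumed cutoff)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix of the candidate inputs.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, var_explained)) + 1, len(eigvals))
    return Z @ eigvecs[:, :k], ratio[:k]
```

With the correlation approach, the model inputs are a subset of the original variables. With PCA, the inputs are new, mutually uncorrelated combinations of all candidate variables.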

Keywords


Article Title [English]

Modeling Water Quality Parameters Using Data-driven Methods

Authors [English]

  • Shima Soleimani
  • Omid Bozorg Haddad
  • Mojtaba Moravej
University of Tehran
Abstract [English]

Introduction: Surface water bodies are the most readily available water resources. Increased withdrawal from surface waters and increased discharge of wastewater into them have caused drastic changes in surface water quality. The importance of water quality in these resources, which are both the most vulnerable and the most important sources of water supply, is clear. Unfortunately, in recent years urban population growth, economic development, and increased industrial production have increased the entry of pollutants into water bodies. Water quality parameters express the physical, chemical, and biological characteristics of water, so water quality monitoring is more necessary than ever. Each use of water, such as agriculture, drinking, industry, and aquaculture, requires water of a specific quality. Accurate estimation of the concentrations of water quality parameters is therefore important.
Materials and Methods: In this research, two input variable selection methods, the correlation coefficient and principal component analysis (PCA), were first applied to select the model inputs. Data processing consists of three steps: (1) examining the data, (2) identifying the input data that affect the output data, and (3) selecting the training and testing data. A Genetic Algorithm-Least Squares Support Vector Regression (GA-LSSVR) algorithm was developed to model the water quality parameters. The LSSVR method assumes that the relationship between the input and output variables is nonlinear, but that a nonlinear mapping can create a space, called the feature space, in which this relationship is linear. The developed algorithm maximizes the accuracy of the LSSVR method by tuning the LSSVR parameters automatically. The genetic algorithm (GA) is an evolutionary algorithm that can automatically find the optimum coefficients of LSSVR. The GA-LSSVR algorithm was employed to model the water quality parameters Na+, K+, Mg2+, SO42-, Cl-, pH, electrical conductivity (EC), and total dissolved solids (TDS) in the Sefidrood River. The input variable selection methods were compared using the coefficient of determination (R2), root mean square error (RMSE), and Nash-Sutcliffe (NS) efficiency.
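The feature-space idea behind LSSVR can be made concrete with a short sketch. It follows the standard least squares support vector machine formulation, in which training reduces to solving one linear system for a bias b and dual weights alpha. The Gaussian (RBF) kernel, the function names, and the parameter names gamma (regularization) and sigma (kernel width) are assumptions for illustration; the abstract does not state which kernel or parameterization was used.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B.
    The kernel choice is an assumption, not taken from the abstract."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvr_fit(X, y, gamma, sigma):
    """Solve the LSSVR system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvr_predict(X_train, b, alpha, X_new, sigma):
    """Prediction is linear in the kernel features: K(X_new, X_train) @ alpha + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

The model is nonlinear in the original inputs but linear in the kernel-induced features, which is why fitting needs only a linear solve. The two coefficients gamma and sigma are the kind of quantities a GA would tune.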
Results and Discussion: According to Table 5, the GA-LSSVR algorithm gives approximately similar results with the correlation coefficient and PCA methods. For the pH, EC, and TDS quality parameters the PCA method is more accurate, but the difference in RMSE between the PCA and correlation coefficient methods is not significant. Relative to the correlation coefficient method, the PCA method improved the NS values for pH and TDS by 22 and 0.1 percent, respectively, while the NS value for EC was the same with both methods. Given the positive NS values for both the PCA and correlation coefficient methods, GA-LSSVR has a high ability to model water quality parameters. Because the sum of the NS values is 5.53 for the PCA method and 5.62 for the correlation coefficient method, the correlation coefficient method is the more suitable data preprocessing method, although both methods perform well. Orouji et al. (18) used assumed models to model Na+, K+, Mg2+, SO42-, Cl-, pH, EC, and TDS with the genetic programming (GP) method. The RMSE values of their best models for the testing data are 2.1, 0.02, 0.85, 0.93, 2.18, 0.33, 404.15, and 246.15, respectively. Comparing Orouji et al. (18) with Table 5 shows that using the correlation coefficient method for data preprocessing improves the results by up to 5.5 times. These results indicate that the developed algorithm increases modeling accuracy. According to the NS criterion, both input variable selection methods (correlation coefficient and PCA) are capable of modeling the water quality parameters. The results also show that the correlation coefficient method leads to more accurate results than PCA.
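The three evaluation criteria used in the comparison above can be computed as in the sketch below. The abstract does not give the formulas, so the common hydrological definitions are assumed here: R2 as the squared Pearson correlation between observed and simulated values, and NS as one minus the ratio of the squared-error sum to the variance sum of the observations. The function names are illustrative.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def r2(obs, sim):
    """Coefficient of determination, taken here as the squared Pearson
    correlation (an assumed definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the skill
    of always predicting the observed mean, negative values are worse."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))
```

Under this definition a positive NS value means the model predicts better than the mean of the observations, which is the sense in which positive NS values support both input selection methods.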
Conclusion: In this study, the GA, one of the most widely applied optimization algorithms across the sciences, was used to optimize the LSSVR coefficients, and the resulting GA-LSSVR algorithm was developed to model the water quality parameters. To compare the data preprocessing methods (correlation coefficient and PCA), the input variables were determined with each method and GA-LSSVR was run for each set of input variables. Several statistics were used to compare the results of the PCA and correlation coefficient methods. According to the NS criterion, both input selection methods are capable of modeling the water quality parameters. The results also show that the correlation coefficient method leads to more accurate results than PCA.
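How a GA might tune the LSSVR coefficients can be sketched as below. Everything specific here is an assumption made for illustration, since the abstract does not describe the GA configuration: the real-coded encoding of log10(gamma) and log10(sigma), the search bounds, tournament selection, arithmetic crossover, Gaussian mutation, one-individual elitism, and validation RMSE as the fitness. The compact LSSVR inside the fitness function uses an RBF kernel, also an assumption.

```python
import numpy as np

def _rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def _val_rmse(params, Xtr, ytr, Xva, yva):
    """Fitness: validation RMSE of an LSSVR with
    params = (log10 gamma, log10 sigma)."""
    gamma, sigma = 10.0 ** params
    n = len(ytr)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = _rbf(Xtr, Xtr, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], ytr)))
    pred = _rbf(Xva, Xtr, sigma) @ sol[1:] + sol[0]
    return float(np.sqrt(np.mean((yva - pred) ** 2)))

def ga_tune(Xtr, ytr, Xva, yva, bounds=((-2, 4), (-2, 2)),
            pop_size=20, generations=30, mut_sigma=0.3, seed=0):
    """Return ((gamma, sigma), best-fitness history) from a simple GA."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, 2))
    fit = np.array([_val_rmse(p, Xtr, ytr, Xva, yva) for p in pop])
    history = [float(fit.min())]

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if fit[a] < fit[b] else pop[b]

    for _ in range(generations):
        children = [pop[fit.argmin()].copy()]        # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            w = rng.uniform(size=2)
            child = w * p1 + (1.0 - w) * p2          # arithmetic crossover
            child = child + rng.normal(0.0, mut_sigma, 2)  # Gaussian mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([_val_rmse(p, Xtr, ytr, Xva, yva) for p in pop])
        history.append(float(fit.min()))
    return 10.0 ** pop[fit.argmin()], history
```

Because the best individual is carried over unchanged each generation, the best validation RMSE never gets worse from one generation to the next. Searching in log space lets the GA cover several orders of magnitude for both coefficients.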

Keywords [English]

  • GA-LSSVR Algorithm
  • Pearson correlation coefficient
  • PCA method
1- Bozorg Haddad O., Afshar A., and Mariño M.A. 2011. Multireservoir optimisation in discrete and continuous domains, Proceedings of the Institution of Civil Engineers: Water Management, 164(2), 57-72.
2- Bozorg Haddad O., Fallah-Mehdipour E., Mirzaei-Nodoushan F., and Mariño M.A. 2014a. Discussion of A GA-based support vector machine model for the prediction of monthly reservoir storage, Journal of Hydrologic Engineering, DOI: 10.1061/(ASCE)HE.1943-5584.0001086.
3- Bozorg Haddad O., Moravej M., and Loaiciga H. 2014b. Application of the water cycle algorithm to the optimal operation of reservoir systems, Journal of Irrigation and Drainage Engineering, DOI: 10.1061/(ASCE)IR.1943-4774.0000832, 04014064.
4- Camdevyren H., Demyr N., Kanik A., and Keskyn S. 2005. Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs, Ecological Modelling, 181(4), 581-589.
5- Chiu D.Y., and Chen P.J. 2009. Dynamically exploring internal mechanism of stock market by fuzzy-based support vector machines with high dimension input space and genetic algorithm, Expert Systems with Applications, 36(2), 1240-1248.
6- Fallah-Mehdipour E., Bozorg Haddad O., and Mariño M.A. 2013. Prediction and simulation of monthly groundwater levels by genetic programming, Journal of Hydro-Environment Research, 7(4), 253-260.
7- Ghavidel S.Z.Z., and Montaseri M. 2014. Application of different data-driven methods for the prediction of total dissolved solids in the Zarinehroud basin, Stochastic Environmental Research and Risk Assessment, 28(8), 2101-2118.
8- Johnson R.A., and Wichern D.W. 1982. Applied multivariate statistical analysis, Prentice Hall, No 3, Englewood Cliffs, USA.
9- Koza J.R. 1990. Genetic programming: A paradigm for genetically breeding populations of computer programs to solve problems, Department of Computer Science, Stanford University, 131 pp.
10- Liu S., Tai H., Ding Q., Li D., Xu L., and Wei Y. 2013. A hybrid approach of support vector regression with genetic algorithm optimization for aquaculture water quality prediction, Mathematical and Computer Modelling, 58(3), 458-465.
11- Noori R., Ashrafi Kh., and Ajdarpour A. 2008. Comparison of ANN and PCA based multivariate linear regression applied to predict the daily average concentration of CO: A case study of Tehran, Journal of Physics Earth Space, 34(1), 135-152.
12- Noori R., Karbassi A., and Salman Sabahi M. 2010. Evaluation of PCA and Gamma test techniques on ANN operation for weekly solid waste prediction, Journal of Environmental Management, 91(3), 767-771.
13- Noori R., Karbassi A.R., Moghaddamnia A., Han D., Zokaei-Ashtiani M.H., Farokhnia A., and Gousheh M.G. 2011. Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction, Journal of Hydrology, 401(3), 177-189.
14- Orouji H., Bozorg Haddad O., Fallah-Mehdipour E., and Mariño M.A. 2013. Modeling of water quality parameters using data-driven models, Journal of Environmental Engineering, 139(7), 947-957.
15- Ouyang Y. 2005. Evaluation of river water quality monitoring stations by principal component analysis, Water Research, 39(12), 2621-2635.
16- Raghavendra N.S., and Deka P.C. 2014. Support vector machine applications in the field of hydrology: A review, Applied Soft Computing, 19, 372-386.
17- Rajaee T., Mirbagheri S.A., Zounemat-Kermani M., and Nourani V. 2009. Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models, Science of the Total Environment, 407(17), 4916-4927.
18- Singh K.P., Basant N., and Gupta S. 2011. Support vector machines in water quality management, Analytica Chimica Acta, 703(2), 152-162.
19- Soltani F., Kerachian R., and Shirangi E. 2010. Developing operating rules for reservoirs considering the water quality issues: Application of ANFIS-based surrogate models, Expert Systems with Applications, 37(9), 6639-6645.
20- Su J., Wang X., Liang Y., and Chen B. 2013. A GA-based support vector machine model for the prediction of monthly reservoir storage, Journal of Hydrologic Engineering, 19(7), 1430-1437.
21- Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., and Vandewalle J. 2002. Least squares support vector machines, World Scientific Publishing, No. 4, Singapore.
22- Tabachnick B.G., and Fidell L.S. 2001. Using multivariate statistics, Pearson, No. 2, 963 pp.
23- Tan G., Yan J., Gao C., and Yang S. 2012. Prediction of water quality time series data based on least squares support vector machine, Procedia Engineering, 31, 1194-1199.
24- Vapnik V.N. 1995. The nature of statistical learning theory, Springer, New York, USA.
25- Wang W.C., Chau K.W., Cheng C.T., and Qiu L. 2009. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series, Journal of Hydrology, 374(3), 294-306.
26- Yoon H., Jun S.C., Hyun Y., Bae G.O., and Lee K.K. 2011. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer, Journal of Hydrology, 396(1-2), 128-138.
27- Yunrong X., and Liangzhong J. 2009. Water quality prediction using LS-SVM with particle swarm optimization, Second International Workshop on Knowledge Discovery and Data Mining, IEEE 2009, Moscow, Russia, January 23-25.