Document Type: Research Article
Authors
Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
Abstract
Introduction
Precipitation is one of the most important components of the water cycle. Accurate precipitation measurement is essential for flood forecasting and control, drought analysis, runoff modeling, sediment control and management, watershed management, agricultural irrigation planning, and water quality studies. Accurate precipitation estimates for urban and rural areas are also important for flood management. The precipitation process is highly non-linear and varies randomly in time and space. Because it is governed by many climatic factors, it is difficult to describe with simple linear models, which may produce large errors. Consequently, various methods and models have been proposed to evaluate and predict precipitation. This study aimed to estimate the daily precipitation of Tabriz using tree-based methods and their hybrids with the Bagging method, with rainfall data from neighboring stations as inputs.
Materials and Methods
In the present study, rainfall data for 1986-2021 from neighboring stations in the Urmia Lake basin (Sahand, Sarab, Urmia, Maragheh and Mahabad) were used to estimate daily rainfall at Tabriz. About 70% of the data were used for calibration and 30% for validation. The correlation matrix and the Relief algorithm were used to identify the input combinations. Modeling was performed with the tree-based data mining methods M5P, RT and REPT, both on their own and as base algorithms of the Bagging method. The daily precipitation series of Tabriz was decomposed into trend, seasonal and residual components using a seasonal-trend decomposition method, and these components were included in different input scenarios to investigate their effect on the modeling results. Model performance was evaluated with the correlation coefficient (R), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE) and the modified Willmott index.
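The four evaluation indices listed above have standard definitions, sketched below in plain Python with O_i the observed and P_i the estimated daily rainfall. Two points are assumptions not stated in the abstract: that the "modified Willmott" index is Willmott's absolute-value index of agreement, d1 = 1 - Σ|P_i - O_i| / Σ(|P_i - Ō| + |O_i - Ō|), and that the 70/30 calibration/validation split is chronological. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chronological_split(series: Sequence[float], calib_frac: float = 0.7) -> Tuple[list, list]:
    """Split a series into calibration (first ~70%) and validation (last ~30%) parts.
    Assumption: the split is chronological; the abstract does not say how it was made."""
    cut = int(round(len(series) * calib_frac))
    return list(series[:cut]), list(series[cut:])


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def correlation(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R (undefined if either series is constant)."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. mm/day)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def willmott_d1(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Modified Willmott index of agreement (absolute-value form d1), between 0 and 1.
    Assumption: this is the 'modified Willmott' index the abstract refers to."""
    mo = _mean(obs)
    num = sum(abs(p - o) for o, p in zip(obs, pred))
    den = sum(abs(p - mo) + abs(o - mo) for o, p in zip(obs, pred))
    return 1.0 - num / den


if __name__ == "__main__":
    # Toy numbers only, not data from the study.
    obs = [1.0, 2.0, 3.0]
    pred = [2.0, 3.0, 4.0]
    print(correlation(obs, pred), rmse(obs, pred), nse(obs, pred), willmott_d1(obs, pred))
```

In the toy example every estimate is 1 mm too high, so R is 1 while NSE is -0.5 and d1 is 0.4. This illustrates why a correlation coefficient is usually reported alongside error- and agreement-based indices rather than on its own.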
Results and Discussion
When RT and REPT were used as base algorithms of the Bagging method, accuracy increased and error decreased. This was not the case for M5P, whose results with Bagging were slightly weaker. Tabriz rainfall was also found to be closely related to Sahand rainfall: most models gave reliable estimates when Sahand rainfall data were used as input. This is attributable to the high correlation between rainfall at the two stations. The best input scenarios were scenario 1 (Sahand) for M5P, RT, REPT and B-M5P; scenario 5 (Sahand, Sarab, Urmia, Maragheh and Mahabad) for B-RT; and scenario 4 (Sahand, Sarab, Urmia and Mahabad) for B-REPT. The best overall performance was obtained with scenario 1 of the M5P tree model, followed by Bagging with the M5P base algorithm (B-M5P). In general, the Bagging method produced reliable results.
Modeling without the decomposition components was then compared with modeling that included them. Adding the trend, seasonal and residual components to the input combinations significantly improved the accuracy of the results, and in most cases applying the Bagging method increased accuracy further. The best scenarios were scenario 1 (Sahand and residual) for M5P and B-M5P; scenario 10 (residual, trend, seasonal, Sahand and Sarab) for RT, REPT and B-REPT; and scenario 8 (residual, trend and Sahand) for B-RT. Among the stations, Sahand (owing to its proximity and high correlation) and Sarab (owing to its comparatively high correlation) had the greatest influence on the estimates of Tabriz precipitation. Overall, Bagging with the M5P base algorithm (B-M5P) under scenario 1 gave the best results when the decomposition components were included.
Thus, adding the precipitation decomposition components and using the Bagging method improve the results of tree-based data mining models.
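Bagging (bootstrap aggregating) trains the same base algorithm on bootstrap resamples of the calibration data and averages the resulting predictions. The plain-Python sketch below illustrates that idea only. The small squared-error regression tree with mean-valued leaves is a hypothetical stand-in for the RT, REPT and M5P learners used in the study and is not a reimplementation of any of them (M5P, for example, fits linear models in its leaves). The step-shaped data are synthetic; in the study, each row of `X` would instead hold one input scenario, such as Sahand rainfall plus the residual component of Tabriz rainfall.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


class RegressionTree:
    """Tiny squared-error regression tree with mean-valued leaves.
    Hypothetical stand-in for the RT/REPT/M5P base learners, not a reimplementation."""

    def __init__(self, max_depth: int = 3, min_leaf: int = 2):
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.root: tuple = ("leaf", 0.0)

    def fit(self, X: List[Row], y: List[float]) -> "RegressionTree":
        self.root = self._build(X, y, depth=0)
        return self

    def _build(self, X: List[Row], y: List[float], depth: int) -> tuple:
        if depth >= self.max_depth or len(y) < 2 * self.min_leaf:
            return ("leaf", mean(y))
        best: Optional[Tuple[float, int, float]] = None  # (sse, feature, threshold)
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [v for row, v in zip(X, y) if row[f] <= t]
                right = [v for row, v in zip(X, y) if row[f] > t]
                if len(left) < self.min_leaf or len(right) < self.min_leaf:
                    continue
                score = _sse(left) + _sse(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
        if best is None:
            return ("leaf", mean(y))
        _, f, t = best
        li = [i for i, row in enumerate(X) if row[f] <= t]
        ri = [i for i, row in enumerate(X) if row[f] > t]
        return ("split", f, t,
                self._build([X[i] for i in li], [y[i] for i in li], depth + 1),
                self._build([X[i] for i in ri], [y[i] for i in ri], depth + 1))

    def predict_one(self, row: Row) -> float:
        node = self.root
        while node[0] == "split":
            _, f, t, left, right = node
            node = left if row[f] <= t else right
        return node[1]


def bagging_fit(X: List[Row], y: List[float], n_estimators: int = 25,
                seed: int = 0, **tree_kwargs) -> List[RegressionTree]:
    """Fit one tree per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(RegressionTree(**tree_kwargs).fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[RegressionTree], row: Row) -> float:
    """Bagged regression estimate: the average of the individual tree predictions."""
    return mean(m.predict_one(row) for m in models)


if __name__ == "__main__":
    # Synthetic step-shaped data with one input column (not data from the study).
    X = [[float(x)] for x in range(20)]
    y = [0.0 if x < 10 else 10.0 for x in range(20)]
    models = bagging_fit(X, y)
    print(bagging_predict(models, [2.0]), bagging_predict(models, [17.0]))
```

Averaging over bootstrap replicates mainly reduces the variance of unstable base learners such as unpruned trees. The values of `n_estimators`, `max_depth` and `min_leaf` are illustrative defaults, not settings from the study.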
Conclusion
The results showed that the Bagging method provided acceptable results in most cases. In the first case (without decomposition components), M5P with scenario 1 (Sahand precipitation data) was selected as the best method and scenario. Sahand, which has the highest correlation with Tabriz and is the closest station to it, was thus the most effective station for estimating Tabriz rainfall. In the second case, including the decomposition components increased the accuracy of the results significantly. Here, Bagging with the M5P base algorithm, using Sahand precipitation and the residual component of Tabriz precipitation as inputs, was the best model. It can be concluded that combining the Bagging method with decomposition components and data from the station closest to the target station yields the highest accuracy. Bagging models with tree-based base algorithms can therefore be regarded as simple and broadly applicable methods.