Bimonthly Journal

Article type: Research article

Authors

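1 University of Birjand

2 Ph.D. student of Water Resources, Department of Water Engineering, University of Birjand

3 Department of Water Science and Engineering, University of Birjand

4 Ph.D. student of Water Resources, Department of Water Engineering, University of Birjand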

Abstract

The rising concentration of greenhouse gases in the atmosphere has caused substantial changes, both increases and decreases, in the Earth's climatic parameters. Climate change is now one of the main challenges in the exploitation and management of water resources, and current global climate conditions point to a growing risk of aridity in many regions of the world in the future. General circulation models are among the most important and widely used tools in regional-scale climate change studies. One of the main priorities in statistical downscaling is the selection of predictors as inputs to the downscaling model. To select the important predictors from among 26 upper-atmosphere variables, four machine learning algorithms (LASSO, Ridge, GBM and SPSA) were used in the statistical downscaling of maximum temperature at the Birjand station, and their performance in the validation stage was assessed with three indices: relative Nash-Sutcliffe efficiency, Kling-Gupta efficiency and volumetric efficiency. The results showed that the near-surface meridional velocity component had the highest importance and the zonal velocity component at 500 hPa the lowest, at 73.2% and 15%, respectively. The relative Nash-Sutcliffe and Kling-Gupta indices also showed that the SPSA algorithm outperformed the other algorithms in selecting predictors and, consequently, in downscaling maximum temperature. A comparison of the mean and variance of the downscaled output of each algorithm with the observed data in the validation stage showed that the SPSA algorithm was better able than the other algorithms to reproduce the mean and variance of the observed maximum temperature at the Birjand synoptic station.

Article Title [English]

Comparison of Machine Learning Methods in the Selection of Predictors of Atmospheric-Ocean General Circulation Models

Authors [English]

  • M. Amirabadizadeh 1
  • Mahdieh Frozanmehr 2
  • M. Yaghoobzadeh 3
  • Saeideh Hosainabadi 4

1 University of Birjand

2 Ph.D. student of Water Resources, University of Birjand

3 University of Birjand

4 Ph.D. student of Water Resources, Department of Water Engineering, University of Birjand

Abstract [English]

 
Introduction
Climate change is now one of the main challenges in the exploitation and management of water resources. Temperature, along with precipitation, is one of the most important climatic elements and a main factor in climatic zoning and classification. Because Iran lies within the drought belt and close to the subtropical high-pressure zone, it has an arid and semi-arid climate and suffers from drought in most years. Temperature fluctuations and variability are therefore important issues that make the study of temperature changes a necessity. In the current study, four data mining algorithms were compared for selecting predictors in the downscaling of maximum temperature at the Birjand synoptic station, and the best-performing algorithm was identified. Because the number of large-scale features is high, the choice of machine learning algorithm plays an important role in the statistical downscaling of climatic variables such as maximum temperature.
Materials and Methods
Data sets in environmental studies now often use many variables to describe a climatic phenomenon. Because the volume of data is large, choosing the predictors is one of the most important preprocessing steps in machine learning. In this study, four machine learning methods, namely Simultaneous Perturbation Stochastic Approximation (SPSA), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge regression and Gradient Boosting Machine (GBM), were compared for selecting the important features in the downscaling of maximum temperature at the Birjand synoptic station over the period 1961-2019. Feature selection is a mechanism for finding a combination of predictors that, with a minimum number of predictors, yields an acceptable evaluation index in estimating the variable under study. The weather data for the Birjand synoptic station were provided by the Meteorological Organization of Iran. To calibrate and validate the machine learning algorithms, 70% and 30% of the available monthly data were used, respectively. The analysis was coded in the RStudio environment using the caret and fscaret packages. Three indices were used to evaluate the performance of the algorithms: relative Nash-Sutcliffe Efficiency (rNSE), Volumetric Efficiency (VE) and Kling-Gupta Efficiency (KGE).
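To illustrate how a penalised regression such as LASSO can act as a predictor selector, the following is a minimal sketch in plain Python using cyclic coordinate descent. It is not the authors' implementation (the study reports using R with the caret and fscaret packages); the four-row data set, the penalty value and the assignment of the labels P1_v and P5_u to the toy columns are all made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows. Columns of X and the target y are assumed to be
    centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0      # correlation of column j with the partial residual
            scale = 0.0    # mean squared value of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                scale += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (scale / n)
    return beta


# Hypothetical toy data: two centred, orthogonal predictors.
# The target depends strongly on the first column and weakly on the second.
names = ["P1_v", "P5_u"]  # labels borrowed from the text; values are invented
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
# Predictors whose coefficient was shrunk to exactly zero are dropped.
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(beta, selected)
```

With this penalty the weak predictor's coefficient is driven to exactly zero while the strong one is only shrunk, which is the property that lets LASSO double as a feature selector. Ridge regression, by contrast, shrinks coefficients without setting them to zero, so ranking by coefficient size would be needed there.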
Results and Discussion
Before the algorithms were used to select large-scale predictors, the correlation between these variables and the observed maximum temperature at the Birjand station was investigated. The large-scale variables mslp, P1_v, P8_v, P8_u and P850 Temp had correlations of up to 0.6 with maximum temperature, which is acceptable given the complexity of the climate change phenomenon. In addition, the results show that all the algorithms rated the factors F1, F2, F15, F16, F18, F20 and F26 at more than 50% importance, and that the first variable (mean sea level pressure) was the most important parameter in downscaling maximum temperature. Also, the highest importance was for P1_v and the lowest for P5_u, at 73.2% and 15%, respectively. Violin plots of the downscaled maximum temperature in the validation step, shown alongside the observed maximum temperature at the Birjand synoptic station, showed that the first and third quartiles of the SPSA output were closer to the observed data than those of the other algorithms. According to the evaluation criteria, the SPSA algorithm performed better than the other algorithms in reproducing the monthly maximum temperature values at the Birjand synoptic station. Based on the volumetric efficiency and relative Nash-Sutcliffe criteria, the GBM algorithm was more successful in selecting predictors than the Ridge and LASSO algorithms. The SPSA algorithm also gave results that differed from those of the other algorithms. In the comparison of the mean and variance of downscaled and observed maximum temperature, the t-test and F-test showed that, at the 5% significance level, the SPSA algorithm was more effective than the other algorithms in reproducing the mean and variance of the observed maximum temperature at the Birjand synoptic station.
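The three evaluation criteria used in the comparison (rNSE, VE and KGE) can be sketched as follows. This is a minimal plain-Python illustration that assumes the commonly used definitions: the 2009 form of the Kling-Gupta efficiency, volumetric efficiency as one minus the ratio of summed absolute errors to summed observations, and the relative Nash-Sutcliffe form that divides each deviation by the observation. The abstract does not state which exact variants were applied, and the temperature values below are invented.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mean_s, mean_o = _mean(sim), _mean(obs)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def volumetric_efficiency(sim, obs):
    """VE = 1 - sum(|sim - obs|) / sum(obs)."""
    return 1.0 - sum(abs(s - o) for s, o in zip(sim, obs)) / sum(obs)


def relative_nse(sim, obs):
    """rNSE: Nash-Sutcliffe efficiency on relative deviations.

    Every observation (and the observed mean) must be non-zero.
    """
    mean_o = _mean(obs)
    num = sum(((o - s) / o) ** 2 for s, o in zip(sim, obs))
    den = sum(((o - mean_o) / mean_o) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical monthly maximum temperatures in degrees Celsius (invented values).
obs_toy = [10.0, 20.0, 30.0]
sim_toy = [12.0, 18.0, 33.0]

print("KGE :", kge(sim_toy, obs_toy))
print("VE  :", volumetric_efficiency(sim_toy, obs_toy))
print("rNSE:", relative_nse(sim_toy, obs_toy))
```

All three indices equal 1 for a perfect simulation and decrease as the downscaled series departs from the observations, so higher values indicate a better-performing algorithm.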
Conclusion
The data used in this study comprised large-scale atmospheric variables and the observed maximum temperature at the Birjand station. The algorithms were used to select important predictors, and their performance was assessed in the validation stage. According to the results, the highest importance among the large-scale variables belonged to P1_v and the lowest to P5_u, at 73.2% and 15%, respectively. The SPSA algorithm also performed better than the other algorithms in selecting predictors and, consequently, in downscaling maximum temperature.

Keywords [English]

  • Atmosphere-ocean general circulation model
  • Downscaling
  • Feature selection
  • Machine learning algorithm
  • Maximum temperature