Publisher: Ferdowsi University of Mashhad
Journal: Water and Soil
ISSN: 2008-4757
Volume 37, Issue 1, 2023-04-21
Title: Comparison of Machine Learning Methods in the Selection of Predictors of Atmospheric-Ocean General Circulation Models
Pages: 129-143
Article ID: 43162
DOI: 10.22067/jsw.2022.76605.1166
Language: FA
Authors: M. Amirabadizadeh (University of Birjand, ORCID 0000-0003-2814-0948); Mahdieh Frozanmehr (Ph.D. student of water resources, University of Birjand); M. Yaghoobzadeh (University of Birjand); Saeideh Hosainabadi (Ph.D. student of water resources, Water Engineering Department, University of Birjand)
Journal Article, 2022-05-11

Introduction

Climate change is now one of the main challenges in the exploitation and management of water resources. Temperature, along with precipitation, is one of the most important climatic elements and one of the main factors in climatic zoning and classification. Because Iran lies within the drought belt and close to the tropical high-pressure zone, it has an arid and semi-arid climate and suffers from drought in most years. Temperature fluctuations and variability are therefore important issues, and they make the study of temperature changes a necessity. In the current study, four data mining algorithms were compared in selecting predictors for downscaling the maximum temperature at the Birjand synoptic station, and the superior algorithm was identified. Because the number of large-scale features is high, the choice of machine learning algorithm plays an important role in the statistical downscaling of climatic variables such as maximum temperature.

Materials and Methods

In environmental studies today, data sets use many variables to describe a climatic phenomenon. Because the volume of data is large, choosing the predictors is one of the most important preprocessing steps in machine learning.
In this study, four machine learning methods, namely Simultaneous Perturbation Stochastic Approximation (SPSA), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge and the Gradient Boosting Method (GBM), were studied and compared in selecting important features for downscaling the maximum temperature at the Birjand synoptic station over the statistical period 1961-2019. Predictor selection is a mechanism for finding a combination of predictors that, with a minimum number of predictors, produces an acceptable evaluation index in estimating the variable under study. The weather data of the Birjand Synoptic Meteorological Station were provided by the Meteorological Organization of Iran. Of the available monthly data, 70% and 30% were allocated to calibration and validation of the machine learning algorithms, respectively. The analysis was coded in the RStudio environment using the caret and fscaret packages. Three indices were used to evaluate the performance of the algorithms: relative Nash-Sutcliffe Efficiency (rNSE), Volumetric Efficiency (VE) and Kling-Gupta Efficiency (KGE).

Results and Discussion

Before the algorithms were used to select large-scale predictors, the correlation between these variables and the observed maximum temperature at the Birjand station was investigated. The large-scale variables mslp, P1_v, P8_v, P8_u and P850 Temp showed correlations of up to 0.6 with the maximum temperature, which is acceptable given the complexity of the climate change phenomenon. In addition, the results show that all the algorithms used the important factors F1, F2, F15, F16, F18, F20 and F26 by more than 50%, and that the first variable (mean sea level pressure) was the most important parameter in downscaling the maximum temperature. The highest importance was for P1_v and the lowest for P5_u, at 73.2% and 15%, respectively.
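The correlation screening described above can be illustrated with a short sketch. The study itself reports using R with the caret and fscaret packages, so this Python version is only a stand-in, not the authors' code. The function names `pearson_r` and `rank_predictors` are hypothetical, and the numbers are invented toy values, not data from the Birjand station; only the predictor labels `mslp` and `P1_v` are borrowed from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(predictors, target):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy values for illustration only (NOT the study's data).
tmax = [10.0, 14.0, 19.0, 25.0, 31.0, 35.0]
predictors = {
    "mslp": [1022.0, 1019.0, 1015.0, 1011.0, 1006.0, 1004.0],
    "P1_v": [0.5, -0.2, 0.9, 0.1, 0.7, 0.3],
}

ranking = rank_predictors(predictors, tmax)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value keeps strongly negative relationships in view. A screening of this kind only describes linear association; it does not replace the selection step performed by the four algorithms.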
Violin plots of the downscaled maximum temperature in the validation step, shown for each algorithm alongside the observed maximum temperature at the Birjand synoptic station, indicated that the first and third quartiles of the SPSA output were closer to the observed data than those of the other algorithms. According to the evaluation criteria, the SPSA algorithm performs better than the other algorithms in reproducing the monthly maximum temperature values at the Birjand synoptic station. Based on the volumetric efficiency and relative Nash-Sutcliffe criteria, the GBM algorithm was more successful in selecting predictors than the Ridge and LASSO algorithms. It was also observed that the SPSA algorithm gives different results from the other algorithms. In comparing the mean and variance of the downscaled and observed maximum temperature, the t-test and F-test showed that, at the 5% significance level, the SPSA algorithm is more efficient than the other algorithms in reproducing the mean and variance of the observed maximum temperature at the Birjand synoptic station.

Conclusion

The data used in this study comprised large-scale atmospheric variables and the observed maximum temperature at the Birjand station. The algorithms were used to select important predictors, and their performance was assessed in the validation step. According to the results, the highest importance among the large-scale variables belongs to P1_v and the lowest to P5_u, at 73.2% and 15%, respectively. The SPSA algorithm also performs better than the other algorithms in selecting predictors and, consequently, in downscaling the maximum temperature.

https://jsw.um.ac.ir/article_43162_a26a7b77084bebcd054eee8ca0494f97.pdf
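The three evaluation indices named in Materials and Methods can be sketched as follows. The abstract does not state the formulas, so this sketch assumes the commonly used definitions: VE as one minus the ratio of summed absolute errors to summed observations, rNSE as the Nash-Sutcliffe form computed on relative deviations, and KGE in its original three-component form (correlation, variability ratio, bias ratio). The function names are hypothetical, the study's own computation was done in R, and the sample values are invented.

```python
import math


def volumetric_efficiency(obs, sim):
    """VE = 1 - sum(|sim - obs|) / sum(obs). Assumed definition."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)


def relative_nse(obs, sim):
    """rNSE on relative deviations. Assumed definition; needs non-zero obs."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum(((o - s) / o) ** 2 for o, s in zip(obs, sim))
    denominator = sum(((o - mean_obs) / mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def kling_gupta_efficiency(obs, sim):
    """KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2). Assumed form."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Sample standard deviations of observed and simulated series.
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / (n - 1))
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / (n - 1))
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / (n - 1)
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Invented values for illustration only (NOT the study's data).
observed = [10.0, 20.0, 30.0]
simulated = [12.0, 18.0, 33.0]

print("VE  :", round(volumetric_efficiency(observed, simulated), 4))
print("rNSE:", round(relative_nse(observed, simulated), 4))
print("KGE :", round(kling_gupta_efficiency(observed, simulated), 4))
```

All three indices equal 1 for a perfect match. Because rNSE divides by each observed value, it is undefined when an observation is zero, which matters for temperatures expressed in degrees Celsius.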