Article type: Research article
Authors
Ferdowsi University of Mashhad
Abstract
Precipitation and temperature are among the most important variables in meteorology and climatology. The length of the record strongly affects how accurately they can be analyzed: a sample shorter than 100 years cannot adequately reflect long-term fluctuations. The longest record of monthly temperature and precipitation for Mashhad spans nearly 125 years (from about 1893 to 2017), but it contains missing values. The aim of this study is to fill these gaps and to improve the accuracy with which the missing values are estimated. Stations in neighboring countries were chosen as reference stations. First, the missing data were filled by fitting ten multiple regression models for monthly precipitation (coefficients of determination from 0.63 to 0.81) and six models for monthly temperature (0.986 to 0.993). Then, to reduce errors, the parameters of the regression models were optimized with a genetic algorithm (GA) and ant colony optimization (ACO). In addition, an artificial neural network (ANN) and support vector regression (SVR) were used to model these data. The results showed that GA and ACO markedly increase the accuracy of the estimated missing precipitation compared with the regression models. The lowest RMSE among all precipitation regression models is 9.79 mm; GA reduces it to 2.560 mm and ACO to 2.559 mm. The lowest RMSE among the temperature regression models is 0.986 °C; ANN reduces it to 0.726 °C and SVR to 0.551 °C. Comparing the imputation of temperature and precipitation shows that the evolutionary methods perform better for precipitation and the machine learning methods perform better for temperature.
Article title [English]
Imputation of Missing Meteorological Data with Evolutionary and Machine Learning Methods. Case Study: Long-term Monthly Precipitation and Temperature of Mashhad
Authors [English]
- Mahboobeh Farzandi
- Seyed Hossein Sanaeinejad
- Bijan Ghahraman
- Majid Sarmad
Ferdowsi University of Mashhad
Abstract [English]
Introduction: Temperature and precipitation are two of the main variables in meteorology and climatology, and they are basic inputs to water resource management. The length of the statistical record plays a pivotal role in the accurate analysis of these variables. Observations at Iran's first synoptic station, from 1330 (1951) onward, are available on the Iranian Meteorological Organization website. Historical monthly precipitation and temperature for five Iranian stations (Mashhad, Isfahan, Tehran, Bushehr, and Jask) are available from 1880, but with gaps. These data were measured by the embassies of the United States and Britain from the Qajar period onward and recorded in the World Weather Records books. Most of the missing monthly values date from 1941-1949, around World War II, so estimating them accurately is important. The aim of this research was to predict the missing values of monthly temperature and precipitation at Mashhad station. Stations in neighboring countries were selected as predictor variables on the basis of their distance from Mashhad, their relationship with it, and the completeness of their records since 1880. Monthly precipitation at Ashgabat, Sarakhs, Kooshkah, Bayram Ali, Kerki, and Repetek in Turkmenistan was selected as the independent variables for reconstructing the missing rainfall at Mashhad. The temperatures of Ashgabat, Bayram Ali, Gudan, Sarakhs, and Tajan were selected to restore Mashhad's monthly temperature. Ten multiple regression models were fitted to the monthly rainfall of Mashhad station and six to its monthly temperature. The parameters of these models were then optimized with a genetic algorithm and an ant colony algorithm.
In addition, a multilayer perceptron (MLP) artificial neural network and support vector regression (SVR) were implemented to simulate the monthly precipitation and temperature of Mashhad.
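The regression gap-filling step described above can be sketched in plain Python. The sketch fits ordinary least squares on the months where the target station and every reference station have values. It then predicts the target's missing months from the reference stations. This is a minimal sketch under stated assumptions:

- the two predictor columns and all values are hypothetical, not station data from the study;
- the helper names (`solve`, `fit_ols`, `impute`) are illustrative only;
- the study's actual ten precipitation and six temperature model specifications are not given in the abstract.

```python
# Minimal sketch: fill gaps in a target station's monthly series with a
# multiple linear regression on reference-station series (hypothetical data).


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] of y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X^T X) b = X^T y


def impute(target, predictors):
    """Replace None entries of target with regression estimates.

    target: monthly values at the station being repaired (None = missing).
    predictors: one tuple of reference-station values per month (assumed complete).
    """
    train = [(p, t) for p, t in zip(predictors, target) if t is not None]
    coef = fit_ols([p for p, _ in train], [t for _, t in train])
    filled = [t if t is not None else coef[0] + sum(c * v for c, v in zip(coef[1:], p))
              for t, p in zip(target, predictors)]
    return filled, coef


predictors = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 7), (6, 4)]
target = [5.5, 4.5, None, 8.5, 15.0, None]  # two missing months
filled, coef = impute(target, predictors)
print([round(v, 3) for v in filled])  # → [5.5, 4.5, 11.0, 8.5, 15.0, 11.0]
```

The toy target is an exact linear function of the predictors (2 + 0.5·x1 + 1.5·x2). The fit therefore recovers those coefficients and fills both gaps with 11.0. Real station records would also call for residual checks and for handling gaps in the predictors themselves, which this sketch assumes are complete.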
Materials and Methods: In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). The genetic algorithm (GA) is a metaheuristic inspired by natural selection and belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. Ant colony optimization (ACO) is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. It belongs to the family of swarm intelligence methods and is a metaheuristic optimization technique. Artificial neural networks are one of the main tools of machine learning. As the “neural” part of their name suggests, they are systems loosely inspired by the brain and intended to mimic the way humans learn. A neural network consists of input and output layers and, in most cases, one or more hidden layers of units that transform the input into something the output layer can use. Neural networks are well suited to finding patterns that are far too complex or numerous for a human programmer to extract and teach the machine to recognize. In machine learning, support vector machines (SVMs, also called support vector networks) are supervised learning models, with associated learning algorithms, that analyze data for classification and regression analysis.
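Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. This makes it a non-probabilistic binary linear classifier, although methods such as Platt scaling exist to use SVMs in a probabilistic classification setting. Support vector regression (SVR), used in this study, applies the same idea to continuous targets. It fits a function that ignores errors smaller than a margin ε and penalizes larger deviations linearly, while keeping the model as flat as possible.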
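The ε-insensitive idea behind SVR can be sketched with a linear model trained in the primal by subgradient descent. Residuals inside the ε-tube contribute nothing to the loss; residuals outside it push the weights back. This is an illustrative stand-in, not the study's configuration: the abstract does not give the kernel, ε, or regularization used. A practical analysis would more likely use a library implementation such as scikit-learn's `SVR`. The function name `fit_linear_svr`, its default parameters and the data are hypothetical.

```python
import math


def fit_linear_svr(X, y, epsilon=0.1, lam=1e-4, lr=0.5, epochs=3000):
    """Linear epsilon-SVR fitted in the primal by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, |y_i - f(x_i)| - epsilon)),
    with f(x) = w . z(x) + b on standardized features z(x).
    """
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / n) or 1.0 for j in range(d)]
    Z = [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in X]
    w = [0.0] * d
    b = sorted(y)[n // 2]  # start the intercept at the (upper) median target
    for t in range(1, epochs + 1):
        step = lr / math.sqrt(t)  # diminishing step size
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for zi, yi in zip(Z, y):
            r = yi - (sum(wj * zj for wj, zj in zip(w, zi)) + b)
            if abs(r) <= epsilon:
                continue  # inside the epsilon-tube: zero loss, zero subgradient
            sign = -1.0 if r > 0 else 1.0
            for j in range(d):
                gw[j] += sign * zi[j] / n
            gb += sign / n
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb

    def predict(x):
        return b + sum(wj * (xj - m) / sj for wj, xj, m, sj in zip(w, x, mu, sd))

    return predict


# Hypothetical monthly values (not data from the study).
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 7), (6, 4)]
y = [5.5, 4.5, 11.0, 8.5, 15.0, 11.0]
predict = fit_linear_svr(X, y)
mae = sum(abs(predict(x) - t) for x, t in zip(X, y)) / len(y)
print(f"training MAE: {mae:.3f}")
```

The diminishing step size is a standard choice for subgradient methods, which have no smooth gradient to follow at the tube edges. Standardizing the features keeps the weight updates on a comparable scale across predictors.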
Results and Discussion: In the first stage, ten multiple regression models were fitted to monthly precipitation (coefficients of determination from 0.63 to 0.81) and six to monthly temperature (0.986-0.993). Next, GA and ACO were applied to improve the accuracy of the selected regression models by optimizing their parameters. In the following stage, ANN and SVR were used separately to estimate the missing monthly values. Finally, the results of these stages were compared using the root mean square error (RMSE), and the best models were used to determine the missing monthly temperature and precipitation at Mashhad. The results showed that the genetic algorithm and ant colony optimization estimate the missing rainfall substantially more accurately than the regression models. The lowest RMSE among the precipitation regression models is 9.8 mm. The genetic algorithm reduces it to 2.56 mm and the ant colony algorithm to 2.559 mm.
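The refinement stage above, where GA and ACO tune regression coefficients to minimize RMSE, can be sketched as two small optimizers sharing one objective. The abstract does not state which GA operators or which ACO variant the study used. This sketch makes two substitutions:

- **GA:** a simple real-coded GA with binary tournament selection, blend crossover, Gaussian mutation and elitism.
- **ACO:** ACO_R, Socha and Dorigo's continuous-domain ant colony algorithm.

The data, function names and parameter settings are all hypothetical. Here the model is linear and the objective is training RMSE, so both searches can at best approach the ordinary least-squares solution. The sketch only illustrates how each optimizer explores coefficient space.

```python
import math
import random

# Hypothetical data (not from the study): two reference-station predictors.
ROWS = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 7), (6, 4)]
Y = [5.5, 4.5, 11.0, 8.5, 15.0, 11.0]


def rmse(coef, rows=ROWS, y=Y):
    """Root mean square error of y_hat = b0 + b1*x1 + ... against y."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def genetic_search(objective, dim, pop_size=60, generations=300, bounds=(-10.0, 10.0),
                   mutation_rate=0.2, mutation_scale=0.5, seed=0):
    """Real-coded GA: binary tournament, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(*bounds) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((objective(ind), ind) for ind in pop), key=lambda s: s[0])

        def pick():  # the fitter of two randomly drawn individuals
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] < b[0] else b[1]

        nxt = [scored[0][1][:], scored[1][1][:]]  # elitism: keep the two best
        while len(nxt) < pop_size:
            p1, p2, w = pick(), pick(), rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # blend crossover
            child = [g + rng.gauss(0.0, mutation_scale) if rng.random() < mutation_rate else g
                     for g in child]  # Gaussian mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=objective)
    return best, objective(best)


def aco_r(objective, dim, archive_size=20, ants=10, iterations=500, bounds=(-10.0, 10.0),
          q=0.2, xi=0.85, seed=0):
    """ACO_R: each ant samples Gaussians centred on a rank-weighted archived solution."""
    rng = random.Random(seed)
    k = archive_size
    archive = sorted(([rng.uniform(*bounds) for _ in range(dim)] for _ in range(k)),
                     key=objective)
    # Rank-based weights: the best-ranked solutions guide most ants.
    weights = [math.exp(-(rank ** 2) / (2 * (q * k) ** 2)) for rank in range(k)]
    for _ in range(iterations):
        new = []
        for _ in range(ants):
            guide = archive[rng.choices(range(k), weights=weights)[0]]
            # Spread per dimension: xi times the mean distance from the guide to the archive.
            ant = [rng.gauss(guide[i], xi * sum(abs(s[i] - guide[i]) for s in archive) / (k - 1))
                   for i in range(dim)]
            new.append(ant)
        # The solution archive plays the role of the pheromone: keep the k best.
        archive = sorted(archive + new, key=objective)[:k]
    return archive[0], objective(archive[0])


ga_coef, ga_err = genetic_search(rmse, dim=3)
aco_coef, aco_err = aco_r(rmse, dim=3)
print(f"GA  RMSE: {ga_err:.4f}  coefficients: {[round(c, 3) for c in ga_coef]}")
print(f"ACO RMSE: {aco_err:.4f}  coefficients: {[round(c, 3) for c in aco_coef]}")
```

Both optimizers treat the RMSE as a black box, so any other model form or error criterion could be used without changing the search code.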
Conclusion: Comparing these methods for restoring temperature and precipitation shows that the evolutionary methods (GA and ACO) are best for estimating missing monthly precipitation. The machine learning methods (ANN and SVR) are best for imputing missing monthly temperature.
Keywords [English]
- Missing data
- Artificial neural network
- Support vector regression
- Ant colony
- Genetic algorithm