%0 Journal Article
%T Evaluating the Capability of Data Mining Models in Predicting Irrigated Wheat Yield in Iran
%J Water and Soil
%I Ferdowsi University of Mashhad
%Z 2008-4757
%A Uossef Gomrokchi, A.
%A Baghani, J.
%A Abbasi, F.
%D 2021
%\ 06/22/2021
%V 35
%N 2
%P 189-202
%! Evaluating the Capability of Data Mining Models in Predicting Irrigated Wheat Yield in Iran
%K Data mining
%K Grouping
%K Modeling
%K Water consumption volume
%R 10.22067/jsw.2021.15029.0
%X Introduction: Artificial neural network modeling is one of the modeling methods that researchers in various sciences have considered in recent years. In addition to artificial neural network and regression models, data mining methods are now used to improve the outputs of prediction models and the analysis of field information. Tree models (decision trees), along with decision rules, are one such data mining method. A tree model represents a set of rules that lead to a category or value. These models are built by sequentially splitting the data into separate groups, with the goal of increasing the distance between groups at each split. Research shows that plant yield is a function of various plant, climatic, and water and soil management conditions. Calculating plant yield and related indices therefore involves complex nonlinear relationships that are particularly difficult to model. Because determining the response of irrigated wheat to different inputs in different climates by field methods is time-consuming, costly, and in some cases impossible, an efficient model that can predict yield and analyze its sensitivity to various parameters would be of great help in solving this problem. This study aimed to develop and evaluate the capability of three models (neural network, tree, and multivariate linear regression) in predicting wheat yield, based on the parameters affecting yield, in the major wheat production hubs of the country.
Materials and Methods: The information used in this study includes the volume of water consumption and the yield of irrigated wheat, together with the factors related to these two indicators, in farmer-managed irrigated wheat fields (241 farms) in the provinces of Khuzestan, Fars, Golestan, Hamadan, Kermanshah, Khorasan Razavi, Ardabil, East Azerbaijan, West Azerbaijan, Semnan, south of Kerman and Qazvin. The data were collected in a field study during the 2016-17 growing season. According to Ministry of Jihad for Agriculture statistics, these provinces have the largest area under irrigated wheat cultivation in the country and account for about 70% of the cultivated area and production of this crop. One of the most widely used supervised neural networks is the multilayer perceptron with the error backpropagation algorithm, which is suitable for a wide range of applications such as pattern recognition, interpolation, prediction, and process modeling. In the present study, the neural network was developed with the neuralnet package in R. After the normalization step, the data were randomized so that the input-output pairs followed no particular order. After randomization, the share of the data to be used in network training was determined: 70% of the data were used for training and 30% for testing the network. In training and testing the perceptron network, the hyperbolic tangent activation function was used to limit the range of the output of each neuron, together with a pattern-by-pattern training process. In addition to neural network modeling, the tree model method was used in the present study to predict wheat yield. Tree modeling is one of the most powerful and common tools for classification and forecasting.
The tree model, unlike the neural network model, produces explicit rules. One advantage of the decision tree over the neural network is that it is robust to noise in the input data. The tree model divides the data into sections by binary splits. Each partition can be split again in the same way, and a model is fitted to each subdivision. In this research, WEKA software was used to run the tree model. It is worth noting that the prediction model is applied to the data after grouping. Results and Discussion: In this study, the efficiency of three models (artificial neural network, multivariate linear regression, and tree model) in predicting the yield of irrigated wheat in the major production areas of the country was evaluated based on field information recorded in 241 farms. The results showed that the coefficient of determination for predicting wheat yield was 0.672 for the artificial neural network model and 0.577 for the multivariate linear regression model. When the data were grouped by the tree method, the coefficient of determination increased to 0.762. The output of the tree model showed that the major wheat production areas in Iran could be divided into four independent groups in terms of water consumption. Finally, it can be concluded that the tree model, owing to its purposeful grouping of the input data, can be used as a powerful tool for estimating irrigated wheat yield in the major wheat production areas of Iran. Conclusion: In this study, the need for data mining methods in analyzing field information and organizing large databases was investigated, and the usefulness of these methods, especially the decision tree, in estimating wheat yield was compared with that of other forecasting methods.
The general results of the research show that purposefully separating the input data of forecasting models into groups can increase the accuracy of their output. However, it is not possible to provide a general rule for selecting or rejecting a forecasting model in different regions. In some studies, neural networks have shown a high ability to predict the yield of different crops, but it is important to note that, given sufficient data and a correct understanding of the factors affecting the dependent variable, data mining methods can also improve the accuracy of the models, including the neural network. More generally, considering the estimation accuracy of the models studied, these techniques can be used to estimate other hard-to-measure characteristics of plants and soil.
%U https://jsw.um.ac.ir/article_39609_c51dac09a5b87b566c9eb73680ed0479.pdf