Introduction: Soil salinization is increasing across developing countries, and agricultural production is declining as a result of this stress. Climate change could worsen the salinization trend through decreased rainfall and increased evapotranspiration in arid regions. Policy and decision makers require continuous, quantitative monitoring of soil salinity to adapt to the adverse effects of climate change and the growing demand for food. Indices derived from near-surface or satellite-based sensors are increasingly applied for this purpose, and a considerable number of such indices have already been introduced. Several regression methods have been used to build and verify salinity models; among them, multiple linear regression (including stepwise, forward selection and backward elimination) and partial least squares regression are the most important.
Materials and Methods: To evaluate different approaches for modeling soil salinity from remotely sensed data, an area of about 50,000 ha in the Sabzevar-Davarzan plain was studied during 2013 and 2014. Sampling locations were determined using a Latin Hypercube Sampling (LHS) strategy: 97 points were sampled in 2013 and 25 points in 2014. All points were sampled to a depth of 90 cm in 30 cm increments, giving a total of 366 soil samples, which were analyzed in the laboratory for the electrical conductivity (EC) of the saturated extract. An electromagnetic induction device (EM38) was also used to measure bulk soil electrical conductivity, at the sampling points in the first year and at each sampling point plus 8 surrounding points in the second year, for a total of 97 and 225 EM measurements in the first and second years, respectively. Mean measured soil EC was calibrated against the EM measurements; because the correlations were fair, EM and EC data could be converted to each other. Twenty-three spectral indices derived from Landsat 8 images acquired on the sampling dates, together with a digital elevation model (DEM), were used as independent variables. Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR) were evaluated for their ability to predict soil salinity from the independent variables under different calibration and verification datasets.
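The Latin Hypercube Sampling step can be illustrated with a minimal sketch. It assumes equal-width strata on a unit square and uses the hypothetical function name `latin_hypercube`; the abstract does not state which variables or strata the study actually used, so this shows only the general technique, in which every stratum of every dimension receives exactly one point.

```python
import random

def latin_hypercube(n_points, n_dims, seed=0):
    """Return n_points samples in [0, 1)^n_dims, one per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Each dimension is cut into n_points equal-width strata,
        # and the strata are visited in an independent random order.
        strata = list(range(n_points))
        rng.shuffle(strata)
        # One uniform draw inside each stratum.
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose the per-dimension columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]

# 97 matches the number of 2013 sampling points; the two dimensions are illustrative.
points = latin_hypercube(n_points=97, n_dims=2)
print(points[:3])
```

In practice the unit-interval coordinates would be rescaled to whatever covariates or map extents define the sampling space.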
Results and Discussion: Different multiple linear regression approaches were evaluated using the first-year data for training and the second-year data for testing, and vice versa. They produced determination coefficients of about 22 to 88 percent in the training dataset, but the determination coefficient did not reach 29 percent in the test dataset. Because of multicollinearity among the independent variables, multiple linear regression could not be applied to all variables together. Excluding the collinear variables, log-transforming the data and randomly dividing them into training and test datasets improved the determination coefficient of the model and its validation to an acceptable level. Partial least squares regression applied to the original and log-transformed data, with the first and second years as training and test datasets and vice versa, gave determination coefficients of about 39 to 85 percent in the training dataset but was not able to predict salinity in the test dataset. Randomly dividing all data into training and test datasets considerably increased the determination coefficient in the verification dataset. Repeating the randomization showed that the approach is consistent enough for estimating the coefficients of the variables.
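The repeated random split and the determination coefficient can be sketched as follows. This is an illustration on synthetic data, not the study's data or model: the single predictor, the noise level and the helper names `fit_line`, `r_squared` and `random_split` are all assumptions, and a one-variable least-squares line stands in for the multi-variable MLR and PLSR models.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(y, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def random_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic stand-in: 122 (index value, salinity) pairs, as many as the pooled sampling points.
data_rng = random.Random(42)
rows = []
for _ in range(122):
    x = data_rng.uniform(0.0, 1.0)
    rows.append((x, 2.0 + 3.0 * x + data_rng.gauss(0.0, 0.3)))

# Repeat the randomization to see how stable the test-set R^2 is.
test_scores = []
for seed in range(5):
    train, test = random_split(rows, seed=seed)
    a, b = fit_line([r[0] for r in train], [r[1] for r in train])
    predictions = [a + b * r[0] for r in test]
    test_scores.append(r_squared([r[1] for r in test], predictions))
print(test_scores)
```

A year-wise split corresponds to replacing `random_split` with a partition on the sampling year, which is where the abstract reports that test-set performance collapsed.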
Conclusions: A wide range of independent variables can be used to predict soil salinity from remotely sensed data and indices. However, these variables generally show multicollinearity. The correlation matrix, variance inflation factor and tolerance can be used to identify multicollinearity. Removing or scaling highly collinear variables could improve the regression. Data transformations, including log transformation, could also significantly improve the strength of the regression. In this research, EM data showed more significant correlations with spectral indices than laboratory-measured EC data did. As the EM38 device itself operates in a specific range of the electromagnetic spectrum, this higher correlation could be expected. Such models should be calibrated and verified against ground truth data. Generally, one part of the dataset is used for calibration (building the model) and the remainder for verification (testing the model). Randomly dividing the pooled data of the 2 years into calibration (2/3 of the data) and verification (1/3 of the data) subsets could significantly improve the regression in the verification dataset. This procedure widens the range of variability in the data used for both calibration and verification and prevents outlier predictions.
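The variance inflation factor and tolerance diagnostics can be sketched as below. For predictor j, tolerance is 1 - R_j^2 and VIF is its reciprocal, where R_j^2 comes from regressing predictor j on all the other predictors. The three columns are made-up values standing in for spectral indices, and the helper names are hypothetical; the sketch also assumes no predictor is an exact linear combination of the others, since that would make the tolerance zero.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r2_on_others(columns, j):
    """R^2 from regressing column j on all other columns plus an intercept."""
    y = columns[j]
    xs = [[1.0] * len(y)] + [c for k, c in enumerate(columns) if k != j]
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in xs] for u in xs]
    xty = [sum(p * q for p, q in zip(u, y)) for u in xs]
    beta = solve(xtx, xty)
    pred = [sum(bk * u[i] for bk, u in zip(beta, xs)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif_and_tolerance(columns):
    """Return a list of (VIF, tolerance) pairs, one per predictor column."""
    result = []
    for j in range(len(columns)):
        tolerance = 1.0 - r2_on_others(columns, j)
        result.append((1.0 / tolerance, tolerance))
    return result

# Made-up predictors: index_b is nearly twice index_a, index_c is only loosely related.
index_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
index_b = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
index_c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
diagnostics = vif_and_tolerance([index_a, index_b, index_c])
for name, (vif, tol) in zip(["index_a", "index_b", "index_c"], diagnostics):
    print(name, round(vif, 2), round(tol, 4))
```

A common rule of thumb treats a VIF above roughly 10 as a sign that a variable should be removed or combined with others, though the threshold used in the study is not stated in the abstract.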