Performance of Different Covariate Selection Methods in Digital Soil Class Mapping Using Data Mining Algorithms

Article type: Research article

Authors

1 Ilam University

2 Isfahan University of Technology, College of Agriculture

3 University of Tehran

Abstract

Soil maps of adequate accuracy are a powerful tool for achieving sustainable land use in agriculture and natural resources. The present study was carried out in part of the Vargar lands of Abdanan County, Ilam Province, for digital mapping of soil classes using random forest and fuzzy logic models. In the study area, the locations of 44 soil profiles were determined, and the profiles were dug, described, and sampled from all genetic horizons. After the necessary physicochemical analyses, the soils were classified. The ALOS PALSAR digital elevation model and SAGA GIS software were used to derive geomorphometric covariates. Three variable selection approaches, namely the Boruta algorithm, the variance inflation factor, and mean decrease accuracy, were used together with two data mining models, random forest and fuzzy logic, to model soil-landscape relationships. The results showed that mean decrease accuracy, as the most suitable variable selection approach, selected six of the 35 geomorphometric covariates. The random forest with mean decrease accuracy approach also had the highest accuracy at the subgroup level, with an overall accuracy and kappa index of 84 and 57 percent. Examination of the fuzzy approach results indicated that the kappa index and overall accuracy of this method were similar to those of the other three scenarios, and only a negligible difference in accuracy was observed at the soil family level. In general, the use of different variable selection approaches can increase the accuracy of digital soil maps. In addition, increasing the number of field observations and using other environmental variables that affect soil formation can be employed to predict soil classes that currently have low accuracy.
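The variance inflation factor screening mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard identity that the VIF of each covariate equals the corresponding diagonal element of the inverse correlation matrix, which for two covariates reduces to 1 / (1 - r^2). The function names and the two covariate columns `x1` and `x2` are hypothetical; the study itself worked in R, and no screening threshold is stated in the abstract, so none is applied here.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long covariate columns."""
    def centered(col):
        mean = sum(col) / len(col)
        return [v - mean for v in col]
    dev = [centered(c) for c in columns]
    norm = [sum(v * v for v in d) ** 0.5 for d in dev]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (raises on a singular matrix)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariates are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each covariate: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical covariate values at five sites (illustrative only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = variance_inflation_factors([x1, x2])
```

A covariate with a large VIF is largely explained by the others and is a candidate for removal; the cut-off used for that decision is a modeling choice.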

Keywords


Article title [English]

Efficiency of Different Feature Selection Methods in Digital Mapping of Subgroup and Soil Family Classes with Data Mining Algorithms

Authors [English]

  • S. Nazari 1
  • M. Rostaminia 1
  • Shamsollah Ayoubi 2
  • A. Rahmani 3
  • S.R. Mousavi 3
1 Ilam University
2 Isfahan University of Technology, College of Agriculture
3 Tehran University
Abstract [English]

Abstract
Background and objectives: Soil maps of high accuracy are a powerful tool for achieving sustainable land use in agriculture and natural resources. The present study was conducted in the Vargar lands of Abdanan County, Ilam Province, for digital mapping of soil classes at two taxonomic levels, subgroup and family, using random forest (RF) and fuzzy logic models.
Materials and methods: The study area covers 1027 ha and has a mean annual precipitation of 628.6 mm and a mean annual temperature of 22.6 °C. Three major physiographic units were observed: hilland, piedmont plain, and alluvial plain. The soil moisture and temperature regimes are ustic and hyperthermic, respectively, as calculated with the Newhall model in JNSM version 6.1. The locations of 44 soil profiles were determined with a random sampling pattern following standard soil survey procedures; the profiles were then dug and described, all genetic horizons were sampled, and the samples were transferred to the laboratory. Finally, all soil profiles were classified down to the family level according to Soil Taxonomy (2014). Geomorphometric covariates, as representatives of the soil-forming factors, were derived from a digital elevation model (ALOS PALSAR satellite, 2011) with 12.5 m resolution in SAGA GIS version 7.4. Three feature selection approaches, Boruta, variance inflation factor (VIF), and mean decrease accuracy (MDA), were combined with two data mining algorithms, random forest (RF) and fuzzy logic, to model soil-landscape relationships using the "randomForest" and "caret" packages in R 3.5.1 and SoLIM Solutions version 2015. A sample-based project was used to predict soil classes in the fuzzy logic modeling process. The observed profiles were split into two data sets, 80 percent (n=36) for calibration and 20 percent (n=8) for validation, based on the bootstrap sampling of the random forest algorithm. Internal validation of the random forest algorithm was based on the out-of-bag error percentage (OOB%). The best model was identified using overall accuracy (OA) and the kappa index, and user accuracy (UA) and producer accuracy (PA) were calculated for each individual class.
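The evaluation measures named in the methods (OA, kappa, UA, PA) can all be derived from a confusion matrix of observed versus predicted classes. The following is a minimal sketch under stated assumptions: the function name `accuracy_metrics` and the class labels are hypothetical, and the eight validation labels are invented for illustration rather than taken from the study.

```python
from collections import Counter

def accuracy_metrics(observed, predicted):
    """Return overall accuracy, Cohen's kappa, and per-class user/producer accuracy."""
    n = len(observed)
    classes = sorted(set(observed) | set(predicted))
    # Confusion counts keyed by (observed class, predicted class).
    counts = Counter(zip(observed, predicted))
    obs_totals = Counter(observed)
    pred_totals = Counter(predicted)

    correct = sum(counts[(c, c)] for c in classes)
    overall = correct / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(obs_totals[c] * pred_totals[c] for c in classes) / (n * n)
    kappa = (overall - expected) / (1 - expected) if expected < 1 else 0.0

    # User accuracy: correct / all sites mapped as the class.
    user = {c: counts[(c, c)] / pred_totals[c] if pred_totals[c] else None
            for c in classes}
    # Producer accuracy: correct / all sites observed as the class.
    producer = {c: counts[(c, c)] / obs_totals[c] if obs_totals[c] else None
                for c in classes}
    return overall, kappa, user, producer

# Hypothetical validation set of 8 profiles (labels are illustrative only).
observed = ["A", "A", "A", "A", "B", "B", "B", "C"]
predicted = ["A", "A", "A", "B", "B", "B", "A", "C"]
overall, kappa, user, producer = accuracy_metrics(observed, predicted)
```

With a validation set this small, a single misclassified profile shifts overall accuracy by 12.5 percentage points, which is worth keeping in mind when comparing methods.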
Results: The results showed that, of 40 geomorphometric covariates, six were selected by MDA as the best environmental covariates: terrain classification index for lowlands, annual insolation, topographic position index, upslope curvature, real surface area, and terrain surface convexity. At the subgroup level, the RF-MDA method, with an overall accuracy of 84% and a kappa index of 0.56, performed best compared with the other methods (RF-VIF, RF-BO, and Fuzzy-MDA), whose overall accuracies were 58%, 55%, and 50% and whose kappa indices were 0.3, 0.67, and 0.18, respectively. The out-of-bag errors (OOB%) for RF-MDA, RF-VIF, and RF-Boruta were 72.42%, 67.86%, and 82.76% at the subgroup level and 93.10%, 93.10%, and 86.21% at the family level, respectively. At the family level, however, there was little difference in accuracy among the methods, which gave similar results in modeling the soil classes. The results of the fuzzy approach showed that its kappa index and overall accuracy were similar to those of the other three scenarios at the soil family level, with only slight differences in accuracy. At the subgroup level, the kappa and overall accuracy of the fuzzy method were lower than those of the other scenarios. In contrast to RF modeling, the fuzzy approach represented continuous spatial variability by generating a fuzzy map for each soil class in the landscape. These results indicate that the random forest method is superior to the fuzzy method in mapping soil family and subgroup classes. According to the MDA sensitivity analysis, three geomorphometric covariates, terrain surface convexity (convexity), terrain classification index for lowlands (TCI_Low), and real surface area (Surface_Ar), had the highest importance for predicting soil classes at both taxonomic levels.
In terms of area on the final predicted soil maps, the Fine-silty, carbonatic, hyperthermic Typic Haplustepts family (32.70%) and the Typic Calciustolls subgroup (48.90%) covered the largest areas, whereas the Fine-silty, carbonatic, hyperthermic Typic Calciustolls family (0.18%) and the Typic Haplustepts subgroup (1.85%) covered the smallest areas on the family and subgroup maps, respectively.
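The idea behind ranking covariates by mean decrease accuracy can be sketched as a permutation test: shuffle one covariate at a time and record how much classification accuracy drops. This is a simplified illustration, not the study's procedure. A random forest computes MDA per tree on out-of-bag samples, whereas this sketch swaps in a toy nearest-centroid classifier and scores on the training data; all function names and the data are hypothetical.

```python
import random

def nearest_centroid_fit(X, y):
    """Toy classifier: mean covariate vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(sorted(centroids), key=dist)

def accuracy(centroids, X, y):
    hits = sum(nearest_centroid_predict(centroids, r) == t for r, t in zip(X, y))
    return hits / len(y)

def mean_decrease_accuracy(centroids, X, y, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling each covariate column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: column 0 separates the classes, column 1 does not.
X = [[0.0, 5.0], [1.0, 5.1], [2.0, 5.0], [1.0, 5.1],
     [9.0, 5.0], [10.0, 5.1], [11.0, 5.0], [10.0, 5.1]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
model = nearest_centroid_fit(X, y)
importances = mean_decrease_accuracy(model, X, y)
```

Covariates whose shuffling barely changes accuracy contribute little to the prediction, which is the rationale for keeping only the top-ranked ones.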
Conclusion: In general, using different variable selection approaches where soil classes have a relatively imbalanced abundance can increase the accuracy of digital soil mapping. Increasing the number of field observations and using other environmental variables that affect soil formation can also improve the prediction of soil classes that are currently mapped with low accuracy.

Keywords [English]

  • Soil mapping
  • Random forest
  • Fuzzy logic
  • Environmental covariates