Document Type: Research Article

Authors

Ferdowsi University of Mashhad

Abstract

Introduction: In water resources monitoring, groundwater quality is evaluated through detailed analysis of pollution data. The most fundamental analyses are delineating hazardous zones accurately and identifying stations that are homogeneous in terms of pollution. For quality evaluation, monitoring can be improved by identifying wells that are homogeneous in terms of pollution. With large volumes of quality data, aquifer monitoring and quality evaluation therefore need a clustering method that identifies the homogeneous stations of the monitoring network and groups them according to pollution. In this study, clustering based on the Euclidean distance and on an entropy criterion was examined in order to evaluate the quality of the Mashhad aquifer. Cluster analysis is the task of grouping a set of objects so that objects in the same group (a cluster) are more similar, in some sense, to each other than to objects in other groups. SNI, a combined entropy measure for clustering, is calculated by dividing the mutual information of two variables (here, pollution index values) by their joint entropy. These measures serve as similarity (distance) criteria for clustering the monitoring stations.
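The SNI measure described above can be written out explicitly. The following is a sketch in standard information-theoretic notation; the symbols X and Y (the discretized pollution index series at two wells) and p (their empirical probabilities) are notation assumed here, not taken from the abstract. The conversion of SNI into a distance in the last line is likewise an assumption, since the abstract does not state which conversion was used.

```latex
\begin{align*}
H(X)   &= -\sum_{x} p(x)\,\log p(x) \\
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y)\,\log p(x,y) \\
I(X;Y) &= H(X) + H(Y) - H(X,Y) \\
\mathrm{SNI}(X,Y) &= \frac{I(X;Y)}{H(X,Y)}, \qquad 0 \le \mathrm{SNI}(X,Y) \le 1 \\
d(X,Y) &= 1 - \mathrm{SNI}(X,Y) \qquad \text{(assumed conversion to a distance)}
\end{align*}
```

Under this definition, two wells whose discretized series determine each other completely have SNI equal to 1, and two wells with independent series have SNI equal to 0.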
Materials and Methods: First, nitrate data (as the pollution index) and electrical conductivity (EC) data (as a covariate) were collected, together with well locations, for 287 wells over the period 2002 to 2011. After outliers were identified, unobserved points were estimated by spatio-temporal kriging, and the data were standardized, the clustering process was carried out. The similarity distance between wells was calculated using the Euclidean distance and the entropy (SNI) criteria. This distance is determined by characteristics such as well location (longitude and latitude) and the pollution index (nitrate). Once the distance of each well to every other well had been assembled into a distance matrix, hierarchical clustering was applied to the 287 monitoring stations (wells). The optimal number of clusters was then proposed. Finally, to compare the methods, homogeneity validation criteria based on linear moments (L-moments) were used. The whole procedure, including spatio-temporal kriging, clustering, silhouette scoring, and the homogeneity test, was performed in R (version 3.1.2). R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.
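The entropy-based distance between two wells can be illustrated with a minimal sketch. It is written in Python with the standard library only, purely for illustration; the study itself used R, and this is not the authors' code. The function names (`discretize`, `entropy`, `sni`, `sni_distance`), the equal-width binning, and the conversion of SNI to a distance as 1 minus SNI are all assumptions made for this example.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1.

    Equal-width binning is an assumption of this sketch; the abstract
    does not say how nitrate values were discretized.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sni(x, y):
    """Mutual information of x and y divided by their joint entropy."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    h_x = entropy(x)
    h_y = entropy(y)
    h_xy = entropy(list(zip(x, y)))  # joint entropy of the paired labels
    if h_xy == 0.0:
        # Both series are constant; treating them as identical is a
        # convention chosen for this sketch.
        return 1.0
    return (h_x + h_y - h_xy) / h_xy


def sni_distance(x, y):
    """Assumed conversion of the SNI similarity into a distance."""
    return 1.0 - sni(x, y)
```

Applying `sni_distance` to every pair of wells, after discretizing each well's nitrate series, would give a symmetric distance matrix that a hierarchical clustering routine can consume in place of a Euclidean distance matrix.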
Results and Discussion: With 4 clusters, the silhouette score was 0.989 for the Euclidean distance criterion and 0.746 for the entropy (SNI) criterion. Both methods yielded an excellent structure with 4 clusters. The smaller the values of the homogeneity measures H1 and H2, the more homogeneous the clusters, and on this basis the results show the superiority of clustering based on the entropy (SNI) criterion. However, the results suggest that clustering with the Euclidean distance is more homogeneous geographically, whereas the entropy (SNI) measure performs better in terms of the variability of the nitrate pollution index. To verify the effect of the nitrate pollution index on the entropy-based clusters, the nitrate index was removed, after which the results were governed by the location index. Likewise, removing the location indices from the clustering process showed that nitrate values have much less influence on the clusters formed with the Euclidean distance criterion. Compared with the Euclidean distance, the entropy criterion, which is based on the probability of occurrence of nitrate values, also performed better.
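For readers unfamiliar with the silhouette score reported above, the sketch below shows how the mean silhouette width can be computed from a precomputed distance matrix and a cluster assignment. It is a standard-library Python illustration under stated assumptions, not the R code used in the study; the function name `silhouette_score` and the toy data in the usage note are hypothetical.

```python
def silhouette_score(dist, labels):
    """Mean silhouette width for a symmetric distance matrix and labels.

    dist[i][j] is the distance between stations i and j;
    labels[i] is the cluster assigned to station i.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)
```

As a usage example, four points at positions 0, 1, 10, and 11 on a line, with `dist[i][j]` set to the absolute difference of positions, score close to 0.9 when labelled `[0, 0, 1, 1]` and score negative when labelled `[0, 1, 0, 1]`. Values near 1 indicate compact, well-separated clusters, which is the sense in which both reported scores point to a strong 4-cluster structure.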
Conclusion: The results showed that the best clustering structure is obtained with 4 homogeneous clusters. Considering the distribution of wells and the average of the linear-moment homogeneity measures, the entropy-based method is superior to the Euclidean distance method. Nitrate variability also played a significant role in identifying homogeneous stations with the entropy criterion. Entropy-based clustering can therefore identify wells that are homogeneous in terms of the variability of the nitrate pollution index, which would be an important and effective step in monitoring the Mashhad aquifer and evaluating its quality. For evaluating and optimizing the monitoring network, the necessity of network optimization and the choice of approach should also be emphasized. The results indicate that fewer clusters in the monitoring network lead to greater homogeneity, so an optimization approach that reduces the network rather than expanding it is justified. In that case, monitoring costs, including drilling, equipment, sampling, maintenance, and laboratory analysis, are also reduced.

Keywords
