Article Title [English]
Introduction: In water resources monitoring, groundwater quality is evaluated through detailed analysis of pollution data. The most fundamental step is to delineate hazardous zones accurately and to identify stations that are homogeneous in terms of pollution. Identifying homogeneous wells in this way also improves the monitoring itself. Because aquifer monitoring and quality evaluation involve large volumes of quality data, a clustering method is essential for identifying the homogeneous stations of a monitoring network and grouping them by pollution. In this study, to evaluate the quality of the Mashhad aquifer, clustering based on Euclidean distance and on an entropy criterion was examined. Cluster analysis groups a set of objects so that objects in the same group (a cluster) are more similar to each other, in some defined sense, than to objects in other groups. SNI, a combined entropy measure for clustering, is calculated by dividing the mutual information of two variables (here, pollution index values) by their joint entropy. These measures serve as the similarity (distance) criteria for clustering the monitoring stations.
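The SNI ratio described above (mutual information divided by joint entropy) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code. The equal-width binning in `discretize`, the default of 4 bins, the convention for two constant series, and the conversion to a distance as `1 - SNI` are all assumptions introduced here.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Equal-width binning (an assumed choice; the abstract does not say how
    nitrate values were discretized)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def sni(x, y):
    """Mutual information of x and y divided by their joint entropy."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 1.0  # both series constant: treated as identical (assumed convention)
    mutual_information = entropy(x) + entropy(y) - h_xy
    return mutual_information / h_xy


def sni_distance(x, y):
    """Assumed conversion of the similarity to a distance: 1 - SNI."""
    return 1.0 - sni(x, y)
```

With this definition, two identical discretized series give an SNI of 1 (distance 0), and two statistically independent series give an SNI of 0 (distance 1).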
Materials and Methods: First, nitrate data (as the pollution index) and electrical conductivity (EC) data (as a covariate) were collected, together with well locations, for 287 wells over the period 2002 to 2011. After outliers were identified, unobserved points were estimated by spatio-temporal Kriging, and the data were standardized, the clustering process was carried out. The similarity distance between wells was calculated using the Euclidean distance and entropy (SNI) criteria. This distance reflects characteristics such as well location (longitude and latitude) and the pollution index (nitrate). Once the distance from each well to every other well had been assembled into a distance matrix, hierarchical clustering was applied to the 287 monitoring stations (wells), and the optimal number of clusters was proposed. Finally, to compare the methods, homogeneity validation criteria based on linear moments (L-moments) were used. The whole process, including spatio-temporal Kriging, clustering, silhouette scoring and the homogeneity test, was performed in R (version 3.1.2). R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.
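The Euclidean branch of the workflow above (standardize, build a distance matrix, cluster hierarchically) can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the study's R implementation: z-score standardization and average linkage are assumed, since the abstract names neither, and the function names are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column (assumed form of the standardization step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


def euclidean_matrix(rows):
    """Square matrix of pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def hierarchical_clusters(dist, k):
    """Agglomerative clustering on a precomputed distance matrix, merging the
    two clusters with the smallest average pairwise distance until k remain.
    Average linkage is an assumption made for this sketch."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy wells: (longitude, latitude, nitrate). Not data from the study.
wells = [(59.50, 36.20, 10.0), (59.51, 36.21, 12.0),
         (59.90, 36.60, 80.0), (59.91, 36.61, 85.0)]
labels = hierarchical_clusters(euclidean_matrix(standardize(wells)), k=2)
```

Because `hierarchical_clusters` only needs a distance matrix, the same function could be fed an entropy-based matrix instead of the Euclidean one, which is how the two criteria become directly comparable.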
Results and Discussion: With 4 clusters, the silhouette score was 0.989 for the Euclidean distance criterion and 0.746 for the entropy (SNI) criterion, so both methods yielded an excellent structure with 4 clusters. Lower values of the homogeneity measures H1 and H2 indicate more homogeneous clusters, and on this basis the results show the superiority of clustering based on the entropy (SNI) criterion. However, clustering with Euclidean distance appears more homogeneous geographically, whereas the entropy (SNI) measure performs better with respect to the variability of the nitrate pollution index. To demonstrate the effect of the nitrate pollution index on the entropy-based clusters, the nitrate index was removed; the results were then driven by the location index. Likewise, removing the location index from the clustering process showed that, in clusters based on Euclidean distance, the influence of nitrate values is much smaller. Compared with Euclidean distance, the entropy criterion, which is based on the probability of occurrence of nitrate values, also performed better.
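For readers unfamiliar with the silhouette score reported above, the following Python sketch shows how the mean silhouette width is computed from a distance matrix and a set of cluster labels. It is an illustration only, not the R routine used in the study; scoring singleton clusters as 0 is a common convention adopted here as an assumption.

```python
def silhouette_score(dist, labels):
    """Mean silhouette width for a precomputed distance matrix and labels."""
    n = len(labels)
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster (assumed convention)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == g) / labels.count(g)
            for g in groups if g != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Hypothetical one-dimensional toy points, not data from the study.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_score(dist, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_score(dist, [0, 1, 0, 1])   # clusters that mix distant points
```

Scores range from -1 to 1, and values close to 1 mean that points lie much closer to their own cluster than to the nearest other cluster. In the toy example the natural grouping scores close to 0.9, while the mixed grouping scores below zero.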
Conclusion: The results showed that the best clustering structure is obtained with 4 homogeneous clusters. Considering the distribution of wells and the average of the linear-moment measures, the entropy-based method is superior to the Euclidean distance method. Nitrate variability also played a significant role in identifying homogeneous stations with the entropy criterion. Entropy-based clustering can therefore identify wells that are homogeneous in terms of the variability of the nitrate pollution index, which is an important and effective step in monitoring the Mashhad aquifer and evaluating its quality. For evaluating and optimizing the monitoring network, the results also underline the need for network optimization and for a deliberate choice of optimization approach. Fewer clusters in the monitoring network lead to greater homogeneity, so an optimization approach that reduces rather than expands the network is justified. In that case monitoring costs, including drilling, equipment, sampling, maintenance and laboratory analysis, are also reduced.