Experimental comparison of clustering algorithms in the problem of lightning data grouping
Abstract. The authors present the results of an experimental comparison of cluster analysis of thunderstorm data using the k-means, dbscan and hierarchical agglomerative algorithms, where the single-linkage (nearest neighbor), complete-linkage and average-linkage methods and the Ward method are used to calculate the intercluster distance. The influence of the normalization parameters on the number of clusters that each algorithm finds in the test sample is estimated. The test data were the registration times and coordinates of lightning discharges recorded by the World Wide Lightning Location Network (WWLLN). The clustering solutions were built with the NbClust, dbscan and fpc cluster analysis packages for the R language. The article shows that the choice of normalization parameter values has a significant effect on the number of clusters that the hierarchical clustering algorithms find in the sample (especially the nearest neighbor method). The choice of normalization parameters has little or no effect on the results of clustering lightning discharges with the k-means and dbscan algorithms. The best agreement with expert judgment was obtained for the dbscan algorithm with normalization parameters corresponding to a thunderstorm convective cell with a linear size of 100 km and a time period of 30 minutes to one hour.
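The normalization step described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' code: the study itself used the NbClust, dbscan and fpc packages. The sketch assumes that coordinates are already projected to kilometres, that normalization means dividing each coordinate by a spatial scale and each time by a temporal scale, and that the sample strokes and the `eps` and `min_pts` values are hypothetical. The 100 km and 30 minute scales are taken from the abstract.

```python
import math

def normalize(events, space_scale_km=100.0, time_scale_min=30.0):
    """Scale (x_km, y_km, t_min) so that one unit equals one cell size or one time window."""
    return [(x / space_scale_km, y / space_scale_km, t / time_scale_min)
            for x, y, t in events]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point, -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical strokes: (x in km, y in km, time in minutes).
strokes = [
    (0, 0, 0), (10, 5, 5), (20, 10, 12), (15, -5, 20),   # first storm cell
    (500, 500, 3), (510, 495, 10), (505, 510, 18),       # second storm cell
    (1000, 0, 200),                                      # isolated stroke
]

# eps and min_pts are illustrative choices, not values from the study.
labels = dbscan(normalize(strokes), eps=1.0, min_pts=3)
print(labels)
```

With this scaling, an `eps` of 1.0 means that two strokes are neighbors when their combined separation is within roughly one convective cell (100 km) and one time window (30 minutes), so changing the scales changes which strokes count as neighbors.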
Keywords: cluster validity, hierarchical algorithms, dbscan, k-means, clustering algorithms, data mining, average silhouette width (asw), data normalization, lightning, WWLLN
Article received: 23-01-2018
This article is written in Russian. The full text is available in Russian.