Diluvian clustering - A fast, effective algorithm for clustering compositional and other data
Published
Author(s)
Nicholas Ritchie
Abstract
Diluvian clustering is an unsupervised, grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm can identify both compact and diffuse clusters, and both clusters with many members and clusters with few. Unlike most algorithms previously applied to compositional data, its implementation does not depend on a distance metric. Eliminating that dependence makes it possible to derive reasonable clusters for populations with disparate variances, which are common in real-world compositional data sets. The algorithm is computationally efficient: while the worst case scales as O(N^2), typical cases are closer to O(N), where N is the number of discrete data points. On a typical 2014-vintage computer, a typical 20,000-particle data set can be clustered in a fraction of a second.
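The abstract does not spell out the steps of Diluvian clustering, so the following is only a generic illustration of what "grid-based" clustering without pairwise distances can look like, not the published algorithm. It is a minimal sketch under stated assumptions: points are binned into fixed-size cells in a single pass, and occupied cells that share a face are merged by flood fill. The names `grid_cluster`, `cell_size`, and `min_count` are hypothetical and do not come from the paper.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_count=1):
    """Generic grid-based clustering sketch (NOT the published Diluvian method).

    Returns one label per point; -1 marks points in cells below min_count.
    """
    # Bin every point into an integer grid cell in a single O(N) pass.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # Cells with too few members are treated as noise (assumed rule).
    dense = {key for key, members in cells.items() if len(members) >= min_count}

    # Flood-fill over face-adjacent dense cells; no point-to-point distances.
    cell_label = {}
    next_label = 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbor = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if neighbor in dense and neighbor not in cell_label:
                        cell_label[neighbor] = next_label
                        queue.append(neighbor)
        next_label += 1

    # Copy each cell's label to the points it contains.
    labels = [-1] * len(points)
    for key, label in cell_label.items():
        for idx in cells[key]:
            labels[idx] = label
    return labels


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (5.5, 5.5), (5.7, 5.9), (9.0, 0.0)]
labels = grid_cluster(points, cell_size=1.0)
```

In this sketch the cost is dominated by the binning pass and by visiting each occupied cell once, which is why grid-based methods can approach linear behavior on typical data. How the published algorithm chooses its grid and merges cells is not stated in the abstract and is not reproduced here.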
Citation
Microscopy and Microanalysis
Pub Type
Journals
Keywords
Data mining, Clustering, Electron probe microanalysis, Particle, Spectrum image
Ritchie, N. (2015), Diluvian clustering - A fast, effective algorithm for clustering compositional and other data, Microscopy and Microanalysis (Accessed April 28, 2024)