Diluvian clustering - A fast, effective algorithm for clustering compositional and other data

Published

Author(s)

Nicholas Ritchie

Abstract

Diluvian clustering is an unsupervised, grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify both compact and diffuse clusters, and both clusters with many members and clusters with few. Diluvian clustering differs fundamentally from most algorithms previously applied to compositional data in that its implementation does not depend upon a distance metric. Eliminating this dependence makes it possible to derive reasonable clusters for populations with disparate variances, as occur in many real-world compositional data sets. The algorithm is computationally efficient: while the worst case scales as O(N^2), typical cases are closer to O(N), where N is the number of discrete data points. On an ordinary 2014-vintage computer, a typical 20,000-particle data set can be clustered in a fraction of a second.
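
The abstract does not describe the algorithm's steps, so the following is not the Diluvian method. It is only a minimal Python sketch of the general grid-based idea, assuming two-dimensional points: each point is binned into a grid cell in a single pass, and adjacent occupied cells are merged by flood fill, so no pairwise point-to-point distances are computed and the work grows roughly linearly with N. The function name `grid_cluster`, the fixed `cell_size`, and the `min_count` threshold are all hypothetical choices made for illustration; the published algorithm need not use any of them.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_count=1):
    """Label 2-D points by connected groups of occupied grid cells.

    Generic illustration only; NOT the published Diluvian algorithm.
    Returns one integer label per point, with -1 for unassigned points.
    """
    # Bin every point into an integer grid cell in one pass over the data.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # A cell counts as occupied only if it holds at least min_count points.
    occupied = {c for c, members in cells.items() if len(members) >= min_count}

    labels = [-1] * len(points)  # -1 marks points in cells below min_count
    seen = set()
    next_label = 0
    for start in occupied:
        if start in seen:
            continue
        # Breadth-first flood fill across the 8 neighbouring cells.
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = next_label
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in occupied and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        next_label += 1
    return labels
```

Calling `grid_cluster(points, cell_size=1.0)` on a list of `(x, y)` tuples returns a label per point; the label numbers themselves are arbitrary, and only the grouping is meaningful. The fixed cell size is a simplification that imposes a single length scale, which is exactly the kind of assumption the abstract says Diluvian clustering avoids.
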
Citation
Microscopy and Microanalysis

Keywords

Data mining, Clustering, Electron probe microanalysis, Particle, Spectrum image

Citation

Ritchie, N. (2015), Diluvian clustering - A fast, effective algorithm for clustering compositional and other data, Microscopy and Microanalysis (Accessed April 17, 2024)
Created August 25, 2015, Updated August 26, 2022