I needed of a spatial clustering algorithm that could cope with extremely large data
sets. I wasn't able to find an existing algorithm that suited my needs so I developed a
new one. It is called GVM for *Greedy Variance Minimization*.

The GVM algorithm has the following characteristics:

- The algorithm places no restrictions on the number of items (N), dimensions (D) and clusters (C).
- The algorithm scales linearly with the number items clustered. Specifically the worst case time complexity of the algorithm is N⋅D⋅C log C;
- Memory usage is constant and independent of the number of items clustered.
Specifically, the space complexity is C
^{2}+ D⋅C.

See the navigation for information about the algorithm and its implementation.

- There is an open source Java implementation, written by me.
- Craig de Stigter has also produced a Python implementation called pyGVM.

I'm currently using JProfiler to investigate possible optimizations.