Big data clustering

Hi,

Are there any KNIME nodes that implement clustering on big data: 1 to 30 million rows, with perhaps 100 to 200 attributes? Both unsupervised and supervised methods would be of interest.

Obviously, any distance-matrix-based methods probably aren't going to work, even with data cached to disk.
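
To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes each distance is stored as an 8-byte double and that only one value per pair of rows is kept (the upper triangle of the matrix). The function name is made up for illustration.

```python
def condensed_matrix_bytes(n_rows: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store one distance per unordered pair of rows.

    Assumption: 8-byte doubles and upper-triangle storage only.
    """
    pairs = n_rows * (n_rows - 1) // 2
    return pairs * bytes_per_value


for n in (1_000_000, 30_000_000):
    size_tb = condensed_matrix_bytes(n) / 1e12  # decimal terabytes
    print(f"{n:>11,} rows -> {size_tb:,.0f} TB")
```

Under those assumptions this comes to roughly 4 TB for 1 million rows and about 3,600 TB (3.6 PB) for 30 million rows.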

Cheers,

Steve.

Hi Steve,

The upcoming version, KNIME 2.10 (release today :)), provides a new distance measurement framework (see [1] for details). One of the new features is that some clustering nodes, such as k-Medoids and Hierarchical Clustering, no longer require a pre-computed distance matrix, so disk space should not be the limit. However, the runtime of those algorithms is partly cubic in the number of rows. We are also working on a DBSCAN node using the new features.
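
To illustrate the general idea of clustering without a pre-computed matrix, here is a minimal Python sketch of k-medoids that computes every distance on demand. It is not the KNIME implementation, only an illustration under some assumptions: Euclidean distance, distinct points, and a simple alternating assign/update scheme. The names `euclidean`, `assign`, `k_medoids` and the `init` parameter are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Distance computed on demand; nothing is cached or stored."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, init=None, max_iter=20, seed=0):
    """Alternating k-medoids; assumes all points are distinct."""
    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        new_medoids = []
        for members in clusters.values():
            # Update step: the member with the smallest total distance to
            # the rest of its cluster becomes the new medoid. This is
            # quadratic in the cluster size on every iteration.
            def total_distance(c, members=members):
                return sum(euclidean(points[c], points[j]) for j in members)

            new_medoids.append(min(members, key=total_distance))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids)


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (50.0, 0.0), (51.0, 0.0), (52.0, 0.0)]
print(k_medoids(points, k=2, init=(0, 1)))
```

Memory use stays proportional to the data itself, but every distance is recomputed each time it is needed, so runtime rather than disk space becomes the bottleneck.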

Regards,

Marcel.

[1] http://tech.knime.org/wiki/distance-measure

As soon as I can download KNIME 2.10, I'll give it a go.

Thanks,

Steve.