MDS for K means clustering


Hi there! I wanted to run K-means clustering on a dataset that I'm reading from a CSV. After the clustering, I wanted to project the result onto a 2D scatterplot, so I used MDS. The thing is, it's taking forever to complete and it keeps saying 'Executing… Caching Row #42217 Start Training…' (coincidentally, that's the last row), and this shouldn't take that long.
Any idea how I am supposed to do this?

Hi there,

My first recommendation would be to increase the amount of RAM available to the Analytics Platform. You can do that by editing the -Xmx entry (the maximum Java heap size) in knime.ini and then restarting the application.

It sounds like the message means that the data has been loaded and is ready for training. I just tested this using the Data Generator node's default dataset of 5400 data points and noticed that the MDS node uses a reasonable amount of memory (~1.5 GB). After a few seconds, the node's percentage indicator starts to increase slowly. I then increased the number of data points to 42000, and it took more than a few minutes for the percentage to start increasing.
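To give a rough feel for why the jump from 5400 to about 42000 rows hurts so much, here is a small back-of-the-envelope sketch. It assumes the implementation keeps a dense matrix of all pairwise distances as 64-bit floats, which is what scikit-learn's MDS does for Euclidean input; whether the KNIME node stores things the same way is an assumption on my part. The helper name `distance_matrix_bytes` is made up for this example.

```python
def distance_matrix_bytes(n_rows, bytes_per_value=8):
    """Size in bytes of a dense n_rows x n_rows matrix (assumes 64-bit floats)."""
    return n_rows * n_rows * bytes_per_value


small, large = 5400, 42217  # row counts mentioned in this thread

for n in (small, large):
    print(f"{n} rows -> {distance_matrix_bytes(n) / 1e9:.2f} GB")

# How much more pairwise work the larger dataset implies.
growth = distance_matrix_bytes(large) / distance_matrix_bytes(small)
print(f"about {growth:.0f}x more pairwise distances")
```

So roughly 8 times more rows means roughly 61 times more pairwise distances, which is why the runtime does not grow in step with the row count.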

For comparison, I also tested the scikit-learn implementation, and that also took a long time on the large dataset. So my guess is that you will need to either wait or use a smaller dataset (for example, by using the Row Sampling node).
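If you want to try the sampling idea in scikit-learn, a minimal sketch could look like the one below. It is not the exact test I ran: the data is random stand-in data instead of your CSV, and the 2000 rows, 5 columns, 3 clusters and sample size of 300 are all placeholder values picked for illustration. The idea is to cluster on every row but only run MDS on a random subset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

# Stand-in for the CSV data (hypothetical: 2000 rows, 5 numeric columns).
X = rng.normal(size=(2000, 5))

# Cluster on every row; K-means copes well with many rows.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Project only a random subset, since MDS works on all pairwise distances.
sample_idx = rng.choice(len(X), size=300, replace=False)
embedding = MDS(n_components=2, random_state=0).fit_transform(X[sample_idx])

# Cluster labels for the sampled rows, e.g. to colour the scatterplot.
sample_labels = labels[sample_idx]
```

You can then plot `embedding[:, 0]` against `embedding[:, 1]` and colour the points by `sample_labels`. Increase the sample size step by step until the runtime stops being acceptable.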
