MDS for K means clustering

smsali97 · May 10, 2019, 5:25pm

Hi there! I wanted to run k-means clustering on a dataset that I'm reading from a CSV file. After the clustering, I wanted to project the result onto a 2D scatter plot, so I used MDS. The problem is that it's taking forever to complete, and it constantly says 'Executing… Caching Row #42217 Start Training…' (coincidentally, that's the last row), and this shouldn't take that long.
Any idea how I am supposed to do this?

jonfuller · May 14, 2019, 9:54am

Hi there,

My first recommendation would be to increase the amount of RAM available to the Analytics Platform. You can do that by editing the knime.ini: https://www.knime.com/faq#q20
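As a rough illustration of that edit: the Java heap limit is normally set by an `-Xmx` line in knime.ini, somewhere below the `-vmargs` line. The value shown here is only an assumed example; pick one that fits the physical RAM of your machine, and restart KNIME afterwards.

```
-Xmx8g
```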

It sounds like the message means that the data has been loaded and is ready for training. I just tested this using the Data Generator's default dataset of 5400 data points and noticed that the node uses a reasonable amount of memory (~1.5 GB); after a few seconds, the node's percentage indicator starts to increase slowly. I then increased the number of data points to 42000 and noted that it took more than a few minutes for the percentage to start increasing.

For comparison, I also tested the scikit-learn implementation, and that also took a long time on the large dataset. So my guess is that you will need to either wait or use a smaller dataset (for example, by using the Row Sampling node).
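To make the sampling idea concrete, here is a minimal scikit-learn sketch (not the KNIME workflow itself). MDS works on all pairwise distances, so a dense float64 distance matrix for 42217 rows would need roughly 42217² × 8 bytes ≈ 14 GB, which is why a subsample helps. The sketch clusters the full dataset with k-means and runs MDS only on a random subset of rows. The function name `cluster_then_project`, the synthetic `make_blobs` data, and the values of `n_clusters` and `sample_size` are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS


def cluster_then_project(X, n_clusters=3, sample_size=300, seed=0):
    """Cluster all rows with k-means, then run MDS on a random subsample only."""
    # k-means scales well, so it can run on the full dataset.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)

    # MDS needs all pairwise distances, so restrict it to a subset of rows.
    rng = np.random.default_rng(seed)
    n = min(sample_size, X.shape[0])
    idx = rng.choice(X.shape[0], size=n, replace=False)

    # Project the sampled rows to 2D for plotting.
    coords = MDS(n_components=2, random_state=seed).fit_transform(X[idx])
    return idx, labels[idx], coords


# Synthetic stand-in for the CSV data (hypothetical sizes).
X, _ = make_blobs(n_samples=2000, n_features=5, centers=3, random_state=0)
idx, sample_labels, coords = cluster_then_project(X)
```

The returned `coords` can be passed to any scatter plot, with `sample_labels` as the colour, so the plot still reflects clusters computed on the full data.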

Best,

Jon
