My dataset has 183,000 columns and 600 rows. The data is imported from 15 different Excel files through the Excel Reader node. I have increased the RAM that KNIME can use to 14000 MB, which is enough to hold all the data. Now I would like to implement different models and do further work with the data. The problem I have is that it takes a very long time to open the configuration dialog of any node. How can I solve this issue?
Thank you so much in advance!!!
I need to correct myself: it's only the Random Forest Learner node dialog (considering the nodes I have in my workflow at the moment).
Hi @helfortuny -
It sounds like the problem is connected to how wide your dataset is. 183,000 columns is quite a lot, especially relative to your 600 rows. The Random Forest Learner has to load all 183K columns into its dialog so you can make selections from them, hence your problem.
I would suggest that some dimensionality reduction is in order here. What type of dataset are you dealing with? Is this perhaps a Document Vector matrix for text analysis, or something else?
Most of the data are audio signals.
Ah, OK, thanks. The fundamental question is: do you really need all 183K features? Keeping all of them affects not only the configuration of individual nodes, but also the overall performance of the model, as well as its interpretability.
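Within KNIME you could experiment with nodes such as Low Variance Filter and PCA for this. As an illustration of the underlying idea, here is a minimal numpy sketch (not the actual workflow; the matrix, sizes, and thresholds are made up for the example): first drop near-constant columns, then project the survivors onto a handful of principal components via SVD.

```python
import numpy as np

# Hypothetical stand-in for the real table: 60 rows x 1000 columns
# (the actual workflow has 600 rows x 183,000 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))
X[:, 500:] = 0.0  # simulate many constant (zero-variance) columns

# Step 1: drop near-constant columns before any learner sees them
# (this is what a low-variance filter does).
variances = X.var(axis=0)
X_kept = X[:, variances > 1e-8]

# Step 2: PCA via SVD -- keep only the top k components.
k = 20
Xc = X_kept - X_kept.mean(axis=0)   # center the columns
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T           # project onto the top k directions

print(X_kept.shape)     # (60, 500)
print(X_reduced.shape)  # (60, 20)
```

With the feature count reduced like this, the learner dialog only has to list a few hundred columns instead of 183K, and with 600 rows a model trained on 20-50 informative features will usually generalize better than one trained on 183K raw ones.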
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.