I am running into out-of-memory issues in several KNIME learners. I am loading an 11 GB file with about 30 features and two classes. The "disk limited" paradigm seems to work fine for reading and splitting the data set. The problems occur when I attempt to train a model. I am required to train on 70% of the data, so it is a large data set. So far, random forest has done well. The NN, SVM, Bayes, and regression learners have all had memory issues.

- I have KNIME running with 15 GB of RAM, but it seems to top that out and crash the different learners.
- I am running them separately, i.e. not in parallel.
- I have also attempted to tell the learners to always write to disk, but that did not change the memory behaviour.
- Yes, I have done several Google searches.

I am evaluating KNIME against commercial offerings that can train and deploy models on this type of large data set. My attempts with KNIME so far have met with limited success, so I thought I'd ask for input. Other than reducing the training size (this is the main point of the test), are there any suggestions? Are any of the learners more efficient? Any thoughts on what would happen if I moved to a machine with more RAM? Thanks!

My knime.ini has the following changes from default:

-Xmx15000m
-XX:MaxPermSize=256m
-Dknime.expert.mode=true

out of memory with mining nodes

mlauber71 September 21, 2019, 10:08am 3

Performance issues are difficult to track down, so here is my collection of discussions and tips on what to do about them.

Could you be a little more specific about the kind and size of data you are dealing with, and especially what kind of model you are trying to use? From my experience, the Weka nodes are especially ‘hungry’.

In general, if your machine is not powerful enough, there is little choice but to upgrade the machine or reduce the data set. The latter can be done in a meaningful way through dimensionality reduction. Just to give a few hints:

- remove highly correlated variables (they basically contain the same information)
- remove variables with little variance
- reduce dimensions through PCA (principal component analysis) or a similar technique
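
To make the first two hints concrete, here is a minimal sketch in plain Python of how such a filter could work outside of KNIME. It is only an illustration of the logic, not how any KNIME node is implemented. The function names, the toy columns, and the thresholds (`min_variance`, `max_abs_corr`) are all assumptions picked for the example, and PCA is not covered here.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_columns(columns, min_variance=1e-3, max_abs_corr=0.95):
    """Return the names of the columns worth keeping.

    columns: dict mapping a column name to a list of floats.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for name, values in columns.items():
        # Hint 2: skip columns that barely vary.
        if variance(values) < min_variance:
            continue
        # Hint 1: skip columns that nearly duplicate one already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy data: "b" duplicates "a", "c" is constant.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [7.0, 7.0, 7.0, 7.0, 7.0],
    "d": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = select_columns(columns)
print(kept)
```

With the toy data, only "a" and "d" survive: "b" carries the same information as "a", and "c" has no variance. Note that the variance threshold depends on the scale of each column, so on real data you would probably normalize first.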

KNIME performance

Process 900+ CSV files
