The workflow is very simple: CSV Reader to Partitioning, then 80% goes to SMOTE and on to the Random Forest Learner. All data transformation was done before KNIME and saved in the CSV.
The file is 282MB, with 920,000 rows and 50 columns.
My estimate after the 80% partition and SMOTE is 1.5 million rows.
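As a rough sanity check on that estimate, the arithmetic can be sketched in a few lines of Python. The row, column, and post-SMOTE figures come from the post; the 8 bytes per cell is purely an assumption (every column treated as a 64-bit double), so the result is a lower bound on the raw numeric payload and not a measurement of what KNIME actually allocates.

```python
# Figures taken from the post.
total_rows = 920_000
columns = 50
rows_after_smote = 1_500_000  # the poster's own estimate

# 80% training partition, using integer arithmetic to avoid float rounding.
train_rows = total_rows * 80 // 100

# Rows SMOTE would have to synthesize to reach the estimate.
synthetic_rows = rows_after_smote - train_rows

# ASSUMPTION: every cell is stored as an 8-byte double.
BYTES_PER_VALUE = 8
raw_bytes = rows_after_smote * columns * BYTES_PER_VALUE

print(f"training rows:   {train_rows:,}")       # 736,000
print(f"synthetic rows:  {synthetic_rows:,}")   # 764,000
print(f"raw payload:     {raw_bytes / 1e9:.1f} GB")  # 0.6 GB
```

Under that assumption the table itself is well below the heap sizes mentioned here, which hints that the overhead of the nodes' working structures matters more than the size of the data alone.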
The Random Forest Learner runs up to 10% in 30 seconds and then nothing. 5 minutes later I get the out-of-memory error, even after telling the node to write to the hard drive instead of caching in memory. Looking at Task Manager, KNIME runs to about 9GB.
I changed the knime.ini file to 12GB and set all the nodes to write to disk, and obviously it’s running a lot slower. SMOTE looks like it will take an hour to run, and I’m not optimistic the Learner will run.
Do the SMOTE in a separate workflow and store the results. Or try without SMOTE, which needs lots of resources. Or, if you need balancing, try the H2O Random Forest, which has a balancing option included.
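To illustrate what balancing without SMOTE can look like, here is a minimal sketch of random undersampling in plain Python. It is a swapped-in technique for illustration only, not what the H2O node does internally, and the function name `undersample_indices` and the toy labels are hypothetical. The idea is to keep every class at the size of the rarest class, so the training table shrinks instead of growing.

```python
import random
from collections import defaultdict


def undersample_indices(labels, seed=0):
    """Return sorted row indices with every class cut down to the rarest class size."""
    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the smallest class decides how many rows each class keeps.
    target = min(len(rows) for rows in by_class.values())

    # Sample without replacement from each class, reproducibly.
    rng = random.Random(seed)
    kept = []
    for rows in by_class.values():
        kept.extend(rng.sample(rows, target))
    return sorted(kept)


# Hypothetical toy data: 8 majority rows and 2 minority rows.
labels = ["no"] * 8 + ["yes"] * 2
kept = undersample_indices(labels)
balanced = [labels[i] for i in kept]
print(len(kept), balanced.count("no"), balanced.count("yes"))
```

The trade-off is that majority-class rows are thrown away, so this only makes sense when the minority class is still large enough to train on.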