My dataset has 3000 documents with approx. 7000 features. For classification I used the Weka SMO (3.7) node and the Weka Predictor. The model is created fine, but when I run the Predictor it always fails with the following error message:
ERROR Weka Predictor (3.7) 0:536 Internal error: Could not load settings
ERROR Weka Predictor (3.7) 0:536 Unable to clone input data at port 1 (Weka model): null
When I reduce the feature space by removing the unigrams it runs through. Are 7000 features just too many? I tried it on KNIME 2.11.2, 2.12.1 and 3.0.0.
Thanks in advance!
I tested it on my computer as well and it worked 'fine' with 3000 rows x 7000 columns. Hence, simply increasing the memory you allow KNIME to use might help (https://tech.knime.org/faq#q4_2).
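For reference, the heap limit is normally set through the -Xmx line in the knime.ini file in the KNIME installation folder, below the -vmargs line. The 4g value here is only an illustrative assumption; pick whatever your machine can spare and restart KNIME afterwards.

```ini
-vmargs
-Xmx4g
```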
However, the actual reason why the Weka Predictor consumes that much memory is that it sometimes performs an evaluation of the model and therefore copies all the data into memory, every time. Not very nice.
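To get a feeling for what one such copy costs, here is a rough back-of-the-envelope sketch. It assumes a dense layout with one 8-byte double per cell and ignores all per-object overhead, so the real footprint inside KNIME/Weka will differ (a sparse representation would be smaller). The function name is made up for this example.

```python
def dense_copy_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for one dense rows x cols table of fixed-size values.

    Assumption: every cell is stored as an 8-byte double; object and
    bookkeeping overhead is deliberately ignored.
    """
    return rows * cols * bytes_per_value


# Table size from this thread: 3000 rows x 7000 columns.
one_copy = dense_copy_bytes(3000, 7000)
print(f"{one_copy / 1024**2:.1f} MiB per dense copy")  # prints "160.2 MiB per dense copy"
```

So every additional in-memory copy of the table adds roughly that much on top of the original, which adds up quickly against a small default heap.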
With the next release, 3.1, in December, this problem will be fixed: an option to turn this evaluation on or off will be available. Running the Weka Predictor without evaluation will then speed up the prediction considerably and consume far less memory.