Hi,
I am fairly new to KNIME and so I’m sorry if my problem looks/is trivial, but I really can’t find a solution.
After successfully working through several of the Analytics EXAMPLES, where I would run the example first and then adapt it to my own data files, I got to one where I am getting an “Execute failed: Java heap space” message at the very last node, a Scorer. (I started with the basic “Learning with a Neural Network” example, and other than adding a Column Filter everything else is the same.)
I am using KNIME version 4.1.3
My laptop has 24GB of RAM and a 4-core CPU, and at setup about 12GB were allocated to the heap space.
I am using a CSV file with 477,000 rows and 14 columns - nothing particularly big, then.
So, to solve the problem, after some searching I did 3 things:
1 - In the nodes, I changed the memory policy to write tables to disk
2 - I added the line -Dknime.table.cache=SMALL to knime.ini
3 - I increased the heap space to about 16GB
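In case it helps others, the knime.ini lines corresponding to steps 2 and 3 look roughly like this (the -Xmx line controls the maximum heap size and usually already exists near the end of the file, so it is edited rather than added; the rest of the file is left as-is):

```
-Xmx16g
-Dknime.table.cache=SMALL
```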
(Along the way I restarted KNIME a couple of times, of course)
I am using a filtered /pre-processed data set for Boston_crime as available publicly - the file is about 33MB in size.
About the examples, I followed this path in the KNIME Explorer:
EXAMPLES -> 04_Analytics -> 04_Classification_and_Predictive_modelling -> 02_Example_for_Learning_a_Neural_Network
I am indeed running other software at the same time, but at least for a one-off experiment I will do as you suggest. I don’t think this is ideal going forward, though. I’ve processed the bigger, non-filtered version of the file in a Jupyter notebook, running Python, with the same NN approach, and it worked without a problem.
OK, I tried with the 20GB value, and even after closing all other apps, KNIME just crashed in the end without a warning. With 19GB I got the same error as before.
So, … no.
Can you share your current version of the workflow? It would be useful to see what your target and features are, along with what you’re filtering. Then I can try to reproduce the problem.
Thank you for sharing the workflow and data with us! I can confirm the Java heap space fills up during the execution of the Scorer node. Your input data contains the class column as Number (Integer) values, which makes the RProp MLP Learner learn a regression model instead of a classification model. As a result, the Prediction column in the output of the MultiLayerPerceptron Predictor contains >100K distinct “classes”, for which the Scorer then tries to compute a confusion matrix and fails…
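You can reproduce the same failure mode outside KNIME. Here is a small scikit-learn sketch (with made-up data, not your Boston crime file): a numeric target silently turns the task into regression, whereas casting the codes to strings makes it classification with a bounded set of predicted labels:

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Toy stand-in for the data: a "class" column holding numeric codes.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feat1": rng.normal(size=200),
    "feat2": rng.normal(size=200),
    "class": rng.integers(100, 105, size=200),  # codes, not quantities
})
X = df[["feat1", "feat2"]]

# Numeric target -> the library happily fits a *regression* model,
# whose predictions are arbitrary floats (a huge number of distinct values).
reg = MLPRegressor(max_iter=50).fit(X, df["class"])
print("distinct regression outputs:", len(np.unique(reg.predict(X))))

# Cast the codes to strings -> the task becomes classification,
# and predictions are restricted to the known labels.
clf = MLPClassifier(max_iter=300).fit(X, df["class"].astype(str))
print("distinct predicted classes:", len(np.unique(clf.predict(X))))
```

A confusion matrix is quadratic in the number of distinct labels, which is why >100K “classes” exhausts the heap while 5 real classes are trivial.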
Changing the type of your class column (and computing its domain) forces classification mode and stops the Scorer from eating up all the available memory. Here’s how that would look:
You are right - the problem was indeed that the Class column was a number (it is just a code, but a number nonetheless).
So, the whole thing worked - many thanks!
In general terms I am still a bit puzzled by the need for the Domain Calculator, but that is probably because I never saw that “concept” in my previous tools (mostly around the Python ecosystem).
Is this something specific to how KNIME works?
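For readers coming from the Python ecosystem: the closest analogue of a KNIME column “domain” is probably pandas’ Categorical dtype, which stores the set of possible values of a column explicitly, much like the Domain Calculator records possible values for nominal columns and min/max bounds for numeric ones. A quick sketch with made-up data:

```python
import pandas as pd

# Nominal column: the "domain" is the explicit set of possible values,
# which pandas exposes via the Categorical dtype.
s = pd.Series(["A", "B", "A", "C"]).astype("category")
print(list(s.cat.categories))  # ['A', 'B', 'C']

# Numeric column: the "domain" is just the min/max bounds.
n = pd.Series([3, 1, 4, 1, 5])
print(n.min(), n.max())  # 1 5
```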