I have a workflow, reasonably large so far but not as big as some of the sample workflows I've seen.
Almost everything has been working fine, if a little slowly, so I recently ported it from a small laptop onto a larger and significantly more powerful machine. I did this using the "export workflow" and "import workflow" options.
The majority of the workflow executed much faster on the new machine, except for the Scorer node I am using as part of a simple Partition -> Linear Regression Learner -> Regression Predictor -> Scorer subflow.
For this node, every time I run it, it gets to 50% executed, then KNIME freezes for a little while and returns the error "Execute failed: GC overhead limit exceeded".
This error is confusing to me, since I was running linear regressions on basically the same data, Scorer node included, successfully on my older, less powerful machine.
I have already increased the memory available to KNIME in knime.ini as follows:
-XX:MaxPermSize=1024m
-Xmx2048m
However, this has not solved the problem. I should note that on my previous machine I did not need to increase the memory and was able to run the Scorer node, and all other nodes, without the error. Sometimes it would take quite a long time, but it would eventually execute rather than error out.
I am at a loss for what to do, and I have searched the rest of the forum for a solution other than increasing memory, without success so far.
The max perm size is probably too high; the default size is probably sufficient.
If the more powerful machine runs a 64-bit JVM and the less powerful one a 32-bit JVM, that might explain the need for a bit more memory. I would suggest switching the permgen setting back to the default and keeping the 2048m max heap, as in the snippet below; if this does not work, increase it further. (Though the Scorer should not be very memory-hungry.)
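For reference, the memory-related part of knime.ini would then look something like this (a sketch; the surrounding lines depend on your installation):

-Xmx2048m

That is, remove the -XX:MaxPermSize line entirely and let the JVM pick its default permgen size; if you later need more heap, only the -Xmx value needs to grow (e.g. -Xmx4096m).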
Thanks a bunch, I will try your solutions. You are right that I've gone from 32-bit to 64-bit.
I think I figured out another aspect of my issue: I was using a Scorer node against a linear regression. While the Scorer node against a classifier might not be that memory-intensive, it makes sense in my brain that running it against a regression could be very heavy. Is that right?
If you are doing regression you should use the "Numeric Scorer" instead. The "Scorer" is for nominal values with a limited number of classes only. In the case of regression almost every row will have its own unique value, and if you have many rows it is no big surprise that the node needs large amounts of memory, because it builds statistics for every distinct value.
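To make that concrete, here is a back-of-the-envelope sketch in plain Python (an illustration only, not KNIME's internal code): a Scorer-style confusion matrix needs one count cell per (actual, predicted) class pair, so its size grows quadratically with the number of distinct values.

# Illustration only: size of a k x k confusion matrix.
def confusion_matrix_cells(distinct_values):
    # One count cell per (actual, predicted) pair.
    return distinct_values ** 2

print(confusion_matrix_cells(3))      # 3-class target: 9 cells
print(confusion_matrix_cells(48000))  # near-unique regression values: 2,304,000,000 cells

With a nominal target the matrix is tiny, but when almost every predicted value is unique the cell count explodes, which fits the "GC overhead limit exceeded" behaviour you are seeing.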
Were you able to resolve the issue with the Scorer node? I am kind of facing the same problem with it (50% and then the application freezes). I am trying to run a simple Naive Bayes model on a dataset with 48K instances:
a NB Learner followed by a NB Predictor and a Scorer.
Please share any tips or tricks that you may have for me. I have already tried changing the knime.ini file as well as enabling the "write to disk" option in the memory tab of the Scorer configuration.