I have a very large workflow with lots of nodes and data. Are there any “best practices” I can utilize to help with my memory usage? All of a sudden I am starting to get the Java Heap error. Cleaning up the amount of rows and columns before that doesn’t seem to make too much of an impact. Other suggestions?
Have you seen the suggestions on this page: Optimizing KNIME workflows for performance | KNIME ?
Yeah. That seems to be focused more on execution speed. I am not too concerned about the length of time it takes to execute. I am more focused on it being able to churn through all the data. (Obviously the knime.ini file is optimized).
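For readers landing on this thread later: the single biggest memory lever in knime.ini is the maximum Java heap (-Xmx). A minimal excerpt, with an example value you would adjust to your own machine (leave a few GB free for the OS):

```
-vmargs
-Xmx8g
```

Note that -Xmx must come after the -vmargs line, and that Eclipse-style .ini files do not allow comment lines.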
For example, do workflows that are not open but contain “completed” nodes affect memory usage?
And if not, does KNIME make it easy for me to split a workflow up into several different workflows that are called, run, and closed out once the data is extracted? I don’t need to review all the data at all the stages.
I have a collection of resources about optimization. The columnar storage backend and garbage collection might help (you might want to test them first). You could also force some nodes to be written to disk.
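To make the garbage-collection part concrete: GC behavior is tuned via standard JVM flags in knime.ini, below the -vmargs line. These are plain HotSpot options, not KNIME-specific settings, and G1 may already be the default on the Java version bundled with your KNIME, so measure before and after:

```
-vmargs
-XX:+UseG1GC
-XX:+UseStringDeduplication
```

String deduplication only works together with G1 and can reduce heap pressure when tables contain many identical strings.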
Also a strategically placed Cache node might help.
Indeed, one way could be to split the job into sub-workflows.
But in the end KNIME is a graphical platform based on Java, so more power/RAM is always welcome.
Perfect. Thanks. Exactly what I am looking for. Just want to make sure I am using all the nodes most effectively for large datasets and workflows.
Between the columnar storage, the Cache nodes, and the trash nodes, everything is working quite well now. Do you think I would benefit from pooling nodes into even more metanodes? If a metanode isn’t open, does KNIME hold all the values of the nodes inside it in memory?
Also, does it make sense that I would have a Cache node that would give me the Java Heap Error when writing to disk?
Thanks for the help.
I think with a component (similar to a metanode but with many more options) you could tell it to write to disk, which might save some RAM. Writing to disk of course comes with a price tag: you would need time to move the data from memory to disk and maybe back, but it could result in a more stable workflow.
As long as a workflow is open, KNIME keeps the data ‘ready’, either in memory or on disk. So if you still have issues, it might be an option to split the workflow into sub-workflows (which obviously increases the handling costs through calls and planning). As a side note: with KNIME Server you could easily orchestrate a cascade of workflows.
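If you go the sub-workflow route without a server, KNIME also has a headless batch mode you can script from the command line, so each stage runs in its own JVM and its memory is released when it exits. A sketch assuming a Linux install; the install path and workflow directory here are hypothetical placeholders:

```shell
# Run one sub-workflow headlessly; -reset re-executes it from scratch,
# -nosave discards executed state so nothing stays resident afterwards.
/opt/knime/knime -nosplash -consoleLog \
  -application org.knime.product.KNIME_BATCH_APPLICATION \
  -workflowDir="/path/to/workflows/stage1" -reset -nosave
```

Calling this once per stage from a script gives you the “run, extract, close out” pattern asked about above.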
A Cache node can also either hold its data in memory (faster) or be told to write its content to disk. The cache per se does not save memory; it collects all the previous transformations into one place, which can sometimes make a workflow more stable. It is often placed before you export data to an outside file or write it into a database.
The columnar storage has some further options to tweak (you might want to be careful playing with those). But a word of advice: I have seen problems with the underlying Parquet format and KNIME.
Interesting. I’ve also noticed that I occasionally get that Java Heap error inside a particular metanode, while outside that metanode, in the main workflow, memory isn’t even close to being an issue.
So, even more interesting: I noticed that I didn’t have problems with some “more complicated” nodes but had severe lag with “easier” nodes, so I did some digging and found that if my RowID was a mess (usually from several Joiners, etc.) then KNIME was struggling. I added some well-placed RowID nodes and now have very little lag throughout the system and no longer get the Java Heap error.
Sorry for the trouble here. We actually have a ticket in the system to address the problems introduced by complicated RowIDs (AP-15125). I will add a +1 from you on that ticket.
Perfect. Good to know I am not going crazy.