Looking for some advice. My PC is usually a Windows box with 16 GB RAM… some workflows have used up the heap and processing slows down. I figured if I ran Lubuntu or Arch Linux, the OS would have a smaller footprint and I could have more RAM available.
Is KNIME as stable (or more stable) on Linux? Will I gain performance from lower OS overhead (RAM and CPU)?
Is it worth it to set up a dual boot? Anything to be aware of?
KNIME AP runs well on both Linux and Windows. If your setup allows you to give it more memory, that can improve processing times, depending on which nodes you are using and where the performance bottleneck is. Before you set up the dual boot, though, have you tried the columnar table backend? It should give you a significant performance increase, and we would love to hear your feedback on it. https://www.knime.com/blog/improved-performance-with-new-table-backend
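One thing worth checking before any dual boot: KNIME only uses as much RAM as its JVM is allowed, regardless of how much the machine has free. The heap limit is the `-Xmx` line in `knime.ini` in the KNIME installation folder (the `12g` below is just a suggested value for a 16 GB machine, not a recommendation from the docs):

```
# knime.ini (in the KNIME installation directory)
# Raise the JVM heap limit so KNIME can actually use the extra RAM.
# Leave a few GB of headroom for the OS and other processes.
-Xmx12g
```

If this is still at a low default, switching OS will not help much, since the heap ceiling stays the same.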
Perhaps my workflow is not complex enough, or I am bottlenecked by something else. I had 4.3M rows of data but only 20 columns at any point in the process. Using the regular table backend, the new table backend (auto), and manual settings to prioritize performance and increase the cache size all yielded virtually the same times: 11 min in every case, except when I allocated more cache than available RAM.
I think my bottleneck is the Decompose Signal component inside a loop… not sure if the backend would help with this.
Here is a temporary link to the workflow (incomplete)… I’ll delete in a week.
Hi @gab1one, Parallel Chunk Start seems very interesting, but I’m not sure how to use it in my scenario. Currently I am using a Group Loop Start because I need to process all rows with a certain ID together, to group the time series by store. Ideally, I would process each group in parallel, but Parallel Chunk Start doesn’t have a group option and can only split by number of rows. Is there a node that does this? If not, I’d like to add this as a feature request.
How I achieve something similar: I make a list of the groups, use the parallel chunk loop over that list, and then do a Reference Row Filter at the beginning of each chunk to get only the records belonging to that group. This is effectively the same as a parallel group-chunking loop would be.
Thanks so much for the suggestion. If I understand correctly, this may not work for me, as one of the last steps of the loop is a GroupBy node that sums and aggregates the calculated values. Having said that, maybe I should use a loop inside a loop, moving the GroupBy to the outer loop while the inner loop uses the parallel chunk start… hmm… will have to try it.
Learning by doing is probably the best approach, although the complications start with setting everything up correctly; be prepared to spend a lot of time just on that. It all depends on whether the looping performance negatively affects your work; if you can just run it in the background, it’s not much of an issue.
Also, this only makes sense if whatever you are calculating is basic math, or a suitable library for it is available in Python.
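As a rough illustration of that point, if the per-group step really is basic math, the whole group-then-aggregate pattern collapses into a few vectorized pandas lines with no explicit loop at all (column names and the calculation are made up; the real signal decomposition would replace the `share` line):

```python
# Hypothetical pandas sketch of the group-loop-plus-GroupBy pattern.
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "sales": [10.0, 5.0, 7.0, 3.0],
})

# Per-group calculation (stand-in for the real per-store work),
# done vectorized across all groups at once.
df["share"] = df["sales"] / df.groupby("store")["sales"].transform("sum")

# Final "GroupBy" aggregation: sum per store.
totals = df.groupby("store", as_index=False)["sales"].sum()
print(totals)
```

For anything beyond basic math, it comes down to whether a Python library implements the calculation; otherwise the native KNIME nodes are the safer bet.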