As the title says, I run KNIME 4.0.1 on Windows 10 (Intel 16-core with 64 GB RAM, KNIME can use 32 GB) and a few times a week KNIME just crashes, without any error message or anything. I was looking for some generated error text file, but didn’t find any.
Most of the time it happens when the computer is busy doing other things, like other programs also using CPU power and RAM (the memory management of KNIME is another thing…)
Usually it happens at the worst possible moments…
Does your workflow make use of external tools, e.g., running Python, R, or anything similar? Would it be possible for you to share that workflow with us?
Nothing fancy, just some exploratory analysis.
It usually happens when KNIME gets really close to maximum RAM usage and doesn’t respond anymore.
Unfortunately I am getting quite used to KNIME not being able to handle “bigger” amounts of data, where “bigger” in this case is not even 100 million cells (on a 5 GHz machine with 32 GB of RAM allocated to KNIME, that should not be a problem at all… R or Python handle that easily at ten times the speed; sorry to put it so bluntly).
Regarding the log file: there is none, unless it’s hidden in some really deep subfolders.
Aside from the Console output, the View menu has an “Open KNIME log” menu item, and log files are kept on the filesystem in your workspace directory as .metadata/.log and .metadata/knime/knime.log
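If you want to watch the log while you work, something like this in PowerShell should do it (this assumes the default workspace location under your user profile; adjust the path if your workspace lives somewhere else):

    Get-Content "$env:USERPROFILE\knime-workspace\.metadata\knime\knime.log" -Tail 50 -Wait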
Regardless of KNIME-internal issues, I would recommend using fewer nodes and reviewing some of the logic: for example, do the column filter before the row filter, combine calculations in one Column Expressions node instead of a number of Math Formula nodes, avoid sorting in the middle of operations, and finally, after heavy operations, use a Cache node to offload memory.
When you say crash, do you mean it freezes, or do you mean that it stops executing and there is no longer an indication of a running process in the task bar at the bottom of your screen, or something else?
Configure nodes that do heavy work to store only small amounts of data in RAM and otherwise use the disk.
Another thing you could try is to organize your workflow so that only one node executes at a time, using flow variables. I think in the current configuration there could be parallel executions.
And maybe, if this does not help, you could send a larger portion of the relevant log file.
By crash I meant that it stops executing and there is no more KNIME running in the task bar (or Task Manager).
There are 32 GB of the 64 GB of RAM allocated to KNIME.
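(For anyone wondering, the allocation is just the -Xmx line in knime.ini in the KNIME installation folder; the relevant part at the end of the file looks roughly like this:)

    -vmargs
    ...
    -Xmx32g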
A clean start helps to “start” with most of the RAM available, but after a few executed nodes there is no difference. For any big nodes such as grouping I don’t check “process in memory” anyway (in-memory processing would only result in a Java heap space error).
That workflow was just to look at the data, so it was not optimized for speed. In those cases I execute just one node at a time, by hand.
Regarding Column Expressions… I will look into that for some advanced transformations, but when it’s just simple math on 40 columns, the Math Formula node is a lot faster, both in computing time and in configuration.
Workflows that actually get deployed look a bit different and don’t use parallel execution, unless it is parallel chunks on tiny bits of data.
Any kind of unique counting I try to avoid by splitting off the relevant columns (the group-by column and the column to count), grouping with no aggregation (just faster than the Duplicate Row Filter), grouping with count, and joining the result back… maybe there is a better way, but this works (on the current dataset it is about 100 times faster than grouping everything at once).
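For reference, the pattern I mean is roughly the following, written as a small pandas sketch (the column names grp and val are just placeholders, not from my actual data):

    import pandas as pd

    df = pd.DataFrame({"grp": ["a", "a", "b", "b", "b"],
                       "val": [1, 1, 2, 3, 3]})

    # 1) keep only the relevant columns and drop duplicates
    #    (the "group with no aggregation" step in KNIME)
    pairs = df[["grp", "val"]].drop_duplicates()

    # 2) group and count, i.e. the unique count of val per grp
    unique_counts = (pairs.groupby("grp", as_index=False)
                          .size()
                          .rename(columns={"size": "unique_val"}))

    # 3) join the counts back onto the original table
    result = df.merge(unique_counts, on="grp", how="left")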
But my “problem” is not that some KNIME nodes are quite slow when it comes to certain operations, but the fact that KNIME just crashes. I changed auto-save (without data) to every 5 minutes; that’s probably the best fix for now.
Could you try launching KNIME from a CMD window (or PowerShell), in the hope that it will dump logging there, so that when it crashes we have something more verbose than the crash log you were able to provide?
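For example, something like this from CMD (the install path here is just a guess, point it at wherever knime.exe actually is); -consoleLog should mirror the log output to the console, and -noexit should keep the VM, and therefore the console output, around after KNIME ends:

    cd "C:\Program Files\KNIME"
    knime.exe -consoleLog -noexit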
I also have an issue with the Column Expressions node. It consumes a lot of memory (also when it writes tables to disk).
Example: a table with 1.4M rows. I have to recode missing values for two columns:
MISSING $abc$ => 0.0
TRUE => $abc$
If I use 2x Rule Engine, it consumes only 200 MB of RAM.
One Column Expressions node consumes 7 GB.
I also have the same issue if I use other functions (math, string manipulation) in this node.
By reading your last reply I realize that I probably wasn’t clear enough in the beginning.
It’s not that this workflow always causes KNIME to crash.
KNIME crashes from time to time with every workflow that needs to handle bigger amounts of data.
It usually starts with KNIME not responding anymore while executing some node because no more RAM is available, even when I don’t process in memory and store tables on disk instead.
This unresponsiveness also makes it impossible to cancel the running node.
No, you were clear… but you’re currently able to provide basically nothing helpful in the way of logging, which is why I suggested what I suggested, in the hope that there would be more logging which might provide some better insight.
Since KNIME hasn’t crashed in a while, I can’t really provide more information.
I think the crashes happen because KNIME is in “not responding” mode when its internal RAM usage gets too big. Say you do any kind of unique counting in a grouping (you can’t even abort it, because KNIME does not respond); if you then click anything within KNIME, it’s gone.
So for now I will close this until I can offer more help.