I'm getting a bit frustrated. I tend to work with rather deep and wide datasets. I love KNIME because it doesn't seem to flinch at a 400k row by 55 column dataset. At least, it used to not care.
It seems to happen with a variety of nodes, but the biggest culprits I notice are GroupBy and Joiner. Almost EVERY TIME I run them, they hang with "Potential Deadlock" on the "SWT Display thread", the "AWT Event Queue", or both. Once in a while would be fine, but it now happens literally every time, and it can take 30+ minutes to resolve. A workflow that used to take an hour to complete may now take virtually the entire day.
What changed?
I'm using 64-bit, I've increased the memory allocation in the settings file to 8 GB of RAM, and my computer has 16 GB (I need the rest for other apps open at the same time).
I've noticed modest speed improvements if I switch to keeping all tables in memory, but it still hangs. One trick that helped a bit is putting a Column Filter before a GroupBy node: if my GroupBy only cares about 7 of the 55 columns, I keep only those 7 and remove the rest before the GroupBy. This didn't hang! Nice. Except, I wonder why GroupBy doesn't do this automatically. I would think it would be a tremendous optimization for GroupBy to REMOVE all the extra columns that aren't going to be considered before it starts grouping. After all, it removes those columns at the end anyway, so why not remove them at the beginning? Pairing a Column Filter with every GroupBy just feels ugly and means I need to keep both in sync, which I'm willing to deal with but is unfortunate.
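To make the Column Filter trick concrete, here is a minimal sketch in plain Python. It is not how KNIME implements GroupBy; the column names (`color`, `region`, `amount`, `pad0`...) and the helper functions `project` and `group_sum` are made up for illustration. It only shows that dropping unused columns first leaves the grouped result unchanged while each row carries far less data through the grouping step.

```python
from collections import defaultdict


def project(rows, keep):
    """Keep only the named columns of each row (the Column Filter step)."""
    return [{name: row[name] for name in keep} for row in rows]


def group_sum(rows, group_cols, value_col):
    """Group rows by group_cols and sum value_col within each group."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        totals[key] += row[value_col]
    return dict(totals)


# Hypothetical wide table: 3 columns that matter plus 52 unused padding columns.
rows = [
    {"color": c, "region": r, "amount": a,
     **{f"pad{i}": "x" * 20 for i in range(52)}}
    for c, r, a in [("Blue", "N", 1.0), ("Blue", "N", 2.5), ("Red", "S", 4.0)]
]

# Filter first, then group: same answer as grouping the wide rows directly.
narrow = project(rows, ["color", "region", "amount"])
wide_result = group_sum(rows, ["color", "region"], "amount")
narrow_result = group_sum(narrow, ["color", "region"], "amount")
```

Whether this saves time in practice depends on how the grouping step handles rows internally (for example, whether it sorts or buffers whole rows), which this sketch does not model.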
As for joining, I can't think of any way to optimize that. 99% of my joins are Left Joins. I'll typically use one to add a column based on another column's category: when Col A = Blue, add Col B = Sky. So the lookup table might only have 100 rows, but it's joined into a dataset with 400k rows and 55 columns, adding the 56th column.
Again, I'm positive my workflows ran FASTER in older versions. I had never seen this deadlock message before version 3.x.
Sorry for the book. Just getting frustrated and I'm hoping for some help. Thank you for this amazing tool.
You raise several points in your post, and I'll try to address them separately:
- The "Potential Deadlock" message: This most likely has nothing to do with the actual execution of the nodes. The main execution (e.g. GroupBy, Joiner, etc.) happens asynchronously, and while it may make the application feel less responsive, it's unlikely to cause that message to be printed. Can you attach the log file here? It should contain a stack trace that some of us developers can interpret. Please note that this message was not printed in 2.x (there was no code detecting freezes).
- GroupBy paired with Column Filter: I have not noticed any difference when running the attached workflow in 2.12 and 3.5. The data generator takes ~30s (400k x 55) and the two GroupBy nodes take about 5s (or 8s in 2.12). One of the two is preceded by a Column Filter (which you say makes it faster), but I see no difference. Can you modify the attached workflow so that we can replicate the problem?
- Joiner: Have you had a look at the "Cell Replacer" node? That is often a good alternative to the Joiner as it assumes that one of the tables (the dictionary table mapping "Blue" to "Sky") is small and fits into memory.
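The idea behind a dictionary-style replacement can be sketched in a few lines of Python. This is only an illustration of the general technique (hold the small table in memory as a map and stream over the large one), not the Cell Replacer's actual implementation; the function `add_lookup_column` and the sample data are hypothetical.

```python
def add_lookup_column(rows, key_col, new_col, mapping, missing=None):
    """Append new_col to every row by looking up row[key_col] in mapping.

    Behaves like a left join against a small table: every input row is kept,
    in its original order, and unmatched keys receive the `missing` value.
    """
    return [{**row, new_col: mapping.get(row[key_col], missing)} for row in rows]


# Small dictionary table (assumed to fit comfortably in memory).
mapping = {"Blue": "Sky", "Green": "Grass"}

# Stand-in for the large table; only the key column is shown.
rows = [{"A": "Blue"}, {"A": "Red"}, {"A": "Green"}]

enriched = add_lookup_column(rows, "A", "B", mapping)
```

Each large-table row costs one hash lookup, so the work grows linearly with the 400k rows and never requires sorting or buffering the large table.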
All these individual points aside, the question remains why your workflow now takes a day to execute as opposed to an hour in 2.12. (I would hope that the log file will clarify things here.)
- Bernd
2018-02-21 13:22:48,427 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output.
2018-02-21 13:22:48,428 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output.
2018-02-21 13:23:19,666 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output.
2018-02-21 13:23:21,478 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output.
2018-02-21 13:24:04,313 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output.
2018-02-21 13:24:04,313 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output.
2018-02-21 13:26:35,934 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output.
2018-02-21 13:26:35,948 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output.
2018-02-21 13:27:55,883 : WARN : KNIME-Worker-1528 : GroupBy : GroupBy : 2:1258:1411:575:547 : No aggregation column defined
2018-02-21 13:27:55,929 : WARN : KNIME-Worker-1528 : GroupBy : GroupBy : 2:1258:1411:575:547 : No aggregation column defined
2018-02-21 13:28:31,662 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output.
2018-02-21 13:28:31,664 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output.
2018-02-21 13:29:57,195 : WARN : KNIME-Worker-1531 : Joiner : Joiner : 2:1258:1411:575:552 : Memory is low. I have no chance to free memory. This may cause an endless loop.
Is that the log you're looking for? You'll notice it includes a "Memory is low" warning, which is weird, since I get the same behavior with no other programs open. Also, I believe all the deadlock warnings above happened on the same GroupBy. So it's not a one-off: it hangs, recovers, then hangs again, several times.
I'm going to test out your workflow and see if I can rig it to replicate the issue.
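For anyone wanting to check how often the watchdog warnings fire in a log like the one above, here is a rough Python sketch. The regular expression is inferred only from the lines quoted in this thread, so it is an assumption about the log format rather than a documented specification, and `summarize` is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred from the quoted log lines (an assumption, not a spec).
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) : WARN : .*"
    r"Potential deadlock in (SWT Display thread|AWT Event Queue) detected"
)


def summarize(lines):
    """Count deadlock warnings per UI thread and measure first-to-last span."""
    counts = Counter()
    stamps = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
            counts[match.group(2)] += 1
    span = (max(stamps) - min(stamps)) if stamps else None
    return counts, span


# Three lines copied from the log in this thread.
sample = [
    "2018-02-21 13:22:48,427 : WARN : pool-1-thread-1 : KNIMEApplication$3 : : : "
    "Potential deadlock in SWT Display thread detected. "
    "Full thread dump will follow as debug output.",
    "2018-02-21 13:22:48,428 : WARN : pool-2-thread-1 : KNIMEApplication$4 : : : "
    "Potential deadlock in AWT Event Queue detected. "
    "Full thread dump will follow as debug output.",
    "2018-02-21 13:27:55,883 : WARN : KNIME-Worker-1528 : GroupBy : GroupBy : "
    "2:1258:1411:575:547 : No aggregation column defined",
]

counts, span = summarize(sample)
```

Running `summarize` over the full log file (one string per line) gives the number of warnings per thread and the time between the first and last one, which helps tell a single long freeze from repeated short ones.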