I have a Windows 7 machine with a Xeon CPU, 64 GB of RAM, and 4 TB of free disk space (including the temp directory).
I end up with a large table (70 million rows, 30 columns).
Whatever I try on it ends with the GC overhead limit exceeded error and, of course, the node failing: GroupBy, Row Filter, anything…
I tried giving all the RAM to KNIME (up to 64 GB), same.
I tried switching the nodes to use disk only (no memory), same.
I tried parallel chunk loops, same.
Along the way, I get a lot of 'potential deadlock' AWT and SWT warnings.
I’m desperate here. I could split the table in two, but because of what I described above, I can’t.
I thought KNIME could handle big data sizes, even if it slows down.
I guess I’d need some more information to debug your problem. So your setup is as follows:
- You’ve set the memory to -Xmx64G in the knime.ini
- Reading a file results in 70 million rows with 30 columns? Which types do the columns have?
- After reading the file, do you directly apply e.g. a GroupBy on the entire table, or just on a subset of it?
- Are there any other chunk loops or similar in your workflow?
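For reference, the memory setting in question lives in the knime.ini file next to the KNIME executable. A minimal sketch of the relevant fragment (the exact value depends on your machine; the -Xmx line must come after -vmargs, and other lines in the file are left as they are):

```
-vmargs
-Xmx64g
```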
What is the minimal workflow to reproduce your problem? Can you attach your log file?
Sorry for the trouble,
Yup, memory is up to 64G (-Xmx set in knime.ini).
The columns are mixed strings and doubles.
The data has been heavily ‘worked’ (50-ish nodes) before ending up in a big node holding the 70M rows × 30 columns, and I need to do, for instance, a Row Filter on it (like filtering rows with YEAR = 2015).
No other chunk loops in the flow.
The minimal workflow is (node with data) -> (Row Filter), since this is where it fails each time with GC overhead limit exceeded.
Did you try using the streaming executor to filter the rows / execute your workflow?
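The idea behind streaming execution is to pass rows through the filter one at a time instead of materializing the whole 70M-row table in memory. A minimal sketch of that idea in plain Python (the column name `YEAR` and the row format are assumptions for illustration, not KNIME's actual internals):

```python
def stream_filter(rows, year):
    """Yield only the rows matching the given year, one at a time.

    Because this is a generator, it never holds the full table in
    memory -- the same principle the streaming executor applies.
    """
    for row in rows:
        if row["YEAR"] == year:
            yield row

# Tiny stand-in for the 70M-row table.
data = [
    {"YEAR": 2014, "value": 1.0},
    {"YEAR": 2015, "value": 2.0},
    {"YEAR": 2015, "value": 3.0},
]
kept = list(stream_filter(data, 2015))
```

Here `kept` contains only the two 2015 rows; the non-matching row is discarded as it streams past rather than being buffered.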
Nope, let me try it and I’ll come back to you!
Thanks for your help!