Today I stumbled upon something interesting happening in KNIME.
So I have a workflow which reads a CSV file, processes the data, joins it to data read from a KNIME table file, and writes the result back to the same KNIME table file. This runs once each week.
Somehow, this particular dataset became very slow to process further and was taking up an awful lot of space on the HDD: 160 MB for 44K records (a comparable dataset was 7 MB on the HDD). The only odd thing I noticed was very long row IDs in the table, so, following a wild guess, I tried resetting the row IDs (applying a RowID node with default settings) before writing the data to the KNIME table file. And, whoa, the table was fast again, and took up only 4 MB on the HDD.
Any thoughts? Bug or feature?
It’s the Joiner node, which builds a new row ID of the form rowid1_rowid2 from the row IDs of both joined tables. This makes it possible to trace back what the original rows were. I would call this a feature.
You decided to include the row IDs in the CSV (this can be switched off via the “Write row ID” checkbox), and one week later the join adds “a third rowid” to this new row ID, which you again write to the CSV…
So it is more like a combination of features that slowly ate up your HDD.
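The growth can be illustrated with a small sketch (plain Python, not KNIME code; the ID names and table sizes are made up for illustration). Each weekly join concatenates the stored table's row ID with the incoming row's ID, so the IDs get longer every week:

```python
# Simulate how concatenated row IDs grow when a join result is
# written back to the same table and joined again every week.

def join_rowids(left_ids, right_ids):
    # Mimics the Joiner behaviour described above:
    # the new row ID is "leftRowId_rightRowId".
    return [f"{l}_{r}" for l, r in zip(left_ids, right_ids)]

stored = [f"Row{i}" for i in range(3)]       # IDs in the stored table
for week in range(1, 5):
    fresh = [f"Row{i}" for i in range(3)]    # fresh rows get short IDs
    stored = join_rowids(stored, fresh)      # result overwrites the table
    print(f"week {week}: {stored[0]} (length {len(stored[0])})")
# The ID length grows by a fixed amount every week, so the table file
# carrying these IDs keeps growing even though the row count does not.
```

Resetting the row IDs (as the RowID node does) before writing the table back breaks this accumulation, which is why the file shrank to a few MB.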
Uhm, the CSV is only the new data; what I write to the HDD is a KNIME table, with all the bells and whistles.
Okay, so I understand now that the overhead is caused by the row-lineage mapping hidden behind the combined row IDs. Thanks for the info.