How does KNIME handle processing a CSV file in a workflow? Does it recognize old information and optimize the loading to process only new information?

gonhaddock · May 18, 2023, 8:33pm

Hello @bremels
Re-reading your post, my workflow may not be that helpful after all. If you are working with a file that gets re-written, that is not the same case as the one in the workflow I provided. However, some of the techniques can be used in the same way (hashing / outer join / compare).

If that is the case (a CSV without a UID column), I would reconsider your data acquisition process.

So it is less a matter of KNIME intelligently recognizing differences, and more about the processing you want to apply and how to approach it with KNIME.

Therefore, depending on your data structure and on how creative the CSV's users are, a valid method to compare tables would be to concatenate the whole row (Column Aggregator) and hash the concatenated text, in both of the CSVs from the two time frames you are comparing. The only hashing algorithm available in KNIME that I have in mind is MD5; you can find it in the String Manipulation node. If the complexity of the data requires more robust hashing (SHA-256…), you would probably need to code it in R/Py.
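
For the Py route, here is a minimal sketch of the concatenate / hash / compare idea using only the Python standard library. The function names (`row_hash`, `new_rows`), the sample data, and the choice of separator are my own assumptions for illustration, not anything KNIME provides; it also assumes both files have a header row and the same column order.

```python
import csv
import hashlib
import io

# Assumption: this control character never occurs in the data, so joined
# cells cannot be confused (e.g. "ab"+"c" vs "a"+"bc").
SEP = "\x1f"

def row_hash(row, algorithm="sha256"):
    """Concatenate the cells of one row and return the hex digest."""
    joined = SEP.join(row)
    return hashlib.new(algorithm, joined.encode("utf-8")).hexdigest()

def new_rows(old_csv_text, new_csv_text, algorithm="sha256"):
    """Return the data rows of the new CSV whose hash is not in the old CSV."""
    old_reader = csv.reader(io.StringIO(old_csv_text))
    next(old_reader, None)  # skip the header row
    seen = {row_hash(r, algorithm) for r in old_reader}

    new_reader = csv.reader(io.StringIO(new_csv_text))
    next(new_reader, None)  # skip the header row
    return [r for r in new_reader if row_hash(r, algorithm) not in seen]

# Hypothetical sample data: one row changed, one row added.
old = "name,value\na,1\nb,2\n"
new = "name,value\na,1\nb,3\nc,4\n"
print(new_rows(old, new))  # [['b', '3'], ['c', '4']]
```

Note that a modified row shows up as "new" because its hash changes, and that exact duplicate rows share one hash, so this sketch cannot tell how many times a row appears. Passing `algorithm="md5"` would mirror the MD5 approach mentioned above.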

BR