Dataset Version Control

Tjaart · June 5, 2024, 4:40pm

Hi Forum,

I’m trying to find a way to do version control on a set of data that is updated very regularly. I mean the imported data, not the KNIME workflow itself. For example, the dataset contains a number of fields of different types, such as Text fields, Picklists, and Date fields. These fields change from time to time, and each time the data is imported I have to work out which fields have changed. This is where I need version control: to determine which fields have been added, deleted, or edited.
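A minimal sketch of the field-level comparison, assuming each import can be reduced to a mapping from field name to field type. The function `diff_fields` and the sample schemas are hypothetical illustrations, not part of KNIME:

```python
from typing import Dict, List, Tuple


def diff_fields(
    old_schema: Dict[str, str], new_schema: Dict[str, str]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Compare two field-name -> field-type mappings (hypothetical helper).

    Returns (added, deleted, edited); edited holds (field, old_type, new_type).
    """
    old_names = set(old_schema)
    new_names = set(new_schema)
    # Fields that only exist in the new import were added.
    added = sorted(new_names - old_names)
    # Fields that only exist in the old import were deleted.
    deleted = sorted(old_names - new_names)
    # Fields present in both imports but with a different type were edited.
    edited = [
        (name, old_schema[name], new_schema[name])
        for name in sorted(old_names & new_names)
        if old_schema[name] != new_schema[name]
    ]
    return added, deleted, edited


old_schema = {"Name": "Text", "Status": "Picklist", "Due": "Date"}
new_schema = {"Name": "Text", "Status": "Text", "Owner": "Text"}
added, deleted, edited = diff_fields(old_schema, new_schema)
print(added)    # ['Owner']
print(deleted)  # ['Due']
print(edited)   # [('Status', 'Picklist', 'Text')]
```

Plain set operations are enough here because field names act as the identity of a field; anything richer (picklist values, lengths) would be compared the same way as the type.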

I have tried doing this with a series of Joiner nodes, joining the old dataset to the newest one and evaluating the differences in the fields.
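The Joiner idea can be sketched outside KNIME as a full outer join on a key column: rows found only in the new table are additions, rows found only in the old table are deletions, and matched rows whose other values differ are edits. This is a minimal Python sketch, assuming each row is a dict with a unique key column (the `"Field"` key, the function name `outer_join_diff`, and the sample data are hypothetical). It mirrors the Joiner's matching and unmatched outputs conceptually, not KNIME's implementation:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def outer_join_diff(old_rows: List[Row], new_rows: List[Row], key: str) -> Dict[str, list]:
    """Classify rows as added, deleted, or edited via a full outer join on `key`."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    # Unmatched rows on the new side were added.
    added = [new_by_key[k] for k in new_by_key if k not in old_by_key]
    # Unmatched rows on the old side were deleted.
    deleted = [old_by_key[k] for k in old_by_key if k not in new_by_key]

    # Matched rows: list the columns whose values changed.
    edited = []
    for k, old_row in old_by_key.items():
        if k in new_by_key:
            new_row = new_by_key[k]
            columns = sorted(set(old_row) | set(new_row))
            changed = [c for c in columns if old_row.get(c) != new_row.get(c)]
            if changed:
                edited.append((k, changed))
    return {"added": added, "deleted": deleted, "edited": edited}


old = [
    {"Field": "Status", "Type": "Picklist", "Values": "Open;Closed"},
    {"Field": "Due", "Type": "Date", "Values": None},
]
new = [
    {"Field": "Status", "Type": "Picklist", "Values": "Open;Closed;On Hold"},
    {"Field": "Owner", "Type": "Text", "Values": None},
]
result = outer_join_diff(old, new, key="Field")
print(result["edited"])  # [('Status', ['Values'])]
```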

Is there a better way of doing this?

ipazin · June 6, 2024, 8:18am

Hello @Tjaart,

Your approach using Joiner nodes is a valid one. Is it good enough, and is there a better one? Well, that depends on how fast, maintainable, and understandable it is, and whether it serves the purpose.

Another approach could be the Table Difference Finder node in combination with Reference Row Filter/Joiner nodes, which ensure you have the same rows in both tables. Check part of my workflow where I do “version control”.
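The two-step idea (first restrict both tables to the rows they share, then compare them cell by cell) can be sketched in Python as below. It is only an analogy to the Reference Row Filter and Table Difference Finder nodes, not their actual behavior; the function name `cell_differences`, the `"Field"` key column, and the `(key, column, old, new)` output layout are assumptions for illustration:

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def cell_differences(
    old_rows: List[Row], new_rows: List[Row], key: str
) -> List[Tuple[Any, str, Any, Any]]:
    """Return (key, column, old_value, new_value) for every differing cell."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    # Step 1 (reference-filter analogy): keep only keys present in both tables,
    # so the comparison never pairs unrelated rows.
    shared_keys = [k for k in old_by_key if k in new_by_key]

    # Step 2: compare the shared columns cell by cell.
    diffs = []
    for k in shared_keys:
        old_row, new_row = old_by_key[k], new_by_key[k]
        # Only columns present in both rows are compared; added or removed
        # columns are a schema change and would be handled separately.
        for column in sorted(set(old_row) & set(new_row)):
            if column != key and old_row[column] != new_row[column]:
                diffs.append((k, column, old_row[column], new_row[column]))
    return diffs


old = [
    {"Field": "Status", "Type": "Picklist", "Required": True},
    {"Field": "Due", "Type": "Date", "Required": False},
]
new = [
    {"Field": "Status", "Type": "Text", "Required": False},
    {"Field": "Owner", "Type": "Text", "Required": True},
]
for diff in cell_differences(old, new, key="Field"):
    print(diff)
# ('Status', 'Required', True, False)
# ('Status', 'Type', 'Picklist', 'Text')
```

Filtering to shared rows first keeps the cell comparison clean; rows that exist on only one side are better reported by the join-based classification.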

Hope this helps!

Br,
Ivan

Tjaart · June 10, 2024, 11:49am

Thank you @ipazin, this helps!

system · September 8, 2024, 11:49am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.