Original data vs workflow output

Hi all,

I have several financial spreadsheets that I run through a workflow to standardize, concatenate, etc., including a row filter. Is there a way to check that the final output contains all the rows (with the exception of those filtered out)? I.e., I want to know that none of the original rows have been lost because of any of the nodes' actions.

Is there a node or loop that will tell me all the rows that have been removed? A reconciliation that all the data is there?

@JanHazel welcome to the KNIME forum.

You could use a Reference Row Filter from your result against the original data; if all rows are removed, you have a test that they are all still there.
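
Outside of KNIME, the logic of that check can be written out in a few lines. Below is a minimal pandas sketch of what a Reference Row Filter in "exclude" mode would do; the tables and the `id` column are invented placeholders for whatever key your data actually has.

```python
import pandas as pd

# Hypothetical tables for illustration; "id" stands in for whatever key
# column identifies a row in your data.
original = pd.DataFrame({"id": [1, 2, 3, 4], "amount": [10.0, 20.0, 30.0, 40.0]})
result = pd.DataFrame({"id": [1, 2, 4], "amount": [10.0, 20.0, 40.0]})

# Same logic as a Reference Row Filter in "exclude" mode: keep only the
# original rows whose key does NOT appear in the result table.
lost_rows = original[~original["id"].isin(result["id"])]

# An empty table here means every original row made it through.
print(lost_rows)
```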

Thank you mlauber71. I am maybe not as technically proficient, as I am not from a programming background. I realised it may be easier if I put a picture of my workflow on screen:

Essentially, what I am trying to do is check that every row I have in the original 6 reports picked up by the Excel Reader (the original raw data) is then accounted for, either in my final concatenated table or via the row filter, and that no rows dropped off inadvertently. Hopefully this helps?

If you just want to see the number of rows, you can open the output port views of your first and penultimate nodes to compare the row counts at the beginning and end of that specific branch.

If you want something more detailed, then the Table Difference Finder node might work.

Hello @JanHazel,

I see your point, and there is no dedicated node for this kind of check in KNIME. Thinking about it, it doesn't seem easy to provide one, as it highly depends on the data. I would suggest building the comparison on your own. For this to work you would need to have, or build, a unique identifier for each row from your input files. While processing your data you shouldn't drop those identifiers, as you want to compare them against the original ones. The comparison itself can be done using the Joiner (Labs) node or the above-mentioned Table Difference Finder.
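
To make that concrete, here is a rough pandas sketch of the approach. The file names, column names and the placeholder processing step are assumptions for illustration only; the point is that the identifier is built once per input row and carried through to the end.

```python
import pandas as pd

# Build a unique identifier per row while reading the input files
# (hypothetical file names; in KNIME this could be filename + RowID).
files = ["report_1.xlsx", "report_2.xlsx"]
frames = []
for f in files:
    df = pd.read_excel(f)
    df["row_id"] = [f"{f}_Row{i}" for i in range(len(df))]
    frames.append(df)
original = pd.concat(frames, ignore_index=True)

# ... standardisation, concatenation and row filtering happen here,
# always carrying the "row_id" column along unchanged ...
final_output = original            # placeholder for the workflow result
filtered_out = original.iloc[0:0]  # placeholder for rows removed on purpose

# Reconciliation: every original row_id must show up either in the final
# output or among the deliberately filtered rows.
accounted_for = set(final_output["row_id"]) | set(filtered_out["row_id"])
missing = original[~original["row_id"].isin(accounted_for)]
print(f"{len(missing)} rows unaccounted for")
```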

Welcome to the KNIME Community! Hope this helps, and in case of any questions feel free to ask!

Br,
Ivan

@JanHazel, what I would then recommend, as @ipazin said, is to create a unique identifier for each line, maybe using the filename and the RowID, and then check your final results against a reference table of all these IDs, maybe with the Reference Row Filter.
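
If it helps not only to count but also to list the dropped rows, the final check could look like this (again a pandas sketch with invented IDs): the reference table of all filename+RowID values is anti-joined against the IDs that survive, which shows exactly which rows disappeared and from which file.

```python
import pandas as pd

# Hypothetical reference table of all IDs built upstream, plus the IDs
# still present at the end of the workflow.
reference_ids = pd.DataFrame({"row_id": [
    "report_1.xlsx_Row0", "report_1.xlsx_Row1",
    "report_2.xlsx_Row0", "report_2.xlsx_Row1"]})
final_ids = pd.DataFrame({"row_id": [
    "report_1.xlsx_Row0", "report_2.xlsx_Row0", "report_2.xlsx_Row1"]})

# Same idea as a Reference Row Filter in "exclude" mode: everything in the
# reference table that is missing from the final result.
dropped = reference_ids[~reference_ids["row_id"].isin(final_ids["row_id"])].copy()
dropped["source_file"] = dropped["row_id"].str.rsplit("_", n=1).str[0]
print(dropped)  # lists exactly which rows disappeared and from which file
```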
