Two tables comparison with some conditions

Hello KNIMErs,

My task is to compare two tables with hundreds of columns and thousands of rows, however I’m showing dummy data below to keep it simple.
Input tables #1 & #2 have the same structure of columns. Columns cover:

  • Index as unique identifier.
  • Non-Check column with data.
  • Check columns named ‘*Check’. They contain evaluation values ‘0’ (meaning correct entry in appropriate non-Check column) or ‘1’ (meaning incorrect entry in appropriate non-Check column). ‘0’ and ‘1’ are integers (not strings).
  • Check and non-Check columns are ‘paired’. It means thet are named in a similar way: ‘Something’ and ‘Something Check’.

Table #1 is a reference table

Table #1 - Reference table
Index Place Number Place Check Number Check
aaaa ewrr 34636 1 0
bbbb trye 457567 0 0
cccc ryr 32354 1 1
eeee gqewrt 587868 0 0

Table #2 is a table that needs to be compared against reference table

Table #2 - Table to be checked
Index Place Number Place Check Number Check
aaaa wewrr 34636 0 0
bbbb trye 57567 0 1
cccc ryr 32354 1 1
dddd htyhfhh 2335 0 0

The conditions for creating the outcome table are:

Conditions
Check value in Table #1 (reference) Check value in Table #2
(to be checked)
Action
0 0 Filter-out index (row) & don’t show in outcome table
1 0 Filter-out index (row) & don’t show in outcome table
1 1 Filter-out index (row) & don’t show in outcome table
0 1 Keep index (row): show index entry, show 1 ‘*Check’ value, show appropriate non-Check entry & remove other entries

Table #3 is an expected outcome table that contains all entries meeting the above conditions:

Table #3 - Expected outcome table
Index Place Number Place Check Number Check
bbbb 57567 1

I have tried to utilize suggestions found on this forum, especially this workflow Compare Two Tables – KNIME Community Hub to solve my problem however my results are still far from what I expect.
Could you please suggest some solutions I could apply?

Thank you in advance,
Kaz

Hey @Kazimierz

I am not fully sure if I understood your logic correctly. You want to compare rows on an index level, right?

And you only want to check the value in case PlaceCheck#1 is 0 and PlaceCheck#2 is 1 and same for the NumberCheck#1 and NumberCheck#2? If this condition is met you want to compare the actual value and if it is not the same you want to output this row?

I created a small workflow who could do the described logic. It first joins the two tables and afterwards do the check for 0,1 and if that is the case it checks also if the value is the same. It will include 0,1 combinations of ?Check#1 and ?Check#2 as long as the values for the respective col is unequal.
compare_things.knwf (11.0 KB)

1 Like

Hey @laaaarsi
Thank you for your workflow. It works, it’s elegant, and it’s OK for a couple of columns with data.

The thing is that the expression in ‘Rule-based Row Filter’ would be quite long for hundreds of column. Additionally, it would be quite challenging to modify that expression in case of any change in input dataset.

I’ve made some progress in the meantime, thus I feel a bit closer to the solution. However, no working workflow yet.

Happy KNIMEing,
Kaz

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.