Data set comparison

How can we implement the following scenario?
data_set_1 has a key column and several other columns. data_set_2 has the same structure as data_set_1. We want to:

  • find the keys common to both data sets.
  • find the keys that differ, in both directions: data_set_1 - data_set_2 and data_set_2 - data_set_1.
  • compare each column in data_set_1 with its counterpart in data_set_2.

The first two points are already done.
Only the last one remains.
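
For readers following along outside KNIME, the first two points amount to set operations on the key columns. Here is a minimal Python sketch; the sample keys and the variable names are made up for illustration and are not taken from the actual workflow.

```python
# Hypothetical key columns of the two data sets (illustrative values only).
keys_1 = ["k1", "k2", "k3"]
keys_2 = ["k2", "k3", "k4"]

set_1, set_2 = set(keys_1), set(keys_2)

# Keys present in both data sets.
common_keys = set_1 & set_2

# Keys present in only one of them, in each direction.
only_in_1 = set_1 - set_2  # data_set_1 - data_set_2
only_in_2 = set_2 - set_1  # data_set_2 - data_set_1

print(sorted(common_keys), sorted(only_in_1), sorted(only_in_2))
```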

Hi there,

What should the output of this (last point) comparison be?

Br,
Ivan

Something like the attached figure, but over all the columns.

Also, I would like a node that takes two inputs and compares their columns.

Hi there @ahmed_gomaa,

The best option, it seems to me, is the Column Expressions node. First you need to join your tables on RowID so that all the columns you want to compare are in one table. Then, in the Column Expressions node, create one rule like this for every pair of columns you want to compare; the result will be true or false. Define the output column name in the node dialog to tell the column-pair comparisons apart.
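
The join-then-compare idea can also be sketched in plain Python. This is only an illustration of the logic, not Column Expressions syntax: the function names, the sample tables, and the `_equal` output names are hypothetical, and the ` (#1)` suffix for the second table's columns is assumed to match the column naming that appears later in this thread.

```python
def join_on_row_id(table_1, table_2, suffix=" (#1)"):
    """Inner-join two {row_id: {column: value}} tables on their row IDs.

    Columns coming from table_2 get `suffix` appended to their names.
    """
    joined = {}
    for row_id in table_1.keys() & table_2.keys():
        row = dict(table_1[row_id])
        for col, value in table_2[row_id].items():
            row[col + suffix] = value
        joined[row_id] = row
    return joined


def add_comparison(joined, left_col, right_col, out_col):
    """Add one true/false column comparing a single pair of columns."""
    for row in joined.values():
        row[out_col] = row[left_col] == row[right_col]


# Hypothetical sample tables with identical structure.
t1 = {"Row0": {"a": 1, "b": "x"}, "Row1": {"a": 2, "b": "y"}}
t2 = {"Row0": {"a": 1, "b": "q"}, "Row1": {"a": 2, "b": "y"}}

joined = join_on_row_id(t1, t2)
# One "rule" per column pair, each with its own output column name.
add_comparison(joined, "a", "a (#1)", "a_equal")
add_comparison(joined, "b", "b (#1)", "b_equal")
```

Each call to `add_comparison` plays the role of one rule, which is why the output names need to differ per column pair.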

There is no node to compare columns from two different data sets.

One alternative could be to use the Column Comparator node in a loop, with the column pairs to compare defined beforehand, but this also requires joining the data sets…

Br,
Ivan

Thanks. It worked. It is very useful.

How can I apply that using Column Comparator in a loop?

Actually, I want to do this automatically instead of specifying the columns manually in the Column Expressions node.

Hi there,

In the Column Expressions node you can also address a column by its index, so column(0) refers to the first column in your table. If you have a fixed number of columns, you can use that trick.
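
To show what the index trick buys you, here is a small Python sketch of the index arithmetic. It assumes (this is an assumption, not something the node guarantees) that the joined table holds the n columns of the first table followed by the n columns of the second table in the same order, so column i pairs with column n + i. The function name and sample row are made up.

```python
def compare_by_index(joined_row, n_columns):
    """Compare column i with column n_columns + i for every i.

    joined_row is a list of values: the first n_columns come from table 1,
    the next n_columns from table 2 (assumed layout).
    """
    return [joined_row[i] == joined_row[n_columns + i] for i in range(n_columns)]


# Hypothetical joined row: three columns per table.
row = [1, "x", 3.5, 1, "y", 3.5]
flags = compare_by_index(row, 3)
```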

To use Column Comparator in a loop you can try something like this:

And here is the workflow on KNIME Hub. It works if the tables have the same structure. Take a look, and if you have any questions, feel free to ask.

Br,
Ivan

The number of columns may change, so the comparison should be something like this:

for i = 0 to columns_number - 1
    Column i == Column i (#1)

That way there is no need to specify the columns, whether by name or by index.
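
The loop above could look roughly like this in Python. It is a sketch under assumptions: the joined rows are plain dictionaries, the second table's columns carry the ` (#1)` suffix, and `compare_all_columns` is a hypothetical helper name. The column pairs are discovered from the names, so the number of columns does not need to be known in advance.

```python
def compare_all_columns(joined_rows, suffix=" (#1)"):
    """For every column that has a suffixed twin, compare the two per row."""
    if not joined_rows:
        return []
    first = joined_rows[0]
    # Base columns are those whose suffixed counterpart exists in the row.
    base_columns = [
        col for col in first
        if not col.endswith(suffix) and col + suffix in first
    ]
    return [
        {col: row[col] == row[col + suffix] for col in base_columns}
        for row in joined_rows
    ]


# Hypothetical joined table with two columns per side.
rows = [
    {"a": 1, "b": "x", "a (#1)": 1, "b (#1)": "y"},
    {"a": 2, "b": "z", "a (#1)": 2, "b (#1)": "z"},
]
result = compare_all_columns(rows)
```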

It works. Many thanks.

Glad it helped.
Br,
Ivan
