data set comparison

ahmed_gomaa · September 29, 2019, 12:18pm

How can we handle the following scenario?
data_set_1 has a key column and several other columns. data_set_2 has the same structure as data_set_1. We want to:

find the keys common to both data sets.
find the keys that differ, in both directions: data_set_1 - data_set_2 and data_set_2 - data_set_1.
compare each column in data_set_1 with its counterpart in data_set_2.
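
For readers who want to see the first two points spelled out, here is a minimal sketch in plain Python rather than KNIME nodes. The sample keys and values are made up for illustration, and each data set is assumed to be a mapping from key to row.

```python
# Hypothetical sample data: each data set maps a key to a row of column values.
data_set_1 = {"k1": {"a": 1, "b": "x"}, "k2": {"a": 2, "b": "y"}, "k3": {"a": 3, "b": "z"}}
data_set_2 = {"k2": {"a": 2, "b": "q"}, "k3": {"a": 3, "b": "z"}, "k4": {"a": 4, "b": "w"}}

keys_1, keys_2 = set(data_set_1), set(data_set_2)

common_keys = keys_1 & keys_2  # keys present in both data sets
only_in_1 = keys_1 - keys_2    # data_set_1 - data_set_2
only_in_2 = keys_2 - keys_1    # data_set_2 - data_set_1

print(sorted(common_keys), sorted(only_in_1), sorted(only_in_2))
```

The two differences are computed separately because set difference is not symmetric.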

ahmed_gomaa · September 30, 2019, 9:25am

The first two points have been established.
The last one is remaining.

ipazin · September 30, 2019, 2:53pm

Hi there,

What should the output of this comparison (the last point) be?

Br,
Ivan

ahmed_gomaa · October 1, 2019, 9:46am

Something like the attached figure, but across all the columns.

Another thing: I would like a node that takes two inputs and compares their columns.

ipazin · October 2, 2019, 3:47pm

Hi there @ahmed_gomaa,

The best option, it seems to me, is to use the Column Expressions node. Before that, you need to join your tables on RowID so that all the columns you want to compare are in one table. Then, in the Column Expressions node, create one rule like this for every pair of columns you want to compare; the result will be true or false. Define the output column name in the node dialog to tell the column pair comparisons apart.
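
The join-then-compare idea can be sketched in plain Python as follows. This is only an illustration of the logic, not the Column Expressions syntax; the table contents, the column names, and the `_equal` output names are hypothetical, and the ` (#1)` suffix is used here simply to keep the second table's column names distinct.

```python
# Hypothetical tables keyed by RowID; column names are made up for illustration.
table_1 = {"Row0": {"price": 10, "city": "Cairo"}, "Row1": {"price": 20, "city": "Giza"}}
table_2 = {"Row0": {"price": 10, "city": "Luxor"}, "Row1": {"price": 25, "city": "Giza"}}

# Inner join on RowID: keep rows present in both tables and suffix the
# second table's columns so that the names stay distinct.
joined = {
    row_id: {
        **table_1[row_id],
        **{f"{col} (#1)": val for col, val in table_2[row_id].items()},
    }
    for row_id in table_1.keys() & table_2.keys()
}

# One "rule" per column pair, each producing its own true/false output column.
pairs = [("price", "price (#1)"), ("city", "city (#1)")]
result = {
    row_id: {f"{left}_equal": row[left] == row[right] for left, right in pairs}
    for row_id, row in joined.items()
}

for row_id in sorted(result):
    print(row_id, result[row_id])
```

Each pair gets its own named output column, which mirrors defining a separate output column per comparison in the node dialog.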

There is no node to compare columns from two different data sets.

One alternative could be to use the Column Comparator node in a loop, with the column pairs for comparison defined beforehand, but this also requires joining the data sets…

Br,
Ivan

ahmed_gomaa · October 3, 2019, 9:31am

Thanks. It worked. It is very useful.

ahmed_gomaa · October 3, 2019, 9:40am

How can I apply that using the Column Comparator in a loop?

ahmed_gomaa · October 3, 2019, 12:10pm

Actually, I want to do that automatically instead of specifying the columns manually within the Column Expressions node.

ipazin · October 3, 2019, 12:39pm

Hi there,

In the Column Expressions node you can also specify a column by its index, so column(0) addresses the first column in your table. If you have a fixed number of columns, you can use that trick.

To use the Column Comparator in a loop you can try something like this:

And here is the workflow on KNIME Hub. It works if the tables have the same structure. Take a look, and if you have any questions feel free to ask.

Br,
Ivan

ahmed_gomaa · October 7, 2019, 9:58am

The number of columns may change, so the comparison should be something like this:

for i = 0 to columns_number - 1

Column i == Column i (#1)
Hence, there is no need to specify the columns, whether by name or by index.
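
The loop above could look roughly like this in plain Python. It is a sketch under an assumption: the joined table is laid out so that the first half of the columns comes from the first data set and the second half, in the same order, from the second. The function name `compare_by_position` and the sample data are hypothetical.

```python
def compare_by_position(header, rows):
    """Compare column i with column i + n, where n is half the column count.

    Assumes the first n columns come from the first data set and the
    remaining n columns, in the same order, from the second.
    """
    n = len(header) // 2
    if len(header) != 2 * n:
        raise ValueError("expected an even number of columns")
    out_header = [f"{header[i]}_equal" for i in range(n)]
    out_rows = [[row[i] == row[i + n] for i in range(n)] for row in rows]
    return out_header, out_rows


# Hypothetical joined table with two columns per data set.
header = ["a", "b", "a (#1)", "b (#1)"]
rows = [[1, "x", 1, "y"], [2, "z", 3, "z"]]

out_header, out_rows = compare_by_position(header, rows)
print(out_header)
print(out_rows)
```

Because the column count is read from the table itself, nothing has to be changed when columns are added or removed, as long as both data sets keep the same structure.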

ahmed_gomaa · October 9, 2019, 7:56am

It works. Many thanks.

ipazin · October 9, 2019, 12:57pm

Glad it helped.
Br,
Ivan
