Comparing two tables => identifying different rows => combining tables

Hi,

 

I would appreciate help to achieve the following workflow:

 

1) I have 2 tables, with different row content but same column structure. I need to compare these tables and identify which rows are different. For instance, the tables will have 90% of same rows, but I need to know which rows are exclusive in each table.

 

2) The step 1 above is to a content check. After that, I need to combine the 2 tables in just 1 table, if possible creating a new column that identify the table of origin (if the register exists in table 1, table 2, or both tables). I tried the ‘concatenate’ node but it joins all rows of both tables, duplicating the content in an undesirable way.

 

Many thanks in advance,

Cadu

Hi Cadu, 

Have a look at the node decriptioni for the Joiner node.  This should do what you need.  Also worth being aware of is the reference row filter.

Regards,

Aaron

Hi Aaron,

 

Thanks for answer.

 

I tried many ways the 'joiner' node but I couldn't achieve success. I would appreciate further discussion.

 

I have two tables, being that they have duplicated two rows (title A, title B) and each table have different rows (title C in TABLE 1 / title D in TABLE 2)

 

TABLE 1

title      author

title A  author A

title B  author B

title C  author C

 

TABLE 2

title      author

title A  author A

title B  author B

title D  author D

 

Outputs I am expecting from Knime (workflow attached):

 

1) Find Duplicates:

title A  author A

title B  author B

 

2) Find Differences:

title C  author C

title D  author D

 

3) Combination of Tables:

title A  author A

title B  author B

title C  author C

title D  author D

 

The first problem I found using the ‘joiner’ node is that the columns are duplicated, but I need them as showed in (3) above.

 

Title                 Author             Title (#1)         Author (#1)

title A              author A         title A              author A

title B              author B          title B              author B

title C              author C          ?                      ?

?                      ?                      title D              author D

 

Many thanks,

Cadu

I am not 100% sure you want this, but here is a sample workflow. It shows how to do this with the title as distinguising feature and also when you want to use both title and author.

Hi Gabor,

 

Many thanks. Your workflow fit totally my need! Concatenate and GroupBy were the missing nodes for me.

 

Based on your workflow I did another one, but instead of using 'create collection column' I used 'rule engine' to create a column in each table which refers to 'source1' or 'source2'. By using 'GroupBy' and aggreagating the new column as 'unique concatenate' I could identify in same table what is unique, intersect, also obtaing the combination of the two tables.

 

This workflow follows attached.

 

Best,

Cadu

Thanks help me a lot!