Union of 2 tables that share several column

Hi,

Sorry if this subject has already been treated, but I can't fin it.

I have 2 tables, they have 2 identical column, the third is different. They don't have the same number of rows. I'd like to make a "union" using the common columns, add the different columns and get all the data in one table.

So I use the "union" node and select the common columns, but it doesn't work as I wish. I think, the easiest way to explain the result is an example.

Table1: 

 

id smile ic50
mol1 c1ccccc1 258
mol2 c1ccncc1 123
mol3 ccocc 987

Table2:

 

id smile origin
mol1 c1ccccc1 extern
mol2 c1ccncc1 intern
mol4 C1CCCCC1 extern
mol5 ch4 intern

Result:

 

id smile ic50 origin
mol1 c1ccccc1 258 extern
mol2 c1ccncc1 123 intern
mol3 ccocc 987 ?
? ? ? extern
? ? ? intern

My expectation:

 

id smile ic50 origin
mol1 c1ccccc1 258 extern
mol2 c1ccncc1 123 intern
mol3 ccocc 987 ?
mol4 C1CCCCC1 ? extern
mol5 ch4 ? intern

I hope it's clear. I don't see any other solution.

Thanks for your help

Nico

2 Likes

Sounds like a problem for the Joiner node, but I am not sure if this helps in this case. I would recommend to have a look into the Reference Column Filter node. This node will give you all column that match or don't match with a so-called reference table. The result need to be "merged" with the original table. How would you merge rows that appear in one but not in the other table?

I would use the Joiner node, and choose to Join by the smiles column (so change it from RowID to smiles)

You choose how you want the behaviour of the join to operate.

If you only want the output to contain a table where the molecules where in both tables, choose Inner Join. If you want all entries in the output, regardless of whether there was a match in both tables or not, choose Full Outer Join.

All other columns will be automatically appended, but if you only want specific columns, go to the "Column Selection" tab in the Joiner node and hand pick the desired columns.

 

Hope this helps,

Simon.

Hi,

Firstly, sorry for my previous post, I have used the table fonction to make my example, the display is very bad. You could see below a better representation of my cases.

Table1 (left table):  

id, smiles, ic50

mol1, c1ccccc1, 258

mol2, c1ccncc1, 123

mol3, ccocc, 987

Table2 (right table):

id, smiles, origin

mol1, c1ccccc1, extern

mol2, c1ccncc1, intern

mol4, C1CCCCC1, extern

mol5, ch4, intern

Result:

id, smiles, ic50, origin

mol1, c1ccccc1, 258, extern

mol2, c1ccncc1, 123, intern

mol3, ccocc, 987, ?

?, ?, ?, extern

?, ?, ?, intern

My expectation:

id, smiles, ic50, origin

mol1, c1ccccc1, 258, extern

mol2, c1ccncc1, 123, intern

mol3, ccocc, 987, ?

mol4, C1CCCCC1, ?, extern

mol5, ch4, ?, intern

Then, I have tried several ways to join my 2 tables (using several joining columns, selecting differents number of columns from the left or the right table...), but each time I lose the data from the right table.

The better thing I get is when I duplicate the columns I want.

Something like this:

id, smiles, ic50, origin, id(*), smiles(*)

mol1, c1ccccc1, 258, extern, mol1, c1ccccc1

mol2, c1ccncc1, 123, intern, mol2, c1ccncc1

mol3, ccocc, 987, ?, mol3, ccocc

?, ?, ?, extern, mol4, C1CCCCC1

?, ?, ?, intern, mol5, ch4

It's not very nice...

 

 

 

 

As far as I can tell simply selecting the full outer join option in the joiner node should give the desired output. Select smiles columns for the join.

Simon.

I tried, but that didn't solve the problem

Did you select the "duplicate" column that you want to include from the second table within the dialog tab "Column Selection"?

Yes the "always include all columns" options are checked for both tables. I have tried the 3 different ways of handling duplicate column and it's not better.

Next try. I guess you are complaining about the join columns which contain missing values (?) in this case no match in the left/right table was found. This is a known limitation of the Joiner and can be fixed using the Column Merger afterwards.

1 Like

Did you un-check the 'filter joined columns' boxes for the left and right tables?

Yest I did

And, is it working using the Column Merger?

You have to do it for each ColumnPair separately.

The "merger column" node solves the problem (I mean it hides the problem of duplication columns).

Thank you to all of you!

Nico

1 Like

What if I have multiple column pairs (i.e. 100)?  Is there a way to use a loop instead of using 100 column merger nodes?

Unfortunately, I don't see any easy way to do that...