make several joins at the same time

Nico1990 · February 17, 2012, 5:54pm

Hello!

It's my first topic, I hope I am in the right place.

I have a table-1 with 3 columns that contain an index number corresponding to a row in a table-2. I'd like to join the two tables depending on the indexes. I'm not sure to be clear so I add a small schema

table-1:

| row1 | row'1 | row'2 | row'3 |

| row2 | row'4 | row'5 | row'6 |

table-2:

| row1' | a |

| row'2 | b |

| row'3 | c |

Result:

| row1 | row'1 | a | row'2 | b | row'3 | c |

| row2 | row'4 | | row'5 | |row'6 | |

I know I could use the node Join three time, but I'm sure that it does exist an automatic way to do what I want. The other other reason is I could have more columns in the future and I don't want to change my workflow.

Thank you for your help.

Nicolas

richards99 · February 18, 2012, 11:26am

Yes, you could probably do this in an automated way using flow variables.

Take table 1, and use the "Extract Column HEader" node, from the top port you will now have the column names. Now attach the "Transpose" node, now all the column names are listed in one column per row. Now attach a "Table Row to Variable" node, this will effectively take one of the columns at a time.

Now add a Joiner node which connects up to Table 1 at the top port, and Table 2 at the bottom port. Right click on the Joiner node and choose "Show Flow Variables", now attach the "Table Row to Variable" node to the left red blob on the Joiner node. This is now going to be passing each column name into the Joiner node one at a time, but we still need to set this up in the node.

So lets configure the Joiner node, click Add Row. Now go to the flow variables tab, and open "lefttablejoin" dropdown and next to "0" choose Column Header. Now go back to the Joiner tab, and in the Right Table section, choose the column name in Table 2 that the data is to be joined and matched up with. After the Joiner node, add a Loop End node.

This should be it.

Simon.

Nico1990 · February 20, 2012, 10:41am

Thanks a lot for your reply!

I have a problem with the "Loop End" node. Id doesn't work and I don't really know why.

The warning message is: "Errors overwritting node settings with flow variables: unable to parse ".../workspace_knime" (variable "knime.workspace") as boolean expression (setting parameter "addIterationColumn")

I don't understand what it means!

Thanks

Nicolas

richards99 · February 20, 2012, 12:38pm

Hi it sounds like the flow variable has not been setup correctly in the flow tab of the joiner node, make sure in the dropdown next to "s 0" you are choosing "Column Header".

After testing the workflow, I needed to make some modifications, instead of a Loop End node you need a "Loop End Column Append" node. You will also need a column filter node and rowID node and Column rename node inserted to get everything to work.

Attached is an example workflow using your testset.

Simon.

multiplejoins.zip

Nico1990 · February 20, 2012, 3:45pm

what you did is exactly what I want!

So I'm trying to reproduce your workflow but for a reason I can't explain the node "TableRow To Variable Loop Start" doesn't work in the same way.

In your case when you display the node result in the value corresponding to Column Header is column4 so the fourth column, but in my case it's the first column. Moreover it's not possible changing this in the node setting.

Maybe the reason is my table-1 contains heterogenous types of value?

Actually if I copy/past your workflow and just change the two table, it doesn't work anymore...

I use Knime 2.4.1, is there a big difference with the 2.5 version?

richards99 · February 20, 2012, 5:18pm

The TableRow to Variable node takes each column name in turn, so when the workflow is fully run, it will be using the last column name, i.e. column4. If the workflow is only being started to be run (the first loop iteration) it will be column1. The node creates a loop between this tablerow to variable node and the loop end node where it keeps taking each column name one at a time, starting with the left (the first one) and finishing with the right most column.

In KNIME 2.4.1,I'm not sure the Column Rename (Regex) node existed then, so this maybe why it doesnt work when you copy and paste the workflow to KNIME 2.4.1. Other than that, the rest of the nodes should work in the same way.

If you are trying to recreate the workflow yourself, take note that the Column Filter node, Joiner node, and Column Rename Regex node all use flow variables which are seen in the flow variables tab so you can see what is happening.

The workflow should work with different types of tables. So long as the columns being compared are of the same type,i.e. string cells, then it should work fine. so the columns in table 1 is string and the first column in table 2 for matching against is also string, the joiner node can only join up columns of the same datatype. The subsequent data column in table 2 can be anything, string, numerical etc, it shouldnt matter.

Simon.

Nico1990 · February 20, 2012, 5:27pm

Okay. In my case, the workflow is also fully run (I think) but it uses the first column name. Because of this, the join can't be done correctly.

If you want to see, I join the example to compare with yours.

Nicolas

multiple_joins_modif.zip

richards99 · February 20, 2012, 6:03pm

Yes this is because you are trying to match up column 1 in table 1 which is a string cell with column 1 in table 2 which is a Smiles cell. They will need to be of the same datatype for the joiner to work, so for example they will both need to be as string. ALso the Joiner node is using a flow variable for the left match whcih is why it cant be changed, and the right match should be set to the first column of table 2,i.e. the column which is to match up with the table 1 column.

If its the integer columns to be matched, you need to filter out the String column from table 1 first. The workflow you sent is modified to allow this.

Simon.

multiple_joins_modif.zip

Nico1990 · February 20, 2012, 6:12pm

yes I understand but the thing is I don't want to match the columns that don't contain int.

What I still don't understand, is why the "TableRow to Variable Loop Start" acts differently in your example. We have both 4 columns, in your case there are 3 "CurrentIteration" and the column header is "column4". In mine, 0 iteration and sdf_compound as column header which is the first column of my table. It's not possible to modify this manually?

Nicolas

richards99 · February 20, 2012, 6:50pm

Mine says iteration 3 as the workflow has already completed 4 loops, 0,1,2,3. Yours went wrong on the first loop iteration, so its still on 0. You probably need to have a play with some of the example KNIME workflows to understand the flow variables and loops more.

In my example and your example, the tablerow to variable loop start is behaving exactly the same way. Its just not got past the first iteration due to an error.

Which columns do you want to match, it takes the column names one at a time from the output of the transpose node starting at the top which is iteration 0. If there are certain columns you dont want to be used in the matching, then use a column filter node before the transpose node to remove those you dont want matched.

Simon.

richards99 · February 20, 2012, 7:20pm

Okay, attached then is the workflow which takes the first columns for matching (the sdf structure names), but of course in your example, the names are different in both tables so there are no matches for the output. There are also no further iterations possible in this example as table 1 has no other suitable columns for matching for table 2. Is this what you are after?

Simon.

multiple_joins_modif.zip

Nico1990 · February 21, 2012, 9:15am

okay I think now I get it!

Thank a lot for your help

Nicolas