Merging of two rows

Mrs_Sippi · September 17, 2015, 2:54pm

Hi,

I created one DB out of several DBs with Knime. Now I have the following problem:

I would like to combine for example these two rows:

?

C14826

12(13)-EpOME

CCPPLLJZDQAOHD-UHFFFAOYSA-N

5356421

LMFA02000038

?

38299

C14826

HMDB04702

C14826

?

C18H32O3

?

CCCCC[C@@H]1O[C@@H]1C\C=C/CCCCCCCC(O)=O

1/C18H32O3/c1-2-3-10-13-16-17(21-16)

12(13)-EpOME

?

C14826

?

So that the whole information is all displayed in one row, like:

C14826

12(13)-EpOME

CCPPLLJZDQAOHD-UHFFFAOYSA-N

5356421

LMFA02000038

C18H32O3

?

CCCCC[C@@H]1O[C@@H]1C\C=C/CCCCCCCC(O)=O

1/C18H32O3/c1-2-3-10-13-16-17(21-16)

12(13)-EpOME

38299

C14826

HMDB04702

Thank you very much for your help!

Geo · September 18, 2015, 12:09am

If you want to handle the problem at the source, you will need to perform an INNER JOIN, LEFT OUTER JOIN or RIGHT OUTER JOIN (Database Joiner node) ON the common identifiers (e.g. the two columns containing value C14826) somewhere in your workflow.

If you only want to "repair" the extracted data, you should use the following workflow:

- Column Splitter to obtain to two tables ;

- apply RowID node on each output table (declaring the C14826 value containing column as rowID) ;

- use Joiner to merge both tables again based on rowIDs.

Note: the RowID step is not necessary but it makes things clearer.

Mrs_Sippi · September 18, 2015, 9:42am

How should I use the column splitter?

Mrs_Sippi · September 18, 2015, 9:55am

The left, right outer join and the inner join only duplicate the rows, which I like to combine...

Iris · September 18, 2015, 2:24pm

Hi, you can also use the group by node for this task.

You need to declare a grouping column (hence identifying rows belonging together) e.g. with the math node and floor($$ROWINDEX$$/2).

now group on this newly generated column. Aggregate all columns with First and without the missing option checked. And maybe check the option Keep original names.

Cheers, Iris

Mrs_Sippi · September 18, 2015, 2:41pm

Perfect would be an "ignore empty" button?! So that the joiner is able to join rows, based on the ID I would like to match, and automatically fill up the empty cells....

Mrs_Sippi · September 18, 2015, 3:00pm

Hi Iris,

your option is great, but unfortunately the "few" rows i like to combine are placed randomly in the table, so when I skip every second row i will loose information. :/

I think, the problem is because of my three matching criterias, which all have to match. Unfortunately the ID in bold (which is one of the three joining criteria) is sometimes listed without fitting to the other criteria, because the cells are empty or the ID is not unique for one compound. Therefore a "ignore empty" button would be perfect. Does something like his exist already?

Geo · September 19, 2015, 2:14am

Actually, I realize that I have wrongly assumed how RowID handles missing values - I'd have expected it to remove them but it does not. So forget the RowID node and replace it with the Missing Value node:

Column Splitter -> apply Missing Value to each partition -> Joiner

Use Column Splitter to put the columns 2 to 5 into partition 1 and the columns 1 and 6 into partition 2. In each partition, you should now have two rows: one row with only missing values, another row with only non-missing values.

Now apply Missing Value node by choosing for each column to remove the rows with only missing values. You will end up with 1 row in partition 1 and 1 row in partition 2.

Use Joiner to re-combine the two non-missing rows by specifying the unique matching criteria - use INNER JOIN. The result will be one row with the original columns 1 or 2 as well as columns 3 to 6.

That is if you want to stick with the Split, Apply & Combine approach ...