Merging of two rows

Hi,

I created one DB out of several DBs with Knime. Now I have the following problem: 

I would like to combine for example these two rows:

? C14826 12(13)-EpOME CCPPLLJZDQAOHD-UHFFFAOYSA-N 5356421 LMFA02000038 ? ? ? ? ? 38299 C14826 HMDB04702

C14826

? ? ? ? ? C18H32O3 ? CCCCC[C@@H]1O[C@@H]1C\C=C/CCCCCCCC(O)=O 1/C18H32O3/c1-2-3-10-13-16-17(21-16) 12(13)-EpOME ? C14826 ?

  So that the whole information is all displayed in one row, like:

 

C14826 C14826 12(13)-EpOME CCPPLLJZDQAOHD-UHFFFAOYSA-N 5356421 LMFA02000038 C18H32O3 ? CCCCC[C@@H]1O[C@@H]1C\C=C/CCCCCCCC(O)=O 1/C18H32O3/c1-2-3-10-13-16-17(21-16) 12(13)-EpOME 38299 C14826 HMDB04702

 

Thank you very much for your help! 

If you want to handle the problem at the source, you will need to perform an INNER JOIN, LEFT OUTER JOIN or RIGHT OUTER JOIN (Database Joiner node) ON the common identifiers (e.g. the two columns containing value C14826) somewhere in your workflow.

If you only want to "repair" the extracted data, you should use the following workflow:

- Column Splitter to obtain to two tables ;

- apply RowID node on each output table (declaring the C14826 value containing column as rowID) ;

- use Joiner to merge both tables again based on rowIDs.

Note: the RowID step is not necessary but it makes things clearer.

 

How should I use the column splitter? 

The left, right outer join and the inner join only duplicate the rows, which I like to combine...

Hi, you can also use the group by node for this task.

You need to declare a grouping column (hence identifying rows belonging together) e.g. with the math node and floor($$ROWINDEX$$/2).

now group on this newly generated column. Aggregate all columns with First and without the missing option checked. And maybe check the option Keep original names.

Cheers, Iris

1 Like

Perfect would be an "ignore empty" button?! So that the joiner is able to join rows, based on the ID I would like to match, and automatically fill up the empty cells....

Hi Iris,

your option is great, but unfortunately the "few" rows i like to combine are placed randomly in the table, so when I skip every second row i will loose information. :/

I think, the problem is because of my three matching criterias, which all have to match. Unfortunately the ID in bold (which is one of the three joining criteria) is sometimes listed without fitting to the other criteria, because the cells are empty or the ID is not unique for one compound. Therefore a "ignore empty" button would be perfect. Does something like his exist already?

Actually, I realize that I have wrongly assumed how RowID handles missing values - I'd have expected it to remove them but it does not. So forget the RowID node and replace it with the Missing Value node:

Column Splitter -> apply Missing Value to each partition -> Joiner

Use Column Splitter to put the columns 2 to 5 into partition 1 and the columns 1 and 6 into partition 2. In each partition, you should now have two rows: one row with only missing values, another row with only non-missing values.

Now apply Missing Value node by choosing for each column to remove the rows with only missing values. You will end up with 1 row in partition 1 and 1 row in partition 2.

Use Joiner to re-combine the two non-missing rows by specifying the unique matching criteria - use INNER JOIN. The result will be one row with the original columns 1 or 2 as well as columns 3 to 6.

That is if you want to stick with the Split, Apply & Combine approach ...