How to order multiple cells in a row by size

Ulla · July 25, 2016, 12:19pm

Hello there!

As a relatively new user of KNIME, I am not sure whether my problem requires a large rule-engine or just a simple node that I don't know of. So here it goes:

What I currently have:

A table in which each row has multiple columns, e.g.:

ID	a	b	c	d	e	f
234	2	4	6	3	14	12
...

What I would like to have:

An order of the values in columns a to f for each ID, so that the highest value is first place, the second highest second place, etc.

For example:

(I used P(x) to signify that it contains the Position of the corresponding column for each row.)

ID	a	b	c	d	e	f	P(a)	P(b)	Pc)	P(d)	P(e)	P(f)
234	2	4	6	3	14	12	6	4	3	5	1	2
...

I could find position 1 and position 7 by using the Max() and the Min() function in the Math Node and then a Rule-Engine to match the max- or min-values to the corresponding cells.

But what about the other positions. Do you have any ideas?

I am looking forward to your answers! (This is my very first KNIME post! :))

All the best,

Ulla.

marco_ghislanzoni · July 25, 2016, 2:21pm

Hi Ulla,

welcome to KNIME!

The core task you want to perform can be achieved with the Rank node. Ranking the vector a ... f in each line in descending order will give you exactly the P(a) ... P(f) sequence you are looking for.

The only caveat is that the Rank node works on columns, while the data you want to rank are organized into rows. This means you need to work on a row by row basis, transpose your data to turn them into a column, rank them, re-transpose and append the ranking results to the original row.

Doing so requires a basic loop (Chunk Loop), but it is not too difficult per se.

I would suggest you to try assembling your workflow following the indications above and post it here if you get stuck.

Last hint: you may face a couple of challenges linked to the way the Transpose node names the columns after having transposed a line and the way it handles the original ID column. Using the Unpivoting and Pivoting nodes instead may lead to a more compact and intuitive workflow. Your call.

Cheers,
Marco.

Ulla · July 27, 2016, 2:19pm

Hi Marco,

thank you very much! I will try the versions you suggested and report back next week!

I'm excited!

Cheers,

Ulla.

Geo · July 27, 2016, 11:07pm

An alternative is to use Unpivot, then Sorter (by Varname and then by value), and finally use Pivot to get back to the original format. After Pivot, you'll probably want to use Rename (regex) to get rid of the added suffix.

Ulla · August 2, 2016, 1:02pm

Hey guys,

thank you very much for your suggestions.

In my version of the KNIME the Rank Node is not yet available, but I will definitely try this solution when my SysAdmin will have updated the program.

For the meantime, I used Geo's suggestion:

Unpivoting > Sorter > Pivoting > Column rename (regex)

But that did not leave me with a rank number after Pivoting, so I inserted:

Unpivoting > Sorter > Math Formula > Rule Engine > Pivoting > Column rename (regex)

In the Math Formula node, I calculate the modulo of each rowindex divided by six (because I have six columns "a" to "f"). Equivalent to counting 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6 for all the rows.

Then, in the Rule Engine, I assigned the rank numbers according to the modulo. So every sixth row (which after sorting is always the row per ID with the highest value) receives rank "1", every fifth row receives rank "2" and so one.

Worked like a charm!

Thanks a lot guys!