keep half of the matrix

Nico1990 · August 28, 2013, 4:25pm

Hi,

I have a 100x100 correlation matrix and only want to keep the upper triangle. How can I do that in KNIME?

Thanks

Nico

Ergonomist · August 30, 2013, 10:39am

Fiddling with the Unpivoting node should do the trick. Unfortunately I cannot give you any clearer guidance, as I dislike the UI logic this node applies and always try to avoid using it. I suppose there's no way of avoiding it in this case, though...

Then again, looping over all columns and all rows in a nested fashion should work: effectively you walk across the matrix cell by cell, parsing value, row and column, and then dump the duplicates via GroupBy. Not terribly efficient, but acceptable for a 100 x 100 matrix; 1000 x 1000 might be too slow.

- E

Nico1990 · August 30, 2013, 11:50am

I do not see how looping over columns and rows will help. Could you detail this step please?

Meanwhile, I used the Unpivoting node and then the Column Aggregator node with the sorted list option on what were previously the row names and column names. With the GroupBy node I removed the duplicates, and finally the Pivoting node returned half of the matrix.

It works with a simple example where the column and row names are A, B, C, ..., but in my real case I use PDB codes and I do not know why I do not get a clean half-matrix (probably because sorting PDB codes is difficult). However, even though I cannot check, I am pretty sure I removed the duplicates.
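For readers who want to see the unpivot / sorted-pair / drop-duplicates logic outside KNIME, here is a minimal plain-Python sketch. It is not KNIME code and not the actual workflow: the labels and values are made up, and the function names `unpivot` and `upper_triangle_pairs` are hypothetical.

```python
from itertools import product

def unpivot(labels, matrix):
    """Turn a square matrix into long format: one (row, column, value) per cell."""
    return [
        (labels[i], labels[j], matrix[i][j])
        for i, j in product(range(len(labels)), repeat=2)
    ]

def upper_triangle_pairs(labels, matrix, keep_diagonal=False):
    """Keep one value per unordered pair of labels."""
    seen = {}
    for row, col, value in unpivot(labels, matrix):
        if row == col and not keep_diagonal:
            continue
        # Sorting the two names plays the role of the "sorted list" aggregation:
        # (A, B) and (B, A) end up with the same key.
        key = tuple(sorted((row, col)))
        # Keep the first occurrence only, like dropping duplicates in a group-by.
        seen.setdefault(key, value)
    return seen

# Made-up labels and a symmetric 3x3 matrix, purely illustrative.
labels = ["1abc", "2xyz", "3def"]
matrix = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.7],
    [0.5, 0.7, 1.0],
]

pairs = upper_triangle_pairs(labels, matrix)
for (a, b), value in sorted(pairs.items()):
    print(a, b, value)
```

Note that the key depends on how the names sort, not on where they sit in the original matrix. If the label order is a concern, comparing the positions `i < j` instead of the names would avoid relying on the sort order.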

Ergonomist · August 30, 2013, 1:20pm

Nico,

Looping would be "Column List Loop Start" followed by "Chunk Loop Start" with chunk size 1, then "Extract Column Header" and "RowID", joining those to the value and closing the two loops. But never mind this approach: it lets you put the entire matrix into a single value column, but the aggregation from there is less trivial than I thought (I just gave it a try).

Regarding completeness, I guess the simple answer is to check whether your row count equals (n²-n)/2 for an n x n matrix. Or am I missing something?
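That check is easy to script. A minimal sketch in plain Python (the function name `expected_rows` is hypothetical), comparing the closed-form count against a brute-force enumeration of the pairs:

```python
from itertools import combinations

def expected_rows(n):
    """Number of cells strictly above the diagonal of an n x n matrix."""
    return (n * n - n) // 2

n = 100
# Brute-force count of unordered pairs of distinct indices, for comparison.
brute_force = sum(1 for _ in combinations(range(n), 2))

print(expected_rows(n))                 # 4950
print(brute_force == expected_rows(n))  # True
```

So for the 100 x 100 case the long-format table should have 4950 rows once the duplicates and the diagonal are gone.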

Cheers

E

Nico1990 · August 30, 2013, 1:47pm

Thank you for these explanations.

Both techniques work, but the aggregation is faster.

Regarding the formula, if I want to keep the diagonal it becomes (n^2+n)/2, right?

Nico

Ergonomist · September 9, 2013, 3:48pm

Correct, that should be your control when keeping the diagonal.
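For reference, both counts follow from splitting the n² cells into the diagonal and two equal triangles. Writing T for the number of cells strictly above the diagonal (a symbol introduced here only for the derivation):

```latex
\begin{align*}
n^2 &= n + 2T \quad\Longrightarrow\quad T = \frac{n^2 - n}{2} \\
T + n &= \frac{n^2 - n}{2} + n = \frac{n^2 + n}{2} \\
n = 100 &: \quad T = 4950, \qquad T + n = 5050
\end{align*}
```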

E