Groups of columns to new rows?

guycuthbert · November 15, 2013, 1:12pm

I've searched the forums for this topic, but am new to KNIME and may be expressing the problem poorly...

I have a data file which contains a set of columns A-M which I want to preserve in an output table, plus 40 columns which are 10 groups of 4 e.g.

A | B | C | D ... L | M | N1 | O1 | P1 | Q1 | N2 | O2 | P2 | Q2 ... N10 | O10 | P10 | Q10

I have used a Splitter node to partition the data A-M and then N1-Q10; this works well (although I also want to carry across column A (my key) e.g.

A | N1 | O1 | P1 | Q1 | N2 | O2 | P2 | Q2 ... N10 | O10 | P10 | Q10

I then want to take each row and turn 40 columns x 1 row into 4 columns x 10 rows e.g.

A | N | O | P | Q (1)
A | N | O | P | Q (2)
A | N | O | P | Q (3)
:
A | N | O | P | Q (10)

So I have 2 questions:

Can I duplicate a column (A) so that I can carry it into both output ports of the Splitter?
What's the best way to split 40 columns x 1 row into 4 columns x 10 rows... and add column (A) to each?

My current approach (after the Splitter) is:

Transpose 40 columns x N rows to N columns x 40 rows
Chunk Loop Start to split into 10 groups of 4 rows
- Transpose to 4 columns x N rows
- Rename columns (to remove 1, 2 etc.) ending up with N | O | P | Q
End Loop

The first Transpose is pretty quick (completes in 60-70s) but the Transpose inside the loop achieves 10% in 20 minutes... suggesting someting is very wrong!

I have looked at Column List Loop Start, but can't figure out how to use it for this scenario - it would seem to involved a lot less processing, if I can use it?

I hope that all makes sense?!

Thanks, Guy

Aaron_Hart · November 15, 2013, 6:31pm

Hey Guy,

Can you post some example data before and after transformation? It doesn't need to be real (or 40 columns) but this will give us something to start with.

Regards, Aaron

richards99 · November 15, 2013, 11:10pm

First thing, instead of splitter node, why not have two column filter nodes both coming off the prior node so you can include column A both times?

the next part is more tricky. how about using extract the column headers node, from top port connect transposing node, this generates a column from these names. So the rows of the first column read A, N1, O1, P1 etc. Now use row splitter to separate top row out (containing A). From bottom port containing all the remainder names (N1, O1 etc) connect upto chunk loop start and choose chunk size of 4, now use concatenate node joining from the row splitter earlier and from chunk loop start node, so you now have after this node in the the first column the rows A, N1, O1, P1, Q1 only. Now use rowid node to transfer these row cells into the RowID column. Now transpose this so these are now as column names. Now connect a reference column filter node from these columns and your table from much earlier and choose include setting in the node. You should now have your first set of desired data in the output. Now use column rename regex node, in pattern search string enter ([A-Z]). (including the .) For the replacement text enter $1

This basically gets rid of the number suffix so they can all be merged together after each loop.

Now choose loop end. I hope that will work for you.

simon.