Specific rows into columns

C-Rey · February 24, 2022, 5:05pm

Hi everyone,

I have been looking and trying for a while now but I cannot transform the data in the way I am aiming for.

Below is a visualisation of it.
I’d like to point out it would report duplicates (ie. C has two times 1). I’d be happy with “fillers” if columns cannot be of different size, as long as they can be identified and filtered downstream.

Any input, node suggestion or keyword to name this kind of operation would be appreciated.

Thanks!

Best,
Clément

aworker · February 24, 2022, 5:17pm

Hi @C-Rey,

This can be done using a -Pivoting- node:

Please let us know if you need any further help to configure the -Pivoting- node.

Best

Ael

C-Rey · February 24, 2022, 5:50pm

Thanks a lot Ael, it is exactly the operation I was after.
I plugged the node into my workflow and with a bit of work I got it right.

By any chance, do you know if there is a way to skip the grouping and maintain raw values?
In my case I am interested in looking at the spread directly in addition to SD or other metrics.

If not, I surely can craft a unique tag per record and use that to avoid grouping.

Thanks again!

Clément

aworker · February 24, 2022, 5:56pm

Glad it helped.

Sorry I didn’t understand what you mean by that. Could you please show us the final table you would like to obtain please ?

Thanks

Ael

Daniel_Weikert · February 25, 2022, 3:37pm

You can add a “helper column” e.g with the rank node and group on this.
input

flow

output

hope that helps
br

C-Rey · February 25, 2022, 4:10pm

It might be my configuration of the Pivoting node and I need more experience with it.

My understanding of it so far is that eventually it will process values similarly to the "Group By node.
In my case keeping the raw values rather than descriptive stats about them (mean/median + SD/MAD) would be helpful.
ie. For A the pivoted table would report “Grouped value” of A instead of 1 - 2 - 3.

Daniel just below (thanks!) introduced a Rank node which I believe will provide that extra variable to keep all values separated.
There will be a single value for each rank in every column, therefore grouping by Column and Ranking would preserve raw values.

Apologies if my explanation is not crystal clear, I find the overall transformation I am going for a bit peculiar and can’t put precise words on it.
Though the Pivot node is the key, with some workarounds it got me where I needed - thank you!

C-Rey · February 25, 2022, 4:20pm

Thanks Daniel! I did not know this Rank node, it sure is very handy.

Would you mind sharing the configuration you use for the pivoting node?
I am trying to replicate the result but I am struggling to visualise the impact of: groups - pivot - aggregation.

aworker · February 25, 2022, 4:21pm

No problem at all @C-Rey and thanks a lot for clarifying what you wanted to add as information. Indeed, the configuration -Pivoting- node is not obvious. The -Groupby- node is much more intuitive and every time I have the choice among using the -Pivoting- or the -Grouping- nodes, definitely I use the latter.

Please get back in touch if you need further information. Glad to further help.

Best wishes,
Ael

Daniel_Weikert · February 25, 2022, 5:03pm

Hi I currently have KNIME not open but I grouped by the rank (created with the rank node) used the A-B-C Column as pivot fields ( = new columns) and use First as the aggregation type. The ranking was done for each group (A-B-C Column. The group can be specified within the rank node)
hope that helps. Let me know
best regards and take care

C-Rey · February 25, 2022, 5:42pm

Ah yes, I did not see how much control I had over the Ranking node.

I think I have replicated your results, however I feel it might not handle duplicates values very well:

Column C reports a single 7 as equal values will share the same rank within a group.

As a workaround, I have found the RowID node which can inject the RowID into a column (Rank) - thus creating a unique tag for each value:

On larger dataset it will create a massive amount of missing values. It is not the most elegant solution but it puts the data in the exact format I need for the next step. If time allows it I’ll see if I can deal with those but it is not an issue: PRISM is the destination for this data and it can be set to skip empty cells.

Thanks for your help!

Daniel_Weikert · February 27, 2022, 3:44pm

Yes correct,
I first used the Counter Node which achieves the same result and then switched to Rank for this use case.
Glad that you find the right adjustment for your use case
br

system · March 6, 2022, 3:45pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.