Compute for column values based on conditional substrings in the column names

I have a dataframe with hundreads of columns.
Just for example purposes I’m going to present a toy dataframe.

TPT_A_2 | TPT_B_2 | TPT_C_2 | TPT_A_4 | TPT_B_4 | TPT_C_4 | TPT_A_6 | TPT_B_6 | TPT_C_6 | 
 100        100       100       200       200      200       400       400        400   

I want to compute the mean for those variables with the same initial substrings as name (TPT_A, TPT_B…) that end with 2 and 4.
So I would get something like:

TPT_A_mean | TPT_B_mean | TPT_C_mean | TPT_A_6 | TPT_B_6 | TPT_C_6 | 
  150           150          150         400      400        400  

In R you would get this data like this:

row1 <- c("TPT_A_2", "TPT_B_2", "TPT_C_2","TPT_A_4", "TPT_B_4", "TPT_C_4", "TPT_A_6", "TPT_B_6", "TPT_C_6")
row2 <- c(100, 100, 100, 200, 200, 200, 400, 40, 400)   
data <- as.data.frame(rbind(row1, row2))
colnames(data) <- as.character(data[1,])
data <- data[-1,]
1 Like

Hi @RoyBatty296,

at first I had something in mind that could be descriped as grouped unpivoting, but since the example dataset has only one row, my solution is a lot simpler.

  1. transpose the table
  2. split into mean rows and kind rows
  3. calc mean for each group
  4. reunite rows
  5. transpose back


grouped_pivot.knwf (35.1 KB)

3 Likes

Hi @Thyme

First, thank you very much for your answer.
Could you please provide some insights on the grouped unpivoting solution? As I said, this was a toy example, and the real datas has many rows.

It’s funny that you ask, after posting I had this idea on how to generalise my solution and I went to the “kitchen” right away. Here it is:

  1. split into mean and kind columns
  2. unpivot everything into a single column, no extras
  3. calc mean for each group
  4. transpose
  5. reunite columns (resulting table has empty rows)

This is not what I had in mind, however. Story thyme:
While reading your question, I had this spark in my head: Group Loop Start. Except that it groups rows, not columns. I would have to emulate that behaviour, so the obvious way was to extract the column names, boil it down to the groups, and use a Table Row to Variable Loop Start instead.
Then I could filter the table to one group with that variable.
Not sure how I’d have continued then. I wasn’t even aware of the postfix filter or the 1-row shortcut at that moment.
It might be useful though. Now I know the logical thing would be the unpivot node, but that guy doesn’t even need the Group Columns Loop mechanism.


grouped_unpivot.knwf (67.9 KB)

2 Likes

For people in the future, who want to know how my “Column Group Loop Start” looks like, I actually had to implement one in another question:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.