Calculating differences in columns when some rows have more columns than others

Hi!
I’m working with the R Snippet and the “changepoint” package to loop over stores and find change points in demand. Some stores have 1 change point, others have 2, some have a high amount like 7 changes in demand. The output looks like the following:
image
(The first column is a unique key/id number)

I’m wondering what the best way to calculate the difference in means for the row’s most recent change point (changepoint_meanN) from its first change point (changepoint_mean1). Given that each row has a different most recent change point is where my challenge lies.

I’ve though of using a loop approach where I take each row separately, filter out missing columns (therefore I’m only left with the key/id and its changepoints), then do a calculation… But I haven’t been able to find something like using the max column index and using that in a formula.

My expected I would want something like this:

Thanks for your help!

Hi @Snowy

I played around with this a little and came up with something simple. It’s not that elegant, and it might break with a larger dataset, but it might get you started:

I used your idea as a starting point:

  1. Each row is isolated
  2. changepoint mean columns that have missing values are removed
  3. The Column Expressions node has the formula:
    column(numberOfColumns()-1)-column(2)
    where
    • column(2) returns the value of the 3rd column, which is changepoint_mean1
    • numberOfColumns()-1 counts the number of columns in the current row, then subtracts 1 to generate a column index for last column in that row
    • column(numberOfColumns()-1) returns the value in the last column
  4. Columns are resorted so that the difference column appears last

Final result:

Things to note:

  1. Exclude both the column with the unique key/ID number as well as the 2nd column (change…) from the Missing Value Column Filter
  2. Specify the correct datatype (double floating point number) for the newly-generated column in the Column Expressions node
  3. You’ll need to tell the Loop End node to allow changing table specifications
3 Likes

Hi there!

Nice one @elsamuel! Like the use case for Column Expressions node :slight_smile:

Alternatively one can use Column Aggregator with method Last on all changepoint columns with missing option not checked and then use new column in calculation.

Br,
Ivan

4 Likes

Thanks to you both @ipazin and @elsamuel ! Both solutions work nicely, but I have incorporated ipazin’s solution into my workflow since it avoids a loop. I always forget how handy the Column Aggregator node is.

Snowy

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.