Calculating differences in columns when some rows have more columns than others

Snowy · June 2, 2020, 7:42pm

Hi!
I’m working with the R Snippet and the “changepoint” package to loop over stores and find change points in demand. Some stores have 1 change point, others have 2, some have a high amount like 7 changes in demand. The output looks like the following:

(The first column is a unique key/id number)

I’m wondering what the best way to calculate the difference in means for the row’s most recent change point (changepoint_meanN) from its first change point (changepoint_mean1). Given that each row has a different most recent change point is where my challenge lies.

I’ve though of using a loop approach where I take each row separately, filter out missing columns (therefore I’m only left with the key/id and its changepoints), then do a calculation… But I haven’t been able to find something like using the max column index and using that in a formula.

My expected I would want something like this:

Thanks for your help!

elsamuel · June 3, 2020, 3:19am

Hi @Snowy

I played around with this a little and came up with something simple. It’s not that elegant, and it might break with a larger dataset, but it might get you started:

I used your idea as a starting point:

Each row is isolated
changepoint mean columns that have missing values are removed
The Column Expressions node has the formula:
column(numberOfColumns()-1)-column(2)
where
- column(2) returns the value of the 3rd column, which is changepoint_mean1
- numberOfColumns()-1 counts the number of columns in the current row, then subtracts 1 to generate a column index for last column in that row
- column(numberOfColumns()-1) returns the value in the last column
Columns are resorted so that the difference column appears last

Final result:

Things to note:

Exclude both the column with the unique key/ID number as well as the 2nd column (change…) from the Missing Value Column Filter
Specify the correct datatype (double floating point number) for the newly-generated column in the Column Expressions node
You’ll need to tell the Loop End node to allow changing table specifications

ipazin · June 3, 2020, 1:59pm

Hi there!

Nice one @elsamuel! Like the use case for Column Expressions node

Alternatively one can use Column Aggregator with method Last on all changepoint columns with missing option not checked and then use new column in calculation.

Br,
Ivan

Snowy · June 3, 2020, 7:24pm

Thanks to you both @ipazin and @elsamuel ! Both solutions work nicely, but I have incorporated ipazin’s solution into my workflow since it avoids a loop. I always forget how handy the Column Aggregator node is.

Snowy

system · June 10, 2020, 7:24pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.