# Calculating differences in columns when some rows have more columns than others

Hi!
I’m working with the R Snippet and the “changepoint” package to loop over stores and find change points in demand. Some stores have 1 change point, others have 2, some have a high amount like 7 changes in demand. The output looks like the following:

(The first column is a unique key/id number)

I’m wondering what the best way to calculate the difference in means for the row’s most recent change point (changepoint_meanN) from its first change point (changepoint_mean1). Given that each row has a different most recent change point is where my challenge lies.

I’ve though of using a loop approach where I take each row separately, filter out missing columns (therefore I’m only left with the key/id and its changepoints), then do a calculation… But I haven’t been able to find something like using the max column index and using that in a formula.

My expected I would want something like this:

Hi @Snowy

I played around with this a little and came up with something simple. It’s not that elegant, and it might break with a larger dataset, but it might get you started:

I used your idea as a starting point:

1. Each row is isolated
2. changepoint mean columns that have missing values are removed
3. The Column Expressions node has the formula:
`column(numberOfColumns()-1)-column(2)`
where
• `column(2)` returns the value of the 3rd column, which is changepoint_mean1
• `numberOfColumns()-1` counts the number of columns in the current row, then subtracts 1 to generate a column index for last column in that row
• `column(numberOfColumns()-1)` returns the value in the last column
4. Columns are resorted so that the difference column appears last

Final result:

Things to note:

1. Exclude both the column with the unique key/ID number as well as the 2nd column (change…) from the Missing Value Column Filter
2. Specify the correct datatype (double floating point number) for the newly-generated column in the Column Expressions node
3. You’ll need to tell the Loop End node to allow changing table specifications
3 Likes

Hi there!

Nice one @elsamuel! Like the use case for Column Expressions node

Alternatively one can use Column Aggregator with method Last on all changepoint columns with missing option not checked and then use new column in calculation.

Br,
Ivan

4 Likes

Thanks to you both @ipazin and @elsamuel ! Both solutions work nicely, but I have incorporated ipazin’s solution into my workflow since it avoids a loop. I always forget how handy the Column Aggregator node is.

Snowy

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.